Introduction
In this post I will present example SPARQL queries against the Pseudomonas fluorescens SBW25 knowledge graph (SBW25KG). The knowledge graph was derived from the manually created annotation in GFF3 format, as explained in a previous post.
The queries are run against a local instance of the apache-jena-fuseki triplestore. First, I set the endpoint URL and the maximum number of returned records:
%endpoint http://micropop046:3030/plu/
%show 50
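Outside the notebook, the same endpoint could presumably be queried over the SPARQL 1.1 Protocol. The following is a minimal Python sketch, not the setup used in this post: the helper name build_sparql_request is hypothetical, and it assumes that the Fuseki dataset URL accepts queries via HTTP GET with a query parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as set with the %endpoint magic above.
ENDPOINT = "http://micropop046:3030/plu/"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request.

    Hypothetical helper for illustration; assumes the endpoint accepts
    queries through the 'query' URL parameter.
    """
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it; the response body could then be parsed with json.loads.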
Retrieve 10 CDSs from the SBW25KG.
In this first example, I query for CDS features and list their primary name, locus tag, and protein ID.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
SELECT DISTINCT * WHERE {
?cds a so:0000316 ,
gffo:Feature ;
gffo:primary_name ?primary_name ;
gffo:locus_tag ?locus_tag ;
gffo:protein_id ?protein .
}
ORDER BY ?locus_tag
LIMIT 10
cds | primary_name | locus_tag | protein |
---|---|---|---|
http://pflu.evolbio.mpg.de/bio_data/14291 | macB | PFLU_2556 | CAY48791.1 |
http://pflu.evolbio.mpg.de/bio_data/14293 | mdeA | PFLU_2558 | CAY48793.1 |
http://pflu.evolbio.mpg.de/bio_data/14298 | potF | PFLU_2564 | CAY48798.1 |
http://pflu.evolbio.mpg.de/bio_data/14314 | idh | PFLU_2580 | CAY48814.1 |
http://pflu.evolbio.mpg.de/bio_data/14326 | proX | PFLU_2592 | CAY48826.1 |
http://pflu.evolbio.mpg.de/bio_data/14327 | bfrG | PFLU_2593 | CAY48827.1 |
http://pflu.evolbio.mpg.de/bio_data/14332 | bfrH | PFLU_2598 | CAY48832.1 |
http://pflu.evolbio.mpg.de/bio_data/14334 | cynR | PFLU_2600 | CAY48834.1 |
http://pflu.evolbio.mpg.de/bio_data/14342 | yaaX | PFLU_2608 | CAY48842.1 |
http://pflu.evolbio.mpg.de/bio_data/14348 | sohB | PFLU_2614 | CAY48848.1 |
Clicking a CDS URL would open the respective page on the Pseudomonas fluorescens SBW25 genome database.
Which properties are listed for CDS features?
One of my first exploratory queries against a SPARQL endpoint that is new to me is to list the properties of a given node type. Here, I list all properties of nodes that are typed as both CDS (so:0000316) and gffo:Feature.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
SELECT DISTINCT ?prop WHERE {
?cds a so:0000316 ,
gffo:Feature ;
?prop ?val .
}
Retrieve the UniProt entry via a federated query, using the protein ID to link into the UniProt graph.
Here we use the protein ID to link a node on the SBW25KG to the UniProt KG. On the UniProt KG, the protein ID is linked to the protein node (type up:Protein) through the property rdfs:seeAlso, and its value is a URI anchored on the emblcds prefix http://purl.uniprot.org/embl-cds/. Hence, we need to construct the full emblcds URI from the variable ?protein_id. This is achieved by concatenating the emblcds: prefix (converted to a string with STR) and ?protein_id, casting the resulting string to a URI, and binding that URI to the variable ?emblcdsuri. The query then matches against the UniProt node that has ?emblcdsuri as an rdfs:seeAlso property value. To limit the search on the UniProt side, we also demand that the UniProt node is connected through up:organism to the resource that designates Pseudomonas fluorescens SBW25 (taxon:216595). We list the first 25 CDSs.
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT DISTINCT ?cds ?id ?protein_id ?protein
WHERE {
?cds a so:0000316 ;
gffo:ID ?id ;
gffo:protein_id ?protein_id .
BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)
SERVICE <http://sparql.uniprot.org/sparql> {
?protein a up:Protein .
?protein up:organism taxon:216595 .
?protein rdfs:seeAlso ?emblcdsuri .
?protein rdfs:seeAlso ?xref .
}
}
LIMIT 25
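The BIND clause in the query above amounts to plain string concatenation followed by a cast to an IRI. As a rough illustration outside SPARQL, the same construction can be sketched in Python. The function name emblcds_uri is hypothetical, and the protein ID is taken from the first result table in this post.

```python
# Namespace behind the emblcds: prefix used in the query.
EMBL_CDS_NS = "http://purl.uniprot.org/embl-cds/"


def emblcds_uri(protein_id: str) -> str:
    """Mimic IRI(CONCAT(STR(emblcds:), ?protein_id)) as a plain string."""
    return EMBL_CDS_NS + protein_id


print(emblcds_uri("CAY48791.1"))
# -> http://purl.uniprot.org/embl-cds/CAY48791.1
```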
Query the UniProt annotations for a given PFLU SBW25 CDS.
Using the same strategy as before, we perform a federated query on the UniProt KG and list the rdfs:comment property values for all up:annotation property values.
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT DISTINCT ?cds ?protein ?comment
WHERE {
?cds a so:0000316 ;
gffo:ID ?id ;
gffo:protein_id ?protein_id .
BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)
SERVICE <http://sparql.uniprot.org/sparql> {
?protein a up:Protein .
?protein up:organism taxon:216595 .
?protein rdfs:seeAlso ?emblcdsuri .
?protein up:annotation ?annotation .
?annotation rdfs:comment ?comment
}
}
LIMIT 25
cds | protein | comment |
---|---|---|
http://pflu.evolbio.mpg.de/bio_data/11845 | http://purl.uniprot.org/uniprot/B0B0A5 | Polar residues |
http://pflu.evolbio.mpg.de/bio_data/11845 | http://purl.uniprot.org/uniprot/B0B0A5 | ATP |
http://pflu.evolbio.mpg.de/bio_data/11845 | http://purl.uniprot.org/uniprot/B0B0A5 | Disordered |
http://pflu.evolbio.mpg.de/bio_data/11845 | http://purl.uniprot.org/uniprot/B0B0A5 | Plays an important role in the initiation and regulation of chromosomal replication. Binds to the origin of replication; it binds specifically double-stranded DNA at a 9 bp consensus (dnaA box): 5'-TTATC[CA]A[CA]A-3'. DnaA binds to ATP and to acidic phospholipids. |
http://pflu.evolbio.mpg.de/bio_data/11845 | http://purl.uniprot.org/uniprot/B0B0A5 | Belongs to the DnaA family. |
http://pflu.evolbio.mpg.de/bio_data/11845 | http://purl.uniprot.org/uniprot/B0B0A5 | Chromosomal replication initiator protein DnaA |
http://pflu.evolbio.mpg.de/bio_data/11846 | http://purl.uniprot.org/uniprot/C3KDU3 | Confers DNA tethering and processivity to DNA polymerases and other proteins. Acts as a clamp, forming a ring around DNA (a reaction catalyzed by the clamp-loading complex) which diffuses in an ATP-independent manner freely and bidirectionally along dsDNA. Initially characterized for its ability to contact the catalytic subunit of DNA polymerase III (Pol III), a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria; Pol III exhibits 3'-5' exonuclease proofreading activity. The beta chain is required for initiation of replication as well as for processivity of DNA replication. |
http://pflu.evolbio.mpg.de/bio_data/11846 | http://purl.uniprot.org/uniprot/C3KDU3 | Forms a ring-shaped head-to-tail homodimer around DNA. |
http://pflu.evolbio.mpg.de/bio_data/11846 | http://purl.uniprot.org/uniprot/C3KDU3 | DNA_pol3_beta_3 |
http://pflu.evolbio.mpg.de/bio_data/11846 | http://purl.uniprot.org/uniprot/C3KDU3 | DNA_pol3_beta_2 |
http://pflu.evolbio.mpg.de/bio_data/11846 | http://purl.uniprot.org/uniprot/C3KDU3 | DNA_pol3_beta |
http://pflu.evolbio.mpg.de/bio_data/11846 | http://purl.uniprot.org/uniprot/C3KDU3 | Belongs to the beta sliding clamp family. |
http://pflu.evolbio.mpg.de/bio_data/11847 | http://purl.uniprot.org/uniprot/C3KDU4 | DNA replication and repair protein RecF |
http://pflu.evolbio.mpg.de/bio_data/11847 | http://purl.uniprot.org/uniprot/C3KDU4 | Belongs to the RecF family. |
http://pflu.evolbio.mpg.de/bio_data/11847 | http://purl.uniprot.org/uniprot/C3KDU4 | ATP |
http://pflu.evolbio.mpg.de/bio_data/11847 | http://purl.uniprot.org/uniprot/C3KDU4 | The RecF protein is involved in DNA metabolism; it is required for DNA replication and normal SOS inducibility. RecF binds preferentially to single-stranded, linear DNA. It also seems to bind ATP. |
http://pflu.evolbio.mpg.de/bio_data/11848 | http://purl.uniprot.org/uniprot/C3KDU5 | Toprim |
http://pflu.evolbio.mpg.de/bio_data/11848 | http://purl.uniprot.org/uniprot/C3KDU5 | Heterotetramer, composed of two GyrA and two GyrB chains. In the heterotetramer, GyrA contains the active site tyrosine that forms a transient covalent intermediate with DNA, while GyrB binds cofactors and catalyzes ATP hydrolysis. |
http://pflu.evolbio.mpg.de/bio_data/11848 | http://purl.uniprot.org/uniprot/C3KDU5 | A type II topoisomerase that negatively supercoils closed circular double-stranded (ds) DNA in an ATP-dependent manner to modulate DNA topology and maintain chromosomes in an underwound state. Negative supercoiling favors strand separation, and DNA replication, transcription, recombination and repair, all of which involve strand separation. Also able to catalyze the interconversion of other topological isomers of dsDNA rings, including catenanes and knotted rings. Type II topoisomerases break and join 2 DNA strands simultaneously in an ATP-dependent manner. |
http://pflu.evolbio.mpg.de/bio_data/11848 | http://purl.uniprot.org/uniprot/C3KDU5 | Belongs to the type II topoisomerase family. |
http://pflu.evolbio.mpg.de/bio_data/11848 | http://purl.uniprot.org/uniprot/C3KDU5 | Belongs to the type II topoisomerase GyrB family. |
http://pflu.evolbio.mpg.de/bio_data/11848 | http://purl.uniprot.org/uniprot/C3KDU5 | Few gyrases are as efficient as E.coli at forming negative supercoils. Not all organisms have 2 type II topoisomerases; in organisms with a single type II topoisomerase this enzyme also has to decatenate newly replicated chromosomes. |
http://pflu.evolbio.mpg.de/bio_data/11849 | http://purl.uniprot.org/uniprot/C3KDU6 | 4-aspartylphosphate |
http://pflu.evolbio.mpg.de/bio_data/11849 | http://purl.uniprot.org/uniprot/C3KDU6 | OmpR/PhoB-type |
http://pflu.evolbio.mpg.de/bio_data/11849 | http://purl.uniprot.org/uniprot/C3KDU6 | Response regulatory |
Find proteins that have the same annotation as wssH
Next, we try to find proteins that may have a function similar to that of a focal gene in Pflu SBW25. Here, we connect to other proteins through the up:annotation property.
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT DISTINCT ?id ?protein_id ?protein ?annotation ?protein2 ?comment
WHERE {
?cds a so:0000316 ;
gffo:ID ?id .
FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
?cds gffo:protein_id ?protein_id .
BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)
SERVICE <http://sparql.uniprot.org/sparql> {
?protein a up:Protein .
?protein up:organism taxon:216595 .
?protein rdfs:seeAlso ?emblcdsuri .
?protein up:annotation ?annotation .
?protein2 up:annotation ?annotation .
?annotation rdfs:comment ?comment .
}
}
LIMIT 25
This is not exactly what I wanted, since the queried “other” protein (?protein2) is identical to the focal protein.
It seems that the annotation ID encodes the protein ID itself, so every annotation is unique to a given protein.
Now, instead of matching the up:annotation property value, we try to match the annotation comment. All but one of the annotation comments say “Helical”, which is probably too generic to serve as a matching criterion. So let’s focus on the comment “Belongs to the membrane-bound acyltransferase family” and see if we find other proteins that share this annotation comment. The query below selects proteins that share the annotation comment with our focal protein and belong to the Pseudomonas genus. We list only the scientific name of the organism and the protein ID of the selected protein on the UniProt KG.
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT DISTINCT ?protein2 ?scName
WHERE {
?cds a so:0000316 ;
gffo:ID ?id .
FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
?cds gffo:protein_id ?protein_id .
BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)
SERVICE <http://sparql.uniprot.org/sparql> {
?protein a up:Protein .
?protein up:organism taxon:216595 .
?protein rdfs:seeAlso ?emblcdsuri .
?protein up:annotation ?annotation .
?annotation rdfs:comment ?antncmnt .
FILTER(REGEX(?antncmnt, "^Belongs"))
?protein2 up:annotation ?annotation2 .
?annotation2 rdfs:comment ?antncmnt .
?protein2 up:organism ?taxon .
?taxon up:scientificName ?scName .
FILTER(REGEX(?scName, "Pseudomonas"))
}
}
LIMIT 25
Find citations from UniProt KB.
The same could also be done by searching for proteins that share a GO term with our focal protein. GO terms are connected to protein subjects through the up:classifiedWith property.
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT DISTINCT *
WHERE {
?cds a so:0000316 ;
gffo:ID ?id .
FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
?cds gffo:protein_id ?protein_id .
BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)
SERVICE <http://sparql.uniprot.org/sparql> {
?protein a up:Protein .
?protein up:organism taxon:216595 .
# Connect to up protein via protein id (as uri).
?protein rdfs:seeAlso ?emblcdsuri .
# Get GO terms.
?protein up:classifiedWith ?keyword .
}
}
LIMIT 10
Find papers referenced by proteins that share a GO term or keyword with wssH.
Since we now know the UniProt ID of our focal protein, we may as well submit our query directly to the UniProt SPARQL endpoint.
%endpoint https://sparql.uniprot.org/sparql
%show 100
In this query, we list all citations of proteins that share a GO term with C3K6B6 (the WssH protein). In addition, we also list the encoding gene’s name and locus tag.
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?scName ?title ?protein2_recName ?gene2_label ?gene2_locus ?url WHERE {
uniprotkb:C3K6B6 up:classifiedWith ?annotation .
?annotation rdfs:comment ?comment .
?protein2 a up:Protein .
?protein2 rdfs:label ?protein2_recName .
?protein2 up:encodedBy ?gene2 .
?gene2 skos:prefLabel ?gene2_label ;
up:locusName ?gene2_locus.
?protein2 up:classifiedWith ?annotation .
?protein2 up:organism ?taxon .
?taxon up:scientificName ?scName .
FILTER(REGEX(?scName, "^Pseudomonas"))
?protein2 up:citation ?pub .
?pub up:title ?title .
?pub dc:identifier ?doi .
BIND(IRI(CONCAT("https://dx.doi.org/", ?doi)) as ?url )
}
LIMIT 10
scName | title | protein2_recName | gene2_label | gene2_locus | url |
---|---|---|---|---|---|
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) | Pseudomonas aeruginosa Genome Database and PseudoCAP: facilitating community-based, continually updated, genome annotation. | Coat protein B of bacteriophage Pf1 | coaB | PA0723 | https://dx.doi.org/doi:10.1093/nar/gki047 |
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) | Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. | Coat protein B of bacteriophage Pf1 | coaB | PA0723 | https://dx.doi.org/doi:10.1038/35023079 |
Pseudomonas fluorescens (strain SBW25) | Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens. | Cellulose synthase catalytic subunit [UDP-forming] | bcsA | PFLU_0301 | https://dx.doi.org/doi:10.1186/gb-2009-10-5-r51 |
Pseudomonas fluorescens (strain SBW25) | Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens. | Cyclic di-GMP-binding protein | bcsB | PFLU_0302 | https://dx.doi.org/doi:10.1186/gb-2009-10-5-r51 |
Pseudomonas fluorescens (strain SBW25) | Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness. | Cellulose synthase catalytic subunit [UDP-forming] | bcsA | PFLU_0301 | https://dx.doi.org/doi:10.1093/genetics/161.1.33 |
Pseudomonas fluorescens (strain SBW25) | Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness. | Cyclic di-GMP-binding protein | bcsB | PFLU_0302 | https://dx.doi.org/doi:10.1093/genetics/161.1.33 |
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) | Genome diversity of Pseudomonas aeruginosa PAO1 laboratory strains. | Coat protein B of bacteriophage Pf1 | coaB | PA0723 | https://dx.doi.org/doi:10.1128/jb.01515-09 |
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221) | Complete genome sequence of the type strain Pseudomonas stutzeri CGMCC 1.1803. | Sulfate transporter CysZ | cysZ | PSTAB_1036 | https://dx.doi.org/doi:10.1128/jb.06061-11 |
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221) | Complete Genome Sequence of the Type Strain Pseudomonas stutzeri CGMCC 1.1803. | Sulfate transporter CysZ | cysZ | PSTAB_1036 | https://dx.doi.org/doi:10.1128/jb.06061-11 |
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221) | Complete genome sequence of the type strain Pseudomonas stutzeri CGMCC 1.1803. | Amino acid transporter LysE | lysE | PSTAB_3885 | https://dx.doi.org/doi:10.1128/jb.06061-11 |
Now this is interesting: The UniProt protein encoded by Pflu SBW25’s wssH gene shares GO terms with the proteins encoded by bcsA and bcsB, which have the same locus tags as wssA and wssB. In other words, we have just found alternative names for wssA and wssB.
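Note that the url values in the table above still carry the doi: prefix that comes with dc:identifier. If a bare DOI link is preferred, the prefix could be stripped in post-processing. A minimal Python sketch, assuming identifiers shaped like those in the table (the helper name doi_url is hypothetical):

```python
def doi_url(identifier: str) -> str:
    """Turn an identifier such as 'doi:10.1093/nar/gki047' into a DOI link."""
    prefix = "doi:"
    # Drop the prefix if present; leave other strings untouched.
    doi = identifier[len(prefix):] if identifier.startswith(prefix) else identifier
    return "https://dx.doi.org/" + doi


print(doi_url("doi:10.1093/nar/gki047"))
# -> https://dx.doi.org/10.1093/nar/gki047
```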
All properties of a protein
For further endeavors with the UniProt SPARQL endpoint, let’s find out all the properties of a UniProt protein node.
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
select distinct ?prop where {
uniprotkb:C3K6B6 ?prop ?val
}
limit 100
Gene properties
… and the same for gene properties.
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
select distinct ?prop where {
uniprotkb:C3K6B6 up:encodedBy ?gene .
?gene ?prop ?val .
}
limit 100
prop |
---|
http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
http://www.w3.org/2004/02/skos/core#prefLabel |
http://purl.uniprot.org/core/locusName |
UniRef
UniProt provides UniRef clusters, i.e. groups of proteins clustered by sequence identity. The three cluster sets UniRef100, UniRef90, and UniRef50 group sequences that are identical, at least 90% identical, or at least 50% identical to the cluster's seed sequence, respectively. Here, we utilize these clusters to find literature citations linked to proteins that share a cluster with our focal protein.
Proteins are linked to a given UniRef cluster via their UniParc sequence. Hence, we first find the UniParc sequence of our focal protein, then the UniRef clusters that contain this sequence, then the other UniParc sequences in those clusters, and finally the proteins that these sequences belong to. The returned data is sorted in descending order of the UniRef matching percentage.
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?pct ?protein ?scName ?title ?url WHERE {
# Find uniref cluster and uniparcs.
BIND(uniprotkb:C3K6B6 as ?focus_protein)
?uniref a up:Cluster ;
up:member ?uniparc .
?uniparc a up:Sequence ;
up:sequenceFor ?focus_protein .
BIND(STRBEFORE(STRAFTER(STR(?uniref), "UniRef"), "_") AS ?pct)
# For each uniparc, find all members and list their references.
?uniref up:member ?uniparc2 .
FILTER(?uniparc2 NOT IN (?uniparc))
?uniparc2 up:sequenceFor ?protein .
?protein a up:Protein .
FILTER(?protein NOT IN (?focus_protein))
?protein up:organism ?taxon .
?taxon up:scientificName ?scName .
FILTER(REGEX(?scName, "^Pseudomonas"))
?protein up:citation ?pub .
?pub up:title ?title .
?pub dc:identifier ?doi .
BIND(IRI(CONCAT("https://dx.doi.org/", ?doi)) AS ?url )
}
ORDER BY DESC(?pct)
LIMIT 100
pct | protein | scName | title | url |
---|---|---|---|---|
90 | http://purl.uniprot.org/uniprot/Q8RSY5 | Pseudomonas fluorescens | Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness. | https://dx.doi.org/doi:10.1093/genetics/161.1.33 |
90 | http://purl.uniprot.org/uniprot/A0A5M9ICB8 | Pseudomonas panacis | Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk. | https://dx.doi.org/doi:10.3389/fmicb.2020.01190 |
90 | http://purl.uniprot.org/uniprot/A0A5M9ICB8 | Pseudomonas panacis | Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk. | https://dx.doi.org/doi:10.3389/fmicb.2020.01190 |
90 | http://purl.uniprot.org/uniprot/A0A0H5ARJ4 | Pseudomonas trivialis | Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis Strain IHBB745 with Multiple Plant Growth-Promoting Activities and Tolerance to Desiccation and Alkalinity. | https://dx.doi.org/doi:10.1128/genomea.00943-15 |
90 | http://purl.uniprot.org/uniprot/W2DZG2 | Pseudomonas sp. FH1 | The rulB gene of plasmid pWW0 is a hotspot for the site-specific insertion of integron-like elements found in the chromosomes of environmental Pseudomonas fluorescens group bacteria. | https://dx.doi.org/doi:10.1111/1462-2920.12345 |
90 | http://purl.uniprot.org/uniprot/A0A829MQ51 | Pseudomonas fluorescens BBc6R8 | Genome Sequence of the Mycorrhizal Helper Bacterium Pseudomonas fluorescens BBc6R8. | https://dx.doi.org/doi:10.1128/genomea.01152-13 |
50 | http://purl.uniprot.org/uniprot/Q888J1 | Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000) | The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. | https://dx.doi.org/doi:10.1073/pnas.1731982100 |
50 | http://purl.uniprot.org/uniprot/J2WKT6 | Pseudomonas sp. GM79 | Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. | https://dx.doi.org/doi:10.1128/jb.01243-12 |
50 | http://purl.uniprot.org/uniprot/J2RZY5 | Pseudomonas sp. GM48 | Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. | https://dx.doi.org/doi:10.1128/jb.01243-12 |
50 | http://purl.uniprot.org/uniprot/J3GHR5 | Pseudomonas sp. GM50 | Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. | https://dx.doi.org/doi:10.1128/jb.01243-12 |
50 | http://purl.uniprot.org/uniprot/J2NRJ0 | Pseudomonas sp. GM18 | Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. | https://dx.doi.org/doi:10.1128/jb.01243-12 |
50 | http://purl.uniprot.org/uniprot/J3BN19 | Pseudomonas sp. GM67 | Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. | https://dx.doi.org/doi:10.1128/jb.01243-12 |
50 | http://purl.uniprot.org/uniprot/J2M3Y6 | Pseudomonas sp. GM102 | Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. | https://dx.doi.org/doi:10.1128/jb.01243-12 |
50 | http://purl.uniprot.org/uniprot/Q8RSY5 | Pseudomonas fluorescens | Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness. | https://dx.doi.org/doi:10.1093/genetics/161.1.33 |
50 | http://purl.uniprot.org/uniprot/A0A161ZG69 | Pseudomonas fluorescens | Mutant phenotypes for thousands of bacterial genes of unknown function. | https://dx.doi.org/doi:10.1038/s41586-018-0124-0 |
50 | http://purl.uniprot.org/uniprot/A0A147GXL8 | Pseudomonas psychrotolerans | Genomic Resource of Rice Seed Associated Bacteria. | https://dx.doi.org/doi:10.3389/fmicb.2015.01551 |
50 | http://purl.uniprot.org/uniprot/A0A2W0EWR7 | Pseudomonas jessenii | Characterization of the caprolactam degradation pathway in Pseudomonas jessenii using mass spectrometry-based proteomics. | https://dx.doi.org/doi:10.1007/s00253-018-9073-7 |
50 | http://purl.uniprot.org/uniprot/A0A8E6ERY0 | Pseudomonas qingdaonensis | A newly isolated Pseudomonas putida S-1 strain for batch-mode-propanethiol degradation and continuous treatment of propanethiol-containing waste gas. | https://dx.doi.org/doi:10.1016/j.jhazmat.2015.09.063 |
50 | http://purl.uniprot.org/uniprot/A0A7Y1N448 | Pseudomonas oryzihabitans | Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk. | https://dx.doi.org/doi:10.3389/fmicb.2020.01190 |
50 | http://purl.uniprot.org/uniprot/A0A7Y1N448 | Pseudomonas oryzihabitans | Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk. | https://dx.doi.org/doi:10.3389/fmicb.2020.01190 |
50 | http://purl.uniprot.org/uniprot/A0A5M9ICB8 | Pseudomonas panacis | Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk. | https://dx.doi.org/doi:10.3389/fmicb.2020.01190 |
50 | http://purl.uniprot.org/uniprot/A0A5M9ICB8 | Pseudomonas panacis | Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk. | https://dx.doi.org/doi:10.3389/fmicb.2020.01190 |
50 | http://purl.uniprot.org/uniprot/A0A077LSB0 | Pseudomonas sp. StFLB209 | Complete Genome Sequence of N-Acylhomoserine Lactone-Producing Pseudomonas sp. Strain StFLB209, Isolated from Potato Phyllosphere. | https://dx.doi.org/doi:10.1128/genomea.01037-14 |
50 | http://purl.uniprot.org/uniprot/F3IQG6 | Pseudomonas amygdali pv. lachrymans str. M302278 | Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates. | https://dx.doi.org/doi:10.1371/journal.ppat.1002132 |
50 | http://purl.uniprot.org/uniprot/S6SJ46 | Pseudomonas syringae pv. actinidiae ICMP 18807 | Genomic analysis of the Kiwifruit pathogen Pseudomonas syringae pv. actinidiae provides insight into the origins of an emergent plant disease. | https://dx.doi.org/doi:10.1371/journal.ppat.1003503 |
50 | http://purl.uniprot.org/uniprot/S6SPB9 | Pseudomonas syringae pv. actinidiae ICMP 18807 | Genomic analysis of the Kiwifruit pathogen Pseudomonas syringae pv. actinidiae provides insight into the origins of an emergent plant disease. | https://dx.doi.org/doi:10.1371/journal.ppat.1003503 |
50 | http://purl.uniprot.org/uniprot/A0A261WJQ2 | Pseudomonas avellanae | Genome analysis of the kiwifruit canker pathogen Pseudomonas syringae pv. actinidiae biovar 5. | https://dx.doi.org/doi:10.1038/srep21399 |
50 | http://purl.uniprot.org/uniprot/A0A4P6QNS5 | Pseudomonas syringae | The <i>Ptr1</i> Locus of <i>Solanum lycopersicoides</i> Confers Resistance to Race 1 Strains of <i>Pseudomonas syringae</i> pv. <i>tomato</i> and to <i>Ralstonia pseudosolanacearum</i> by Recognizing the Type III Effectors AvrRpt2 and RipBN. | https://dx.doi.org/doi:10.1094/mpmi-01-19-0018-r |
50 | http://purl.uniprot.org/uniprot/A0A4P6QNS5 | Pseudomonas syringae | The Ptr1 locus of Solanum lycopersicoides confers resistance to race 1 strains of Pseudomonas syringae pv. tomato and to Ralstonia pseudosolanacearum by recognizing the type III effectors AvrRpt2/RipBN. | https://dx.doi.org/doi:10.1094/mpmi-01-19-0018-r |
50 | http://purl.uniprot.org/uniprot/A0A0H5ARJ4 | Pseudomonas trivialis | Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis Strain IHBB745 with Multiple Plant Growth-Promoting Activities and Tolerance to Desiccation and Alkalinity. | https://dx.doi.org/doi:10.1128/genomea.00943-15 |
50 | http://purl.uniprot.org/uniprot/A0A5B7VT39 | Pseudomonas sp. MPC6 | Exploiting the natural poly(3-hydroxyalkanoates) production capacity of Antarctic Pseudomonas strains: from unique phenotypes to novel biopolymers. | https://dx.doi.org/doi:10.1007/s10295-019-02186-2 |
50 | http://purl.uniprot.org/uniprot/A0A5B7VT39 | Pseudomonas sp. MPC6 | In-Depth Genomic and Phenotypic Characterization of the Antarctic Psychrotolerant Strain <i>Pseudomonas</i> sp. MPC6 Reveals Unique Metabolic Features, Plasticity, and Biotechnological Potential. | https://dx.doi.org/doi:10.3389/fmicb.2019.01154 |
50 | http://purl.uniprot.org/uniprot/A0A5B7VT39 | Pseudomonas sp. MPC6 | In-Depth Genomic and Phenotypic Characterization of the Antarctic Psychrotolerant Strain Pseudomonas sp. MPC6 Reveals Unique Metabolic Features, Plasticity, and Biotechnological Potential. | https://dx.doi.org/doi:10.3389/fmicb.2019.01154 |
50 | http://purl.uniprot.org/uniprot/W2DZG2 | Pseudomonas sp. FH1 | The rulB gene of plasmid pWW0 is a hotspot for the site-specific insertion of integron-like elements found in the chromosomes of environmental Pseudomonas fluorescens group bacteria. | https://dx.doi.org/doi:10.1111/1462-2920.12345 |
50 | http://purl.uniprot.org/uniprot/A0A829MQ51 | Pseudomonas fluorescens BBc6R8 | Genome Sequence of the Mycorrhizal Helper Bacterium Pseudomonas fluorescens BBc6R8. | https://dx.doi.org/doi:10.1128/genomea.01152-13 |
50 | http://purl.uniprot.org/uniprot/A0A0N0VKM3 | Pseudomonas fuscovaginae | Rice-Infecting Pseudomonas Genomes Are Highly Accessorized and Harbor Multiple Putative Virulence Mechanisms to Cause Sheath Brown Rot. | https://dx.doi.org/doi:10.1371/journal.pone.0139256 |
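The ?pct column is cut out of the cluster URI with STRAFTER and STRBEFORE, so it is a string rather than a number. The Python sketch below mirrors that extraction for URIs containing both markers and illustrates why a string sort would place 100 after 50. The cluster URIs are illustrative examples that assume the pattern .../uniref/UniRef<pct>_<id>, and the helper name uniref_pct is hypothetical.

```python
def uniref_pct(cluster_uri: str) -> str:
    """Mirror STRBEFORE(STRAFTER(STR(?uniref), "UniRef"), "_")."""
    # Text after the first "UniRef" (the lowercase "uniref" in the path does not match).
    after = cluster_uri.partition("UniRef")[2]
    # Text before the first underscore of that remainder.
    return after.partition("_")[0]


# Illustrative URIs, assumed to follow the UniRef naming pattern.
uris = [
    "http://purl.uniprot.org/uniref/UniRef50_C3K6B6",
    "http://purl.uniprot.org/uniref/UniRef100_C3K6B6",
    "http://purl.uniprot.org/uniref/UniRef90_C3K6B6",
]
pcts = [uniref_pct(u) for u in uris]

print(sorted(pcts, reverse=True))           # string order: ['90', '50', '100']
print(sorted(pcts, key=int, reverse=True))  # numeric order: ['100', '90', '50']
```

If numeric ordering is wanted in the query itself, casting the extracted value to a number, for example with xsd:integer, would be one option.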
From here, we could try to extract additional information on our focal protein (gene) from the cited papers. Recently, Yu et al. (http://arxiv.org/abs/2203.09975) described a promising approach to automated knowledge extraction from scientific papers and knowledge graph construction. Unfortunately, their online service (https://bios.idea.edu.cn/) does not provide a SPARQL endpoint.