Example queries against the Pseudomonas fluorescens SBW25 knowledge graph.

Introduction

In this post I present example SPARQL queries against the Pseudomonas fluorescens SBW25 knowledge graph (SBW25KG). The knowledge graph was derived from the manually created annotation in GFF3 format, as explained in a previous post.

The queries are run against a local instance of the apache-jena-fuseki triplestore. First, I set the endpoint URL and the maximum number of returned records:

%endpoint http://micropop046:3030/plu/
%show 50
Endpoint set to: http://micropop046:3030/plu/
Result maximum size: 50
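
The two % lines above are notebook kernel commands. Outside the notebook, the same queries can be sent over plain HTTP using the SPARQL protocol. The following is a minimal sketch, not the setup used in this post: the function name build_sparql_request is made up for illustration, and I assume that the Fuseki dataset accepts GET requests with a query parameter at the endpoint URL shown above. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL copied from the %endpoint line above.
ENDPOINT = "http://micropop046:3030/plu/"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request.

    Hypothetical helper: the query text goes into the 'query' URL
    parameter and the Accept header asks for JSON results.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 50")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would execute the query, provided the endpoint is reachable from your machine.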

Retrieve 10 CDSs from the SBW25KG.

In this first example, I query for CDS features and list their primary name, locus tag, and protein ID.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>

SELECT DISTINCT  *  WHERE {
    ?cds a so:0000316 ,
           gffo:Feature ;
           
        gffo:primary_name ?primary_name ;
        gffo:locus_tag ?locus_tag ;
        gffo:protein_id ?protein .
        
}

ORDER BY ?locus_tag
LIMIT 10
cds primary_name locus_tag protein
http://pflu.evolbio.mpg.de/bio_data/14291 macB PFLU_2556 CAY48791.1
http://pflu.evolbio.mpg.de/bio_data/14293 mdeA PFLU_2558 CAY48793.1
http://pflu.evolbio.mpg.de/bio_data/14298 potF PFLU_2564 CAY48798.1
http://pflu.evolbio.mpg.de/bio_data/14314 idh PFLU_2580 CAY48814.1
http://pflu.evolbio.mpg.de/bio_data/14326 proX PFLU_2592 CAY48826.1
http://pflu.evolbio.mpg.de/bio_data/14327 bfrG PFLU_2593 CAY48827.1
http://pflu.evolbio.mpg.de/bio_data/14332 bfrH PFLU_2598 CAY48832.1
http://pflu.evolbio.mpg.de/bio_data/14334 cynR PFLU_2600 CAY48834.1
http://pflu.evolbio.mpg.de/bio_data/14342 yaaX PFLU_2608 CAY48842.1
http://pflu.evolbio.mpg.de/bio_data/14348 sohB PFLU_2614 CAY48848.1
Total: 10, Shown: 10

A click on the CDS URL would open the respective page on the Pseudomonas fluorescens SBW25 genome database.
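
When the query is run programmatically, a SPARQL endpoint can return the result table as a JSON document (the W3C SPARQL 1.1 Query Results JSON Format). As a rough sketch, the snippet below turns such a document into a list of plain dictionaries. The sample document is hand-written from the first row of the table above, the literal types are my assumption, and rows_from_sparql_json is a hypothetical helper name.

```python
import json

# Hand-written sample in the SPARQL 1.1 JSON results layout, filled with
# the first row of the table above (the binding types are assumed).
raw = """
{
  "head": {"vars": ["cds", "primary_name", "locus_tag", "protein"]},
  "results": {"bindings": [
    {"cds": {"type": "uri", "value": "http://pflu.evolbio.mpg.de/bio_data/14291"},
     "primary_name": {"type": "literal", "value": "macB"},
     "locus_tag": {"type": "literal", "value": "PFLU_2556"},
     "protein": {"type": "literal", "value": "CAY48791.1"}}
  ]}
}
"""


def rows_from_sparql_json(text):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    doc = json.loads(text)
    names = doc["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]


rows = rows_from_sparql_json(raw)
print(rows[0]["locus_tag"], rows[0]["protein"])
```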

Which properties are listed for CDS features?

One of my first exploratory queries against new (to me) SPARQL endpoints is to list the properties of a given node type. Here, I list all properties found on nodes that are typed as both so:0000316 (CDS) and gffo:Feature.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>

SELECT DISTINCT  ?prop  WHERE {
    ?cds a so:0000316 ,
           gffo:Feature ;
           
        ?prop ?val .
        
}
prop
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#seeAlso
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#ID
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#Parent
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#end
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#frame
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#locus_tag
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#score
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#seqid
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#source
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#start
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#strand
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#product
http://purl.uniprot.org/core/reviewed
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#Ontology_term
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#annotated_protein_regions
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#codon_start
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#confidence_level
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#features
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#gene
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#inference
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#protein_families
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#protein_id
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#similarity
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#uniprot_annotation_score
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#ft_domain
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#note
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#pathway
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#cc_domain
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#motif
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#ortholog
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#primary_name
Total: 32, Shown: 32
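
The full property URIs are long, which makes the list hard to scan. A small sketch of how they could be shortened to prefixed names, using the same prefixes as in the queries (compact is a hypothetical helper, not part of any library):

```python
# Namespace prefixes as used in the queries of this post.
PREFIXES = {
    "gffo": "https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "up": "http://purl.uniprot.org/core/",
}


def compact(uri, prefixes=PREFIXES):
    """Return 'prefix:local' if the URI starts with a known namespace."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri  # unknown namespace: leave the URI untouched


for uri in [
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://www.w3.org/2000/01/rdf-schema#seeAlso",
    "https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#locus_tag",
    "http://purl.uniprot.org/core/reviewed",
]:
    print(compact(uri))
```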

Retrieve UniProt entries via a federated query, using the protein ID to link into the UniProt graph.

Here we use the protein ID to link a node in the SBW25KG to the UniProt KG. In the UniProt KG, the protein ID is linked to the protein node (type up:Protein) through the property rdfs:seeAlso, and its value is a URI anchored on the emblcds prefix http://purl.uniprot.org/embl-cds/. Hence, we need to construct the full emblcds URI from the variable ?protein_id. This is achieved by concatenating the emblcds: prefix (converted to a string with STR) and ?protein_id, casting the resulting string to a URI with IRI, and binding that URI to the variable ?emblcdsuri. The query then matches the UniProt node that has ?emblcdsuri as an rdfs:seeAlso property value. To limit the search on the UniProt side, we also require that the UniProt node is connected through up:organism to the resource that designates Pseudomonas fluorescens SBW25 (taxon:216595). We limit the output to 25 CDSs.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?cds ?id ?protein_id ?protein
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id ;
       gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
    
    ?protein  rdfs:seeAlso ?emblcdsuri .
    ?protein  rdfs:seeAlso ?xref .
    
  }
}
LIMIT 25
cds id protein_id protein
http://pflu.evolbio.mpg.de/bio_data/11845 CDS:PFLU_0001-0 CAJ57270.1 http://purl.uniprot.org/uniprot/B0B0A5
http://pflu.evolbio.mpg.de/bio_data/11846 CDS:PFLU_0002-0 CAY46287.1 http://purl.uniprot.org/uniprot/C3KDU3
http://pflu.evolbio.mpg.de/bio_data/11847 CDS:PFLU_0003-0 CAY46288.1 http://purl.uniprot.org/uniprot/C3KDU4
http://pflu.evolbio.mpg.de/bio_data/11848 CDS:PFLU_0004-0 CAY46289.1 http://purl.uniprot.org/uniprot/C3KDU5
http://pflu.evolbio.mpg.de/bio_data/11849 CDS:PFLU_0005-0 CAY46290.1 http://purl.uniprot.org/uniprot/C3KDU6
http://pflu.evolbio.mpg.de/bio_data/11850 CDS:PFLU_0006-0 CAY46291.1 http://purl.uniprot.org/uniprot/C3KE36
http://pflu.evolbio.mpg.de/bio_data/11851 CDS:PFLU_0007-0 CAY46292.1 http://purl.uniprot.org/uniprot/C3KE37
http://pflu.evolbio.mpg.de/bio_data/11852 CDS:PFLU_0008-0 CAY46293.1 http://purl.uniprot.org/uniprot/C3KE38
http://pflu.evolbio.mpg.de/bio_data/11853 CDS:PFLU_0009-0 CAY46294.1 http://purl.uniprot.org/uniprot/C3KE39
http://pflu.evolbio.mpg.de/bio_data/11854 CDS:PFLU_0010-0 CAY46295.1 http://purl.uniprot.org/uniprot/C3KE40
http://pflu.evolbio.mpg.de/bio_data/11855 CDS:PFLU_0011-0 CAY46296.1 http://purl.uniprot.org/uniprot/C3KE41
http://pflu.evolbio.mpg.de/bio_data/11856 CDS:PFLU_0012-0 CAY46297.1 http://purl.uniprot.org/uniprot/C3KE42
http://pflu.evolbio.mpg.de/bio_data/11857 CDS:PFLU_0013-0 CAY46298.1 http://purl.uniprot.org/uniprot/C3KE43
http://pflu.evolbio.mpg.de/bio_data/11858 CDS:PFLU_0014-0 CAY46299.1 http://purl.uniprot.org/uniprot/C3KE44
http://pflu.evolbio.mpg.de/bio_data/11859 CDS:PFLU_0015-0 CAY46300.1 http://purl.uniprot.org/uniprot/C3KE45
http://pflu.evolbio.mpg.de/bio_data/11860 CDS:PFLU_0016-0 CAY46301.1 http://purl.uniprot.org/uniprot/C3KE46
http://pflu.evolbio.mpg.de/bio_data/11861 CDS:PFLU_0017-0 CAY46302.1 http://purl.uniprot.org/uniprot/C3KE47
http://pflu.evolbio.mpg.de/bio_data/11862 CDS:PFLU_0018-0 CAY46303.1 http://purl.uniprot.org/uniprot/C3KE48
http://pflu.evolbio.mpg.de/bio_data/11863 CDS:PFLU_0019-0 CAY46304.1 http://purl.uniprot.org/uniprot/C3KE49
http://pflu.evolbio.mpg.de/bio_data/11864 CDS:PFLU_0020-0 CAY46305.1 http://purl.uniprot.org/uniprot/C3K4H7
http://pflu.evolbio.mpg.de/bio_data/11865 CDS:PFLU_0021-0 CAY46306.1 http://purl.uniprot.org/uniprot/C3K4H8
http://pflu.evolbio.mpg.de/bio_data/11866 CDS:PFLU_0022-0 CAY46307.1 http://purl.uniprot.org/uniprot/C3K4H9
http://pflu.evolbio.mpg.de/bio_data/11867 CDS:PFLU_0023-0 CAY46308.1 http://purl.uniprot.org/uniprot/C3K4I0
http://pflu.evolbio.mpg.de/bio_data/11868 CDS:PFLU_0024-0 CAY46309.1 http://purl.uniprot.org/uniprot/C3K4I1
http://pflu.evolbio.mpg.de/bio_data/11869 CDS:PFLU_0025-0 CAY46310.1 http://purl.uniprot.org/uniprot/C3K4I2
Total: 25, Shown: 25
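
For readers less familiar with SPARQL string functions, the BIND step in the query above boils down to simple string concatenation. A Python equivalent, purely as illustration (emblcds_uri is a made-up name; the protein IDs are taken from the table above):

```python
# Namespace behind the emblcds: prefix in the query.
EMBLCDS = "http://purl.uniprot.org/embl-cds/"


def emblcds_uri(protein_id: str) -> str:
    """Mirror IRI(CONCAT(STR(emblcds:), ?protein_id)) as plain string work."""
    return EMBLCDS + protein_id


for protein_id in ["CAJ57270.1", "CAY46287.1", "CAY46288.1"]:
    print(emblcds_uri(protein_id))
```

Each printed URI is what the federated part of the query looks up as an rdfs:seeAlso value on the UniProt side.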

Query the UniProt annotations for a given PFLU SBW25 CDS.

Using the same strategy as before, we perform a federated query on the UniProt KG and list the rdfs:comment property values for all up:annotation property values.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?cds ?protein ?comment 
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id ;
       gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
      
   
    ?protein  rdfs:seeAlso ?emblcdsuri .
    
    ?protein up:annotation ?annotation .
    ?annotation rdfs:comment ?comment
                
  }
}
LIMIT 25
cds protein comment
http://pflu.evolbio.mpg.de/bio_data/11845 http://purl.uniprot.org/uniprot/B0B0A5 Polar residues
http://pflu.evolbio.mpg.de/bio_data/11845 http://purl.uniprot.org/uniprot/B0B0A5 ATP
http://pflu.evolbio.mpg.de/bio_data/11845 http://purl.uniprot.org/uniprot/B0B0A5 Disordered
http://pflu.evolbio.mpg.de/bio_data/11845 http://purl.uniprot.org/uniprot/B0B0A5 Plays an important role in the initiation and regulation of chromosomal replication. Binds to the origin of replication; it binds specifically double-stranded DNA at a 9 bp consensus (dnaA box): 5'-TTATC[CA]A[CA]A-3'. DnaA binds to ATP and to acidic phospholipids.
http://pflu.evolbio.mpg.de/bio_data/11845 http://purl.uniprot.org/uniprot/B0B0A5 Belongs to the DnaA family.
http://pflu.evolbio.mpg.de/bio_data/11845 http://purl.uniprot.org/uniprot/B0B0A5 Chromosomal replication initiator protein DnaA
http://pflu.evolbio.mpg.de/bio_data/11846 http://purl.uniprot.org/uniprot/C3KDU3 Confers DNA tethering and processivity to DNA polymerases and other proteins. Acts as a clamp, forming a ring around DNA (a reaction catalyzed by the clamp-loading complex) which diffuses in an ATP-independent manner freely and bidirectionally along dsDNA. Initially characterized for its ability to contact the catalytic subunit of DNA polymerase III (Pol III), a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria; Pol III exhibits 3'-5' exonuclease proofreading activity. The beta chain is required for initiation of replication as well as for processivity of DNA replication.
http://pflu.evolbio.mpg.de/bio_data/11846 http://purl.uniprot.org/uniprot/C3KDU3 Forms a ring-shaped head-to-tail homodimer around DNA.
http://pflu.evolbio.mpg.de/bio_data/11846 http://purl.uniprot.org/uniprot/C3KDU3 DNA_pol3_beta_3
http://pflu.evolbio.mpg.de/bio_data/11846 http://purl.uniprot.org/uniprot/C3KDU3 DNA_pol3_beta_2
http://pflu.evolbio.mpg.de/bio_data/11846 http://purl.uniprot.org/uniprot/C3KDU3 DNA_pol3_beta
http://pflu.evolbio.mpg.de/bio_data/11846 http://purl.uniprot.org/uniprot/C3KDU3 Belongs to the beta sliding clamp family.
http://pflu.evolbio.mpg.de/bio_data/11847 http://purl.uniprot.org/uniprot/C3KDU4 DNA replication and repair protein RecF
http://pflu.evolbio.mpg.de/bio_data/11847 http://purl.uniprot.org/uniprot/C3KDU4 Belongs to the RecF family.
http://pflu.evolbio.mpg.de/bio_data/11847 http://purl.uniprot.org/uniprot/C3KDU4 ATP
http://pflu.evolbio.mpg.de/bio_data/11847 http://purl.uniprot.org/uniprot/C3KDU4 The RecF protein is involved in DNA metabolism; it is required for DNA replication and normal SOS inducibility. RecF binds preferentially to single-stranded, linear DNA. It also seems to bind ATP.
http://pflu.evolbio.mpg.de/bio_data/11848 http://purl.uniprot.org/uniprot/C3KDU5 Toprim
http://pflu.evolbio.mpg.de/bio_data/11848 http://purl.uniprot.org/uniprot/C3KDU5 Heterotetramer, composed of two GyrA and two GyrB chains. In the heterotetramer, GyrA contains the active site tyrosine that forms a transient covalent intermediate with DNA, while GyrB binds cofactors and catalyzes ATP hydrolysis.
http://pflu.evolbio.mpg.de/bio_data/11848 http://purl.uniprot.org/uniprot/C3KDU5 A type II topoisomerase that negatively supercoils closed circular double-stranded (ds) DNA in an ATP-dependent manner to modulate DNA topology and maintain chromosomes in an underwound state. Negative supercoiling favors strand separation, and DNA replication, transcription, recombination and repair, all of which involve strand separation. Also able to catalyze the interconversion of other topological isomers of dsDNA rings, including catenanes and knotted rings. Type II topoisomerases break and join 2 DNA strands simultaneously in an ATP-dependent manner.
http://pflu.evolbio.mpg.de/bio_data/11848 http://purl.uniprot.org/uniprot/C3KDU5 Belongs to the type II topoisomerase family.
http://pflu.evolbio.mpg.de/bio_data/11848 http://purl.uniprot.org/uniprot/C3KDU5 Belongs to the type II topoisomerase GyrB family.
http://pflu.evolbio.mpg.de/bio_data/11848 http://purl.uniprot.org/uniprot/C3KDU5 Few gyrases are as efficient as E.coli at forming negative supercoils. Not all organisms have 2 type II topoisomerases; in organisms with a single type II topoisomerase this enzyme also has to decatenate newly replicated chromosomes.
http://pflu.evolbio.mpg.de/bio_data/11849 http://purl.uniprot.org/uniprot/C3KDU6 4-aspartylphosphate
http://pflu.evolbio.mpg.de/bio_data/11849 http://purl.uniprot.org/uniprot/C3KDU6 OmpR/PhoB-type
http://pflu.evolbio.mpg.de/bio_data/11849 http://purl.uniprot.org/uniprot/C3KDU6 Response regulatory
Total: 25, Shown: 25

Find proteins that have the same annotation as wssH

Next, we try to find proteins that may have a function similar to that of a focal gene in Pflu SBW25. Here, we connect to other proteins through the up:annotation property.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?id ?protein_id ?protein ?annotation ?protein2 ?comment
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id .
       FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
  ?cds gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
    
    ?protein  rdfs:seeAlso ?emblcdsuri .
    
    ?protein up:annotation ?annotation .
    ?protein2 up:annotation ?annotation .
      
    ?annotation rdfs:comment ?comment .
  }
}
LIMIT 25
id protein_id protein annotation protein2 comment
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIP016432E29D5FA475 http://purl.uniprot.org/uniprot/C3K6B6 Helical
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIP2D0C3C41F62290BE http://purl.uniprot.org/uniprot/C3K6B6 Helical
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIP5876DA8DDDFD7798 http://purl.uniprot.org/uniprot/C3K6B6 Belongs to the membrane-bound acyltransferase family.
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIP5E60A310F8922ADA http://purl.uniprot.org/uniprot/C3K6B6 Helical
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIP65059D303A47C9AA http://purl.uniprot.org/uniprot/C3K6B6 Helical
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIPD5D9816EB7014D97 http://purl.uniprot.org/uniprot/C3K6B6 Helical
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIPD76729EFF3A86A35 http://purl.uniprot.org/uniprot/C3K6B6 Helical
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIPE3AEBFE73B122FEE http://purl.uniprot.org/uniprot/C3K6B6 Helical
CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/uniprot/C3K6B6#SIPEAFA32F39A331071 http://purl.uniprot.org/uniprot/C3K6B6 Helical
Total: 9, Shown: 9

This is not exactly what I wanted, since the queried “other” protein (?protein2) is identical to the focal protein. It seems that the annotation ID encodes the protein ID itself, so every annotation is unique to a given protein.
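
This can be checked on any row of the table above: stripping the fragment (the part after #) from an annotation URI gives back the protein URI. A quick sketch using Python's standard library:

```python
from urllib.parse import urldefrag

# One annotation / protein pair copied from the result table above.
annotation = "http://purl.uniprot.org/uniprot/C3K6B6#SIP5876DA8DDDFD7798"
protein = "http://purl.uniprot.org/uniprot/C3K6B6"

# urldefrag splits a URL into the part before '#' and the fragment.
base, fragment = urldefrag(annotation)

print(base == protein, fragment)  # prints: True SIP5876DA8DDDFD7798
```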

Now, instead of matching the up:annotation property value, we try to match the annotation comment. All but one of the annotation comments say “Helical”, which is probably too generic to serve as a matching criterion. So let’s focus on the comment “Belongs to the membrane-bound acyltransferase family” and see if we find other proteins that share this annotation comment. The query below selects proteins that share the annotation comment with our focal protein and whose organism name contains “Pseudomonas”. We list only the scientific name of the organism and the UniProt URI of the selected protein.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?protein2 ?scName
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id .
       FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
  ?cds gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
    
    ?protein  rdfs:seeAlso ?emblcdsuri .
    
    ?protein up:annotation ?annotation .
    ?annotation rdfs:comment ?antncmnt .
      
    FILTER(REGEX(?antncmnt, "^Belongs"))
      
    ?protein2 up:annotation ?annotation2 .
    ?annotation2 rdfs:comment ?antncmnt .
      
    ?protein2 up:organism ?taxon .
    ?taxon up:scientificName ?scName .
      
    FILTER(REGEX(?scName, "Pseudomonas"))
           
    
  }
}
LIMIT 25
protein2 scName
http://purl.uniprot.org/uniprot/L8MBF1 Pseudomonas furukawaii
http://purl.uniprot.org/uniprot/L8MM14 Pseudomonas furukawaii
http://purl.uniprot.org/uniprot/Q88ND2 Pseudomonas putida (strain ATCC 47054 / DSM 6125 / CFBP 8728 / NCIMB 11950 / KT2440)
http://purl.uniprot.org/uniprot/Q4ZXL1 Pseudomonas syringae pv. syringae (strain B728a)
http://purl.uniprot.org/uniprot/Q3K778 Pseudomonas fluorescens (strain Pf0-1)
http://purl.uniprot.org/uniprot/Q3KHR1 Pseudomonas fluorescens (strain Pf0-1)
http://purl.uniprot.org/uniprot/Q51392 Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
http://purl.uniprot.org/uniprot/Q887Q6 Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)
http://purl.uniprot.org/uniprot/A0A072ZRH8 Pseudomonas aeruginosa
http://purl.uniprot.org/uniprot/A0A1Y3LHY0 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A7U6M686 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A1Q9QVW4 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A2S3W6C3 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A2S3WIC5 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A0D1PC37 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A177SSH9 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A2S3WW87 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A3M8TKL8 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A0N8HF79 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A5P1R2W6 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A1B2FDE2 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A0M3CIH1 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A7W2QI71 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A6I6XTZ1 Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A3M8SAS1 Pseudomonas putida
Total: 25, Shown: 25

Find citations from UniProt KB.

The same could also be done by searching for proteins that share a GO term with our focal protein. GO terms (as well as UniProt keywords) are connected to protein subjects through the up:classifiedWith property.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT *
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id .
       FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
  ?cds gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
    
    # Connect to up protein via protein id (as uri).
    ?protein  rdfs:seeAlso ?emblcdsuri .
    
    # Get GO terms and UniProt keywords.
    ?protein up:classifiedWith ?keyword .
     
  }
}
LIMIT 10
cds id protein_id emblcdsuri protein keyword
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.obolibrary.org/obo/GO_0005886
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.obolibrary.org/obo/GO_0016021
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.obolibrary.org/obo/GO_0016746
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.obolibrary.org/obo/GO_0042121
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/keywords/1133
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/keywords/1185
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/keywords/12
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/keywords/16
http://pflu.evolbio.mpg.de/bio_data/12135 CDS:PFLU_0307-0 CAY46584.1 http://purl.uniprot.org/embl-cds/CAY46584.1 http://purl.uniprot.org/uniprot/C3K6B6 http://purl.uniprot.org/keywords/997
Total: 9, Shown: 9
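
The keyword column mixes two kinds of resources: GO terms and UniProt keywords. They can be told apart by their URI namespace. A minimal sketch that separates the nine values from the table above:

```python
GO_NS = "http://purl.obolibrary.org/obo/GO_"
KEYWORD_NS = "http://purl.uniprot.org/keywords/"

# The nine up:classifiedWith values from the result table above.
classified_with = [
    "http://purl.obolibrary.org/obo/GO_0005886",
    "http://purl.obolibrary.org/obo/GO_0016021",
    "http://purl.obolibrary.org/obo/GO_0016746",
    "http://purl.obolibrary.org/obo/GO_0042121",
    "http://purl.uniprot.org/keywords/1133",
    "http://purl.uniprot.org/keywords/1185",
    "http://purl.uniprot.org/keywords/12",
    "http://purl.uniprot.org/keywords/16",
    "http://purl.uniprot.org/keywords/997",
]

# Rewrite GO URIs into the familiar GO:nnnnnnn form.
go_terms = ["GO:" + u[len(GO_NS):] for u in classified_with if u.startswith(GO_NS)]
# Keep only the numeric identifier of each UniProt keyword.
keywords = [u[len(KEYWORD_NS):] for u in classified_with if u.startswith(KEYWORD_NS)]

print(go_terms)
print(keywords)
```

On the SPARQL side, a similar restriction to GO terms should be possible with a filter such as STRSTARTS(STR(?keyword), "http://purl.obolibrary.org/obo/GO_"); I have not run that variant here.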

Find papers referenced by proteins that share a GO or keyword with wssH.

Since we now know the UniProt ID of our focal protein, we may as well submit our query directly to the UniProt SPARQL endpoint.

%endpoint https://sparql.uniprot.org/sparql
%show 100
Endpoint set to: https://sparql.uniprot.org/sparql
Result maximum size: 100

In this query, we list citations of Pseudomonas proteins that share a GO term with C3K6B6 (the WssH protein), limited to the first 10 results. In addition, we also list the encoding gene’s name and locus tag.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?scName ?title ?protein2_recName ?gene2_label ?gene2_locus ?url WHERE {
  uniprotkb:C3K6B6 up:classifiedWith ?annotation .
  ?annotation rdfs:comment ?comment .
  
  ?protein2 a up:Protein .
  ?protein2 rdfs:label ?protein2_recName .
  ?protein2 up:encodedBy ?gene2 .
  ?gene2 skos:prefLabel ?gene2_label ;
         up:locusName ?gene2_locus.
  ?protein2 up:classifiedWith ?annotation .
    

  ?protein2 up:organism ?taxon .
  ?taxon up:scientificName ?scName .
  
  FILTER(REGEX(?scName, "^Pseudomonas"))
  
  ?protein2 up:citation ?pub .
  ?pub up:title ?title .
  ?pub dc:identifier ?doi .
  
  BIND(IRI(CONCAT("https://dx.doi.org/", ?doi)) as ?url )
  
}
LIMIT 10
scName title protein2_recName gene2_label gene2_locus url
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) Pseudomonas aeruginosa Genome Database and PseudoCAP: facilitating community-based, continually updated, genome annotation. Coat protein B of bacteriophage Pf1 coaB PA0723 https://dx.doi.org/doi:10.1093/nar/gki047
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Coat protein B of bacteriophage Pf1 coaB PA0723 https://dx.doi.org/doi:10.1038/35023079
Pseudomonas fluorescens (strain SBW25) Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens. Cellulose synthase catalytic subunit [UDP-forming] bcsA PFLU_0301 https://dx.doi.org/doi:10.1186/gb-2009-10-5-r51
Pseudomonas fluorescens (strain SBW25) Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens. Cyclic di-GMP-binding protein bcsB PFLU_0302 https://dx.doi.org/doi:10.1186/gb-2009-10-5-r51
Pseudomonas fluorescens (strain SBW25) Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness. Cellulose synthase catalytic subunit [UDP-forming] bcsA PFLU_0301 https://dx.doi.org/doi:10.1093/genetics/161.1.33
Pseudomonas fluorescens (strain SBW25) Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness. Cyclic di-GMP-binding protein bcsB PFLU_0302 https://dx.doi.org/doi:10.1093/genetics/161.1.33
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) Genome diversity of Pseudomonas aeruginosa PAO1 laboratory strains. Coat protein B of bacteriophage Pf1 coaB PA0723 https://dx.doi.org/doi:10.1128/jb.01515-09
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221) Complete genome sequence of the type strain Pseudomonas stutzeri CGMCC 1.1803. Sulfate transporter CysZ cysZ PSTAB_1036 https://dx.doi.org/doi:10.1128/jb.06061-11
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221) Complete Genome Sequence of the Type Strain Pseudomonas stutzeri CGMCC 1.1803. Sulfate transporter CysZ cysZ PSTAB_1036 https://dx.doi.org/doi:10.1128/jb.06061-11
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221) Complete genome sequence of the type strain Pseudomonas stutzeri CGMCC 1.1803. Amino acid transporter LysE lysE PSTAB_3885 https://dx.doi.org/doi:10.1128/jb.06061-11
Total: 10, Shown: 10

Now this is interesting: the UniProt protein encoded by Pflu SBW25’s wssH gene shares GO terms with the products of the genes bcsA and bcsB, which have the same locus tags as wssA and wssB. In other words, we have just found alternative names for wssA and wssB.
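
A side note on the url column in the table above: the values read https://dx.doi.org/doi:10.1093/..., which suggests that the dc:identifier values carry a "doi:" prefix. Links to the DOI resolver are normally built from the bare DOI, so the prefix probably has to be removed to get clickable links. A small sketch of that clean-up (doi_url is a hypothetical helper, and using https://doi.org/ as resolver is my choice, not taken from the query):

```python
def doi_url(identifier: str) -> str:
    """Turn 'doi:10.x/y' (or a bare '10.x/y') into a resolver URL."""
    prefix = "doi:"
    if identifier.startswith(prefix):
        # Drop the leading 'doi:' so only the bare DOI remains.
        identifier = identifier[len(prefix):]
    return "https://doi.org/" + identifier


# Identifiers as they appear in the url column above.
print(doi_url("doi:10.1093/nar/gki047"))
print(doi_url("doi:10.1186/gb-2009-10-5-r51"))
```

Inside the query itself, something like REPLACE(?doi, "^doi:", "") before the CONCAT should have the same effect, though I have not tested it against the endpoint.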

All properties of a protein

For further endeavors with the UniProt SPARQL endpoint, let’s find out all the properties of a UniProt protein node.

PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>


select distinct ?prop where {
  uniprotkb:C3K6B6 ?prop ?val
 
}
limit 100

Gene properties

… and the same for gene properties.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>


select distinct ?prop where {
  uniprotkb:C3K6B6 up:encodedBy ?gene .
  ?gene ?prop ?val .
 
}
limit 100

UniRef

UniProt provides UniRef clusters, i.e. groups of protein sequences that share a certain percentage of sequence identity. The three cluster groups UniRef100, UniRef90, and UniRef50 contain protein sequences that are 100%, at least 90%, or at least 50% identical to the cluster's seed sequence, respectively. Here, we utilize these clusters to find literature citations linked to proteins that fall into the same similarity cluster as our focal protein.

Proteins are linked to a given UniRef cluster via their UniParc sequence. Hence, we need to query those UniParc nodes that contain our focal protein, find the UniRef cluster of that UniParc, get the other UniParc sequences in that cluster, and finally traverse to the other proteins in those UniParc sequences. The returned data is sorted in descending order of the UniRef matching percentage.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?pct ?protein ?scName ?title ?url WHERE {
    # Find the UniRef clusters whose UniParc members include the focal protein.
    BIND(uniprotkb:C3K6B6 AS ?focus_protein)
    ?uniref a up:Cluster ;
            up:member ?uniparc .
    ?uniparc a up:Sequence ;
             up:sequenceFor ?focus_protein .
    # The identity level (100, 90 or 50) is part of the cluster IRI.
    BIND(STRBEFORE(STRAFTER(STR(?uniref), "UniRef"), "_") AS ?pct)

    # For each cluster, collect the other UniParc members ...
    ?uniref up:member ?uniparc2 .
    FILTER(?uniparc2 NOT IN (?uniparc))

    # ... and the proteins that carry those sequences.
    ?uniparc2 up:sequenceFor ?protein .
    ?protein a up:Protein .
    FILTER(?protein NOT IN (?focus_protein))

    # Restrict the hits to Pseudomonas.
    ?protein up:organism ?taxon .
    ?taxon up:scientificName ?scName .
    FILTER(REGEX(?scName, "^Pseudomonas"))

    # List the publications cited for each protein.
    ?protein up:citation ?pub .
    ?pub up:title ?title ;
         dc:identifier ?doi .
    BIND(IRI(CONCAT("https://dx.doi.org/", ?doi)) AS ?url)
}
# ?pct is a string, so cast it to sort numerically (otherwise "100" sorts after "90" and "50").
ORDER BY DESC(xsd:integer(?pct))
LIMIT 100
pct protein scName title url
90 http://purl.uniprot.org/uniprot/Q8RSY5 Pseudomonas fluorescens Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness. https://dx.doi.org/doi:10.1093/genetics/161.1.33
90 http://purl.uniprot.org/uniprot/A0A5M9ICB8 Pseudomonas panacis Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk. https://dx.doi.org/doi:10.3389/fmicb.2020.01190
90 http://purl.uniprot.org/uniprot/A0A5M9ICB8 Pseudomonas panacis Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk. https://dx.doi.org/doi:10.3389/fmicb.2020.01190
90 http://purl.uniprot.org/uniprot/A0A0H5ARJ4 Pseudomonas trivialis Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis Strain IHBB745 with Multiple Plant Growth-Promoting Activities and Tolerance to Desiccation and Alkalinity. https://dx.doi.org/doi:10.1128/genomea.00943-15
90 http://purl.uniprot.org/uniprot/W2DZG2 Pseudomonas sp. FH1 The rulB gene of plasmid pWW0 is a hotspot for the site-specific insertion of integron-like elements found in the chromosomes of environmental Pseudomonas fluorescens group bacteria. https://dx.doi.org/doi:10.1111/1462-2920.12345
90 http://purl.uniprot.org/uniprot/A0A829MQ51 Pseudomonas fluorescens BBc6R8 Genome Sequence of the Mycorrhizal Helper Bacterium Pseudomonas fluorescens BBc6R8. https://dx.doi.org/doi:10.1128/genomea.01152-13
50 http://purl.uniprot.org/uniprot/Q888J1 Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000) The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. https://dx.doi.org/doi:10.1073/pnas.1731982100
50 http://purl.uniprot.org/uniprot/J2WKT6 Pseudomonas sp. GM79 Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. https://dx.doi.org/doi:10.1128/jb.01243-12
50 http://purl.uniprot.org/uniprot/J2RZY5 Pseudomonas sp. GM48 Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. https://dx.doi.org/doi:10.1128/jb.01243-12
50 http://purl.uniprot.org/uniprot/J3GHR5 Pseudomonas sp. GM50 Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. https://dx.doi.org/doi:10.1128/jb.01243-12
50 http://purl.uniprot.org/uniprot/J2NRJ0 Pseudomonas sp. GM18 Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. https://dx.doi.org/doi:10.1128/jb.01243-12
50 http://purl.uniprot.org/uniprot/J3BN19 Pseudomonas sp. GM67 Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. https://dx.doi.org/doi:10.1128/jb.01243-12
50 http://purl.uniprot.org/uniprot/J2M3Y6 Pseudomonas sp. GM102 Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides. https://dx.doi.org/doi:10.1128/jb.01243-12
50 http://purl.uniprot.org/uniprot/Q8RSY5 Pseudomonas fluorescens Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness. https://dx.doi.org/doi:10.1093/genetics/161.1.33
50 http://purl.uniprot.org/uniprot/A0A161ZG69 Pseudomonas fluorescens Mutant phenotypes for thousands of bacterial genes of unknown function. https://dx.doi.org/doi:10.1038/s41586-018-0124-0
50 http://purl.uniprot.org/uniprot/A0A147GXL8 Pseudomonas psychrotolerans Genomic Resource of Rice Seed Associated Bacteria. https://dx.doi.org/doi:10.3389/fmicb.2015.01551
50 http://purl.uniprot.org/uniprot/A0A2W0EWR7 Pseudomonas jessenii Characterization of the caprolactam degradation pathway in Pseudomonas jessenii using mass spectrometry-based proteomics. https://dx.doi.org/doi:10.1007/s00253-018-9073-7
50 http://purl.uniprot.org/uniprot/A0A8E6ERY0 Pseudomonas qingdaonensis A newly isolated Pseudomonas putida S-1 strain for batch-mode-propanethiol degradation and continuous treatment of propanethiol-containing waste gas. https://dx.doi.org/doi:10.1016/j.jhazmat.2015.09.063
50 http://purl.uniprot.org/uniprot/A0A7Y1N448 Pseudomonas oryzihabitans Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk. https://dx.doi.org/doi:10.3389/fmicb.2020.01190
50 http://purl.uniprot.org/uniprot/A0A7Y1N448 Pseudomonas oryzihabitans Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk. https://dx.doi.org/doi:10.3389/fmicb.2020.01190
50 http://purl.uniprot.org/uniprot/A0A5M9ICB8 Pseudomonas panacis Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk. https://dx.doi.org/doi:10.3389/fmicb.2020.01190
50 http://purl.uniprot.org/uniprot/A0A5M9ICB8 Pseudomonas panacis Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk. https://dx.doi.org/doi:10.3389/fmicb.2020.01190
50 http://purl.uniprot.org/uniprot/A0A077LSB0 Pseudomonas sp. StFLB209 Complete Genome Sequence of N-Acylhomoserine Lactone-Producing Pseudomonas sp. Strain StFLB209, Isolated from Potato Phyllosphere. https://dx.doi.org/doi:10.1128/genomea.01037-14
50 http://purl.uniprot.org/uniprot/F3IQG6 Pseudomonas amygdali pv. lachrymans str. M302278 Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates. https://dx.doi.org/doi:10.1371/journal.ppat.1002132
50 http://purl.uniprot.org/uniprot/S6SJ46 Pseudomonas syringae pv. actinidiae ICMP 18807 Genomic analysis of the Kiwifruit pathogen Pseudomonas syringae pv. actinidiae provides insight into the origins of an emergent plant disease. https://dx.doi.org/doi:10.1371/journal.ppat.1003503
50 http://purl.uniprot.org/uniprot/S6SPB9 Pseudomonas syringae pv. actinidiae ICMP 18807 Genomic analysis of the Kiwifruit pathogen Pseudomonas syringae pv. actinidiae provides insight into the origins of an emergent plant disease. https://dx.doi.org/doi:10.1371/journal.ppat.1003503
50 http://purl.uniprot.org/uniprot/A0A261WJQ2 Pseudomonas avellanae Genome analysis of the kiwifruit canker pathogen Pseudomonas syringae pv. actinidiae biovar 5. https://dx.doi.org/doi:10.1038/srep21399
50 http://purl.uniprot.org/uniprot/A0A4P6QNS5 Pseudomonas syringae The <i>Ptr1</i> Locus of <i>Solanum lycopersicoides</i> Confers Resistance to Race 1 Strains of <i>Pseudomonas syringae</i> pv. <i>tomato</i> and to <i>Ralstonia pseudosolanacearum</i> by Recognizing the Type III Effectors AvrRpt2 and RipBN. https://dx.doi.org/doi:10.1094/mpmi-01-19-0018-r
50 http://purl.uniprot.org/uniprot/A0A4P6QNS5 Pseudomonas syringae The Ptr1 locus of Solanum lycopersicoides confers resistance to race 1 strains of Pseudomonas syringae pv. tomato and to Ralstonia pseudosolanacearum by recognizing the type III effectors AvrRpt2/RipBN. https://dx.doi.org/doi:10.1094/mpmi-01-19-0018-r
50 http://purl.uniprot.org/uniprot/A0A0H5ARJ4 Pseudomonas trivialis Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis Strain IHBB745 with Multiple Plant Growth-Promoting Activities and Tolerance to Desiccation and Alkalinity. https://dx.doi.org/doi:10.1128/genomea.00943-15
50 http://purl.uniprot.org/uniprot/A0A5B7VT39 Pseudomonas sp. MPC6 Exploiting the natural poly(3-hydroxyalkanoates) production capacity of Antarctic Pseudomonas strains: from unique phenotypes to novel biopolymers. https://dx.doi.org/doi:10.1007/s10295-019-02186-2
50 http://purl.uniprot.org/uniprot/A0A5B7VT39 Pseudomonas sp. MPC6 In-Depth Genomic and Phenotypic Characterization of the Antarctic Psychrotolerant Strain <i>Pseudomonas</i> sp. MPC6 Reveals Unique Metabolic Features, Plasticity, and Biotechnological Potential. https://dx.doi.org/doi:10.3389/fmicb.2019.01154
50 http://purl.uniprot.org/uniprot/A0A5B7VT39 Pseudomonas sp. MPC6 In-Depth Genomic and Phenotypic Characterization of the Antarctic Psychrotolerant Strain Pseudomonas sp. MPC6 Reveals Unique Metabolic Features, Plasticity, and Biotechnological Potential. https://dx.doi.org/doi:10.3389/fmicb.2019.01154
50 http://purl.uniprot.org/uniprot/W2DZG2 Pseudomonas sp. FH1 The rulB gene of plasmid pWW0 is a hotspot for the site-specific insertion of integron-like elements found in the chromosomes of environmental Pseudomonas fluorescens group bacteria. https://dx.doi.org/doi:10.1111/1462-2920.12345
50 http://purl.uniprot.org/uniprot/A0A829MQ51 Pseudomonas fluorescens BBc6R8 Genome Sequence of the Mycorrhizal Helper Bacterium Pseudomonas fluorescens BBc6R8. https://dx.doi.org/doi:10.1128/genomea.01152-13
50 http://purl.uniprot.org/uniprot/A0A0N0VKM3 Pseudomonas fuscovaginae Rice-Infecting Pseudomonas Genomes Are Highly Accessorized and Harbor Multiple Putative Virulence Mechanisms to Cause Sheath Brown Rot. https://dx.doi.org/doi:10.1371/journal.pone.0139256
Total: 36, Shown: 36
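
Several rows in this table are near-duplicates: the same protein and DOI appear twice because the title is stored once with inline HTML markup and once without. As a hedged sketch of how the result could be tidied up after download, the Python snippet below keeps one row per cluster level, protein and DOI URL, and strips the markup from the titles. The function names and the dictionary layout of the rows are assumptions; the three sample rows are copied from the table above.

```python
import re

TAG = re.compile(r"<[^>]+>")  # matches inline markup such as <i> and </i>


def strip_tags(title):
    """Remove inline HTML tags from a publication title."""
    return TAG.sub("", title)


def dedupe(rows):
    """Keep the first row per (pct, protein, url) and clean its title."""
    seen, unique = set(), []
    for row in rows:
        key = (row["pct"], row["protein"], row["url"])
        if key in seen:
            continue
        seen.add(key)
        unique.append({**row, "title": strip_tags(row["title"])})
    return unique


APRX_URL = "https://dx.doi.org/doi:10.3389/fmicb.2020.01190"
rows = [
    {"pct": "90", "protein": "http://purl.uniprot.org/uniprot/A0A5M9ICB8",
     "title": ("Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the "
               "Proteolytic Potential of <i>Pseudomonas</i> Species in Milk."),
     "url": APRX_URL},
    {"pct": "90", "protein": "http://purl.uniprot.org/uniprot/A0A5M9ICB8",
     "title": ("Genetic Organization of the aprX-lipA2 Operon Affects the "
               "Proteolytic Potential of Pseudomonas Species in Milk."),
     "url": APRX_URL},
    {"pct": "90", "protein": "http://purl.uniprot.org/uniprot/A0A0H5ARJ4",
     "title": ("Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis "
               "Strain IHBB745 with Multiple Plant Growth-Promoting Activities and "
               "Tolerance to Desiccation and Alkalinity."),
     "url": "https://dx.doi.org/doi:10.1128/genomea.00943-15"},
]

unique = dedupe(rows)
for row in unique:
    print(row["pct"], row["protein"].rsplit("/", 1)[-1], row["title"])
```

Keying on the DOI URL rather than on the title also merges the pairs whose titles differ in capitalisation or wording, not only in markup.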

From here, we could try to extract additional information on our focal protein (gene) from the cited papers. Recently, Yu et al. (http://arxiv.org/abs/2203.09975) described a promising approach to automated knowledge extraction from scientific papers and knowledge graph construction. Unfortunately, their online service (https://bios.idea.edu.cn/) does not provide a SPARQL endpoint.