Example queries against the Pseudomonas fluorescens SBW25 knowledge graph.

Introduction

In this post I will present example SPARQL queries against the Pseudomonas fluorescens SBW25 knowledge graph (SBW25KG). The knowledge graph was derived from the manually created annotation in GFF3 format, as explained in a previous post.

The queries are run against a local instance of the Apache Jena Fuseki triplestore. First, I set the endpoint URL and the maximum number of returned records:

%endpoint http://micropop046:3030/plu/
%show 50

Endpoint set to: http://micropop046:3030/plu/

Result maximum size: 50

Retrieve 10 CDSs from the SBW25KG.

In this first example, I query for CDS features and list their primary name, locus tag, and protein ID.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>

SELECT DISTINCT  *  WHERE {
    ?cds a so:0000316 ,
           gffo:Feature ;
           
        gffo:primary_name ?primary_name ;
        gffo:locus_tag ?locus_tag ;
        gffo:protein_id ?protein .
        
}

ORDER BY ?locus_tag
LIMIT 10

cds	primary_name	locus_tag	protein
http://pflu.evolbio.mpg.de/bio_data/14291	macB	PFLU_2556	CAY48791.1
http://pflu.evolbio.mpg.de/bio_data/14293	mdeA	PFLU_2558	CAY48793.1
http://pflu.evolbio.mpg.de/bio_data/14298	potF	PFLU_2564	CAY48798.1
http://pflu.evolbio.mpg.de/bio_data/14314	idh	PFLU_2580	CAY48814.1
http://pflu.evolbio.mpg.de/bio_data/14326	proX	PFLU_2592	CAY48826.1
http://pflu.evolbio.mpg.de/bio_data/14327	bfrG	PFLU_2593	CAY48827.1
http://pflu.evolbio.mpg.de/bio_data/14332	bfrH	PFLU_2598	CAY48832.1
http://pflu.evolbio.mpg.de/bio_data/14334	cynR	PFLU_2600	CAY48834.1
http://pflu.evolbio.mpg.de/bio_data/14342	yaaX	PFLU_2608	CAY48842.1
http://pflu.evolbio.mpg.de/bio_data/14348	sohB	PFLU_2614	CAY48848.1

Total: 10, Shown: 10

Clicking on a CDS URL opens the respective page on the Pseudomonas fluorescens SBW25 genome database.

Which properties are listed for CDS features?

One of my first exploratory queries against new (to me) SPARQL endpoints is to list the properties of a given node type. Here, I list all properties of CDS features, i.e. nodes typed as both so:0000316 and gffo:Feature.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>

SELECT DISTINCT  ?prop  WHERE {
    ?cds a so:0000316 ,
           gffo:Feature ;
           
        ?prop ?val .
        
}

prop
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#seeAlso
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#ID
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#Parent
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#end
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#frame
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#locus_tag
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#score
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#seqid
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#source
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#start
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#strand
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#product
http://purl.uniprot.org/core/reviewed
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#Ontology_term
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#annotated_protein_regions
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#codon_start
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#confidence_level
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#features
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#gene
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#inference
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#protein_families
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#protein_id
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#similarity
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#uniprot_annotation_score
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#ft_domain
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#note
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#pathway
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#cc_domain
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#motif
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#ortholog
https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#primary_name

Total: 32, Shown: 32

Retrieve UniProt entries via a federated query, using the protein ID to link to the UniProt graph.

Here we use the protein ID to link a node in the SBW25KG to the UniProt KG. In the UniProt KG, the protein ID is linked to the protein node (type up:Protein) through the property rdfs:seeAlso, and its value is a URI anchored on the emblcds prefix http://purl.uniprot.org/embl-cds/. Hence, we need to construct the full emblcds URI from the variable ?protein_id. This is achieved by concatenating the emblcds: prefix (converted to a string) and ?protein_id, casting the resulting string to a URI, and binding that URI to the variable ?emblcdsuri. The query then matches the UniProt node that has ?emblcdsuri as an rdfs:seeAlso property value. To limit the search on the UniProt side, we also require that the UniProt node is connected via up:organism to the resource that designates Pseudomonas fluorescens SBW25 (taxon:216595). We list the first 25 CDSs.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?cds ?id ?protein_id ?protein
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id ;
       gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
    
    ?protein  rdfs:seeAlso ?emblcdsuri .
    ?protein  rdfs:seeAlso ?xref .
    
  }
}
LIMIT 25

cds	id	protein_id	protein
http://pflu.evolbio.mpg.de/bio_data/11845	CDS:PFLU_0001-0	CAJ57270.1	http://purl.uniprot.org/uniprot/B0B0A5
http://pflu.evolbio.mpg.de/bio_data/11846	CDS:PFLU_0002-0	CAY46287.1	http://purl.uniprot.org/uniprot/C3KDU3
http://pflu.evolbio.mpg.de/bio_data/11847	CDS:PFLU_0003-0	CAY46288.1	http://purl.uniprot.org/uniprot/C3KDU4
http://pflu.evolbio.mpg.de/bio_data/11848	CDS:PFLU_0004-0	CAY46289.1	http://purl.uniprot.org/uniprot/C3KDU5
http://pflu.evolbio.mpg.de/bio_data/11849	CDS:PFLU_0005-0	CAY46290.1	http://purl.uniprot.org/uniprot/C3KDU6
http://pflu.evolbio.mpg.de/bio_data/11850	CDS:PFLU_0006-0	CAY46291.1	http://purl.uniprot.org/uniprot/C3KE36
http://pflu.evolbio.mpg.de/bio_data/11851	CDS:PFLU_0007-0	CAY46292.1	http://purl.uniprot.org/uniprot/C3KE37
http://pflu.evolbio.mpg.de/bio_data/11852	CDS:PFLU_0008-0	CAY46293.1	http://purl.uniprot.org/uniprot/C3KE38
http://pflu.evolbio.mpg.de/bio_data/11853	CDS:PFLU_0009-0	CAY46294.1	http://purl.uniprot.org/uniprot/C3KE39
http://pflu.evolbio.mpg.de/bio_data/11854	CDS:PFLU_0010-0	CAY46295.1	http://purl.uniprot.org/uniprot/C3KE40
http://pflu.evolbio.mpg.de/bio_data/11855	CDS:PFLU_0011-0	CAY46296.1	http://purl.uniprot.org/uniprot/C3KE41
http://pflu.evolbio.mpg.de/bio_data/11856	CDS:PFLU_0012-0	CAY46297.1	http://purl.uniprot.org/uniprot/C3KE42
http://pflu.evolbio.mpg.de/bio_data/11857	CDS:PFLU_0013-0	CAY46298.1	http://purl.uniprot.org/uniprot/C3KE43
http://pflu.evolbio.mpg.de/bio_data/11858	CDS:PFLU_0014-0	CAY46299.1	http://purl.uniprot.org/uniprot/C3KE44
http://pflu.evolbio.mpg.de/bio_data/11859	CDS:PFLU_0015-0	CAY46300.1	http://purl.uniprot.org/uniprot/C3KE45
http://pflu.evolbio.mpg.de/bio_data/11860	CDS:PFLU_0016-0	CAY46301.1	http://purl.uniprot.org/uniprot/C3KE46
http://pflu.evolbio.mpg.de/bio_data/11861	CDS:PFLU_0017-0	CAY46302.1	http://purl.uniprot.org/uniprot/C3KE47
http://pflu.evolbio.mpg.de/bio_data/11862	CDS:PFLU_0018-0	CAY46303.1	http://purl.uniprot.org/uniprot/C3KE48
http://pflu.evolbio.mpg.de/bio_data/11863	CDS:PFLU_0019-0	CAY46304.1	http://purl.uniprot.org/uniprot/C3KE49
http://pflu.evolbio.mpg.de/bio_data/11864	CDS:PFLU_0020-0	CAY46305.1	http://purl.uniprot.org/uniprot/C3K4H7
http://pflu.evolbio.mpg.de/bio_data/11865	CDS:PFLU_0021-0	CAY46306.1	http://purl.uniprot.org/uniprot/C3K4H8
http://pflu.evolbio.mpg.de/bio_data/11866	CDS:PFLU_0022-0	CAY46307.1	http://purl.uniprot.org/uniprot/C3K4H9
http://pflu.evolbio.mpg.de/bio_data/11867	CDS:PFLU_0023-0	CAY46308.1	http://purl.uniprot.org/uniprot/C3K4I0
http://pflu.evolbio.mpg.de/bio_data/11868	CDS:PFLU_0024-0	CAY46309.1	http://purl.uniprot.org/uniprot/C3K4I1
http://pflu.evolbio.mpg.de/bio_data/11869	CDS:PFLU_0025-0	CAY46310.1	http://purl.uniprot.org/uniprot/C3K4I2

Total: 25, Shown: 25
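To make the BIND step of this query concrete, here is a minimal Python sketch of the same string handling. The function name emblcds_uri is made up for illustration; only the prefix and the protein ID come from the queries and results in this post.

```python
# Prefix taken from the PREFIX emblcds: declaration in the query.
EMBLCDS_PREFIX = "http://purl.uniprot.org/embl-cds/"


def emblcds_uri(protein_id: str) -> str:
    """Mirror BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri).

    Hypothetical helper: plain concatenation of prefix and protein ID.
    """
    return EMBLCDS_PREFIX + protein_id


# Protein ID of CDS:PFLU_0307-0, as listed in the SBW25KG.
print(emblcds_uri("CAY46584.1"))
# http://purl.uniprot.org/embl-cds/CAY46584.1
```

The resulting string is the value that the federated part of the query compares against rdfs:seeAlso on the UniProt side.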

Query the UniProt annotations for a given PFLU SBW25 CDS.

Using the same strategy as before, we perform a federated query on the UniProt KG and list the rdfs:comment property values for all up:annotation property values.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?cds ?protein ?comment 
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id ;
       gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
      
   
    ?protein  rdfs:seeAlso ?emblcdsuri .
    
    ?protein up:annotation ?annotation .
    ?annotation rdfs:comment ?comment
                
  }
}
LIMIT 25

cds	protein	comment
http://pflu.evolbio.mpg.de/bio_data/11845	http://purl.uniprot.org/uniprot/B0B0A5	Polar residues
http://pflu.evolbio.mpg.de/bio_data/11845	http://purl.uniprot.org/uniprot/B0B0A5	ATP
http://pflu.evolbio.mpg.de/bio_data/11845	http://purl.uniprot.org/uniprot/B0B0A5	Disordered
http://pflu.evolbio.mpg.de/bio_data/11845	http://purl.uniprot.org/uniprot/B0B0A5	Plays an important role in the initiation and regulation of chromosomal replication. Binds to the origin of replication; it binds specifically double-stranded DNA at a 9 bp consensus (dnaA box): 5'-TTATC[CA]A[CA]A-3'. DnaA binds to ATP and to acidic phospholipids.
http://pflu.evolbio.mpg.de/bio_data/11845	http://purl.uniprot.org/uniprot/B0B0A5	Belongs to the DnaA family.
http://pflu.evolbio.mpg.de/bio_data/11845	http://purl.uniprot.org/uniprot/B0B0A5	Chromosomal replication initiator protein DnaA
http://pflu.evolbio.mpg.de/bio_data/11846	http://purl.uniprot.org/uniprot/C3KDU3	Confers DNA tethering and processivity to DNA polymerases and other proteins. Acts as a clamp, forming a ring around DNA (a reaction catalyzed by the clamp-loading complex) which diffuses in an ATP-independent manner freely and bidirectionally along dsDNA. Initially characterized for its ability to contact the catalytic subunit of DNA polymerase III (Pol III), a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria; Pol III exhibits 3'-5' exonuclease proofreading activity. The beta chain is required for initiation of replication as well as for processivity of DNA replication.
http://pflu.evolbio.mpg.de/bio_data/11846	http://purl.uniprot.org/uniprot/C3KDU3	Forms a ring-shaped head-to-tail homodimer around DNA.
http://pflu.evolbio.mpg.de/bio_data/11846	http://purl.uniprot.org/uniprot/C3KDU3	DNA_pol3_beta_3
http://pflu.evolbio.mpg.de/bio_data/11846	http://purl.uniprot.org/uniprot/C3KDU3	DNA_pol3_beta_2
http://pflu.evolbio.mpg.de/bio_data/11846	http://purl.uniprot.org/uniprot/C3KDU3	DNA_pol3_beta
http://pflu.evolbio.mpg.de/bio_data/11846	http://purl.uniprot.org/uniprot/C3KDU3	Belongs to the beta sliding clamp family.
http://pflu.evolbio.mpg.de/bio_data/11847	http://purl.uniprot.org/uniprot/C3KDU4	DNA replication and repair protein RecF
http://pflu.evolbio.mpg.de/bio_data/11847	http://purl.uniprot.org/uniprot/C3KDU4	Belongs to the RecF family.
http://pflu.evolbio.mpg.de/bio_data/11847	http://purl.uniprot.org/uniprot/C3KDU4	ATP
http://pflu.evolbio.mpg.de/bio_data/11847	http://purl.uniprot.org/uniprot/C3KDU4	The RecF protein is involved in DNA metabolism; it is required for DNA replication and normal SOS inducibility. RecF binds preferentially to single-stranded, linear DNA. It also seems to bind ATP.
http://pflu.evolbio.mpg.de/bio_data/11848	http://purl.uniprot.org/uniprot/C3KDU5	Toprim
http://pflu.evolbio.mpg.de/bio_data/11848	http://purl.uniprot.org/uniprot/C3KDU5	Heterotetramer, composed of two GyrA and two GyrB chains. In the heterotetramer, GyrA contains the active site tyrosine that forms a transient covalent intermediate with DNA, while GyrB binds cofactors and catalyzes ATP hydrolysis.
http://pflu.evolbio.mpg.de/bio_data/11848	http://purl.uniprot.org/uniprot/C3KDU5	A type II topoisomerase that negatively supercoils closed circular double-stranded (ds) DNA in an ATP-dependent manner to modulate DNA topology and maintain chromosomes in an underwound state. Negative supercoiling favors strand separation, and DNA replication, transcription, recombination and repair, all of which involve strand separation. Also able to catalyze the interconversion of other topological isomers of dsDNA rings, including catenanes and knotted rings. Type II topoisomerases break and join 2 DNA strands simultaneously in an ATP-dependent manner.
http://pflu.evolbio.mpg.de/bio_data/11848	http://purl.uniprot.org/uniprot/C3KDU5	Belongs to the type II topoisomerase family.
http://pflu.evolbio.mpg.de/bio_data/11848	http://purl.uniprot.org/uniprot/C3KDU5	Belongs to the type II topoisomerase GyrB family.
http://pflu.evolbio.mpg.de/bio_data/11848	http://purl.uniprot.org/uniprot/C3KDU5	Few gyrases are as efficient as E.coli at forming negative supercoils. Not all organisms have 2 type II topoisomerases; in organisms with a single type II topoisomerase this enzyme also has to decatenate newly replicated chromosomes.
http://pflu.evolbio.mpg.de/bio_data/11849	http://purl.uniprot.org/uniprot/C3KDU6	4-aspartylphosphate
http://pflu.evolbio.mpg.de/bio_data/11849	http://purl.uniprot.org/uniprot/C3KDU6	OmpR/PhoB-type
http://pflu.evolbio.mpg.de/bio_data/11849	http://purl.uniprot.org/uniprot/C3KDU6	Response regulatory

Total: 25, Shown: 25

Find proteins that have the same annotation as wssH

Next, we try to find proteins that may have a function similar to that of a focal gene in Pflu SBW25. Here, we connect to other proteins through the up:annotation property.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?id ?protein_id ?protein ?annotation ?protein2 ?comment
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id .
       FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
  ?cds gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
    
    ?protein  rdfs:seeAlso ?emblcdsuri .
    
    ?protein up:annotation ?annotation .
    ?protein2 up:annotation ?annotation .
      
    ?annotation rdfs:comment ?comment .
  }
}
LIMIT 25

id	protein_id	protein	annotation	protein2	comment
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIP016432E29D5FA475	http://purl.uniprot.org/uniprot/C3K6B6	Helical
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIP2D0C3C41F62290BE	http://purl.uniprot.org/uniprot/C3K6B6	Helical
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIP5876DA8DDDFD7798	http://purl.uniprot.org/uniprot/C3K6B6	Belongs to the membrane-bound acyltransferase family.
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIP5E60A310F8922ADA	http://purl.uniprot.org/uniprot/C3K6B6	Helical
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIP65059D303A47C9AA	http://purl.uniprot.org/uniprot/C3K6B6	Helical
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIPD5D9816EB7014D97	http://purl.uniprot.org/uniprot/C3K6B6	Helical
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIPD76729EFF3A86A35	http://purl.uniprot.org/uniprot/C3K6B6	Helical
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIPE3AEBFE73B122FEE	http://purl.uniprot.org/uniprot/C3K6B6	Helical
CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/uniprot/C3K6B6#SIPEAFA32F39A331071	http://purl.uniprot.org/uniprot/C3K6B6	Helical

Total: 9, Shown: 9

This is not exactly what I wanted, since the queried “other” protein (?protein2) is identical to the focal protein. It seems that the annotation ID encodes the protein ID itself, so every annotation is unique to a given protein.
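The observation can be checked directly on the annotation IRIs in the result table. The following sketch splits an annotation IRI at the # character and returns the accession in front of it; the helper name annotation_owner is an assumption for illustration, and the IRI layout is inferred only from the rows shown above.

```python
def annotation_owner(annotation_iri: str) -> str:
    """Return the protein accession embedded in an annotation IRI.

    Assumes the layout <protein IRI>#<annotation fragment>, as seen in
    the result table; this is not a documented guarantee.
    """
    protein_iri, _, _fragment = annotation_iri.partition("#")
    return protein_iri.rsplit("/", 1)[-1]


iri = "http://purl.uniprot.org/uniprot/C3K6B6#SIP5876DA8DDDFD7798"
print(annotation_owner(iri))
# C3K6B6
```

Because the accession is part of the IRI, two different proteins cannot point to the same annotation node, which is why ?protein2 always equals ?protein in the query above.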

Now, instead of matching the up:annotation property value, we try to match the annotation comment. All but one of the annotation comments say “Helical”, which is probably too generic to serve as a matching criterion. So let’s focus on the comment “Belongs to the membrane-bound acyltransferase family” and see if we find other proteins that share this annotation comment. The query below selects proteins that share an annotation comment starting with “Belongs” with our focal protein and whose organism name contains “Pseudomonas”. We list only the scientific name of the organism and the URI of the selected protein in the UniProt KG.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?protein2 ?scName
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id .
       FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
  ?cds gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
    
    ?protein  rdfs:seeAlso ?emblcdsuri .
    
    ?protein up:annotation ?annotation .
    ?annotation rdfs:comment ?antncmnt .
      
    FILTER(REGEX(?antncmnt, "^Belongs"))
      
    ?protein2 up:annotation ?annotation2 .
    ?annotation2 rdfs:comment ?antncmnt .
      
    ?protein2 up:organism ?taxon .
    ?taxon up:scientificName ?scName .
      
    FILTER(REGEX(?scName, "Pseudomonas"))
           
    
  }
}
LIMIT 25

protein2	scName
http://purl.uniprot.org/uniprot/L8MBF1	Pseudomonas furukawaii
http://purl.uniprot.org/uniprot/L8MM14	Pseudomonas furukawaii
http://purl.uniprot.org/uniprot/Q88ND2	Pseudomonas putida (strain ATCC 47054 / DSM 6125 / CFBP 8728 / NCIMB 11950 / KT2440)
http://purl.uniprot.org/uniprot/Q4ZXL1	Pseudomonas syringae pv. syringae (strain B728a)
http://purl.uniprot.org/uniprot/Q3K778	Pseudomonas fluorescens (strain Pf0-1)
http://purl.uniprot.org/uniprot/Q3KHR1	Pseudomonas fluorescens (strain Pf0-1)
http://purl.uniprot.org/uniprot/Q51392	Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
http://purl.uniprot.org/uniprot/Q887Q6	Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)
http://purl.uniprot.org/uniprot/A0A072ZRH8	Pseudomonas aeruginosa
http://purl.uniprot.org/uniprot/A0A1Y3LHY0	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A7U6M686	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A1Q9QVW4	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A2S3W6C3	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A2S3WIC5	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A0D1PC37	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A177SSH9	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A2S3WW87	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A3M8TKL8	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A0N8HF79	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A5P1R2W6	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A1B2FDE2	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A0M3CIH1	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A7W2QI71	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A6I6XTZ1	Pseudomonas putida
http://purl.uniprot.org/uniprot/A0A3M8SAS1	Pseudomonas putida

Total: 25, Shown: 25
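The idea of joining on the comment text rather than on the annotation node can be sketched in a few lines of Python. The (protein, comment) pairs are copied from the annotation listing earlier in this post; the function name and the grouping approach are illustrative assumptions, not what the triplestore does internally.

```python
from collections import defaultdict

# (UniProt accession, annotation comment) pairs taken from the
# annotation listing earlier in this post.
rows = [
    ("B0B0A5", "ATP"),
    ("B0B0A5", "Belongs to the DnaA family."),
    ("C3KDU4", "ATP"),
    ("C3KDU4", "Belongs to the RecF family."),
    ("C3KDU5", "Toprim"),
]


def proteins_sharing_comments(rows, focal):
    """Map every other protein to the comments it shares with `focal`."""
    by_comment = defaultdict(set)
    for protein, comment in rows:
        by_comment[comment].add(protein)

    shared = defaultdict(set)
    for comment, proteins in by_comment.items():
        if focal in proteins:
            for other in proteins - {focal}:
                shared[other].add(comment)
    return dict(shared)


print(proteins_sharing_comments(rows, "B0B0A5"))
# {'C3KDU4': {'ATP'}}
```

A generic comment such as “ATP” links otherwise unrelated proteins, which is the same reason “Helical” was excluded as a matching criterion.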

Find citations from UniProt KB.

The same could also be done by searching for proteins that share a GO term with our focal protein. GO terms, as well as UniProt keywords, are connected to protein subjects through the up:classifiedWith property.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gffo: <https://raw.githubusercontent.com/mpievolbio-scicomp/GenomeFeatureFormatOntology/main/gffo#>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX emblcds: <http://purl.uniprot.org/embl-cds/>
PREFIX upbase: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT *
WHERE {
  ?cds a so:0000316 ;
       gffo:ID ?id .
       FILTER(REGEX(?id, 'CDS:PFLU_0307-0'))
  ?cds gffo:protein_id ?protein_id .
  BIND(IRI(CONCAT(STR(emblcds:), ?protein_id)) AS ?emblcdsuri)

  SERVICE <http://sparql.uniprot.org/sparql> {
    ?protein a up:Protein .
    ?protein up:organism taxon:216595 .
    
    # Connect to up protein via protein id (as uri).
    ?protein  rdfs:seeAlso ?emblcdsuri .
    
    # Get GO terms and UniProt keywords.
    ?protein up:classifiedWith ?keyword .
     
  }
}
LIMIT 10

cds	id	protein_id	emblcdsuri	protein	keyword
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.obolibrary.org/obo/GO_0005886
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.obolibrary.org/obo/GO_0016021
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.obolibrary.org/obo/GO_0016746
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.obolibrary.org/obo/GO_0042121
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/keywords/1133
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/keywords/1185
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/keywords/12
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/keywords/16
http://pflu.evolbio.mpg.de/bio_data/12135	CDS:PFLU_0307-0	CAY46584.1	http://purl.uniprot.org/embl-cds/CAY46584.1	http://purl.uniprot.org/uniprot/C3K6B6	http://purl.uniprot.org/keywords/997

Total: 9, Shown: 9

Since we now know the UniProt ID of our focal protein, we may as well submit our query directly to the UniProt SPARQL endpoint.

%endpoint https://sparql.uniprot.org/sparql
%show 100

Endpoint set to: https://sparql.uniprot.org/sparql

Result maximum size: 100

In this query, we list citations of Pseudomonas proteins that share a classification (GO term or keyword) with C3K6B6 (the WssH protein). In addition, we also list the encoding gene’s name and locus tag.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?scName ?title ?protein2_recName ?gene2_label ?gene2_locus ?url WHERE {
  uniprotkb:C3K6B6 up:classifiedWith ?annotation .
  ?annotation rdfs:comment ?comment .
  
  ?protein2 a up:Protein .
  ?protein2 rdfs:label ?protein2_recName .
  ?protein2 up:encodedBy ?gene2 .
  ?gene2 skos:prefLabel ?gene2_label ;
         up:locusName ?gene2_locus.
  ?protein2 up:classifiedWith ?annotation .
    

  ?protein2 up:organism ?taxon .
  ?taxon up:scientificName ?scName .
  
  FILTER(REGEX(?scName, "^Pseudomonas"))
  
  ?protein2 up:citation ?pub .
  ?pub up:title ?title .
  ?pub dc:identifier ?doi .
  
  BIND(IRI(CONCAT("https://dx.doi.org/", ?doi)) as ?url )
  
}
LIMIT 10

scName	title	protein2_recName	gene2_label	gene2_locus	url
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)	Pseudomonas aeruginosa Genome Database and PseudoCAP: facilitating community-based, continually updated, genome annotation.	Coat protein B of bacteriophage Pf1	coaB	PA0723	https://dx.doi.org/doi:10.1093/nar/gki047
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)	Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.	Coat protein B of bacteriophage Pf1	coaB	PA0723	https://dx.doi.org/doi:10.1038/35023079
Pseudomonas fluorescens (strain SBW25)	Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens.	Cellulose synthase catalytic subunit [UDP-forming]	bcsA	PFLU_0301	https://dx.doi.org/doi:10.1186/gb-2009-10-5-r51
Pseudomonas fluorescens (strain SBW25)	Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens.	Cyclic di-GMP-binding protein	bcsB	PFLU_0302	https://dx.doi.org/doi:10.1186/gb-2009-10-5-r51
Pseudomonas fluorescens (strain SBW25)	Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness.	Cellulose synthase catalytic subunit [UDP-forming]	bcsA	PFLU_0301	https://dx.doi.org/doi:10.1093/genetics/161.1.33
Pseudomonas fluorescens (strain SBW25)	Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness.	Cyclic di-GMP-binding protein	bcsB	PFLU_0302	https://dx.doi.org/doi:10.1093/genetics/161.1.33
Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)	Genome diversity of Pseudomonas aeruginosa PAO1 laboratory strains.	Coat protein B of bacteriophage Pf1	coaB	PA0723	https://dx.doi.org/doi:10.1128/jb.01515-09
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221)	Complete genome sequence of the type strain Pseudomonas stutzeri CGMCC 1.1803.	Sulfate transporter CysZ	cysZ	PSTAB_1036	https://dx.doi.org/doi:10.1128/jb.06061-11
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221)	Complete Genome Sequence of the Type Strain Pseudomonas stutzeri CGMCC 1.1803.	Sulfate transporter CysZ	cysZ	PSTAB_1036	https://dx.doi.org/doi:10.1128/jb.06061-11
Pseudomonas stutzeri (strain ATCC 17588 / DSM 5190 / CCUG 11256 / JCM 5965 / LMG 11199 / NBRC 14165 / NCIMB 11358 / Stanier 221)	Complete genome sequence of the type strain Pseudomonas stutzeri CGMCC 1.1803.	Amino acid transporter LysE	lysE	PSTAB_3885	https://dx.doi.org/doi:10.1128/jb.06061-11

Total: 10, Shown: 10
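The url column shows values such as https://dx.doi.org/doi:10.1093/nar/gki047, which suggests that the dc:identifier values carry a doi: prefix that ends up inside the link. A minimal Python sketch for cleaning such values after retrieval follows; the helper name doi_url and the choice of https://doi.org/ as resolver base are my assumptions.

```python
def doi_url(identifier: str) -> str:
    """Build a resolver URL from an identifier such as 'doi:10.1093/nar/gki047'.

    Assumes an optional leading 'doi:' prefix, as inferred from the url
    column in the result table.
    """
    prefix = "doi:"
    if identifier.startswith(prefix):
        identifier = identifier[len(prefix):]
    return "https://doi.org/" + identifier


print(doi_url("doi:10.1093/nar/gki047"))
# https://doi.org/10.1093/nar/gki047
```

The same clean-up could presumably be done inside the query with the SPARQL 1.1 function STRAFTER(?doi, "doi:"), with the caveat that STRAFTER returns an empty string when the prefix is absent.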

Now this is interesting: The UniProt protein encoded by Pflu SBW25’s wssH gene shares GO terms with the proteins encoded by bcsA and bcsB, which have the same locus tags as wssA and wssB. In other words, we have just found alternative names for wssA and wssB.

All properties of a protein

For further endeavors with the UniProt SPARQL endpoint, let’s find out all the properties of a UniProt protein node.

PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>


select distinct ?prop where {
  uniprotkb:C3K6B6 ?prop ?val
 
}
limit 100

prop
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#seeAlso
http://purl.uniprot.org/core/citation
http://www.w3.org/2000/01/rdf-schema#label
http://purl.uniprot.org/core/mnemonic
http://purl.uniprot.org/core/annotation
http://purl.uniprot.org/core/attribution
http://purl.uniprot.org/core/classifiedWith
http://purl.uniprot.org/core/created
http://purl.uniprot.org/core/encodedBy
http://purl.uniprot.org/core/enzyme
http://purl.uniprot.org/core/existence
http://purl.uniprot.org/core/modified
http://purl.uniprot.org/core/organism
http://purl.uniprot.org/core/recommendedName
http://purl.uniprot.org/core/representativeFor
http://purl.uniprot.org/core/reviewed
http://purl.uniprot.org/core/seedFor
http://purl.uniprot.org/core/sequence
http://purl.uniprot.org/core/version
http://purl.uniprot.org/core/proteome

Total: 21, Shown: 21

Gene properties

… and the same for gene properties.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>


select distinct ?prop where {
  uniprotkb:C3K6B6 up:encodedBy ?gene .
  ?gene ?prop ?val .
 
}
limit 100

prop
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2004/02/skos/core#prefLabel
http://purl.uniprot.org/core/locusName

Total: 3, Shown: 3

UniRef

Uniprot provides UniRef clusters, i.e. groups of proteins that share a certain percentage of similarity with a given focal protein. The three cluster groups UniRef100, UniRef90, and UniRef50 contain protein sequences that are 100%, over 90% or over 50% identical to the focal protein, respectively. Here, we utilize these clusters to find literature citations linked to proteins in a given similarity cluster.

Proteins are linked to a UniRef cluster via their UniParc sequence. Hence, we first find the UniParc node that holds the sequence of our focal protein, then the UniRef clusters this UniParc entry is a member of, then the other UniParc sequences in those clusters, and finally the proteins that carry those sequences. The returned data is sorted in descending order of the UniRef identity level.

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?pct ?protein ?scName ?title ?url WHERE {
    # Find uniref cluster and uniparcs.
    BIND(uniprotkb:C3K6B6 as ?focus_protein)
    ?uniref a up:Cluster ;
              up:member ?uniparc .
    ?uniparc a up:Sequence ;
               up:sequenceFor ?focus_protein .
    BIND(STRBEFORE(STRAFTER(STR(?uniref), "UniRef"), "_") AS ?pct) 
    # For each uniparc, find all members and list their references.
    ?uniref up:member ?uniparc2 .
    FILTER(?uniparc2 NOT IN (?uniparc))
    
    ?uniparc2 up:sequenceFor ?protein .
    ?protein a up:Protein .
    FILTER(?protein NOT IN (?focus_protein))
    
    ?protein up:organism ?taxon .
    ?taxon up:scientificName ?scName .

    FILTER(REGEX(?scName, "^Pseudomonas"))

    ?protein up:citation ?pub .
    ?pub up:title ?title .
    ?pub dc:identifier ?doi .

    BIND(IRI(CONCAT("https://dx.doi.org/", ?doi)) AS ?url )

  
}
ORDER BY DESC(?pct)
LIMIT 100

pct	protein	scName	title	url
90	http://purl.uniprot.org/uniprot/Q8RSY5	Pseudomonas fluorescens	Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness.	https://dx.doi.org/doi:10.1093/genetics/161.1.33
90	http://purl.uniprot.org/uniprot/A0A5M9ICB8	Pseudomonas panacis	Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk.	https://dx.doi.org/doi:10.3389/fmicb.2020.01190
90	http://purl.uniprot.org/uniprot/A0A5M9ICB8	Pseudomonas panacis	Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk.	https://dx.doi.org/doi:10.3389/fmicb.2020.01190
90	http://purl.uniprot.org/uniprot/A0A0H5ARJ4	Pseudomonas trivialis	Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis Strain IHBB745 with Multiple Plant Growth-Promoting Activities and Tolerance to Desiccation and Alkalinity.	https://dx.doi.org/doi:10.1128/genomea.00943-15
90	http://purl.uniprot.org/uniprot/W2DZG2	Pseudomonas sp. FH1	The rulB gene of plasmid pWW0 is a hotspot for the site-specific insertion of integron-like elements found in the chromosomes of environmental Pseudomonas fluorescens group bacteria.	https://dx.doi.org/doi:10.1111/1462-2920.12345
90	http://purl.uniprot.org/uniprot/A0A829MQ51	Pseudomonas fluorescens BBc6R8	Genome Sequence of the Mycorrhizal Helper Bacterium Pseudomonas fluorescens BBc6R8.	https://dx.doi.org/doi:10.1128/genomea.01152-13
50	http://purl.uniprot.org/uniprot/Q888J1	Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)	The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000.	https://dx.doi.org/doi:10.1073/pnas.1731982100
50	http://purl.uniprot.org/uniprot/J2WKT6	Pseudomonas sp. GM79	Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides.	https://dx.doi.org/doi:10.1128/jb.01243-12
50	http://purl.uniprot.org/uniprot/J2RZY5	Pseudomonas sp. GM48	Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides.	https://dx.doi.org/doi:10.1128/jb.01243-12
50	http://purl.uniprot.org/uniprot/J3GHR5	Pseudomonas sp. GM50	Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides.	https://dx.doi.org/doi:10.1128/jb.01243-12
50	http://purl.uniprot.org/uniprot/J2NRJ0	Pseudomonas sp. GM18	Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides.	https://dx.doi.org/doi:10.1128/jb.01243-12
50	http://purl.uniprot.org/uniprot/J3BN19	Pseudomonas sp. GM67	Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides.	https://dx.doi.org/doi:10.1128/jb.01243-12
50	http://purl.uniprot.org/uniprot/J2M3Y6	Pseudomonas sp. GM102	Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides.	https://dx.doi.org/doi:10.1128/jb.01243-12
50	http://purl.uniprot.org/uniprot/Q8RSY5	Pseudomonas fluorescens	Adaptive divergence in experimental populations of Pseudomonas fluorescens. I. Genetic and phenotypic bases of wrinkly spreader fitness.	https://dx.doi.org/doi:10.1093/genetics/161.1.33
50	http://purl.uniprot.org/uniprot/A0A161ZG69	Pseudomonas fluorescens	Mutant phenotypes for thousands of bacterial genes of unknown function.	https://dx.doi.org/doi:10.1038/s41586-018-0124-0
50	http://purl.uniprot.org/uniprot/A0A147GXL8	Pseudomonas psychrotolerans	Genomic Resource of Rice Seed Associated Bacteria.	https://dx.doi.org/doi:10.3389/fmicb.2015.01551
50	http://purl.uniprot.org/uniprot/A0A2W0EWR7	Pseudomonas jessenii	Characterization of the caprolactam degradation pathway in Pseudomonas jessenii using mass spectrometry-based proteomics.	https://dx.doi.org/doi:10.1007/s00253-018-9073-7
50	http://purl.uniprot.org/uniprot/A0A8E6ERY0	Pseudomonas qingdaonensis	A newly isolated Pseudomonas putida S-1 strain for batch-mode-propanethiol degradation and continuous treatment of propanethiol-containing waste gas.	https://dx.doi.org/doi:10.1016/j.jhazmat.2015.09.063
50	http://purl.uniprot.org/uniprot/A0A7Y1N448	Pseudomonas oryzihabitans	Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk.	https://dx.doi.org/doi:10.3389/fmicb.2020.01190
50	http://purl.uniprot.org/uniprot/A0A7Y1N448	Pseudomonas oryzihabitans	Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk.	https://dx.doi.org/doi:10.3389/fmicb.2020.01190
50	http://purl.uniprot.org/uniprot/A0A5M9ICB8	Pseudomonas panacis	Genetic Organization of the <i>aprX-lipA2</i> Operon Affects the Proteolytic Potential of <i>Pseudomonas</i> Species in Milk.	https://dx.doi.org/doi:10.3389/fmicb.2020.01190
50	http://purl.uniprot.org/uniprot/A0A5M9ICB8	Pseudomonas panacis	Genetic Organization of the aprX-lipA2 Operon Affects the Proteolytic Potential of Pseudomonas Species in Milk.	https://dx.doi.org/doi:10.3389/fmicb.2020.01190
50	http://purl.uniprot.org/uniprot/A0A077LSB0	Pseudomonas sp. StFLB209	Complete Genome Sequence of N-Acylhomoserine Lactone-Producing Pseudomonas sp. Strain StFLB209, Isolated from Potato Phyllosphere.	https://dx.doi.org/doi:10.1128/genomea.01037-14
50	http://purl.uniprot.org/uniprot/F3IQG6	Pseudomonas amygdali pv. lachrymans str. M302278	Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates.	https://dx.doi.org/doi:10.1371/journal.ppat.1002132
50	http://purl.uniprot.org/uniprot/S6SJ46	Pseudomonas syringae pv. actinidiae ICMP 18807	Genomic analysis of the Kiwifruit pathogen Pseudomonas syringae pv. actinidiae provides insight into the origins of an emergent plant disease.	https://dx.doi.org/doi:10.1371/journal.ppat.1003503
50	http://purl.uniprot.org/uniprot/S6SPB9	Pseudomonas syringae pv. actinidiae ICMP 18807	Genomic analysis of the Kiwifruit pathogen Pseudomonas syringae pv. actinidiae provides insight into the origins of an emergent plant disease.	https://dx.doi.org/doi:10.1371/journal.ppat.1003503
50	http://purl.uniprot.org/uniprot/A0A261WJQ2	Pseudomonas avellanae	Genome analysis of the kiwifruit canker pathogen Pseudomonas syringae pv. actinidiae biovar 5.	https://dx.doi.org/doi:10.1038/srep21399
50	http://purl.uniprot.org/uniprot/A0A4P6QNS5	Pseudomonas syringae	The <i>Ptr1</i> Locus of <i>Solanum lycopersicoides</i> Confers Resistance to Race 1 Strains of <i>Pseudomonas syringae</i> pv. <i>tomato</i> and to <i>Ralstonia pseudosolanacearum</i> by Recognizing the Type III Effectors AvrRpt2 and RipBN.	https://dx.doi.org/doi:10.1094/mpmi-01-19-0018-r
50	http://purl.uniprot.org/uniprot/A0A4P6QNS5	Pseudomonas syringae	The Ptr1 locus of Solanum lycopersicoides confers resistance to race 1 strains of Pseudomonas syringae pv. tomato and to Ralstonia pseudosolanacearum by recognizing the type III effectors AvrRpt2/RipBN.	https://dx.doi.org/doi:10.1094/mpmi-01-19-0018-r
50	http://purl.uniprot.org/uniprot/A0A0H5ARJ4	Pseudomonas trivialis	Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis Strain IHBB745 with Multiple Plant Growth-Promoting Activities and Tolerance to Desiccation and Alkalinity.	https://dx.doi.org/doi:10.1128/genomea.00943-15
50	http://purl.uniprot.org/uniprot/A0A5B7VT39	Pseudomonas sp. MPC6	Exploiting the natural poly(3-hydroxyalkanoates) production capacity of Antarctic Pseudomonas strains: from unique phenotypes to novel biopolymers.	https://dx.doi.org/doi:10.1007/s10295-019-02186-2
50	http://purl.uniprot.org/uniprot/A0A5B7VT39	Pseudomonas sp. MPC6	In-Depth Genomic and Phenotypic Characterization of the Antarctic Psychrotolerant Strain <i>Pseudomonas</i> sp. MPC6 Reveals Unique Metabolic Features, Plasticity, and Biotechnological Potential.	https://dx.doi.org/doi:10.3389/fmicb.2019.01154
50	http://purl.uniprot.org/uniprot/A0A5B7VT39	Pseudomonas sp. MPC6	In-Depth Genomic and Phenotypic Characterization of the Antarctic Psychrotolerant Strain Pseudomonas sp. MPC6 Reveals Unique Metabolic Features, Plasticity, and Biotechnological Potential.	https://dx.doi.org/doi:10.3389/fmicb.2019.01154
50	http://purl.uniprot.org/uniprot/W2DZG2	Pseudomonas sp. FH1	The rulB gene of plasmid pWW0 is a hotspot for the site-specific insertion of integron-like elements found in the chromosomes of environmental Pseudomonas fluorescens group bacteria.	https://dx.doi.org/doi:10.1111/1462-2920.12345
50	http://purl.uniprot.org/uniprot/A0A829MQ51	Pseudomonas fluorescens BBc6R8	Genome Sequence of the Mycorrhizal Helper Bacterium Pseudomonas fluorescens BBc6R8.	https://dx.doi.org/doi:10.1128/genomea.01152-13
50	http://purl.uniprot.org/uniprot/A0A0N0VKM3	Pseudomonas fuscovaginae	Rice-Infecting Pseudomonas Genomes Are Highly Accessorized and Harbor Multiple Putative Virulence Mechanisms to Cause Sheath Brown Rot.	https://dx.doi.org/doi:10.1371/journal.pone.0139256

Total: 36, Shown: 36
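
Three details of this result are worth a second look. Some papers appear twice because their title is stored both with and without HTML markup (e.g. the two rows for A0A5M9ICB8). The generated URLs also still contain the "doi:" prefix of the identifier. Finally, ?pct is a string, so a UniRef100 row, if one were returned, would be sorted after the 50% rows. The following variant is a sketch of how these points could be addressed. It has not been run against the endpoint here, so no output is shown. The variables ?rawTitle and ?cleanTitle and the xsd prefix are additions for this illustration.

```sparql
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?pct ?protein ?scName (SAMPLE(?cleanTitle) AS ?title) ?url WHERE {
    # Find the UniRef clusters and the UniParc entry of the focal protein.
    BIND(uniprotkb:C3K6B6 AS ?focus_protein)
    ?uniref a up:Cluster ;
            up:member ?uniparc .
    ?uniparc a up:Sequence ;
             up:sequenceFor ?focus_protein .
    # Cast the identity level to an integer so that it sorts numerically.
    BIND(xsd:integer(STRBEFORE(STRAFTER(STR(?uniref), "UniRef"), "_")) AS ?pct)

    # Other members of the same clusters and the proteins they belong to.
    ?uniref up:member ?uniparc2 .
    FILTER(?uniparc2 != ?uniparc)
    ?uniparc2 up:sequenceFor ?protein .
    ?protein a up:Protein .
    FILTER(?protein != ?focus_protein)

    ?protein up:organism ?taxon .
    ?taxon up:scientificName ?scName .
    FILTER(REGEX(?scName, "^Pseudomonas"))

    ?protein up:citation ?pub .
    ?pub up:title ?rawTitle .
    ?pub dc:identifier ?doi .

    # Remove inline HTML markup such as <i>...</i> from the title.
    BIND(REPLACE(?rawTitle, "<[^>]+>", "") AS ?cleanTitle)
    # Drop a leading "doi:" before building the resolver URL.
    BIND(IRI(CONCAT("https://doi.org/", REPLACE(STR(?doi), "^doi:", ""))) AS ?url)
}
# One row per cluster level, protein and paper; title variants are collapsed.
GROUP BY ?pct ?protein ?scName ?url
ORDER BY DESC(?pct) ?scName
LIMIT 100
```

Grouping by the DOI URL and sampling one title also collapses title variants that differ in more than markup, such as the two spellings of the Ptr1 paper above. A protein can still appear at both the 90% and the 50% level, which is expected because it is a member of both clusters.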

From here, we could try to extract additional information on our focal protein (or gene) from the cited papers. Recently, Yu et al. (http://arxiv.org/abs/2203.09975) described a promising approach to automated knowledge extraction from scientific papers and knowledge graph construction. Unfortunately, their online service (https://bios.idea.edu.cn/) does not provide a SPARQL endpoint.

