Linked Open Data for Bioimaging – The NFDI4BIOIMAGE Knowledge Graph

This post describes the NFDI4BIOIMAGE Knowledge Graph (N4BIKG). Many aspects of this project are still in flux, including a consensus on which ontologies and terms to employ, how to define the various namespaces, and much more.

The N4BIKG is accessible through a SPARQL endpoint at https://kg.nfdi4bioimage.de/N4BIKG/sparql. The dataset is split into four named graphs:

PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

select distinct ?graph where {
  graph ?graph {?s ?p ?o}
}
graph
https://kg.nfdi4bioimage.de/n4bikg/core
https://kg.nfdi4bioimage.de/n4bikg/n4bi_zenodo_community
https://kg.nfdi4bioimage.de/n4bikg/services
https://kg.nfdi4bioimage.de/n4bikg/owl

As a convenience, the default graph is the union of all named graphs.
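The endpoint can also be queried from a script. Below is a minimal sketch that uses only the Python standard library. It assumes the server implements the standard SPARQL 1.1 Protocol (query passed as a GET parameter) and can return results in the SPARQL JSON format (`application/sparql-results+json`). Neither assumption has been checked against this specific endpoint. The helper names `build_request`, `parse_bindings` and `run_query` are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given in the text above.
ENDPOINT = "https://kg.nfdi4bioimage.de/N4BIKG/sparql"

# The named-graph query from above, without the unused prefixes.
GRAPHS_QUERY = """
SELECT DISTINCT ?graph WHERE {
  GRAPH ?graph { ?s ?p ?o }
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON result document into one {variable: value} dict per row."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query over HTTP and return the parsed rows (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, GRAPHS_QUERY)` requires network access. It should yield one row per named graph, each a dict such as `{"graph": "https://kg.nfdi4bioimage.de/n4bikg/core"}`. Because request building and result parsing are separate functions, both can be tested without a live server.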

The n4bikg:core graph

The core graph contains basic information about the consortium, such as its name, acronyms, website and Wikidata ID. It is small enough that we can list it completely here:

PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

select ?s ?p ?o where {graph n4bikg:core {?s ?p ?o}}
s p o
https://nfdi4bioimage.de/rdf/node http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0001039
https://nfdi4bioimage.de/rdf/node http://purl.org/dc/elements/1.1/description Nationale Forschungsdateninfrastruktur für Mikroskopie und Bildanalyse
https://nfdi4bioimage.de/rdf/node http://purl.org/dc/elements/1.1/description National Research Data Infrastructure for Microscopy and Bioimage Analysis
https://nfdi4bioimage.de/rdf/node https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000136 2023-03-01
https://nfdi4bioimage.de/rdf/node https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0001006 https://wikidata.org/entity/Q113500855
https://nfdi4bioimage.de/rdf/node https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0001008 https://nfdi4bioimage.de
https://nfdi4bioimage.de/rdf/node https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0010015 N4BI
https://nfdi4bioimage.de/rdf/node https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0010015 NFDI4BIOIMAGE
https://nfdi4bioimage.de/rdf/node http://www.w3.org/2000/01/rdf-schema#label The NFDI4BIOIMAGE consortium
https://nfdi4bioimage.de/rdf/node http://www.w3.org/2000/01/rdf-schema#comment This IRI represents the NFDI4BIOIMAGE consortium as an object in the RDF sense.
https://kg.nfdi4bioimage.de/n4bikg/core http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.wikidata.org/entity/Q31386861
https://kg.nfdi4bioimage.de/n4bikg/core https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000142 http://www.wikidata.org/entity/Q18199165
https://kg.nfdi4bioimage.de/n4bikg/owl http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.wikidata.org/entity/Q31386861
https://kg.nfdi4bioimage.de/n4bikg/owl https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000142 http://www.wikidata.org/entity/Q18199165
https://kg.nfdi4bioimage.de/n4bikg/services http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.wikidata.org/entity/Q31386861
https://kg.nfdi4bioimage.de/n4bikg/services https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000142 http://www.wikidata.org/entity/Q18199165

The IRI https://nfdi4bioimage.de/rdf/node represents the consortium itself; consequently, its type is nfdicore:NFDI_0001039. Furthermore, it carries a description (in German and English), its date of inception, website, acronyms and Wikidata ID. The remaining triples annotate the core, owl and services graphs with Wikidata items. Note that the Zenodo graph is missing here; we’ll add it in a forthcoming version.
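The nfdicore properties in this listing are opaque numeric IDs. A small sketch that groups `?s ?p ?o` rows by subject and substitutes readable names makes such listings easier to scan. The name mapping below is hypothetical, inferred from the values in the table (a date, a Wikidata entity, a URL and two acronyms). The authoritative labels live in the n4bikg:owl graph. Rows are assumed to be dicts with keys `s`, `p` and `o`, as produced by flattening SPARQL JSON results.

```python
from collections import defaultdict

NFDICORE = "https://nfdi.fiz-karlsruhe.de/ontology/"

# Hypothetical readable names, inferred from the values in the core graph listing;
# consult the n4bikg:owl graph for the real rdfs:labels.
PROPERTY_NAMES = {
    NFDICORE + "NFDI_0000136": "inception",
    NFDICORE + "NFDI_0001006": "wikidata_id",
    NFDICORE + "NFDI_0001008": "website",
    NFDICORE + "NFDI_0010015": "acronym",
    "http://purl.org/dc/elements/1.1/description": "description",
    "http://www.w3.org/2000/01/rdf-schema#label": "label",
}


def summarize(rows):
    """Group (s, p, o) rows into {subject: {property name: [values, ...]}}."""
    summary = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Unknown properties keep their full IRI as the key.
        name = PROPERTY_NAMES.get(row["p"], row["p"])
        summary[row["s"]][name].append(row["o"])
    return {subject: dict(props) for subject, props in summary.items()}
```

Repeated properties, such as the two acronyms, end up as lists in their original order rather than overwriting each other.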

The n4bikg:owl graph

You may ask what nfdicore:NFDI_0001039 stands for. The answer is provided by the n4bikg:owl graph, which contains the two main ontologies employed in the N4BIKG. Running the following query:

PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

select *
where {
  graph n4bikg:owl {
    nfdicore:NFDI_0001039 a ?cls ;
                          rdfs:label ?label ;
                          rdfs:comment ?comment .
  }
}
cls label comment
http://www.w3.org/2002/07/owl#Class nfdi consortium An NFDI Consortium is a collaborative organizational entity within the German National Research Data Infrastructure (NFDI) initiative, responsible for the development, management, and coordination of research data infrastructures in specific scientific domains.
http://www.w3.org/2002/07/owl#Class nfdi consortium The National Research Data Infrastructure (NFDI) is intended to sytematically manage scientifich and research data, provide long-term data storage, backup and accessibility, and network the data (inter-)nationally. The German Research Foundation (DFG) and the Joint Science Conference (GWK) are working together to establish and fund the NFDI, on the basis of the administrative agreement concluded by the Federal Government and the federal states on November 26, 2018. The NFDI is designed as a nationwide network, which is built gradually and cooperatively by the respective communities. The goal is to provide a reliable and sustainable service portfolio that covers generic and disciplinary requirements of research data management in Germany. The NFDI is developed step by step and science-oriented with services available to researchers from vaious disciplines, institutions, and federal states.

returns the sought information: nfdicore:NFDI_0001039 is an OWL class labeled “nfdi consortium” and explained in further detail in two comments.
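Switching between prefixed names such as nfdicore:NFDI_0001039 and full IRIs is a recurring chore when reading these results. Below is a minimal sketch of both directions, using the prefixes declared at the top of each query in this post. The names `expand` and `compact` are illustrative. Compaction picks the longest matching namespace, so the default prefix `:` (ending in `ontology#`) and `nfdicore:` (ending in `ontology/`) cannot be confused. The sketch does not check that the local part is a legal SPARQL local name.

```python
# Prefixes copied from the PREFIX block used in the queries of this post.
PREFIXES = {
    "": "https://nfdi.fiz-karlsruhe.de/ontology#",
    "n4bikg": "https://kg.nfdi4bioimage.de/n4bikg/",
    "nfdicore": "https://nfdi.fiz-karlsruhe.de/ontology/",
    "ome_core": "https://ld.openmicroscopy.org/core/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "t4fs": "http://purl.obolibrary.org/obo/T4FS_",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'nfdicore:NFDI_0001039' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(iri: str) -> str:
    """Shorten an IRI using the longest matching namespace; return it unchanged if none match."""
    best = max(
        (item for item in PREFIXES.items() if iri.startswith(item[1])),
        key=lambda item: len(item[1]),
        default=None,
    )
    if best is None:
        return iri
    prefix, namespace = best
    return f"{prefix}:{iri[len(namespace):]}"
```

For instance, `expand("t4fs:0000397")` gives the T4FS_0000397 IRI used in the services graph. IRIs from undeclared namespaces, such as `owl:Class`, are passed through unchanged.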

The n4bikg:services graph

The n4bikg:services graph is also small enough to list in full:

PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

select * where {
  graph n4bikg:services {?s ?p ?o}
}
s p o
https://nfdi4bioimage.de/rdf/node http://www.wikidata.org/prop/direct/P121 https://omero-nfdi.uni-muenster.de
https://omero-nfdi.uni-muenster.de http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000001
https://omero-nfdi.uni-muenster.de http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.obolibrary.org/obo/T4FS_0000397
https://omero-nfdi.uni-muenster.de https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000201 https://omero-nfdi.uni-muenster.de/vkg/sparql
https://omero-nfdi.uni-muenster.de/vkg/sparql http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.wikidata.org/entity/Q33002955
https://evolomero.evolbio.mpg.de/vkg/sparql http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.wikidata.org/entity/Q33002955
https://evolomero.evolbio.mpg.de http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000001
https://evolomero.evolbio.mpg.de http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.obolibrary.org/obo/T4FS_0000397
https://evolomero.evolbio.mpg.de https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000201 https://evolomero.evolbio.mpg.de/vkg/sparql

The first three triples state that the OMERO instance https://omero-nfdi.uni-muenster.de is operated by the NFDI4BIOIMAGE consortium (Wikidata property P121, “item operated”), that it is an NFDI resource (nfdicore:NFDI_0000001) and, more concretely, a database (t4fs:0000397).

The fourth triple states that the database has a SPARQL endpoint, https://omero-nfdi.uni-muenster.de/vkg/sparql, which serves the virtual KG, and the fifth assigns a type to that endpoint. The remaining triples make analogous statements about another OMERO instance, https://evolomero.evolbio.mpg.de, and its SPARQL endpoint at https://evolomero.evolbio.mpg.de/vkg/sparql. What’s missing here is a statement that this OMERO instance is operated by NFDI4BIOIMAGE.
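Gaps like a missing operator link are easy to detect mechanically. The sketch below assumes the services graph has been fetched as plain `(subject, predicate, object)` string tuples. For every database (t4fs:0000397) it reports the SPARQL endpoints and whether the consortium node links to it via P121. The triples are transcribed from the table above. `audit_services` is a hypothetical helper, not part of any N4BIKG tooling.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OPERATES = "http://www.wikidata.org/prop/direct/P121"  # "item operated"
HAS_ENDPOINT = "https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000201"
DATABASE = "http://purl.obolibrary.org/obo/T4FS_0000397"
NFDI_RESOURCE = "https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000001"
ENDPOINT_TYPE = "http://www.wikidata.org/entity/Q33002955"
CONSORTIUM = "https://nfdi4bioimage.de/rdf/node"

MS = "https://omero-nfdi.uni-muenster.de"
EVOL = "https://evolomero.evolbio.mpg.de"

# Transcribed from the n4bikg:services listing above.
SERVICES_TRIPLES = [
    (CONSORTIUM, OPERATES, MS),
    (MS, RDF_TYPE, NFDI_RESOURCE),
    (MS, RDF_TYPE, DATABASE),
    (MS, HAS_ENDPOINT, MS + "/vkg/sparql"),
    (MS + "/vkg/sparql", RDF_TYPE, ENDPOINT_TYPE),
    (EVOL + "/vkg/sparql", RDF_TYPE, ENDPOINT_TYPE),
    (EVOL, RDF_TYPE, NFDI_RESOURCE),
    (EVOL, RDF_TYPE, DATABASE),
    (EVOL, HAS_ENDPOINT, EVOL + "/vkg/sparql"),
]


def audit_services(triples):
    """Return {database: (sorted endpoints, operated_by_consortium)} for every t4fs database."""
    databases = {s for s, p, o in triples if p == RDF_TYPE and o == DATABASE}
    operated = {o for s, p, o in triples if s == CONSORTIUM and p == OPERATES}
    report = {}
    for db in sorted(databases):
        endpoints = sorted(o for s, p, o in triples if s == db and p == HAS_ENDPOINT)
        report[db] = (endpoints, db in operated)
    return report


for db, (endpoints, is_operated) in audit_services(SERVICES_TRIPLES).items():
    if not is_operated:
        print(f"{db}: no operator link (endpoints: {', '.join(endpoints)})")
```

Run on these triples, the sketch flags only the evolomero instance. It could serve as a simple consistency check whenever a new OMERO instance is registered.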

The n4bikg:n4bi_zenodo_community graph

Finally, the n4bikg:n4bi_zenodo_community graph connects research artifacts deposited in Zenodo (presentations, manuscripts, posters, datasets and software, among others) to the N4BIKG. It is currently the largest of the four graphs.

Virtualization

The services graph lists public OMERO instances maintained by the NFDI4BIOIMAGE consortium as well as their SPARQL endpoints. These SPARQL endpoints expose metadata about OMERO records (e.g. Projects, Datasets, Images, ROIs, …). This is achieved by mapping the OMERO database backend (PostgreSQL) to RDF using the R2RML mapping language (Das, Sundara, and Cyganiak 2012); in our case, we employ the Ontop-VKG tool suite (Calvanese and others 2015). R2RML mapping, also referred to as “knowledge graph virtualization”, exposes relational database content and schemas as RDF in a synchronized manner: the (virtual) knowledge graph reflects the current state of the database, so changes in the latter are immediately visible in the former. This comes at the cost of slower query response times compared to “static” triplestores, where the RDF data is stored locally instead of being generated on the fly.

The SPARQL endpoint in Ontop-VKG does not support federated queries (SERVICE). As a side effect of our setup, we work around this shortcoming: linking the virtual KGs in our core KG, and providing a SPARQL endpoint and frontend on the core KG that does support federated queries (including against the virtual KGs), provides an elegant solution.
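The trade-off between virtual and materialized graphs can be made concrete with a toy analogue in plain Python and an in-memory SQLite table. This is neither R2RML nor Ontop. The table layout, the use of rdfs:label for the image name and the IRI template are all hypothetical, loosely modelled on OMERO image records. Triples are derived from the rows each time they are requested. As a result, a row inserted later appears in the "virtual" view but not in a snapshot taken earlier.

```python
import sqlite3

# Hypothetical IRI template and vocabulary choices, for illustration only.
IMAGE_IRI = "https://omero-nfdi.uni-muenster.de/Image/{id}"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OME_IMAGE = "https://ld.openmicroscopy.org/core/Image"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def virtual_triples(conn):
    """Derive triples from the *current* table rows on every call (the 'virtual' KG)."""
    for image_id, name in conn.execute("SELECT id, name FROM image ORDER BY id"):
        iri = IMAGE_IRI.format(id=image_id)
        yield (iri, RDF_TYPE, OME_IMAGE)
        yield (iri, RDFS_LABEL, name)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO image VALUES (10, 'cells.tif')")

# A materialized copy, as a static triplestore would hold it.
snapshot = list(virtual_triples(conn))

# The database changes after the snapshot was taken.
conn.execute("INSERT INTO image VALUES (11, 'nuclei.tif')")

# Re-derived on demand, so the new image is included.
live = list(virtual_triples(conn))
print(len(snapshot), len(live))  # prints: 2 4
```

Ontop does not enumerate all triples like this. Instead, it rewrites each incoming SPARQL query into SQL against the mapped tables. The principle is the same, however: every query goes back to the live database. That is what keeps the graph synchronized, and also what makes it slower than a static triplestore.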

Knowledge Graph Registries

The OMERO instances and their SPARQL endpoint URLs are subjects in the core KG. An instance is linked to the subject representing the NFDI4BIOIMAGE consortium through the Wikidata property P121 (“item operated”), and its endpoint is attached to the instance through nfdicore:NFDI_0000201. In other words, the core KG acts as a registry of image data repositories and their corresponding KGs. Moreover, the N4BIKG is listed in the NFDI Knowledge Graph Registry https://kgi.services.base4nfdi.de/kg_registry/, a SPARQL service connecting the KGs implemented by the various NFDI consortia.

Applications

The following SPARQL query illustrates the power and usefulness of this setup. I first query for databases and their SPARQL endpoint URLs registered in the core KG. Then I run a federated query (SERVICE) against each endpoint found, collecting information about the images in the underlying OMERO instance while filtering out “boring” properties such as rdf:type.


PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?srv ?img ?img_property ?img_property_value
WHERE {
 ?srv a <http://purl.obolibrary.org/obo/T4FS_0000397>; # Get all registered databases ...
      nfdicore:NFDI_0000201 ?kg .                     # ... and their SPARQL endpoints.

 service ?kg {                                        # Connect to omero sparql endpoint
   ?img a ome_core:Image;                             # Get all images...
   ?img_property ?img_property_value.                 # ... and image properties ...
   filter(?img_property != rdf:type)                  # ... but skip boring type properties.
 }
}
LIMIT 5
srv img img_property img_property_value
https://omero-nfdi.uni-muenster.de https://omero-nfdi.uni-muenster.de/ROI/1 https://ld.openmicroscopy.org/core/image https://omero-nfdi.uni-muenster.de/Image/10
https://omero-nfdi.uni-muenster.de https://omero-nfdi.uni-muenster.de/ROI/101 https://ld.openmicroscopy.org/core/image https://omero-nfdi.uni-muenster.de/Image/111
https://omero-nfdi.uni-muenster.de https://omero-nfdi.uni-muenster.de/ROI/102 https://ld.openmicroscopy.org/core/image https://omero-nfdi.uni-muenster.de/Image/112
https://omero-nfdi.uni-muenster.de https://omero-nfdi.uni-muenster.de/ROI/13951 https://ld.openmicroscopy.org/core/image https://omero-nfdi.uni-muenster.de/Image/29584
https://omero-nfdi.uni-muenster.de https://omero-nfdi.uni-muenster.de/ROI/13952 https://ld.openmicroscopy.org/core/image https://omero-nfdi.uni-muenster.de/Image/29584
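The same two-step logic can also run on the client side. This is useful when the endpoint at hand cannot evaluate SERVICE (as with the Ontop-VKG endpoints) or when one OMERO instance is unreachable. Below is a minimal sketch. It assumes a `run_query(endpoint, query)` callable that returns result rows as dicts, for example a thin wrapper around `urllib` that parses SPARQL JSON results. The name `federate` is hypothetical.

```python
from typing import Callable

Row = dict[str, str]
# Any function taking (endpoint_url, query_text) and returning result rows.
QueryRunner = Callable[[str, str], list[Row]]

CORE_ENDPOINT = "https://kg.nfdi4bioimage.de/N4BIKG/sparql"

# Step 1: the registry part of the query above, run against the core KG.
ENDPOINTS_QUERY = """
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
SELECT ?srv ?kg WHERE {
  ?srv a <http://purl.obolibrary.org/obo/T4FS_0000397> ;
       nfdicore:NFDI_0000201 ?kg .
}
"""

# Step 2: the SERVICE body, run separately against each OMERO endpoint.
IMAGES_QUERY = """
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?img ?img_property ?img_property_value WHERE {
  ?img a ome_core:Image ;
       ?img_property ?img_property_value .
  FILTER(?img_property != rdf:type)
}
LIMIT 5
"""


def federate(run_query: QueryRunner) -> list[Row]:
    """Look up endpoints in the core KG, query each one, and tag its rows with ?srv."""
    results: list[Row] = []
    for reg in run_query(CORE_ENDPOINT, ENDPOINTS_QUERY):
        try:
            rows = run_query(reg["kg"], IMAGES_QUERY)
        except OSError:
            # urllib's URLError (and HTTPError) subclass OSError: skip unreachable endpoints.
            continue
        results.extend({"srv": reg["srv"], **row} for row in rows)
    return results
```

Unlike in the single federated query, `LIMIT 5` here applies per endpoint, so the combined result can hold up to five rows per OMERO instance. Skipping endpoints that raise `OSError` trades completeness for robustness; a production client would probably log them.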

Now imagine all the opportunities this opens up: collecting summary information from all OMERO instances in NFDI4BIOIMAGE, identifying all keywords and tags employed in these instances, finding similar projects and datasets, and cross-linking (federating) to other databases and their SPARQL endpoints to gather further annotations on the images and samples that are not listed in the OMERO metadata. All of this makes images and annotations findable from a single point of entry (the core KG). So exciting …

As a second example, the query below retrieves information from the UniProtKB protein knowledge base about proteins referenced in the image annotations. The linking property is the assay name in the OMERO image annotation, which contains a gene name; UniProtKB is queried for proteins encoded by that gene. Note that the OMERO KG is queried in the SERVICE clause wrapped inside the query against UniProtKB, i.e. the outer query is meant to be sent to UniProt's own SPARQL endpoint.

PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX omens: <http://www.openmicroscopy.org/ns/default/>

SELECT ?img ?protein ?annotation_text
WHERE
{
  ?protein a up:Protein .
  ?protein up:encodedBy ?gene .
  ?gene skos:prefLabel ?name .
  ?protein up:annotation ?annotation .
  ?annotation rdfs:comment ?annotation_text .

  service <https://omero-nfdi.uni-muenster.de/vkg/sparql> {
    ?img a ome_core:Image ;
         omens:Assay ?assay .
    bind(ucase(?assay) as ?img_tag)
  }
  filter(contains(?name, ?img_tag))
}
limit 10
img protein annotation_text
https://omero-nfdi.uni-muenster.de/Image/168 http://purl.uniprot.org/uniprot/A0A016TP63 The sequence shown here is derived from an EMBL/GenBank/DDBJ whole genome shotgun (WGS) entry which is preliminary data.
https://omero-nfdi.uni-muenster.de/Image/120 http://purl.uniprot.org/uniprot/A0A016TP63 The sequence shown here is derived from an EMBL/GenBank/DDBJ whole genome shotgun (WGS) entry which is preliminary data.
https://omero-nfdi.uni-muenster.de/Image/159 http://purl.uniprot.org/uniprot/A0A016TP63 The sequence shown here is derived from an EMBL/GenBank/DDBJ whole genome shotgun (WGS) entry which is preliminary data.
https://omero-nfdi.uni-muenster.de/Image/168 http://purl.uniprot.org/uniprot/A0A016TP63 Anaphase-promoting complex subunit 4-like WD40
https://omero-nfdi.uni-muenster.de/Image/120 http://purl.uniprot.org/uniprot/A0A016TP63 Anaphase-promoting complex subunit 4-like WD40
https://omero-nfdi.uni-muenster.de/Image/159 http://purl.uniprot.org/uniprot/A0A016TP63 Anaphase-promoting complex subunit 4-like WD40
https://omero-nfdi.uni-muenster.de/Image/168 http://purl.uniprot.org/uniprot/A0A016TP63 WD
https://omero-nfdi.uni-muenster.de/Image/120 http://purl.uniprot.org/uniprot/A0A016TP63 WD
https://omero-nfdi.uni-muenster.de/Image/159 http://purl.uniprot.org/uniprot/A0A016TP63 WD
https://omero-nfdi.uni-muenster.de/Image/168 http://purl.uniprot.org/uniprot/A0A016TP67 The sequence shown here is derived from an EMBL/GenBank/DDBJ whole genome shotgun (WGS) entry which is preliminary data.
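The join between the two knowledge graphs rests entirely on `filter(contains(?name, ?img_tag))`, with `?img_tag` bound to `ucase(?assay)`. The sketch below reproduces that matching rule in Python, which can help predict which assay texts will match which gene labels before running the federated query. The input dictionaries are hypothetical stand-ins for the OMERO and UniProt bindings. The emulation assumes ASCII text, because SPARQL `UCASE` and Python `str.upper` are not guaranteed to agree on every Unicode character.

```python
def match_images(assays: dict[str, str], gene_names: dict[str, str]) -> list[tuple[str, str]]:
    """Pair images with proteins the way FILTER(CONTAINS(?name, UCASE(?assay))) does.

    assays:     image IRI -> assay text from the OMERO annotation (hypothetical input)
    gene_names: protein IRI -> gene skos:prefLabel from UniProt (hypothetical input)
    """
    pairs = []
    for img, assay in assays.items():
        tag = assay.upper()  # mirrors bind(ucase(?assay) as ?img_tag)
        for protein, name in gene_names.items():
            if tag in name:  # mirrors contains(?name, ?img_tag); the gene name is NOT upper-cased
                pairs.append((img, protein))
    return pairs
```

Two consequences of the rule are worth knowing. First, only the assay is upper-cased, so a gene label stored in lower or mixed case never matches. Second, an empty assay string matches every gene, since every string contains the empty string.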

Calvanese, Diego, and others. 2015. “Ontop: Answering SPARQL Queries over Relational Databases.” Semantic Web Journal, 1278–2490. http://www.semantic-web-journal.net/content/ontop-answering-sparql-queries-over-relational-databases-1.

Das, Souripriya, Seema Sundara, and Richard Cyganiak, eds. 2012. “R2RML: RDB to RDF Mapping Language.” https://www.w3.org/TR/r2rml.