This post describes the NFDI4BIOIMAGE Knowledge Graph (N4BIKG). Many aspects of this project are still very much in flux, including the consensus on which ontologies and terms to employ, how to define the various namespaces, and much more.
The N4BIKG is accessible through a SPARQL endpoint at https://kg.nfdi4bioimage.de/N4BIKG/sparql. The dataset is split into four named graphs:
PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?graph where {
graph ?graph {?s ?p ?o}
}
| graph |
|---|
| https://kg.nfdi4bioimage.de/n4bikg/core |
| https://kg.nfdi4bioimage.de/n4bikg/n4bi_zenodo_community |
| https://kg.nfdi4bioimage.de/n4bikg/services |
| https://kg.nfdi4bioimage.de/n4bikg/owl |
As a convenience, the default graph is the union of all named graphs. The named graphs contain:
- https://kg.nfdi4bioimage.de/n4bikg/core: Basic information about the consortium (name, acronyms, website, Wikidata ID)
- https://kg.nfdi4bioimage.de/n4bikg/n4bi_zenodo_community: All Zenodo records registered under https://zenodo.org/communities/nfdi4bioimage, together with their authors, contributors, and affiliations, as one RDF graph
- https://kg.nfdi4bioimage.de/n4bikg/services: Public OMERO instances and their respective SPARQL endpoints
- https://kg.nfdi4bioimage.de/n4bikg/owl: Ontologies employed in any of these graphs (NFDI core ontology and OME-LD schema)
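The graph listing above can also be retrieved programmatically. The following is a minimal sketch using only the Python standard library. It assumes that the endpoint follows the standard SPARQL 1.1 Protocol (query passed as a `query` parameter of a GET request, JSON results requested via the `Accept` header); this has not been verified against the N4BIKG server, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given in this post.
ENDPOINT = "https://kg.nfdi4bioimage.de/N4BIKG/sparql"

GRAPHS_QUERY = """
SELECT DISTINCT ?graph WHERE {
  GRAPH ?graph { ?s ?p ?o }
}
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_select(endpoint, query, timeout=30):
    """Send the query over HTTP and return the rows (needs network access)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bindings_to_rows(json.load(response))
```

Calling `run_select(ENDPOINT, GRAPHS_QUERY)` should then yield one dict per named graph, each with a single `graph` key, provided the server behaves as assumed.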
The n4bikg:core graph
The core graph contains basic information about the consortium such as its name, acronyms, website, and Wikidata ID. Indeed, it is such a small graph that we can list it completely here:
PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s ?p ?o where {graph n4bikg:core {?s ?p ?o}}
The IRI https://nfdi4bioimage.de/rdf/node represents the consortium itself; consequently, its type is nfdicore:NFDI_0001039.
Furthermore, it provides a description, its date of inception, website, acronyms, and Wikidata IDs of the core, owl, and services graphs. Note that the Zenodo graph is missing here; we’ll add it in a forthcoming version.
The n4bikg:owl graph
You may ask what nfdicore:NFDI_0001039 stands for. The answer is provided by the n4bikg:owl graph, which contains the two main ontologies
employed in the N4BIKG. Running the following query:
PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select *
where {
graph n4bikg:owl {
nfdicore:NFDI_0001039 a ?cls;
rdfs:label ?label ;
rdfs:comment ?comment .
}
}
| cls | label | comment |
|---|---|---|
| http://www.w3.org/2002/07/owl#Class | nfdi consortium | An NFDI Consortium is a collaborative organizational entity within the German National Research Data Infrastructure (NFDI) initiative, responsible for the development, management, and coordination of research data infrastructures in specific scientific domains. |
| http://www.w3.org/2002/07/owl#Class | nfdi consortium | The National Research Data Infrastructure (NFDI) is intended to sytematically manage scientifich and research data, provide long-term data storage, backup and accessibility, and network the data (inter-)nationally. The German Research Foundation (DFG) and the Joint Science Conference (GWK) are working together to establish and fund the NFDI, on the basis of the administrative agreement concluded by the Federal Government and the federal states on November 26, 2018. The NFDI is designed as a nationwide network, which is built gradually and cooperatively by the respective communities. The goal is to provide a reliable and sustainable service portfolio that covers generic and disciplinary requirements of research data management in Germany. The NFDI is developed step by step and science-oriented with services available to researchers from vaious disciplines, institutions, and federal states. |
returns the sought information: nfdicore:NFDI_0001039 is a class labeled “nfdi consortium” and explained in further detail in two comments.
The n4bikg:services graph
The n4bikg:services graph is again so small that we can list it in tabular form:
PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select * where {
graph n4bikg:services {?s ?p ?o}
}
The top three triples state that the OMERO instance
https://omero-nfdi.uni-muenster.de is operated by the NFDI4BIOIMAGE consortium,
that it is an NFDI resource (nfdicore:NFDI_0000001) and, more concretely, a
database (t4fs:0000397). The fourth triple states that the database has a SPARQL endpoint;
this endpoint serves the virtual KG and runs at https://omero-nfdi.uni-muenster.de/vkg/sparql.
The remaining triples make statements about another OMERO instance, https://evolomero.evolbio.mpg.de, and its respective SPARQL endpoint at https://evolomero.evolbio.mpg.de/vkg. What’s missing here is a statement about this OMERO instance being an NFDI resource and being operated by NFDI4BIOIMAGE.
The n4bikg:n4bi_zenodo_community graph
Finally, the n4bikg:n4bi_zenodo_community graph connects research artifacts
deposited in Zenodo (presentations, manuscripts, posters, datasets, and software,
among others) to the N4BIKG. It is currently the largest of the four graphs.
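To compare the sizes of the named graphs, one can count the triples in each of them. The sketch below pairs a standard SPARQL 1.1 aggregate query with a small Python helper that reads the counts from a SPARQL JSON result document. The function names are hypothetical, and sending the query to the endpoint is left to whichever SPARQL client is at hand.

```python
# Aggregate query: one row per named graph, largest graph first.
COUNT_QUERY = """
SELECT ?graph (COUNT(*) AS ?triples) WHERE {
  GRAPH ?graph { ?s ?p ?o }
}
GROUP BY ?graph
ORDER BY DESC(?triples)
"""


def triples_per_graph(results):
    """Map each graph IRI to its triple count.

    `results` is a parsed SPARQL JSON result document as returned for
    COUNT_QUERY; counts arrive as strings and are converted to int.
    """
    return {
        binding["graph"]["value"]: int(binding["triples"]["value"])
        for binding in results["results"]["bindings"]
    }


def largest_graph(counts):
    """Return the (graph IRI, triple count) pair with the most triples."""
    return max(counts.items(), key=lambda item: item[1])
```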
Virtualization
The services graph lists public OMERO instances maintained by the
NFDI4BIOIMAGE consortium as well as their SPARQL endpoints. These SPARQL
endpoints expose metadata about OMERO records (e.g. Projects, Datasets, Images,
ROIs, …). This is achieved by mapping the OMERO database backend (PostgreSQL)
to RDF using the R2RML mapping language (Das, Sundara, and Cyganiak 2012). In our case, we employ
the Ontop-VKG toolsuite (Calvanese and others 2015).
R2RML mapping, also referred to as “knowledge graph virtualization”, exposes
relational database content and schemas as RDF in a synchronized manner: the
(virtual) knowledge graph maps the current state of the database, so changes in
the latter are immediately reflected by the former. This comes at the cost of
slower query response times compared to “static” triplestores, where the RDF
data is stored locally instead of being generated on the fly. As a side effect
of our setup, we effectively work around a shortcoming of the Ontop-VKG SPARQL
endpoint, namely that it does not support federated queries (SERVICE). Linking
the virtual KGs in our core KG, and running a SPARQL endpoint and frontend on
the core KG that does support federated queries (including against the virtual
KGs), provides an elegant solution.
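To make the idea of virtualization more tangible, here is a toy sketch in Python. It is not how Ontop-VKG works internally (Ontop rewrites SPARQL into SQL based on an R2RML mapping, whereas this toy simply regenerates triples from the table on every call), and the table layout, the IRI base, and the use of rdfs:label for the image name are assumptions made up for this illustration. What it does show is the key property: no triples are stored, so every change in the database is visible immediately.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping rule: table rows become subjects, columns become predicates.
IMAGE_MAPPING = {
    "table": "image",
    "subject_template": "https://example.org/omero/Image/{id}",  # made-up IRI base
    "class": "https://ld.openmicroscopy.org/core/Image",
    "columns": {"name": "http://www.w3.org/2000/01/rdf-schema#label"},
}


def virtual_triples(conn, mapping):
    """Generate triples from the current table contents; nothing is stored."""
    columns = list(mapping["columns"])
    sql = "SELECT id, {} FROM {} ORDER BY id".format(
        ", ".join(columns), mapping["table"]
    )
    for row in conn.execute(sql):
        subject = mapping["subject_template"].format(id=row[0])
        yield (subject, RDF_TYPE, mapping["class"])
        for column, value in zip(columns, row[1:]):
            if value is not None:
                yield (subject, mapping["columns"][column], value)


# Stand-in for the OMERO PostgreSQL backend: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO image (name) VALUES ('cells_001.tiff')")
before = list(virtual_triples(conn, IMAGE_MAPPING))

# A new database row shows up in the "graph" without any export step.
conn.execute("INSERT INTO image (name) VALUES ('cells_002.tiff')")
after = list(virtual_triples(conn, IMAGE_MAPPING))
```

The price of this design is also visible here: every request goes back to the relational database, which is why a virtual KG answers more slowly than a triplestore holding materialized RDF.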
Knowledge Graph Registries
Both the OMERO instances and their respective SPARQL endpoint URLs are subjects in the N4BI core KG and linked to the subject representing the NFDI4BIOIMAGE consortium through an appropriate property. In other words, the core KG acts as a registry of image data repositories and their corresponding KGs. Moreover, the N4BIKG is listed in the NFDI Knowledge Graph Registry https://kgi.services.base4nfdi.de/kg_registry/, a SPARQL service connecting the KGs implemented by the various NFDI consortia.
Applications
The following SPARQL query illustrates the power and usefulness
of the setup: I first query for databases and their SPARQL endpoint URLs registered
on the core KG. Then, I run a federated query (SERVICE) on each located endpoint, collecting
information about the images contained in the underlying OMERO instance, filtering out
“boring” properties such as rdf:type.
PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?srv ?img ?img_property ?img_property_value
WHERE {
?srv a <http://purl.obolibrary.org/obo/T4FS_0000397>; # Get all databases registered in the core KG ...
nfdicore:NFDI_0000201 ?kg . # ... and each database's SPARQL endpoint.
service ?kg { # Connect to omero sparql endpoint
?img a ome_core:Image; # Get all images...
?img_property ?img_property_value. # ... and image properties ...
filter(?img_property != rdf:type) # ... but skip boring type properties.
}
}
LIMIT 5
Now imagine all the opportunities this opens up: collecting summary information from all OMERO instances in NFDI4BIOIMAGE, identifying all keywords and tags employed in these instances, identifying similar projects and datasets, and cross-linking (federating) to other databases and their SPARQL endpoints to gather further annotations on the images and samples not listed in the OMERO metadata. Images and annotations become findable from a single point of entry (the core KG). So exciting …
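The two-step logic of the federated query above (look up the registered endpoints, then run one sub-query per endpoint) can also be spelled out on the client side. The following Python sketch does that. The function `federate` and the injected `run_select` callable are hypothetical: `run_select(endpoint, query)` stands for any SPARQL client that returns a list of dicts, one per result row. Unlike the SERVICE query, this version applies the limit of 5 per endpoint rather than to the overall result.

```python
# Step 1: databases and their SPARQL endpoints, as registered in the core KG.
REGISTRY_QUERY = """
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
SELECT ?srv ?kg WHERE {
  ?srv a t4fs:0000397 ;
       nfdicore:NFDI_0000201 ?kg .
}
"""

# Step 2: image properties, asked of each OMERO endpoint in turn.
IMAGE_QUERY = """
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?img ?img_property ?img_property_value WHERE {
  ?img a ome_core:Image ;
       ?img_property ?img_property_value .
  FILTER(?img_property != rdf:type)
}
LIMIT 5
"""


def federate(run_select, registry_endpoint):
    """Emulate the SERVICE clause: query the registry, then each endpoint."""
    rows = []
    for entry in run_select(registry_endpoint, REGISTRY_QUERY):
        for hit in run_select(entry["kg"], IMAGE_QUERY):
            # Keep track of which database each image row came from.
            rows.append({"srv": entry["srv"], **hit})
    return rows
```

Passing the client in as an argument keeps the sketch independent of any particular SPARQL library and makes it easy to try out with canned results.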
As a second example, the query below retrieves information from the UniProtKB
protein knowledge base about proteins that are referenced in the image
annotations. The linking property is the assay name in the OMERO image annotation,
which contains a gene name. UniProtKB is queried for proteins encoded by that
gene. Note that the OMERO KG is queried in the service clause wrapped inside
the query against UniProtKB.
PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX omens: <http://www.openmicroscopy.org/ns/default/>
SELECT ?img ?protein ?annotation_text
WHERE
{
?protein a up:Protein .
?protein up:encodedBy ?gene .
?gene skos:prefLabel ?name .
?protein up:annotation ?annotation .
?annotation rdfs:comment ?annotation_text
service <https://omero-nfdi.uni-muenster.de/vkg/sparql> {
?img a ome_core:Image;
omens:Assay ?assay .
bind(ucase(?assay) as ?img_tag)
}
filter(contains(?name, ?img_tag))
}
limit 10
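The join condition in this query deserves a closer look. The assay name is upper-cased and then matched as a substring of the gene label, so the match is case-sensitive on the gene side and can be broader than intended. The small Python sketch below mirrors that condition and contrasts it with a stricter, case-insensitive equality test. Both function names are made up for this illustration, and whether the stricter variant suits the actual annotations is an assumption that would need checking against the data.

```python
def matches(gene_label, assay):
    """Mirror FILTER(CONTAINS(?name, UCASE(?assay))) from the query above."""
    return assay.upper() in gene_label


def exact_match(gene_label, assay):
    """Stricter alternative: case-insensitive equality instead of substring."""
    return assay.upper() == gene_label.upper()
```

With the substring test, a short assay name also matches every longer gene label that contains it, and a gene label written in mixed case is never matched. In SPARQL, the stricter comparison would correspond to a filter along the lines of `FILTER(UCASE(?name) = ?img_tag)`.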
Calvanese, Diego, and others. 2015. “Ontop: Answering SPARQL Queries over Relational Databases.” Semantic Web Journal, 1278–2490. http://www.semantic-web-journal.net/content/ontop-answering-sparql-queries-over-relational-databases-1.
Das, Souripriya, Seema Sundara, and Richard Cyganiak, eds. 2012. “R2RML: RDB to RDF Mapping Language.” https://www.w3.org/TR/r2rml.