Numba and (vs.) SLURM

Introduction The Python library numba provides, among other features, just-in-time (JIT) compilation for Python. Python code can gain tremendous speed-ups without the need to rewrite a single line of code. Simply adding a decorator to a function, as illustrated in this example: from numba import njit @njit def accelerate_me(): ... can lead to run times comparable to C or Fortran code. Let's look at a concrete example, where we add random numbers in a for loop (not advised, but used here to demonstrate numba). [Read More]
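
The loop described in this excerpt could look roughly like the following. This is a minimal sketch and not the code from the full post: the function names `sum_random` and `sum_random_python` are made up for illustration, and the `try`/`except` fallback is an assumption added so the example still runs (without any speed-up) where numba is not installed.

```python
import random
import time

try:
    from numba import njit  # the real numba decorator, if numba is installed
except ImportError:
    # Assumption for portability: a no-op stand-in so the sketch still runs.
    def njit(func):
        return func


@njit
def sum_random(n):
    """Add n uniform random numbers in an explicit for loop (compiled if numba is present)."""
    total = 0.0
    for _ in range(n):
        total += random.random()
    return total


def sum_random_python(n):
    """Same loop without the decorator, for comparison."""
    total = 0.0
    for _ in range(n):
        total += random.random()
    return total


if __name__ == "__main__":
    n = 1_000_000
    sum_random(10)  # first call triggers compilation; keep it out of the timing

    start = time.perf_counter()
    sum_random(n)
    print("decorated:", time.perf_counter() - start, "s")

    start = time.perf_counter()
    sum_random_python(n)
    print("plain:    ", time.perf_counter() - start, "s")
```

The warm-up call matters because numba compiles a function on its first call, so timing that call would include the compilation overhead.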

SLURM Array experiments

Background The SLURM array directive provides a simple and intuitive syntax to submit a job with multiple similar tasks to the underlying HPC system. To fully exploit this capability, a few things should be kept in mind. In the following, starting from a simple serial job, we will explore how the total run time of our job behaves when the `array` option is applied. job_001: A job with one task running on a single node on a single CPU This is probably the simplest job imaginable: we run the command `hostname` and let the process idle for 5 seconds. [Read More]
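
To make the idea of an array task concrete, here is a minimal sketch of what a single task could do when written in Python instead of as a shell command. It is not taken from the full post. `SLURM_ARRAY_TASK_ID` is the environment variable SLURM sets for each array task; the function names and the default of 0 outside of SLURM are assumptions for illustration.

```python
import os
import socket
import time


def array_task_id(default=0):
    """Return the index of this array task, or `default` when not run under SLURM."""
    return int(os.environ.get("SLURM_ARRAY_TASK_ID", default))


def run_task(idle_seconds=5.0):
    """Mimic the job in the text: report the host name, then idle."""
    task_id = array_task_id()
    host = socket.gethostname()
    time.sleep(idle_seconds)
    return f"task {task_id} ran on {host}"


if __name__ == "__main__":
    print(run_task())
```

Submitted through a batch script with `sbatch --array=0-9`, each of the ten tasks would see a different task ID and could use it to pick its own input.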

Experimenting with the tripalv3 JSON+LD API

Introduction In this post I'm experimenting with querying the JSON-LD (LD stands for Linked Data) API of my Tripal3 genome database for Pseudomonas fluorescens SBW25. A little bit of background: Tripal is a content management system (CMS) for genomic data, based on the Drupal CMS framework. It facilitates setting up customized genomic database webservers to publish genome assemblies, annotation tracks, experimental and modeling data for one or several organisms. [Read More]

Conversion of multiple sequence alignment file formats

Working on a multiple sequence alignment project, I wanted to calculate a distance matrix from an MSA dataset generated by progressiveMauve. The MSA file is in xmfa format. I could not find a tool that calculates a distance matrix directly from the xmfa file, so I searched for utilities that convert to other MSA formats, such as phylip, maf, or msa. This biostars forum entry kept coming up as the top result, but the links provided there are dead. [Read More]

Querying metadata from omero in python

A colleague needed a table listing image ID, image filename, and number of ROIs for all images in a project. Here’s the Python script I came up with: Query Metadata from omero Imports import omero from omero.gateway import BlitzGateway import pandas as pd import getpass Connect to omero conn = BlitzGateway('XXXXX', getpass.getpass(), host='') conn.connect() ········ True Retrieve datasets This sets up an iterator over the needed datasets but does not load any data into memory (yet). [Read More]

Example queries against the Pseudomonas fluorescens SBW25 knowledge graph.

Introduction In this post I will present example SPARQL queries against the Pseudomonas fluorescens SBW25 knowledge graph (SBW25KG). The knowledge graph was derived from the manually created annotation in gff3 format, as explained in a previous post. The queries are run against a local instance of the apache-jena-fuseki triplestore. First, I set the endpoint URL and the maximum number of returned records: %endpoint http://micropop046:3030/plu/ %show 50 Endpoint set to: http://micropop046:3030/plu/ Result maximum size: 50 Retrieve 10 CDSs from the SBW25KG. [Read More]

Turning a genome annotation into a RDF knowledge graph.

Introduction In this article, I will describe the steps taken to generate an RDF (Resource Description Framework) data structure starting from a gff3-formatted genome annotation file. The annotation file in question is the new reference annotation for Pseudomonas fluorescens strain SBW25. Required packages I will make use of the following Python packages: gffutils to read the gff3 file into a sqlite database. rdflib to construct the RDF graph. requests to fetch data (e. [Read More]
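
The core idea, turning one gff3 feature into subject-predicate-object triples, can be sketched with the standard library alone. This is an illustration only and deliberately uses neither gffutils nor rdflib, so it is not the pipeline from the post. The namespace `http://example.org/sbw25/`, the predicate names, the function names, and the sample feature line are all hypothetical.

```python
from urllib.parse import unquote

BASE = "http://example.org/sbw25/"  # hypothetical namespace, not the one used in the post


def parse_gff3_line(line):
    """Split one tab-separated gff3 feature line into a dictionary."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a gff3 feature line has exactly 9 columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for item in attrs.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key] = unquote(value)  # gff3 attribute values are URL-escaped
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def feature_to_triples(feature):
    """Describe a feature as (subject, predicate, literal) triples."""
    subject = BASE + feature["attributes"]["ID"]
    return [
        (subject, BASE + "type", feature["type"]),
        (subject, BASE + "start", str(feature["start"])),
        (subject, BASE + "end", str(feature["end"])),
        (subject, BASE + "strand", feature["strand"]),
    ]


def to_ntriples(triples):
    """Serialize the triples as N-Triples lines with plain string literals."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return lines


if __name__ == "__main__":
    # Made-up feature line for demonstration purposes.
    demo = "chr1\tdemo\tgene\t100\t200\t.\t+\t.\tID=gene0001;Name=demoA"
    for row in to_ntriples(feature_to_triples(parse_gff3_line(demo))):
        print(row)
```

A library such as rdflib would take over the graph construction and serialization, and would let the feature type and coordinates be expressed with established ontology terms rather than ad hoc predicates.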

Importing data to OMERO

Getting your data into the database is one of the most frequent tasks when working with OMERO. There are two different ways to import data into OMERO: Via the desktop app OMERO.insight or via the commandline client that comes with Both software packages can be found on the OMERO download site. While the desktop app is easy and intuitive to use, a drawback of using it is that it must remain open while the data is uploaded. [Read More]

Programming Courses at MPI Evolutionary Biology

As every year, some postdocs and staff scientists offer courses on computing, programming, data analysis and visualization, and related topics. Requirements: Some courses require that participants already have a certain level of experience and knowledge. Before signing up, please assess for yourself whether you feel comfortable with these requirements. If in doubt, please contact the person responsible for the course. During the course, there will not be enough time to bring everybody up to the expected level before starting the course program. [Read More]