How to apply a license to knowledge graphs

How do you apply a license to a semantic web knowledge graph? I stumbled across this question while registering the NFDI4BIOIMAGE Knowledge Graph with the NFDI KGI Registry. Among other data, the registry form requested the license under which the KG is published.

Creative Commons Licenses on Wikidata

Ok, but how? The KGI registry's preferred format is to state the corresponding Wikidata ID; e.g., the Creative Commons license CC-BY 4.0 International is wd:Q6905323. The following Wikidata query returns a list of all International 4.0 CC licenses:
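As an illustration, a label-based query along these lines lists such license items on the Wikidata query service (the filter pattern is my assumption, not necessarily the post's actual query, and a more selective pattern may be needed to stay within the endpoint's time limit):

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?license ?label WHERE {
  ?license rdfs:label ?label .
  FILTER(LANG(?label) = "en")
  FILTER(STRSTARTS(?label, "Creative Commons") && CONTAINS(?label, "4.0 International"))
}
```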

[Read More]

Linked Open Data for Bioimaging – The NFDI4BIOIMAGE Knowledge Graph

This post describes the NFDI4BIOIMAGE Knowledge Graph (N4BIKG). Many aspects of this project are still very much in flux, including a consensus on which ontologies and terms to employ, how to define various namespaces, and much more.

The N4BIKG is accessible through a SPARQL endpoint at https://kg.nfdi4bioimage.de/N4BIKG/sparql. The dataset is split into four named graphs:

PREFIX : <https://nfdi.fiz-karlsruhe.de/ontology#>
PREFIX n4bikg: <https://kg.nfdi4bioimage.de/n4bikg/>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX ome_core: <https://ld.openmicroscopy.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX t4fs: <http://purl.obolibrary.org/obo/T4FS_>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

select distinct ?graph where {
  graph ?graph {?s ?p ?o}
}
graph
https://kg.nfdi4bioimage.de/n4bikg/core
https://kg.nfdi4bioimage.de/n4bikg/n4bi_zenodo_community
https://kg.nfdi4bioimage.de/n4bikg/services
https://kg.nfdi4bioimage.de/n4bikg/owl

As a convenience, the default graph is the union of all named graphs.
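The named-graph query above can be sent to the endpoint from Python with the standard library alone; a minimal sketch (the Accept header requests the standard SPARQL JSON results format):

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://kg.nfdi4bioimage.de/N4BIKG/sparql"

def build_request(query: str) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )

query = "SELECT DISTINCT ?graph WHERE { GRAPH ?graph { ?s ?p ?o } }"

# To execute (requires network access):
# with urllib.request.urlopen(build_request(query)) as resp:
#     for binding in json.load(resp)["results"]["bindings"]:
#         print(binding["graph"]["value"])
```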

[Read More]

Open Science Ambassadors Meeting 2024

I attended the 2024 Open Science Ambassadors Meeting in Berlin and gave a presentation on reproducibility in scientific and high-performance computing. The presentation slides are at https://mpievolbio-scicomp.pages.gwdg.de/presentations/osa2024.pdf; the permalink is https://dx.doi.org/10.5281/zenodo.14051129.

I also advertised our FAQ collection on good scientific practice in research software engineering, now online at https://mpg-rse.pages.gwdg.de/gwp-rse. At the workshop, we collected many more questions and also some answers. Thanks to the participants for your contributions!

Turning a pdf collection into a jQuery DataTable

Introduction

As researchers, we give presentations. Over the past years, I have given somewhere between 2 and 10 presentations a year. What I need is an HTML document that lists all my presentations in a table: title, date of presentation, and a link to the PDF document. This table can then be included on my institutional home page. As a plus, I want the table to be searchable and each column to be sortable. I figured this must be possible by extracting the metadata stored inside each presentation's PDF file and writing it into an HTML table. In this way, I avoid having to maintain a separate database of titles, venues, dates, etc.; instead, the PDFs are my sole source of truth. In the following, I will describe a pipeline consisting of
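The table-generation step of such a pipeline can be sketched as follows, assuming the per-PDF metadata (title, date, link) has already been extracted, e.g. with pypdf or pdfinfo; the field names and the table id are illustrative, not from the original post:

```python
import html

# Example records as they might come out of the metadata-extraction step.
talks = [
    {"title": "Example talk", "date": "2024-11-07", "href": "osa2024.pdf"},
]

def table_html(records):
    """Render metadata records as an HTML table with a header row."""
    rows = "".join(
        f'<tr><td>{html.escape(r["title"])}</td>'
        f'<td>{html.escape(r["date"])}</td>'
        f'<td><a href="{html.escape(r["href"])}">pdf</a></td></tr>'
        for r in records
    )
    return (
        '<table id="talks"><thead><tr>'
        "<th>Title</th><th>Date</th><th>PDF</th>"
        "</tr></thead><tbody>" + rows + "</tbody></table>"
    )

table = table_html(talks)
```

jQuery DataTables can then be pointed at the table's id (`$('#talks').DataTable()`) to add the searching and per-column sorting.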

[Read More]

Numba and numpy

Introduction

The python library numba provides (among other things) just-in-time (jit) compilation for python. Python code can gain tremendous speed-ups with hardly any rewriting. Adding a decorator to a function, as illustrated in this example:

from numba import njit

@njit
def accelerate_me():
   ...

can lead to run times comparable to C or Fortran code.

Let's look at a concrete example, where we add random numbers in a for loop (not advisable in pure python, but used here to demonstrate numba).

[Read More]

SLURM Array experiments

Background

The SLURM array directive provides a simple and intuitive syntax to submit a job with multiple similar tasks to the underlying HPC system. To fully exploit this capability, a few things should be kept in mind.

In the following, starting from a simple serial job, we will explore how the total run time of our job behaves when the `array` option is applied.

job_001: A job with one task running on a single node on a single CPU

This is likely the simplest job imaginable: we run the command `hostname` and let the process idle for 5 seconds. Before and after the `sleep` statement, we print out a timestamp (`date`). The time difference between the two timestamps should be 5 seconds.
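A minimal sketch of such a batch script (the directive values are illustrative and may differ from the post's actual script):

```shell
#!/bin/bash
#SBATCH --job-name=job_001
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

hostname   # which node are we on?
date       # timestamp before sleeping
sleep 5
date       # timestamp after sleeping
```

Submitted with `sbatch job_001.sh`, the two timestamps in the job's output should differ by 5 seconds.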

[Read More]

Experimenting with the Tripal v3 JSON-LD API


Introduction

In this post I'm experimenting with querying the JSON-LD (LD stands for Linked Data) API of my Tripal 3 genome database for Pseudomonas fluorescens SBW25. A little bit of background: Tripal is a content management system (CMS) for genomic data, based on the Drupal CMS framework. It facilitates setting up customized genomic database webservers to publish genome assemblies, annotation tracks, and experimental and modeling data for one or several organisms. The current stable release is Tripal v3. Tripal implements the Chado database schema for biological (genomic) databases. Popular examples of Tripal instances are the Rice Genome Hub or the Kiwifruit Genome Database; more examples can be found on the Tripal website. What sets Tripal apart from other webserver frameworks with similar objectives (e.g. machado, the UCSC Genome Browser, or the Ensembl Genome Browser) is that all data in the Tripal database is accessible through a JSON API. Moreover, the JSON response adheres to the JSON-LD standard issued by the World Wide Web Consortium (W3C). This feature makes Tripal (in principle) compatible with Semantic Web applications and eases connecting Linked Data services such as public SPARQL endpoints to Tripal sites, or Tripal sites to each other.
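A standard-library sketch of querying such an API: the base URL below is a placeholder, and the `/web-services/content/v0.1/` path follows Tripal v3's web-services layout but should be verified against your own instance.

```python
import json
import urllib.request

BASE = "https://example.org"  # placeholder for your Tripal site

def jsonld_request(path):
    """Build a request for `path` below BASE, asking for a JSON-LD response."""
    return urllib.request.Request(
        f"{BASE}/{path}",
        headers={"Accept": "application/ld+json"},
    )

# To fetch the content-type listing (requires network access):
# with urllib.request.urlopen(jsonld_request("web-services/content/v0.1/")) as resp:
#     doc = json.load(resp)
#     print(doc["@context"])
```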

[Read More]

Conversion of multiple sequence alignment file formats

Working on a multiple sequence alignment project, I wanted to calculate a distance matrix from an MSA dataset generated by progressiveMauve. The MSA file is in xmfa format. I could not find a tool that calculates a distance matrix directly from the xmfa file, so I searched for conversion utilities to other MSA formats, such as phylip, maf, or msa.

This biostars forum entry always came out on top, but the links provided there are dead. Geneious Prime loads xmfa, but I could not figure out how to write it back in a different format. Final solution: biopython:
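A sketch of the biopython route, assuming Bio.AlignIO's "mauve" parser reads the progressiveMauve xmfa output; the file names are placeholders:

```python
# AlignIO.convert parses the input file and writes it back in the target
# format, returning the number of alignments converted.
from Bio import AlignIO

def convert_msa(src, src_fmt, dst, dst_fmt):
    return AlignIO.convert(src, src_fmt, dst, dst_fmt)

# e.g. convert_msa("alignment.xmfa", "mauve", "alignment.phy", "phylip")
```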

[Read More]

Querying metadata from omero in python

A colleague needed a table listing image ID, image filename, and number of ROIs for all images in a project. Here’s the python script I came up with:

Query Metadata from omero

Imports

import omero
from omero.gateway import BlitzGateway
import pandas as pd
import getpass

Connect to omero

conn = BlitzGateway('XXXXX', getpass.getpass(), host='omero.server.org')
conn.connect()  # prompts for the password and returns True on success

Retrieve datasets

This sets up an iterator over the needed datasets but does not load any data into memory (yet).

datasets = conn.getObjects(obj_type='Dataset', ids=[1, 2, 3, 4])

Loop over datasets and store requested information in a list of tuples.

records = []
for dataset in datasets:
    id = dataset.id
    
    imgs = conn.getObjects("Image", opts={'dataset':id})
    
    for img in imgs:
        records.append((id, img.id, img.getName(), img.getROICount() ))

Write records into a DataFrame

df = pd.DataFrame.from_records(records, columns=['dataset id', 'image id', 'image filename', 'roi count'])

Write data as csv file.

df.to_csv('ds_id_fname_Nroi.csv', index=False)

Create file annotation object and attach to the omero project that encapsulates the datasets.

# Look up the enclosing project first; `project_id` stands for its numeric ID.
project = conn.getObject("Project", project_id)
file_annotation = conn.createFileAnnfromLocalFile(
    'ds_id_fname_Nroi.csv', mimetype="text/plain",
    desc="Table of images, their dataset id, image id, filename, and ROI count.")
project.linkAnnotation(file_annotation)

Running this produces a csv file on disk and attaches this file as a file annotation on the omero project.

[Read More]