Numba and (vs.) SLURM

Introduction: The Python library numba provides, among other features, just-in-time (JIT) compilation for Python. Python code can gain tremendous speed-ups without the need to rewrite a single line of code. Simply adding a decorator to a function, as illustrated in this example: from numba import njit @njit def accelerate_me(): ... can lead to run times comparable to C or Fortran code. Let's make a concrete example, where we add random numbers in a for loop (not advised, but used here to demonstrate numba). [Read More]
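The loop described in this excerpt could be sketched roughly as follows. This is a minimal illustration rather than the post's actual code: the names add_random_numbers and time_call are hypothetical, and the no-op fallback decorator is an assumption added only so the sketch still runs where numba is not installed.

```python
import random
import time

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without numba, fall back to a no-op
    # decorator so the example runs (without any speed-up).
    def njit(func):
        return func


def add_random_numbers(n):
    """Sum n uniform random numbers in an explicit for loop."""
    total = 0.0
    for _ in range(n):
        total += random.random()
    return total


# Compiled variant of the same function (identical source code).
add_random_numbers_jit = njit(add_random_numbers)


def time_call(func, n):
    """Return the wall-clock time of a single call func(n) in seconds."""
    start = time.perf_counter()
    func(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    add_random_numbers_jit(10)  # first call triggers compilation
    print("plain python:", time_call(add_random_numbers, n))
    print("with njit:   ", time_call(add_random_numbers_jit, n))
```

The warm-up call matters because numba compiles on first use, so timing the first call would include compilation time.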

Conversion of multiple sequence alignment file formats

While working on a multiple sequence alignment (MSA) project, I wanted to calculate a distance matrix from an MSA dataset generated by progressiveMauve. The MSA file is in xmfa format. I could not find a tool that calculates a distance matrix directly from the xmfa file, so I searched for utilities that convert to other MSA formats, such as phylip, maf, or msa. This biostars forum entry consistently came up as the top result, but the links provided there are dead. [Read More]

Querying metadata from omero in python

A colleague needed a table listing image ID, image filename, and number of ROIs for all images in a project. Here’s the Python script I came up with: Query Metadata from omero Imports import omero from omero.gateway import BlitzGateway import pandas as pd import getpass Connect to omero conn = BlitzGateway('XXXXX', getpass.getpass(), host='omero.server.org') conn.connect() ········ True Retrieve datasets This sets up an iterator over the needed datasets but does not load any data into memory (yet). [Read More]
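One way such a table could be assembled is sketched below. This is not the post's script (the excerpt is cut off before that part): the function name collect_image_rows and the column names are hypothetical, and the sketch assumes the BlitzGateway wrapper methods getObject, listChildren, getId, getName and getROICount behave as in the OMERO Python gateway.

```python
def collect_image_rows(conn, project_id):
    """Return one dict per image in the project: ID, filename, ROI count.

    `conn` is assumed to be a connected omero.gateway.BlitzGateway;
    `project_id` is the numeric ID of the project of interest.
    """
    project = conn.getObject("Project", project_id)
    if project is None:
        raise ValueError(f"No project with ID {project_id}")

    rows = []
    # Project -> datasets -> images; the children are fetched lazily.
    for dataset in project.listChildren():
        for image in dataset.listChildren():
            rows.append(
                {
                    "image_id": image.getId(),
                    "filename": image.getName(),
                    "n_rois": image.getROICount(),
                }
            )
    return rows
```

With the connection from the excerpt, pd.DataFrame(collect_image_rows(conn, project_id)) would turn the rows into a table.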

Programming Courses at MPI Evolutionary Biology

As every year, some postdocs and staff scientists offer courses on computing, programming, data analysis and visualization, and related topics. Requirements: Some courses require that participants already have a certain level of experience and knowledge. Before signing up, please assess for yourself whether you feel comfortable with these requirements. If in doubt, please contact the person responsible for the course. During the course, there will not be enough time to bring everybody up to the expected level before starting the course program. [Read More]

Dask and Jupyter

Parallel Python with dask and jupyter: The dask framework provides an incredibly useful environment for parallel execution of Python code in interactive settings (e.g. jupyter) or batch mode. Its key features are (from what I’ve seen so far): representation of threading, multiprocessing, and distributed computing with one unified API and CLI; abstraction of HPC schedulers (PBS, Moab, SLURM, …); data structures for distributed computing with pandas and numpy syntax. Dask-jobqueue: The package dask_jobqueue seems to me to be the most user-friendly when it comes to parallelization on HPC clusters with a scheduling system such as SLURM. [Read More]
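A minimal sketch of how dask_jobqueue can be pointed at a SLURM scheduler is given below. The helper names slurm_cluster_options and start_client, the partition name, and all resource values are placeholders, not taken from the post. The dask imports happen inside start_client, so the options helper can be inspected on a machine without dask or SLURM.

```python
def slurm_cluster_options(queue="hypothetical_partition", cores=4,
                          memory="8GB", walltime="00:30:00"):
    """Collect per-job resources passed to dask_jobqueue.SLURMCluster.

    All defaults are placeholders; adapt them to the local cluster.
    """
    return {
        "queue": queue,        # SLURM partition to submit to
        "cores": cores,        # cores per SLURM job
        "memory": memory,      # memory per SLURM job
        "walltime": walltime,  # time limit per SLURM job
    }


def start_client(n_jobs=2, **overrides):
    """Submit n_jobs worker jobs to SLURM and return a connected client."""
    # Imported lazily: requires dask, distributed and dask_jobqueue.
    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(**slurm_cluster_options(**overrides))
    cluster.scale(jobs=n_jobs)  # ask the scheduler for n_jobs worker jobs
    return Client(cluster)
```

In a jupyter session on a cluster login node, client = start_client(n_jobs=4, cores=8) would then make subsequent dask computations run on the SLURM workers once the jobs start.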

Research Software Development Workshop 2020

On Dec. 10 & 11, Nikoletta Glynatsi and I ran our first workshop on “Research Software Development”. I don't mean to praise myself, but judging from the feedback, it was a big success. For two days, we taught best practices in writing software (exemplified with Python), using git for version control, collaborating on gitlab projects, and employing gitlab’s built-in continuous integration tools to run automated tests and build a reference manual. All material from the workshop, including all presentations and code examples, is available under the terms of the MIT License from this gitlab repository: https://gitlab. [Read More]