python

Numba and (vs.) SLURM

Posted on October 18, 2023

Introduction

The Python library numba provides, among other things, just-in-time (JIT) compilation for Python. Python code can gain tremendous speed-ups, often without rewriting the function body. Simply adding a decorator to a function, as in this example:

    from numba import njit

    @njit
    def accelerate_me():
        ...

can lead to run times comparable to C or Fortran code. Let's make a concrete example, where we add random numbers in a for loop (not advised, but used here to demonstrate numba). [Read More]
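As a rough idea of what such an example could look like, here is a minimal sketch that adds random numbers in an explicit loop. The function name `sum_random` and the loop length are hypothetical, not taken from the full post, and the `try`/`except` fallback is only there so the snippet also runs (without any speed-up) where numba is not installed.

```python
import random

try:
    from numba import njit  # compiles the decorated function on first call
except ImportError:
    # Assumption for illustration: fall back to a no-op decorator
    # so the sketch still runs without numba (no speed-up then).
    def njit(func):
        return func


@njit
def sum_random(n):
    """Add n uniform random numbers from [0, 1) in an explicit loop."""
    total = 0.0
    for _ in range(n):
        total += random.random()
    return total


# The first call includes compilation time; later calls reuse the compiled code.
result = sum_random(100_000)
print(result)
```

When timing such a function, it is worth calling it once beforehand so that the one-off compilation cost is not counted as run time.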

SLURM, HPC, numba, python, scaling, performance

Conversion of multiple sequence alignment file formats

Posted on December 23, 2022 |

While working on a multiple sequence alignment (MSA) project, I wanted to calculate a distance matrix from an MSA dataset generated by progressiveMauve. The MSA file is in xmfa format. I could not find a tool that calculates a distance matrix directly from the xmfa file, so I searched for utilities that convert to other MSA formats, such as phylip, maf, or msa. This biostars forum entry kept coming up as the top result, but the links provided there are dead. [Read More]
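To make the goal concrete: once the alignment is available as equal-length rows, a distance matrix can be computed pairwise. The sketch below uses the p-distance (fraction of mismatching columns, skipping gaps), which is an assumption on my part; the excerpt does not say which metric or tool was used. The toy alignment and all names are hypothetical, and no xmfa parsing is attempted here.

```python
from itertools import combinations

# Hypothetical toy alignment; all rows must have the same length.
alignment = {
    "seq_a": "ACGT-ACGT",
    "seq_b": "ACGTTACGA",
    "seq_c": "ACCT-ACGA",
}


def p_distance(s1, s2, gap="-"):
    """Fraction of mismatching columns, ignoring columns with a gap in either row."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


# Build a symmetric matrix as a nested dict with zeros on the diagonal.
names = list(alignment)
matrix = {name: {other: 0.0 for other in names} for name in names}
for n1, n2 in combinations(names, 2):
    d = p_distance(alignment[n1], alignment[n2])
    matrix[n1][n2] = matrix[n2][n1] = d

for name in names:
    print(name, [matrix[name][other] for other in names])
```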

python progressiveMauve Mauve phylip xmfa file format

Querying metadata from omero in python

Posted on December 23, 2022 | Carsten

A colleague needed a table listing image ID, image filename, and number of ROIs for all images in a project. Here’s the Python script I came up with:

Query Metadata from omero

Imports

    import omero
    from omero.gateway import BlitzGateway
    import pandas as pd
    import getpass

Connect to omero

    conn = BlitzGateway('XXXXX', getpass.getpass(), host='omero.server.org')
    conn.connect()

    ········
    True

Retrieve datasets

This sets up an iterator over the needed datasets but does not load any data into memory (yet). [Read More]

omero metadata python

Programming Courses at MPI Evolutionary Biology

Posted on October 2, 2021 | Carsten Fortmann-Grote

As every year, some postdocs and staff scientists offer courses on computing, programming, data analysis and visualization, and related topics. Requirements: Some courses require that participants already have a certain level of experience and knowledge. Before signing up, please assess for yourself whether you feel comfortable with these requirements. If in doubt, please contact the person responsible for the course. During the course, there will not be enough time to bring everybody up to the expected level before starting the course program. [Read More]

courses workshops teaching programming software development git python R unix command line bash terminal SLURM HPC

Dask and Jupyter

Posted on January 22, 2021 | Carsten

Parallel python with dask and jupyter

The dask framework provides an incredibly useful environment for parallel execution of Python code in interactive settings (e.g. jupyter) or batch mode. Its key features are (from what I’ve seen so far):

- Representation of threading, multiprocessing, and distributed computing with one unified API and CLI.
- Abstraction of HPC schedulers (PBS, Moab, SLURM, …).
- Data structures for distributed computing with pandas and numpy syntax.

Dask-jobqueue

The package dask_jobqueue seems to me to be the most user-friendly when it comes to parallelization on HPC clusters with a scheduling system such as SLURM. [Read More]

parallel multiprocessing python dask jupyter scheduler slurm mpi

Research Software Development Workshop 2020

Posted on December 14, 2020 | Carsten

On Dec. 10 & 11, Nikoleta Glynatsi and I ran our first workshop on “Research Software Development”. Far be it from me to praise myself, but judging from the feedback, it was a big success. For two days, we taught best practices in writing software (exemplified with Python), using git for version control, collaborating on gitlab projects, and employing gitlab’s built-in continuous integration tools to run automated tests and build a reference manual. All material from the workshop, including all presentations and code examples, is available under the terms of the MIT License from this gitlab repository: https://gitlab. [Read More]

git python coding software tests CI gitlab

Research-Software Development Workshop

Posted on December 9, 2020 | Carsten

Our Research Software Development workshop (run by Nikoleta Glynatsi and me) starts tomorrow. All code and presentations are available here: https://gitlab.gwdg.de/glynatsi/rsd-workshop.

git python version control continuous integration gitlab command line