Welcome to the blog on scientific computing at the Max Planck Institute for Evolutionary Biology

In this blog, we summarize the activities of the Scientific Computing Unit. In particular, we cover topics such as

* Tips and Tricks, Dos and Don'ts in scientific computing
* Outlines and summaries of ongoing research and software projects
* Updates and new releases from our software repositories
* Recommended readings, videos, websites

JupyterLab tutorial

On May 5 & 6, 2021, I took part in the workshop “Kompetenz Forschungsdatenmanagement” organized by the Max Planck Digital Library. Day 2 featured a full session on “Reproducible Science with Jupyter” with a presentation by Hans Fangohr (slides available here), followed by an interactive hands-on tutorial. In part 1 of the tutorial, I step through a data analysis workflow based on the Johns Hopkins University COVID-19 dataset from GitHub. Parts 2 and 3 are about Bayesian inference of SIR model parameters, kindly provided by Johannes Zierenberg from the MPI for Dynamics and Self-Organization. [Read More]
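The SIR model behind parts 2 and 3 can be sketched in a few lines of Python; the parameter values and the forward-Euler integration below are illustrative, not those used in the tutorial:

```python
# Minimal SIR sketch (forward Euler); beta, gamma, and the step size
# are illustrative values, not those from the tutorial.
def sir(beta, gamma, s, i, r, dt=0.1, steps=1000):
    n = s + i + r  # total population, conserved by construction
    for _ in range(steps):
        ds = -beta * s * i / n   # susceptible -> infected
        dr = gamma * i           # infected -> recovered
        s += ds * dt
        i += (-ds - dr) * dt
        r += dr * dt
    return s, i, r

s, i, r = sir(beta=0.3, gamma=0.1, s=999.0, i=1.0, r=0.0)
```

In the tutorial, such a forward model would be combined with case data to infer beta and gamma via Bayesian inference.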

Dask and Jupyter

Parallel Python with Dask and Jupyter. The Dask framework provides an incredibly useful environment for parallel execution of Python code in interactive settings (e.g. Jupyter) or in batch mode. Its key features are (from what I’ve seen so far): representation of threading, multiprocessing, and distributed computing with one unified API and CLI; abstraction of HPC schedulers (PBS, Moab, SLURM, …); and data structures for distributed computing with pandas and NumPy syntax. As for dask-jobqueue: the package dask_jobqueue seems to me the most user-friendly when it comes to parallelization on HPC clusters with a scheduling system such as SLURM. [Read More]
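As a minimal illustration of that unified API, here is a sketch using dask.delayed with the local threads scheduler (assuming dask is installed); on a cluster, the same task graph could instead be executed through a dask_jobqueue.SLURMCluster without changing the graph-building code:

```python
import dask

# Build a task graph lazily; nothing runs until .compute() is called.
@dask.delayed
def square(x):
    return x * x

@dask.delayed
def total(values):
    return sum(values)

graph = total([square(i) for i in range(5)])

# Swap scheduler="threads" for a distributed Client to scale out.
total_result = graph.compute(scheduler="threads")
print(total_result)  # 0 + 1 + 4 + 9 + 16 = 30
```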

Research Software Development Workshop 2020

On Dec. 10 & 11, Nikoletta Glynatsi and I ran our first workshop on “Research Software Development”. Without wanting to praise ourselves, judging from the feedback it was a big success. For two days, we taught best practices in writing software (exemplified with Python), using git for version control, collaborating on GitLab projects, and employing GitLab’s built-in continuous integration tools to run automated tests and build a reference manual. All material from the workshop, including all presentations and code examples, is available under the terms of the MIT License from this GitLab repository: https://gitlab. [Read More]
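A hypothetical minimal `.gitlab-ci.yml` along those lines might look as follows (stage names, images, and commands are illustrative, not the workshop’s actual configuration): one job runs the automated tests, a second builds the reference manual.

```yaml
stages:
  - test
  - docs

test:
  stage: test
  image: python:3.9
  script:
    - pip install pytest
    - pytest tests/

docs:
  stage: docs
  image: python:3.9
  script:
    - pip install sphinx
    - sphinx-build docs/ public/
  artifacts:
    paths:
      - public
```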

Running matlab code on HPC with SLURM

Running MATLAB scripts on HPC. Today, the question came up of how to run MATLAB code on an HPC system featuring a SLURM scheduler. The syntax for running MATLAB on the command line is indeed a bit counterintuitive, at least if you are (like me) used to running Python or R scripts. The following snippet is an example of how to submit a MATLAB script for execution on an HPC server with the SLURM scheduler: [Read More]
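A job script along those lines might look like this sketch (the module name, resource limits, and script name are placeholders; `-batch` requires MATLAB R2019a or newer, while older releases use `-r "my_script; exit"` instead):

```bash
#!/bin/bash
#SBATCH --job-name=matlab-job
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# The module name is site-specific; check with `module avail matlab`.
module load matlab

# -batch runs the script non-interactively and exits with its status.
matlab -nodisplay -nosplash -batch "my_script"
```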

Converting Jupyter notebooks with embedded images to PDF

Inserting images into a Jupyter notebook is just drag and drop: this automagically produces the image link at the drop position, and after executing the cell, the image is rendered. So far so good. But have you ever tried to convert a notebook with embedded images to PDF or HTML (slides)? My first guess was Menu -> File -> Export Notebook As -> PDF. However, this immediately runs into Error 500, tracing back to LaTeX not being able to locate the image attachment:5444. [Read More]
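The root of the problem is that a drag-and-dropped image is not a file on disk: it is stored base64-encoded inside the cell’s `attachments` field and referenced via an `attachment:` link, which the LaTeX export cannot resolve as a file path. A sketch of the resulting cell structure (with dummy image data, not a real PNG) looks like this:

```python
import base64
import json

# Minimal markdown cell as produced by drag-and-drop in JupyterLab;
# the image bytes here are a dummy placeholder, not a real PNG.
cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": ["![screenshot.png](attachment:screenshot.png)"],
    "attachments": {
        "screenshot.png": {
            "image/png": base64.b64encode(b"<dummy image bytes>").decode("ascii"),
        }
    },
}

print(json.dumps(cell, indent=2))
```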

Rent-a-Scientist 2020

I just finished my lecture at Heinrich-Heine-Schule Büdelsdorf on “Evolution und Künstliche Intelligenz” (“Evolution and Artificial Intelligence”). Slides are here.

OMERO bulk annotation

Edit (May 21, 2021): I made a short video presentation on the subject, as a “flash” talk for the OME Community Meeting 2021: https://youtu.be/inGQXtbzoZI. Summary: This tutorial gives step-by-step instructions on annotating containers (Projects, Datasets, Plates, or Screens) and individual or multiple images. In particular, we show how to annotate selected images in batch mode, using spreadsheet documents as a convenient way to record metadata during or after an experiment. [Read More]
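The spreadsheet-driven idea can be sketched as follows: a CSV export (with hypothetical column names) is parsed into per-image key-value pairs, which could then be attached to the matching OMERO images as map annotations, e.g. via omero-py:

```python
import csv
import io

# Hypothetical CSV export of a metadata spreadsheet: the first column
# identifies the image, each remaining column is one metadata key.
csv_text = """\
image,strain,temperature
plate1_A01.tif,wt,25C
plate1_A02.tif,mutant,25C
"""

# Collect key-value pairs per image name; these are the pairs one
# would attach to each image as a map annotation.
annotations = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    name = row.pop("image")
    annotations[name] = list(row.items())

print(annotations["plate1_A01.tif"])  # [('strain', 'wt'), ('temperature', '25C')]
```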

Using globus-online for transferring large (>100 GB) datasets over the network

Introduction: Globus Online is a web service for transferring data over the internet. It is particularly suited to large files in the GB to TB range; the largest dataset transferred with Globus so far is a 2.9 PB file from Argonne National Lab (link). I just completed a workflow where I used Globus to transfer a ~500 GB dataset from the DRACO HPC cluster at the MPCDF to a shared Samba drive at MPI Evolutionary Biology. [Read More]
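For reference, a transfer like that can also be driven from the Globus CLI; a hedged sketch (the endpoint UUIDs and paths below are placeholders, not the actual endpoints from this workflow):

```shell
# Placeholder endpoint UUIDs and paths; substitute your own
# (find endpoints with: globus endpoint search "<name>").
SRC_EP=<source-endpoint-uuid>
DST_EP=<destination-endpoint-uuid>

# Recursive transfer of a whole directory; Globus handles retries
# and integrity checks on its own.
globus transfer "$SRC_EP:/data/mydataset" "$DST_EP:/share/mydataset" \
    --recursive --label "mydataset transfer"
```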