SciComp Blog

Jupyter lab tutorial

Posted on May 6, 2021 | Carsten

On May 5 & 6, 2021, I took part in the workshop “Kompetenz Forschungsdatenmanagement” (“Competence in Research Data Management”) organized by the Max Planck Digital Library. Day 2 featured a full session on “Reproducible Science with Jupyter”, with a presentation by Hans Fangohr (slides available here) followed by an interactive hands-on tutorial. In part 1 of the tutorial, I step through a data analysis workflow based on the Johns Hopkins University COVID-19 dataset from GitHub. Parts 2 and 3 cover Bayesian inference of SIR model parameters and were kindly provided by Johannes Zierenberg from the MPI for Dynamics and Self-Organization. [Read More]

jupyter notebook reproducible science fair covid pandas visualization

Dask and Jupyter

Posted on January 22, 2021 | Carsten

Parallel Python with dask and jupyter: The dask framework provides an incredibly useful environment for parallel execution of Python code in interactive settings (e.g. jupyter) or in batch mode. Its key features (from what I’ve seen so far) are: a representation of threading, multiprocessing, and distributed computing with one unified API and CLI; an abstraction of HPC schedulers (PBS, Moab, SLURM, …); and data structures for distributed computing with pandas and numpy syntax. Dask-jobqueue: The package dask_jobqueue seems to me to be the most user-friendly option when it comes to parallelization on HPC clusters with a scheduling system such as SLURM. [Read More]

parallel multiprocessing python dask jupyter scheduler slurm mpi

Research Software Development Workshop 2020

Posted on December 14, 2020 | Carsten

On Dec. 10 & 11, Nikoleta Glynatsi and I ran our first workshop on “Research Software Development”. I don't mean to praise myself, but judging from the feedback, it was a big success. For two days, we taught best practices in writing software (exemplified with Python), using git for version control, collaborating on gitlab projects, and employing gitlab’s built-in continuous integration tools to run automated tests and build a reference manual. All material from the workshop, including all presentations and code examples, is available under the terms of the MIT License from this gitlab repository: https://gitlab. [Read More]

git python coding software tests CI gitlab

Running matlab code on HPC with SLURM

Posted on December 14, 2020 | Carsten

Running MATLAB scripts on HPC: Today, the question came up of how to run MATLAB code on an HPC system with a SLURM scheduler. The syntax for running MATLAB on the command line is indeed a bit counterintuitive, at least if you are (like me) used to running Python or R scripts. Example SLURM script: The following snippet is an example of how to submit a MATLAB script for execution on an HPC server with the SLURM scheduler: [Read More]

scheduler command line syntax script job submission

CancerSim paper

Posted on December 9, 2020 | Carsten

This one comes a bit late but for the sake of completeness:

Our software paper on 2D stochastic cancer simulations with CancerSim is out: doi:10.21105/joss.02436.

stochastic simulations JOSS

Converting jupyter notebooks with embedded images to pdf.

Posted on December 9, 2020 | Carsten

Inserting images in a jupyter notebook is just drag and drop: this automagically produces the image link at the drop position, and after executing the cell, the image is rendered. So far so good. But have you ever tried to convert a notebook with embedded images to pdf or html (slides)? My first guess was: Menu -> File -> Export Notebook As -> PDF. However, this immediately runs into an Error 500, which traces back to latex not being able to locate the image attachment:5444. [Read More]
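As an illustration of the problem described above, here is a minimal sketch of one possible workaround. It is not necessarily the approach taken in the full post. It relies on the notebook file format, in which a cell stores dropped images under an `attachments` key (file name, then MIME type, then base64 data) and references them as `attachment:<name>` in the markdown source. The function name `extract_attachments` and the output directory are hypothetical, and the sketch assumes that attachment names appear unescaped in the links.

```python
import base64
import json
import re
from pathlib import Path

# Matches the target of a markdown image link such as ![alt](attachment:name.png)
ATTACHMENT_LINK = re.compile(r"\]\(attachment:([^)]+)\)")


def extract_attachments(notebook: dict, image_dir: Path) -> dict:
    """Write embedded attachments to image_dir and point the links at the files.

    Returns a modified copy of the notebook; the input dict is left untouched.
    """
    nb = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    image_dir.mkdir(parents=True, exist_ok=True)

    for cell in nb.get("cells", []):
        attachments = cell.get("attachments")
        if not attachments:
            continue

        for name, bundle in attachments.items():
            # Each attachment maps MIME types to data; take the first entry
            # (assumed here to be base64-encoded, as for PNG or JPEG images).
            data = next(iter(bundle.values()))
            (image_dir / name).write_bytes(base64.b64decode(data))
        del cell["attachments"]

        # The cell source may be stored as one string or as a list of lines.
        source = cell["source"]
        was_list = isinstance(source, list)
        text = "".join(source) if was_list else source
        text = ATTACHMENT_LINK.sub(
            lambda m: f"]({image_dir.as_posix()}/{m.group(1)})", text
        )
        cell["source"] = text.splitlines(keepends=True) if was_list else text

    return nb
```

The returned dictionary can be written back with `json.dump` to a new `.ipynb` file, which `jupyter nbconvert --to pdf` should then be able to process, provided the image paths resolve from the directory in which the conversion runs.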

attachment images nbconvert pandoc

Research-Software Development Workshop

Posted on December 9, 2020 | Carsten

The Research Software Development workshop that Nikoleta Glynatsi and I are running starts tomorrow. All code and presentations are available here: https://gitlab.gwdg.de/glynatsi/rsd-workshop.

git python version control continuous integration gitlab command line

Rent-a-Scientist 2020

Posted on November 27, 2020 | Carsten

I just finished my lecture at Heinrich-Heine-Schule Büdelsdorf on “Evolution und Künstliche Intelligenz” (“Evolution and Artificial Intelligence”). Slides are here.

Omero bulk annotation

Posted on September 3, 2020 | Carsten

Edit (May 21, 2021): I made a short video presentation on the subject as a “flash” talk for the OME Community Meeting 2021: https://youtu.be/inGQXtbzoZI . Summary: This tutorial gives step-by-step instructions for annotating containers (Projects, Datasets, Plates, or Screens) and individual or multiple images. In particular, we show how to annotate selected images in batch mode using spreadsheet documents as a convenient method to record metadata during or after an experiment. [Read More]

omero image database annotation FAIR microscopy batch annotation

Using globus-online for transferring large (>100 GB) datasets over the network

Posted on November 14, 2019 |

Introduction: Globus Online is a web service for transferring data over the internet. It is particularly suited for large files in the GB to TB range. The largest dataset transferred with globus so far is a 2.9 PB file from Argonne National Lab (link). I just completed a workflow where I used globus to transfer a ~500 GB dataset from the DRACO HPC cluster at the MPCDF to a shared samba drive at MPI Evolutionary Biology. [Read More]

Welcome to the blog on scientific computing at the Max-Planck Institute for Evolutionary Biology
