Jupyter lab tutorial

On May 5 and 6, 2021, I took part in the workshop “Kompetenz Forschungsdatenmanagement” organized by the Max Planck Digital Library. Day 2 featured a full session on “Reproducible Science with Jupyter”, with a presentation by Hans Fangohr (slides available here) followed by an interactive hands-on tutorial. In part 1 of the tutorial, I step through a data analysis workflow based on the Johns Hopkins University COVID-19 dataset from GitHub. Parts 2 and 3 cover Bayesian inference of SIR model parameters and were kindly provided by Johannes Zierenberg from the MPI for Dynamics and Self-Organization. [Read More]

Dask and Jupyter

Parallel Python with Dask and Jupyter

The Dask framework provides a very useful environment for running Python code in parallel, both interactively (e.g. in Jupyter) and in batch mode. Its key features (from what I’ve seen so far) are: a single unified API and CLI for threading, multiprocessing, and distributed computing; an abstraction of HPC schedulers (PBS, Moab, SLURM, …); and data structures for distributed computing that follow pandas and NumPy syntax.

Dask-jobqueue

The package dask_jobqueue seems to me to be the most user-friendly option when it comes to parallelization on HPC clusters with a scheduling system such as SLURM. [Read More]
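
To give a rough idea of what the unified API means in practice, here is a minimal sketch (not taken from the full post) that builds one small task graph with dask.delayed and runs it with two different local schedulers. The function name square and the toy workload are made up for illustration.

```python
from dask import delayed


def square(x):
    # Hypothetical toy workload standing in for a real computation.
    return x * x


# Build a lazy task graph: nothing is computed at this point.
tasks = [delayed(square)(i) for i in range(5)]
total = delayed(sum)(tasks)

# Execute the same graph with two different local schedulers.
result_threads = total.compute(scheduler="threads")
result_sync = total.compute(scheduler="synchronous")

print(result_threads, result_sync)
```

The graph itself does not change when the scheduler does. On an HPC system, one would typically create a dask_jobqueue.SLURMCluster, attach a dask.distributed.Client to it, and then call compute() on the same graph; the cluster parameters (cores, memory, queue) depend on the site and are not shown here.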