<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Multiprocessing on SciComp Blog</title>
    <link>https://mpievolbio-scicomp.pages.gwdg.de/blog/tags/multiprocessing/</link>
    <description>Recent content in Multiprocessing on SciComp Blog</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Fri, 22 Jan 2021 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://mpievolbio-scicomp.pages.gwdg.de/blog/tags/multiprocessing/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Dask and Jupyter</title>
      <link>https://mpievolbio-scicomp.pages.gwdg.de/blog/post/2021-01-22_dask/</link>
      <pubDate>Fri, 22 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://mpievolbio-scicomp.pages.gwdg.de/blog/post/2021-01-22_dask/</guid>
      <description>&lt;h1 id=&#34;parallel-python-with-dask-and-jupyter&#34;&gt;Parallel python with dask and jupyter&lt;/h1&gt;&#xA;&lt;p&gt;The &lt;a href=&#34;https://dask.org&#34;&gt;dask&lt;/a&gt; framework provides an incredibly useful&#xA;environment for parallel execution of Python code in interactive settings (e.g.&#xA;Jupyter) or in batch mode. Its key features are (from what I&amp;rsquo;ve seen so far):&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Representation of threading, multiprocessing, and distributed computing with&#xA;one unified API and CLI&lt;/li&gt;&#xA;&lt;li&gt;Abstraction of HPC schedulers (PBS, Moab, SLURM, &amp;hellip;)&lt;/li&gt;&#xA;&lt;li&gt;Data structures for distributed computing with pandas and numpy syntax&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;dask-jobqueue&#34;&gt;Dask-jobqueue&lt;/h2&gt;&#xA;&lt;p&gt;The package &lt;code&gt;dask_jobqueue&lt;/code&gt; seems to me to be the most user-friendly option when it comes&#xA;to parallelization on HPC clusters with a scheduling system such as SLURM.&#xA;For now, what interests me most is how &lt;code&gt;dask&lt;/code&gt; maps the&#xA;parameters given to the &lt;code&gt;cluster&lt;/code&gt; API and the &lt;code&gt;cluster.scale&lt;/code&gt; method to the&#xA;parameters usually given in a SLURM batch job script and the &lt;code&gt;mpirun&lt;/code&gt;&#xA;parameters.&lt;/p&gt;&#xA;&lt;p&gt;But first things first. 
A typical notebook using &lt;code&gt;dask&lt;/code&gt; starts off with the&#xA;configuration of the &lt;code&gt;cluster&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;dask_jobqueue&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;SLURMCluster&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;cluster&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;SLURMCluster&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   &lt;span class=&#34;n&#34;&gt;cores&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;12&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;            &lt;span class=&#34;c1&#34;&gt;# Number of cores per job (SLURM --cpus-per-task)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   &lt;span class=&#34;n&#34;&gt;memory&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;16GB&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;       &lt;span class=&#34;c1&#34;&gt;# Total memory per job (--mem)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   &lt;span class=&#34;n&#34;&gt;walltime&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span 
class=&#34;s2&#34;&gt;&amp;#34;00:10:00&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# Walltime limit per job (-t)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The basic unit of reference, against which all &lt;code&gt;dask&lt;/code&gt; parameters seem to be&#xA;defined, is the &lt;code&gt;job&lt;/code&gt;. Quoting from the&#xA;&lt;a href=&#34;https://jobqueue.dask.org/en/latest/howitworks.html&#34;&gt;dask-jobqueue&lt;/a&gt;&#xA;documentation,&#xA;&amp;ldquo;A &lt;code&gt;job&lt;/code&gt; is a [set of] resource[s] submitted to, and managed by, the&#xA;job queueing system. One &lt;code&gt;job&lt;/code&gt; may include one or more &lt;code&gt;workers&lt;/code&gt;.&amp;rdquo; And: &amp;ldquo;A &lt;code&gt;worker&lt;/code&gt; is&#xA;a python object that represents a node in a dask &lt;code&gt;cluster&lt;/code&gt;&amp;rdquo;. The aim of this post&#xA;is to provide more substantial explanations of these terms and to understand how&#xA;they relate precisely to the above-mentioned SLURM and MPI parameters.&lt;/p&gt;&#xA;&lt;p&gt;In the above snippet, the number of jobs is not yet defined. 
Typically, the cluster&#xA;initialization is followed by a call to &lt;code&gt;cluster.scale()&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;cluster&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;jobs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Dask will then request the appropriate resources through SLURM.&lt;/p&gt;&#xA;&lt;p&gt;In the above example, the resulting job script can be printed with &lt;code&gt;cluster.job_script()&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cluster&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;job_script&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;())&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;#!/usr/bin/env bash&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;#SBATCH -J 
dask-worker&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;#SBATCH -n 1&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;#SBATCH --cpus-per-task=12&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;#SBATCH --mem=15G&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;#SBATCH -t 00:10:00&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;home&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;grotec&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;miniconda3&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;bin&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;python&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;m&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;distributed&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cli&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dask_worker&lt;/span&gt; \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tcp&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;//&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;172.16.3.151&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span 
class=&#34;mi&#34;&gt;51525&lt;/span&gt; \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;o&#34;&gt;--&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;nthreads&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;3&lt;/span&gt; \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;o&#34;&gt;--&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;nprocs&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;4&lt;/span&gt; \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;o&#34;&gt;--&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;memory&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;limit&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;4.00&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;GB&lt;/span&gt; \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;o&#34;&gt;--&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dummy&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt; \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;o&#34;&gt;--&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;nanny&lt;/span&gt; \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;o&#34;&gt;--&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;death&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;timeout&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;60&lt;/span&gt; \&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;o&#34;&gt;--&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;protocol&lt;/span&gt; &lt;span 
class=&#34;n&#34;&gt;tcp&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;//&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let&amp;rsquo;s try to break this down and understand how and why the SLURM and MPI&#xA;parameters have been chosen this way:&lt;/p&gt;&#xA;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#SBATCH -n 1&#xA;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;-n&lt;/code&gt; option in SLURM specifies the number of tasks (long form is&#xA;&lt;code&gt;--ntasks&lt;/code&gt;). Since each dask &lt;code&gt;job&lt;/code&gt; is represented by one SLURM job script, I conclude,&#xA;with some reservation, that one dask &lt;code&gt;job&lt;/code&gt; is also represented by one SLURM&#xA;task and that &lt;code&gt;-n 1&lt;/code&gt; fixes the equivalence between task and job. Dask will&#xA;submit &lt;code&gt;jobs&lt;/code&gt; separate SLURM jobs to the scheduler rather than one big job with &lt;code&gt;ntasks&lt;/code&gt; set to&#xA;the number of requested (dask) jobs.&lt;/p&gt;&#xA;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#SBATCH --cpus-per-task=12&#xA;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For each task (job), SLURM will allocate 12 CPUs. This corresponds to the&#xA;&lt;code&gt;cores=12&lt;/code&gt; argument in the initialization of the &lt;code&gt;SLURMCluster&lt;/code&gt;. 
So one core in&#xA;a &lt;code&gt;dask&lt;/code&gt; &lt;code&gt;cluster&lt;/code&gt; is one CPU per task in SLURM.&lt;/p&gt;&#xA;&lt;p&gt;Let&amp;rsquo;s now look at the command line arguments to &lt;code&gt;python -m distributed.cli.dask_worker&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;--nthreads 3&#xA;--nprocs 4&#xA;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A physical CPU in modern computer hardware consists of &lt;code&gt;ncore&lt;/code&gt; cores and each&#xA;core can hold a number of (hyper)threads (see &lt;a href=&#34;https://login.scg.stanford.edu/faqs/cores/#computer-architecture&#34;&gt;this FAQ at SCG Stanford Genomics&#xA;Cluster&lt;/a&gt;). This structure is also represented in&#xA;SLURM and Dask. Dask maps one &lt;code&gt;process&lt;/code&gt; (the elementary unit of computation)&#xA;to one core. The&#xA;number of processes can be given as an argument to the &lt;code&gt;SLURMCluster&lt;/code&gt;&#xA;constructor, but by default it is set as &lt;code&gt;processes ~= sqrt(cores)&lt;/code&gt;.&#xA;Subsequently, the number of threads per process is set as &lt;code&gt;nthreads = cores / processes&lt;/code&gt;,&#xA;such that the number of processes and the number of threads per process are&#xA;approximately equal. In the job script above, &lt;code&gt;cores=12&lt;/code&gt; results in 4 processes with 3 threads each,&#xA;and the requested memory is split into 4 GB per process (&lt;code&gt;--memory-limit 4.00GB&lt;/code&gt;).&lt;/p&gt;&#xA;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;--nanny&#xA;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;--nanny&lt;/code&gt; option (activated by default) tells dask to start up an extra&#xA;process (the &amp;ldquo;nanny process&amp;rdquo;) to monitor the actual worker processes.&lt;/p&gt;&#xA;&lt;p&gt;Instead of the number of jobs, one can also specify the total resources to be requested directly:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;cluster&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span 
class=&#34;n&#34;&gt;scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cores&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;48&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;   &lt;span class=&#34;c1&#34;&gt;# Here `cores` is the total number of cores to be allocated.&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;or&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;cluster&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;memory&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;200 GB&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
  </channel>
</rss>
