Introduction to Parallel Programming with Python

Online

January 11 - 13, 2022

9:00 - 13:00 CET

Instructors: Johan Hidding, Djura Smits

Helpers: Cunliang Geng, Francesco Nattino, Ou Ku, Lourens Veen

General Information

The Digital Skills programme at the Netherlands eScience Center focuses on the foundational digital skills needed to put reproducible research into practice. The workshops we run cover the essentials of version control, online collaboration, reproducible code and good programming practices.

Python is one of the most widely used languages for scientific data analysis, visualization, and even modelling and simulation. Its popularity rests mainly on two pillars: a friendly syntax and the availability of many high-quality libraries. The flexibility that Python offers comes with a few downsides though: code typically doesn’t run as fast as lower-level implementations in C/C++ or Fortran, and it is not trivial to parallelize Python code to work efficiently on many-core architectures. This workshop addresses both issues, with an emphasis on running Python code efficiently (in parallel) on multiple cores.

We’ll start by learning to recognize problems that are suitable for parallel processing, looking at dependency diagrams and kitchen recipes. From then on, the workshop is highly interactive, diving straight into the first parallel programs. Participants will code along with the instructor, in the teaching style of Software Carpentry. This workshop teaches the principles of parallel programming in Python using Dask, Numba and Snakemake. More importantly, we try to give insight into how these different methods perform and when they should be used.
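To give a flavour of the dependency-diagram idea, here is a minimal sketch using only the Python standard library (graphlib, available from Python 3.9). The recipe and its step names are made up for illustration and are not taken from the workshop material. Steps that end up in the same stage do not depend on each other, so in principle they could be carried out in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical recipe: each step maps to the set of steps it depends on.
recipe = {
    "chop vegetables": set(),
    "boil water": set(),
    "cook pasta": {"boil water"},
    "fry vegetables": {"chop vegetables"},
    "mix and serve": {"cook pasta", "fry vegetables"},
}


def parallel_batches(graph):
    """Group steps into stages; steps within one stage are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything whose dependencies are finished is ready right now.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(recipe)
for number, batch in enumerate(batches, start=1):
    print(f"stage {number}: {', '.join(batch)}")
```

In this toy recipe, boiling water and chopping vegetables can happen at the same time, while mixing has to wait for both branches to finish.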

Who: 

The course is aimed at graduate students and other researchers.

The participant should be:

  • familiar with basic Python: control flow, functions, NumPy
  • comfortable working in Jupyter

Recommended:

  • understand how NumPy and/or Pandas work

Where: This training will take place online. The instructors will provide you with the information you will need to connect to this meeting.

When: January 11 - 13, 2022.

Requirements: Participants must have access to a computer with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are dedicated to providing a positive and accessible learning environment for all. Please notify the instructors in advance of the workshop if you require any accommodations or if there is anything we can do to make this workshop more accessible to you.

Contact: Please email training@esciencecenter.nl for more information.


Code of Conduct

Participants are expected to follow these guidelines:

Syllabus

Schedule

Day 1

09:00 Welcome and icebreaker
09:15 Introduction
10:00 Break
10:15 Measuring performance
11:00 Parallelization using Dask Arrays
12:00 Coffee break
12:15 Accelerate code using Numba
12:45 Wrap-up
13:00 END

Day 2

09:00 Welcome, icebreaker and recap
09:15 Delayed evaluation with Dask
10:30 Coffee break
10:45 Parallel design patterns with Dask Bags
12:00 Tea break
12:15 Exercise in word counting using Dask Bags
12:45 Wrap-up
13:00 END

Day 3

09:00 Welcome, icebreaker and recap
09:15 Dependency-based programming with Snakemake
10:30 Coffee break
10:45 Break-out exercises
12:00 Tea break
12:15 Presentations of group work
12:45 Post-workshop Survey
13:00 END

Setup

To participate in this workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but participants may find it useful too.

Install the videoconferencing client

If you haven't used Zoom before, go to the official website to download and install the Zoom client for your computer.

Set up your workspace

Like other Carpentries workshops, you will be learning by "coding along" with the Instructors. To do this, you will need to have both the window for the tool you will be learning about (a terminal, RStudio, your web browser, etc.) and the window for the Zoom video conference client open. In order to see both at once, we recommend using one of the following setup options:

This blog post includes detailed information on how to set up your screen to follow along during the workshop.

Requirements

To follow along with the workshop, you need to prepare an environment. Clone the workshop repository that we prepared:

git clone https://github.com/esciencecenter-digital-skills/parallel-python-workshop.git
cd parallel-python-workshop

You may prepare the environment either in conda or using vanilla Python with poetry.

For most users we recommend that you use conda to install the requirements for the workshop.

conda env create -f environment.yml
conda activate parallel-python
pytest

If the tests pass, you’re all good! Otherwise, please contact us before the workshop.

Poetry

Only follow these instructions if you’re on Linux or Mac and don’t have conda installed. Make sure that you have Python 3.9 installed.

If you’ve never used poetry before, check it out!

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/install-poetry.py | python -

Then, making sure that you’re still inside the parallel-python-workshop directory:

poetry install
poetry shell
pytest

If the tests pass, you’re all good! Otherwise, please contact us before the workshop.
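If the tests do not pass, a quick diagnostic can help narrow down what is missing before you contact us. The sketch below is not part of the workshop repository. The package list is an assumption based on the tools named in the workshop description, and the version check assumes that Python 3.9 or newer is acceptable. The authoritative requirements are the ones in the repository itself.

```python
import sys
from importlib.util import find_spec

# Assumption: these import names match the tools named in the workshop
# description; the repository's own environment files are authoritative.
PACKAGES = ["numpy", "dask", "numba", "snakemake", "pytest"]


def check_environment(packages=PACKAGES, min_version=(3, 9)):
    """Return (python_ok, {package: found}) without importing the packages."""
    python_ok = sys.version_info[:2] >= min_version
    found = {name: find_spec(name) is not None for name in packages}
    return python_ok, found


python_ok, found = check_environment()
version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version}: {'ok' if python_ok else 'older than expected'}")
for name, present in found.items():
    print(f"{name:10s} {'found' if present else 'MISSING'}")
```

Run it with the Python interpreter of the activated environment. Including the printed output in your message makes it easier for us to help.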