October 10 - 11, 2022
09:30 - 17:00 CEST
Instructors: Leon Oostrum, Hanno Spreeuw
Helpers: Ole Mussmann, Barbara Vreede, Sven van der Burg
The eScience Center offers a range of free workshops and training courses, open to all researchers affiliated with Dutch research organizations. We organize workshops covering digital skills needed to put reproducible research into practice. These include online collaboration, reproducible code and good programming practices. We also offer more advanced workshops such as GPU Programming, Parallel Programming and Deep Learning.
Python is one of the most widely used languages for scientific data analysis, visualization, and even modelling and simulation. Its popularity rests mainly on two pillars: a friendly syntax and the availability of many high-quality libraries. The flexibility that Python offers comes with a few downsides though: code typically doesn’t perform as fast as lower-level implementations in C/C++ or Fortran, and it is not trivial to parallelize Python code to work efficiently on many-core architectures. This workshop addresses both these issues, with an emphasis on being able to run Python code efficiently (in parallel) on multiple cores.
We’ll start by learning to recognize problems that are suitable for parallel processing, looking at dependency diagrams and kitchen recipes. From then on, the workshop is highly interactive, diving straight into the first parallel programs. This workshop teaches the principles of parallel programming in Python using Dask, Numba and Snakemake. More importantly, we try to give insight into how these different methods perform and when they should be used.
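To give a flavour of what "suitable for parallel processing" means, here is a minimal sketch that is not taken from the workshop material. It uses only the standard library rather than Dask or Numba, and the function names (`sum_of_squares`, `split`, `parallel_sum_of_squares`) are made up for illustration. The point is the shape of the dependency diagram: each chunk can be processed without knowing anything about the others, and only the final sum depends on all of them.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Independent task: it needs only its own chunk of the input."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never 0
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, n_workers=4):
    """Map the independent tasks over a pool, then reduce the partial sums."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    # The reduction is the only step that depends on every chunk.
    return sum(partial_sums)


data = list(range(1_000))
serial_result = sum_of_squares(data)
parallel_result = parallel_sum_of_squares(data)
print(serial_result == parallel_result)
```

Note that in standard CPython a thread pool is unlikely to speed up pure-Python arithmetic like this, because of the global interpreter lock. The sketch only shows how an independent workload is structured, not how to make it fast.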
The participant should be:
Recommended:
Where: Science Park 402, 1098 XH Amsterdam. Get directions with OpenStreetMap or Google Maps.
When: October 10 - 11, 2022, 09:30 - 17:00 CEST.
Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).
Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:
Materials will be provided in advance of the workshop. Large-print handouts are available if needed; please notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.
Contact: Please email training@esciencecenter.nl for more information.
Participants are expected to follow these guidelines:
local Amsterdam time | what |
---|---|
09:30 | Welcome and icebreaker |
09:45 | Introduction |
10:30 | Break |
10:40 | Measuring performance |
11:30 | Break |
11:40 | Parallelization using Dask Arrays |
12:30 | Lunch Break |
13:30 | Accelerate code using Numba |
14:30 | Break |
14:40 | Delayed evaluation with Dask |
15:30 | Break |
15:40 | Threads and Processes in Python |
16:15 | Wrap-up |
16:30 | END |
local Amsterdam time | what |
---|---|
09:30 | Welcome and recap |
09:45 | Parallel design patterns with Dask Bags |
10:30 | Break |
10:40 | Dependency based programming with Snakemake |
11:30 | Break |
11:40 | Big exercise in subgroups |
12:30 | Lunch Break |
13:30 | Continue work on big exercise |
15:30 | Break |
15:40 | 5-min presentations + discussion |
16:15 | Post-workshop Survey |
16:30 | Drinks |
To participate in this workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.
We maintain a list of common issues that occur during installation on the Configuration Problems and Solutions wiki page. It is intended as a reference for instructors, but may be useful to others as well.
To follow along with the workshop, you need to prepare an environment. Clone the workshop repository that we prepared:
git clone https://github.com/esciencecenter-digital-skills/parallel-python-workshop.git
cd parallel-python-workshop
You may prepare the environment either in conda or using vanilla Python with poetry.
For most users we recommend that you use conda to install the requirements for the workshop.
conda env create -f environment.yml
conda activate parallel-python
pytest
If the tests pass, you’re all good! Otherwise, please contact us before the workshop.
Only follow these instructions if you’re on Linux or Mac and don’t have conda installed. Make sure that you have Python 3.9 installed.
If you’ve never used poetry before, check it out!
pip install --user poetry
poetry install
poetry run pytest
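If pytest fails and you want to narrow down the cause before contacting us, a quick check like the one below may help. It is a sketch, not part of the workshop repository. The package list is an assumption based on the tools named in the workshop description (Dask, Numba, Snakemake), and the minimum Python version is taken from the poetry instructions above. The actual requirements are defined by the repository's own configuration files.

```python
import importlib.util
import sys

# Assumption: these names come from the workshop description,
# not from the repository's environment.yml or pyproject.toml.
REQUIRED_PACKAGES = ("dask", "numba", "snakemake")
MIN_PYTHON = (3, 9)


def check_environment(packages=REQUIRED_PACKAGES, min_python=MIN_PYTHON):
    """Return a dict mapping each check to True (ok) or False (problem)."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'problem'}")
```

Run it inside the activated conda environment, or with `poetry run python`, so that it inspects the same interpreter that pytest uses.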