- Programs are parallelizable if you can identify independent
tasks.
- To make programs scalable, you need to chunk the work (a chunking sketch follows this list).
- Parallel programming often triggers a redesign; we use different
patterns.
- Doing work in parallel does not always give a speed-up.
- It is often non-trivial to understand performance.
- Memory is just as important as speed.
- Measuring is knowing.
- Always profile your code to see which parallelization method works best (see the timing sketch after this list).
- Vectorized algorithms are both a blessing and a curse.
- Numba can help you speed up code (a Numba sketch follows this list).
- If we want the most efficient parallelism on a single machine, we
need to circumvent the GIL.
- If your code releases the GIL, threading will be more efficient than
multiprocessing (see the threading sketch after this list).
- If your code does not release the GIL, some of your code is still in
Python, and you’re wasting precious compute time!
- We can change the strategy by which a computation is evaluated.
- Nothing is computed until we run `compute()`.
- By using delayed evaluation, Dask knows which jobs can be run in
parallel.
- Call `compute` only once at the end of your program to get the best results (see the Dask sketch after this list).
- Use abstractions to keep programs manageable.
- Actually making code faster is not always straightforward.
- Easy one-liners can get you 80% of the way.
- Writing clean, modular code often makes it easier to parallelize later on.
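
A minimal chunking sketch, assuming a hypothetical `work` function: `multiprocessing.Pool.map` accepts a `chunksize` argument, which is one way to hand each worker a batch of tasks instead of a single item at a time.

```python
from multiprocessing import Pool

def work(x):
    # placeholder for any independent, CPU-bound task
    return x * x

if __name__ == "__main__":
    items = range(100_000)
    with Pool(processes=4) as pool:
        # chunksize=1000 hands each worker a batch of items,
        # reducing inter-process communication overhead
        results = pool.map(work, items, chunksize=1000)
    print(sum(results))
```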
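One way to measure, sketched with the standard-library `timeit` module; the two functions compared here are illustrative placeholders. The NumPy variant also shows the vectorization trade-off: it is fast, but it allocates the whole array in memory at once.

```python
import timeit

import numpy as np

def python_sum(n):
    # pure-Python loop
    return sum(range(n))

def numpy_sum(n):
    # vectorized equivalent; allocates the full array in memory
    return np.arange(n).sum()

# run each variant several times and report the total wall time
for f in (python_sum, numpy_sum):
    t = timeit.timeit(lambda: f(1_000_000), number=10)
    print(f"{f.__name__}: {t:.3f} s for 10 runs")
```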
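A minimal Numba sketch on a toy function: the `@numba.njit` decorator compiles the function the first time it is called, so later calls run as machine code.

```python
import numba

@numba.njit
def sum_of_squares(n):
    # plain Python loop, compiled to machine code by Numba
    total = 0
    for i in range(n):
        total += i * i
    return total

sum_of_squares(10)              # first call triggers JIT compilation
print(sum_of_squares(1_000_000))  # subsequent calls run fast
```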
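A sketch of the GIL point, assuming NumPy is available: BLAS-backed matrix multiplication releases the GIL, so threads can run it on several cores at once, while the pure-Python variant holds the GIL and the threads merely take turns.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

rng = np.random.default_rng(42)
matrices = [rng.random((1000, 1000)) for _ in range(8)]

def gil_releasing(m):
    # BLAS-backed matrix multiplication releases the GIL,
    # so threads can genuinely overlap on multiple cores
    return m @ m

def gil_holding(m):
    # pure-Python bytecode holds the GIL; threads only interleave
    return sum(m[0, i] * 2 for i in range(1000))

with ThreadPoolExecutor(max_workers=4) as pool:
    products = list(pool.map(gil_releasing, matrices))
    partial_sums = list(pool.map(gil_holding, matrices))
```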
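A sketch of delayed evaluation with `dask.delayed`; the `load` and `process` functions are hypothetical stand-ins. Building the graph runs nothing, and the single `compute()` at the end lets Dask schedule the independent branches in parallel.

```python
import dask

@dask.delayed
def load(i):
    # stand-in for reading one chunk of data
    return list(range(i * 1000, (i + 1) * 1000))

@dask.delayed
def process(chunk):
    # stand-in for per-chunk work
    return sum(x * x for x in chunk)

# building the graph executes nothing yet
partials = [process(load(i)) for i in range(8)]
total = dask.delayed(sum)(partials)

# one compute() at the end lets Dask run the
# independent load/process branches in parallel
print(total.compute())
```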