Machine learning in Python with scikit-learn

Online

April 22 - 25, 2024

9:00 - 13:00 CEST

Instructors: Sven van der Burg, Malte Luken, Johan Hidding

Helpers: Claire Donnelly

General Information

The eScience Center offers a range of free workshops and training courses, open to all researchers affiliated with Dutch research organizations. We organize workshops covering digital skills needed to put reproducible research into practice. These include online collaboration, reproducible code and good programming practices. We also offer more advanced workshops such as GPU Programming, Parallel Programming and Deep Learning.

This hands-on workshop will provide you with the basics of machine learning using Python.

Machine learning is the field devoted to methods and algorithms that ‘learn’ from data. It can be applied to a vast range of different domains, from linguistics to physics and from medical imaging to history.

This workshop covers the basics of machine learning in a practical and hands-on manner, so that upon completion, you will be able to train your first machine learning models and understand what next steps to take to improve them.

We start with data exploration and prepare the data so that it is suitable for machine learning. Then we learn how to train a model on the data using scikit-learn. We learn how to select the best model from the trained models and how to use different machine learning models (like linear regression, logistic regression, and decision tree models). Finally, we discuss some of the best practices when starting your own machine learning project.
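The steps above can be sketched in a few lines of scikit-learn. This is only an illustrative sketch, not the workshop's actual material: the synthetic dataset from `make_classification`, the two candidate models, and the names `candidates`, `cv_scores`, and `best_model` are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the workshop uses its own datasets).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a test set that is only touched for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two hypothetical candidate models: a scaled logistic regression
# and a shallow decision tree.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Compare the candidates with 5-fold cross-validation on the training data.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# Refit the best-scoring candidate on all training data and evaluate it once
# on the held-out test set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)

print(cv_scores)
print(best_name, test_accuracy)
```

Selecting the model on cross-validation scores and reserving the test set for a single final check keeps the reported accuracy from being biased by the selection step.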

Who: 

The course aims to be accessible without a strong technical background.

This course is for you if:

  • You have basic knowledge of Python programming: defining variables, writing functions, importing modules. Some prior experience with the NumPy, pandas and Matplotlib libraries is recommended but not required.
  • You want to learn how to set up a full machine learning pipeline in Python for various machine learning tasks.
  • You want to get an intuition of basic machine learning concepts, such as train-test data splits, model training and evaluation, different machine learning algorithms, overfitting/underfitting, bias-variance trade-off.

This course is not for you if:

  • You already have experience with machine learning or its concepts. This is an introduction for people who have never done machine learning, or who have only just started and need more guidance.
  • You want to get a solid mathematical understanding of machine learning theory. This course aims to quickly get participants comfortable applying machine learning in practice, so we only cover the basics of the theoretical concepts without going into depth.
  • You want to learn about deep learning.
  • You want to learn about more advanced data preprocessing, such as data cleaning or handling missing values. We only cover the basics of data preprocessing that are needed to set up a machine learning pipeline.

Also have a look at the syllabus to see what topics we will cover.

If you are uncertain whether this course is for you, please send us an email.

Where: This training will take place online. The instructors will provide you with the information you will need to connect to this meeting.

When: April 22 - 25, 2024, 9:00 - 13:00 CEST.

Requirements: Participants must have access to a computer with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are dedicated to providing a positive and accessible learning environment for all. Please notify the instructors in advance of the workshop if you require any accommodations or if there is anything we can do to make this workshop more accessible to you.

Workshop files: You will find all slides, notebooks, archived collaborative documents, and other relevant files in the files folder of the workshop website repository after the workshop.

Contact: Please email training@esciencecenter.nl for more information.


Code of Conduct

Participants are expected to follow these guidelines:

Syllabus

Machine learning concepts

The predictive modeling pipeline

Selecting the best model

Machine learning algorithms

Machine learning best practices

Schedule

Day 1

09:00 Welcome and icebreaker
09:15 Machine learning concepts
10:15 Coffee break
10:30 Tabular data exploration
11:30 Coffee break
11:45 First model with scikit-learn
12:45 Wrap-up
13:00 END

Day 2

09:00 Welcome and icebreaker
09:15 Working with numerical data
10:15 Coffee break
10:30 Preprocessing numerical features
11:30 Coffee break
11:45 Model evaluation using cross-validation
12:30 Intuitions on linear models
12:45 Wrap-up
13:00 END

Day 3

09:00 Welcome and icebreaker
09:15 Handling categorical data
10:15 Coffee break
10:30 Encoding categorical variables
11:15 Intuitions on tree-based models
11:30 Coffee break
11:45 Combining numerical and categorical data
12:45 Wrap-up
13:00 END

Day 4

09:00 Welcome and icebreaker
09:15 Theory on selecting the best model: under & overfitting + learning curves
10:15 Coffee break
10:30 Pointers to advanced topics
11:00 Try out learned skills on US Census dataset
11:30 Coffee break
12:30 Concluding remarks and Q&A
12:45 Wrap-up
13:00 END

All times in the schedule are in the CEST timezone.


Setup

To participate in this workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but it may be useful to participants as well.

Software setup

It is important that you set up everything on your laptop before the start of the course. This includes installing a Python environment and downloading the necessary files. Please follow these setup instructions. Send us an email if you encounter any problems.

Install the videoconferencing client

If you haven't used Zoom before, go to the official website to download and install the Zoom client for your computer.

Set up your workspace

As in other Carpentries workshops, you will be learning by "coding along" with the instructors. To do this, you will need to have both the window for the tool you will be learning about (a terminal, RStudio, your web browser, etc.) and the window for the Zoom video conference client open. In order to see both at once, we recommend using one of the following setup options:

This blog post includes detailed information on how to set up your screen to follow along during the workshop.