Machine learning in Python with scikit-learn (ODISSEI summer school)

Odissei

June 18 - 19, 2024

9:00 - 17:00 CEST

Instructors: Sven van der Burg, Flavio Hafner, Malte Luken, Carsten Schnober

General Information

The eScience Center offers a range of workshops and training courses, aimed at PhD candidates and other researchers or research software engineers. We organize workshops covering digital skills needed to put reproducible research into practice. These include online collaboration, reproducible code and good programming practices. We also offer more advanced workshops such as GPU Programming, Parallel Programming, Image Processing and Deep Learning.

This hands-on workshop will provide you with the basics of machine learning using Python.

Machine learning is the field devoted to methods and algorithms that ‘learn’ from data. It can be applied to a vast range of different domains, from linguistics to physics and from medical imaging to history.

This workshop covers the basics of machine learning in a practical and hands-on manner, so that upon completion, you will be able to train your first machine learning models and understand what next steps to take to improve them.

We start with data exploration and prepare the data so that it is suitable for machine learning. Then we learn how to train a model on the data using scikit-learn. We learn how to select the best model from the trained models and how to use different machine learning models (like linear regression, logistic regression, and decision tree models). Finally, we discuss some of the best practices when starting your own machine learning project.
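As a rough illustration of the workflow described above, the following minimal sketch trains two candidate models with scikit-learn and picks the one with the better cross-validated accuracy. It is not taken from the workshop material: the synthetic dataset, the variable names, and the choice of hyperparameters are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the workshop uses its own datasets).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a test set that is only used once, after model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two candidate models; the hyperparameters are illustrative, not recommendations.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Compare the candidates with 5-fold cross-validation on the training data.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# Refit the best candidate on the full training set and evaluate it once.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)

print(f"Cross-validation accuracy per model: {cv_scores}")
print(f"Selected model: {best_name}, test accuracy: {test_accuracy:.3f}")
```

Selecting the model on cross-validation scores and touching the test set only once keeps the final accuracy estimate honest; the same pattern should carry over to regression models by swapping in a regressor and a suitable scoring metric.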

Where: Polak Building - room 3.09 - Rotterdam. Get directions with OpenStreetMap or Google Maps.

When: June 18 - 19, 2024, 9:00 - 17:00 CEST.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Contact: Please email training@esciencecenter.nl for more information.


Code of Conduct

Participants are expected to follow these guidelines:

Syllabus

Machine learning concepts

The predictive modeling pipeline

Machine learning algorithms

Machine learning best practices

Applying machine learning to the LISS dataset

Schedule

Day 1

Local time  Activity
09:00 Welcome and icebreaker
09:15 Introduction to machine learning
10:00 Break
10:10 Tabular data exploration
11:00 Break
11:10 First model with scikit-learn
12:00 Lunch Break
13:00 Fitting a scikit-learn model on numerical data
14:00 Working with numerical data
14:20 Break
14:30 Intuition on linear models
15:00 Handling categorical data
15:50 Break
16:00 Guest lecture
17:00 END

Day 2

Local time  Activity
09:00 Welcome and recap
09:15 Fertility prediction assignment
10:00 Break
10:10 Fertility prediction assignment
11:00 Break
11:10 Fertility prediction assignment
12:00 Lunch Break
13:00 Machine learning best practices and next steps
14:00 Fertility prediction assignment
14:20 Break
14:30 Hand in first solution to benchmark
Q&A
15:30 Wrap-up & Post-workshop Survey
15:50 Break
16:00 Guest lecture
17:00 END

All times in the schedule are in the CEST timezone.


Setup

To participate in this workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but participants may find it useful as well.

Software setup

It is important that you set up everything on your laptop before the start of the course. This includes installing a Python environment and downloading the necessary files. Please follow these setup instructions. Send us an email if you encounter any problems.