Fundamentals of Natural Language Processing (NLP) in Python - Pilot

Netherlands eScience Center

December 02 - 03, 2025

9:30 - 17:00 CET

Instructors: Angel Daza, Carsten Schnober

Helpers: Kody Moodley, Malte Lüken


General Information

The eScience Center offers a range of workshops and training courses, aimed at PhD candidates and other researchers or research software engineers. We organize workshops covering digital skills needed to put reproducible research into practice. These include online collaboration, reproducible code and good programming practices. We also offer more advanced workshops such as GPU Programming, Parallel Programming, Image Processing and Deep Learning.

This lesson teaches the fundamentals of Natural Language Processing (NLP) in Python. It will equip you with the foundational skills and knowledge needed to carry out text-based research projects. The lesson is designed with researchers in the Humanities and Social Sciences in mind, but is also applicable to other fields of research.

On the first day, we will explore why linguistic principles matter when working with text data, teach basic techniques for text preprocessing, and explain the principles behind word embeddings. The second day begins with an introduction to transformers, followed by hands-on work on classification tasks with the BERT model, including basic evaluation techniques. In the afternoon, we will cover large language models, learn to work locally with open-source models, and discuss the potential drawbacks and biases of this technology.

Who: Participants should:

  • be familiar with Python
  • be comfortable working in Jupyter

Where: Science Park 402, 1098 XH Amsterdam. Get directions with OpenStreetMap or Google Maps.

When: December 02 - 03, 2025, 9:30 - 17:00 CET.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:

Materials will be provided in advance of the workshop, and large-print handouts are available on request if you notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Workshop files: You will find all slides, notebooks, archived collaborative documents, and other relevant files in the files folder of the workshop website repository after the workshop.

Contact: Please email training@esciencecenter.nl for more information.


Code of Conduct

Participants are expected to follow these guidelines:

Syllabus

Introduction

From text to vectors

BERT and Transformers

Large Language Models

Schedule

Day 1

09:30 Welcome and icebreaker
09:45 Introduction to NLP
10:30 Coffee break
10:40 Defining NLP Tasks
11:30 Coffee break
11:40 A Primer on Linguistics
12:30 Lunch
13:30 From text to vectors: Preprocessing and NLP Pipelines
14:30 Coffee break
14:40 Word embeddings and Word2Vec
16:00 Tea break
16:15 Training your own Word2Vec
16:45 Wrap-up
17:00 END

Day 2

09:30 Welcome, icebreaker and recap
09:45 Transformers - Introduction and Architecture
10:30 Coffee break
10:40 Understanding and using BERT
11:30 Coffee break
11:40 BERT for Text Classification
12:30 Lunch
13:30 Model Evaluation in NLP
14:30 Coffee break
14:40 Using Large Language Models
16:00 Tea break
16:15 Drawbacks and biases in LLMs
16:45 Post-workshop Survey
17:00 END

All times in the schedule are in the CET timezone.


Setup

To participate in this workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but it may be useful to you as well.

Software setup

Please follow these setup instructions in preparation for the workshop. The instructions page also lists the data sets you need to download.