December 02 - 03, 2025
9:30 - 17:00 CET
Instructors: Angel Daza, Carsten Schnober
Helpers: Kody Moodley, Malte Lüken
Some adblockers block the registration window. If you do not see the registration box below, please check your adblocker settings.
The eScience Center offers a range of workshops and training courses, aimed at PhD candidates and other researchers or research software engineers. We organize workshops covering digital skills needed to put reproducible research into practice. These include online collaboration, reproducible code and good programming practices. We also offer more advanced workshops such as GPU Programming, Parallel Programming, Image Processing and Deep Learning.
This lesson teaches the fundamentals of Natural Language Processing (NLP) in Python. It will equip you with the foundational skills and knowledge needed to carry out text-based research projects. The lesson is designed with researchers in the Humanities and Social Sciences in mind, but is also applicable to other fields of research.
On the first day, we will dive into the importance of linguistic principles when dealing with text data, teach basic techniques for text preprocessing, and explain the principles behind word embeddings. The second day begins with an introduction to transformers, followed by hands-on work on classification tasks with the BERT model, including basic evaluation techniques. In the afternoon, we will cover large language models, learn to work locally with open-source models, and discuss the potential drawbacks and biases of this technology.
The participant should:
Where: Science Park 402, 1098 XH Amsterdam. Get directions with OpenStreetMap or Google Maps.
When: December 02 - 03, 2025, 9:30 - 17:00 CET.
Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).
Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:
Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.
Workshop files: You will find all slides, notebooks, archived collaborative documents, and other relevant files in the files folder of the workshop website repository after the workshop.
Contact: Please email training@esciencecenter.nl for more information.
Participants are expected to follow these guidelines:
Introduction
From text to vectors:
BERT and Transformers:
Large Language Models
Day 1
| 09:30 | Welcome and icebreaker |
| 09:45 | Introduction to NLP |
| 10:30 | Coffee break |
| 10:40 | Defining NLP Tasks |
| 11:30 | Coffee break |
| 11:40 | A Primer on Linguistics |
| 12:30 | Lunch |
| 13:30 | From text to vectors: Preprocessing and NLP Pipelines |
| 14:30 | Coffee break |
| 14:40 | Word embeddings and Word2Vec |
| 16:00 | Tea break |
| 16:15 | Training your own Word2Vec |
| 16:45 | Wrap-up |
| 17:00 | END |
Day 2
| 09:30 | Welcome, icebreaker and recap |
| 09:45 | Transformers - Introduction and Architecture |
| 10:30 | Coffee break |
| 10:40 | Understanding and using BERT |
| 11:30 | Coffee break |
| 11:40 | BERT for Text Classification |
| 12:30 | Lunch |
| 13:30 | Model Evaluation in NLP |
| 14:30 | Coffee break |
| 14:40 | Using Large Language Models |
| 16:00 | Tea break |
| 16:15 | Drawbacks and biases in LLMs |
| 16:45 | Post-workshop Survey |
| 17:00 | END |
All times in the schedule are in the CET timezone.
To participate in this workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.
We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is intended as a reference for instructors, but participants may find it useful as well.
Please follow these setup instructions in preparation for the workshop. This page includes the data sets to be downloaded as well.