Welcome

Last updated on 2024-05-06 | Edit this page

Estimated time: 5 minutes

Overview

Questions

Who is this lesson for?
What will be covered in this lesson?

Objectives

Identify the target audience
Identify the learning goals of the lesson

Welcome

This is a hands-on introduction to Natural Language Processing (or NLP). NLP refers to a set of techniques involving the application of statistical methods, with or without insights from linguistics, to understand natural (i.e, human) language for the sake of solving real-world tasks.

This course is designed to equip researchers in the humanities and social sciences with the foundational skills needed to carry over text-based research projects.

What will we be covering in this lesson?

This lesson provides a high-level introduction to NLP with particular emphasis on applications in the humanities and the social sciences. We will focus on solving a particular problem over the lesson, that is how to identify key entities in text (such as people, places, companies, dates and more) and labeling each one of them with the right category name. Towards the end of the lesson, we will cover also other types of applications (such as topic modelling, and text generation).

After following this lesson, learners will be able to:

Explain and differentiate what are the core topics in NLP
Identify what kinds of tasks NLP techniques excel at, and what are their limitations
Structure a typical NLP pipeline
Extract vector representations of individual words, visualise and manipulate it
Applying a machine learning algorithm to textual data to extract and categorise names of entities (e.gs., places, people)
Apply popular tools and libraries used to solve other tasks in NLP (such as topic modelling, and text generation)

Software packages required

The lesson is coded entirely in Python. We are going to use Jupyter notebooks throughout the lesson and the following packages:

spacy
gensim
transformers

Dataset

In this lesson, we’ll use N books from the Project Gutenberg. We will use their Plain Text UTF-8 version.

The Adventures of Sherlock Holmes by Arthur Conan Doyle - Full text - Wikipedia
The Count of Monte Cristo by Alexandre Dumas - Full text - Wikipedia

Key Points

This lesson on Natural language processing in Python is for researchers working in the field of Humanities and/or Social Sciences
This lesson is an introduction to NLP and aims at implementing first practical NLP applications from scratch