Introduction to machine learning

What is machine learning?

Arthur Samuel

The field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Experience E, Task T, Performance P

  • predicting house price:
    • E = the historical price of many houses
    • T = the task of predicting house price
    • P = how accurately the program predicts the house price
  • playing Go:
    • E = the experience of playing many games of Go
    • T = the task of playing Go
    • P = the probability that the program will win the next game of Go
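The house price example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pricing rule `true_price`, the house sizes, and the nearest-neighbour learner are all made up for the sketch and do not come from a real data set.

```python
# Hypothetical pricing rule; the learner never sees this function directly.
def true_price(size):
    return 3 * size + 50

def predict(train, size):
    """T: predict a price by copying the price of the most similar known house."""
    nearest_size, nearest_price = min(train, key=lambda house: abs(house[0] - size))
    return nearest_price

def mean_abs_error(train, test_sizes):
    """P: average absolute prediction error (lower is better)."""
    errors = [abs(predict(train, s) - true_price(s)) for s in test_sizes]
    return sum(errors) / len(errors)

# Houses the program is evaluated on (made-up sizes).
test_sizes = [3, 13, 23, 33, 43, 53, 63, 73, 83, 93]

# E: growing experience, i.e. more historical (size, price) pairs.
errors = []
for spacing in (50, 20, 10):
    train = [(s, true_price(s)) for s in range(0, 101, spacing)]
    errors.append(mean_abs_error(train, test_sizes))
    print(f"{len(train):2d} houses seen -> mean absolute error {errors[-1]:.1f}")
```

In this toy setting the error P shrinks as the experience E grows, which is what Mitchell's definition calls learning.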

The mathematical perspective

The objective of ML is to learn a function f that maps an input x to a predicted output: \[ \hat{y} = f(x) \]

ML & AI

AI & ML
Deep learning

ML & statistics

  • Closely related: methods
  • Different: goal
    • Statistics draws population inferences from a sample
    • ML finds generalizable predictive patterns

ML Types

Three main types of ML problems

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

Supervised learning

To learn a mapping between input examples and the target variable.

The training data has the target variable.

Supervised learning: Classification

Target variable is discrete, belonging to two or more classes

e.g. handwritten digit recognition
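As a minimal sketch of classification, the code below uses a nearest-centroid classifier. The two-feature examples and the labels "0" and "1" are invented stand-ins for real digit images, and nearest-centroid is just one simple method chosen for illustration.

```python
from math import dist

def fit_centroids(examples, labels):
    """Training step: average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in zip(examples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Prediction step: return the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Made-up training data: the target variable (label) is discrete.
examples = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (4.0, 4.2), (3.8, 4.1), (4.3, 3.9)]
labels = ["0", "0", "0", "1", "1", "1"]

centroids = fit_centroids(examples, labels)
predictions = [classify(centroids, x) for x in [(1.1, 1.0), (4.1, 4.0)]]
print(predictions)
```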

Supervised learning: Regression

Target variable is a continuous real value

e.g. predicting the length of a salmon
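A regression model can be as simple as a straight line fitted by ordinary least squares. The sketch below assumes that salmon length is predicted from age; the ages and lengths are made-up numbers, not measurements.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: age in years (input) and length in cm (continuous target).
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
lengths = [32.0, 45.0, 61.0, 74.0, 88.0]

slope, intercept = fit_line(ages, lengths)
predicted = slope * 3.5 + intercept  # prediction for an unseen age of 3.5 years
print(f"length = {slope:.1f} * age + {intercept:.1f}; predicted at 3.5 years: {predicted:.2f} cm")
```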

Unsupervised learning

To describe or extract relationships in data

The training data has no target variable.

Unsupervised learning problems

  • clustering: to discover groups of similar examples within the data
  • density estimation: to determine the distribution of data within the input space
  • dimensionality reduction: to project the data from a high-dimensional space down to two or three dimensions
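Clustering can be sketched with k-means, one common algorithm for the task. The points below are invented, and the starting centroids are picked by hand to keep the example deterministic; in practice they are usually chosen at random.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        centroids = new_centroids
    return assignment, centroids

# Made-up, unlabeled 2-D data: no target variable is given.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Hand-picked starting centroids (an assumption of this sketch).
assignment, centroids = k_means(points, [points[0], points[3]])
print(assignment)
print(centroids)
```

The algorithm only sees the inputs and still recovers the two groups, which is the sense in which clustering is unsupervised.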

Reinforcement learning

An agent operates in an environment and must learn to operate using feedback.

No fixed training data

e.g. Google’s AlphaGo
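The agent, environment, and feedback loop can be illustrated with tabular Q-learning on a toy problem. This is a sketch under stated assumptions: the five-cell corridor, the reward of 1.0 at the goal, and the parameter values are made up, and the method is far simpler than what a system like AlphaGo uses.

```python
import random

N_STATES = 5                  # corridor cells 0..4; reaching cell 4 ends an episode
ACTIONS = ("left", "right")

def step(state, action):
    """Environment: move one cell; reward 1.0 only when the goal cell is reached."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from interaction only; there is no fixed training set."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore with purely random actions
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = q_learning()
# Greedy policy: in every non-goal cell, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```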

ML Limitations

Data

ML often requires large amounts of data to train on

Data should be unbiased and of good quality

Such data is not easy to obtain in practice

Extrapolation

We can only make reliable predictions for inputs that fall within the range covered by the training data.

If we try to extrapolate beyond what was covered in the training data, the predictions are likely to be wrong.

Interpretation of Results

It’s a challenge to accurately interpret results generated by the algorithms.

You have to carefully choose the algorithms for your purpose.

Today's data

Weather data set

  • 18 locations
  • cloud_cover, wind_speed, wind_gust, humidity, pressure, global_radiation, precipitation, sunshine

Thank you

Q&A