Introduction to machine learning

What is machine learning?

The field of study that gives computers the ability to learn without being explicitly programmed.

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Experience E, Task T, Performance P

Predicting house prices:
- E = the historical prices of many houses
- T = the task of predicting house prices
- P = how accurately the program predicts house prices (e.g. its average prediction error)
Playing Go:
- E = the experience of playing many games of Go
- T = the task of playing Go
- P = the probability that the program will win the next game of Go
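To make P concrete, here is a minimal Python sketch of one possible performance measure for each task. It assumes mean absolute error for house prices and the observed win rate for Go. The function names and all numbers are hypothetical, chosen only for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def win_rate(results):
    """Fraction of games won (1 = win, 0 = loss); estimates the win probability (higher is better)."""
    return sum(results) / len(results)


# Hypothetical house prices and a program's predictions for them
prices = [300_000, 250_000, 400_000]
predicted = [310_000, 240_000, 390_000]
print(mean_absolute_error(prices, predicted))  # 10000.0

# Hypothetical outcomes of four games played by a Go program
print(win_rate([1, 0, 1, 1]))  # 0.75
```

Under Mitchell's definition, the program learns if these numbers improve (error goes down, win rate goes up) as it gains more experience E.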

The mathematical perspective

The objective of ML is to learn a function f that maps an input x to a prediction \hat{y} of the target: \[ \hat{y} = f(x) \]
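One common way to make "learning f" precise, given here as an assumption rather than this course's own notation, is empirical risk minimization. Given n training pairs (x_i, y_i), a family of candidate functions \mathcal{F}, and a loss L that measures how far a prediction is from the true value, choose the function with the smallest average loss on the training data:

\[ \hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) \]

For example, with the squared error L(y, \hat{y}) = (y - \hat{y})^2 and \mathcal{F} the straight lines f(x) = a + bx, this is ordinary least-squares linear regression. The learned \hat{f} then plays the role of f in \hat{y} = f(x).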

ML & AI

Let’s treat all problems in the universe as a set. This set can be divided into mathematical problems and other problems (e.g. love problems, moral and ethical problems, cultural conflict problems, and so on).

Some mathematical problems are computable, which means they can in principle be solved by a computing machine; the others are not.

AI works within the subgroup of computable problems. It tries to build intelligent machines, especially intelligent computer programs. This also means that computability is the limit of AI: AI cannot do everything, and it cannot solve problems that are not computable.

AI problems or goals include, e.g., problem-solving, reasoning, planning, learning, communicating, and acting. There are various algorithms that try to solve these AI problems.

ML is a subgroup of these AI algorithms that seeks to enable machines to learn from examples in order to make predictions based on input data. ML comprises classical algorithms and deep learning (DL) algorithms. Classical ML algorithms include, e.g., linear regression, decision trees, and support vector machines.

Deep learning is a specific group of algorithms based on artificial neural networks. “Deep” in deep learning refers to a neural network with more than three layers, counting the input and output layers.
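The layered structure can be illustrated with a minimal forward pass in plain Python. This is a sketch under stated assumptions: a fully connected network with 2 inputs, two hidden layers of 3 and 2 units using ReLU activations, and 1 linear output, i.e. four layers counting input and output. The weights are hypothetical, hand-picked values; in deep learning they would be learned from data (for example by gradient descent).

```python
def relu(v):
    """Element-wise ReLU activation: negative values become 0."""
    return [max(0.0, x) for x in v]


def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(x, layers):
    """Pass x through each (weights, biases) layer; ReLU on hidden layers, linear output."""
    h = x
    for i, (weights, biases) in enumerate(layers):
        h = dense(h, weights, biases)
        if i < len(layers) - 1:
            h = relu(h)
    return h


# Hypothetical weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output
layers = [
    ([[1, 0], [0, 1], [1, 1]], [0, 0, -1]),   # input -> hidden layer 1
    ([[1, -1, 1], [0.5, 0.5, 0]], [0, 0]),    # hidden layer 1 -> hidden layer 2
    ([[1, 2]], [0.5]),                        # hidden layer 2 -> output
]
print(forward([1.0, 2.0], layers))  # [4.5]
```

Adding more hidden layers between input and output is what makes such a network "deeper".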

ML & statistics

Closely related: methods
Different: goal
- Statistics draws population inferences from a sample
- ML finds generalizable predictive patterns

ML Types

Three main types of ML problems

- Supervised learning
- Unsupervised learning
- Reinforcement learning

Supervised learning

To learn a mapping between input examples and the target variable.

The training data contains the values of the target variable (the labels).

Supervised learning: Classification

The target variable is discrete: each example belongs to one of two or more classes.

e.g. handwritten digit recognition
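As a sketch of classification, the following plain-Python k-nearest-neighbour classifier assigns a new example the majority class among its k closest training examples. The 3x3 pixel "digits" are hypothetical toy data standing in for real handwritten-digit images, and k-NN is only one of many possible classifiers.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the class of x by majority vote among its k nearest training examples."""
    # Sort training examples by Euclidean distance to x and keep the k closest
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 3x3 bitmaps, flattened row by row (1 = ink, 0 = background)
train_X = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # "1": vertical bar
    [0, 1, 0, 1, 1, 0, 0, 1, 0],  # "1" with a flag
    [0, 1, 0, 0, 1, 0, 1, 1, 1],  # "1" with a base
    [1, 1, 1, 1, 0, 1, 1, 1, 1],  # "0": square ring
    [0, 1, 0, 1, 0, 1, 0, 1, 0],  # "0": diamond
]
train_y = [1, 1, 1, 0, 0]

print(knn_predict(train_X, train_y, [0, 1, 0, 0, 1, 0, 0, 1, 1]))  # 1
print(knn_predict(train_X, train_y, [1, 1, 1, 1, 0, 1, 1, 0, 1]))  # 0
```

Real digit images have many more pixels (and real classifiers are usually more sophisticated), but the setting is the same: labelled training examples and a discrete target.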

Supervised learning: Regression

The target variable is a continuous real value

e.g. predicting the length of a salmon
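A minimal regression sketch: an ordinary least-squares fit of a straight line, length ≈ a + b·age, in plain Python. Using fish age as the input, and the numbers themselves, are hypothetical assumptions for illustration; real salmon data would be noisy and might call for other inputs or models.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


# Hypothetical data: age in years and length in cm
ages = [1, 2, 3, 4, 5]
lengths = [15, 25, 35, 45, 55]

a, b = fit_line(ages, lengths)
print(a, b)         # 5.0 10.0
print(a + b * 3.5)  # 40.0  (predicted length for a 3.5-year-old fish)
```

Unlike classification, the prediction can be any real number, not one of a fixed set of classes.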

Unsupervised learning

To describe or extract relationships in data

The training data has no values of the target variable

Unsupervised learning problems

- clustering: to discover groups of similar examples within the data
- density estimation: to determine the distribution of data within the input space
- dimensionality reduction: to project the data from a high-dimensional space down to fewer dimensions (e.g. two or three, for visualization)
…
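Clustering can be sketched with k-means, one common clustering algorithm among several: alternately assign each point to its nearest centroid and move each centroid to the mean of its assigned points. This plain-Python version assumes 2-D points, a fixed number of iterations, and initial centroids taken from the first k points; the data are hypothetical. Note that no target values are used anywhere.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, labels)."""
    # Simple deterministic initialisation (an assumption): first k points as centroids
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    return centroids, labels


# Hypothetical unlabelled 2-D data with two visible groups
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 1.5), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The cluster numbers themselves carry no meaning; the algorithm only discovers which examples belong together.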

Reinforcement learning

An agent operates in an environment and must learn how to act using feedback (rewards or penalties) from that environment.

No fixed training data

e.g. Google’s AlphaGo
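A much simpler setting than Go that still shows learning from feedback is the multi-armed bandit: the agent repeatedly picks one of several actions, receives a reward, and updates its estimate of how good each action is. This plain-Python epsilon-greedy sketch assumes two actions with hidden reward probabilities of 0.2 and 0.8 (hypothetical values); it is far simpler than systems such as AlphaGo.

```python
import random


def run_bandit(reward_probs, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit; returns (value estimates, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # how often each action was taken
    values = [0.0] * n_arms  # running average reward observed for each action
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore: random action
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best estimate so far
        # Feedback from the environment: reward 1 with the action's hidden probability
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return values, counts


values, counts = run_bandit([0.2, 0.8])
print(values, counts)
```

The agent never sees a fixed training set: its data are generated by its own actions, which is why it must occasionally explore (controlled by epsilon) instead of always repeating the action that currently looks best.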

ML Limitations

Data

ML typically requires large amounts of data to train on

The data should be unbiased and of good quality

Such data is not easy to obtain in practice

Extrapolation

We can only make reliable predictions about data which is in the same range as our training data.

If we try to extrapolate beyond what was covered in the training data we’ll probably get wrong answers.
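The effect can be seen with a deliberately simple model. This plain-Python sketch assumes a hypothetical true relationship y = x² with training inputs 0 to 5, and uses 1-nearest-neighbour regression (predict the target of the closest training input). Inside the training range the error is small; far outside it, the model can only repeat the value at the edge of its data.

```python
def nn_regress(train_x, train_y, x):
    """1-nearest-neighbour regression: return the target of the closest training input."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


# Hypothetical training data from the true relationship y = x**2, for x in 0..5
train_x = [0, 1, 2, 3, 4, 5]
train_y = [x ** 2 for x in train_x]

for x in (3.2, 10):
    pred = nn_regress(train_x, train_y, x)
    print(f"x={x}: predicted {pred}, true {x ** 2}")
```

At x = 3.2 the prediction is 9 (true value about 10.24); at x = 10 it is still 25, while the true value is 100. Other models fail in other ways outside the training range, but none of them is guaranteed to be right there.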

Interpretation of Results

It’s a challenge to accurately interpret results generated by the algorithms.

You have to carefully choose the algorithms for your purpose.

Today’s data

Weather data set

- 18 locations
- Variables: cloud_cover, wind_speed, wind_gust, humidity, pressure, global_radiation, precipitation, sunshine