\[ f(x) = \hat{y}(w, x) \]
ML is meant to learn a function \(f(x)\) that predicts the target \(y\).
The ML goal then becomes
minimizing the loss function \(J(w)\)
with respect to the weights \(w\)
\[\hat{y}(w,x) = w_0 + w_1x_1\]
This is univariate linear regression.
When we have more input features, e.g. number of bedrooms, balcony area, construction year, etc., the model becomes \[ \hat{y} (w,x) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \]
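As a quick sketch, the multivariate prediction is just a dot product between the weights and the feature vector, plus the intercept \(w_0\) (the house features and weight values below are made up for illustration):

```python
import numpy as np

# Hypothetical features: area (m^2), bedrooms, balcony area (m^2), construction year
x = np.array([120.0, 3.0, 8.0, 1995.0])
# Hypothetical weights w_1..w_n and intercept w_0
w = np.array([900.0, 5000.0, 300.0, 40.0])
w0 = 10000.0

# y_hat = w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n
y_hat = w0 + np.dot(w, x)
print(y_hat)
```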
We have a linear model with certain values for the weights. How well does this model capture the data that we observe?
We could use the loss function
\[ J(w) = \frac{1}{m}\sum_{i=1}^m (y_i - \hat{y}_i)^2 \]
the mean squared error (MSE), where the error is the difference between the real value \(y\) and the predicted value \(\hat{y}\).
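The MSE formula above translates directly into a few lines of NumPy (the observed and predicted values here are made up):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0])      # observed values y_i
y_hat = np.array([2.5, 5.0, 8.0])  # model predictions y_hat_i

# J(w) = (1/m) * sum over i of (y_i - y_hat_i)^2
J = np.mean((y - y_hat) ** 2)
print(J)
```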
Do you still remember the ML goal?
minimize loss function \(J(w)\)
This is done by an optimization algorithm:
keep changing the weights \(w\) to reduce the loss \(J(w)\) until it hopefully ends up at a minimum.
Gradient descent for two features:
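A minimal sketch of batch gradient descent for the two-feature model \(\hat{y} = w_0 + w_1 x_1 + w_2 x_2\), using the MSE loss from above (the toy data, learning rate, and iteration count are assumptions for illustration):

```python
import numpy as np

# Toy data generated from known weights, so the fit can be checked
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([1.0, 2.0, -3.0])   # w_0, w_1, w_2
y = true_w[0] + X @ true_w[1:]

w = np.zeros(3)   # start with all weights at zero
lr = 0.1          # learning rate (assumed)
m = len(y)

for _ in range(500):
    y_hat = w[0] + X @ w[1:]
    err = y_hat - y
    # Gradients of J(w) = (1/m) * sum (y_i - y_hat_i)^2
    grad0 = (2 / m) * err.sum()       # dJ/dw_0
    grad12 = (2 / m) * (X.T @ err)    # dJ/dw_1, dJ/dw_2
    w[0] -= lr * grad0
    w[1:] -= lr * grad12
```

After enough iterations on this noiseless data, `w` should land very close to `true_w`.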
Linear regression
The same loss function can be shared by all regression models
\[ J(w) = \frac{1}{m}\sum_{i=1}^m (y_i - \hat{y}_i)^2 \]
Stochastic Gradient Descent (SGD)
Adam (Adaptive Moment Estimation)
Adam is often the first choice in practice.
scikit-learn
The neural network model in scikit-learn is the multi-layer perceptron (MLP).
To use more complex neural networks, other frameworks should be used, e.g. PyTorch, Keras, TensorFlow, etc.
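scikit-learn's MLP can be tried with a few lines; here is a sketch fitting a mildly non-linear target (the hidden layer size, data, and iteration count are assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy non-linear target: y = x1^2 + x2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 1]

# Multi-layer perceptron with one hidden layer of 50 units (Adam is the default solver)
nn = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)
nn.fit(X, y)
print(nn.score(X, y))  # R^2 on the training data
```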
Want to have a look at various NN models? Try plot NN
Neural network