Linear regression is one of the most basic models in machine learning, and many other models and ideas extend from it. Its purpose is to predict a numeric response from one or more independent variables, often represented as a line through the data points.
Predicting values with a linear model is straightforward: because the data is roughly linear, once we find an equation that mimics its average behavior, we can predict outcomes such as prices.
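To make this concrete, here is a minimal sketch of predicting prices from a line. The slope and intercept values are made up for illustration; a real model would learn them from data, as described below.

```python
def predict_price(size_sqft, slope=150.0, intercept=50_000.0):
    """Predict a price from a single feature using a line.

    The slope and intercept here are illustrative, not learned values.
    """
    return slope * size_sqft + intercept

print(predict_price(1000))  # 150.0 * 1000 + 50000.0 = 200000.0
```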
To train a model like this, we first need a way to evaluate how well it performs. In machine learning, such measures are called loss functions, and one of the most popular is mean squared error (MSE).
Mean Squared Error:
This function measures how close predicted values are to the true values, which lets us measure how close a regression line is to a set of points. It works by squaring the distance between each data point and the regression line; summing all of the squared values and dividing by the number of data points then gives us the average, or mean.
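The calculation above can be sketched in a few lines of plain Python. The example values are arbitrary, chosen only to show the arithmetic:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# True values vs. a line's predictions:
# (0.5**2 + 0.0**2 + 1.0**2) / 3 = 1.25 / 3
print(mean_squared_error([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))
```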
Squaring the errors in this way prevents negative and positive errors from cancelling out when we sum them. More importantly for training, it gives more weight to points further from the regression line: a point is, in effect, penalized more heavily the further it is from the line. By minimizing MSE on a training dataset, we can fit our regression model to the data!
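One common way to minimize MSE is gradient descent, sketched below. The toy dataset, learning rate, and iteration count are illustrative choices; the data roughly follows y = 2x + 1 with noise, so the fitted line should land close to those values.

```python
# Toy training data, roughly y = 2x + 1 with a little noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = 0.0, 0.0
lr = 0.01  # learning rate (illustrative choice)

for _ in range(5000):
    n = len(xs)
    # Gradients of MSE with respect to the slope and intercept.
    grad_m = (2 / n) * sum((slope * x + intercept - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((slope * x + intercept - y) for x, y in zip(xs, ys))
    # Step both parameters downhill along the MSE surface.
    slope -= lr * grad_m
    intercept -= lr * grad_b

print(slope, intercept)  # should end up near 2 and 1
```

Each iteration nudges the slope and intercept in the direction that reduces the average squared error, which is exactly the "punishment" described above being driven toward zero.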