Linear Regression in Machine Learning: A Beginner’s Guide

1/25/2024

Summary

An introductory explanation of linear regression that breaks down the fundamental concepts including model definition, cost function, and gradient descent, using a simple student study hours example to demonstrate how predictions are made.

What is linear regression and why should we care about it?

Linear regression is a simple type of supervised learning in the field of machine learning and data processing. Its goal is to find a linear relation between the input and the output of a data set, and based on this linear model it can predict the most probable output for a new input. Simple linear regression demonstrates some of the most fundamental principles in machine learning and is a great way to get into the field.

Example problem

We have a school class with 7 different students. For six of them we know how many hours x they studied for the last class test and which grade y they got.
Now the question is: which grade will student 7 most likely get if he studied for 3 hours? Predictions like this are exactly what linear regression can do.
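For reference, the six known data points (these are the numbers that also appear in the cost-function calculation further down; the student numbering is simply the order in which they appear there) are: Student 1 studied 2 hours and got grade 4, Student 2 studied 6 hours and got grade 2, Student 3 studied 5 hours and got grade 3, Student 4 studied 1 hour and got grade 4, Student 5 studied 8 hours and got grade 1, and Student 6 studied 4 hours and got grade 2.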

Formula and training

Model definition

To build up our linear formula we need some math. First of all we define the typical formula of a straight line: y = w*x + b. Here x is the input, y the output, w the gradient/weight and b the intercept/bias. This formula is called the model. To start optimizing the model we need initial values for the variables w and b; it's common to simply start with 0 for both.
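As a tiny sketch (in Python, which this post itself doesn't use, so the function and variable names are my own), the model is nothing more than a one-line function:

```python
def predict(x, w, b):
    """Linear model y = w*x + b: predicted grade for x study hours."""
    return w * x + b

# Common starting point before training: both parameters at zero
w, b = 0.0, 0.0
print(predict(3, w, b))  # prints 0.0 - every prediction is 0 until we optimize w and b
```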

Cost function

After that we define the cost function J:
J(w, b) = (1/(2m)) * Σ (yᵢ - (w*xᵢ + b))², where the sum runs over all m training examples (here m = 6).
The cost function is used to measure the "error" of the model. The error is essentially half the average of the squared differences between the grades the model predicts and the real grades of the students. In our example we calculate the following:
J(0, 0) = (1/(2*6)) * ((4-(0*2+0))² + (2-(0*6+0))² + (3-(0*5+0))² + (4-(0*1+0))² + (1-(0*8+0))² + (2-(0*4+0))²). The result (error) is about 4.17, which is quite large because we haven't optimized anything yet.
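For readers who prefer code, here is a minimal Python sketch of the same calculation (the data values are taken from the example above; the names are mine):

```python
# Study hours and grades of the six students from the example
hours  = [2, 6, 5, 1, 8, 4]
grades = [4, 2, 3, 4, 1, 2]

def cost(w, b, xs, ys):
    """Cost function J(w, b): half the mean of the squared prediction errors."""
    m = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(round(cost(0, 0, hours, grades), 2))  # 4.17, the untrained error from the text
```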

Training / optimization

To optimize our model we need an algorithm called gradient descent. This algorithm is used to optimize the variables w and b. The training is a process of many iterations; in every iteration, gradient descent adjusts the variables just a little bit.
w_new = w_old - α * ∂J/∂w and b_new = b_old - α * ∂J/∂b, where the derivatives of the cost function are ∂J/∂w = -(1/m) * Σ (yᵢ - (w*xᵢ + b)) * xᵢ and ∂J/∂b = -(1/m) * Σ (yᵢ - (w*xᵢ + b)).
When we first look at the formula for gradient descent, it might seem complicated, but it's actually pretty straightforward once you break it down. The key thing to remember is that we have two values to find: the best weight (w) and the best bias (b). Both are updated in the same way, and in every iteration both updates should use the old values of w and b, so that the two calculations stay consistent.
Let's focus on how we find the best value for the weight (w) first:
1. Start with the Current Weight: We begin with our current guess for the weight, which we can call 'old w'.
2. Adjusting the Weight: To find a better weight, we make a small adjustment to ‘old w’. We do this by subtracting something from it. This ‘something’ is a combination of two factors: the learning rate (α) and the derivative of the cost function.
3. Learning Rate (α): Think of the learning rate like a step size. In our example, it’s 0.01, which means we’re taking small steps. This helps us to move carefully and not miss the best value.
4. Derivative of the Cost Function: This part tells us the direction we should go to reduce our error. If the derivative is positive, it means we need to decrease ‘w’ to reduce the error. If it’s negative, we should increase ‘w’.
Now, what about the bias (b)? The process is similar:
1. Adjusting the Bias: Just like with the weight, we adjust the bias by taking our current guess, 'old b', and making a small change to it.
2. Using the Same Learning Rate: We use the same learning rate (0.01 in our example) to determine the size of our step.
3. Applying the Derivative: We again look at the derivative of the cost function, but this time we use it to adjust the bias.
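To make this concrete, here is a quick back-of-the-envelope calculation of the very first update (my own numbers, derived from the example data, starting at w = 0 and b = 0 with α = 0.01): the derivative with respect to w is -(1/6) * (4*2 + 2*6 + 3*5 + 4*1 + 1*8 + 2*4) = -55/6 ≈ -9.17, and the derivative with respect to b is -(1/6) * (4 + 2 + 3 + 4 + 1 + 2) ≈ -2.67. Subtracting α times these derivatives gives w ≈ 0.09 and b ≈ 0.03 after the first iteration, which already lowers the error a little bit.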
In summary, we're tweaking our weight and bias little by little, using the learning rate to control how big our steps are, and the derivative of the cost function to guide us in the right direction. This process is repeated until we find the values for weight and bias that make our model's predictions as accurate as possible. These values can then be inserted into the model from the beginning, y = w*x + b, and with the optimized w and b we get a good prediction of the output y for a new input x.
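Putting all of the pieces together, a compact Python sketch of the whole training loop could look like this (the learning rate matches the 0.01 from the text; the iteration count and the rounded end results are my own estimates):

```python
# Training data: study hours and grades of the six known students
hours  = [2, 6, 5, 1, 8, 4]
grades = [4, 2, 3, 4, 1, 2]

def train(xs, ys, alpha=0.01, iterations=10_000):
    """Fit y = w*x + b to the data with plain gradient descent."""
    m = len(xs)
    w, b = 0.0, 0.0  # start from the usual initial guess
    for _ in range(iterations):
        # Derivatives of the cost function J(w, b) with respect to w and b
        dw = -sum((y - (w * x + b)) * x for x, y in zip(xs, ys)) / m
        db = -sum((y - (w * x + b)) for x, y in zip(xs, ys)) / m
        # Update both parameters simultaneously, using the old values of w and b
        w, b = w - alpha * dw, b - alpha * db
    return w, b

w, b = train(hours, grades)
print(w, b)       # settles around w ≈ -0.43 and b ≈ 4.5
print(w * 3 + b)  # predicted grade for student 7 with 3 hours: roughly 3.2
```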

Conclusion

So, that was quite a bit of math. First of all we defined our model with the formula y = w*x + b. Then we defined the cost function and calculated the error. After that we moved on to the training process, where we chose a learning rate and used gradient descent to optimize the values of w and b.
This is my first blog post ever. I hope you enjoyed reading. Your Mario 💚