AI: Linear Regression in Machine Learning: A Beginner’s Guide

01/25/2024

Summary

An introduction to linear regression: model definition, cost function, and gradient descent, explained with a simple example of students and their study hours.

What is linear regression and why should we care about it?

Linear regression is a simple form of supervised learning in the field of machine learning and data processing. Its goal is to find a linear relationship between the inputs and outputs in a data set and, based on this linear model, to predict the most likely output for a new input. Simple linear regression demonstrates some of the most fundamental principles in machine learning and is a good entry point into the field.

Example problem

We have a school class with 7 students. Six of them studied x hours for the last class test and got the grade y for it; the grade of Student 7 is not known yet.

The question is: which grade will Student 7 most likely get if they studied for 3 hours? Predictions like this are exactly what linear regression is for.

Formula and training

Model definition

To build our linear formula, we need a bit of math. First, we define the typical formula of a straight line: y = w·x + b. Here, x is the input, y is the output, w is the slope or weight, and b is the y-intercept or bias. This formula is our model. To start optimizing the model, we need initial values for w and b. A common choice is 0 for both.
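As a small illustration, the model can be written as a one-line function. This is only a sketch: the function name predict is my own choice and not part of any library.

```python
def predict(x, w, b):
    """Linear model: y = w * x + b."""
    return w * x + b


# Initial values from the text: w = 0 and b = 0
w, b = 0.0, 0.0

# With both values at 0, the untrained model predicts 0 for every input
print(predict(3, w, b))  # 0.0
```

The untrained model is obviously useless, which is exactly why we need the cost function and the training described next.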

Cost function

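Next, we define the cost function J, where m is the number of students with a known grade (6 in our example) and the sum runs over those students:

J(w, b) = (1/(2·m)) · Σ (yᵢ − (w·xᵢ + b))²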

The cost function calculates the error of the model. The error is the sum of the squared differences between the students' actual grades and the model's predicted grades, divided by twice the number of students. In our example, with w = 0 and b = 0, we compute the following:

J(0, 0) = (1/(2·6)) · ((4−(0·2+0))² + (2−(0·6+0))² + (3−(0·5+0))² + (4−(0·1+0))² + (1−(0·8+0))² + (2−(0·4+0))²)

The result (the error) is 50/12 ≈ 4.17, which is still high since we have not optimized yet.
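The calculation can be checked with a few lines of Python. This is a minimal sketch: the six (hours, grade) pairs are read off the J(0, 0) expansion above, and the names hours, grades and cost are my own.

```python
# Study hours (x) and grades (y), taken from the terms of J(0, 0) above
hours = [2, 6, 5, 1, 8, 4]
grades = [4, 2, 3, 4, 1, 2]


def cost(w, b, xs, ys):
    """Sum of squared differences between actual and predicted grades, divided by 2 * m."""
    m = len(xs)
    squared_errors = [(y - (w * x + b)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / (2 * m)


print(round(cost(0.0, 0.0, hours, grades), 2))  # 4.17
```

The squared errors add up to 50, and 50 divided by 12 gives the 4.17 from the text.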

Training and optimization

To optimize our model, we need an algorithm called gradient descent. With this algorithm, we adjust the variables w and b. Training is a process of many iterations, and in each iteration, gradient descent improves the variables by a small amount.

At first glance, the formula for gradient descent looks complicated, but it becomes clearer once you break it down. The important point is that we are looking for two values: the best values for the weight (w) and the bias (b) in our model. Both are updated by the same rule, and in each iteration both updates should be computed from the old values of w and b before either one is overwritten.

Let's first look at how we find the best value for the weight (w):

Start with the current weight: We begin with our current guess for the weight, let's call it "old w".
Adjust the weight: To get a better weight, we adjust "old w" slightly by subtracting something from it. This "something" consists of two factors: the learning rate (α) and the derivative of the cost function.
Learning rate (α): The learning rate is the step size. In our example, it is 0.01, so we take small steps. This way, we move carefully and are less likely to overshoot the best value.
Derivative of the cost function: This part tells us the direction in which to move to reduce the error. If the derivative is positive, we need to decrease w to reduce the error. If it is negative, we need to increase w.
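
The steps above can be sketched for a single update of w. Written out, the rule is new w = old w − α · (derivative of J with respect to w). The derivative used here is my own derivation, assuming the squared-error cost with the factor 1/(2·6) from the J(0, 0) calculation; the data is read off that same calculation, and the function name dJ_dw is hypothetical.

```python
hours = [2, 6, 5, 1, 8, 4]
grades = [4, 2, 3, 4, 1, 2]


def dJ_dw(w, b, xs, ys):
    """Derivative of the cost with respect to w: mean of (prediction - grade) * x."""
    m = len(xs)
    return sum(((w * x + b) - y) * x for x, y in zip(xs, ys)) / m


alpha = 0.01              # learning rate from the text
old_w, old_b = 0.0, 0.0   # initial values

gradient_w = dJ_dw(old_w, old_b, hours, grades)
new_w = old_w - alpha * gradient_w

print(round(gradient_w, 4))  # -9.1667
print(round(new_w, 4))       # 0.0917
```

The derivative is negative here, so subtracting it increases w, just as described above.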

And what about the bias (b)? The process is analogous:

Adjust the bias: As with the weight, we take our current guess, "old b", and change it slightly.
Use the same learning rate: We use the same learning rate (0.01 in our example) to determine the step size.
Apply the derivative: We again look at the derivative of the cost function, but this time we use it to adjust the bias.
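
Putting weight and bias together, one full iteration could look like the sketch below. Both derivatives are computed from the old values first, and only then are w and b replaced. The derivative formulas are again my own derivation from the squared-error cost, and gradient_step is a hypothetical name.

```python
hours = [2, 6, 5, 1, 8, 4]
grades = [4, 2, 3, 4, 1, 2]


def gradient_step(w, b, xs, ys, alpha):
    """One gradient descent iteration that updates w and b from the same old values."""
    m = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / m  # derivative with respect to w
    grad_b = sum(errors) / m                             # derivative with respect to b
    return w - alpha * grad_w, b - alpha * grad_b


new_w, new_b = gradient_step(0.0, 0.0, hours, grades, alpha=0.01)
print(round(new_w, 4), round(new_b, 4))  # 0.0917 0.0267
```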

In summary: we adjust the weight and bias in small steps, with the learning rate controlling the step size and the derivative of the cost function pointing in the right direction. The process is repeated until we have found values for weight and bias that make our predictions as accurate as possible. We can then plug these values into the model from the beginning, y = w·x + b, and use the computed w and b to make a prediction for the output y based on the input x.
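
To see the whole process in one place, here is a minimal training loop in plain Python. Treat it as a sketch under a few assumptions: the data is read off the J(0, 0) calculation, the derivatives are my own derivation from the squared-error cost, and the 10,000 iterations are a number I picked so that the values settle; the text does not prescribe it.

```python
hours = [2, 6, 5, 1, 8, 4]
grades = [4, 2, 3, 4, 1, 2]


def train(xs, ys, alpha=0.01, iterations=10_000):
    """Run gradient descent from w = 0, b = 0 and return the trained values."""
    m = len(xs)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m
        grad_b = sum(errors) / m
        # Update both values at once, using the gradients of the old w and b
        w, b = w - alpha * grad_w, b - alpha * grad_b
    return w, b


w, b = train(hours, grades)
prediction = w * 3 + b  # Student 7 studied for 3 hours

print(round(w, 2), round(b, 2), round(prediction, 2))  # -0.43 4.53 3.24
```

Under these assumptions, the model predicts a grade of roughly 3.2 for Student 7. The negative weight means that every additional hour of studying lowers the predicted grade number by about 0.43.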

Conclusion

That was a fair amount of math. First, we defined our model with the formula y = w·x + b. Then we set up the cost function and calculated the error. After that, we moved on to training, defined the learning rate, and set up the formula for optimizing w and b.

This is my very first blog post. I hope you enjoyed reading it.

Yours, Mario 💚
