Hey reader 👋 Hope you are doing well 😊
In the last post we read about linear regression and some of its basics.
In this post we are going to discuss how we can minimize our cost function using the gradient descent algorithm.
So let's get started🔥
Gradient Descent
Gradient descent is an iterative algorithm used to find the values of the parameters Θ that minimize the cost function.
Cost Function -:

J(Θ) = (1/2m) Σᵢ ( h_Θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²,  i = 1, ..., m

where h_Θ(x⁽ⁱ⁾) is the predicted output for the i-th training sample and y⁽ⁱ⁾ is its actual output.
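As a quick sketch in NumPy (a helper of my own for illustration, which assumes the feature matrix Xb already has a leading column of ones for the intercept term Θ₀), the cost can be computed like this:

```python
import numpy as np

def cost(theta, Xb, y):
    """J(Θ) = (1/2m) Σ (h_Θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² for linear regression."""
    m = Xb.shape[0]
    preds = Xb @ theta                         # h_Θ(x) = Θᵀx for every sample
    return np.sum((preds - y) ** 2) / (2 * m)
```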
According to this algorithm, we start with some initial value of Θ, let's say Θ = 0 (the zero vector, so every parameter starts at 0), and then keep changing Θ to reduce the cost function:

repeat until convergence:
    Θⱼ := Θⱼ − α · ∂J(Θ)/∂Θⱼ

where j = 0, 1, 2, ..., n, and all the parameters Θⱼ are updated simultaneously.
α is the learning rate (a common starting choice in practice is α = 0.01). It controls the step size, that is, how big a change we make to the value of Θ on each update.
If α is too large, the steps can overshoot the minimum and the algorithm may even diverge; if α is too small, many more iterations are needed and the algorithm becomes slow. The toy sketch below shows both failure modes.
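To see this concretely (an illustration entirely of my own, minimizing the one-variable function f(Θ) = Θ², whose slope is 2Θ), compare a few learning rates:

```python
# Minimize f(Θ) = Θ², starting from Θ = 1, with different learning rates.
for alpha in (0.01, 0.5, 1.1):
    theta = 1.0
    for _ in range(20):
        theta -= alpha * 2 * theta   # Θ := Θ − α · f'(Θ)
    print(f"alpha={alpha}: theta after 20 steps = {theta:.4f}")

# alpha=0.01 -> 0.6676  (too small: still far from the minimum at 0)
# alpha=0.5  -> 0.0000  (reaches the minimum in a single step)
# alpha=1.1  -> 38.3376 (too large: every step overshoots and it diverges)
```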
To understand it better, consider that you are on a mountain and want to get to the lowest point in the valley. Gradient descent is like taking steps downhill in the direction that decreases your altitude, where each step is based on the slope of the mountain at your current point.
Now let's find the value of Θ -:

For m training samples, working out the partial derivative of J(Θ) gives the update rule:

repeat until convergence:
    Θⱼ := Θⱼ − (α/m) Σᵢ ( h_Θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) xⱼ⁽ⁱ⁾,  i = 1, ..., m  (for every j)

This is how we can compute the values of the parameters that minimize the cost function. So you can see that we start from Θ = 0, calculate the predicted output for all m training samples, and then adjust the value of Θ in order to reduce the cost function.
This algorithm is also known as Batch Gradient Descent.
The main disadvantage of this algorithm is that it scales poorly to large datasets, because making even a single update requires a sum over all m training examples.
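To make the batch update concrete, here is a minimal NumPy sketch of batch gradient descent for linear regression. This is a simplified illustration with names of my own choosing, not the code from the Kaggle notebook linked at the end:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=5000):
    """Fit linear regression with batch gradient descent.
    X: (m, n) feature matrix, y: (m,) target vector."""
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]      # prepend a column of ones for Θ₀
    theta = np.zeros(Xb.shape[1])  # start with the Θ = 0 vector

    for _ in range(n_iters):
        preds = Xb @ theta             # h_Θ(x) for all m samples at once
        grad = Xb.T @ (preds - y) / m  # (1/m) Σ (h_Θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
        theta -= alpha * grad          # simultaneous update of every Θⱼ
    return theta

# Example: data generated from y = 4 + 3x plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, size=100)
print(batch_gradient_descent(X, y))    # ≈ [4, 3]
```

Notice that every iteration touches all m rows of X, which is exactly why this version struggles on very large datasets.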
An alternative to this algorithm is Stochastic Gradient Descent.
Stochastic Gradient Descent
In this algorithm, instead of using the whole dataset, we use only one training point at a time to update the model's parameters.
[Note -> Stochastic means random]
On each step, Stochastic Gradient Descent picks a random data point, computes the gradient for that single point, updates the model, and then repeats the same thing with another randomly chosen point.
The main disadvantage of this algorithm is that it is noisier and less stable than batch gradient descent: because each update is based on only one data point, the gradients fluctuate from step to step.
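Here is a matching NumPy sketch of stochastic gradient descent (again a simplified illustration of my own; it shuffles the data each epoch and updates Θ from one point at a time):

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=50, seed=0):
    """Fit linear regression with SGD: one random training point per update."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]      # prepend a column of ones for Θ₀
    theta = np.zeros(Xb.shape[1])

    for _ in range(n_epochs):
        for i in rng.permutation(m):        # visit the points in random order
            error = Xb[i] @ theta - y[i]    # h_Θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ for one sample
            theta -= alpha * error * Xb[i]  # gradient from this point only
    return theta
```

Each update is cheap, but because it uses a single point, Θ bounces around the minimum instead of settling exactly on it; a common remedy is to gradually decrease α over the epochs.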
You can see the implementation of Gradient Descent here -: [https://www.kaggle.com/code/nehagupta09/linear-regression-implementation]
I hope you have understood this. If you have any doubts, please comment and I'll try to solve your queries.
Don't forget to follow me for more.
Thankyou 💙