part 2 - Machine Learning - 2
Suppose you have the dataset above with two parameters, θ1 and θ2, and you want the minimum of the cost function J(θ1, θ2). Gradient descent is the method that gets you there. The diagram above is a kind of hill, where the black and red marks are two objects that want to get down off the hill and reach home, and home is the white region of the diagram.
Suppose the black cross is one starting point, that is, one initial choice of (θ1, θ2). From there it takes baby steps downhill towards the bottom. The red cross, starting somewhere else, does the same, and you can see that it follows an entirely different route to the bottom. That is what gradient descent is all about: it iteratively takes small steps downhill until it finds a minimum. You can minimize any other function J this way as well; it does not have to be a cost function. The main job of gradient descent is to find the values of θ that (hopefully) minimize the cost function J(θ).
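As a rough sketch of the idea, here is gradient descent on a made-up bowl-shaped function. The function J, its minimum at (1, -2), the two starting points, and the values of alpha and steps are all assumptions chosen for illustration; none of them is taken from the diagram.

```python
def J(theta1, theta2):
    # Hypothetical bowl-shaped function with its minimum at (1, -2).
    return (theta1 - 1) ** 2 + (theta2 + 2) ** 2


def gradient(theta1, theta2):
    # Partial derivatives of J with respect to theta1 and theta2.
    return 2 * (theta1 - 1), 2 * (theta2 + 2)


def descend(theta1, theta2, alpha=0.1, steps=200):
    # Returns the list of visited points, i.e. the route down the hill.
    path = [(theta1, theta2)]
    for _ in range(steps):
        d1, d2 = gradient(theta1, theta2)
        # Take one baby step downhill in both parameters.
        theta1, theta2 = theta1 - alpha * d1, theta2 - alpha * d2
        path.append((theta1, theta2))
    return path


black_path = descend(5.0, 5.0)    # assumed starting point of the black cross
red_path = descend(-4.0, 3.0)     # assumed starting point of the red cross
print(black_path[-1], red_path[-1])
```

Both routes are different, but on this simple bowl both end up very close to the same bottom point.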
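Notation: a := a+1 is an assignment. It means take a and increase its value by 1. By contrast, a = b is a truth assertion, a claim that a and b are equal, not an assignment. Read as an assertion, a = a+1 would simply be false, since a can never equal a+1.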
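If it helps, the same distinction exists in code, only spelled differently. In Python (used here purely as an illustration), assignment is written with a single = and the equality check with ==. The variable names are arbitrary.

```python
a = 1
a = a + 1            # assignment (the ":=" of the text): a becomes 2

b = 2
same = (a == b)      # truth assertion: checks equality, changes nothing
print(a, same)
```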
Calculus derivative. The derivative at a point is nothing but the slope of the tangent line to the function at that point. In the diagram below, the red line is the tangent, and its slope is the derivative.
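To make "slope of the tangent" concrete, the slope can be estimated numerically. This is only a sketch: the parabola J and the helper slope_at are hypothetical names made up for this example, and the central difference used here is just one common way to approximate a derivative.

```python
def J(theta):
    # Hypothetical example function: a simple parabola.
    return theta ** 2


def slope_at(f, x, h=1e-6):
    # Central difference: slope of a short chord centered on x,
    # which approximates the slope of the tangent line at x.
    return (f(x + h) - f(x - h)) / (2 * h)


print(slope_at(J, 3.0))    # close to 6, the tangent rises
print(slope_at(J, -3.0))   # close to -6, the tangent falls
print(slope_at(J, 0.0))    # close to 0, the tangent is flat
```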
· For the above diagram, the derivative is a positive number, because the tangent line at that point slopes upward.
The update is θ1 := θ1 - alpha × (derivative). Alpha is always positive, so with a positive derivative we subtract a positive number from θ1. θ1 therefore moves to the left, which is the right thing to do, as it brings θ1 closer to the minimum.
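· For the above diagram, the derivative is a negative number, because the tangent line at that point slopes downward.
Alpha is still positive, but the update θ1 := θ1 - alpha × (derivative) now subtracts a negative number, which increases θ1. So θ1 moves to the right, again towards the minimum.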
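The two cases can be checked with a single update step on each side of the minimum. This is a minimal sketch that assumes the function J(θ) = θ², with its minimum at 0, and a learning rate of 0.1; both are illustrative choices.

```python
def dJ(theta):
    # Derivative of the hypothetical function J(theta) = theta ** 2.
    return 2 * theta


def update(theta, alpha=0.1):
    # One gradient descent step: theta := theta - alpha * derivative.
    return theta - alpha * dJ(theta)


right_of_min = 3.0     # derivative is positive here
left_of_min = -3.0     # derivative is negative here

new_right = update(right_of_min)
new_left = update(left_of_min)
print(new_right < right_of_min)   # True: theta moved left
print(new_left > left_of_min)     # True: theta moved right
```

In both cases alpha itself stays positive; only the sign of the derivative decides the direction.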
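Simultaneous update means that the new values of all the thetas are first computed from the current (old) values, and only then are they all assigned together.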
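A small sketch of the difference between a simultaneous and a non-simultaneous update follows. The function J = θ0² + θ0·θ1 + θ1² is a made-up example, chosen only because each partial derivative depends on both parameters; the names temp0 and temp1 and the value of alpha are likewise assumptions.

```python
def gradient(theta0, theta1):
    # Partial derivatives of the hypothetical function
    # J = theta0**2 + theta0*theta1 + theta1**2.
    return 2 * theta0 + theta1, theta0 + 2 * theta1


def simultaneous_step(theta0, theta1, alpha):
    # Correct: both new values are computed from the old thetas.
    d0, d1 = gradient(theta0, theta1)
    temp0 = theta0 - alpha * d0
    temp1 = theta1 - alpha * d1
    return temp0, temp1


def sequential_step(theta0, theta1, alpha):
    # Not simultaneous: theta1 is updated using the already changed theta0.
    theta0 = theta0 - alpha * gradient(theta0, theta1)[0]
    theta1 = theta1 - alpha * gradient(theta0, theta1)[1]
    return theta0, theta1


print(simultaneous_step(1.0, 1.0, 0.1))
print(sequential_step(1.0, 1.0, 0.1))
```

The two versions give different values for θ1 after a single step, which is why the order of computation matters.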
· If the learning rate alpha is very small
Then gradient descent takes a very small baby step on each iteration, so it needs many iterations and a lot of time to reach the minimum. A small learning rate slows down (rather than speeds up) the convergence of the algorithm.
· If the learning rate alpha is too large
Then gradient descent may overshoot the minimum on every step, so it can fail to converge, or even diverge, and never reach the minimum.
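The effect of alpha is easy to see in a small experiment. This sketch assumes the function J(θ) = θ², a starting value of 1.0, and 50 steps; the three alpha values are arbitrary picks for "too small", "reasonable", and "too large" on this particular function.

```python
def run(alpha, theta=1.0, steps=50):
    # Gradient descent on the hypothetical function J(theta) = theta ** 2,
    # whose derivative is 2 * theta and whose minimum is at theta = 0.
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta


too_small = run(alpha=0.001)   # still far from 0 after 50 steps
moderate = run(alpha=0.1)      # very close to 0
too_large = run(alpha=1.1)     # each step overshoots further, |theta| grows
print(too_small, moderate, too_large)
```

What counts as too large depends on the function being minimized, so these numbers should not be read as general thresholds.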
· When θ1 is at a local optimum
The slope of the red tangent line is 0, so the derivative term is exactly 0 and the update leaves θ1 unchanged.
In the diagram above, the pink dot is the point where the gradient descent algorithm starts.
Even though alpha stays the same, the derivative becomes smaller and smaller at each step (the green point, then the red point), because the curve flattens out as it approaches the minimum. So gradient descent automatically takes smaller steps as it gets closer to the minimum, and there is no need to decrease alpha over time.
At a local minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters.
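Both observations can be checked numerically. This is a sketch under assumptions: the function J(θ) = θ², the starting point 4.0, and alpha = 0.1 are all made up for the example.

```python
def dJ(theta):
    # Derivative of the hypothetical function J(theta) = theta ** 2.
    return 2 * theta


alpha = 0.1          # fixed, never changed
theta = 4.0          # assumed starting point (the "pink dot")
step_sizes = []
for _ in range(10):
    step = alpha * dJ(theta)
    step_sizes.append(abs(step))
    theta = theta - step

print(step_sizes)    # each step is smaller than the one before

# At the minimum the derivative is 0, so the update changes nothing.
at_minimum = 0.0
print(at_minimum - alpha * dJ(at_minimum) == at_minimum)   # True
```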
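Now apply the gradient descent algorithm above to linear regression. Its cost function is bowl-shaped (convex), so with a suitable alpha gradient descent reaches the global minimum rather than getting stuck in a local one.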
In the diagram above, suppose the point starts at (900, -0.1), the red dot in the rightmost corner of the right graph.
With each step of gradient descent the point moves a little to the left, then a little more, and so on. With each move, the straight line in the left graph changes too, shifting from a sparse region of the data towards the region where the data points are densest. Once the point in the right graph reaches the center, the straight line in the left graph passes through the densest part of the data. In other words, the cost of the linear regression has been minimized and the line fits the data properly.
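Putting the pieces together, gradient descent for linear regression with one input can be sketched as follows. The tiny dataset, the starting values of 0.0, and the settings alpha = 0.05 and steps = 5000 are assumptions made for this example; they are not the data or the starting point (900, -0.1) from the diagram.

```python
# Hypothetical training data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def cost(theta0, theta1):
    # Squared error cost J(theta0, theta1) for the line theta0 + theta1 * x.
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def fit(alpha=0.05, steps=5000):
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        d0 = sum(errors) / m
        d1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
    return theta0, theta1


theta0, theta1 = fit()
print(theta0, theta1, cost(theta0, theta1))
```

On this data the fitted values come out very close to the intercept 1 and slope 2 that generated it, and the cost is far lower than at the starting point.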