Part 2 - Machine Learning - 2
Consider the dataset above with two parameters, θ1 and θ2. To get the minimum of J(θ1, θ2) you need to reach the lowest point of the surface, and gradient descent is the method that gets you there. The surface above is a kind of hill, where the black and red marks are two objects that want to get down off the hill and reach home, and home is the white (low) region of the diagram.
Suppose the black cross is θ1: it starts taking baby steps toward the bottom of the hill. θ2, the red mark, does the same job of taking baby steps to get down, and you can see that an entirely different route emerges to reach the bottom. That is what gradient descent is all about: it repeatedly steps downhill until it finds a minimum. You can minimize any other function J as well; it does not have to be the cost function. The main job of gradient descent is to find values of theta that hopefully minimize the cost function.
Gradient Descent Algorithm
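For reference, the update rule sketched above can be written out as follows (this is the standard form; α is the learning rate and every parameter is updated simultaneously):

repeat until convergence:
    θj := θj − α * ∂J/∂θj        (for each parameter θj, all updated at the same time)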
Calculus
Derivative: a derivative is nothing but the slope of the tangent line to the function at a point. In the diagram below, the red line is that tangent, and the slope at the point is called the derivative.
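For instance, if J(θ1) = θ1² (a made-up function, just for illustration), the derivative is dJ/dθ1 = 2·θ1, so at θ1 = 3 the tangent line has slope 6.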
· For the diagram above, the derivative will always be a positive number, because the tangent line slopes upward. Since α is positive, the update θ1 := θ1 − α × (positive number) decreases θ1, so θ1 moves to the left, which is the right thing because it takes θ1 toward the minimum.
· For the diagram above, the derivative will always be a negative number, because the tangent line slopes downward. The update θ1 := θ1 − α × (negative number) increases θ1, so θ1 moves to the right, again toward the minimum (a numeric check follows below).
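A quick numeric check with made-up values: take θ1 = 3 and α = 0.1. If the derivative is +2, then θ1 := 3 − 0.1 × 2 = 2.8, so θ1 moves left; if the derivative is −2, then θ1 := 3 − 0.1 × (−2) = 3.2, so θ1 moves right.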
Simultaneous update means that all parameters are updated together: compute the new value of every θ using the old values, and only then assign them all, rather than updating θ0 first and then using the already-updated θ0 while computing θ1.
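A minimal Python sketch of a simultaneous update, using J(θ0, θ1) = θ0² + θ1² as a stand-in cost function (an assumption for illustration, not the cost from this post):

def d_j_d_theta0(theta0, theta1):
    return 2 * theta0                     # partial derivative of J with respect to theta0

def d_j_d_theta1(theta0, theta1):
    return 2 * theta1                     # partial derivative of J with respect to theta1

alpha = 0.1
theta0, theta1 = 3.0, -2.0
for _ in range(100):
    temp0 = theta0 - alpha * d_j_d_theta0(theta0, theta1)   # both gradients use the OLD values
    temp1 = theta1 - alpha * d_j_d_theta1(theta0, theta1)
    theta0, theta1 = temp0, temp1                           # then both parameters are assigned together
print(theta0, theta1)                                       # both approach 0, the minimum of this J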
· If the learning rate α is very small, then the function takes very little baby steps to come down, so it takes a long time to reach the minimum. In other words, with a small learning rate gradient descent ends up taking an extremely small step on each iteration, which slows down (rather than speeds up) the convergence of the algorithm.
· If α is too large, then gradient descent may overshoot the minimum, and as a result it may never reach the minimum; it can even diverge. The sketch below shows both cases.
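A tiny sketch of both behaviours, again using J(θ) = θ² as a stand-in cost (the α values are arbitrary picks for illustration):

def step(theta, alpha):
    return theta - alpha * 2 * theta       # derivative of theta**2 is 2*theta

theta = 1.0
for _ in range(10):
    theta = step(theta, alpha=0.01)        # very small alpha
print(theta)                               # about 0.82: converging, but slowly

theta = 1.0
for _ in range(10):
    theta = step(theta, alpha=1.5)         # too-large alpha
print(theta)                               # 1024.0: overshooting and diverging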
· When θ1 is at a local optimum, the slope of the red line is 0, so the derivative term is entirely 0 and the update leaves θ1 where it is.
In the figure above, the pink dot is the point where the object lies and where the gradient descent algorithm starts. Even though α stays the same, the derivative becomes smaller and smaller on each step (the green and then the red point, respectively). This shows that gradient descent automatically takes smaller steps as it approaches the minimum, so keeping α constant does not hurt; the short run below illustrates it.
At a local minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters.
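A short run on the same stand-in cost J(θ) = θ², with α held constant at 0.1, shows the steps shrinking on their own:

theta, alpha = 1.0, 0.1
for i in range(4):
    step = alpha * 2 * theta               # derivative of theta**2 is 2*theta
    theta = theta - step
    print(i, step, theta)                  # step sizes: 0.2, 0.16, 0.128, 0.1024 -- shrinking while alpha stays fixed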
Gradient Descent for Linear Regression
Now run the above gradient descent algorithm for linear regression, and you can find the minimum easily.
In the figure, consider that the point starts at (900, −0.1), the red dot in the rightmost corner.
The point then moves a little to the left, then further left, and further left again; accordingly, the red straight line in the left graph keeps changing and fits the data better and better. Finally, once the point in the right graph reaches the center of the contours, the straight line in the left graph passes through the densest part of the data, and the linear regression cost is properly minimized.
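Below is a minimal Python sketch of batch gradient descent for linear regression; the dataset, the learning rate, and the iteration count are all made-up values for illustration, not the data from the figure:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up feature values
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])   # made-up targets; hypothesis is h(x) = theta0 + theta1 * x
m = len(x)

theta0, theta1 = 0.0, 0.0                  # starting point (any values work)
alpha = 0.05                               # learning rate (assumed)

for _ in range(2000):
    error = theta0 + theta1 * x - y        # h(x_i) - y_i for every training example
    temp0 = theta0 - alpha * (1.0 / m) * error.sum()         # gradient of J = (1/(2m)) * sum(error**2) w.r.t. theta0
    temp1 = theta1 - alpha * (1.0 / m) * (error * x).sum()   # gradient w.r.t. theta1
    theta0, theta1 = temp0, temp1          # simultaneous update
print(theta0, theta1)                      # roughly (0.30, 1.94), the best-fit intercept and slope for this data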