Machine Learning - Part 3
If we have a training set as follows, we need to calculate the cost function J(θ₀, θ₁).
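Assuming the usual squared-error definition, the cost function for a single feature is:

J(θ₀, θ₁) = (1/2m) · Σ ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²    (sum over i = 1 … m)

where h_θ(x) = θ₀ + θ₁·x, m is the number of training examples, x⁽ⁱ⁾ is the i-th input and y⁽ⁱ⁾ is the i-th actual value.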
Multiple features in Machine Learning
What if we have multiple features, each of which can influence the result differently?
What if we use house size, house number, number of rooms, etc. to determine the price? With more features we can expect a more precise result, and different combinations of features can produce different results.
To understand this, consider an example with several such features.
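With n features the hypothesis takes the standard multivariate form:

h_θ(x) = θ₀ + θ₁·x₁ + θ₂·x₂ + … + θₙ·xₙ

where each xⱼ is one feature (size, number of rooms, …) and each θⱼ is its learned weight.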
Gradient Descent Algorithm for More than One Feature
Gradient Descent Algorithm for Linear Regression: Single Dimension
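For a single feature, the standard update rules are (α is the learning rate, m the number of training examples):

repeat until convergence {
    θ₀ := θ₀ − α · (1/m) · Σ ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )
    θ₁ := θ₁ − α · (1/m) · Σ ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · x⁽ⁱ⁾
}

with θ₀ and θ₁ updated simultaneously (sums over i = 1 … m).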
Gradient Descent Algorithm for Linear Regression: Multiple Dimensions
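For n features the same rule applies to every parameter, with the convention x₀⁽ⁱ⁾ = 1:

repeat until convergence {
    θⱼ := θⱼ − α · (1/m) · Σ ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · xⱼ⁽ⁱ⁾    (simultaneously for j = 0, …, n)
}

A minimal vectorized sketch in Python, assuming NumPy and a design matrix X whose first column is all ones (the function name and parameters here are illustrative, not from the original post):

import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    # X: (m, n+1) design matrix with a leading column of ones
    # y: (m,) target vector; theta: (n+1,) parameters; alpha: learning rate
    m = len(y)
    for _ in range(num_iters):
        errors = X @ theta - y                        # h_theta(x) - y for every example
        theta = theta - (alpha / m) * (X.T @ errors)  # simultaneous update of all theta_j
    return theta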
Mathematical Tricks to Make Gradient Descent Work Well
SCALING
The bottom line for making the gradient descent algorithm work well is that all the dimensions/features should share a similar scale; precisely, one shouldn't be declared in square feet while another is in centimetres. Try to bring all of them onto a single scale.
E.g.:
x₁ = size of house (0 – 2000 ft²)
x₂ = number of bedrooms (1 – 5)
Now x₁, whose range is extremely large compared to x₂, makes the contours of the cost function J very tall and skinny ovals, so the overall impression is a very congested oval-shaped graph.
Running gradient descent on such a surface may take a long time to reach the global minimum, as it needs to travel a long distance to get there.
To avoid this scenario we must scale the features, meaning we bring both features onto the same scale. Applying the formula below finally gives a proper gradient descent on a cost function J whose contour plot is made of near-circles, where finding the minimum is easy and fast.
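A common choice is mean normalization, which replaces each feature value by

xⱼ := (xⱼ − μⱼ) / sⱼ

where μⱼ is the mean of feature j over the training set and sⱼ is its standard deviation (or its range, max − min). A small sketch in Python, assuming NumPy (the helper name scale_features is illustrative, and the sample rows are made-up values within the ranges above):

import numpy as np

def scale_features(X):
    # Mean-normalize every column: (x - mean) / std
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Two features as in the example above: size in ft^2, number of bedrooms
X = np.array([[1200.0, 3.0],
              [2000.0, 5.0],
              [ 800.0, 2.0]])
X_scaled, mu, sigma = scale_features(X)   # both columns now have comparable scale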
Making Gradient Descent Work Better – Limited Range
Always try to keep the range of each feature roughly within −1 ≤ xⱼ ≤ 1; values a few times larger or smaller than this are usually still fine.
Feature scaling speeds up gradient descent by avoiding the many extra iterations that are required when one or more features take on much larger values than the rest.