Part 3 - Machine Learning - 3
Suppose we have a training set as follows:
We need to calculate J(θ0, θ1).
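The training set is not reproduced here. The following is a minimal Python sketch that assumes the usual squared-error cost for single-feature linear regression, J(θ0, θ1) = 1/(2m) Σ (θ0 + θ1·x⁽ⁱ⁾ − y⁽ⁱ⁾)². The function name `cost` and the three data points are hypothetical and exist only for illustration.

```python
# Minimal sketch: squared-error cost J(theta0, theta1) for single-feature
# linear regression. The data below is hypothetical.

def cost(theta0, theta1, xs, ys):
    """Return J = 1/(2m) * sum((theta0 + theta1*x - y)^2) over the training set."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = theta0 + theta1 * x  # hypothesis h(x)
        total += (prediction - y) ** 2    # squared error for this example
    return total / (2 * m)

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
print(cost(0.0, 1.0, xs, ys))  # 0.0, since h(x) = x fits every point exactly
print(cost(0.0, 0.5, xs, ys))  # a worse fit gives a larger cost
```

A lower J means the line θ0 + θ1·x fits the training set better; gradient descent searches for the (θ0, θ1) that minimizes it.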
What if we have multiple features, each of which can affect the result differently?
For example, if we use house size, house number, number of rooms, etc. to determine the price, we can get a more precise prediction, and possibly a different one than with a single feature.
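With several features, the hypothesis becomes h(x) = θ0 + θ1·x1 + θ2·x2 + ... + θn·xn. A minimal Python sketch, assuming the common convention of prepending x0 = 1 so that θ0 is handled like any other parameter; the function name `hypothesis`, the three features (size, number of rooms, number of floors) and all parameter values are hypothetical.

```python
# Minimal sketch of the multi-feature hypothesis
#   h(x) = theta0 + theta1*x1 + theta2*x2 + ... + thetan*xn
# computed as a dot product with a leading x0 = 1. All values are hypothetical.

def hypothesis(theta, features):
    x = [1.0] + list(features)  # prepend x0 = 1 for the intercept term theta0
    if len(x) != len(theta):
        raise ValueError("theta must have exactly one more entry than features")
    return sum(t * xi for t, xi in zip(theta, x))

# Hypothetical house: 1500 sq. ft, 3 rooms, 2 floors
theta = [50.0, 0.25, 20.0, 10.0]
price = hypothesis(theta, [1500.0, 3.0, 2.0])
print(price)  # 50 + 0.25*1500 + 20*3 + 10*2 = 505.0
```

Each θj weights its own feature, which is how different features can contribute differently to the predicted price.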
To understand this, consider the following set of examples:
Gradient Descent Algorithm for Linear Regression: Single Dimension
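A minimal Python sketch of batch gradient descent for one feature, assuming the squared-error cost and the standard simultaneous update θ0 := θ0 − α·(1/m)Σ(h(x) − y) and θ1 := θ1 − α·(1/m)Σ(h(x) − y)·x. The function name, the learning rate α = 0.1, the iteration count and the data (generated from y = 1 + 2x) are hypothetical choices for illustration.

```python
# Minimal sketch: batch gradient descent for single-feature linear regression.
# Hypothetical data, learning rate and iteration count.

def gradient_descent_1d(xs, ys, alpha=0.1, iterations=1000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # errors h(x) - y for every training example, using the current parameters
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # simultaneous update: both gradients were computed before either parameter changes
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data from y = 1 + 2x
t0, t1 = gradient_descent_1d(xs, ys)
print(round(t0, 3), round(t1, 3))  # should approach theta0 = 1, theta1 = 2
```

The simultaneous update matters: updating θ0 first and then using the new θ0 to compute θ1's gradient would be a different (and not the standard) algorithm.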
Gradient Descent Algorithm for Linear Regression: Multiple Dimensions
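The multi-feature version applies the same rule to every parameter at once: θj := θj − α·(1/m)Σ(h(x⁽ⁱ⁾) − y⁽ⁱ⁾)·xj⁽ⁱ⁾ for j = 0..n, with x0 = 1. A minimal Python sketch using plain lists; the function name, α = 0.1, the iteration count and the data (generated from y = 1 + 2·x1 + 3·x2, with both features already in small ranges) are hypothetical.

```python
# Minimal sketch: batch gradient descent for multi-feature linear regression.
# X holds feature rows WITHOUT x0; a column of ones is added for theta0.
# Hypothetical data, learning rate and iteration count.

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    m = len(X)
    rows = [[1.0] + list(r) for r in X]  # add x0 = 1 to every example
    n = len(rows[0])
    theta = [0.0] * n
    for _ in range(iterations):
        predictions = [sum(t * xj for t, xj in zip(theta, r)) for r in rows]
        errors = [p - yi for p, yi in zip(predictions, y)]
        # partial derivative of J with respect to each theta_j
        gradients = [sum(e * r[j] for e, r in zip(errors, rows)) / m for j in range(n)]
        # simultaneous update of all parameters
        theta = [t - alpha * g for t, g in zip(theta, gradients)]
    return theta

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]  # hypothetical data from y = 1 + 2*x1 + 3*x2
theta = gradient_descent(X, y)
print([round(t, 3) for t in theta])  # should approach [1.0, 2.0, 3.0]
```

With one feature this reduces exactly to the single-dimension update rule.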
· The key to making gradient descent work well is that all dimensions/features should be on a similar scale; for example, one feature shouldn't be expressed in square feet while another is in centimeters. Try to bring all of them to a single scale.
X1 = size of house (0 to 2000 feet²)
X2 = number of bedrooms (1 to 5)
Because X1 has a much larger range than X2, the contours of the cost function J become very tall, skinny ovals, and the overall plot looks like a tightly squeezed set of ellipses.
Running gradient descent on such a cost function can take a long time to reach the global minimum, because it may oscillate back and forth across the narrow ovals on its way there.
To avoid this, we scale the features, i.e. bring both of them into a similar range, for example by dividing each feature by its maximum value: X1 = size (feet²) / 2000 and X2 = number of bedrooms / 5. After scaling, the contours of the cost function J become much closer to circles, so gradient descent can find the minimum more easily and quickly.
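A minimal Python sketch of scaling each feature by its maximum absolute value, so every column ends up at most 1 in magnitude. The function name `scale_by_max` and the house data are hypothetical; it uses the maximum found in the data rather than a known range, which is one assumption among several reasonable choices (mean normalization is a common alternative).

```python
# Minimal sketch: scale each feature column by its maximum absolute value.
# Hypothetical function name and data.

def scale_by_max(rows):
    n = len(rows[0])
    maxima = [max(abs(r[j]) for r in rows) for j in range(n)]
    if any(mx == 0 for mx in maxima):
        raise ValueError("a feature column is all zeros and cannot be scaled")
    scaled = [[r[j] / maxima[j] for j in range(n)] for r in rows]
    return scaled, maxima

houses = [[2000.0, 5.0], [1000.0, 2.0], [1500.0, 3.0]]  # hypothetical [size, bedrooms]
scaled, maxima = scale_by_max(houses)
print(scaled)  # [[1.0, 1.0], [0.5, 0.4], [0.75, 0.6]]

# A new input must be scaled with the SAME maxima before making a prediction
new_house = [1200.0, 3.0]
new_scaled = [v / mx for v, mx in zip(new_house, maxima)]
print(new_scaled)
```

Keeping the maxima around matters: a model trained on scaled features expects scaled inputs at prediction time too.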
Making Gradient Descent Work Better: Limited Range
Always try to keep every feature used in gradient descent within the following range:
Feature scaling speeds up gradient descent by avoiding the many extra iterations that are required when one or more features take on much larger values than the rest.
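To make the speed-up concrete, here is a minimal Python sketch that runs the same batch gradient descent for the same number of iterations on raw and on max-scaled features and compares the final cost. Everything here is hypothetical: the data is generated exactly from price = 50 + 0.1·size + 20·bedrooms (so the lowest possible cost is 0), and the learning rates were picked so that both runs stay stable.

```python
# Minimal sketch: same gradient descent, raw vs. max-scaled features.
# Hypothetical data: price = 50 + 0.1*size + 20*bedrooms exactly, so min cost is 0.

def cost(theta, rows, y):
    m = len(rows)
    return sum((sum(t * x for t, x in zip(theta, r)) - yi) ** 2
               for r, yi in zip(rows, y)) / (2 * m)

def gradient_descent(rows, y, alpha, iterations):
    m, n = len(rows), len(rows[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sum(t * x for t, x in zip(theta, r)) - yi for r, yi in zip(rows, y)]
        grads = [sum(e * r[j] for e, r in zip(errors, rows)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]  # simultaneous update
    return theta

sizes = [2000.0, 1000.0, 1500.0, 500.0]
bedrooms = [3.0, 4.0, 2.0, 1.0]
prices = [310.0, 230.0, 240.0, 120.0]

raw_rows = [[1.0, s, b] for s, b in zip(sizes, bedrooms)]
max_s, max_b = max(sizes), max(bedrooms)
scaled_rows = [[1.0, s / max_s, b / max_b] for s, b in zip(sizes, bedrooms)]

# Raw features: size values near 2000 force a tiny learning rate (about 1e-6 or
# more makes the updates diverge), so the bedroom and intercept terms barely move.
theta_raw = gradient_descent(raw_rows, prices, alpha=5e-7, iterations=1000)
# Scaled features: a far larger learning rate is stable.
theta_scaled = gradient_descent(scaled_rows, prices, alpha=0.5, iterations=1000)

cost_raw = cost(theta_raw, raw_rows, prices)
cost_scaled = cost(theta_scaled, scaled_rows, prices)
print(f"raw: {cost_raw:.3f}  scaled: {cost_scaled:.3e}")
```

After the same 1000 iterations the scaled run is essentially at the minimum, while the raw run is still far from it; the raw run would need a vastly larger number of iterations to catch up.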