# Machine Learning - Part 3

If we have a training set as follows, we need to calculate the cost J(θ₀, θ₁).
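For reference, the squared-error cost function from the earlier parts of this series is:

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x
```

Here m is the number of training examples, and (x⁽ⁱ⁾, y⁽ⁱ⁾) is the i-th example.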

## Multiple Features in Machine Learning

What if we have multiple features, and each of them can affect the result differently?

What if we have house size, house number, number of rooms, etc. to determine the price? With more features we can expect a more precise result, and different results as well.

To understand this, consider the following example:
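A minimal sketch of what such a multi-feature training set might look like. The numbers here are made up for illustration; the hypothesis is the standard multivariate linear one, hθ(x) = θ₀ + θ₁x₁ + … + θₙxₙ:

```python
# Hypothetical training set: each row is one house.
# Columns: size (ft^2), number of floors, number of bedrooms (made-up values).
X = [
    [2104, 1, 3],
    [1416, 2, 3],
    [1534, 1, 2],
]
y = [400, 330, 315]  # price in $1000s (made up)

def hypothesis(theta, x):
    """Multivariate linear hypothesis: h(x) = theta0 + theta1*x1 + ... + thetan*xn."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

# With theta = [0, 0.2, 0, 0], the prediction is just 0.2 * size:
print(hypothesis([0, 0.2, 0, 0], X[0]))  # ~420.8
```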

### Gradient Descent Algorithm for More than One Feature

*Gradient Descent Algorithm for Linear Regression, Single Feature*

*Gradient Descent Algorithm for Linear Regression, Multiple Features*
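In standard notation, the two versions referenced above are (repeat until convergence, updating all parameters simultaneously):

```latex
% Single feature:
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
\qquad
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}

% Multiple features (n features, with the convention x_0^{(i)} = 1),
% for j = 0, 1, \dots, n:
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

Note that the single-feature rule is just the special case n = 1 of the multiple-feature rule.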

### Mathematical Tricks to Make Gradient Descent Work Well

#### Scaling

- The bottom line for making the gradient descent algorithm work well is that all the dimensions/features should be on a single scale; precisely, one shouldn't be declared in sq. feet and another in cm. Try to bring all of them onto a single scale.

E.g.:

- x₁ = size of house (0 – 2000 ft²)
- x₂ = number of bedrooms (1 – 5)

So now x₁ will make the contours of the *cost function J* very tall and skinny, oval shaped, as its range is extremely large; overall the impression will be a very congested, oval-shaped graph. On running gradient descent on such a surface, it may take a long time to reach the global minimum, as it needs to travel a long distance to get there.

To avoid such a scenario we must scale the features, i.e. bring both features onto the same scale. This finally results in a **cost function J** graph with nearly circular contours, where finding the minimum is easy and fast.

### Gradient Descent to Work Better – Limited Range

Always try to keep the range of the features roughly within −1 ≤ xᵢ ≤ 1.

Feature scaling speeds up gradient descent by avoiding the many extra iterations that are required when one or more features take on much larger values than the rest.
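As a sketch, assuming mean normalization (x := (x − μ) / range, one common choice of feature scaling), the scaling step can be written as:

```python
def scale_features(X):
    """Mean-normalize each feature column: x := (x - mean) / (max - min).

    After scaling, every feature lies roughly in [-1, 1], so the contours of
    the cost function J become more circular and gradient descent converges
    in fewer iterations.
    """
    n = len(X[0])
    scaled_cols = []
    for j in range(n):
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        rng = max(col) - min(col)
        scaled_cols.append([(v - mu) / rng for v in col])
    # Transpose back to a row-per-example layout.
    return [list(row) for row in zip(*scaled_cols)]

# Made-up example: house size in ft^2 vs. number of bedrooms -
# wildly different ranges before scaling.
X = [[2000, 5], [1000, 2], [0, 1]]
X_scaled = scale_features(X)
# Every scaled value now lies in [-1, 1].
assert all(-1.0 <= v <= 1.0 for row in X_scaled for v in row)
```

Dividing by the standard deviation instead of (max − min) is an equally common variant; either way, the point is that both features end up on a comparable scale.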
