Wednesday, May 7, 2014

Machine Learning - 2 (Basics, Cost Function)

part 1 - Machine Learning 1

Machine learning Basics

In a machine learning system, we have a training set on which an operation called the learning algorithm is run, and based on what it learns it gives an output called a hypothesis.

So if we consider the supervised learning property example: by inputting x as the square footage of a house, h will produce a real-valued output (the predicted price) for me.

Representation of h (hypothesis/response surface)


So from the above, h can be deduced to be a linear function, h(x) = Ɵ0 + Ɵ1x, since the property problem can be solved with a linear equation.

The property model above uses a linear function for h.
This model is known as linear regression with one variable, or univariate linear regression.
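The univariate hypothesis can be sketched in a few lines of Python (the parameter values here are illustrative, not learned from data):

```python
# Univariate linear regression hypothesis: h(x) = theta0 + theta1 * x.
def h(x, theta0, theta1):
    """Predicted response (e.g. house price) for input x (e.g. plot size)."""
    return theta0 + theta1 * x

# Illustrative parameters; a learning algorithm would choose these from data.
print(h(2.5, theta0=1.0, theta1=0.5))  # 2.25
```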

Cost Function

The cost function determines how to fit the best possible straight line to our data.
So, from the above example, if we have a number of data points we need to figure out exactly where the straight line should be drawn, or we will have great trouble getting the right result.

So from the above, we can conclude the following: for any value of x, the hypothesis gives a value, which is finally the value of y.
1.      Ɵ1 = 0: the Ɵ1x term becomes 0, so there is no multiple of x and y remains constant. For any value of x, y is just 1.5, with no change throughout. Here h(x) is nothing but a constant value of y, so the property price would never change as plot size increases.
2.      Ɵ0 = 0, Ɵ1 = 0.5: for any value of x, y is half of x. If we were going head to head, y would equal x; but here each prediction is only half, so the property rate does not increase as fast as the property gets bigger and bigger.
3.      Ɵ1 = 1.5: for any value of x, y is 1.5 times x, so the property price increases steeply as the size of the property increases.
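The three parameter choices above can be checked with a small sketch (the Ɵ0 values of 1.5, 0 and 0 are my reading of the three cases, since the missing figure is not available):

```python
def h(x, theta0, theta1):
    # Hypothesis h(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

# Case 1: theta1 = 0    -> y is constant at 1.5 for every x.
# Case 2: theta1 = 0.5  -> y is half of x.
# Case 3: theta1 = 1.5  -> y grows 1.5 units per unit of x.
for theta0, theta1 in [(1.5, 0.0), (0.0, 0.5), (0.0, 1.5)]:
    print([h(x, theta0, theta1) for x in (1, 2, 3)])
```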



The cost function directly measures the performance of the predicted response; for that reason it is an extremely important topic.

In a supervised learning problem, the data can be divided into 3 parts-

1.      A collection of p-dimensional feature vectors. The input is multi-dimensional: a property record carries plot size, plot number, constructed size, location, etc., and each attribute is a dimension.
{xi}, i = 1 … n

2.      A collection of observed responses; in our case, the property price is the observed response.
{yi}, i = 1 … n

3.      A predicted output, i.e. the response surface or hypothesis h(x)

Above is a dataset of property records. Here each x has p dimensions, and each dimension can be a number, name, location, size, etc. y is the response output: some number or other defined output attribute.
So h(x) can draw a conclusion about y.
·        In other words, we are trying to find a rule, or function, that predicts a value close to the correct y for almost every input x.

To achieve this, the cost function comes to the rescue –
we need to find out how well the hypothesis fits the input data.
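For linear regression, the usual way to measure this fit is the squared-error cost; a minimal sketch (the function name is my own):

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)

# A hypothesis that passes through every training point has zero cost.
print(cost(0.0, 1.0, [1, 2, 3], [1, 2, 3]))  # 0.0
```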

Suppose we have a training set –

The blue curve in the first drawing is one hypothesis; a second hypothesis produced the red curve in the second drawing.

The purpose of the cost function is essentially to measure how well h(x) fits the data.

From the second diagram it is very clear that hypothesis 1 is much better than hypothesis 2, because its curve travels through almost all the values of x, so its predicted values h(x) will naturally be better there. Hence J2 > J1.

·        The smaller the value of the cost function, the better the hypothesis.
·        The main goal of machine learning is to choose h(x) so that the value of J remains at its minimum.
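The J1-versus-J2 comparison can be reproduced numerically: evaluate the same cost on two candidate lines and keep the one with the smaller value (the data points here are made up for illustration):

```python
def cost(theta0, theta1, data):
    # Squared-error cost over (x, y) pairs.
    m = len(data)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in data) / (2 * m)

data = [(1, 1.1), (2, 1.9), (3, 3.2)]   # hypothetical training points
j1 = cost(0.0, 1.0, data)               # a line close to the data
j2 = cost(2.0, 0.0, data)               # a flat line far from the data
print(j1 < j2)  # True: the better-fitting hypothesis has the smaller cost
```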

So the main characteristic of a cost function is

J(yi , h(xi))

It compares the observed response, yi, with the predicted response, h(xi).

"Gradient Descent is actually the method to minimize a cost function of multiple variables"
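A minimal sketch of that idea for the two-parameter case (the step size and iteration count are arbitrary choices of mine):

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Minimize the squared-error cost by repeatedly stepping both
    parameters a small amount against their partial derivatives."""
    theta0 = theta1 = 0.0
    m = len(xs)
    for _ in range(iterations):
        # Compute all prediction errors first, so both updates are simultaneous.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        theta0 -= alpha * sum(errors) / m
        theta1 -= alpha * sum(e * x for e, x in zip(errors, xs)) / m
    return theta0, theta1

# On the y = x dataset this converges close to theta0 = 0, theta1 = 1.
print(gradient_descent([1, 2, 3], [1, 2, 3]))
```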

Cost Function Example and Better Understanding

So we have something called the hypothesis, and we also have a function to minimize. What are these for?
Consider a popular example: suppose we have a cost function to minimize-

Now consider we have a dataset like the following-

For each value of x, the value of y is the same: (1,1), (2,2), (3,3).
So we have 3 training examples.
        ·        Consider Ɵ1 = 1.
The graph will then look exactly like the straight line above.

With Ɵ1 = 1, the expected y for x = 1 is 1, which matches the actual value; the cost function subtracts each predicted value of y from the actual value (and squares the difference).

          ·        Now consider Ɵ1 = 0.5. Each prediction is only half the actual value, so the cost is no longer zero.

So the best possible value of Ɵ1 is 1, because then the predicted and actual values are the same, and that is exactly what the cost function should derive.

If J(θ0, θ1) = 0, the line defined by the equation y = θ0 + θ1x fits all of our data perfectly.
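The Ɵ1 = 1 versus Ɵ1 = 0.5 comparison can be worked through directly with the squared-error cost (Ɵ0 is taken as 0, as in the example):

```python
data = [(1, 1), (2, 2), (3, 3)]  # the three training examples above

def cost(theta1, data):
    # J(theta1) with theta0 fixed at 0: (1/2m) * sum((theta1*x - y)^2)
    m = len(data)
    return sum((theta1 * x - y) ** 2 for x, y in data) / (2 * m)

print(cost(1.0, data))  # 0.0   -> perfect fit
print(cost(0.5, data))  # ~0.58 -> predictions fall short, so the cost rises
```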
