Part 1 - Machine Learning 1
Machine Learning Basics
TRAINING SET
Consider the property example of a supervised learning algorithm: given an input x, the size of a house in square feet, the hypothesis h produces a real-valued output.
Representation of h (hypothesis / response surface)
From the above, h can be taken to be a linear function, because the property problem can be modelled by a linear equation. The property model above uses a linear function for h, of the form h(x) = θ0 + θ1x.
This model is known as linear regression with one variable, or univariate linear regression.
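As a minimal sketch of the univariate hypothesis described above, assuming the form h(x) = θ0 + θ1x, the Python function below evaluates h for a given house size. The function name `hypothesis` and the parameter values are hypothetical, chosen only for illustration.

```python
def hypothesis(x, theta0, theta1):
    """Univariate linear hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters, not fitted to any real data.
theta0, theta1 = 50.0, 0.5
size_sq_ft = 1000.0

predicted_price = hypothesis(size_sq_ft, theta0, theta1)
print(predicted_price)  # 550.0
```

Learning then amounts to choosing theta0 and theta1 so that the predictions match the training set as closely as possible.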
Cost Function
The cost function tells us how to fit the best possible straight line to our data.
In the example above, given a number of data points, we need to work out exactly where the straight line should be drawn; otherwise it will be hard to get the right result.
So for each candidate line we ask: for any value of x, what value does the function return? That value is the predicted y.
1. For any value of x, the value of y remains constant: θ1 is 0, so θ1x becomes 0 and x has no multiplier. For any value of x, the value of y is just 1.5, with no change throughout. h(x) is simply the value of y for a given x, and since the x term has vanished, y stays constant. The property price would never change as the plot size increases.
2. For any value of x, the value of y is half of x. If y tracked x one to one, y would be 1 at x = 1; here, for any value of x, y is only half of x, so the property price does not rise as quickly as the size of the property grows.
3. The value of y is given as 1.5, and this time y depends on x, so the property price also increases as the size of the property increases.
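The three cases above can be sketched in Python, assuming the hypothesis h(x) = θ0 + θ1x. The first two parameter pairs follow the text (a flat line at 1.5, and y equal to half of x). The third pair is purely an assumption, since the text does not pin down that line's parameters; it is chosen so that y is 1.5 at x = 1 and grows with x.

```python
def hypothesis(x, theta0, theta1):
    """Univariate linear hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# (theta0, theta1) for each case; the third pair is an assumption.
cases = {
    "1. flat line": (1.5, 0.0),
    "2. half of x": (0.0, 0.5),
    "3. offset plus slope (assumed)": (1.0, 0.5),
}

sizes = [0, 1, 2, 3]
predictions = {}
for name, (theta0, theta1) in cases.items():
    predictions[name] = [hypothesis(x, theta0, theta1) for x in sizes]
    print(name, predictions[name])
```

Only the first line ignores x entirely; the other two give larger predictions for larger properties.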
UNDERSTANDING COST FUNCTION
The cost function measures how well the predicted response performs: the lower the cost, the better the predictions. That makes it an extremely important topic.
A supervised learning problem can be divided into three components:
1. A collection of n feature vectors, each p-dimensional. For example, the details of a property include plot size, number, construction size, location and so on, and each attribute is one dimension.
{xi}, i = 1 … n
2. A collection of observed responses. In our case, the property price is the observed response.
{yi}, i = 1 … n
3. A predicted output, that is, the response surface or hypothesis.
h(x)
The data sets above hold the plot details of each property. Here X has p dimensions, and each dimension can be a number, name, location, size and so on. Y is the response: a number or some other defined output attribute.
The hypothesis h(x) is then used to draw a conclusion about y.
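The three components can be laid out as plain Python data. This is only a sketch: the records, prices, and the toy hypothesis `h` are all made up for illustration and are not taken from the original data set.

```python
# Made-up property records: (plot size in sq ft, construction size in sq ft, location).
X = [
    (1200.0, 900.0, "north"),
    (1500.0, 1100.0, "south"),
    (800.0, 600.0, "east"),
]
# Made-up observed prices, one per record.
y = [250.0, 310.0, 170.0]

n = len(X)     # number of examples
p = len(X[0])  # number of dimensions per feature vector
assert len(y) == n, "each feature vector needs one observed response"


def h(features):
    """Hypothetical hypothesis: price depends linearly on plot size only."""
    plot_size = features[0]
    return 10.0 + 0.2 * plot_size


predictions = [h(x_i) for x_i in X]
print(n, p, predictions)
```

A non-numeric dimension such as location would need to be encoded as numbers before a linear hypothesis could use it; the toy `h` above simply ignores it.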
· So here we are trying to find a rule or function that predicts a nearly correct value for almost every input x.
This is where the cost function comes to the rescue: we need to measure how well the hypothesis fits the input data.
Suppose we have a set of training examples.
The blue curve is the hypothesis we arrive at by using the cost function.
A second hypothesis, with a different cost, is shown in the 2nd drawing, marked in red.
The purpose of the cost function is essentially to measure how well h(x) fits the data.
From the 2nd diagram it is very clear that the hypothesis with cost J1 is much better than the one with cost J2, because it passes through almost all of the data points, so its predicted values h(x) will naturally be better. Hence J2 > J1.
· The smaller the value of the cost J, the better the hypothesis.
· The main goal of machine learning here is to choose h(x) so that the value of J is as small as possible.
So the main characteristic of a cost function is that it has the form
J(yi, h(xi))
It takes the observed response: yi
It takes the predicted response: h(xi)
and it measures how far apart the two are.
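One common choice of J is the squared-error cost, which averages the squared gap between h(xi) and yi. The text does not spell out a formula, so the sketch below assumes that form; the data and the two hypotheses `h_close` and `h_far` are made up to mirror the J1 versus J2 comparison.

```python
def cost(xs, ys, h):
    """Assumed squared-error cost: J = (1 / (2n)) * sum((h(x_i) - y_i) ** 2)."""
    n = len(xs)
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


# Made-up training data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]


def h_close(x):
    """Hypothesis that passes near most of the points."""
    return x


def h_far(x):
    """Flat hypothesis that ignores x."""
    return 1.5


j1 = cost(xs, ys, h_close)
j2 = cost(xs, ys, h_far)
print(j1, j2, j2 > j1)
```

The hypothesis that stays close to the data gets the smaller cost, which is why minimizing J is a reasonable way to pick h.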
"Gradient Descent is actually the method to minimize cost function of multiple variables"
Cost Function Example and Better Understanding
So we have a hypothesis on one hand and a cost function to minimize on the other. What are they for?
Consider a popular example in which we want to minimize a cost function.
Now consider a data set like the following:
For each value of x, the value of y is the same: (1:1, 2:2, 3:3).
So we have 3 data points.
· Suppose we take θ1 = 1.
The graph of the hypothesis is then exactly the straight line shown above.
With θ1 = 1, the prediction at x = 1 is θ1x = 1, and the expected y is also 1. The cost is built from the difference between the predicted value of y and the actual value, which is zero here.
· Now suppose we take θ1 = 0.5.
The predictions no longer match the actual values, so the best possible value of θ1 is 1: the actual and predicted values are then the same, and that is the value minimizing the cost function should find.
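The two choices of θ1 can be checked numerically. The sketch below assumes the common squared-error cost with θ0 fixed at 0, that is J(θ1) = (1/(2m)) Σ (θ1·xi − yi)²; the text does not give the formula, so treat this form as an assumption.

```python
# The data set from the text: (1:1, 2:2, 3:3).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]


def cost(theta1):
    """Assumed cost J(theta1) = (1 / (2m)) * sum((theta1 * x - y) ** 2), with theta0 = 0."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


for theta1 in (1.0, 0.5, 0.0):
    print(theta1, cost(theta1))
```

For θ1 = 1 every difference is zero, so the cost is 0. For θ1 = 0.5 the squared differences are 0.25, 1 and 2.25, giving 3.5 / 6 ≈ 0.583.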
If J(θ0, θ1) = 0, the line defined by the equation "y = θ0 + θ1x" fits all of our data perfectly.
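Gradient descent, quoted earlier as a method to minimize the cost function, can be sketched on the same three-point data set. This is a minimal batch version under stated assumptions: a squared-error cost, and a learning rate `alpha` and step count `steps` that are hypothetical values picked so the loop converges on this data.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=5000):
    """Minimize the assumed squared-error cost over theta0 and theta1."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters together.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

theta0, theta1 = gradient_descent(xs, ys)
final_cost = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))
print(theta0, theta1, final_cost)
```

On this data the parameters approach θ0 = 0 and θ1 = 1, and the cost approaches 0, matching the perfect-fit case described above.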