Monday, October 27, 2014

Some interesting things about Regression

Residuals


Once we fit a linear regression model, we get the residuals as part of its summary. While digging into residuals I accidentally found some interesting points that helped me, so I am drafting them here ...


  • Residuals have mean zero, which means they are balanced across the data points: there is no pattern, they are just scattered, with roughly as many positive values as negative ones.


So if I run the linear regression in R:

fit <- lm(relation ~ person, data = people)

then to verify the theory, just take the simple mean of the residuals:

mean(fit$residuals)   # must give a value very close to zero
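
To see this on data you can actually run, here is a minimal sketch using R's built-in mtcars dataset (the relation ~ person formula and the people data frame above are just illustrative names):

# fit a simple linear regression on built-in data
fit <- lm(mpg ~ wt, data = mtcars)

# the mean of the residuals is zero up to floating-point error
mean(fit$residuals)   # on the order of 1e-17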





  • There is no correlation between the residuals and the predictors; equivalently, their covariance is zero:

cov(fit$residuals, people$person)
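
A runnable version of the same check, again on the built-in mtcars data (an assumed stand-in, since people is hypothetical):

fit <- lm(mpg ~ wt, data = mtcars)

# covariance between residuals and the predictor is zero
# up to floating-point error
cov(fit$residuals, mtcars$wt)   # effectively 0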



While googling I found a new identity (it holds whenever the model includes an intercept) -


  • var(outcome) = var(fitted values) + var(residuals)
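
As a quick sanity check of this identity in R, again on the built-in mtcars data:

fit <- lm(mpg ~ wt, data = mtcars)

var(mtcars$mpg)                               # total variance of the outcome
var(fit$fitted.values) + var(fit$residuals)   # gives the same number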


Least Squares

The regression line is the line through the data that has the minimum (least) squared 'error': the sum of the squared vertical distances between the actual outcomes and the predictions made by the line.
Squaring the distances ensures the data points above and below the line are treated the same.

The method of choosing the 'best regression line' (or fitting a line to the data) is known as ordinary least squares.
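
For a single predictor, the OLS line even has a closed form, so you can compute it by hand and compare it with what lm gives. A small sketch, once more using mtcars as assumed example data:

x <- mtcars$wt
y <- mtcars$mpg

slope     <- cov(x, y) / var(x)         # beta1 = Cov(x, y) / Var(x)
intercept <- mean(y) - slope * mean(x)  # the line passes through (mean(x), mean(y))

c(intercept, slope)
coef(lm(y ~ x))                         # matches the closed-form values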




