Linear Regression or Regression with Multiple Covariates


Believe me, these are extremely easy to understand, and R already has these algorithms implemented; you just need to know how to use them :)

Let's say we have values X and Y. In simple words:

*Linear Regression* is a way to model the relationship between X and Y, that's all :-). If we have X1...Xn and Y, then the relationship between them is *Multiple Linear Regression*. Linear Regression is a very widely used Machine Learning algorithm because models which depend linearly on their unknown parameters are *easier to fit*.

Uses of Linear Regression ~

- Prediction: after developing a Linear Regression model, we can predict the value of Y for any new value of X (based on the model built from a previous set of data).

- Relationship analysis: for a given Y, if we are provided with multiple X's like X1...Xn, this technique can be used to find the relationship between each X and Y, so we can find which X has the weakest relationship with Y and which has the strongest.
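A minimal sketch of both uses, with base R's `lm()` on made-up data. The data frame `toy` and its columns `x1`, `x2` and `y` are hypothetical and have nothing to do with the population dataset used later:

```r
# Hypothetical toy data: y depends linearly on x1 and x2, plus noise
set.seed(42)
toy <- data.frame(x1 = 1:50, x2 = runif(50, 0, 10))
toy$y <- 3 + 2 * toy$x1 - 1.5 * toy$x2 + rnorm(50, sd = 1)

# Simple linear regression: one X
simple_fit <- lm(y ~ x1, data = toy)

# Multiple linear regression: several X's
multi_fit <- lm(y ~ x1 + x2, data = toy)

# Prediction for new values of X, based on the fitted model
new_obs <- data.frame(x1 = 60, x2 = 5)
predicted_y <- predict(multi_fit, newdata = new_obs)

# The coefficient table hints at how strongly each X relates to Y
coef_table <- summary(multi_fit)$coefficients
print(coef_table)
```

Each row of `coef_table` is one term of the model, with its estimate, standard error, t value and p-value.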

The only reason I went through the theory above is so that I can remember the basics; the rest is all easy :).

------------------------------------------------------------------

So now I'd like to do an example in R, and the best dataset I could find was population.

Talking about population, how can I miss India? So somehow I managed to get a dataset -

Above is just a snapshot of the data; I had data from 1700 to 2014, and yes, some missing values in between as well.

To do this in R: the caret package already has an implementation of regression, so load it, and for plotting I am using ggplot.

The first thing to do after getting the data is exploratory analysis. Well, I have 2 fields and no time :), so just a quick plot -
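A quick plot like this could be sketched as follows. The data frame here is a made-up stand-in, since the real dataset is not reproduced in this post; only the column names `year` and `population` come from the code and model output further down.

```r
library(ggplot2)

# Hypothetical stand-in for the real dataset, with the same column names
data <- data.frame(
  year = seq(1900, 2010, by = 10),
  population = seq(200, 1300, length.out = 12)  # invented values
)

# Scatter plot of population against year
p <- ggplot(data, aes(x = year, y = population)) +
  geom_point() +
  labs(x = "Year", y = "Population")
print(p)
```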

Looking great: it's growing... growing... and growing, so it's real data.

So first things first: split the data into 2 parts, training and testing.

```
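library(caret)  # provides createDataPartition() and train()
# Pick row indices for a 70% training sample; the remaining 30% is for testing
allTrainData <- createDataPartition(y=data$population,p=0.7,list=FALSE)
training <- data[allTrainData,]
testing <- data[-allTrainData,]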
```

So now I have X and Y; simply put, I want to find the population based on the year, or vice versa.

Don't worry, R has the caret package, which already provides an implementation of the linear regression algorithm.

For the formula behind it, please check my other blog post on the details of Linear Modelling, but here -


```
model <- train(population~.,method="lm",data=training)
finalModel <- model$finalModel
```

One line, that's all: method="lm". Isn't it extraordinary :)

So here is the summary -

```
Call:
lm(formula = .outcome ~ ., data = dat)

Residuals:
    Min      1Q  Median      3Q     Max
-186364 -164118  -83667  106876  811176

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -6516888     668533  -9.748 4.69e-16 ***
year            3616        346  10.451  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 216300 on 97 degrees of freedom
Multiple R-squared:  0.5296,  Adjusted R-squared:  0.5248
F-statistic: 109.2 on 1 and 97 DF,  p-value: < 2.2e-16
```
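One way to read the coefficient table: the fitted line is population = intercept + slope × year. As a rough sketch using the rounded estimates printed in the summary (so the result is only approximate, and in whatever units the dataset uses), the model's value for the year 2000 works out like this:

```r
# Rounded estimates copied from the summary output
intercept <- -6516888
slope_year <- 3616

# Fitted value for a given year: intercept + slope * year
year <- 2000
fitted_population <- intercept + slope_year * year
print(fitted_population)  # 715112
```

The slope says the fitted population rises by about 3616 units per year, and the Multiple R-squared of 0.5296 says the straight line explains roughly 53% of the variance in the training data.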

For details about the summary, see: Linear Regression Model Summary

So now let's plot the fitted vs. residuals graph and see how well the model worked.
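A fitted vs. residuals plot can be sketched with base R as below. This is an illustration on made-up data fitted with plain `lm()`, not the actual plot from this post; since caret's `finalModel` for `method="lm"` is an `lm` object, it could take the place of the hypothetical `fit` here.

```r
# Hypothetical data standing in for the training set
set.seed(1)
toy <- data.frame(year = 1900:1999)
toy$population <- 100 + 5 * (toy$year - 1900) + rnorm(100, sd = 20)

fit <- lm(population ~ year, data = toy)

# Residual = observed value minus fitted value
fitted_values <- fitted(fit)
residual_values <- resid(fit)

# Points scattered evenly around the zero line suggest a reasonable fit;
# a curve or funnel shape suggests the straight line is missing something
plot(fitted_values, residual_values,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```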

Somewhat weird :) but at least the line went through almost all the data.

Now, how well did the model work -

Well, my data is weird anyway, so seriously, it worked pretty well, believe me :)

So now that we have the model, we should try it on the testing dataset, and that's straightforward as well -

```
pred <- predict(model,testing)
```
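To put a number on how the predictions did, one common choice (not something computed in the original run) is the root mean squared error between the predictions and the actual test values. A self-contained sketch on made-up data, with base `lm()` and `sample()` standing in for the caret model and split:

```r
# Hypothetical data and a simple 70/30 split (base R, no caret)
set.seed(7)
toy <- data.frame(year = 1900:1999)
toy$population <- 100 + 5 * (toy$year - 1900) + rnorm(100, sd = 20)
train_rows <- sample(nrow(toy), size = 70)
training <- toy[train_rows, ]
testing <- toy[-train_rows, ]

fit <- lm(population ~ year, data = training)
pred <- predict(fit, newdata = testing)

# Root mean squared error: typical size of a prediction error,
# in the same units as population
rmse <- sqrt(mean((pred - testing$population)^2))
print(rmse)
```

A smaller value means the predictions sit closer to the held-out data; comparing it with the residual standard error on the training data gives a rough idea of whether the model generalizes.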

Disclaimer - I am not a PhD holder or a data scientist; what I do is out of personal interest and for learning, so this may contain some serious mistakes :)
