Acknowledgment: the materials below are partially based on Montgomery, D. C., Peck, E. A., Vining, G. G., Introduction to Linear Regression Analysis (5th Edition), Wiley Series in Probability and Statistics, 2012. This material was initiated by Yichen Qin and modified by Tianhai Zu for teaching purposes.
Let us start with a data set that concerns the hardening of cement. In particular, the researchers were interested in learning how the composition of the cement affected the heat evolved during the hardening of the cement. Therefore, they measured and recorded the following data on 13 batches of cement:
Response y: heat evolved in calories during hardening of cement on a per gram basis
Predictor x1: % of tricalcium aluminate
Predictor x2: % of tricalcium silicate
Predictor x3: % of tetracalcium alumino ferrite
Predictor x4: % of dicalcium silicate
rm(list=ls())
hald <- read.csv("data_HaldCement.csv",h=TRUE)
head(hald)
## y x1 x2 x3 x4
## 1 78.5 7 26 6 60
## 2 74.3 1 29 15 52
## 3 104.3 11 56 8 20
## 4 87.6 11 31 8 47
## 5 95.9 7 52 6 33
## 6 109.2 11 55 9 22
pairs(hald,pch=20)
For this data set, if you want to build a model to predict y, which variables do you want to include in the regression?
summary(lm(y~x1,data=hald))
##
## Call:
## lm(formula = y ~ x1, data = hald)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.061 -9.048 1.339 7.883 15.614
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.4793 4.9273 16.54 4.07e-09 ***
## x1 1.8687 0.5264 3.55 0.00455 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.73 on 11 degrees of freedom
## Multiple R-squared: 0.5339, Adjusted R-squared: 0.4916
## F-statistic: 12.6 on 1 and 11 DF, p-value: 0.004552
summary(lm(y~x2,data=hald))
##
## Call:
## lm(formula = y ~ x2, data = hald)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.752 -6.008 -1.684 3.794 21.387
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.4237 8.4906 6.763 3.1e-05 ***
## x2 0.7891 0.1684 4.686 0.000665 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.077 on 11 degrees of freedom
## Multiple R-squared: 0.6663, Adjusted R-squared: 0.6359
## F-statistic: 21.96 on 1 and 11 DF, p-value: 0.0006648
summary(lm(y~x3,data=hald))
##
## Call:
## lm(formula = y ~ x3, data = hald)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.168 -10.075 4.144 10.299 14.399
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 110.2027 7.9478 13.866 2.6e-08 ***
## x3 -1.2558 0.5984 -2.098 0.0598 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.28 on 11 degrees of freedom
## Multiple R-squared: 0.2859, Adjusted R-squared: 0.221
## F-statistic: 4.403 on 1 and 11 DF, p-value: 0.05976
summary(lm(y~x4,data=hald))
##
## Call:
## lm(formula = y ~ x4, data = hald)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.589 -8.228 1.495 4.726 17.524
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 117.5679 5.2622 22.342 1.62e-10 ***
## x4 -0.7382 0.1546 -4.775 0.000576 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.964 on 11 degrees of freedom
## Multiple R-squared: 0.6745, Adjusted R-squared: 0.645
## F-statistic: 22.8 on 1 and 11 DF, p-value: 0.0005762
summary(lm(y~x1+x2+x3+x4,data=hald))
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4, data = hald)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1750 -1.6709 0.2508 1.3783 3.9254
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.4054 70.0710 0.891 0.3991
## x1 1.5511 0.7448 2.083 0.0708 .
## x2 0.5102 0.7238 0.705 0.5009
## x3 0.1019 0.7547 0.135 0.8959
## x4 -0.1441 0.7091 -0.203 0.8441
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.446 on 8 degrees of freedom
## Multiple R-squared: 0.9824, Adjusted R-squared: 0.9736
## F-statistic: 111.5 on 4 and 8 DF, p-value: 4.756e-07
Obviously, a model with only one covariate is too weak for prediction, yet in the model with all covariates included, every coefficient has an insignificant p-value. So what is the best model?
Model-Building Problem
“Conflicting” goals in regression model building:
Want as many regressors as possible so that the “information content” in the variables will influence the fitted values
Want as few regressors as necessary because the variance of parameter estimates and fitted values will increase as the number of regressors increases and more regressors can cost more money in data collection/model maintenance
Need to find a compromise that leads to the best regression equation
Notes on Variable Selection Techniques
None of the variable selection techniques can guarantee the best regression equation for the dataset of interest
The techniques may very well give different results
Complete reliance on the algorithm for results is to be avoided
Other valuable information such as experience with and knowledge of the data and problem should be utilized
Consequences of Deleting Variables
Improves the precision of the parameter estimates of retained variables
Reduces the variance of the predicted response
Can induce bias into the estimates of coefficients and variance of predicted response
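To make these consequences concrete, here is a minimal simulation sketch (not from the original notes; the coefficients and correlation structure are made up for illustration): deleting a relevant regressor x2 that is correlated with x1 shrinks the standard error of the retained coefficient but biases its estimate.
# Simulate two correlated covariates, both of which affect y
set.seed(1)
n <- 50
x1 <- rnorm(n)
x2 <- 0.8*x1 + rnorm(n, sd=0.6)   # x2 is correlated with x1
y <- 1 + 2*x1 + 1.5*x2 + rnorm(n)
full <- lm(y ~ x1 + x2)           # correctly specified model
reduced <- lm(y ~ x1)             # x2 deleted
summary(full)$coefficients["x1",]     # estimate near 2, larger std. error
summary(reduced)$coefficients["x1",]  # smaller std. error, but biased
# omitted-variable bias: roughly 2 + 1.5*0.8 = 3.2 on average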
What if some of the covariates need transformation? The sections below lay out a strategy for regression model building.
All Possible Regressions
Assume the intercept term is in all equations
If there are \(k\) regressors, we would investigate \(2^k\) possible regression equations
Use the criteria discussed to determine some candidate models
Complete regression analysis on candidate models
Criteria for variable selection
Adjusted R squared (or Prediction R squared or PRESS)
AIC or BIC
Stepwise regression
The coefficient of determination \(R^2\) is defined as
\[R^2= 1 - \frac{SSRes}{SST}\]
Large values of \(R^2\) are preferred; however, adding covariates can only increase this value.
It is therefore not a good criterion for variable selection on its own.
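A quick sketch of this point on the Hald data (the noise variable is made up): adding a pure-noise covariate can only increase \(R^2\).
noise <- rnorm(nrow(hald))
summary(lm(y ~ x1, data=hald))$r.squared          # 0.5339, as above
summary(lm(y ~ x1 + noise, data=hald))$r.squared  # slightly larger, despite noise being irrelevant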
For a model with \(k\) covariates, the adjusted R squared is
\[R^2_{adj}= 1- \frac{n-1}{n-k-1}\bigg(\frac{SSRes}{SST}\bigg)=1 - \frac{n-1}{n-k-1}(1-R^2)\]
This value will not necessarily increase as additional terms are introduced into the model
Large values of adjusted \(R^2\) are preferred
Adjusted \(R^2\) is therefore a good criterion for variable selection.
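As a check, here is a sketch that computes adjusted \(R^2\) by hand for the full Hald model, along with the PRESS statistic mentioned above (computed from the leverages, so no explicit refitting is needed):
fit <- lm(y ~ x1 + x2 + x3 + x4, data=hald)
n <- nrow(hald); k <- 4
SSRes <- sum(resid(fit)^2)
SST <- sum((hald$y - mean(hald$y))^2)
1 - (n-1)/(n-k-1)*(SSRes/SST)   # matches summary(fit)$adj.r.squared
PRESS <- sum((resid(fit)/(1 - hatvalues(fit)))^2)
1 - PRESS/SST                   # prediction R squared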
Example
library(leaps)
hald_allmodels_r2 <- leaps( x=hald[,-1], y=hald[,1], method="r2")
hald_allmodels_r2
## $which
## 1 2 3 4
## 1 FALSE FALSE FALSE TRUE
## 1 FALSE TRUE FALSE FALSE
## 1 TRUE FALSE FALSE FALSE
## 1 FALSE FALSE TRUE FALSE
## 2 TRUE TRUE FALSE FALSE
## 2 TRUE FALSE FALSE TRUE
## 2 FALSE FALSE TRUE TRUE
## 2 FALSE TRUE TRUE FALSE
## 2 FALSE TRUE FALSE TRUE
## 2 TRUE FALSE TRUE FALSE
## 3 TRUE TRUE FALSE TRUE
## 3 TRUE TRUE TRUE FALSE
## 3 TRUE FALSE TRUE TRUE
## 3 FALSE TRUE TRUE TRUE
## 4 TRUE TRUE TRUE TRUE
##
## $label
## [1] "(Intercept)" "1" "2" "3" "4"
##
## $size
## [1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5
##
## $r2
## [1] 0.6745420 0.6662683 0.5339480 0.2858727 0.9786784 0.9724710 0.9352896
## [8] 0.8470254 0.6800604 0.5481667 0.9823355 0.9822847 0.9812811 0.9728200
## [15] 0.9823756
cbind(hald_allmodels_r2$which, Rsq = hald_allmodels_r2$r2)
## 1 2 3 4 Rsq
## 1 0 0 0 1 0.6745420
## 1 0 1 0 0 0.6662683
## 1 1 0 0 0 0.5339480
## 1 0 0 1 0 0.2858727
## 2 1 1 0 0 0.9786784
## 2 1 0 0 1 0.9724710
## 2 0 0 1 1 0.9352896
## 2 0 1 1 0 0.8470254
## 2 0 1 0 1 0.6800604
## 2 1 0 1 0 0.5481667
## 3 1 1 0 1 0.9823355
## 3 1 1 1 0 0.9822847
## 3 1 0 1 1 0.9812811
## 3 0 1 1 1 0.9728200
## 4 1 1 1 1 0.9823756
hald_allmodels_r2adj <- leaps( x=hald[,-1], y=hald[,1], method="adjr2")
hald_allmodels_r2adj
## $which
## 1 2 3 4
## 1 FALSE FALSE FALSE TRUE
## 1 FALSE TRUE FALSE FALSE
## 1 TRUE FALSE FALSE FALSE
## 1 FALSE FALSE TRUE FALSE
## 2 TRUE TRUE FALSE FALSE
## 2 TRUE FALSE FALSE TRUE
## 2 FALSE FALSE TRUE TRUE
## 2 FALSE TRUE TRUE FALSE
## 2 FALSE TRUE FALSE TRUE
## 2 TRUE FALSE TRUE FALSE
## 3 TRUE TRUE FALSE TRUE
## 3 TRUE TRUE TRUE FALSE
## 3 TRUE FALSE TRUE TRUE
## 3 FALSE TRUE TRUE TRUE
## 4 TRUE TRUE TRUE TRUE
##
## $label
## [1] "(Intercept)" "1" "2" "3" "4"
##
## $size
## [1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5
##
## $adjr2
## [1] 0.6449549 0.6359290 0.4915797 0.2209521 0.9744140 0.9669653 0.9223476
## [8] 0.8164305 0.6160725 0.4578001 0.9764473 0.9763796 0.9750415 0.9637599
## [15] 0.9735634
cbind(hald_allmodels_r2adj$which, adjRsq = hald_allmodels_r2adj$adjr2)
## 1 2 3 4 adjRsq
## 1 0 0 0 1 0.6449549
## 1 0 1 0 0 0.6359290
## 1 1 0 0 0 0.4915797
## 1 0 0 1 0 0.2209521
## 2 1 1 0 0 0.9744140
## 2 1 0 0 1 0.9669653
## 2 0 0 1 1 0.9223476
## 2 0 1 1 0 0.8164305
## 2 0 1 0 1 0.6160725
## 2 1 0 1 0 0.4578001
## 3 1 1 0 1 0.9764473
## 3 1 1 1 0 0.9763796
## 3 1 0 1 1 0.9750415
## 3 0 1 1 1 0.9637599
## 4 1 1 1 1 0.9735634
According to adjusted R squared, the model in row "3 1 1 0 1" (adjusted \(R^2 = 0.9764\)) is the best. That is, x1, x2, and x4 should be included in the regression.
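Following the advice above, we would then complete a regression analysis on this candidate model (output omitted here):
summary(lm(y ~ x1 + x2 + x4, data=hald))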
Another example:
asphalt <- read.csv("data_Asphalt.csv",header=TRUE)
head(asphalt)
## Observation y x1 x2 x3 x4 x5 x6
## 1 1 6.75 2.8 4.68 4.87 0 8.4 4.916
## 2 2 13.00 1.4 5.19 4.50 0 6.5 4.563
## 3 3 14.75 1.4 4.82 4.73 0 7.9 5.321
## 4 4 12.60 3.3 4.85 4.76 0 8.3 4.865
## 5 5 8.25 1.7 4.86 4.95 0 8.4 3.776
## 6 6 10.67 2.9 5.16 4.45 0 7.4 4.397
pairs(asphalt,pch=20)
asphalt_allmodels_r2 <- leaps(x=asphalt[,3:8],y=asphalt[,2], method="r2")
asphalt_allmodels_r2
## $which
## 1 2 3 4 5 6
## 1 FALSE FALSE FALSE TRUE FALSE FALSE
## 1 TRUE FALSE FALSE FALSE FALSE FALSE
## 1 FALSE FALSE FALSE FALSE FALSE TRUE
## 1 FALSE FALSE FALSE FALSE TRUE FALSE
## 1 FALSE TRUE FALSE FALSE FALSE FALSE
## 1 FALSE FALSE TRUE FALSE FALSE FALSE
## 2 FALSE TRUE FALSE TRUE FALSE FALSE
## 2 FALSE FALSE TRUE TRUE FALSE FALSE
## 2 FALSE FALSE FALSE TRUE FALSE TRUE
## 2 FALSE FALSE FALSE TRUE TRUE FALSE
## 2 TRUE FALSE FALSE TRUE FALSE FALSE
## 2 TRUE TRUE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE TRUE
## 2 TRUE FALSE FALSE FALSE TRUE FALSE
## 2 TRUE FALSE TRUE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE TRUE TRUE
## 3 FALSE TRUE TRUE TRUE FALSE FALSE
## 3 FALSE TRUE FALSE TRUE TRUE FALSE
## 3 FALSE TRUE FALSE TRUE FALSE TRUE
## 3 FALSE FALSE TRUE TRUE FALSE TRUE
## 3 TRUE TRUE FALSE TRUE FALSE FALSE
## 3 FALSE FALSE FALSE TRUE TRUE TRUE
## 3 FALSE FALSE TRUE TRUE TRUE FALSE
## 3 TRUE FALSE TRUE TRUE FALSE FALSE
## 3 TRUE FALSE FALSE TRUE FALSE TRUE
## 3 TRUE FALSE FALSE TRUE TRUE FALSE
## 4 FALSE TRUE TRUE TRUE TRUE FALSE
## 4 FALSE TRUE TRUE TRUE FALSE TRUE
## 4 TRUE TRUE TRUE TRUE FALSE FALSE
## 4 FALSE TRUE FALSE TRUE TRUE TRUE
## 4 TRUE TRUE FALSE TRUE TRUE FALSE
## 4 FALSE FALSE TRUE TRUE TRUE TRUE
## 4 TRUE TRUE FALSE TRUE FALSE TRUE
## 4 TRUE FALSE TRUE TRUE FALSE TRUE
## 4 TRUE FALSE FALSE TRUE TRUE TRUE
## 4 TRUE FALSE TRUE TRUE TRUE FALSE
## 5 FALSE TRUE TRUE TRUE TRUE TRUE
## 5 TRUE TRUE TRUE TRUE TRUE FALSE
## 5 TRUE TRUE TRUE TRUE FALSE TRUE
## 5 TRUE TRUE FALSE TRUE TRUE TRUE
## 5 TRUE FALSE TRUE TRUE TRUE TRUE
## 5 TRUE TRUE TRUE FALSE TRUE TRUE
## 6 TRUE TRUE TRUE TRUE TRUE TRUE
##
## $label
## [1] "(Intercept)" "1" "2" "3" "4"
## [6] "5" "6"
##
## $size
## [1] 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6
## [39] 6 6 6 6 7
##
## $r2
## [1] 0.65755827 0.31095511 0.17970996 0.07244760 0.05871311 0.04418471
## [7] 0.67988500 0.67054203 0.66837794 0.66422324 0.65895778 0.34851619
## [13] 0.34222182 0.33205132 0.32205656 0.28241387 0.70161258 0.69572645
## [19] 0.68288526 0.68193516 0.68110172 0.68035604 0.67189893 0.67096105
## [25] 0.66840245 0.66507011 0.70709701 0.70382967 0.70175221 0.70146515
## [31] 0.69616264 0.68625513 0.68320263 0.68215000 0.68062906 0.67225423
## [37] 0.71103759 0.70715638 0.70384365 0.70150575 0.68682460 0.41732348
## [43] 0.71125649
cbind(asphalt_allmodels_r2$which,Rsq=asphalt_allmodels_r2$r2)
## 1 2 3 4 5 6 Rsq
## 1 0 0 0 1 0 0 0.65755827
## 1 1 0 0 0 0 0 0.31095511
## 1 0 0 0 0 0 1 0.17970996
## 1 0 0 0 0 1 0 0.07244760
## 1 0 1 0 0 0 0 0.05871311
## 1 0 0 1 0 0 0 0.04418471
## 2 0 1 0 1 0 0 0.67988500
## 2 0 0 1 1 0 0 0.67054203
## 2 0 0 0 1 0 1 0.66837794
## 2 0 0 0 1 1 0 0.66422324
## 2 1 0 0 1 0 0 0.65895778
## 2 1 1 0 0 0 0 0.34851619
## 2 1 0 0 0 0 1 0.34222182
## 2 1 0 0 0 1 0 0.33205132
## 2 1 0 1 0 0 0 0.32205656
## 2 0 0 0 0 1 1 0.28241387
## 3 0 1 1 1 0 0 0.70161258
## 3 0 1 0 1 1 0 0.69572645
## 3 0 1 0 1 0 1 0.68288526
## 3 0 0 1 1 0 1 0.68193516
## 3 1 1 0 1 0 0 0.68110172
## 3 0 0 0 1 1 1 0.68035604
## 3 0 0 1 1 1 0 0.67189893
## 3 1 0 1 1 0 0 0.67096105
## 3 1 0 0 1 0 1 0.66840245
## 3 1 0 0 1 1 0 0.66507011
## 4 0 1 1 1 1 0 0.70709701
## 4 0 1 1 1 0 1 0.70382967
## 4 1 1 1 1 0 0 0.70175221
## 4 0 1 0 1 1 1 0.70146515
## 4 1 1 0 1 1 0 0.69616264
## 4 0 0 1 1 1 1 0.68625513
## 4 1 1 0 1 0 1 0.68320263
## 4 1 0 1 1 0 1 0.68215000
## 4 1 0 0 1 1 1 0.68062906
## 4 1 0 1 1 1 0 0.67225423
## 5 0 1 1 1 1 1 0.71103759
## 5 1 1 1 1 1 0 0.70715638
## 5 1 1 1 1 0 1 0.70384365
## 5 1 1 0 1 1 1 0.70150575
## 5 1 0 1 1 1 1 0.68682460
## 5 1 1 1 0 1 1 0.41732348
## 6 1 1 1 1 1 1 0.71125649
asphalt_allmodels_r2adj <- leaps(x=asphalt[,3:8],y=asphalt[,2], method="adjr2")
asphalt_allmodels_r2adj
## $which
## 1 2 3 4 5 6
## 1 FALSE FALSE FALSE TRUE FALSE FALSE
## 1 TRUE FALSE FALSE FALSE FALSE FALSE
## 1 FALSE FALSE FALSE FALSE FALSE TRUE
## 1 FALSE FALSE FALSE FALSE TRUE FALSE
## 1 FALSE TRUE FALSE FALSE FALSE FALSE
## 1 FALSE FALSE TRUE FALSE FALSE FALSE
## 2 FALSE TRUE FALSE TRUE FALSE FALSE
## 2 FALSE FALSE TRUE TRUE FALSE FALSE
## 2 FALSE FALSE FALSE TRUE FALSE TRUE
## 2 FALSE FALSE FALSE TRUE TRUE FALSE
## 2 TRUE FALSE FALSE TRUE FALSE FALSE
## 2 TRUE TRUE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE TRUE
## 2 TRUE FALSE FALSE FALSE TRUE FALSE
## 2 TRUE FALSE TRUE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE TRUE TRUE
## 3 FALSE TRUE TRUE TRUE FALSE FALSE
## 3 FALSE TRUE FALSE TRUE TRUE FALSE
## 3 FALSE TRUE FALSE TRUE FALSE TRUE
## 3 FALSE FALSE TRUE TRUE FALSE TRUE
## 3 TRUE TRUE FALSE TRUE FALSE FALSE
## 3 FALSE FALSE FALSE TRUE TRUE TRUE
## 3 FALSE FALSE TRUE TRUE TRUE FALSE
## 3 TRUE FALSE TRUE TRUE FALSE FALSE
## 3 TRUE FALSE FALSE TRUE FALSE TRUE
## 3 TRUE FALSE FALSE TRUE TRUE FALSE
## 4 FALSE TRUE TRUE TRUE TRUE FALSE
## 4 FALSE TRUE TRUE TRUE FALSE TRUE
## 4 TRUE TRUE TRUE TRUE FALSE FALSE
## 4 FALSE TRUE FALSE TRUE TRUE TRUE
## 4 TRUE TRUE FALSE TRUE TRUE FALSE
## 4 FALSE FALSE TRUE TRUE TRUE TRUE
## 4 TRUE TRUE FALSE TRUE FALSE TRUE
## 4 TRUE FALSE TRUE TRUE FALSE TRUE
## 4 TRUE FALSE FALSE TRUE TRUE TRUE
## 4 TRUE FALSE TRUE TRUE TRUE FALSE
## 5 FALSE TRUE TRUE TRUE TRUE TRUE
## 5 TRUE TRUE TRUE TRUE TRUE FALSE
## 5 TRUE TRUE TRUE TRUE FALSE TRUE
## 5 TRUE TRUE FALSE TRUE TRUE TRUE
## 5 TRUE FALSE TRUE TRUE TRUE TRUE
## 5 TRUE TRUE TRUE FALSE TRUE TRUE
## 6 TRUE TRUE TRUE TRUE TRUE TRUE
##
## $label
## [1] "(Intercept)" "1" "2" "3" "4"
## [6] "5" "6"
##
## $size
## [1] 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6
## [39] 6 6 6 6 7
##
## $adjr2
## [1] 0.64574994 0.28719494 0.15142410 0.04046303 0.02625494 0.01122557
## [7] 0.65701964 0.64700932 0.64469065 0.64023918 0.63459762 0.30198163
## [13] 0.29523766 0.28434070 0.27363203 0.23115772 0.66845842 0.66191828
## [19] 0.64765029 0.64659462 0.64566858 0.64484005 0.63544326 0.63440116
## [25] 0.63155828 0.62785567 0.66203501 0.65826501 0.65586793 0.65553671
## [31] 0.64941843 0.63798669 0.63446457 0.63325000 0.63149507 0.62183181
## [37] 0.65324511 0.64858766 0.64461237 0.64180690 0.62418952 0.30078818
## [43] 0.63907062
cbind(asphalt_allmodels_r2adj$which,adjRsq=asphalt_allmodels_r2adj$adjr2)
## 1 2 3 4 5 6 adjRsq
## 1 0 0 0 1 0 0 0.64574994
## 1 1 0 0 0 0 0 0.28719494
## 1 0 0 0 0 0 1 0.15142410
## 1 0 0 0 0 1 0 0.04046303
## 1 0 1 0 0 0 0 0.02625494
## 1 0 0 1 0 0 0 0.01122557
## 2 0 1 0 1 0 0 0.65701964
## 2 0 0 1 1 0 0 0.64700932
## 2 0 0 0 1 0 1 0.64469065
## 2 0 0 0 1 1 0 0.64023918
## 2 1 0 0 1 0 0 0.63459762
## 2 1 1 0 0 0 0 0.30198163
## 2 1 0 0 0 0 1 0.29523766
## 2 1 0 0 0 1 0 0.28434070
## 2 1 0 1 0 0 0 0.27363203
## 2 0 0 0 0 1 1 0.23115772
## 3 0 1 1 1 0 0 0.66845842
## 3 0 1 0 1 1 0 0.66191828
## 3 0 1 0 1 0 1 0.64765029
## 3 0 0 1 1 0 1 0.64659462
## 3 1 1 0 1 0 0 0.64566858
## 3 0 0 0 1 1 1 0.64484005
## 3 0 0 1 1 1 0 0.63544326
## 3 1 0 1 1 0 0 0.63440116
## 3 1 0 0 1 0 1 0.63155828
## 3 1 0 0 1 1 0 0.62785567
## 4 0 1 1 1 1 0 0.66203501
## 4 0 1 1 1 0 1 0.65826501
## 4 1 1 1 1 0 0 0.65586793
## 4 0 1 0 1 1 1 0.65553671
## 4 1 1 0 1 1 0 0.64941843
## 4 0 0 1 1 1 1 0.63798669
## 4 1 1 0 1 0 1 0.63446457
## 4 1 0 1 1 0 1 0.63325000
## 4 1 0 0 1 1 1 0.63149507
## 4 1 0 1 1 1 0 0.62183181
## 5 0 1 1 1 1 1 0.65324511
## 5 1 1 1 1 1 0 0.64858766
## 5 1 1 1 1 0 1 0.64461237
## 5 1 1 0 1 1 1 0.64180690
## 5 1 0 1 1 1 1 0.62418952
## 5 1 1 1 0 1 1 0.30078818
## 6 1 1 1 1 1 1 0.63907062
By the same criterion, the best asphalt model contains x2, x3, and x4 (adjusted \(R^2 = 0.6685\)).
A third example:
n=1000
x1=rnorm(n)
x2=rnorm(n)
x3=rnorm(n)
x4=rnorm(n)
y=1+2*x1+3*x2+rnorm(n)
results=leaps( x=cbind(x1,x2,x3,x4), y=y, method="r2")
cbind(results$which, Rsq=results$r2)
## 1 2 3 4 Rsq
## 1 0 1 0 0 0.6097956074
## 1 1 0 0 0 0.2954778659
## 1 0 0 0 1 0.0012368128
## 1 0 0 1 0 0.0003920018
## 2 1 1 0 0 0.9261756045
## 2 0 1 1 0 0.6103199511
## 2 0 1 0 1 0.6098169039
## 2 1 0 0 1 0.2962985076
## 2 1 0 1 0 0.2955210672
## 2 0 0 1 1 0.0016020574
## 3 1 1 1 0 0.9261944546
## 3 1 1 0 1 0.9261827416
## 3 0 1 1 1 0.6103372699
## 3 1 0 1 1 0.2963492196
## 4 1 1 1 1 0.9262011552
results2=leaps( x=cbind(x1,x2,x3,x4), y=y, method="adjr2")
cbind(results2$which, adjRsq=results2$adjr2)
## 1 2 3 4 adjRsq
## 1 0 1 0 0 0.6094046210
## 1 1 0 0 0 0.2947719319
## 1 0 0 0 1 0.0002360481
## 1 0 0 1 0 -0.0006096094
## 2 1 1 0 0 0.9260275114
## 2 0 1 1 0 0.6095382459
## 2 0 1 0 1 0.6090341895
## 2 1 0 0 1 0.2948868697
## 2 1 0 1 0 0.2941078697
## 2 0 0 1 1 -0.0004007469
## 3 1 1 1 0 0.9259721487
## 3 1 1 0 1 0.9259604005
## 3 0 1 1 1 0.6091635869
## 3 1 0 1 1 0.2942297895
## 4 1 1 1 1 0.9259044764
AIC: Akaike’s Information Criterion
\[ AIC = n \log\bigg(\frac{SSRes}{n}\bigg) + 2p\]
BIC: Bayesian Information Criterion
\[ BIC = n \log\bigg(\frac{SSRes}{n}\bigg) + p \log(n)\]
\(p\) is the number of parameters in the model. In linear regression, \(p=k+1\), i.e., intercept plus slopes.
The models with small values of AIC or BIC are preferred.
Commonly used for more complicated modeling situations
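As a sanity check, the following sketch computes AIC by hand for one candidate model. Note that R has two conventions: extractAIC() (used internally by step()) implements \(n\log(SSRes/n) + 2p\) exactly, while AIC() adds a likelihood constant and counts the error variance as an extra parameter, so the two differ by a constant for a fixed data set.
fit <- lm(y ~ x1 + x2 + x4, data=hald)
n <- nrow(hald)
SSRes <- sum(resid(fit)^2)
p <- length(coef(fit))      # intercept plus slopes
n*log(SSRes/n) + 2*p        # 24.974, the value step() reports below
extractAIC(fit)             # same value (first element is p)
n*log(SSRes/n) + p*log(n)   # the BIC analogue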
Example
hald_allmodels=regsubsets(y~x1+x2+x3+x4, data=hald, nbest=4)
summary(hald_allmodels)
## Subset selection object
## Call: regsubsets.formula(y ~ x1 + x2 + x3 + x4, data = hald, nbest = 4)
## 4 Variables (and intercept)
## Forced in Forced out
## x1 FALSE FALSE
## x2 FALSE FALSE
## x3 FALSE FALSE
## x4 FALSE FALSE
## 4 subsets of each size up to 4
## Selection Algorithm: exhaustive
## x1 x2 x3 x4
## 1 ( 1 ) " " " " " " "*"
## 1 ( 2 ) " " "*" " " " "
## 1 ( 3 ) "*" " " " " " "
## 1 ( 4 ) " " " " "*" " "
## 2 ( 1 ) "*" "*" " " " "
## 2 ( 2 ) "*" " " " " "*"
## 2 ( 3 ) " " " " "*" "*"
## 2 ( 4 ) " " "*" "*" " "
## 3 ( 1 ) "*" "*" " " "*"
## 3 ( 2 ) "*" "*" "*" " "
## 3 ( 3 ) "*" " " "*" "*"
## 3 ( 4 ) " " "*" "*" "*"
## 4 ( 1 ) "*" "*" "*" "*"
plot(hald_allmodels, scale="bic")
plot(hald_allmodels, scale="adjr2")
Another example
asphalt_allmodels <- regsubsets(y~x1+x2+x3+x4+x5+x6,data=asphalt,nbest=3)
summary(asphalt_allmodels)
## Subset selection object
## Call: regsubsets.formula(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = asphalt,
## nbest = 3)
## 6 Variables (and intercept)
## Forced in Forced out
## x1 FALSE FALSE
## x2 FALSE FALSE
## x3 FALSE FALSE
## x4 FALSE FALSE
## x5 FALSE FALSE
## x6 FALSE FALSE
## 3 subsets of each size up to 6
## Selection Algorithm: exhaustive
## x1 x2 x3 x4 x5 x6
## 1 ( 1 ) " " " " " " "*" " " " "
## 1 ( 2 ) "*" " " " " " " " " " "
## 1 ( 3 ) " " " " " " " " " " "*"
## 2 ( 1 ) " " "*" " " "*" " " " "
## 2 ( 2 ) " " " " "*" "*" " " " "
## 2 ( 3 ) " " " " " " "*" " " "*"
## 3 ( 1 ) " " "*" "*" "*" " " " "
## 3 ( 2 ) " " "*" " " "*" "*" " "
## 3 ( 3 ) " " "*" " " "*" " " "*"
## 4 ( 1 ) " " "*" "*" "*" "*" " "
## 4 ( 2 ) " " "*" "*" "*" " " "*"
## 4 ( 3 ) "*" "*" "*" "*" " " " "
## 5 ( 1 ) " " "*" "*" "*" "*" "*"
## 5 ( 2 ) "*" "*" "*" "*" "*" " "
## 5 ( 3 ) "*" "*" "*" "*" " " "*"
## 6 ( 1 ) "*" "*" "*" "*" "*" "*"
plot(asphalt_allmodels)
A third example
n=1000
x1=rnorm(n)
x2=rnorm(n)
x3=rnorm(n)
x4=rnorm(n)
y=1+2*x1+3*x2+rnorm(n)
m1=lm(y~x1)
m2=lm(y~x1+x2)
m3=lm(y~x1+x2+x3)
m4=lm(y~x1+x2+x3+x4)
BIC(m1)
## [1] 5190.934
BIC(m2)
## [1] 2842.267
BIC(m3)
## [1] 2847.547
BIC(m4)
## [1] 2854.439
AIC(m1)
## [1] 5176.211
AIC(m2)
## [1] 2822.636
AIC(m3)
## [1] 2823.008
AIC(m4)
## [1] 2824.992
Stepwise regression methods include:
Forward Selection
Backward Elimination
Stepwise Regression (combination of forward and backward)
Forward Selection
Procedure is based on the idea that no variables are in the model originally; regressors are then added one at a time
The first regressor selected to be entered into the model is the one with the highest correlation with the response
If the F statistic corresponding to the model containing this variable is significant (larger than some predetermined value \(F_{in}\)), then that regressor is left in the model
The second regressor examined is the one with the largest partial correlation with the response. If the F-statistic corresponding to the addition of this variable is significant, the regressor is retained
This process continues until all regressors are examined
Example Hald data set
# forward selection using F
add1(lm(y~1,data=hald), y~x1+x2+x3+x4, test="F")
## Single term additions
##
## Model:
## y ~ 1
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 2715.76 71.444
## x1 1 1450.08 1265.69 63.519 12.6025 0.0045520 **
## x2 1 1809.43 906.34 59.178 21.9606 0.0006648 ***
## x3 1 776.36 1939.40 69.067 4.4034 0.0597623 .
## x4 1 1831.90 883.87 58.852 22.7985 0.0005762 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(lm(y~x4,data=hald), y~x1+x2+x3+x4, test="F")
## Single term additions
##
## Model:
## y ~ x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 883.87 58.852
## x1 1 809.10 74.76 28.742 108.2239 1.105e-06 ***
## x2 1 14.99 868.88 60.629 0.1725 0.6867
## x3 1 708.13 175.74 39.853 40.2946 8.375e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(lm(y~x1+x4,data=hald), y~x1+x2+x3+x4, test="F")
## Single term additions
##
## Model:
## y ~ x1 + x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 74.762 28.742
## x2 1 26.789 47.973 24.974 5.0259 0.05169 .
## x3 1 23.926 50.836 25.728 4.2358 0.06969 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
End of Example
Backward Elimination
Procedure is based on the idea that all variables are in the model originally; they are then examined one at a time and removed if not significant
The partial F statistic is calculated for each variable as if it were the last one added to the model
The regressor with the smallest F statistic is examined first and will be removed if this value is less than some predetermined value \(F_{out}\)
If this regressor is removed, then the model is refit with the remaining regressor variables and the partial F statistics calculated again
The regressor with the smallest partial F statistic will be removed if that value is less than \(F_{out}\).
The process continues until all regressors are examined
Example Hald data set
# backward selection using F
drop1(lm(y~x1+x2+x3+x4,data=hald), test="F")
## Single term deletions
##
## Model:
## y ~ x1 + x2 + x3 + x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 47.864 26.944
## x1 1 25.9509 73.815 30.576 4.3375 0.07082 .
## x2 1 2.9725 50.836 25.728 0.4968 0.50090
## x3 1 0.1091 47.973 24.974 0.0182 0.89592
## x4 1 0.2470 48.111 25.011 0.0413 0.84407
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(lm(y~x1+x2+x4,data=hald), test="F")
## Single term deletions
##
## Model:
## y ~ x1 + x2 + x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 47.97 24.974
## x1 1 820.91 868.88 60.629 154.0076 5.781e-07 ***
## x2 1 26.79 74.76 28.742 5.0259 0.05169 .
## x4 1 9.93 57.90 25.420 1.8633 0.20540
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(lm(y~x1+x2,data=hald), test="F")
## Single term deletions
##
## Model:
## y ~ x1 + x2
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 57.90 25.420
## x1 1 848.43 906.34 59.178 146.52 2.692e-07 ***
## x2 1 1207.78 1265.69 63.519 208.58 5.029e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
End of Example
Stepwise Regression
This procedure is a modification of forward selection
The contribution of each regressor variable that is put into the model is reassessed by way of its partial F statistic
A regressor that makes it into the model may later be removed if it is found to be insignificant once other variables have been added
If the partial F statistic is less than \(F_{out}\), the variable will be removed.
Stepwise regression requires both an \(F_{in}\) value and an \(F_{out}\) value
Stepwise Regression Methods
No one model may be the “best”
The three stepwise techniques could result in different models
Inexperienced analysts may use the final model simply because the procedure spit it out.
Example Hald data set
# stepwise selection using F
add1(lm(y~1,data=hald), y~x1+x2+x3+x4, test="F")
## Single term additions
##
## Model:
## y ~ 1
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 2715.76 71.444
## x1 1 1450.08 1265.69 63.519 12.6025 0.0045520 **
## x2 1 1809.43 906.34 59.178 21.9606 0.0006648 ***
## x3 1 776.36 1939.40 69.067 4.4034 0.0597623 .
## x4 1 1831.90 883.87 58.852 22.7985 0.0005762 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(lm(y~x4,data=hald), y~x1+x2+x3+x4, test="F")
## Single term additions
##
## Model:
## y ~ x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 883.87 58.852
## x1 1 809.10 74.76 28.742 108.2239 1.105e-06 ***
## x2 1 14.99 868.88 60.629 0.1725 0.6867
## x3 1 708.13 175.74 39.853 40.2946 8.375e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(lm(y~x1+x4,data=hald), y~x1+x2+x3+x4, test="F")
## Single term additions
##
## Model:
## y ~ x1 + x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 74.762 28.742
## x2 1 26.789 47.973 24.974 5.0259 0.05169 .
## x3 1 23.926 50.836 25.728 4.2358 0.06969 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(lm(y~x1+x2+x4,data=hald), y~x1+x2+x3+x4, test="F")
## Single term additions
##
## Model:
## y ~ x1 + x2 + x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 47.973 24.974
## x3 1 0.10909 47.864 26.944 0.0182 0.8959
drop1(lm(y~x1+x2+x4,data=hald), test="F")
## Single term deletions
##
## Model:
## y ~ x1 + x2 + x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 47.97 24.974
## x1 1 820.91 868.88 60.629 154.0076 5.781e-07 ***
## x2 1 26.79 74.76 28.742 5.0259 0.05169 .
## x4 1 9.93 57.90 25.420 1.8633 0.20540
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(lm(y~x1+x2,data=hald), test="F")
## Single term deletions
##
## Model:
## y ~ x1 + x2
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 57.90 25.420
## x1 1 848.43 906.34 59.178 146.52 2.692e-07 ***
## x2 1 1207.78 1265.69 63.519 208.58 5.029e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Similarly, we could use AIC for stepwise regression.
# stepwise selection using AIC
null=lm(y~1, data=hald)
full=lm(y~., data=hald)
step(null, scope=list(lower=null, upper=full), direction="forward")
## Start: AIC=71.44
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 1831.90 883.87 58.852
## + x2 1 1809.43 906.34 59.178
## + x1 1 1450.08 1265.69 63.519
## + x3 1 776.36 1939.40 69.067
## <none> 2715.76 71.444
##
## Step: AIC=58.85
## y ~ x4
##
## Df Sum of Sq RSS AIC
## + x1 1 809.10 74.76 28.742
## + x3 1 708.13 175.74 39.853
## <none> 883.87 58.852
## + x2 1 14.99 868.88 60.629
##
## Step: AIC=28.74
## y ~ x4 + x1
##
## Df Sum of Sq RSS AIC
## + x2 1 26.789 47.973 24.974
## + x3 1 23.926 50.836 25.728
## <none> 74.762 28.742
##
## Step: AIC=24.97
## y ~ x4 + x1 + x2
##
## Df Sum of Sq RSS AIC
## <none> 47.973 24.974
## + x3 1 0.10909 47.864 26.944
##
## Call:
## lm(formula = y ~ x4 + x1 + x2, data = hald)
##
## Coefficients:
## (Intercept) x4 x1 x2
## 71.6483 -0.2365 1.4519 0.4161
step(full, scope=list(lower=null, upper=full), direction="backward")
## Start: AIC=26.94
## y ~ x1 + x2 + x3 + x4
##
## Df Sum of Sq RSS AIC
## - x3 1 0.1091 47.973 24.974
## - x4 1 0.2470 48.111 25.011
## - x2 1 2.9725 50.836 25.728
## <none> 47.864 26.944
## - x1 1 25.9509 73.815 30.576
##
## Step: AIC=24.97
## y ~ x1 + x2 + x4
##
## Df Sum of Sq RSS AIC
## <none> 47.97 24.974
## - x4 1 9.93 57.90 25.420
## - x2 1 26.79 74.76 28.742
## - x1 1 820.91 868.88 60.629
##
## Call:
## lm(formula = y ~ x1 + x2 + x4, data = hald)
##
## Coefficients:
## (Intercept) x1 x2 x4
## 71.6483 1.4519 0.4161 -0.2365
step(full, scope=list(lower=null, upper=full), direction="both")
## Start: AIC=26.94
## y ~ x1 + x2 + x3 + x4
##
## Df Sum of Sq RSS AIC
## - x3 1 0.1091 47.973 24.974
## - x4 1 0.2470 48.111 25.011
## - x2 1 2.9725 50.836 25.728
## <none> 47.864 26.944
## - x1 1 25.9509 73.815 30.576
##
## Step: AIC=24.97
## y ~ x1 + x2 + x4
##
## Df Sum of Sq RSS AIC
## <none> 47.97 24.974
## - x4 1 9.93 57.90 25.420
## + x3 1 0.11 47.86 26.944
## - x2 1 26.79 74.76 28.742
## - x1 1 820.91 868.88 60.629
##
## Call:
## lm(formula = y ~ x1 + x2 + x4, data = hald)
##
## Coefficients:
## (Intercept) x1 x2 x4
## 71.6483 1.4519 0.4161 -0.2365
step(null, scope=list(lower=null, upper=full), direction="both")
## Start: AIC=71.44
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 1831.90 883.87 58.852
## + x2 1 1809.43 906.34 59.178
## + x1 1 1450.08 1265.69 63.519
## + x3 1 776.36 1939.40 69.067
## <none> 2715.76 71.444
##
## Step: AIC=58.85
## y ~ x4
##
## Df Sum of Sq RSS AIC
## + x1 1 809.10 74.76 28.742
## + x3 1 708.13 175.74 39.853
## <none> 883.87 58.852
## + x2 1 14.99 868.88 60.629
## - x4 1 1831.90 2715.76 71.444
##
## Step: AIC=28.74
## y ~ x4 + x1
##
## Df Sum of Sq RSS AIC
## + x2 1 26.79 47.97 24.974
## + x3 1 23.93 50.84 25.728
## <none> 74.76 28.742
## - x1 1 809.10 883.87 58.852
## - x4 1 1190.92 1265.69 63.519
##
## Step: AIC=24.97
## y ~ x4 + x1 + x2
##
## Df Sum of Sq RSS AIC
## <none> 47.97 24.974
## - x4 1 9.93 57.90 25.420
## + x3 1 0.11 47.86 26.944
## - x2 1 26.79 74.76 28.742
## - x1 1 820.91 868.88 60.629
##
## Call:
## lm(formula = y ~ x4 + x1 + x2, data = hald)
##
## Coefficients:
## (Intercept) x4 x1 x2
## 71.6483 -0.2365 1.4519 0.4161
Similarly, we could use BIC for stepwise regression.
# stepwise selection using BIC
n=dim(hald)[1]
step(null, scope=list(lower=null, upper=full), k = log(n), direction="forward")
## Start: AIC=72.01
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 1831.90 883.87 59.982
## + x2 1 1809.43 906.34 60.308
## + x1 1 1450.08 1265.69 64.649
## + x3 1 776.36 1939.40 70.197
## <none> 2715.76 72.009
##
## Step: AIC=59.98
## y ~ x4
##
## Df Sum of Sq RSS AIC
## + x1 1 809.10 74.76 30.437
## + x3 1 708.13 175.74 41.547
## <none> 883.87 59.982
## + x2 1 14.99 868.88 62.324
##
## Step: AIC=30.44
## y ~ x4 + x1
##
## Df Sum of Sq RSS AIC
## + x2 1 26.789 47.973 27.234
## + x3 1 23.926 50.836 27.987
## <none> 74.762 30.437
##
## Step: AIC=27.23
## y ~ x4 + x1 + x2
##
## Df Sum of Sq RSS AIC
## <none> 47.973 27.234
## + x3 1 0.10909 47.864 29.769
##
## Call:
## lm(formula = y ~ x4 + x1 + x2, data = hald)
##
## Coefficients:
## (Intercept) x4 x1 x2
## 71.6483 -0.2365 1.4519 0.4161
step(full, scope=list(lower=null, upper=full), k = log(n), direction="backward")
## Start: AIC=29.77
## y ~ x1 + x2 + x3 + x4
##
## Df Sum of Sq RSS AIC
## - x3 1 0.1091 47.973 27.234
## - x4 1 0.2470 48.111 27.271
## - x2 1 2.9725 50.836 27.987
## <none> 47.864 29.769
## - x1 1 25.9509 73.815 32.836
##
## Step: AIC=27.23
## y ~ x1 + x2 + x4
##
## Df Sum of Sq RSS AIC
## - x4 1 9.93 57.90 27.115
## <none> 47.97 27.234
## - x2 1 26.79 74.76 30.437
## - x1 1 820.91 868.88 62.324
##
## Step: AIC=27.11
## y ~ x1 + x2
##
## Df Sum of Sq RSS AIC
## <none> 57.90 27.115
## - x1 1 848.43 906.34 60.308
## - x2 1 1207.78 1265.69 64.649
##
## Call:
## lm(formula = y ~ x1 + x2, data = hald)
##
## Coefficients:
## (Intercept) x1 x2
## 52.5773 1.4683 0.6623
step(full, scope=list(lower=null, upper=full), k = log(n), direction="both")
## Start: AIC=29.77
## y ~ x1 + x2 + x3 + x4
##
## Df Sum of Sq RSS AIC
## - x3 1 0.1091 47.973 27.234
## - x4 1 0.2470 48.111 27.271
## - x2 1 2.9725 50.836 27.987
## <none> 47.864 29.769
## - x1 1 25.9509 73.815 32.836
##
## Step: AIC=27.23
## y ~ x1 + x2 + x4
##
## Df Sum of Sq RSS AIC
## - x4 1 9.93 57.90 27.115
## <none> 47.97 27.234
## + x3 1 0.11 47.86 29.769
## - x2 1 26.79 74.76 30.437
## - x1 1 820.91 868.88 62.324
##
## Step: AIC=27.11
## y ~ x1 + x2
##
## Df Sum of Sq RSS AIC
## <none> 57.90 27.115
## + x4 1 9.93 47.97 27.234
## + x3 1 9.79 48.11 27.271
## - x1 1 848.43 906.34 60.308
## - x2 1 1207.78 1265.69 64.649
##
## Call:
## lm(formula = y ~ x1 + x2, data = hald)
##
## Coefficients:
## (Intercept) x1 x2
## 52.5773 1.4683 0.6623
step(null, scope=list(lower=null, upper=full), k = log(n), direction="both")
## Start: AIC=72.01
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 1831.90 883.87 59.982
## + x2 1 1809.43 906.34 60.308
## + x1 1 1450.08 1265.69 64.649
## + x3 1 776.36 1939.40 70.197
## <none> 2715.76 72.009
##
## Step: AIC=59.98
## y ~ x4
##
## Df Sum of Sq RSS AIC
## + x1 1 809.10 74.76 30.437
## + x3 1 708.13 175.74 41.547
## <none> 883.87 59.982
## + x2 1 14.99 868.88 62.324
## - x4 1 1831.90 2715.76 72.009
##
## Step: AIC=30.44
## y ~ x4 + x1
##
## Df Sum of Sq RSS AIC
## + x2 1 26.79 47.97 27.234
## + x3 1 23.93 50.84 27.987
## <none> 74.76 30.437
## - x1 1 809.10 883.87 59.982
## - x4 1 1190.92 1265.69 64.649
##
## Step: AIC=27.23
## y ~ x4 + x1 + x2
##
## Df Sum of Sq RSS AIC
## - x4 1 9.93 57.90 27.115
## <none> 47.97 27.234
## + x3 1 0.11 47.86 29.769
## - x2 1 26.79 74.76 30.437
## - x1 1 820.91 868.88 62.324
##
## Step: AIC=27.11
## y ~ x1 + x2
##
## Df Sum of Sq RSS AIC
## <none> 57.90 27.115
## + x4 1 9.93 47.97 27.234
## + x3 1 9.79 48.11 27.271
## - x1 1 848.43 906.34 60.308
## - x2 1 1207.78 1265.69 64.649
##
## Call:
## lm(formula = y ~ x1 + x2, data = hald)
##
## Coefficients:
## (Intercept) x1 x2
## 52.5773 1.4683 0.6623
LASSO stands for Least Absolute Shrinkage and Selection Operator. The algorithm is another variation of linear regression, like ridge regression. We use lasso regression when we have a large number of predictor variables.
Lasso regression selects variables and estimates coefficients simultaneously by applying L1 regularization. The L1 regularization adds a penalty proportional to the sum of the absolute values of the regression coefficients and tries to shrink them. The lasso objective is similar to that of ridge regression and is given below.
Traditional least squares estimation: \[ \hat{\beta} = \arg\min \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\Big)^2 \]
Ridge regression estimation: \[ \hat{\beta}_\lambda = \arg\min \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \]
LASSO regression estimation: \[ \hat{\beta}_\lambda = \arg\min \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{k} |\beta_j| \]
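Why can the L1 penalty set coefficients exactly to zero while the L2 penalty cannot? For an orthonormal design the lasso solution has a closed form: soft-thresholding of the least squares estimates at a threshold proportional to \(\lambda\). A minimal sketch of that operator (a standard result, not specific to these notes):
# Soft-thresholding: shrink toward zero by t, and snap to exactly 0 once |b| <= t
soft_threshold <- function(b, t) sign(b) * pmax(abs(b) - t, 0)
soft_threshold(c(-3, -0.5, 0.2, 2), t = 1)   # -2 0 0 1: small coefficients become exactly zero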
The following diagram is the visual interpretation comparing OLS and lasso regression:
Think about \(\beta = (5,5)\) versus \(\beta = (10,0)\) in ridge regression and lasso regression: the ridge penalty is \(5^2+5^2=50\) versus \(10^2+0^2=100\), so ridge strongly prefers spreading the coefficients, while the lasso penalty is \(10\) in both cases, so the lasso has no aversion to solutions with some coefficients exactly zero.
If \(\lambda = 0\), we get the same coefficients as linear regression. If \(\lambda\) is very large, all coefficients are shrunk towards zero. \(\lambda\) controls the level of sparsity.
The LASSO does not handle highly correlated variables well and can sometimes behave erratically in their presence.
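A small simulated sketch of this instability (the data generating process here is made up for illustration): with two nearly identical predictors, the lasso tends to keep one essentially arbitrarily and zero out the other.
library(glmnet)
set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd=0.01)   # x2 is almost a copy of x1
y <- 1 + x1 + x2 + rnorm(n)
coef(glmnet(x=cbind(x1,x2), y=y, alpha=1), s=0.1)  # typically one coefficient near 2, the other exactly 0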
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 3.0-2
lreg <- glmnet(x=as.matrix(hald[,-1]), y=hald[,1],alpha=1)
#rbind(lreg$lambda,lreg$beta)
matplot(lreg$lambda,t(lreg$beta),type="l",ylab="coefficient",xlab="lambda")
abline(h=0)
plot(lreg)
We can apply cross-validation to find the best \(\lambda\):
cv_fit <- cv.glmnet(x=as.matrix(hald[,-1]), y=hald[,1],alpha=1,nfolds = 4)
plot(cv_fit)
opt_lambda <- cv_fit$lambda.min
opt_lambda
## [1] 0.07810171
We can also predict with Lasso regression:
#only matrix form is acceptable in glmnet (for both ridge and lasso)
old_x <- as.matrix(hald[,-1])
y_predicted <- predict(lreg, s = opt_lambda, newx = old_x)
newx <- matrix(c(x1=15,x2=50,x3=6,x4=70),ncol=4)
predict(lreg, s = opt_lambda, newx = newx)
## 1
## [1,] 97.64065
And the in-sample mse:
mse <- mean((hald[,1] - y_predicted)^2)
mse
## [1] 3.700099
#your turn, what's the in-sample mse from the best model selected by R-square?
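One possible sketch of an answer, using the x1, x2, x4 model favored by adjusted R squared earlier:
best_fit <- lm(y ~ x1 + x2 + x4, data=hald)
mean(resid(best_fit)^2)   # in-sample mse = SSRes/n; compare with the lasso's 3.70 above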
add1(lm(y~1,data=asphalt[,-1]), y~x1+x2+x3+x4+x5+x6, test="F")
## Single term additions
##
## Model:
## y ~ 1
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 1366.52 119.367
## x1 1 424.93 941.59 109.821 13.0872 0.001117 **
## x2 1 80.23 1286.29 119.491 1.8089 0.189067
## x3 1 60.38 1306.14 119.966 1.3406 0.256377
## x4 1 898.57 467.95 88.146 55.6859 3.181e-08 ***
## x5 1 99.00 1267.52 119.036 2.2651 0.143134
## x6 1 245.58 1120.94 115.226 6.3533 0.017473 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
add1(lm(y~x4,data=asphalt[,-1]), y~x1+x2+x3+x4+x5+x6, test="F")
## Single term additions
##
## Model:
## y ~ x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 467.95 88.146
## x1 1 1.9125 466.04 90.019 0.1149 0.7372
## x2 1 30.5099 437.44 88.056 1.9529 0.1733
## x3 1 17.7425 450.21 88.948 1.1035 0.3025
## x5 1 9.1078 458.85 89.536 0.5558 0.4622
## x6 1 14.7853 453.17 89.151 0.9135 0.3474
drop1(lm(y~x4,data=asphalt[,-1]), test="F")
## Single term deletions
##
## Model:
## y ~ x4
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 467.95 88.146
## x4 1 898.57 1366.52 119.367 55.686 3.181e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# stepwise with AIC
null=lm(y~1, data=asphalt[,-1])
full=lm(y~., data=asphalt[,-1])
step(null, scope=list(lower=null, upper=full), direction="forward")
## Start: AIC=119.37
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 898.57 467.95 88.146
## + x1 1 424.93 941.59 109.821
## + x6 1 245.58 1120.94 115.226
## + x5 1 99.00 1267.52 119.036
## <none> 1366.52 119.367
## + x2 1 80.23 1286.29 119.491
## + x3 1 60.38 1306.14 119.966
##
## Step: AIC=88.15
## y ~ x4
##
## Df Sum of Sq RSS AIC
## + x2 1 30.5099 437.44 88.056
## <none> 467.95 88.146
## + x3 1 17.7425 450.21 88.948
## + x6 1 14.7853 453.17 89.151
## + x5 1 9.1078 458.85 89.536
## + x1 1 1.9125 466.04 90.019
##
## Step: AIC=88.06
## y ~ x4 + x2
##
## Df Sum of Sq RSS AIC
## + x3 1 29.6911 407.75 87.877
## <none> 437.44 88.056
## + x5 1 21.6476 415.80 88.482
## + x6 1 4.0999 433.34 89.764
## + x1 1 1.6627 435.78 89.938
##
## Step: AIC=87.88
## y ~ x4 + x2 + x3
##
## Df Sum of Sq RSS AIC
## <none> 407.75 87.877
## + x5 1 7.4946 400.26 89.302
## + x6 1 3.0297 404.72 89.646
## + x1 1 0.1908 407.56 89.862
##
## Call:
## lm(formula = y ~ x4 + x2 + x3, data = asphalt[, -1])
##
## Coefficients:
## (Intercept) x4 x2 x3
## -38.229 -10.253 4.534 5.800
step(full, scope=list(lower=null, upper=full), direction="backward")
## Start: AIC=92.86
## y ~ x1 + x2 + x3 + x4 + x5 + x6
##
## Df Sum of Sq RSS AIC
## - x1 1 0.30 394.87 90.882
## - x6 1 5.60 400.18 91.295
## - x5 1 10.13 404.70 91.644
## - x3 1 13.32 407.90 91.888
## <none> 394.57 92.858
## - x2 1 33.39 427.96 93.376
## - x4 1 401.66 796.24 112.623
##
## Step: AIC=90.88
## y ~ x2 + x3 + x4 + x5 + x6
##
## Df Sum of Sq RSS AIC
## - x6 1 5.38 400.26 89.302
## - x5 1 9.85 404.72 89.646
## - x3 1 13.08 407.95 89.892
## <none> 394.87 90.882
## - x2 1 33.87 428.74 91.433
## - x4 1 540.02 934.89 115.600
##
## Step: AIC=89.3
## y ~ x2 + x3 + x4 + x5
##
## Df Sum of Sq RSS AIC
## - x5 1 7.49 407.75 87.877
## - x3 1 15.54 415.80 88.482
## <none> 400.26 89.302
## - x2 1 48.10 448.36 90.820
## - x4 1 709.20 1109.46 118.907
##
## Step: AIC=87.88
## y ~ x2 + x3 + x4
##
## Df Sum of Sq RSS AIC
## <none> 407.75 87.877
## - x3 1 29.69 437.44 88.056
## - x2 1 42.46 450.21 88.948
## - x4 1 786.16 1193.92 119.181
##
## Call:
## lm(formula = y ~ x2 + x3 + x4, data = asphalt[, -1])
##
## Coefficients:
## (Intercept) x2 x3 x4
## -38.229 4.534 5.800 -10.253
step(full, scope=list(lower=null, upper=full), direction="both")
## Start: AIC=92.86
## y ~ x1 + x2 + x3 + x4 + x5 + x6
##
## Df Sum of Sq RSS AIC
## - x1 1 0.30 394.87 90.882
## - x6 1 5.60 400.18 91.295
## - x5 1 10.13 404.70 91.644
## - x3 1 13.32 407.90 91.888
## <none> 394.57 92.858
## - x2 1 33.39 427.96 93.376
## - x4 1 401.66 796.24 112.623
##
## Step: AIC=90.88
## y ~ x2 + x3 + x4 + x5 + x6
##
## Df Sum of Sq RSS AIC
## - x6 1 5.38 400.26 89.302
## - x5 1 9.85 404.72 89.646
## - x3 1 13.08 407.95 89.892
## <none> 394.87 90.882
## - x2 1 33.87 428.74 91.433
## + x1 1 0.30 394.57 92.858
## - x4 1 540.02 934.89 115.600
##
## Step: AIC=89.3
## y ~ x2 + x3 + x4 + x5
##
## Df Sum of Sq RSS AIC
## - x5 1 7.49 407.75 87.877
## - x3 1 15.54 415.80 88.482
## <none> 400.26 89.302
## - x2 1 48.10 448.36 90.820
## + x6 1 5.38 394.87 90.882
## + x1 1 0.08 400.18 91.295
## - x4 1 709.20 1109.46 118.907
##
## Step: AIC=87.88
## y ~ x2 + x3 + x4
##
## Df Sum of Sq RSS AIC
## <none> 407.75 87.877
## - x3 1 29.69 437.44 88.056
## - x2 1 42.46 450.21 88.948
## + x5 1 7.49 400.26 89.302
## + x6 1 3.03 404.72 89.646
## + x1 1 0.19 407.56 89.862
## - x4 1 786.16 1193.92 119.181
##
## Call:
## lm(formula = y ~ x2 + x3 + x4, data = asphalt[, -1])
##
## Coefficients:
## (Intercept) x2 x3 x4
## -38.229 4.534 5.800 -10.253
step(null, scope=list(lower=null, upper=full), direction="both")
## Start: AIC=119.37
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 898.57 467.95 88.146
## + x1 1 424.93 941.59 109.821
## + x6 1 245.58 1120.94 115.226
## + x5 1 99.00 1267.52 119.036
## <none> 1366.52 119.367
## + x2 1 80.23 1286.29 119.491
## + x3 1 60.38 1306.14 119.966
##
## Step: AIC=88.15
## y ~ x4
##
## Df Sum of Sq RSS AIC
## + x2 1 30.51 437.44 88.056
## <none> 467.95 88.146
## + x3 1 17.74 450.21 88.948
## + x6 1 14.79 453.17 89.151
## + x5 1 9.11 458.85 89.536
## + x1 1 1.91 466.04 90.019
## - x4 1 898.57 1366.52 119.367
##
## Step: AIC=88.06
## y ~ x4 + x2
##
## Df Sum of Sq RSS AIC
## + x3 1 29.69 407.75 87.877
## <none> 437.44 88.056
## - x2 1 30.51 467.95 88.146
## + x5 1 21.65 415.80 88.482
## + x6 1 4.10 433.34 89.764
## + x1 1 1.66 435.78 89.938
## - x4 1 848.84 1286.29 119.491
##
## Step: AIC=87.88
## y ~ x4 + x2 + x3
##
## Df Sum of Sq RSS AIC
## <none> 407.75 87.877
## - x3 1 29.69 437.44 88.056
## - x2 1 42.46 450.21 88.948
## + x5 1 7.49 400.26 89.302
## + x6 1 3.03 404.72 89.646
## + x1 1 0.19 407.56 89.862
## - x4 1 786.16 1193.92 119.181
##
## Call:
## lm(formula = y ~ x4 + x2 + x3, data = asphalt[, -1])
##
## Coefficients:
## (Intercept) x4 x2 x3
## -38.229 -10.253 4.534 5.800
# stepwise with BIC
n=dim(asphalt)[1]
step(null, scope=list(lower=null, upper=full), k = log(n), direction="forward")
## Start: AIC=120.8
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 898.57 467.95 91.014
## + x1 1 424.93 941.59 112.689
## + x6 1 245.58 1120.94 118.094
## <none> 1366.52 120.801
## + x5 1 99.00 1267.52 121.904
## + x2 1 80.23 1286.29 122.359
## + x3 1 60.38 1306.14 122.834
##
## Step: AIC=91.01
## y ~ x4
##
## Df Sum of Sq RSS AIC
## <none> 467.95 91.014
## + x2 1 30.5099 437.44 92.358
## + x3 1 17.7425 450.21 93.250
## + x6 1 14.7853 453.17 93.452
## + x5 1 9.1078 458.85 93.838
## + x1 1 1.9125 466.04 94.321
##
## Call:
## lm(formula = y ~ x4, data = asphalt[, -1])
##
## Coefficients:
## (Intercept) x4
## 11.72 -10.77
step(full, scope=list(lower=null, upper=full), k = log(n), direction="backward")
## Start: AIC=102.9
## y ~ x1 + x2 + x3 + x4 + x5 + x6
##
## Df Sum of Sq RSS AIC
## - x1 1 0.30 394.87 99.486
## - x6 1 5.60 400.18 99.899
## - x5 1 10.13 404.70 100.248
## - x3 1 13.32 407.90 100.492
## - x2 1 33.39 427.96 101.980
## <none> 394.57 102.896
## - x4 1 401.66 796.24 121.227
##
## Step: AIC=99.49
## y ~ x2 + x3 + x4 + x5 + x6
##
## Df Sum of Sq RSS AIC
## - x6 1 5.38 400.26 96.472
## - x5 1 9.85 404.72 96.816
## - x3 1 13.08 407.95 97.062
## - x2 1 33.87 428.74 98.603
## <none> 394.87 99.486
## - x4 1 540.02 934.89 122.770
##
## Step: AIC=96.47
## y ~ x2 + x3 + x4 + x5
##
## Df Sum of Sq RSS AIC
## - x5 1 7.49 407.75 93.613
## - x3 1 15.54 415.80 94.218
## <none> 400.26 96.472
## - x2 1 48.10 448.36 96.556
## - x4 1 709.20 1109.46 124.643
##
## Step: AIC=93.61
## y ~ x2 + x3 + x4
##
## Df Sum of Sq RSS AIC
## - x3 1 29.69 437.44 92.358
## - x2 1 42.46 450.21 93.250
## <none> 407.75 93.613
## - x4 1 786.16 1193.92 123.483
##
## Step: AIC=92.36
## y ~ x2 + x4
##
## Df Sum of Sq RSS AIC
## - x2 1 30.51 467.95 91.014
## <none> 437.44 92.358
## - x4 1 848.84 1286.29 122.359
##
## Step: AIC=91.01
## y ~ x4
##
## Df Sum of Sq RSS AIC
## <none> 467.95 91.014
## - x4 1 898.57 1366.52 120.801
##
## Call:
## lm(formula = y ~ x4, data = asphalt[, -1])
##
## Coefficients:
## (Intercept) x4
## 11.72 -10.77
step(full, scope=list(lower=null, upper=full), k = log(n), direction="both")
## Start: AIC=102.9
## y ~ x1 + x2 + x3 + x4 + x5 + x6
##
## Df Sum of Sq RSS AIC
## - x1 1 0.30 394.87 99.486
## - x6 1 5.60 400.18 99.899
## - x5 1 10.13 404.70 100.248
## - x3 1 13.32 407.90 100.492
## - x2 1 33.39 427.96 101.980
## <none> 394.57 102.896
## - x4 1 401.66 796.24 121.227
##
## Step: AIC=99.49
## y ~ x2 + x3 + x4 + x5 + x6
##
## Df Sum of Sq RSS AIC
## - x6 1 5.38 400.26 96.472
## - x5 1 9.85 404.72 96.816
## - x3 1 13.08 407.95 97.062
## - x2 1 33.87 428.74 98.603
## <none> 394.87 99.486
## + x1 1 0.30 394.57 102.896
## - x4 1 540.02 934.89 122.770
##
## Step: AIC=96.47
## y ~ x2 + x3 + x4 + x5
##
## Df Sum of Sq RSS AIC
## - x5 1 7.49 407.75 93.613
## - x3 1 15.54 415.80 94.218
## <none> 400.26 96.472
## - x2 1 48.10 448.36 96.556
## + x6 1 5.38 394.87 99.486
## + x1 1 0.08 400.18 99.899
## - x4 1 709.20 1109.46 124.643
##
## Step: AIC=93.61
## y ~ x2 + x3 + x4
##
## Df Sum of Sq RSS AIC
## - x3 1 29.69 437.44 92.358
## - x2 1 42.46 450.21 93.250
## <none> 407.75 93.613
## + x5 1 7.49 400.26 96.472
## + x6 1 3.03 404.72 96.816
## + x1 1 0.19 407.56 97.032
## - x4 1 786.16 1193.92 123.483
##
## Step: AIC=92.36
## y ~ x2 + x4
##
## Df Sum of Sq RSS AIC
## - x2 1 30.51 467.95 91.014
## <none> 437.44 92.358
## + x3 1 29.69 407.75 93.613
## + x5 1 21.65 415.80 94.218
## + x6 1 4.10 433.34 95.500
## + x1 1 1.66 435.78 95.674
## - x4 1 848.84 1286.29 122.359
##
## Step: AIC=91.01
## y ~ x4
##
## Df Sum of Sq RSS AIC
## <none> 467.95 91.014
## + x2 1 30.51 437.44 92.358
## + x3 1 17.74 450.21 93.250
## + x6 1 14.79 453.17 93.452
## + x5 1 9.11 458.85 93.838
## + x1 1 1.91 466.04 94.321
## - x4 1 898.57 1366.52 120.801
##
## Call:
## lm(formula = y ~ x4, data = asphalt[, -1])
##
## Coefficients:
## (Intercept) x4
## 11.72 -10.77
step(null, scope=list(lower=null, upper=full), k = log(n), direction="both")
## Start: AIC=120.8
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 898.57 467.95 91.014
## + x1 1 424.93 941.59 112.689
## + x6 1 245.58 1120.94 118.094
## <none> 1366.52 120.801
## + x5 1 99.00 1267.52 121.904
## + x2 1 80.23 1286.29 122.359
## + x3 1 60.38 1306.14 122.834
##
## Step: AIC=91.01
## y ~ x4
##
## Df Sum of Sq RSS AIC
## <none> 467.95 91.014
## + x2 1 30.51 437.44 92.358
## + x3 1 17.74 450.21 93.250
## + x6 1 14.79 453.17 93.452
## + x5 1 9.11 458.85 93.838
## + x1 1 1.91 466.04 94.321
## - x4 1 898.57 1366.52 120.801
##
## Call:
## lm(formula = y ~ x4, data = asphalt[, -1])
##
## Coefficients:
## (Intercept) x4
## 11.72 -10.77
# then lasso?
# plot lasso?
# mse?
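The prompts above could be answered along the following lines (a sketch; it assumes, as in the earlier leaps() calls, that y is in column 2 of asphalt and x1-x6 are in columns 3:8):
X <- as.matrix(asphalt[,3:8])
y_asph <- asphalt[,2]
lreg_asph <- glmnet(x=X, y=y_asph, alpha=1)
plot(lreg_asph)   # lasso coefficient paths
cv_asph <- cv.glmnet(x=X, y=y_asph, alpha=1, nfolds=4)
y_hat <- predict(lreg_asph, s=cv_asph$lambda.min, newx=X)
mean((y_asph - y_hat)^2)   # in-sample mse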
End of Example