Acknowledgment: the materials below are partially based on Montgomery, D. C., Peck, E. A., and Vining, G. G., Introduction to Linear Regression Analysis (5th Edition), Wiley Series in Probability and Statistics, 2012. These materials were initiated by Yichen Qin and modified by Tianhai Zu for teaching purposes.
A high leverage point is an observation whose predictor values are far from those of the other observations.
An outlier is an observation whose response value is unusual given its predictor values, i.e., one with a large residual.
Leverage measures how far an observation's predictor values lie from those of the other observations.
The average size of the leverage is \((k+1)/n\), where \(k\) is the number of predictors and \(n\) the number of observations.
Traditionally, any \(h_{ii} > 2(k+1)/n\) indicates a leverage point.
This cutoff is appropriate for large \(n\); otherwise, judge \(h_{ii}\) relative to the other leverage values.
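The cutoff rule above can be sketched in R. This is an illustration on the built-in `mtcars` data (not the delivery data used later); `hatvalues()` extracts the \(h_{ii}\) diagonal of the hat matrix.

```r
# Flag high-leverage points with the 2(k+1)/n rule of thumb,
# illustrated on the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
h   <- hatvalues(fit)          # h_ii, the diagonal of the hat matrix
k   <- length(coef(fit)) - 1   # number of predictors
n   <- nrow(mtcars)
cutoff <- 2 * (k + 1) / n

mean(h)           # average leverage; always equals (k+1)/n
which(h > cutoff) # observations flagged as high leverage
```

The identity `mean(h) == (k+1)/n` holds because the trace of the hat matrix equals the number of estimated coefficients.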
An observation with large \(h_{ii}\) and a large residual is likely to be influential.
Discard an observation only if there is an error in recording a measured value.
Do not discard a valid observation; instead, apply robust estimation techniques.
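One robust option, offered here as an illustration rather than the notes' prescribed method, is M-estimation with Huber weights via `MASS::rlm`, which downweights (rather than discards) observations with large residuals. Again `mtcars` stands in for the data.

```r
# Robust regression via M-estimation (MASS::rlm with default Huber weights).
# Points with large residuals get weight < 1 instead of being deleted.
library(MASS)

fit_ols <- lm(mpg ~ wt + hp, data = mtcars)
fit_rob <- rlm(mpg ~ wt + hp, data = mtcars)

cbind(OLS = coef(fit_ols), Robust = coef(fit_rob))  # compare fits
head(sort(fit_rob$w), 5)  # smallest weights = most downweighted points
```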
The influence measures discussed here quantify the effect of deleting the \(i\)th observation.
\[ D_i=\frac{e_i^2}{(k+1)\,\mathrm{MS}_{Res}} \cdot \frac{h_{ii}}{(1-h_{ii})^2} = \frac{e_{(i)}^2\, h_{ii}}{(k+1)\,\mathrm{MS}_{Res}} \]
where \(e_{(i)} = e_i/(1-h_{ii})\) is the PRESS residual.
What contributes to \(D_i\): both the size of the residual \(e_i\) and the leverage \(h_{ii}\); a point needs some of each to be influential.
Large values of \(D_i\) indicate an influential point, usually if \(D_i > 1\).
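The formula above can be checked by hand against R's built-in `cooks.distance()`. This sketch uses `mtcars` for illustration; `MSRes` is the residual mean square.

```r
# Verify the Cook's distance formula against cooks.distance().
fit <- lm(mpg ~ wt + hp, data = mtcars)
e   <- resid(fit)
h   <- hatvalues(fit)
p   <- length(coef(fit))              # p = k + 1
MSRes <- sum(e^2) / df.residual(fit)  # residual mean square

D_manual <- e^2 / (p * MSRes) * h / (1 - h)^2
all.equal(unname(D_manual), unname(cooks.distance(fit)))  # TRUE
```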
delivery <- read.csv("data_delivery.csv", header = TRUE)
pairs(delivery, pch = 20)
model1 <- lm(DeliveryTime ~ NumberofCases + Distance, data=delivery)
influence.measures(model1)
## Influence measures of
## lm(formula = DeliveryTime ~ NumberofCases + Distance, data = delivery) :
##
## dfb.1_ dfb.NmbC dfb.Dstn dffit cov.r cook.d hat inf
## 1 -0.18727 0.41131 -0.43486 -0.5709 0.871 1.00e-01 0.1018
## 2 0.08979 -0.04776 0.01441 0.0986 1.215 3.38e-03 0.0707
## 3 -0.00352 0.00395 -0.00285 -0.0052 1.276 9.46e-06 0.0987
## 4 0.45196 0.08828 -0.27337 0.5008 0.876 7.76e-02 0.0854
## 5 -0.03167 -0.01330 0.02424 -0.0395 1.240 5.43e-04 0.0750
## 6 -0.01468 0.00179 0.00108 -0.0188 1.200 1.23e-04 0.0429
## 7 0.07807 -0.02228 -0.01102 0.0790 1.240 2.17e-03 0.0818
## 8 0.07120 0.03338 -0.05382 0.0938 1.206 3.05e-03 0.0637
## 9 -2.57574 0.92874 1.50755 4.2961 0.342 3.42e+00 0.4983 *
## 10 0.10792 -0.33816 0.34133 0.3987 1.305 5.38e-02 0.1963
## 11 -0.03427 0.09253 -0.00269 0.2180 1.172 1.62e-02 0.0861
## 12 -0.03027 -0.04867 0.05397 -0.0677 1.291 1.60e-03 0.1137
## 13 0.07237 -0.03562 0.01134 0.0813 1.207 2.29e-03 0.0611
## 14 0.04952 -0.06709 0.06182 0.0974 1.228 3.29e-03 0.0782
## 15 0.02228 -0.00479 0.00684 0.0426 1.192 6.32e-04 0.0411
## 16 -0.00269 0.06442 -0.08419 -0.0972 1.369 3.29e-03 0.1659
## 17 0.02886 0.00649 -0.01570 0.0339 1.219 4.01e-04 0.0594
## 18 0.24856 0.18973 -0.27243 0.3653 1.069 4.40e-02 0.0963
## 19 0.17256 0.02357 -0.09897 0.1862 1.215 1.19e-02 0.0964
## 20 0.16804 -0.21500 -0.09292 -0.6718 0.760 1.32e-01 0.1017
## 21 -0.16193 -0.29718 0.33641 -0.3885 1.238 5.09e-02 0.1653
## 22 0.39857 -1.02541 0.57314 -1.1950 1.398 4.51e-01 0.3916 *
## 23 -0.15985 0.03729 -0.05265 -0.3075 0.890 2.99e-02 0.0413
## 24 -0.11972 0.40462 -0.46545 -0.5711 0.948 1.02e-01 0.1206
## 25 -0.01682 0.00085 0.00559 -0.0176 1.231 1.08e-04 0.0666
cooks.distance(model1)
## 1 2 3 4 5
## 1.000921e-01 3.375704e-03 9.455785e-06 7.764718e-02 5.432217e-04
## 6 7 8 9 10
## 1.231067e-04 2.171604e-03 3.051135e-03 3.419318e+00 5.384516e-02
## 11 12 13 14 15
## 1.619975e-02 1.596392e-03 2.294737e-03 3.292786e-03 6.319880e-04
## 16 17 18 19 20
## 3.289086e-03 4.013419e-04 4.397807e-02 1.191868e-02 1.324449e-01
## 21 22 23 24 25
## 5.086063e-02 4.510455e-01 2.989892e-02 1.023224e-01 1.084694e-04
plot(model1)
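By default `plot()` on an `lm` object cycles through several diagnostic panels; the `which` argument selects specific ones. The panels most relevant to this section are Cook's distance (`which = 4`) and standardized residuals versus leverage with Cook's distance contours (`which = 5`). `mtcars` stands in here for the delivery data.

```r
# Select the influence-related diagnostic panels directly.
fit <- lm(mpg ~ wt + hp, data = mtcars)
plot(fit, which = 4)  # Cook's distance by observation
plot(fit, which = 5)  # standardized residuals vs leverage
```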