Acknowledgment: the materials below are partially based on Montgomery, D. C., Peck, E. A., Vining, G. G., Introduction to Linear Regression Analysis (5th Edition), Wiley Series in Probability and Statistics, 2012. These materials were initiated by Yichen Qin and modified by Tianhai Zu for teaching purposes.

High Leverage Point

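A high leverage point is an observation whose predictor values are unusual, that is, far from those of the other observations in x-space.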

Outlier

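An outlier is an observation whose response is unusual given its predictor values, that is, one with a large residual.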

Leverage

Leverage measures how far an observation's predictor values are from those of the other observations. It is quantified by \(h_{ii}\), the \(i\)th diagonal element of the hat matrix.

The average leverage is \((k+1)/n\), where \(k\) is the number of predictors and \(n\) is the number of observations.
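
The value \((k+1)/n\) follows from a standard property of the hat matrix \(H = X(X'X)^{-1}X'\), whose diagonal elements are the leverages \(h_{ii}\). A short derivation, assuming \(X\) is the \(n \times (k+1)\) model matrix (an intercept column plus \(k\) predictors) with full column rank:

\[ \sum_{i=1}^{n} h_{ii} = \mathrm{tr}(H) = \mathrm{tr}\left( (X'X)^{-1} X'X \right) = \mathrm{tr}(I_{k+1}) = k+1, \qquad \text{so} \qquad \bar{h} = \frac{k+1}{n}. \]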

Traditionally, any \(h_{ii} > 2(k+1)/n\) indicates a leverage point.

This cutoff is appropriate for large \(n\); otherwise, judge whether \(h_{ii}\) is large relative to the other leverage values.

An observation with both a large \(h_{ii}\) and a large residual is likely to be influential.
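
The leverage rules above can be sketched in R. The data below are simulated purely for illustration (they are not the delivery data), and the object names `fit`, `h`, and `cutoff` are hypothetical. The sketch computes \(h_{ii}\) both from the hat matrix and with the built-in `hatvalues()`, then applies the \(2(k+1)/n\) cutoff.

```r
# Simulated data: an assumption for illustration, not the delivery data
set.seed(1)
n <- 30
k <- 2
x1 <- rnorm(n)
x2 <- rnorm(n)
x1[n] <- 6                      # place the last observation far out in x-space
y <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Leverage is the diagonal of the hat matrix H = X (X'X)^{-1} X'
X <- model.matrix(fit)
H <- X %*% solve(crossprod(X)) %*% t(X)
h_manual <- diag(H)
h <- hatvalues(fit)             # built-in equivalent

all.equal(unname(h), unname(h_manual))   # the two computations agree
mean(h)                                   # equals (k + 1) / n

# Flag observations whose leverage exceeds the traditional cutoff
cutoff <- 2 * (k + 1) / n
flagged <- which(h > cutoff)
flagged
```

Because observation 30 was deliberately moved away from the rest of the `x1` values, it should appear among the flagged points.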

Treatment of Outliers

Discard if there is an error in recording a measured value.

Do not discard an outlier that is a valid observation; consider applying robust estimation techniques instead.
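
As one possible illustration of robust estimation, the sketch below compares ordinary least squares with Huber M-estimation from `MASS::rlm()`. This is only one of several robust techniques and is chosen here as an assumption, not as the method prescribed by these notes. The data are simulated, and the names `fit_ols` and `fit_rob` are hypothetical.

```r
library(MASS)  # provides rlm(); assumed installed (it ships with standard R)

# Simulated data with a known slope of 3 (an assumption for illustration)
set.seed(3)
x <- 1:20
y <- 2 + 3 * x + rnorm(20, sd = 0.5)
y[20] <- y[20] + 40            # a valid but extreme response value

fit_ols <- lm(y ~ x)           # ordinary least squares
fit_rob <- rlm(y ~ x)          # Huber M-estimation (the rlm default)

# Compare the coefficient estimates from the two fits
rbind(ols = coef(fit_ols), robust = coef(fit_rob))

# Final weights from the robust fit; the outlier is downweighted
round(fit_rob$w, 2)
```

The robust fit keeps the observation in the data but reduces its weight, so the slope estimate is pulled less far from the trend followed by the other points.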

Measures of Influence

The influence measures discussed here are those that measure the effect of deleting the \(i\)th observation.
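
The idea of deletion can be sketched directly: refit the model without observation \(i\) and see how much the coefficient estimates move. The data are simulated and the helper `coef_change()` is hypothetical, introduced only for this illustration.

```r
# Simulated data: 20 typical points plus one remote, off-trend point
set.seed(4)
dat <- data.frame(x = c(1:20, 40))
dat$y <- 1 + 2 * dat$x + rnorm(21)
dat$y[21] <- dat$y[21] - 30     # observation 21: remote in x and off the trend

fit_all <- lm(y ~ x, data = dat)

# Change in the coefficient estimates when observation i is deleted
coef_change <- function(i) {
  fit_del <- lm(y ~ x, data = dat[-i, ])
  coef(fit_del) - coef(fit_all)
}

coef_change(21)   # large shift: the fit depends heavily on this point
coef_change(10)   # small shift: a typical observation
```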

Cook’s Distance

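\[ D_i = \frac{e_i^2}{(k+1)\, MS_{Res}} \cdot \frac{h_{ii}}{(1-h_{ii})^2} = \frac{e_{(i)}^2\, h_{ii}}{(k+1)\, MS_{Res}} \]

Here \(e_i\) is the \(i\)th residual and \(e_{(i)} = e_i/(1-h_{ii})\) is the corresponding PRESS (deleted) residual.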

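What contributes to \(D_i\): how poorly the model fits the \(i\)th observation (the size of the residual \(e_i\)) and how far that observation lies from the rest of the data (the leverage \(h_{ii}\)).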

A large value of \(D_i\) indicates an influential point; \(D_i > 1\) is the usual rule of thumb.
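
To make the formula concrete, the sketch below evaluates both forms of \(D_i\) by hand and compares them with R's built-in `cooks.distance()`. The data are simulated for illustration only, and the names `d_form1`, `d_form2`, and `e_press` are hypothetical.

```r
# Simulated data: an assumption for illustration, not the delivery data
set.seed(2)
n <- 25
k <- 2
dat <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
dat$y <- 3 + 1.5 * dat$x1 + 0.5 * dat$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = dat)

e <- residuals(fit)                  # ordinary residuals e_i
h <- hatvalues(fit)                  # leverages h_ii
ms_res <- sum(e^2) / (n - k - 1)     # residual mean square MS_Res
e_press <- e / (1 - h)               # PRESS residuals e_(i)

# First form: residual term times leverage term
d_form1 <- e^2 / ((k + 1) * ms_res) * h / (1 - h)^2
# Second form: written with the PRESS residual
d_form2 <- e_press^2 * h / ((k + 1) * ms_res)

all.equal(d_form1, d_form2)                  # the two forms agree
all.equal(d_form1, cooks.distance(fit))      # and match the built-in function
```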

Example

# Load the delivery time data and inspect the pairwise scatterplots
delivery <- read.csv("data_delivery.csv", header = TRUE)
pairs(delivery, pch = 20)

# Fit the two-predictor model and print the deletion-based influence measures
model1 <- lm(DeliveryTime ~ NumberofCases + Distance, data = delivery)
influence.measures(model1)
## Influence measures of
##   lm(formula = DeliveryTime ~ NumberofCases + Distance, data = delivery) :
## 
##      dfb.1_ dfb.NmbC dfb.Dstn   dffit cov.r   cook.d    hat inf
## 1  -0.18727  0.41131 -0.43486 -0.5709 0.871 1.00e-01 0.1018    
## 2   0.08979 -0.04776  0.01441  0.0986 1.215 3.38e-03 0.0707    
## 3  -0.00352  0.00395 -0.00285 -0.0052 1.276 9.46e-06 0.0987    
## 4   0.45196  0.08828 -0.27337  0.5008 0.876 7.76e-02 0.0854    
## 5  -0.03167 -0.01330  0.02424 -0.0395 1.240 5.43e-04 0.0750    
## 6  -0.01468  0.00179  0.00108 -0.0188 1.200 1.23e-04 0.0429    
## 7   0.07807 -0.02228 -0.01102  0.0790 1.240 2.17e-03 0.0818    
## 8   0.07120  0.03338 -0.05382  0.0938 1.206 3.05e-03 0.0637    
## 9  -2.57574  0.92874  1.50755  4.2961 0.342 3.42e+00 0.4983   *
## 10  0.10792 -0.33816  0.34133  0.3987 1.305 5.38e-02 0.1963    
## 11 -0.03427  0.09253 -0.00269  0.2180 1.172 1.62e-02 0.0861    
## 12 -0.03027 -0.04867  0.05397 -0.0677 1.291 1.60e-03 0.1137    
## 13  0.07237 -0.03562  0.01134  0.0813 1.207 2.29e-03 0.0611    
## 14  0.04952 -0.06709  0.06182  0.0974 1.228 3.29e-03 0.0782    
## 15  0.02228 -0.00479  0.00684  0.0426 1.192 6.32e-04 0.0411    
## 16 -0.00269  0.06442 -0.08419 -0.0972 1.369 3.29e-03 0.1659    
## 17  0.02886  0.00649 -0.01570  0.0339 1.219 4.01e-04 0.0594    
## 18  0.24856  0.18973 -0.27243  0.3653 1.069 4.40e-02 0.0963    
## 19  0.17256  0.02357 -0.09897  0.1862 1.215 1.19e-02 0.0964    
## 20  0.16804 -0.21500 -0.09292 -0.6718 0.760 1.32e-01 0.1017    
## 21 -0.16193 -0.29718  0.33641 -0.3885 1.238 5.09e-02 0.1653    
## 22  0.39857 -1.02541  0.57314 -1.1950 1.398 4.51e-01 0.3916   *
## 23 -0.15985  0.03729 -0.05265 -0.3075 0.890 2.99e-02 0.0413    
## 24 -0.11972  0.40462 -0.46545 -0.5711 0.948 1.02e-01 0.1206    
## 25 -0.01682  0.00085  0.00559 -0.0176 1.231 1.08e-04 0.0666
cooks.distance(model1)
##            1            2            3            4            5 
## 1.000921e-01 3.375704e-03 9.455785e-06 7.764718e-02 5.432217e-04 
##            6            7            8            9           10 
## 1.231067e-04 2.171604e-03 3.051135e-03 3.419318e+00 5.384516e-02 
##           11           12           13           14           15 
## 1.619975e-02 1.596392e-03 2.294737e-03 3.292786e-03 6.319880e-04 
##           16           17           18           19           20 
## 3.289086e-03 4.013419e-04 4.397807e-02 1.191868e-02 1.324449e-01 
##           21           22           23           24           25 
## 5.086063e-02 4.510455e-01 2.989892e-02 1.023224e-01 1.084694e-04
# Diagnostic plots, including residuals vs. leverage with Cook's distance contours
plot(model1)
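
The cutoffs from the earlier sections can be applied to the printed output. The sketch below re-enters the `hat` column and the `cooks.distance()` values by hand, as printed above, so that it runs without the CSV file. With the fitted model available, `hatvalues(model1)` and `cooks.distance(model1)` would give the same quantities. The variable names are hypothetical.

```r
# Leverages copied from the `hat` column of the influence.measures() output
hat <- c(0.1018, 0.0707, 0.0987, 0.0854, 0.0750, 0.0429, 0.0818, 0.0637,
         0.4983, 0.1963, 0.0861, 0.1137, 0.0611, 0.0782, 0.0411, 0.1659,
         0.0594, 0.0963, 0.0964, 0.1017, 0.1653, 0.3916, 0.0413, 0.1206,
         0.0666)

# Cook's distances copied from the cooks.distance() output
cook_d <- c(1.000921e-01, 3.375704e-03, 9.455785e-06, 7.764718e-02,
            5.432217e-04, 1.231067e-04, 2.171604e-03, 3.051135e-03,
            3.419318e+00, 5.384516e-02, 1.619975e-02, 1.596392e-03,
            2.294737e-03, 3.292786e-03, 6.319880e-04, 3.289086e-03,
            4.013419e-04, 4.397807e-02, 1.191868e-02, 1.324449e-01,
            5.086063e-02, 4.510455e-01, 2.989892e-02, 1.023224e-01,
            1.084694e-04)

n <- length(hat)                     # 25 observations
k <- 2                               # two predictors
leverage_cutoff <- 2 * (k + 1) / n   # 0.24

high_leverage <- which(hat > leverage_cutoff)
influential <- which(cook_d > 1)
high_leverage
influential
```

With these cutoffs, observations 9 and 22 are leverage points and only observation 9 has \(D_i > 1\). These are the same two rows marked with `*` in the `inf` column.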