Acknowledgment: the materials below are partially based on Montgomery, D. C., Peck, E. A., Vining, G. G., Introduction to Linear Regression Analysis (5th Edition), Wiley Series in Probability and Statistics, 2012. These materials were initiated by Yichen Qin and modified by Tianhai Zu for teaching purposes.

Model validation in general

Let us start with the general model building process.

As a quick review, we have discussed what model adequacy checking means.

Model Adequacy Checking: examining residuals, checking the model assumptions (normality, constant variance, independence), and assessing goodness of fit, all using the same data that were used to fit the model.

On the other hand, let us define what model validation means.

Model Validation: assessing whether the model will perform well in its intended use, i.e., how well it predicts observations that were not used to fit it.

Some elementary tools for model validation: checking whether the estimated coefficients and predicted values are reasonable and stable, and comparing them with prior experience and theory.

To formally validate models, we could either collect fresh data and compare the model's predictions against the new observations, or split the existing data into training and testing sets.

Validation technique: cross validation

However, in practice, collecting new data is often expensive, slow, or infeasible, so we usually rely on splitting the data we already have.

In this section, we will focus on the following cross validation techniques: K-fold cross validation and leave-one-out cross validation.

K-fold cross validation

The idea of K-fold CV is summarized in the figure below.

K-fold cross-validation

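The procedure in the figure can be sketched directly in base R. This is a minimal illustration of the idea, not the DAAG implementation used below; the built-in mtcars data and the model mpg ~ wt + hp are arbitrary stand-ins.

```r
# Minimal K-fold cross-validation sketch (illustrative data and model)
set.seed(123)
d0 <- mtcars                                       # stand-in data set
K  <- 5                                            # number of folds
folds <- sample(rep(1:K, length.out = nrow(d0)))   # random fold label per row
cvpred <- numeric(nrow(d0))
for (k in 1:K) {
  test <- folds == k
  fit  <- lm(mpg ~ wt + hp, data = d0[!test, ])        # fit on the other K-1 folds
  cvpred[test] <- predict(fit, newdata = d0[test, ])   # predict the held-out fold
}
kcv_mspe <- mean((d0$mpg - cvpred)^2)              # cross-validated MSPE
```

Note that each observation is predicted exactly once, by a model that never saw it during fitting.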

Leave one out cross validation

To split data into training and testing sets
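A single random split can be done with sample(); the 70/30 proportion here is a common but arbitrary choice, and mtcars again stands in for a real data set.

```r
# Random 70/30 train-test split (sketch)
set.seed(123)
d0  <- mtcars
idx <- sample(nrow(d0), size = floor(0.7 * nrow(d0)))  # training row indices
train <- d0[idx, ]
test  <- d0[-idx, ]
fit <- lm(mpg ~ wt + hp, data = train)                 # fit on the training set only
test_mspe <- mean((test$mpg - predict(fit, test))^2)   # evaluate on the test set
```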

Criteria for comparing models in cross validation
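The criteria used throughout this section can be written as small helper functions of the actual responses y and the cross-validated predictions cvpred. This is a sketch; the lowercase names are mine, chosen to avoid clashing with the PRESS() and MSPE() functions defined at the end of this section.

```r
# PRESS: predicted residual sum of squares (lower is better)
press <- function(y, cvpred) sum((y - cvpred)^2)
# MSPE: mean squared prediction error = PRESS / n (lower is better)
mspe <- function(y, cvpred) mean((y - cvpred)^2)
# Prediction R squared (higher is better): fraction of total variation
# explained by the out-of-sample predictions
pred_rsq <- function(y, cvpred) 1 - press(y, cvpred) / sum((y - mean(y))^2)
```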

Example: the delivery time data set. Now we perform K-fold cross validation.

#install DAAG from archived source
if(!is.element("DAAG", installed.packages()[,1])){
  packageurl <- "https://cran.r-project.org/src/contrib/Archive/DAAG/DAAG_1.22.tar.gz"
  install.packages("latticeExtra")
  install.packages(packageurl, repos=NULL, type="source")
}

#load DAAG package
library(DAAG)
## Loading required package: lattice
d <- read.csv("data_delivery.csv",header=TRUE)
head(d)
##   Index DeliveryTime NumberofCases Distance
## 1     1        16.68             7      560
## 2     2        11.50             3      220
## 3     3        12.03             3      340
## 4     4        14.88             4       80
## 5     5        13.75             6      150
## 6     6        18.11             7      330
n = dim(d)[1]
n
## [1] 25
model1 = lm(DeliveryTime ~ NumberofCases + Distance, data=d)
summary(model1)
## 
## Call:
## lm(formula = DeliveryTime ~ NumberofCases + Distance, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7880 -0.6629  0.4364  1.1566  7.4197 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.341231   1.096730   2.135 0.044170 *  
## NumberofCases 1.615907   0.170735   9.464 3.25e-09 ***
## Distance      0.014385   0.003613   3.981 0.000631 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 22 degrees of freedom
## Multiple R-squared:  0.9596, Adjusted R-squared:  0.9559 
## F-statistic: 261.2 on 2 and 22 DF,  p-value: 4.687e-16
KCV=cv.lm(data=d, model1, m=3, seed=123)
## Analysis of Variance Table
## 
## Response: DeliveryTime
##               Df Sum Sq Mean Sq F value  Pr(>F)    
## NumberofCases  1   5382    5382   506.6 < 2e-16 ***
## Distance       1    168     168    15.8 0.00063 ***
## Residuals     22    234      11                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Warning in cv.lm(data = d, model1, m = 3, seed = 123): 
## 
##  As there is >1 explanatory variable, cross-validation
##  predicted values for a fold are not a linear function
##  of corresponding overall predicted values.  Lines that
##  are shown for the different folds are approximate

## 
## fold 1 
## Observations in test set: 8 
##                  1     4      6    10    11     12   13   19
## Predicted    21.71  9.96 18.400 19.12 38.09 21.593 12.5 7.71
## cvpred       21.69  9.15 17.933 19.29 37.89 20.759 12.0 6.86
## DeliveryTime 16.68 14.88 18.110 21.50 40.33 21.000 13.5 9.50
## CV residual  -5.01  5.73  0.177  2.21  2.44  0.241  1.5 2.64
## 
## Sum of squares = 78.1    Mean square = 9.76    n = 8 
## 
## fold 2 
## Observations in test set: 9 
##                 2      5     15    16    18    20    22    23     25
## Predicted    10.4 14.194 23.329 29.66 15.55 40.89 56.01 23.36 10.963
## cvpred       10.0 14.624 24.141 30.35 16.24 43.24 60.45 24.17 10.918
## DeliveryTime 11.5 13.750 24.000 29.00 19.00 35.10 52.32 18.75 10.750
## CV residual   1.5 -0.874 -0.141 -1.35  2.76 -8.14 -8.13 -5.42 -0.168
## 
## Sum of squares = 174    Mean square = 19.4    n = 9 
## 
## fold 3 
## Observations in test set: 8 
##                   3     7      8    9    14     17    21    24
## Predicted    12.080  7.16 16.673 71.8 18.68 14.914 20.51 24.40
## cvpred       12.685  8.89 17.442 64.6 18.43 15.831 21.28 23.17
## DeliveryTime 12.030  8.00 17.830 79.2 19.75 15.350 17.90 19.83
## CV residual  -0.655 -0.89  0.388 14.6  1.32 -0.481 -3.38 -3.34
## 
## Sum of squares = 240    Mean square = 29.9    n = 8 
## 
## Overall (Sum over all 8 folds) 
##   ms 
## 19.7
# MSPE can be alternatively calculated as
sum((d$DeliveryTime-KCV$cvpred)^2)/n
## [1] 19.7
# PRESS can be calculated as 
sum((d$DeliveryTime-KCV$cvpred)^2)
## [1] 492
# Prediction R squared can be calculated as
1-sum((d$DeliveryTime-KCV$cvpred)^2)/sum((d$DeliveryTime-mean(d$DeliveryTime))^2)
## [1] 0.915
summary(model1)$r.squared
## [1] 0.96

Let us take the 9th observation as an example. The actual observed delivery time is d$DeliveryTime[9]=79.24. The fitted value is model1$fitted.values[9]=71.82. The cross-validated prediction is KCV$cvpred[9]=64.624.

We further perform leave one out cross validation.

LOOCV=cv.lm(data=d, model1, m=n,seed=123)
## Analysis of Variance Table
## 
## Response: DeliveryTime
##               Df Sum Sq Mean Sq F value  Pr(>F)    
## NumberofCases  1   5382    5382   506.6 < 2e-16 ***
## Distance       1    168     168    15.8 0.00063 ***
## Residuals     22    234      11                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Warning in cv.lm(data = d, model1, m = n, seed = 123): 
## 
##  As there is >1 explanatory variable, cross-validation
##  predicted values for a fold are not a linear function
##  of corresponding overall predicted values.  Lines that
##  are shown for the different folds are approximate
## Warning in plot.xy(xy, type, ...): unimplemented pch value '26'
## 
## fold 1 
## Observations in test set: 1 
##                  16
## Predicted    29.663
## cvpred       29.795
## DeliveryTime 29.000
## CV residual  -0.795
## 
## Sum of squares = 0.63    Mean square = 0.63    n = 1
## 
## fold 2 
## Observations in test set: 1 
##                 18
## Predicted    15.55
## cvpred       15.18
## DeliveryTime 19.00
## CV residual   3.82
## 
## Sum of squares = 14.6    Mean square = 14.6    n = 1
## 
## fold 3 
## Observations in test set: 1 
##                 21
## Predicted    20.51
## cvpred       21.03
## DeliveryTime 17.90
## CV residual  -3.13
## 
## Sum of squares = 9.81    Mean square = 9.81    n = 1
## 
## fold 4 
## Observations in test set: 1 
##                  4
## Predicted     9.96
## cvpred        9.50
## DeliveryTime 14.88
## CV residual   5.38
## 
## Sum of squares = 29    Mean square = 29    n = 1
## 
## fold 5 
## Observations in test set: 1 
##                  25
## Predicted    10.963
## cvpred       10.978
## DeliveryTime 10.750
## CV residual  -0.228
## 
## Sum of squares = 0.05    Mean square = 0.05    n = 1
## 
## fold 6 
## Observations in test set: 1 
##                  8
## Predicted    16.67
## cvpred       16.59
## DeliveryTime 17.83
## CV residual   1.24
## 
## Sum of squares = 1.53    Mean square = 1.53    n = 1
## 
## fold 7 
## Observations in test set: 1 
##                 10
## Predicted    19.12
## cvpred       18.54
## DeliveryTime 21.50
## CV residual   2.96
## 
## Sum of squares = 8.74    Mean square = 8.74    n = 1
## 
## fold 8 
## Observations in test set: 1 
##                 23
## Predicted    23.36
## cvpred       23.56
## DeliveryTime 18.75
## CV residual  -4.81
## 
## Sum of squares = 23.1    Mean square = 23.1    n = 1
## 
## fold 9 
## Observations in test set: 1 
##                24
## Predicted    24.4
## cvpred       25.0
## DeliveryTime 19.8
## CV residual  -5.2
## 
## Sum of squares = 27    Mean square = 27    n = 1
## 
## fold 10 
## Observations in test set: 1 
##                 11
## Predicted    38.09
## cvpred       37.88
## DeliveryTime 40.33
## CV residual   2.45
## 
## Sum of squares = 5.99    Mean square = 5.99    n = 1
## 
## fold 11 
## Observations in test set: 1 
##                  5
## Predicted    14.19
## cvpred       14.23
## DeliveryTime 13.75
## CV residual  -0.48
## 
## Sum of squares = 0.23    Mean square = 0.23    n = 1
## 
## fold 12 
## Observations in test set: 1 
##                 7
## Predicted    7.16
## cvpred       7.08
## DeliveryTime 8.00
## CV residual  0.92
## 
## Sum of squares = 0.85    Mean square = 0.85    n = 1
## 
## fold 13 
## Observations in test set: 1 
##                19
## Predicted    7.71
## cvpred       7.52
## DeliveryTime 9.50
## CV residual  1.98
## 
## Sum of squares = 3.94    Mean square = 3.94    n = 1
## 
## fold 14 
## Observations in test set: 1 
##                 20
## Predicted    40.89
## cvpred       41.54
## DeliveryTime 35.10
## CV residual  -6.44
## 
## Sum of squares = 41.5    Mean square = 41.5    n = 1
## 
## fold 15 
## Observations in test set: 1 
##                    3
## Predicted    12.0798
## cvpred       12.0852
## DeliveryTime 12.0300
## CV residual  -0.0552
## 
## Sum of squares = 0    Mean square = 0    n = 1
## 
## fold 16 
## Observations in test set: 1 
##                 1
## Predicted    21.7
## cvpred       22.3
## DeliveryTime 16.7
## CV residual  -5.6
## 
## Sum of squares = 31.3    Mean square = 31.3    n = 1
## 
## fold 17 
## Observations in test set: 1 
##                 22
## Predicted    56.01
## cvpred       58.38
## DeliveryTime 52.32
## CV residual  -6.06
## 
## Sum of squares = 36.7    Mean square = 36.7    n = 1
## 
## fold 18 
## Observations in test set: 1 
##                  17
## Predicted    14.914
## cvpred       14.886
## DeliveryTime 15.350
## CV residual   0.464
## 
## Sum of squares = 0.22    Mean square = 0.22    n = 1
## 
## fold 19 
## Observations in test set: 1 
##                   6
## Predicted    18.400
## cvpred       18.413
## DeliveryTime 18.110
## CV residual  -0.303
## 
## Sum of squares = 0.09    Mean square = 0.09    n = 1
## 
## fold 20 
## Observations in test set: 1 
##                  2
## Predicted    10.35
## cvpred       10.27
## DeliveryTime 11.50
## CV residual   1.23
## 
## Sum of squares = 1.52    Mean square = 1.52    n = 1
## 
## fold 21 
## Observations in test set: 1 
##                 14
## Predicted    18.68
## cvpred       18.59
## DeliveryTime 19.75
## CV residual   1.16
## 
## Sum of squares = 1.34    Mean square = 1.34    n = 1
## 
## fold 22 
## Observations in test set: 1 
##                  12
## Predicted    21.593
## cvpred       21.669
## DeliveryTime 21.000
## CV residual  -0.669
## 
## Sum of squares = 0.45    Mean square = 0.45    n = 1
## 
## fold 23 
## Observations in test set: 1 
##                15
## Predicted    23.3
## cvpred       23.3
## DeliveryTime 24.0
## CV residual   0.7
## 
## Sum of squares = 0.49    Mean square = 0.49    n = 1
## 
## fold 24 
## Observations in test set: 1 
##                 9
## Predicted    71.8
## cvpred       64.5
## DeliveryTime 79.2
## CV residual  14.8
## 
## Sum of squares = 219    Mean square = 219    n = 1
## 
## fold 25 
## Observations in test set: 1 
##                 13
## Predicted    12.47
## cvpred       12.41
## DeliveryTime 13.50
## CV residual   1.09
## 
## Sum of squares = 1.2    Mean square = 1.2    n = 1

## 
## Overall (Sum over all 1 folds) 
##   ms 
## 18.4
# MSPE can be alternatively calculated as
sum((d$DeliveryTime-LOOCV$cvpred)^2)/n
## [1] 18.4
# PRESS can be calculated as 
sum((d$DeliveryTime-LOOCV$cvpred)^2)
## [1] 459
# Prediction R squared can be calculated as
1-sum((d$DeliveryTime-LOOCV$cvpred)^2)/sum((d$DeliveryTime-mean(d$DeliveryTime))^2)
## [1] 0.921
summary(model1)$r.squared
## [1] 0.96

Finally, I have included three functions that quickly compute the PRESS, MSPE, and prediction R squared.

### This calculates the PRESS (predictive residual sum of squares); the lower, the better
#' @title PRESS
#' @author Thomas Hopper
#' @description Returns the PRESS statistic (predictive residual sum of squares).
#'              Useful for evaluating predictive power of regression models.
#' @param linear.model A linear regression model (class 'lm'). Required.
PRESS <- function(linear.model) {
  # calculate the predictive (leave-one-out) residuals using the hat values
  pr <- residuals(linear.model)/(1 - lm.influence(linear.model)$hat)
  # calculate the PRESS
  PRESS <- sum(pr^2)
  return(PRESS)
}
### This calculates the MSPE (mean square prediction error); the lower, the better
#' @title MSPE
#' @author Yichen Qin
#' @description Returns the MSPE statistic (mean square prediction error).
#' @param linear.model A linear regression model (class 'lm'). Required.
MSPE <- function(linear.model) {
  # calculate the MSPE = PRESS / sample size
  return(PRESS(linear.model)/length(residuals(linear.model)))
}
### This calculates the prediction R-squared
#' @title Predictive R-squared
#' @author Thomas Hopper
#' @description Returns the prediction R-squared. Requires the function PRESS(), which returns
#'              the PRESS statistic.
#' @param linear.model A linear regression model (class 'lm'). Required.
pred_r_squared <- function(linear.model) {
  # Use anova() to get the sums of squares for the linear model
  lm.anova <- anova(linear.model)
  # Calculate the total sum of squares
  tss <- sum(lm.anova$'Sum Sq')
  # Calculate the predictive R^2
  pred.r.squared <- 1 - PRESS(linear.model)/tss
  return(pred.r.squared)
}
MSPE(model1)
## [1] 18.4
PRESS(model1)
## [1] 459
pred_r_squared(model1)
## [1] 0.921
summary(model1)$r.squared
## [1] 0.96
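As a sanity check, the hat-value shortcut e_i/(1 - h_ii) used inside PRESS() can be verified against an explicit leave-one-out loop. This sketch recomputes the shortcut inline and uses the built-in mtcars data so it is self-contained.

```r
# LOO prediction residual shortcut for lm: e_i / (1 - h_ii), no refitting needed
fit <- lm(mpg ~ wt + hp, data = mtcars)
pr  <- residuals(fit) / (1 - lm.influence(fit)$hat)
# Explicit leave-one-out loop for comparison: refit n times, predict the
# held-out observation each time
loo <- sapply(seq_len(nrow(mtcars)), function(i) {
  f <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  mtcars$mpg[i] - predict(f, mtcars[i, , drop = FALSE])
})
all.equal(sum(loo^2), sum(pr^2))  # the two PRESS values agree
```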

End of example

Optional: Validation by collecting new observations

If we could obtain new data, we could use new data to validate the model.

Example: the delivery time data. Suppose we use the original data set to build a model.

d <- read.csv("data_delivery.csv",header=TRUE)
head(d)
##   Index DeliveryTime NumberofCases Distance
## 1     1         16.7             7      560
## 2     2         11.5             3      220
## 3     3         12.0             3      340
## 4     4         14.9             4       80
## 5     5         13.8             6      150
## 6     6         18.1             7      330
n = dim(d)[1]
n
## [1] 25

In this example, we have 25 observations in the original data set data_delivery.csv, from which we build a multiple linear regression model as follows.

model1 = lm(DeliveryTime ~ NumberofCases + Distance, data=d)
summary(model1)
## 
## Call:
## lm(formula = DeliveryTime ~ NumberofCases + Distance, data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.788 -0.663  0.436  1.157  7.420 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.34123    1.09673    2.13  0.04417 *  
## NumberofCases  1.61591    0.17073    9.46  3.3e-09 ***
## Distance       0.01438    0.00361    3.98  0.00063 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.26 on 22 degrees of freedom
## Multiple R-squared:  0.96,   Adjusted R-squared:  0.956 
## F-statistic:  261 on 2 and 22 DF,  p-value: 4.69e-16

It seems model1 works well for the observed data set data_delivery.csv. However, how do we validate model1? In other words, how will model1 perform for a newly collected data set? Or how well does model1 generalize beyond the observed data?

To validate the performance of model1, we can collect more observations. Suppose the newly collected observations are stored in data_delivery_additional.csv. Then we could validate model1 as follows.

dnew <- read.csv("data_delivery_additional.csv",header=TRUE)

We first make predictions for these new observations using the fitted model model1 and calculate the prediction errors.

prediction = predict(model1, dnew, interval = c("confidence"), level = 0.95, type="response")
head(prediction)
##    fit  lwr  upr
## 1 50.9 47.9 53.9
## 2 21.1 19.2 23.1
## 3 30.8 27.5 34.0
## 4 17.6 15.3 20.0
## 5 26.4 21.0 31.9
## 6 15.3 13.7 16.8
prediction_error = dnew$DeliveryTime-prediction[,1]
head(prediction_error)
##       1       2       3       4       5       6 
##  0.0905 -4.3327 -4.5914  2.2868 -2.4215  3.2767
# putting new observations, prediction and prediction error side by side.
head(cbind(dnew,prediction,prediction_error))
##   Index DeliveryTime NumberofCases Distance  fit  lwr  upr prediction_error
## 1    26         51.0            22      905 50.9 47.9 53.9           0.0905
## 2    27         16.8             7      520 21.1 19.2 23.1          -4.3327
## 3    28         26.2            15      290 30.8 27.5 34.0          -4.5914
## 4    29         19.9             5      500 17.6 15.3 20.0           2.2868
## 5    30         24.0             6     1000 26.4 21.0 31.9          -2.4215
## 6    31         18.6             6      225 15.3 13.7 16.8           3.2767

The whole idea is summarized in the figure below.

End of example

Once the predictions for the new observations are obtained, we can use the following criteria to validate the model.

Example: For the delivery time data set, we can calculate the mean squared prediction error (MSPE) as follows.

MSPE = sum( (dnew$DeliveryTime - prediction[,1])^2 ) / dim(dnew)[1]
MSPE
## [1] 22.1

We can calculate the predicted residual error sum of squares (PRESS) statistic as follows.

PRESS = sum((dnew$DeliveryTime-prediction[,1])^2)
PRESS
## [1] 332

We can calculate prediction R squared as follows.

pred_Rsq = 1-PRESS/sum((dnew$DeliveryTime-mean(dnew$DeliveryTime))^2)
pred_Rsq
## [1] 0.897

Once we have the prediction R squared pred_Rsq = 0.897, we can compare it with the traditional R squared summary(model1)$r.squared = 0.96. Once again, we see that the least-squares model model1 does not predict the new observations dnew quite as well as it fits the original data d. However, the loss in R squared is slight. Collecting new data thus indicates that the least-squares fit for the delivery time data yields a reasonably good prediction equation.

In addition, we can compare PRESS with the residual sum of squares (SSRes) from model1. To obtain the SSRes of model1, we use the following code:

sum(model1$residuals^2)
## [1] 234

Clearly, SSRes = 233.732 is lower than PRESS = 331.832, which means model1 predicts the new observations dnew less accurately than it fits the original observations d, but this is expected.

End of example