*Perform Cross Validation for Model Performance in R*

How to Evaluate Your R Model with Cross Validation

Cross validation is a standard technique for evaluating how well a statistical or machine learning model predicts data it has not seen. It works by partitioning the dataset into a training set, used to fit the model, and a testing set, held back to assess performance. Repeating this split-train-evaluate procedure several times gives a more stable estimate of how the model generalizes to new data and makes overfitting easier to detect.


In the field of statistics and machine learning, modeling serves two primary objectives:

  • To establish and quantify the relationship between one or more predictor variables and a corresponding response variable.
  • To leverage the fitted model for accurate forecasting or classification of future, unobserved observations.

Classical inference concentrates on the first objective. Cross validation addresses the second by quantifying how accurately a model predicts new data points.

Consider a regression model built to assess loan risk. It might use demographic factors such as age and income (the predictor variables) to predict loan default status (the response variable). Because default status is a binary outcome, logistic regression is the usual choice here rather than linear regression. The goal is to apply the fitted model to new loan applicants and obtain a predicted probability of default from their attributes.

Assessing predictive strength requires evaluating the model on data that was not used to fit it. Testing on such held-out data gives an estimate of the prediction error, which measures how well the model generalizes beyond the sample.

Using Cross Validation to Estimate Prediction Error

While there are multiple variants, the core methodology of cross-validation remains consistent. The objective is to cyclically partition data to ensure the performance evaluation is independent of the training process.

  1. Data Partitioning: A specific subset of observations—typically 15% to 25% of the total dataset—is randomly set aside to serve as the temporary hold-out or testing data.
  2. Model Training: The statistical model is fitted, or “trained,” exclusively using the remaining majority of the data (the training set).
  3. Performance Evaluation: The trained model is then applied to make predictions on the set-aside observations, allowing for an accurate assessment of its predictive capacity on novel data.

Measuring the Quality of a Model

To quantify the predictive quality of a model when tested against new observations, several key metrics are employed. These indicators help practitioners choose the optimal model architecture by minimizing error or maximizing explained variance.

Multiple R-squared: This metric quantifies the proportion of the variance in the response variable that is predictable from the predictor variables. A value of 1 signifies a perfect linear relationship where the model explains all variability, while a value of 0 suggests no linear relationship exists. Generally, higher R-squared values indicate a superior fit and greater predictive power.

Root Mean Squared Error (RMSE): RMSE summarizes the typical size of the prediction error in the same units as the response variable. It is the square root of the average squared difference between the observed values and the values predicted by the model. Because errors are squared before averaging, RMSE penalizes large errors heavily. Lower values indicate a more accurate fit.

Mean Absolute Error (MAE): MAE measures the average of the absolute differences between the actual observations and the predicted values. Unlike RMSE, MAE uses absolute values, making it less sensitive to extreme outliers. As with RMSE, a lower MAE score corresponds to a better model fit.
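The three metrics above can be computed by hand in a few lines of base R. The sketch below uses made-up observed and predicted values purely for illustration. It computes R-squared as the squared correlation between predictions and observations, which is, as far as we know, what caret's R2() reports by default.

```r
# Toy observed and predicted values (made up for illustration only)
observed  <- c(21.0, 22.8, 18.7, 14.3, 24.4)
predicted <- c(22.1, 24.0, 17.5, 15.9, 23.2)

errors <- observed - predicted

# RMSE: square the errors, average them, then take the square root
rmse <- sqrt(mean(errors^2))

# MAE: average of the absolute errors
mae <- mean(abs(errors))

# R-squared, taken here as the squared correlation between
# predictions and observations
r_squared <- cor(predicted, observed)^2

round(c(RMSE = rmse, MAE = mae, R_squared = r_squared), 3)
```

RMSE is never smaller than MAE on the same predictions, and the gap widens when a few errors are much larger than the rest.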

Implementing Four Different Cross-Validation Techniques in R

We will now demonstrate the practical implementation of four distinct cross validation strategies within the R environment. Understanding these variations is crucial for selecting the most appropriate evaluation method for a given modeling task.

  1. The Validation Set Approach
  2. The k-fold Cross Validation
  3. The Leave One Out Cross Validation (LOOCV)
  4. The Repeated k-fold Cross Validation

To provide concrete examples, we will utilize a specific subset of the widely recognized, built-in R dataset, mtcars. This data allows us to focus on predicting fuel efficiency (miles per gallon) based on engine characteristics.

#define dataset
data <- mtcars[ , c("mpg", "disp", "hp", "drat")]

#view first six rows of new data
head(data)

#                   mpg disp  hp drat
#Mazda RX4         21.0  160 110 3.90
#Mazda RX4 Wag     21.0  160 110 3.90
#Datsun 710        22.8  108  93 3.85
#Hornet 4 Drive    21.4  258 110 3.08
#Hornet Sportabout 18.7  360 175 3.15
#Valiant           18.1  225 105 2.76

Our objective is to construct a multiple linear regression model using displacement (disp), horsepower (hp), and rear axle ratio (drat) as the predictor variables, with miles per gallon (mpg) serving as the response variable.

The Validation Set Approach

The validation set approach is the simplest form of cross-validation, requiring only a single, explicit split of the data. This technique relies on randomly dividing the available observations into two distinct, non-overlapping subsets: a training set and a test (or validation) set.

  1. The data is permanently split into two major portions. Typically, 70%–80% of the data is allocated for training, leaving the remaining 20%–30% for testing.
  2. The model parameters are estimated solely using the designated training data set.
  3. The trained model is then used to generate predictions exclusively on the observations contained within the hold-out test set.
  4. Model quality is assessed using performance metrics such as R-squared, RMSE, and MAE, providing a single estimate of the generalization error.

R Implementation Example

The following R example demonstrates the implementation of the validation set approach. We use the previously defined mtcars subset, splitting it 80/20 into training and testing partitions, respectively. After fitting the linear model, we assess its performance metrics on the unseen test data.

#load dplyr library, which provides the %>% pipe used below
library(dplyr)

#load caret library used for partitioning the data and computing R2, RMSE, and MAE
library(caret)

#make this example reproducible
set.seed(0)

#define the dataset
data <- mtcars[ , c("mpg", "disp", "hp", "drat")]

#split the dataset into a training set (80%) and test set (20%).
training_obs <- data$mpg %>% createDataPartition(p = 0.8, list = FALSE)

train <- data[training_obs, ]
test <- data[-training_obs, ]

# Build the linear regression model on the training set
model <- lm(mpg ~ ., data = train)

# Use the model to make predictions on the test set
predictions <- model %>% predict(test)

#Examine R-squared, RMSE, and MAE of predictions
data.frame(R_squared = R2(predictions, test$mpg),
           RMSE = RMSE(predictions, test$mpg),
           MAE = MAE(predictions, test$mpg))

#  R_squared     RMSE     MAE
#1 0.9213066 1.876038 1.66614

In practical model selection, the model that yields the lowest RMSE value on the independent test set is generally considered the superior choice for prediction.

Pros and Cons of this Approach

The primary advantage of the validation set approach is its simplicity and computational efficiency: the model is trained only once. It has two significant drawbacks, however. First, the error estimate can vary considerably depending on which observations happen to land in the training and test sets. Second, because the model is fitted on only a fraction of the data, the estimate tends to overestimate the prediction error of a model trained on the full dataset.
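How much a single-split estimate depends on the particular split can be seen by repeating the 80/20 split with different seeds. The following is a minimal base R sketch. The helper holdout_rmse() is a name invented for this illustration, and it uses a plain random sample() rather than caret's createDataPartition(), so its numbers will not match the example above.

```r
data <- mtcars[, c("mpg", "disp", "hp", "drat")]

# Test-set RMSE for one random split (hypothetical helper for this sketch)
holdout_rmse <- function(seed, data, p = 0.8) {
  set.seed(seed)
  train_idx <- sample(nrow(data), size = floor(p * nrow(data)))
  fit  <- lm(mpg ~ ., data = data[train_idx, ])
  pred <- predict(fit, newdata = data[-train_idx, ])
  sqrt(mean((data$mpg[-train_idx] - pred)^2))
}

# Repeat the single split with 20 different seeds
rmse_by_split <- sapply(1:20, holdout_rmse, data = data)

# The spread shows how strongly one estimate depends on the split
summary(rmse_by_split)
```

With only 32 rows, each test set holds 7 observations, so the range between the smallest and largest RMSE is likely to be wide.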

k-fold Cross Validation Approach

The k-fold cross validation approach addresses the data usage limitations inherent in the simple validation set method. Instead of a single split, k-fold CV systematically rotates through all data points, ensuring every observation is used exactly once in the validation process. The process is defined by the parameter k, which specifies the number of partitions (folds) the data is divided into.

  1. The entire dataset is randomly partitioned into k subsets of roughly equal size, known as folds (e.g., k=5 or k=10).
  2. The model is trained on the data derived from k-1 folds, combining the vast majority of the data.
  3. The model is then tested exclusively on the one remaining, untouched fold (the validation set), and the prediction error is recorded.
  4. This training and testing cycle is repeated k times, ensuring that each of the k folds serves as the validation set exactly once.
  5. The final measure of model quality, or the cross-validation error, is the aggregated average of the k individual test error estimates.
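The steps above can be written out as an explicit loop in base R. This is a sketch for illustration, not what caret does internally: the fold assignment via sample() is an assumption of this example, so the resulting numbers will differ from caret's output.

```r
data <- mtcars[, c("mpg", "disp", "hp", "drat")]
k <- 5

set.seed(0)
# Step 1: assign every row to one of k folds at random
fold_id <- sample(rep(1:k, length.out = nrow(data)))

fold_rmse <- numeric(k)
for (i in 1:k) {
  held_out <- fold_id == i
  # Step 2: train on the other k-1 folds
  fit <- lm(mpg ~ ., data = data[!held_out, ])
  # Step 3: predict the held-out fold and record its error
  pred <- predict(fit, newdata = data[held_out, ])
  fold_rmse[i] <- sqrt(mean((data$mpg[held_out] - pred)^2))
}  # Step 4: the loop repeats this once per fold

# Step 5: average the k fold-level errors
cv_rmse <- mean(fold_rmse)
cv_rmse
```

Because 32 rows do not divide evenly by 5, two folds receive 7 observations and three receive 6.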

R Implementation Example (k=5)

In this demonstration, we perform 5-fold cross validation. The caret package handles the resampling setup, iteratively training the model on 4 folds and testing on the remaining 1. The output summarizes the metrics (RMSE, R-squared, and MAE) averaged across the five folds.

#load caret library used to set up the resampling and train the model
library(caret)

#make this example reproducible
set.seed(0)

#define the dataset
data <- mtcars[ , c("mpg", "disp", "hp", "drat")]

#define the number of subsets (or "folds") to use
train_control <- trainControl(method = "cv", number = 5)

#train the model
model <- train(mpg ~ ., data = data, method = "lm", trControl = train_control)

#Summarize the results
print(model)

#Linear Regression 
#
#32 samples
# 3 predictor
#
#No pre-processing
#Resampling: Cross-Validated (5 fold) 
#Summary of sample sizes: 26, 25, 26, 25, 26 
#Resampling results:
#
#  RMSE      Rsquared   MAE     
#  3.095501  0.7661981  2.467427
#
#Tuning parameter 'intercept' was held constant at a value of TRUE

Pros and Cons of this Approach

The key advantage of k-fold cross validation over the validation set method is the comprehensive use of the data; since the model is built multiple times using different data combinations, the risk of omitting vital information is drastically reduced, leading to a more reliable estimate of model performance.

The primary decision point in k-fold CV is selecting the optimal value for k. A low k (e.g., k=3) results in higher bias but lower variance in the error estimate, as the training sets are smaller than the full dataset. Conversely, a high k (e.g., k=100) reduces bias (as training sets are larger) but increases the variance of the estimate. In common practice, choosing k=5 or k=10 is generally recommended, as these values offer a balanced trade-off between bias and variance while maintaining reasonable computational speed.

Leave One Out Cross Validation (LOOCV) Approach

The Leave One Out Cross Validation (LOOCV) approach is an extreme case of k-fold CV where the number of folds, k, is set equal to the total number of observations, N. In this method, the model is trained on almost the entire dataset, maximizing the training sample size in each iteration.

  1. A model is constructed using all available data points except for a single observation (N-1 data points).
  2. This trained model is used to predict the value of the single, excluded observation. The error for this specific prediction is recorded.
  3. This entire sequence (train on N-1, test on 1) is repeated N times, once for every observation in the dataset.
  4. The overall performance is determined by calculating the average of all N recorded prediction errors.
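The four steps translate directly into a loop over the rows. The sketch below is a base R illustration under the same model as the rest of this article. It pools the N held-out errors before computing RMSE and MAE, which is one reasonable way to aggregate them.

```r
data <- mtcars[, c("mpg", "disp", "hp", "drat")]
n <- nrow(data)

loo_error <- numeric(n)
for (i in 1:n) {
  # Step 1: fit on every row except row i
  fit <- lm(mpg ~ ., data = data[-i, ])
  # Step 2: predict the single excluded row and record the error
  pred <- predict(fit, newdata = data[i, , drop = FALSE])
  loo_error[i] <- data$mpg[i] - pred
}  # Step 3: the loop runs once per observation

# Step 4: aggregate the N recorded errors
loocv_rmse <- sqrt(mean(loo_error^2))
loocv_mae  <- mean(abs(loo_error))
c(RMSE = loocv_rmse, MAE = loocv_mae)
```

No seed is needed here: LOOCV involves no random partitioning, so the result is the same on every run.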

R Implementation Example

The R code below demonstrates LOOCV using the same mtcars subset. Since there are 32 observations in our data, the training process will fit 32 separate models, each leaving one unique observation out for testing.

#load caret library used to set up the resampling and train the model
library(caret)

#make this example reproducible
set.seed(0)

#define the dataset
data <- mtcars[ , c("mpg", "disp", "hp", "drat")]

#specify that we want to use LOOCV
train_control <- trainControl(method = "LOOCV")

#train the model
model <- train(mpg ~ ., data = data, method = "lm", trControl = train_control)

#summarize the results
print(model)

#Linear Regression 
#
#32 samples
# 3 predictor
#
#No pre-processing
#Resampling: Leave-One-Out Cross-Validation 
#Summary of sample sizes: 31, 31, 31, 31, 31, 31, ... 
#Resampling results:
#
#  RMSE      Rsquared   MAE     
#  3.168763  0.7170704  2.503544
#
#Tuning parameter 'intercept' was held constant at a value of TRUE

Pros and Cons of this Approach

LOOCV offers the distinct benefit of maximum data utilization, as the training set size (N-1) closely approximates the full dataset size, which generally leads to a low-bias estimate of the prediction error. However, this method suffers from two major disadvantages: first, since the training sets are nearly identical across iterations, the error estimates tend to be highly correlated, leading to high variability in the final error estimate. Second, the requirement to fit N models makes LOOCV significantly more computationally demanding and cumbersome, especially for large datasets.

Repeated k-fold Cross Validation Approach

The repeated k-fold cross validation method is a refinement of the standard k-fold technique designed to provide a more stable and robust estimate of model performance. It achieves this by performing the k-fold process multiple times (e.g., 4 or 10 repeats), with the data being randomly reshuffled before each repetition begins.

The final measure of performance is the mean error across all resamples (R repetitions * k folds). Reshuffling and re-evaluating smooths out the variability introduced by any single random partition in standard k-fold cross validation, yielding a more stable assessment of generalization capability.

The subsequent example demonstrates a setup where 5-fold cross validation is executed, and this entire procedure is repeated 4 times.

#load caret library used to set up the resampling and train the model
library(caret)

#make this example reproducible
set.seed(0)

#define the dataset
data <- mtcars[ , c("mpg", "disp", "hp", "drat")]

#define the number of subsets to use and number of times to repeat k-fold CV
train_control <- trainControl(method = "repeatedcv", number = 5, repeats = 4)

#train the model
model <- train(mpg ~ ., data = data, method = "lm", trControl = train_control)

#summarize the results
print(model)

#Linear Regression 
#
#32 samples
# 3 predictor
#
#No pre-processing
#Resampling: Cross-Validated (5 fold, repeated 4 times) 
#Summary of sample sizes: 26, 25, 26, 25, 26, 25, ... 
#Resampling results:
#
#  RMSE      Rsquared   MAE     
#  3.176339  0.7909337  2.559131
#
#Tuning parameter 'intercept' was held constant at a value of TRUE

Pros and Cons of this Approach

The chief benefit of repeated k-fold cross validation is that averaging over several random partitions reduces the variance of the estimate, so the result depends less on any one split than a single k-fold run does. The drawback is the increased computational burden: since the process involves R repeats of k-fold CV, the total number of models fitted is R * k, making it slower than standard k-fold CV or the validation set approach.
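Both the model count and the effect of averaging can be checked with a small base R sketch. The helper kfold_rmse() is a name made up for this illustration, and its fold assignment is an assumption of the example, so the values will not match caret's output.

```r
data <- mtcars[, c("mpg", "disp", "hp", "drat")]
k <- 5
n_repeats <- 4

# Fold-level RMSEs for one k-fold run (hypothetical helper for this sketch)
kfold_rmse <- function(data, k) {
  fold_id <- sample(rep(1:k, length.out = nrow(data)))
  sapply(1:k, function(i) {
    held_out <- fold_id == i
    fit  <- lm(mpg ~ ., data = data[!held_out, ])
    pred <- predict(fit, newdata = data[held_out, ])
    sqrt(mean((data$mpg[held_out] - pred)^2))
  })
}

set.seed(0)
# One row per fold, one column per repeat
rmse_matrix <- replicate(n_repeats, kfold_rmse(data, k))

length(rmse_matrix)    # models fitted: n_repeats * k = 20
colMeans(rmse_matrix)  # estimate from each individual 5-fold run
mean(rmse_matrix)      # repeated k-fold estimate
```

The column means show how much a single 5-fold run can move from one shuffle to the next, while the overall mean averages that movement out.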

Balancing Bias and Variance: Choosing the Number of Folds (k)

The most critical and subjective aspect of performing k-fold cross validation is determining the optimal number of folds, k. This choice directly influences the trade-off between bias and variance in the resulting error estimate. Generally, a smaller number of folds (low k, e.g., k=3) results in smaller training sets, leading to a higher bias in the estimate but lower variance. Conversely, a larger number of folds (high k, approaching LOOCV) reduces bias because the training sets are larger, but it increases the variability of the estimate due to the high correlation between the resulting models.

Computational cost is another practical constraint. Since a new model must be trained for every fold, selecting a high k significantly increases the time required for evaluation, which can be prohibitive for complex models or very large datasets.

Consequently, in standard practice, k=5 or k=10 folds are most commonly used. This range is widely accepted because it successfully strikes a balance, offering a reasonable compromise between low bias, manageable variance, and acceptable computational efficiency.

How to Choose and Finalize a Model After Cross Validation

The primary use of cross validation is to assess the predictive performance of candidate models objectively. By evaluating multiple models (e.g., a simple linear model versus a more complex polynomial model) on the same metrics, such as RMSE or MAE, we can identify the one with the lowest estimated prediction error.

It is crucial to understand that the model instances trained during the cross-validation cycles are purely for evaluation purposes. Once cross validation identifies the optimal model structure, the final, production-ready model must be refitted using all of the available data.

For instance, if 5-fold cross validation confirms that Model A is superior to Model B, we discard the five instances of Model A created during CV. We then utilize 100% of the dataset to train the final, robust version of Model A, ensuring the ultimate model leverages the maximum possible information for deployment.
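The compare-then-refit workflow might look like the following in base R. This is a sketch under stated assumptions: the two candidate formulas are arbitrary choices for illustration, and cv_rmse() is a helper invented for this example.

```r
data <- mtcars[, c("mpg", "disp", "hp", "drat")]

# Cross-validated RMSE for one formula (hypothetical helper for this sketch)
cv_rmse <- function(formula, data, fold_id) {
  fold_rmse <- sapply(sort(unique(fold_id)), function(i) {
    held_out <- fold_id == i
    fit  <- lm(formula, data = data[!held_out, ])
    pred <- predict(fit, newdata = data[held_out, ])
    sqrt(mean((data$mpg[held_out] - pred)^2))
  })
  mean(fold_rmse)
}

set.seed(0)
# Use the same folds for both candidates so the comparison is like for like
fold_id <- sample(rep(1:5, length.out = nrow(data)))

# Two illustrative candidates (arbitrary choices for this example)
candidates <- list(
  model_a = mpg ~ disp + hp + drat,
  model_b = mpg ~ hp
)
cv_results <- sapply(candidates, cv_rmse, data = data, fold_id = fold_id)
cv_results

# Discard the fold-level fits and refit the better candidate on all rows
best <- names(which.min(cv_results))
final_model <- lm(candidates[[best]], data = data)
coef(final_model)
```

When caret's train() is used instead, the object it returns already contains a model refitted on the full dataset, available as model$finalModel.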

Cite this article

stats writer (2025). How to Evaluate Your R Model with Cross Validation. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/perform-cross-validation-for-model-performance-in-r/

stats writer. "How to Evaluate Your R Model with Cross Validation." PSYCHOLOGICAL SCALES, 30 Dec. 2025, https://scales.arabpsychology.com/stats/perform-cross-validation-for-model-performance-in-r/.

