How to Perform Stepwise Regression in SAS (With Example)

Stepwise regression in SAS is a method of variable selection that adds and removes predictors in a series of forward and backward steps to identify a combination of predictors that explains the variation in the response well. A closely related approach, used in the example below, fits the model with every possible combination of predictors and then ranks the combinations according to a specified metric. For example, you could use it to identify which combination of variables best explains the variation in a response variable, such as a customer’s likelihood to make a purchase.


Stepwise regression is a procedure we can use to build a regression model from a set of predictor variables by entering and removing predictors in a stepwise manner into the model until there is no statistically valid reason to enter or remove any more.

The goal of stepwise regression is to build a regression model that includes all of the predictor variables that are statistically significantly related to the response variable.

To perform stepwise regression in SAS, you can use PROC REG with the SELECTION= option in the MODEL statement.

The following example shows how to perform stepwise regression in SAS in practice.

Example: Perform Stepwise Regression in SAS

Suppose we have the following dataset in SAS that contains four predictor variables (x1, x2, x3, x4) and one response variable (y):

/*create dataset*/
data my_data;
    input x1 x2 x3 x4 y;
    datalines;
1 4 10 13 78
2 4 12 14 81
5 3 7 10 75
8 2 13 9 97
10 5 12 5 95
14 7 8 6 90
17 8 10 6 86 
19 5 15 5 90
20 5 12 4 93
21 4 10 3 95
;
run;

/*view dataset*/
proc print data=my_data;
run;

Now suppose that we would like to find which combination of predictor variables will produce the best regression model.

When we say “best” regression model, we mean the model that maximizes or minimizes some metric.

There are two metrics we commonly use to assess which regression model is best among a group of potential models:

1. Adjusted R-squared: The adjusted R-squared tells us how useful a model is, adjusted for the number of predictors in the model. The model with the highest adjusted R-squared value is considered the best.

2. AIC: The Akaike information criterion (AIC) is a metric that is used to compare the fit of different regression models. The model with the lowest AIC value is considered the best.
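
For reference, the two metrics can be written out as follows. This is a sketch that assumes the standard textbook definition of adjusted R-squared and the least-squares form of AIC that PROC REG is documented to report. The symbols are introduced here for illustration: n is the number of observations, k is the number of predictors, p = k + 1 is the number of estimated parameters including the intercept, and SSE is the error sum of squares.

```latex
\[
R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1},
\qquad
\text{AIC} = n \ln\!\left(\frac{\text{SSE}}{n}\right) + 2p
\]
```

Adding a predictor can never increase SSE, so both formulas include a penalty that grows with the number of terms in the model.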

Fortunately, we can calculate both the adjusted R-squared and AIC values for regression models in SAS by using PROC REG with the SELECTION= option in the MODEL statement.

The following code shows how to do so:

/*compare all subset models by adjusted R-squared and also report AIC*/
proc reg data=my_data outest=est;
    model y=x1 x2 x3 x4 / selection=adjrsq aic;
    output out=out p=p r=r;
run;
quit;

[Figure: stepwise regression in SAS output]

From the output we can see that the model with the highest adjusted R-squared value and the lowest AIC value is the regression model that uses only x3 and x4 as the predictor variables.

Thus, we would declare the following model to be “best” out of all possible models:

y = b0 + b1(x3) + b2(x4)
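
To obtain estimates for b0, b1 and b2, one option is to refit the selected model on its own. The following is a minimal sketch, assuming the my_data dataset created earlier in this example:

```sas
/*refit the selected model to estimate b0, b1 and b2*/
proc reg data=my_data;
    model y = x3 x4;
run;
quit;
```

The Parameter Estimates table in the resulting output lists the intercept and the coefficients for x3 and x4.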

This particular regression model has the following metrics:

  • Adjusted R-squared value: 0.5923
  • AIC: 34.2921
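
Note that SELECTION=ADJRSQ compares subset models by adjusted R-squared rather than entering and removing predictors one at a time. For comparison, a classical stepwise search based on significance levels can be sketched with SELECTION=STEPWISE. The thresholds of 0.15 for entry (SLENTRY=) and for staying in the model (SLSTAY=) are chosen here for illustration and may need to be adjusted for your data:

```sas
/*classical stepwise selection based on significance levels*/
proc reg data=my_data;
    model y = x1 x2 x3 x4 / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```

This approach is not guaranteed to arrive at the same model as the adjusted R-squared and AIC comparison above.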

Notes on Selecting the “Best” Regression Model

Note that the model with the highest adjusted R-squared value does not always have the lowest AIC value as well.

When deciding which regression model is best, adjusted R-squared and AIC serve as guides, but in the real world you may also have to use domain expertise to make the final choice.
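
When the two metrics disagree, it can help to view the candidate models ordered by AIC instead of adjusted R-squared. The sketch below assumes that the est dataset created by OUTEST=est in the earlier PROC REG call contains a variable named _AIC_ (which should be the case when the AIC option is specified); est_by_aic is a hypothetical name for the sorted copy.

```sas
/*list candidate models from the OUTEST= dataset in order of AIC*/
proc sort data=est out=est_by_aic;
    by _AIC_;
run;

/*show the five models with the lowest AIC*/
proc print data=est_by_aic(obs=5);
run;
```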

It can also be a good idea to choose a parsimonious model, which is a model that achieves a desired level of goodness of fit using as few predictor variables as possible.

The reasoning for this type of model stems from the idea of Occam’s Razor (sometimes called the “Principle of Parsimony”) which says that the simplest explanation is most likely the right one.

Applied to statistics, a model that has few parameters but achieves a satisfactory level of goodness of fit should be preferred over a model that has a ton of parameters and achieves only a slightly higher level of goodness of fit.
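
One way to favor simpler models during the search itself is to cap the number of predictors considered. The following is a sketch, assuming the same my_data dataset; the limits of two predictors (STOP=2) and five reported models (BEST=5) are arbitrary values chosen for illustration:

```sas
/*only consider subset models with at most two predictors*/
proc reg data=my_data;
    model y = x1 x2 x3 x4 / selection=adjrsq aic stop=2 best=5;
run;
quit;
```

The output can then be compared with the unrestricted search to see how much fit is gained by allowing additional predictors.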
