How to Calculate Variance Inflation Factor (VIF) in SAS

The variance inflation factor (VIF) is a measure of multicollinearity in a regression model. For each independent variable, it quantifies how much the variance of that variable's estimated coefficient is inflated because the variable is correlated with one or more of the other independent variables. In SAS, you can calculate it by adding the VIF option to the MODEL statement in PROC REG, which adds a Variance Inflation column to the Parameter Estimates table. Large values flag the predictors that are involved in multicollinearity.


In regression analysis, multicollinearity occurs when two or more predictor variables are highly correlated with each other, such that they do not provide unique or independent information in the regression model.

If the degree of correlation is high enough between variables, it can cause problems when fitting and interpreting the regression model. 

One way to detect multicollinearity is by using a metric known as the variance inflation factor (VIF), which measures how strongly each explanatory variable in a regression model is correlated with the other explanatory variables.
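For reference, the VIF of predictor j is computed from an auxiliary regression of that predictor on all of the other predictors. If R_j² is the R² of that auxiliary regression, then:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Because R_j² lies between 0 and 1, the VIF is never below 1 and grows without bound as R_j² approaches 1.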

This tutorial explains how to calculate VIF in SAS.

Example: Calculating VIF in SAS

For this example we’ll create a dataset that describes the attributes of 10 basketball players:

/*create dataset*/
data my_data;
    input rating points assists rebounds;
    datalines;
90 25 5 11
85 20 7 8
82 14 7 10
88 16 8 6
94 27 5 6
90 20 7 9
76 12 6 6
75 15 9 10
87 14 9 10
86 19 5 7
;
run;

/*view dataset*/
proc print data=my_data;
run;

Suppose we would like to fit a multiple linear regression model using rating as the response variable and points, assists, and rebounds as the predictor variables.

We can use PROC REG to fit this regression model, adding the VIF option to the MODEL statement to calculate the VIF for each predictor variable:

/*fit regression model and calculate VIF values*/
proc reg data=my_data;
    model rating = points assists rebounds / vif;
run;
quit;


From the Parameter Estimates table we can see the VIF values for each of the predictor variables:

  • points: 1.76398
  • assists: 1.95910
  • rebounds: 1.17503

Note: Ignore the VIF reported for the Intercept; VIF is only meaningful for the predictor variables.
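To see where these numbers come from, you can reproduce one of them by hand. The sketch below assumes my_data from above is still available; it regresses points on the other two predictors and converts that auxiliary R² into a VIF. The data set names aux_est and vif_check are arbitrary choices for this illustration.

```sas
/*auxiliary regression: regress points on the other two predictors*/
/*RSQUARE adds the model R-square (_RSQ_) to the OUTEST= data set*/
proc reg data=my_data outest=aux_est rsquare noprint;
    model points = assists rebounds;
run;
quit;

/*convert the auxiliary R-square into VIF = 1 / (1 - R-square)*/
data vif_check;
    set aux_est;
    vif_points = 1 / (1 - _RSQ_);
run;

proc print data=vif_check;
    var _RSQ_ vif_points;
run;
```

The auxiliary R² should be about 0.433, and vif_points should come out to about 1.76398, matching the value PROC REG reported for points.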

The value for VIF starts at 1 and has no upper limit. A rule of thumb for interpreting VIFs is as follows:

  • A value of 1 indicates there is no correlation between a given predictor variable and any other predictor variables in the model.
  • A value between 1 and 5 indicates moderate correlation between a given predictor variable and other predictor variables in the model, but this is often not severe enough to require attention.
  • A value greater than 5 indicates potentially severe correlation between a given predictor variable and other predictor variables in the model. In this case, the coefficient estimates and p-values in the regression output are likely unreliable.
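To see what these cutoffs mean, recall that VIF_j = 1/(1 − R_j²), where R_j² is the R² from regressing predictor j on the other predictors. Solving for R_j² turns a VIF back into the share of that predictor's variance explained by the others:

```latex
\begin{aligned}
R_j^2 &= 1 - \frac{1}{\mathrm{VIF}_j} \\
\mathrm{VIF}_j = 5 &\;\Rightarrow\; R_j^2 = 1 - \tfrac{1}{5} = 0.80 \\
\mathrm{VIF}_{\text{points}} = 1.76398 &\;\Rightarrow\; R^2_{\text{points}} \approx 1 - 0.5669 = 0.4331
\end{aligned}
```

So a VIF above 5 means more than 80% of that predictor's variance is explained by the other predictors, while points in this example sits at roughly 43%.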

How to Deal with Multicollinearity

If you determine that multicollinearity is a problem in your regression model, there are a few common ways to deal with it:

1. Remove one or more of the highly correlated variables.

This is the quickest fix in most cases and is often acceptable because the variables you remove are redundant anyway and add little unique or independent information to the model.
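As an illustration only (all three VIFs in this example are well below 5, so nothing actually needs to be removed), dropping assists, the predictor with the largest VIF, and refitting would look like this, assuming my_data from above:

```sas
/*refit the model without assists and recheck the VIF values*/
proc reg data=my_data;
    model rating = points rebounds / vif;
run;
quit;
```

With only two predictors left, each VIF equals 1/(1 − r²), where r is the correlation between points and rebounds. That correlation is about −0.03 in this data, so both VIFs drop to roughly 1.0009.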

2. Linearly combine the predictor variables in some way, such as adding or subtracting them.

By doing so, you can create one new variable that encompasses the information from both variables, so you no longer have an issue of multicollinearity.
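A minimal sketch of this idea, assuming my_data from above. The new variable name points_assists and the choice of points and assists (the most strongly correlated pair in this data, r ≈ −0.63) are hypothetical; a plain sum is just one possible combination, and which combination makes sense depends on what the variables measure.

```sas
/*create a single predictor that combines two correlated predictors*/
data my_data_combined;
    set my_data;
    points_assists = points + assists;
run;

/*refit using the combined predictor in place of points and assists*/
proc reg data=my_data_combined;
    model rating = points_assists rebounds / vif;
run;
quit;
```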

3. Perform an analysis that is designed to account for highly correlated variables such as principal component analysis or partial least squares (PLS) regression.

These techniques are specifically designed to handle highly correlated predictor variables.
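In SAS, both approaches are available in PROC PLS: METHOD=PLS fits partial least squares regression, and METHOD=PCR fits principal components regression (a regression on principal components of the predictors). The sketch below assumes my_data from above; NFAC=2 (the number of extracted factors) is an illustrative choice rather than a recommendation, and the output data set name pls_out is arbitrary.

```sas
/*partial least squares regression with two extracted factors*/
proc pls data=my_data method=pls nfac=2;
    model rating = points assists rebounds;
    output out=pls_out predicted=pred_rating;
run;

/*principal components regression with two components*/
proc pls data=my_data method=pcr nfac=2;
    model rating = points assists rebounds;
run;
```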
