What is a Good R-squared Value?

R-squared is a statistical measure that indicates the proportion of the variance in the dependent variable that is explained by the independent variable(s). It ranges from 0 to 1 and is often expressed as a percentage from 0% to 100%. A higher R-squared value indicates that the model fits the data more closely. A value of 70% or above is sometimes treated as a rule of thumb for a good R-squared value, but the threshold varies with the field of study, the complexity of the model, and the purpose of the analysis. A low R-squared value may suggest that the model is not a good fit for the data and that further investigation is needed.


R-squared is a measure of how well a linear regression model “fits” a dataset. Also commonly called the coefficient of determination, R-squared is the proportion of the variance in the response variable that can be explained by the predictor variable.

The value for R-squared can range from 0 to 1. A value of 0 indicates that the response variable cannot be explained by the predictor variable at all. A value of 1 indicates that the response variable can be perfectly explained without error by the predictor variable.
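For reference, the usual definition for a least squares regression can be written as below. The notation is introduced here for illustration and does not appear elsewhere in this article: y_i is an observed response, \hat{y}_i is the value the model predicts for it, and \bar{y} is the mean of the observed responses.

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

When the predictions match the observations exactly, the numerator is 0 and R-squared is 1. When the model does no better than always predicting the mean, the numerator equals the denominator and R-squared is 0.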

In practice, you will likely never see a value of 0 or 1 for R-squared. Instead, you’ll likely encounter some value between 0 and 1.

For example, suppose you have a dataset that contains the population size and number of flower shops in 30 different cities. You fit a simple linear regression model to the dataset, using population size as the predictor variable and flower shops as the response variable. In the output of the regression results, you see that R-squared = 0.2. This indicates that 20% of the variance in the number of flower shops can be explained by the population size.
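As a minimal sketch of where such a number comes from, the snippet below fits a simple linear regression by ordinary least squares and computes R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function names `fit_line` and `r_squared` and the six-city dataset are made up for illustration; this is not the 30-city dataset described above and will not reproduce the 0.2 figure.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def r_squared(x, y):
    """Proportion of the variance in y explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    y_mean = sum(y) / len(y)
    # Variation left unexplained by the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean.
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: population (in thousands) and flower shops for six cities.
population = [10, 20, 30, 40, 50, 60]
flower_shops = [12, 5, 14, 9, 16, 8]

print(round(r_squared(population, flower_shops), 3))
```

In practice a statistics package would report this value as part of the regression output; the hand-rolled version is only meant to make the calculation visible.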

This leads to an important question: is this a “good” value for R-squared?

The answer to this question depends on your objective for the regression model. Namely:

1. Are you interested in explaining the relationship between the predictor(s) and the response variable?

OR

2. Are you interested in predicting the response variable?

Depending on the objective, the answer to “What is a good value for R-squared?” will be different.

Explaining the Relationship Between the Predictor(s) and the Response Variable

If your main objective for your regression model is to explain the relationship between the predictor(s) and the response variable, the R-squared is mostly irrelevant.

For example, suppose that in the regression example above, the coefficient for the predictor population size is 0.005 and is statistically significant. This means that each additional person in a city’s population is associated with an average increase of 0.005 in the number of flower shops. It also means that population size is a statistically significant predictor of the number of flower shops in a city.

Whether the R-squared value for this regression model is 0.2 or 0.9 doesn’t change this interpretation. Since you are simply interested in the relationship between population size and the number of flower shops, you don’t have to be overly concerned with the R-squared value of the model.
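The sketch below illustrates this point with synthetic data. Two datasets are built around the same underlying slope of 0.005, one with little scatter and one with a lot, and both are fitted by least squares. Everything here is an assumption made for the illustration: the five population values, the baseline of 200, the noise pattern, and the helper names `fit_and_score` and `simulate_shops`. The numbers are not meant to be realistic flower shop counts.

```python
def fit_and_score(x, y):
    """Return (slope, r_squared) for a least squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    slope = sxy / sxx
    # In simple linear regression the explained sum of squares is slope * sxy.
    return slope, (slope * sxy) / ss_tot


# Hypothetical populations for five cities.
population = [10_000, 20_000, 30_000, 40_000, 50_000]
# Noise pattern chosen to sum to zero and to be uncorrelated with population,
# so scaling it changes the scatter but not the fitted slope.
pattern = [1, -1, 0, -1, 1]


def simulate_shops(noise_scale):
    """Synthetic response: baseline + 0.005 per person + scaled noise."""
    return [200 + 0.005 * p + noise_scale * e for p, e in zip(population, pattern)]


slope_tight, r2_tight = fit_and_score(population, simulate_shops(25))
slope_noisy, r2_noisy = fit_and_score(population, simulate_shops(158))

print(round(slope_tight, 6), round(r2_tight, 3))
print(round(slope_noisy, 6), round(r2_noisy, 3))
```

With these particular noise scales the two fits have R-squared values of roughly 0.9 and 0.2, yet the estimated slope is 0.005 in both cases, so the interpretation of the coefficient is the same. What differs is how tightly the individual cities cluster around the line.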

Predicting the Response Variable

If your main objective is to predict the value of the response variable accurately using the predictor variable, then R-squared is important.

In general, the larger the R-squared value, the more precisely the predictor variables are able to predict the value of the response variable.

To find out what is considered a “good” R-squared value, you will need to explore what R-squared values are generally accepted in your particular field of study. If you’re performing a regression analysis for a client or a company, you may be able to ask them what is considered an acceptable R-squared value.

Prediction Intervals

A prediction interval specifies a range where a new observation could fall, based on the values of the predictor variables. Narrower prediction intervals indicate that the predictor variables can predict the response variable with more precision.

Often a prediction interval can be more useful than an R-squared value because it gives you a concrete range of values in which a new observation is likely to fall at a chosen confidence level. This is particularly useful if your primary objective of regression is to predict new values of the response variable.

For example, suppose a population size of 40,000 produces a prediction interval of 30 to 35 flower shops in a particular city. This may or may not be considered an acceptable range of values, depending on what the regression model is being used for.
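A minimal sketch of how such an interval can be computed for simple linear regression is shown below, using the textbook formula: the predicted value plus or minus a t critical value times the residual standard error times sqrt(1 + 1/n + (x_new - x_mean)^2 / Sxx). The function name `prediction_interval`, its parameters, and the eight-city dataset are hypothetical, and the output will not match the 30 to 35 range in the example above. The critical value is passed in by the caller because the Python standard library has no t distribution.

```python
import math


def prediction_interval(x, y, x_new, critical_value):
    """Prediction interval for one new observation at x_new.

    critical_value is the t quantile for n - 2 degrees of freedom at the
    desired confidence level, supplied by the caller (e.g. from a t table).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual standard error with n - 2 degrees of freedom.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))

    y_hat = intercept + slope * x_new
    margin = critical_value * s * math.sqrt(1 + 1 / n + (x_new - x_mean) ** 2 / sxx)
    return y_hat - margin, y_hat + margin


# Hypothetical data: population (in thousands) and flower shops for eight cities.
population = [10, 15, 20, 25, 30, 35, 45, 50]
flower_shops = [8, 11, 14, 19, 22, 24, 33, 36]

# 2.447 is the two-sided 95% t critical value for 8 - 2 = 6 degrees of freedom.
low, high = prediction_interval(population, flower_shops, x_new=40, critical_value=2.447)
print(round(low, 1), round(high, 1))
```

The interval gets wider as the residual standard error grows and as the new predictor value moves away from the mean of the observed predictor values, which is why a noisy model or an extrapolated prediction produces a less useful range.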

Conclusion

In general, the larger the R-squared value, the more precisely the predictor variables are able to predict the value of the response variable.

How high an R-squared value needs to be to be considered “good” varies based on the field. Some fields require higher precision than others. 

To find out what is considered a “good” R-squared value, consider what is generally accepted in the field you’re working in, ask someone with specific subject area knowledge, or ask the client/company you’re performing the regression analysis for what they consider to be acceptable.

If you’re interested in explaining the relationship between the predictor and response variable, the R-squared is largely irrelevant since it doesn’t impact the interpretation of the regression model.

If you’re interested in predicting the response variable, prediction intervals are generally more useful than R-squared values.
