What are the significant variables in regression models?

Regression models are statistical tools used to analyze the relationship between a dependent (response) variable and one or more independent (predictor) variables. The significant variables in a regression model are the predictors that have a meaningful effect on the dependent variable. These variables drive the model’s predictions, and including or excluding them can greatly affect the accuracy and reliability of the model. Whether a predictor counts as significant is usually judged with measures such as the strength of its relationship with the dependent variable, its level of statistical significance (p-value), and its contribution to the coefficient of determination (R-squared). Problems such as multicollinearity, heteroscedasticity, and autocorrelation can also distort these measures and the results of a regression model. It is therefore essential to select and evaluate predictors carefully in order to build a reliable and accurate regression model.

Determine Significant Variables in Regression Models


One of the main questions you’ll have after fitting a multiple linear regression model is: Which variables are significant?

There are two methods you should not use to determine variable significance:

1. The value of the regression coefficients

A regression coefficient for a given predictor variable tells you the average change in the response variable associated with a one-unit increase in that predictor variable, holding the other predictors constant.

However, the predictor variables in a model are usually measured on different scales, so it doesn’t make sense to compare the absolute values of their regression coefficients to determine which variables are most important.
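A minimal sketch of why raw coefficients are scale-dependent, using made-up house data (not the dataset from the example below): fitting the same simple regression with square footage measured in square feet and then in thousands of square feet changes the slope by a factor of 1,000, even though the fitted line and its predictions are identical.

```python
# Hypothetical data: square footage and price for five houses (made-up values).
sqft = [1200.0, 1600.0, 2000.0, 2400.0, 2800.0]
price = [210000.0, 250000.0, 275000.0, 320000.0, 355000.0]

def slope(x, y):
    """Ordinary least squares slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Same data, two different units for the predictor.
b_sqft = slope(sqft, price)                       # dollars per square foot
b_ksqft = slope([s / 1000 for s in sqft], price)  # dollars per 1,000 square feet

print(f"per sq ft:       {b_sqft:.3f}")
print(f"per 1,000 sq ft: {b_ksqft:.3f}")
```

Here the slope is 90 dollars per square foot, or 90,000 dollars per 1,000 square feet: the "size" of the coefficient reflects the unit chosen, not how important the predictor is.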

2. The p-values of the regression coefficients

The p-values of the regression coefficients can tell you if a given predictor variable has a statistically significant association with the response variable, but they can’t tell you if a given predictor variable is practically significant in the real world.

P-values can also be low simply because the sample size is large or the variability in the data is low, and a low p-value on its own doesn’t tell us whether a given predictor variable is practically significant.
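A minimal sketch of the sample-size effect, assuming a simple regression with one predictor. In that case the slope’s t statistic can be written as t = r * sqrt(n - 2) / sqrt(1 - r^2), where r is the correlation between predictor and response. Holding a weak correlation fixed at r = 0.05 (an illustrative value, not from the article) and increasing n drives the p-value toward zero, even though R-squared stays at 0.0025. The p-value here uses a normal approximation to the t distribution, which is an assumption that works reasonably at these sample sizes.

```python
import math

def slope_t_stat(r, n):
    """t statistic for the slope in a simple linear regression with correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

r = 0.05  # weak association: R-squared = r**2 = 0.0025 (0.25% of the variance)
for n in (50, 10_000):
    t = slope_t_stat(r, n)
    print(f"n={n:>6}: t={t:.2f}, p~{approx_two_sided_p(t):.2g}")
```

With n = 50 the p-value is about 0.73; with n = 10,000 it falls far below 0.001, although the predictor still explains only a quarter of one percent of the variation in the response.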

However, there are two methods you should use to determine variable significance:

1. Standardized Regression Coefficients

Typically, when we perform multiple linear regression, the regression coefficients in the model output are unstandardized, meaning they are estimated from the raw data.

However, it’s possible to standardize each predictor variable and the response variable (by subtracting each variable’s mean from its original values and then dividing by that variable’s standard deviation) and then perform regression, which produces standardized regression coefficients.

Once every variable in the model is standardized, all variables are measured on the same scale. It then makes sense to compare the absolute values of the standardized coefficients to see which variables have the greatest effect on the response variable.
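The procedure above can be sketched in plain Python, assuming a model with two predictors and made-up data (the values below are hypothetical and only loosely mimic the age and square-footage ranges used later; they are not the article’s dataset). Each variable is converted to z-scores with the sample standard deviation, and the model is refit; the helper `ols_two_predictors` is an illustrative name, not a library function.

```python
import statistics

def zscores(values):
    """Standardize: subtract the mean, then divide by the sample standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

def ols_two_predictors(x1, x2, y):
    """OLS slopes for y ~ x1 + x2; the intercept is handled by centering the data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    cy = [c - my for c in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data (made-up values).
age = [4.0, 10.0, 15.0, 22.0, 30.0, 38.0, 44.0]
sqft = [1200.0, 2800.0, 1900.0, 2400.0, 1500.0, 2100.0, 1700.0]
price = [121400.0, 274000.0, 189000.0, 227200.0, 139000.0, 196800.0, 149400.0]

b_age, b_sqft = ols_two_predictors(age, sqft, price)
z_age, z_sqft = ols_two_predictors(zscores(age), zscores(sqft), zscores(price))

print(f"unstandardized: age={b_age:.3f}, sqft={b_sqft:.3f}")
print(f"standardized:   age={z_age:.3f}, sqft={z_sqft:.3f}")
```

In this sketch the raw age coefficient is larger in absolute value, while the standardized coefficients show square footage dominating. Each standardized slope equals the raw slope multiplied by sd(x) / sd(y).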

2. Subject Matter Expertise

While p-values can tell you whether there is a statistically significant association between a given predictor variable and the response variable, subject matter expertise is needed to confirm whether a predictor variable is actually relevant and should be included in a model.

The following example shows how to determine significant variables in a regression model in practice.

Example: How to Determine Significant Variables in a Regression Model

Suppose we have a dataset containing the age, square footage, and price of a sample of houses, and we perform multiple linear regression using age and square footage as the predictor variables and price as the response variable.

We receive the following output:

[Output: regression table with unstandardized coefficients]

The regression coefficients in this table are unstandardized, meaning the model was fit to the raw data.

At first glance, it appears that age has a much larger effect on house price, since its coefficient in the regression table is -409.833 compared to just 100.866 for square footage.

However, the standard error of the age coefficient is large relative to the coefficient itself, while the standard error for square footage is small relative to its coefficient. This is why the p-value is large for age (p = 0.520) and small for square footage (reported as p = 0.000).
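The p-value for each coefficient comes from its t statistic, which divides the estimated coefficient by its standard error, so a coefficient that looks large can still be non-significant when its standard error is larger still. For a model fit to n observations with k predictors, the standard two-sided test is:

```latex
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}, \qquad
p_j = 2\,P\!\left(T_{n-k-1} \ge \lvert t_j \rvert\right)
```

where T_{n-k-1} follows a t distribution with n - k - 1 degrees of freedom.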

The coefficients differ so much because the two variables are measured on very different scales:

  • The values for age range from 4 to 44.
  • The values for square footage range from 1,200 to 2,800.

Suppose we instead standardize the raw data:

[Figure: raw data standardized in Excel]

If we then perform multiple linear regression using the standardized data, we’ll get the following regression output:

[Output: regression table with standardized coefficients]

The regression coefficients in this table are standardized, meaning the model was fit to the standardized data.

The coefficients in this table can be interpreted as follows:

  • A one standard deviation increase in age is associated with a 0.092 standard deviation decrease in house price, assuming square footage is held constant.
  • A one standard deviation increase in square footage is associated with a 0.885 standard deviation increase in house price, assuming age is held constant.
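Standardized and unstandardized coefficients are linked by a simple rescaling, so a standardized coefficient can also be computed from the unstandardized one when the sample standard deviations of the predictor (s_{x_j}) and the response (s_y) are known; those standard deviations do not appear in the regression output itself:

```latex
\hat{\beta}_j^{*} = \hat{\beta}_j \, \frac{s_{x_j}}{s_y}
```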

Now we can see that square footage has a much larger effect on house price than age.

Note: The p-values for each predictor variable are exactly the same as in the previous regression model, because standardizing only rescales the variables.
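A minimal sketch of why the p-values do not change, assuming a single predictor for brevity and made-up data: standardizing rescales the slope and its standard error by the same factor, so their ratio (the t statistic, and therefore the p-value) is unchanged. The same reasoning applies to each slope in a multiple regression.

```python
import math
import statistics

def slope_and_t(x, y):
    """OLS slope and its t statistic (slope / standard error) for one predictor."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    intercept = my - b * mx
    # Residual sum of squares gives the slope's standard error.
    sse = sum((c - (intercept + b * a)) ** 2 for a, c in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return b, b / se

def zscores(values):
    """Standardize with the sample mean and sample standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data (made-up values, not the article's dataset).
age = [4.0, 10.0, 15.0, 22.0, 30.0, 38.0, 44.0]
price = [260000.0, 248000.0, 251000.0, 239000.0, 243000.0, 228000.0, 231000.0]

b_raw, t_raw = slope_and_t(age, price)
b_std, t_std = slope_and_t(zscores(age), zscores(price))
print(f"raw:          slope={b_raw:.3f}, t={t_raw:.4f}")
print(f"standardized: slope={b_std:.3f}, t={t_std:.4f}")
```

The two slopes differ by orders of magnitude, while the two t statistics agree up to floating-point rounding.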

When deciding on the final model to use, we now know that square footage is much more important than age for predicting the price of a house.

Ultimately, we would still need to use subject matter expertise to decide which variables to include in the final model, based on existing knowledge about real estate and house prices.

Cite this article

stats writer (2024). What are the significant variables in regression models?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/what-are-the-significant-variables-in-regression-models/

stats writer. "What are the significant variables in regression models?." PSYCHOLOGICAL SCALES, 23 Jun. 2024, https://scales.arabpsychology.com/stats/what-are-the-significant-variables-in-regression-models/.
