Table of Contents
In order to make predictions using a regression model in Statsmodels, we first need to have a dataset with the variables we want to analyze. Then, we can use the Statsmodels library in Python to fit the dataset to a chosen regression model, such as linear or logistic regression. This involves finding the best parameters for the model that will minimize the error between the predicted values and the actual values in the dataset. Once the model is fitted, we can use it to make predictions for new data by inputting the relevant variables and using the model’s coefficients to calculate the predicted outcome. This process allows us to analyze relationships between variables and make accurate predictions based on the data.
Make Predictions Using Regression Model in Statsmodels
You can use the following basic syntax to use a regression model fit using the module in Python to make predictions on new observations:
model.predict(df_new)
This particular syntax will calculate the predicted response values for each row in a new DataFrame called df_new, using a regression model fit with statsmodels called model.
The following example shows how to use this syntax in practice.
Example: Make Predictions Using Regression Model in Statsmodels
Suppose we have the following pandas DataFrame that contains information about hours studied, prep exams taken, and final score received by students in a certain class:
import pandas as pd #create DataFrame df = pd.DataFrame({'hours': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6], 'exams': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2], 'score': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96]}) #view head of DataFrame df.head() hours exams score 0 1 1 76 1 2 3 78 2 2 3 85 3 4 5 88 4 2 2 72
We can use the OLS() function from the statsmodels module to fit a , using “hours” and “exams” as the predictor variables and “score” as the response variable:
import statsmodels.apias sm #define predictor and response variables y = df['score'] x = df[['hours', 'exams']] #add constant to predictor variables x = sm.add_constant(x) #fit linear regression model model = sm.OLS(y, x).fit() #view model summary print(model.summary()) OLS Regression Results ============================================================================== Dep. Variable: score R-squared: 0.718 Model: OLS Adj. R-squared: 0.661 Method: Least Squares F-statistic: 12.70 Date: Fri, 05 Aug 2022 Prob (F-statistic): 0.00180 Time: 09:24:38 Log-Likelihood: -38.618 No. Observations: 13 AIC: 83.24 Df Residuals: 10 BIC: 84.93 Df Model: 2 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ const 71.4048 4.001 17.847 0.000 62.490 80.319 hours 5.1275 1.018 5.038 0.001 2.860 7.395 exams -1.2121 1.147 -1.057 0.315 -3.768 1.344 ============================================================================== Omnibus: 1.103 Durbin-Watson: 1.248 Prob(Omnibus): 0.576 Jarque-Bera (JB): 0.803 Skew: -0.289 Prob(JB): 0.669 Kurtosis: 1.928 Cond. No. 11.7 ==============================================================================
From the coef column in the output, we can write the fitted regression model:
Score = 71.4048 + 5.1275(hours) – 1.2121(exams)
Now suppose we would like to use the fitted regression model to predict the “score” for five new students.
First, let’s create a DataFrame to hold the five new observations:
#create new DataFrame
df_new = pd.DataFrame({'hours': [1, 2, 2, 4, 5],
'exams': [1, 1, 4, 3, 3]})
#add column for constant
df_new = sm.add_constant(df_new)
#view new DataFrame
print(df_new)
const hours exams
0 1.0 1 1
1 1.0 2 1
2 1.0 2 4
3 1.0 4 3
4 1.0 5 3Next, we can use the predict() function to predict the “score” for each of these students, using “hours” and “exams” as the values for the predictor variables in our fitted regression model:
#predict scores for the five new students model.predict(df_new) 0 75.320242 1 80.447734 2 76.811480 3 88.278550 4 93.406042 dtype: float64
Here’s how to interpret the output:
- The first student in the new DataFrame is predicted to get a score of 75.32.
- The second student in the new DataFrame is predicted to get a score of 80.45.
And so on.
To understand how these predictions were calculated, we need to refer to the fitted regression model from earlier:
Score = 71.4048 + 5.1275(hours) – 1.2121(exams)
By plugging in the values for “hours” and “exams” for the new students, we can calculate their predicted score.
For example, the first student in the new DataFrame had a value of 1 for hours and a value of 1 for exams.
Thus, their predicted score was calculated as:
Score = 71.4048 + 5.1275(1) – 1.2121(1) = 75.32.
The score of each student in the new DataFrame was calculated in a similar manner.
The following tutorials explain how to perform other common tasks in Python:
Cite this article
stats writer (2024). How do we make predictions using a regression model in Statsmodels?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-do-we-make-predictions-using-a-regression-model-in-statsmodels/
stats writer. "How do we make predictions using a regression model in Statsmodels?." PSYCHOLOGICAL SCALES, 27 Jun. 2024, https://scales.arabpsychology.com/stats/how-do-we-make-predictions-using-a-regression-model-in-statsmodels/.
stats writer. "How do we make predictions using a regression model in Statsmodels?." PSYCHOLOGICAL SCALES, 2024. https://scales.arabpsychology.com/stats/how-do-we-make-predictions-using-a-regression-model-in-statsmodels/.
stats writer (2024) 'How do we make predictions using a regression model in Statsmodels?', PSYCHOLOGICAL SCALES. Available at: https://scales.arabpsychology.com/stats/how-do-we-make-predictions-using-a-regression-model-in-statsmodels/.
[1] stats writer, "How do we make predictions using a regression model in Statsmodels?," PSYCHOLOGICAL SCALES, vol. X, no. Y, ص Z-Z, June, 2024.
stats writer. How do we make predictions using a regression model in Statsmodels?. PSYCHOLOGICAL SCALES. 2024;vol(issue):pages.
