How to calculate SST, SSR, and SSE in Python

SST, SSR, and SSE can be calculated in Python using the scipy.stats.linregress() function. This function calculates the slope, intercept, r-value, p-value, and standard error of a linear regression, as well as the sum of squares of the error (SSE), the total sum of squares (SST), and the regression sum of squares (SSR). These values can then be used to calculate the coefficient of determination (r-squared), which provides an indication of how well the linear regression model fits the data.


We often use three different values to measure how well a fits a dataset:

1. Sum of Squares Total (SST) – The sum of squared differences between individual data points (yi) and the mean of the response variable (y).

  • SST = Σ(yiy)2

2. Sum of Squares Regression (SSR) – The sum of squared differences between predicted data points (ŷi) and the mean of the response variable(y).

  • SSR = Σ(ŷiy)2

3. Sum of Squares Error (SSE) – The sum of squared differences between predicted data points (ŷi) and observed data points (yi).

  • SSE = Σ(ŷi – yi)2

The following step-by-step example shows how to calculate each of these metrics for a given regression model in Python.

Step 1: Create the Data

First, let’s create a dataset that contains the number of hours studied and exam score received for 20 different students at a certain university:

import pandas as pd

#create pandas DataFrame
df = pd.DataFrame({'hours': [1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                             3, 4, 4, 4, 5, 5, 6, 7, 7, 8],
                   'score': [68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                             88, 85, 89, 94, 93, 94, 96, 89, 92, 97]})

#view first five rows of DataFrame
df.head()

	hours	score
0	1	68
1	1	76
2	1	74
3	2	80
4	2	76

Step 2: Fit a Regression Model

Next, we’ll use the OLS() function from the library to fit a simple linear regression model using score as the response variable and hours as the predictor variable:

import statsmodels.api as sm

#define response variable
y = df['score']

#define predictor variable
x = df[['hours']]

#add constant to predictor variables
x = sm.add_constant(x)

#fit linear regression model
model = sm.OLS(y, x).fit()

Step 3: Calculate SST, SSR, and SSE

Lastly, we can use the following formulas to calculate the SST, SSR, and SSE values of the model:

import numpy as np

#calculate sse
sse = np.sum((model.fittedvalues - df.score)**2)
print(sse)

331.07488479262696

#calculate ssr
ssr = np.sum((model.fittedvalues - df.score.mean())**2)
print(ssr)

917.4751152073725

#calculate sst
sst = ssr + sse
print(sst)

1248.5499999999995

  • Sum of Squares Total (SST): 1248.55
  • Sum of Squares Regression (SSR): 917.4751
  • Sum of Squares Error (SSE): 331.0749

We can verify that SST = SSR + SSE:

  • SST = SSR + SSE
  • 1248.55 = 917.4751 + 331.0749

You can use the following calculators to automatically calculate SST, SSR, and SSE for any simple linear regression line:

The following tutorials explain how to calculate SST, SSR, and SSE in other statistical software:

x