Table of Contents

Ridge regression in Python can be done step-by-step by first importing necessary packages such as Scikit-learn, pandas, and numpy. After that, pre-process the data by converting categorical variables into dummy variables and standardizing the data. Then, create the X and y variables where X is the independent variables and y is the dependent variable. After that, use Scikit-learn’s Ridge model, which is an extension of the Linear Regression model, and fit the model using the X and y variables. Finally, use the model to predict the values of the dependent variables.

Ridge regression is a method we can use to fit a regression model when multicollinearity is present in the data.

In a nutshell, least squares regression tries to find coefficient estimates that minimize the sum of squared residuals (RSS):

RSS = Σ(y_i – ŷ_i)2

where:

Σ: A greek symbol that means sum
y_i: The actual response value for the i^th observation
ŷ_i: The predicted response value based on the multiple linear regression model

Conversely, ridge regression seeks to minimize the following:

RSS + λΣβ_j²

where j ranges from 1 to p predictor variables and λ ≥ 0.

This second term in the equation is known as a shrinkage penalty. In ridge regression, we select a value for λ that produces the lowest possible test MSE (mean squared error).

This tutorial provides a step-by-step example of how to perform ridge regression in Python.

Step 1: Import Necessary Packages

First, we’ll import the necessary packages to perform ridge regression in Python:

import pandas as pd
from numpy import arange
from sklearn.linear_model import Ridge
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold

Step 2: Load the Data

For this example, we’ll use a dataset called mtcars, which contains information about 33 different cars. We’ll use hp as the response variable and the following variables as the predictors:

mpg
wt
drat
qsec

The following code shows how to load and view this dataset:

#define URL where data is located
url = "https://raw.githubusercontent.com/arabpsychology/Python-Guides/main/mtcars.csv"

#read in data
data_full = pd.read_csv(url)

#select subset of data
data = data_full[["mpg", "wt", "drat", "qsec", "hp"]]

#view first six rows of data
data[0:6]

	mpg	wt	drat	qsec	hp
0	21.0	2.620	3.90	16.46	110
1	21.0	2.875	3.90	17.02	110
2	22.8	2.320	3.85	18.61	93
3	21.4	3.215	3.08	19.44	110
4	18.7	3.440	3.15	17.02	175
5	18.1	3.460	2.76	20.22	105

Step 3: Fit the Ridge Regression Model

Next, we’ll use the RidgeCV() function from sklearn to fit the ridge regression model and we’ll use the RepeatedKFold() function to perform k-fold cross-validation to find the optimal alpha value to use for the penalty term.

Note: The term “alpha” is used instead of “lambda” in Python.

For this example we’ll choose k = 10 folds and repeat the cross-validation process 3 times.

Also note that RidgeCV() only tests alpha values .1, 1, and 10 by default. However, we can define our own alpha range from 0 to 1 by increments of 0.01:

#define predictor and response variables
X = data[["mpg", "wt", "drat", "qsec"]]
y = data["hp"]

#define cross-validation method to evaluate model
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

#define model
model = RidgeCV(alphas=arange(0, 1, 0.01), cv=cv, scoring='neg_mean_absolute_error')

#fit model
model.fit(X, y)

#display lambda that produced the lowest test MSE
print(model.alpha_)

0.99

The lambda value that minimizes the test MSE turns out to be 0.99.

Step 4: Use the Model to Make Predictions

Lastly, we can use the final ridge regression model to make predictions on new observations. For example, the following code shows how to define a new car with the following attributes:

mpg: 24
wt: 2.5
drat: 3.5
qsec: 18.5

The following code shows how to use the fitted ridge regression model to predict the value for hp of this new observation:

#define new observation
new = [24, 2.5, 3.5, 18.5]

#predict hp value using ridge regression model
model.predict([new])

array([104.16398018])

Based on the input values, the model predicts this car to have an hp value of 104.16398018.

You can find the complete Python code used in this example here.

How to do Ridge Regression in Python (Step-by-Step)

Step 1: Import Necessary Packages

Step 2: Load the Data

Step 3: Fit the Ridge Regression Model

Step 4: Use the Model to Make Predictions

Requst a

Scale

Step 1: Import Necessary Packages

Step 2: Load the Data

Step 3: Fit the Ridge Regression Model

Step 4: Use the Model to Make Predictions

Related terms:

Requst a

Scale