Quadratic Discriminant Analysis (QDA) is a classification technique used in machine learning. It assigns observations to classes by fitting quadratic decision boundaries between them. In Python, QDA can be implemented with the scikit-learn library. The process involves importing the necessary libraries, loading the data, fitting the model, evaluating the model, and predicting the class of new observations. The same steps can be applied to other classification datasets.
Quadratic discriminant analysis is a method you can use when you have a set of predictor variables and you’d like to classify a response variable into two or more classes.
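It is considered to be the non-linear equivalent of linear discriminant analysis: each class gets its own covariance matrix, which makes the decision boundaries quadratic rather than linear.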
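To make the idea concrete, here is a minimal sketch of the quadratic discriminant score computed by hand with NumPy. The helper name `qda_scores` is hypothetical and used only for illustration, and the sketch assumes the standard textbook form of the score (log-determinant of the class covariance, Mahalanobis distance to the class mean, and log prior). It uses the iris data that the rest of this tutorial works with.

```python
import numpy as np
from sklearn import datasets

def qda_scores(X, y, x_new):
    """Return the quadratic discriminant score of x_new for each class (illustrative helper)."""
    scores = {}
    for k in np.unique(y):
        Xk = X[y == k]                       # observations belonging to class k
        mu = Xk.mean(axis=0)                 # class mean
        cov = np.cov(Xk, rowvar=False)       # class-specific covariance matrix
        prior = len(Xk) / len(X)             # class proportion used as the prior
        diff = x_new - mu
        _, logdet = np.linalg.slogdet(cov)
        # score = -1/2 log|cov| - 1/2 (x - mu)^T cov^-1 (x - mu) + log(prior)
        scores[k] = -0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff) + np.log(prior)
    return scores

iris = datasets.load_iris()
scores = qda_scores(iris.data, iris.target, np.array([5, 3, 1, 0.4]))

# the predicted class is the one with the largest score
predicted = iris.target_names[max(scores, key=scores.get)]
print(predicted)
```

Because the covariance matrix differs by class, the score is quadratic in the inputs. With a single shared covariance matrix the quadratic terms would cancel between classes, which is the linear discriminant analysis case.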
This tutorial provides a step-by-step example of how to perform quadratic discriminant analysis in Python.
Step 1: Load Necessary Libraries
First, we’ll load the necessary functions and libraries for this example:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn import datasets
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
Step 2: Load the Data
For this example, we’ll use the iris dataset from the sklearn library. The following code shows how to load this dataset and convert it to a pandas DataFrame to make it easy to work with:
#load iris dataset
iris = datasets.load_iris()

#convert dataset to pandas DataFrame
df = pd.DataFrame(data = np.c_[iris['data'], iris['target']],
                  columns = iris['feature_names'] + ['target'])
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.columns = ['s_length', 's_width', 'p_length', 'p_width', 'target', 'species']

#view first five rows of DataFrame
df.head()

   s_length  s_width  p_length  p_width  target species
0       5.1      3.5       1.4      0.2     0.0  setosa
1       4.9      3.0       1.4      0.2     0.0  setosa
2       4.7      3.2       1.3      0.2     0.0  setosa
3       4.6      3.1       1.5      0.2     0.0  setosa
4       5.0      3.6       1.4      0.2     0.0  setosa

#find how many total observations are in dataset
len(df.index)

150
We can see that the dataset contains 150 total observations.
For this example we’ll build a quadratic discriminant analysis model to classify which species a given flower belongs to.
We’ll use the following predictor variables in the model:
- Sepal length
- Sepal width
- Petal length
- Petal width
And we’ll use them to predict the response variable Species, which takes on the following three potential classes:
- setosa
- versicolor
- virginica
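Before fitting a classifier it can help to check how the observations are spread across the classes. The following is a small sketch, assuming only the iris dataset bundled with sklearn, that counts the observations per species:

```python
import pandas as pd
from sklearn import datasets

#load iris dataset and map the numeric target codes to species names
iris = datasets.load_iris()
species = pd.Categorical.from_codes(iris.target, iris.target_names)

#count how many observations fall in each class
counts = pd.Series(species).value_counts()
print(counts)
```

Each of the three species has 50 observations, so the classes are perfectly balanced and accuracy is a reasonable metric for this example.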
Step 3: Fit the QDA Model
Next, we’ll fit the QDA model to our data using the QuadraticDiscriminantAnalysis class from sklearn:
#define predictor and response variables
X = df[['s_length', 's_width', 'p_length', 'p_width']]
y = df['species']

#Fit the QDA model
model = QuadraticDiscriminantAnalysis()
model.fit(X, y)
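Once the model is fit, its estimated parameters can be inspected. The sketch below is self-contained, so it loads the iris arrays directly rather than reusing the DataFrame above, and it prints the class labels, the class priors, and the per-class means that the fitted estimator stores:

```python
from sklearn import datasets
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

#load predictors and species names as plain arrays
iris = datasets.load_iris()
X = iris.data
y = iris.target_names[iris.target]

#fit the QDA model
model = QuadraticDiscriminantAnalysis()
model.fit(X, y)

#inspect what the model estimated from the data
print(model.classes_)   #class labels, in the order used by the model
print(model.priors_)    #estimated prior probability of each class
print(model.means_)     #one row of predictor means per class
```

The priors default to the class proportions in the training data, so for iris each prior is one third.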
Step 4: Use the Model to Make Predictions
Once we’ve fit the model using our data, we can evaluate how well the model performed by using repeated stratified k-fold cross validation.
For this example, we’ll use 10 folds and 3 repeats:
#Define method to evaluate model
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
#evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print(np.mean(scores))
0.97333333333334
We can see that the model achieved a mean accuracy of 97.33% across the cross-validation folds.
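The mean hides how much the accuracy varies from fold to fold. As a rough sketch using the same cross-validation settings (run serially here, without n_jobs, to keep the example simple), we can also look at the number of scores and their spread:

```python
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

#load predictors and response
X, y = datasets.load_iris(return_X_y=True)

#10 folds repeated 3 times gives 30 accuracy scores
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, scoring='accuracy', cv=cv)

#summarize the distribution of fold accuracies
print(len(scores))
print(np.mean(scores), np.std(scores), np.min(scores))
```

With 150 observations and 10 folds, each test fold holds only 15 flowers, so a single misclassification changes that fold's accuracy by several percentage points.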
We can also use the model to predict which class a new flower belongs to, based on input values:
#define new observation
new = [5, 3, 1, .4]

#predict which class the new observation belongs to
model.predict([new])

array(['setosa'], dtype='<U10')
We can see that the model predicts this new observation to belong to the species called setosa.
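If we want to know how confident the model is in this prediction, the predict_proba method returns the estimated probability of each class. The following is a minimal, self-contained sketch that refits the model on the iris arrays and scores the same new observation:

```python
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

#fit the QDA model on the iris data
iris = datasets.load_iris()
model = QuadraticDiscriminantAnalysis().fit(iris.data, iris.target_names[iris.target])

#define new observation
new = [5, 3, 1, .4]

#estimated probability of each class for the new observation
probs = model.predict_proba([new])[0]
for name, p in zip(model.classes_, probs):
    print(name, round(p, 4))
```

The class with the highest probability is the one that model.predict returns.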
You can find the complete Python code used in this tutorial here.