How to Calculate Mahalanobis Distance in R

Mahalanobis distance measures how far an observation lies from the mean of a group of data points while accounting for the scale of each variable and the correlation between the variables. In R, you can calculate it with the mahalanobis() function, which is part of the built-in stats package, so no additional package needs to be installed. The function takes a data set (or a single observation), a mean vector, and a covariance matrix, and returns the squared Mahalanobis distance of each row from that mean. To measure the distance between two observations instead, pass one of them as the center argument.


The Mahalanobis distance is the distance between two points in a multivariate space, scaled by the covariance of the variables.

It is often used to find outliers in statistical analyses that involve several variables.
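For reference, the quantity is commonly written as below. The symbols are notation introduced here for illustration, not taken from the tutorial: x is an observation, μ is the mean vector, and Σ is the covariance matrix of the distribution.

```latex
D^2(x) = (x - \mu)^{\mathsf{T}} \, \Sigma^{-1} \, (x - \mu),
\qquad
D(x) = \sqrt{D^2(x)}
```

When Σ is the identity matrix, D reduces to the ordinary Euclidean distance between x and μ.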

This tutorial explains how to calculate the Mahalanobis distance in R.

Example: Mahalanobis Distance in R

Use the following steps to calculate the Mahalanobis distance for every observation in a dataset in R.

Step 1: Create the dataset.

First, we’ll create a dataset that displays the exam score of 20 students along with the number of hours they spent studying, the number of prep exams they took, and their current grade in the course:

#create data
df <- data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

#view first six rows of data
head(df)

  score hours prep grade
1    91    16    3    70
2    93     6    4    88
3    72     3    0    80
4    87     1    3    83
5    86     2    4    88
6    73     3    0    84

Step 2: Calculate the Mahalanobis distance for each observation.

Next, we’ll use the built-in mahalanobis() function in R to calculate the Mahalanobis distance for each observation, which uses the following syntax:

mahalanobis(x, center, cov)

where:

  • x: matrix of data
  • center: mean vector of the distribution
  • cov: covariance matrix of the distribution
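To get a feel for the three arguments, here is a minimal sketch on a toy case. The point, center, and identity covariance matrix are hypothetical values chosen so the arithmetic is easy to check by hand; they are not part of the tutorial's dataset.

```r
# Toy inputs (hypothetical values, chosen for easy arithmetic)
point  <- c(3, 4)   # x: a single observation
center <- c(0, 0)   # center: mean vector
sigma  <- diag(2)   # cov: identity matrix (uncorrelated, unit-variance variables)

# mahalanobis() returns the SQUARED distance
d2 <- mahalanobis(point, center, sigma)
d2         # 25

# Take the square root to get the distance itself
sqrt(d2)   # 5
```

With an identity covariance matrix the result matches the squared Euclidean distance (3^2 + 4^2 = 25), which makes it easy to see that the function reports the squared value.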

The following code shows how to implement this function for our dataset:

#calculate Mahalanobis distance for each observation
mahalanobis(df, colMeans(df), cov(df))

 [1] 16.5019630  2.6392864  4.8507973  5.2012612  3.8287341  4.0905633
 [7]  4.2836303  2.4198736  1.6519576  5.6578253  3.9658770  2.9350178
[13]  2.8102109  4.3682945  1.5610165  1.4595069  2.0245748  0.7502536
[19]  2.7351292  2.2642268
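As a sanity check, the same values can be reproduced by hand from the centered data and the inverse covariance matrix. This is only a sketch of one way to do it; the helper names (X, centered, S_inv, d2_manual) are made up for illustration.

```r
# Recreate the dataset so this block runs on its own
df <- data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

X        <- as.matrix(df)
centered <- sweep(X, 2, colMeans(X))   # subtract each column mean
S_inv    <- solve(cov(X))              # inverse of the sample covariance matrix

# Row-wise quadratic form: (x - mean)' S^-1 (x - mean)
d2_manual  <- rowSums((centered %*% S_inv) * centered)
d2_builtin <- mahalanobis(df, colMeans(df), cov(df))

# TRUE if the manual and built-in results agree (up to floating-point tolerance)
all.equal(unname(d2_manual), unname(d2_builtin))

# Distances on the original scale, if needed
sqrt(d2_builtin)
```

Because both approaches compute the same quadratic form, any disagreement would point to a mismatch in the mean vector or covariance matrix being passed in.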

Step 3: Calculate the p-value for each Mahalanobis distance.
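One common way to attach a p-value to each distance is sketched below. It rests on two assumptions that are not stated in the text above: that the data are roughly multivariate normal, and that the squared distance is compared against a chi-square distribution with degrees of freedom equal to the number of variables (4 here). Some tutorials use a different number of degrees of freedom, so treat this as one option rather than the definitive method.

```r
# Recreate the dataset so this block runs on its own
df <- data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

# Squared Mahalanobis distance for each observation
d2 <- mahalanobis(df, colMeans(df), cov(df))

# Assumption: degrees of freedom = number of variables
k <- ncol(df)

# Upper-tail chi-square probability for each squared distance
p_values <- pchisq(d2, df = k, lower.tail = FALSE)

# Collect the results and list the most unusual observations first
result <- cbind(df, mahal = d2, p_value = p_values)
head(result[order(result$p_value), ])
```

Smaller p-values correspond to observations that are more unusual relative to the rest of the data. Any cutoff for calling an observation an outlier (0.001 is a frequently used one) is a convention rather than something fixed by the method.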
