How to calculate Spearman Rank Correlation in Python


In statistics, correlation refers to the strength and direction of a relationship between two variables. The value of a correlation coefficient can range from -1 to 1, with the following interpretations:

  • -1: a perfect negative relationship between two variables
  • 0: no linear or monotonic relationship between two variables
  • 1: a perfect positive relationship between two variables
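To make these three cases concrete, here is a small sketch that uses made-up data (not from this tutorial) and SciPy's spearmanr() function. It computes the coefficient for a pair that always rises together, a pair that always moves in opposite directions, and a U-shaped pair with no overall upward or downward trend:

```python
from scipy.stats import spearmanr

# made-up x values, for illustration only
x = [-2, -1, 0, 1, 2]

# y always rises when x rises (nonlinear, but perfectly monotonic): rho = 1
rho_pos, _ = spearmanr(x, [v ** 3 for v in x])

# y always falls when x rises: rho = -1
rho_neg, _ = spearmanr(x, [-v for v in x])

# U-shaped: y falls and then rises, so there is no monotonic trend: rho = 0
rho_zero, _ = spearmanr(x, [v ** 2 for v in x])

print(round(rho_pos, 4), round(rho_neg, 4), round(abs(rho_zero), 4))  # prints: 1.0 -1.0 0.0
```

The U-shaped pair shows that a coefficient of 0 means "no monotonic trend" rather than "no relationship at all".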

One special type of correlation is called Spearman rank correlation, which is used to measure the correlation between two ranked variables (e.g., the rank of a student’s math exam score vs. the rank of their science exam score in a class).
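Spearman's rho is the ordinary Pearson correlation computed on the ranks of the data instead of the raw values. The sketch below checks this on a small hypothetical dataset of five students, using scipy.stats.rankdata() to rank each variable and scipy.stats.pearsonr() to correlate the ranks:

```python
from scipy.stats import pearsonr, rankdata, spearmanr

# hypothetical scores for five students (illustrative data only)
math = [56, 75, 45, 71, 62]
science = [66, 70, 40, 60, 65]

# rank each variable separately (1 = lowest; tied values would share an average rank)
math_ranks = rankdata(math)
science_ranks = rankdata(science)

# Pearson correlation of the ranks...
rho_from_ranks, _ = pearsonr(math_ranks, science_ranks)

# ...equals Spearman's rho computed on the raw scores
rho_spearman, _ = spearmanr(math, science)

print(rho_from_ranks, rho_spearman)
```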

This tutorial explains how to calculate the Spearman rank correlation between two variables in Python.

Example: Spearman Rank Correlation in Python

Suppose we have the following pandas DataFrame that contains the math exam score and science exam score of 10 students in a particular class:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'student': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'math': [70, 78, 90, 87, 84, 86, 91, 74, 83, 85],
                   'science': [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]})
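Because Spearman's rho works on ranks, it can help to look at the ranks themselves. The following sketch uses pandas' DataFrame.rank(), where rank 1 is the lowest score. Every math score and every science score in this dataset is distinct, so no ties need to be handled:

```python
import pandas as pd

# same data as above
df = pd.DataFrame({'student': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'math': [70, 78, 90, 87, 84, 86, 91, 74, 83, 85],
                   'science': [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]})

# rank each score column from lowest (1) to highest (10)
ranks = df[['math', 'science']].rank()

# put the student labels back in front for readability
ranks.insert(0, 'student', df['student'])

print(ranks)
```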

To calculate the Spearman rank correlation between the math and science scores, we can use the spearmanr() function from scipy.stats:

from scipy.stats import spearmanr

#calculate Spearman Rank correlation and corresponding p-value
rho, p = spearmanr(df['math'], df['science'])

#print Spearman rank correlation and p-value
print(rho)

-0.41818181818181815

print(p)

0.22911284098281892

From the output, we can see that the Spearman rank correlation is -0.41818 and the corresponding p-value is 0.22911.
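When there are no tied values, Spearman's rho can also be computed by hand with the classic formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between a student's math rank and science rank and n is the number of students. That condition holds here because every score is distinct. The sketch below applies the formula as a sanity check on the spearmanr() result. It is only valid for tie-free data:

```python
import pandas as pd

df = pd.DataFrame({'student': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'math': [70, 78, 90, 87, 84, 86, 91, 74, 83, 85],
                   'science': [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]})

# rank each column (1 = lowest score)
math_rank = df['math'].rank()
science_rank = df['science'].rank()

# squared difference between each student's two ranks
d_squared = (math_rank - science_rank) ** 2

n = len(df)
# no-ties shortcut formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho = 1 - 6 * d_squared.sum() / (n * (n ** 2 - 1))

print(d_squared.sum())  # 234.0
print(rho)              # approximately -0.41818, the same value spearmanr() reports
```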

The negative value of rho indicates a negative correlation between the math and science exam scores: students who rank higher in math tend to rank lower in science.

However, since the p-value (0.22911) is not less than 0.05, the correlation is not statistically significant at the 5% significance level.
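For reference, SciPy's default two-sided p-value for spearmanr() comes from a t-distribution approximation. In this approach, t = rho * sqrt((n - 2) / (1 - rho^2)) is compared against a t-distribution with n - 2 degrees of freedom. The sketch below recomputes the p-value from rho and n to show where the number comes from. It assumes SciPy's default settings (no exact or permutation test), and the significance level alpha = 0.05 is the conventional choice used above:

```python
import math
from scipy.stats import t

rho = -0.41818181818181815  # Spearman's rho from the example above
n = 10                      # number of students

# t statistic for testing rho = 0, with n - 2 degrees of freedom
dof = n - 2
t_stat = rho * math.sqrt(dof / (1 - rho ** 2))

# two-sided p-value: chance of a |t| at least this large if the true rho were 0
p_value = 2 * t.sf(abs(t_stat), dof)

alpha = 0.05
print(round(p_value, 5))  # approximately 0.22911
print(p_value < alpha)    # False: not significant at the 5% level
```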

Note that we could also use the following syntax to extract just the correlation coefficient or just the p-value:

#extract Spearman Rank correlation coefficient
spearmanr(df['math'], df['science'])[0]

-0.41818181818181815

#extract p-value of Spearman Rank correlation coefficient
spearmanr(df['math'], df['science'])[1] 

0.22911284098281892
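In recent SciPy releases, the result returned by spearmanr() also exposes its fields by name. This is assumed to apply to SciPy 1.9 and later, where the fields are .statistic and .pvalue, so check your installed version. Named access is often clearer than indexing with [0] and [1]. pandas can also compute the coefficient directly with Series.corr(method='spearman'), although it returns no p-value. A sketch of both approaches:

```python
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({'student': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'math': [70, 78, 90, 87, 84, 86, 91, 74, 83, 85],
                   'science': [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]})

# named fields on the result object (assumes SciPy >= 1.9; older versions named the first field .correlation)
res = spearmanr(df['math'], df['science'])
print(res.statistic)  # Spearman's rho
print(res.pvalue)     # two-sided p-value

# pandas computes the same coefficient, but without a p-value
rho_pandas = df['math'].corr(df['science'], method='spearman')
print(rho_pandas)
```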

