How do I create a Seaborn scatterplot with a correlation coefficient?

Seaborn does not display a correlation coefficient on its own: neither scatterplot() nor regplot() computes or annotates one, and neither function has an annotate parameter. Instead, calculate the coefficient separately (for example with the pearsonr() function from scipy), draw the scatterplot with Seaborn's scatterplot() function, and then add the value to the plot as text with Matplotlib's plt.text() function. If you also want a fitted regression line, you can draw the plot with regplot() instead and annotate it the same way.


You can use the following basic syntax to create a scatterplot in seaborn and add a correlation coefficient to the plot:

from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

#calculate correlation coefficient between x and y
r = stats.pearsonr(x=df.x, y=df.y)[0]

#create scatterplot
sns.scatterplot(data=df, x='x', y='y')

#add correlation coefficient to plot (5 and 30 are x and y positions in data units)
plt.text(5, 30, 'r = ' + str(round(r, 2)))

The following example shows how to use this syntax in practice.

Example: Create Seaborn Scatterplot with Correlation Coefficient

Suppose we have the following pandas DataFrame that shows the points and assists for various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'C', 'D', 'D'],
                   'points': [12, 11, 18, 15, 14, 20, 25, 24, 32, 30],
                   'assists': [4, 7, 7, 8, 9, 10, 10, 12, 10, 15]})

#view DataFrame
print(df)

  team  points  assists
0    A      12        4
1    A      11        7
2    A      18        7
3    A      15        8
4    B      14        9
5    C      20       10
6    C      25       10
7    C      24       12
8    D      32       10
9    D      30       15

We can use the following syntax to create a scatterplot to visualize the relationship between assists and points and also use the pearsonr() function from scipy to calculate the correlation coefficient between these two variables:

from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

#calculate correlation coefficient between assists and points
r = stats.pearsonr(x=df.assists, y=df.points)[0]

#create scatterplot
sns.scatterplot(data=df, x='assists', y='points')

#add correlation coefficient to plot (at assists = 5, points = 30)
plt.text(5, 30, 'r = ' + str(round(r, 2)))

[Figure: seaborn scatterplot with correlation coefficient]

From the output we can see that the Pearson correlation coefficient between assists and points is 0.78.
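As a quick cross-check of this value, the sketch below recomputes the coefficient in two ways: with pearsonr() from scipy, which also returns a p-value, and with the pandas Series.corr() method, which uses the Pearson method by default. It rebuilds the same points and assists columns so that it runs on its own; the variable names r_pandas and p_value are only illustrative.

```python
import pandas as pd
from scipy import stats

# same points and assists values as the DataFrame above
df = pd.DataFrame({'points': [12, 11, 18, 15, 14, 20, 25, 24, 32, 30],
                   'assists': [4, 7, 7, 8, 9, 10, 10, 12, 10, 15]})

# pearsonr returns the correlation coefficient and a two-sided p-value
r, p_value = stats.pearsonr(df['assists'], df['points'])

# pandas computes the same Pearson coefficient directly from two columns
r_pandas = df['assists'].corr(df['points'])

print(f'r = {r:.2f}')  # prints: r = 0.78
```

Both approaches should agree up to floating-point error, so either can supply the value that is written onto the plot.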

Note that we used the round() function to round the correlation coefficient to two decimal places.

Feel free to round to a different number of decimal places and also feel free to use the fontsize argument to change the font size of the correlation coefficient on the plot:

from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

#calculate correlation coefficient between assists and points
r = stats.pearsonr(x=df.assists, y=df.points)[0]

#create scatterplot
sns.scatterplot(data=df, x='assists', y='points')

#add correlation coefficient to plot, rounded to four decimals and in a larger font
plt.text(5, 30, 'r = ' + str(round(r, 4)), fontsize=20)

Notice that the correlation coefficient is now rounded to four decimal places and the font size is much larger than in the previous example.
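The examples above place the label at the data coordinates (5, 30), which only suits data in that range. One possible alternative, sketched below, is to position the text as a fraction of the axes area by passing transform=ax.transAxes to the text() method of the Axes object that scatterplot() returns. The position (0.05, 0.95) and the name label are illustrative choices, not part of the original example.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# same points and assists values as the DataFrame above
df = pd.DataFrame({'points': [12, 11, 18, 15, 14, 20, 25, 24, 32, 30],
                   'assists': [4, 7, 7, 8, 9, 10, 10, 12, 10, 15]})

# calculate correlation coefficient between assists and points
r, _ = stats.pearsonr(df['assists'], df['points'])

# scatterplot() returns the Matplotlib Axes it drew on
ax = sns.scatterplot(data=df, x='assists', y='points')

# with transform=ax.transAxes, (0.05, 0.95) means 5% from the left edge
# and 95% of the way up the axes, regardless of the data range
label = ax.text(0.05, 0.95, f'r = {r:.2f}',
                transform=ax.transAxes, ha='left', va='top')
```

With this approach the label stays in the top-left corner even if the axis limits change.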

Note: You can find the complete documentation for the seaborn scatterplot() function in the official seaborn API reference.
