Q: Calculate & Plot a CDF in Python

A: A cumulative distribution function (CDF) gives, for each value x, the probability that a random variable is less than or equal to x. For a sample of data, the empirical CDF at x is the fraction of observations that are less than or equal to x. In Python, you can calculate an empirical CDF with NumPy by sorting the data and pairing each sorted value with its cumulative fraction. You can then plot the result with Matplotlib's plt.plot() and display it with plt.show(). For a known theoretical distribution, SciPy's scipy.stats module provides ready-made CDF functions.
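To make the "fraction of observations" idea concrete, here is a minimal sketch that evaluates the empirical CDF at arbitrary points. It relies on np.searchsorted with side="right", which returns how many sorted values are less than or equal to each query point. The helper name ecdf_at and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def ecdf_at(data, points):
    """Empirical CDF of `data` evaluated at `points` (hypothetical helper)."""
    sorted_data = np.sort(np.asarray(data, dtype=float))
    # side="right" counts the values <= each point, including ties
    counts = np.searchsorted(sorted_data, points, side="right")
    return counts / len(sorted_data)

sample = [2, 4, 4, 7]
print(ecdf_at(sample, [1, 4, 5, 7]).tolist())  # [0.0, 0.75, 0.75, 1.0]
```

Note how the tie at 4 is handled: both copies count toward the CDF at 4, so the value jumps straight from 0 to 0.75 there.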


You can use the following basic syntax to calculate the cumulative distribution function (CDF) in Python:

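import numpy as np
import matplotlib.pyplot as plt

# sort data
x = np.sort(data)

# calculate CDF values: the i-th smallest value (counting from 1) gets i/n
y = np.arange(1, len(x) + 1) / len(x)

# plot CDF
plt.plot(x, y)
plt.show()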
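If you compute CDFs often, the two steps above (sort, then assign cumulative fractions) can be wrapped in a small function. This is a sketch, assuming a hypothetical helper named ecdf. It returns the sorted values and the cumulative fractions i/n, which you can pass to plt.plot(), or to plt.step(x, y, where='post') if you prefer the staircase shape of an empirical CDF.

```python
import numpy as np

def ecdf(data):
    """Return (x, y) coordinates of the empirical CDF (hypothetical helper)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    if n == 0:
        raise ValueError("data must contain at least one value")
    # the i-th smallest value (counting from 1) has cumulative fraction i/n
    y = np.arange(1, n + 1) / n
    return x, y

x, y = ecdf([3, 1, 2])
print(x.tolist(), y.tolist())
```

Raising an error on empty input is a design choice: an empirical CDF of zero observations is undefined, and failing early avoids a silent division by zero.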

The following examples show how to use this syntax in practice.

Example 1: CDF of a Random Sample

The following code shows how to calculate and plot a cumulative distribution function (CDF) for a random sample of data in Python:

import numpy as np
import matplotlib.pyplot as plt

# define random sample of data (10,000 draws from a standard normal distribution)
data = np.random.randn(10000)

# sort data
x = np.sort(data)

# calculate CDF values
y = np.arange(1, len(x) + 1) / len(x)

# plot CDF
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('CDF')
plt.show()

The x-axis displays the raw data values and the y-axis displays the corresponding CDF values.
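As a rough sanity check, np.random.randn() draws from a standard normal distribution, whose median is 0, so the empirical CDF at x = 0 should be close to 0.5. The sketch below assumes a seeded generator from np.random.default_rng() (not used in the original example) so the result is reproducible. The seed value is arbitrary.

```python
import numpy as np

# seeded generator so the check is reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(0)
data = rng.standard_normal(10000)

x = np.sort(data)
n = len(x)

# fraction of observations <= 0, i.e. the empirical CDF at 0
cdf_at_zero = np.searchsorted(x, 0.0, side="right") / n
print(f"empirical CDF at 0: {cdf_at_zero:.3f}")
```

With 10,000 observations the standard error of this fraction is about 0.005, so the printed value typically lands within roughly 0.01 of 0.5.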

Example 2: CDF of Normal Distribution

If you’d like to plot the cumulative distribution function of a known distribution (such as the normal distribution), you can use the functions in the scipy.stats module of the SciPy library:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# generate data from a standard normal distribution
data = np.random.randn(1000)

# sort data
x = np.sort(data)

# calculate the theoretical standard normal CDF at each x
y = norm.cdf(x)

# plot CDF
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('CDF')
plt.show()
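Example 2 plots a theoretical CDF, while Example 1 plots an empirical one. A natural follow-up is to measure how far a sample's empirical CDF sits from the theoretical curve; the largest vertical gap between the two is the Kolmogorov-Smirnov (KS) statistic. The sketch below computes it with NumPy only, writing the standard normal CDF through math.erf as Φ(x) = (1 + erf(x/√2))/2, which gives the same values as norm.cdf() for the standard normal. The helper name normal_cdf and the seed are assumptions for illustration. If SciPy is installed, scipy.stats.kstest(data, 'norm') reports the same statistic along with a p-value.

```python
import math
import numpy as np

def normal_cdf(values):
    """Standard normal CDF via the error function (hypothetical helper)."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(np.asarray(values, dtype=float) / math.sqrt(2.0)))

rng = np.random.default_rng(42)
x = np.sort(rng.standard_normal(1000))
n = len(x)
theoretical = normal_cdf(x)

# the ECDF jumps from (i-1)/n to i/n at the i-th sorted point,
# so check the gap on both sides of every jump
above = np.arange(1, n + 1) / n - theoretical
below = theoretical - np.arange(0, n) / n
ks_stat = max(above.max(), below.max())
print(f"KS statistic: {ks_stat:.4f}")
```

Checking both sides of each jump matters: comparing only the i/n values would miss the case where the theoretical curve sits above the ECDF just before a step.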
