How to Create a Pareto Chart in Python

A Pareto chart is a type of chart that displays the frequencies of categories as bars ordered from highest to lowest, along with a line showing the cumulative percentage of the total. It makes it easy to see which few categories account for most of the observations.

In Python, a Pareto chart can be created using pandas and the matplotlib library. The categories are placed on the x-axis, the counts on the left-hand y-axis, and the cumulative percentage on a secondary right-hand y-axis. The chart can be customized by changing parameters such as the colors of the bars and line, the marker size, and the labels. Once the chart is generated, it can be displayed or saved to a file.

This tutorial provides a step-by-step example of how to create a Pareto chart in Python.

Step 1: Create the Data

Suppose we conduct a survey in which we ask 350 different people to identify their favorite cereal brand between brands A, B, C, D, E, and F.

We can create the following pandas DataFrame to hold the results of the survey:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'count': [97, 140, 58, 6, 17, 32]})
df.index = ['B', 'A', 'C', 'F', 'E', 'D']

#sort DataFrame by count descending
df = df.sort_values(by='count', ascending=False)

#add column to display cumulative percentage
df['cumperc'] = df['count'].cumsum()/df['count'].sum()*100

#view DataFrame
df

	count	cumperc
A	140	40.000000
B	97	67.714286
C	58	84.285714
D	32	93.428571
E	17	98.285714
F	6	100.000000
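
If the survey results arrive as one row per response rather than as pre-tallied counts, the same table can be built with value_counts(). The following is a minimal sketch: the responses Series is made-up data constructed to match the counts above, and the function name make_pareto_table is hypothetical.

```python
import pandas as pd

def make_pareto_table(responses):
    """Tally raw category labels into a table sorted by count descending."""
    # value_counts() returns counts sorted from highest to lowest by default
    counts = responses.value_counts()
    table = counts.to_frame(name='count')
    # running share of the total, in percent
    table['cumperc'] = table['count'].cumsum() / table['count'].sum() * 100
    return table

# hypothetical raw data: one label per respondent, matching the counts above
responses = pd.Series(['A'] * 140 + ['B'] * 97 + ['C'] * 58 +
                      ['D'] * 32 + ['E'] * 17 + ['F'] * 6)
table = make_pareto_table(responses)
print(table)
```

Because value_counts() already sorts in descending order, no separate sort_values() call is needed in this version.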

Step 2: Create the Pareto Chart

We can use the following code to create the Pareto chart:

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

#define aesthetics for plot
color1 = 'steelblue'
color2 = 'red'
line_size = 4  #marker size (ms) for the cumulative percentage line

#create basic bar plot
fig, ax = plt.subplots()
ax.bar(df.index, df['count'], color=color1)

#add cumulative percentage line to plot
ax2 = ax.twinx()
ax2.plot(df.index, df['cumperc'], color=color2, marker="D", ms=line_size)
ax2.yaxis.set_major_formatter(PercentFormatter())

#specify axis colors
ax.tick_params(axis='y', colors=color1)
ax2.tick_params(axis='y', colors=color2)

#display Pareto chart
plt.show()

Pareto chart in Python

The x-axis displays the different brands ordered from highest to lowest frequency.

The left-hand y-axis shows the frequency of each brand and the right-hand y-axis shows the cumulative percentage of the brands.

For example, we can see:

  • Brand A accounts for about 40% of total survey responses.
  • Brands A and B account for about 68% of total survey responses.
  • Brands A, B, and C account for about 84% of total survey responses.

And so on.
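
The same readings can be taken directly from the DataFrame instead of by eye. The sketch below rebuilds the table from Step 1 and finds the smallest set of top brands whose cumulative share reaches a cutoff. The 80% cutoff and the helper name categories_to_reach are illustrative assumptions, not part of the survey.

```python
import pandas as pd

# rebuild the table from Step 1 so this example is self-contained
df = pd.DataFrame({'count': [97, 140, 58, 6, 17, 32]},
                  index=['B', 'A', 'C', 'F', 'E', 'D'])
df = df.sort_values(by='count', ascending=False)
df['cumperc'] = df['count'].cumsum() / df['count'].sum() * 100

def categories_to_reach(table, cutoff):
    """Return the leading categories needed for cumperc to reach cutoff (in percent)."""
    # rows still strictly below the cutoff, plus the one row that crosses it
    n = int((table['cumperc'] < cutoff).sum()) + 1
    return list(table.index[:n])

print(categories_to_reach(df, 80))  # prints ['A', 'B', 'C']
```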

Step 3: Customize the Pareto Chart (Optional)

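We can change the colors of the bars and the line, as well as the size of the markers on the line.

For example, we could change the bars to be pink and change the line to be purple with slightly larger markers (the ms argument sets the marker size; the width of the line itself is unchanged):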

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

#define aesthetics for plot
color1 = 'pink'
color2 = 'purple'
line_size = 6  #marker size (ms) for the cumulative percentage line

#create basic bar plot
fig, ax = plt.subplots()
ax.bar(df.index, df['count'], color=color1)

#add cumulative percentage line to plot
ax2 = ax.twinx()
ax2.plot(df.index, df['cumperc'], color=color2, marker="D", ms=line_size)
ax2.yaxis.set_major_formatter(PercentFormatter())

#specify axis colors
ax.tick_params(axis='y', colors=color1)
ax2.tick_params(axis='y', colors=color2)

#display Pareto chart
plt.show()
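
Because Steps 2 and 3 differ only in their colors and marker size, the plotting code can be wrapped in a function that takes those as parameters. The sketch below does that, adds axis labels, and saves the chart to a file. The function name plot_pareto, the label text, and the filename pareto.png are illustrative choices, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import pandas as pd

def plot_pareto(table, bar_color='steelblue', line_color='red', marker_size=4):
    """Draw a Pareto chart from a table with 'count' and 'cumperc' columns."""
    fig, ax = plt.subplots()
    ax.bar(table.index, table['count'], color=bar_color)
    ax.set_xlabel('Brand')
    ax.set_ylabel('Count')

    # cumulative percentage line on a secondary y-axis
    ax2 = ax.twinx()
    ax2.plot(table.index, table['cumperc'], color=line_color,
             marker='D', ms=marker_size)
    ax2.yaxis.set_major_formatter(PercentFormatter())
    ax2.set_ylabel('Cumulative percentage')

    # match tick colors to the series each axis describes
    ax.tick_params(axis='y', colors=bar_color)
    ax2.tick_params(axis='y', colors=line_color)
    return fig, ax, ax2

# rebuild the table from Step 1 so this example is self-contained
df = pd.DataFrame({'count': [97, 140, 58, 6, 17, 32]},
                  index=['B', 'A', 'C', 'F', 'E', 'D'])
df = df.sort_values(by='count', ascending=False)
df['cumperc'] = df['count'].cumsum() / df['count'].sum() * 100

fig, ax, ax2 = plot_pareto(df, bar_color='pink', line_color='purple', marker_size=6)
fig.savefig('pareto.png', dpi=150, bbox_inches='tight')
```

Returning the figure and both axes leaves room for further changes, such as adding a title, before the chart is saved.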
