How to create frequency table based on multiple columns in pandas

Creating a frequency table based on multiple columns in pandas can be done by using the crosstab function. This function takes the columns of interest as parameters and returns a frequency table showing the count of each combination of the values in the given columns. The table can then be further manipulated to get additional information, such as creating a percentage column or sorting the data.


You can use the following basic syntax to create a frequency table in pandas based on multiple columns:

df.value_counts(['column1', 'column2'])

The following example shows how to use this syntax in practice.

Example: Create Frequency Table in Pandas Based on Multiple Columns

Suppose we have the following pandas DataFrame that contains information on team name, position, and points scored by various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position' : ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [24, 33, 20, 15, 16, 16, 29, 25]})

#view DataFrame
print(df)

  team position  points
0    A        G      24
1    A        G      33
2    A        G      20
3    A        F      15
4    B        G      16
5    B        G      16
6    B        F      29
7    B        F      25

We can use the value_counts() function to create a frequency table that shows the occurrence of each combination of values in the team and position columns:

#count frequency of values in team and position columns
df.value_counts(['team', 'position'])

team  position
A     G           3
B     F           2
      G           2
A     F           1
dtype: int64

From the results we can see:

  • There are 3 occurrences of team A and position G
  • There are 2 occurrences of team B and position F
  • There are 2 occurrences of team B and position G
  • There is 1 occurrence of team A and position F

Note that we can use reset_index() to return a DataFrame as a result instead:

#count frequency of values in team and position columns and return DataFrame
df.value_counts(['team', 'position']).reset_index()

        team	position  0
0	A	G	  3
1	B	F	  2
2	B	G	  2
3	A	F	  1

We can use the rename() function to rename the column that contains the counts:

#get frequency of values in team and position column and rename count column
df.value_counts(['team', 'position']).reset_index().rename(columns={0:'count'})

        team	position  count
0	A	G	  3
1	B	F	  2
2	B	G	  2
3	A	F	  1

The end result is a DataFrame that contains the frequency of each unique combination of values in the team and position columns.

x