Creating a frequency table based on multiple columns in pandas can be done with the value_counts() function (pd.crosstab() is an alternative that arranges the same counts as a two-way table). value_counts() takes the columns of interest and returns the count of each combination of values in those columns. The result can then be manipulated further, for example by adding a percentage column or sorting the data.
You can use the following basic syntax to create a frequency table in pandas based on multiple columns:
df.value_counts(['column1', 'column2'])
The following example shows how to use this syntax in practice.
Example: Create Frequency Table in Pandas Based on Multiple Columns
Suppose we have the following pandas DataFrame that contains information on team name, position, and points scored by various basketball players:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position' : ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [24, 33, 20, 15, 16, 16, 29, 25]})

#view DataFrame
print(df)

  team position  points
0    A        G      24
1    A        G      33
2    A        G      20
3    A        F      15
4    B        G      16
5    B        G      16
6    B        F      29
7    B        F      25
We can use the value_counts() function to create a frequency table that shows the occurrence of each combination of values in the team and position columns:
#count frequency of values in team and position columns
df.value_counts(['team', 'position'])

team  position
A     G           3
B     F           2
      G           2
A     F           1
dtype: int64
From the results we can see:
- There are 3 occurrences of team A and position G
- There are 2 occurrences of team B and position F
- There are 2 occurrences of team B and position G
- There is 1 occurrence of team A and position F
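The same counts can also be read programmatically, because the result is a Series indexed by (team, position) pairs. The following is a minimal sketch, assuming the DataFrame from this example; the variable names counts, a_g, and team_b are illustrative only.

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [24, 33, 20, 15, 16, 16, 29, 25]})

# frequency of each (team, position) combination
counts = df.value_counts(['team', 'position'])

# look up one combination by its (team, position) key
a_g = counts[('A', 'G')]

# select the counts for every position on a single team
team_b = counts.xs('B', level='team')

print(a_g)
print(team_b)
```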
Note that we can use reset_index() to return a DataFrame as a result instead:
#count frequency of values in team and position columns and return DataFrame
df.value_counts(['team', 'position']).reset_index()

  team position  0
0    A        G  3
1    B        F  2
2    B        G  2
3    A        F  1
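We can use the rename() function to rename the column that contains the counts (in pandas 2.0 and later this column is already named count, so the rename is only needed on older versions):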
#get frequency of values in team and position column and rename count column
df.value_counts(['team', 'position']).reset_index().rename(columns={0:'count'})

  team position  count
0    A        G      3
1    B        F      2
2    B        G      2
3    A        F      1
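As an alternative sketch, passing name='count' to reset_index() labels the counts column directly, which should behave the same whether the underlying Series is unnamed or already named. The example below also adds a percentage column and sorts the rows; the names freq and percent are assumptions chosen for illustration.

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [24, 33, 20, 15, 16, 16, 29, 25]})

# name='count' sets the label of the counts column in the resulting DataFrame
freq = df.value_counts(['team', 'position']).reset_index(name='count')

# share of all rows for each combination, expressed as a percentage
freq['percent'] = freq['count'] / freq['count'].sum() * 100

# sort by team, then position, and renumber the rows
freq = freq.sort_values(['team', 'position']).reset_index(drop=True)

print(freq)
```

If only the proportions are needed, value_counts() also accepts normalize=True, which returns fractions of the total instead of raw counts.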
The end result is a DataFrame that contains the frequency of each unique combination of values in the team and position columns.
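For a two-way layout of the same frequencies, pd.crosstab() can be used instead of value_counts(). The following is a minimal sketch, assuming the same DataFrame; the variable names table and table_with_totals are illustrative.

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [24, 33, 20, 15, 16, 16, 29, 25]})

# rows are teams, columns are positions, cells are counts
table = pd.crosstab(df['team'], df['position'])
print(table)

# margins=True adds row and column totals labelled 'All'
table_with_totals = pd.crosstab(df['team'], df['position'], margins=True)
print(table_with_totals)
```

The crosstab layout is convenient for reading two columns side by side, while value_counts() extends more naturally to three or more columns.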