How do I use aggfunc in crosstab() Function?

The aggfunc argument in the crosstab() function is used to specify the type of aggregation that should be used when computing values in the crosstab. This argument accepts a function or a list of functions that will be applied to the columns of the crosstab to generate the resulting values. The default aggregation is the sum, but other aggregations like mean, count, etc. can also be applied.


You can use the aggfunc argument within the pandas crosstab() function to create a crosstab that aggregates values using a specific metric:

pd.crosstab(index=df.col1, columns=df.col2, values=df.col3, aggfunc='count')

The default value for aggfunc is ‘count’ but you can specify other aggregation methods such as mean, median, sum, min, max, etc.

You can also specify multiple aggregation methods in the aggfunc argument:

pd.crosstab(index=df.col1, columns=df.col2, values=df.col3, aggfunc=['min', 'max'])

The following examples show how to use each of these methods in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'position':['G', 'G', 'F', 'G', 'F', 'F', 'F', 'G', 'G', 'F', 'F'],
                   'points': [22, 25, 24, 39, 34, 20, 18, 17, 20, 19, 22]})

#view DataFrame
print(df)

   team position  points
0     A        G      22
1     A        G      25
2     A        F      24
3     B        G      39
4     B        F      34
5     B        F      20
6     B        F      18
7     C        G      17
8     C        G      20
9     C        F      19
10    C        F      22

Example 1: Create Crosstab with One Value in aggfunc

We can use the following crosstab() function with the argument aggfunc=’mean’ to create a crosstab that displays the mean value points for each combination of position and team:

#create crosstab that displays mean points value by team and position
pd.crosstab(index=df.team, columns=df.position, values=df.points, aggfunc='mean')

position      F	     G
team		
A	   24.0	  23.5
B	   24.0	  39.0
C	   20.5	  18.5

Here is how to interpret the output:

  • The average points for players on team A in position F is 24.
  • The average points for players on team A in position G is 23.5.

And so on.

We can also use a different aggregation metric, such as the maximum value:

#create crosstab that displays max points value by team and position
pd.crosstab(index=df.team, columns=df.position, values=df.points, aggfunc='max')


position	F	 G
team		
A	       24	25
B	       34	39
C	       22	20

Here is how to interpret the output:

  • The max points for players on team A in position F is 24.
  • The max points for players on team A in position G is 25.

Example 2: Create Crosstab with Multiple Values in aggfunc

We can use the crosstab() function with multiple values in the aggfunc argument to aggregate the points values by multiple metrics for each combination of position and team:

#create crosstab that displays min and max points by team and position
pd.crosstab(df.team, df.position, df.points, aggfunc=['min', 'max']) 

	              min	       max
position	F	G	 F	 G
team				
A	       24	22	24	25
B	       18	39	34	39
C	       19	17	22	20

Here is how to interpret the output:

  • The minimum points value for players on team A in position F is 24.
  • The minimum points value for players on team A in position G is 22.
  • The maximum points value for players on team A in position F is 24.
  • The maximum points value for players on team A in position G is 25.

And so on.

Note: You can find the complete documentation for the pandas crosstab() function .

x