## How to use pandas to do groupby and value counts?

Pandas’ groupby and value_counts functions can be used to quickly explore and analyze categorical data. Groupby allows you to group a dataframe by a column, and then perform an aggregate function on the groups, such as counting the number of rows in each group. The value_counts function provides a convenient way to quickly get the counts of each unique value in a particular column. Together, these functions provide an efficient way to summarize and explore the data.


You can use the following basic syntax to count the frequency of unique values by group in a pandas DataFrame:

df.groupby(['column1', 'column2']).size().unstack(fill_value=0)

The following example shows how to use this syntax in practice.

Example: Use GroupBy and Value Counts in Pandas

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position':['G', 'G', 'F', 'F', 'C', 'G', 'F', 'F', 'F', 'F'],
                   'points': [8, 8, 10, 10, 11, 8, 9, 10, 10, 10]})

#view DataFrame
print(df)

  team position  points
0    A        G       8
1    A        G       8
2    A        F      10
3    A        F      10
4    A        C      11
5    B        G       8
6    B        F       9
7    B        F      10
8    B        F      10
9    B        F      10

We can use the following syntax to count the frequency of the points values, grouped by the team and position columns:

#count frequency of points values, grouped by team and position
df.groupby(['team', 'position', 'points']).size().unstack(fill_value=0)

	points	8	9	10	11
team	position				
A	C	0	0	0	1
        F	0	0	2	0
        G	2	0	0	0
B	F	0	1	3	0
        G	1	0	0	0

Here’s how to interpret the output:

  • The value 8 occurred in the points column 0 times for players on team A and position C.
  • The value 9 occurred in the points column 0 times for players on team A and position C.
  • The value 10 occurred in the points column 0 times for players on team A and position C.
  • The value 11 occurred in the points column 1 time for players on team A and position C.

And so on.

We could also use the following syntax to count the frequency of the positions, grouped by team:

#count frequency of positions, grouped by team
df.groupby(['team', 'position']).size().unstack(fill_value=0)

position	C	F	G
team			
A	        1	2	2
B	        0	4	1

Here’s how to interpret the output:

  • The value ‘C’ occurred 1 time on team A.
  • The value ‘F’ occurred 2 times on team A.
  • The value ‘G’ occurred 2 times on team A.
  • The value ‘C’ occurred 0 times on team B.
  • The value ‘F’ occurred 4 times on team B.
  • The value ‘G’ occurred 1 time on team B.

And so on.

x