How can I group my dataframe by a range of values?

Grouping a dataframe by a range of values can be done using the cut() function, which allows you to create bins or intervals from numeric data. This function creates a new column in the dataframe that holds the group labels for each data point. You can then use the groupby() method to group the dataframe by the new column. This will return a grouped dataframe, in which each row corresponds to a range of values.


You can use the following syntax to use the groupby() function in pandas to group a column by a range of values before performing an aggregation:

df.groupby(pd.cut(df['my_column'], [0, 25, 50, 75, 100])).sum()

This particular example will group the rows of the DataFrame by the following range of values in the column called my_column:

  • (0, 25]
  • (25, 50]
  • (50, 75]
  • (75, 100]

It will then calculate the sum of values in all columns of the DataFrame using these ranges of values as the groups.

The following example shows how to use this syntax in practice.

Example: How to Groupby Range of Values in Pandas

Suppose we have the following pandas DataFrame that contains information about the size of different retail stores and their total sales:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'store_size': [14, 25, 26, 29, 45, 58, 67, 81, 90, 98],
                   'sales': [15, 18, 24, 25, 20, 35, 34, 49, 44, 49]})

#view DataFrame
print(df)

   store_size  sales
0          14     15
1          25     18
2          26     24
3          29     25
4          45     20
5          58     35
6          67     34
7          81     49
8          90     44
9          98     49

We can use the following syntax to group the DataFrame based on specific ranges of the store_size column and then calculate the sum of every other column in the DataFrame using the ranges as groups:

#group by ranges of store_size and calculate sum of all columns
df.groupby(pd.cut(df['store_size'], [0, 25, 50, 75, 100])).sum()

	 store_size    sales
store_size		
(0, 25]	         39	  33
(25, 50]	100	  69
(50, 75]	125	  69
(75, 100]	269	 142

From the output we can see:

  • For rows with a store_size value between 0 and 25, the sum of store_size is 39 and the sum of sales is 33.
  • For rows with a store_size value between 25 and 50, the sum of store_size is 100 and the sum of sales is 69.

And so on.

If you’d like, you can also calculate just the sum of sales for each range of store_size:

#group by ranges of store_size and calculate sum of sales
df.groupby(pd.cut(df['store_size'], [0, 25, 50, 75, 100]))['sales'].sum()

store_size
(0, 25]       33
(25, 50]      69
(50, 75]      69
(75, 100]    142
Name: sales, dtype: int64

You can also use the NumPy arange() function to cut a variable into ranges without manually specifying each cut point:

import numpy as np

#group by ranges of store_size and calculate sum of sales
df.groupby(pd.cut(df['store_size'], np.arange(0, 101, 25)))['sales'].sum()

store_size
(0, 25]       33
(25, 50]      69
(50, 75]      69
(75, 100]    142
Name: sales, dtype: int64

Notice that these results match the previous example.

Note: You can find the complete documentation for the NumPy arange() function .

x