How to Resample Time Series with groupby() in Pandas?

The groupby() and resample() methods only work with Pandas dataframe objects, not with Pandas Series. If you want to use these methods, you need to convert the Series to a DataFrame first. Then you can use the groupby() and resample() methods on the DataFrame to get the desired output.


To resample time series data means to aggregate the data by a new time period.

If you’d like to resample a time series in pandas while using the groupby operator, you can use the following basic syntax:

grouper = df.groupby([pd.Grouper(freq='W'), 'store'])

result = grouper['sales'].sum().unstack('store').fillna(0) 

This particular example groups the rows in the DataFrame by the store column, then resamples the time series by week (freq=’W’), then calculates the sum of values in the sales column.

Note that we can resample the time series data by various time periods, including:

  • S: Seconds
  • min: Minutes
  • H: Hours
  • D: Day
  • W: Week
  • M: Month
  • Q: Quarter
  • A: Year

The following example shows how to resample time series data with a groupby operation in practice.

Example: Resample Time Series with groupby in Pandas

Suppose we have the following pandas DataFrame that shows the total sales made each day at two different stores:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'sales': [13, 14, 17, 17, 16, 22, 28, 10, 17, 10, 11],
                   'store': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B']},
                   index=pd.date_range('2023-01-06', '2023-01-16', freq='d'))

#view DataFrame
print(df)

            sales store
2023-01-06     13     A
2023-01-07     14     A
2023-01-08     17     A
2023-01-09     17     A
2023-01-10     16     A
2023-01-11     22     B
2023-01-12     28     B
2023-01-13     10     B
2023-01-14     17     B
2023-01-15     10     B
2023-01-16     11     B

Suppose we would like to group the rows by store, then resamples the time series by week, then calculates the sum of values in the sales column.

We can use the following syntax to do so:

#group by store and resample time series by week
grouper = df.groupby([pd.Grouper(freq='W'), 'store'])

#calculate sum of sales each week by store
result = grouper['sales'].sum().unstack('store').fillna(0)

#view results
print(result)

store          A     B
2023-01-08  14.0   0.0
2023-01-15  16.5  17.0
2023-01-22   0.0  11.0

From the output we cans see:

  • The sum of sales on the week ending 1/8/2023 at store A is 14.
  • The sum of sales on the week ending 1/8/2023 at store B is 0.

And so on.

Note that in this example we chose to calculate the sum of values in the sales column.

Simply replace sum() in the code above with count(), mean(), median(), etc. to calculate whatever metric you’d like.

The following tutorials explain how to perform other common operations in Python:

 

x