Resampling a time series means changing the frequency of the data, such as converting hourly data to daily data, by aggregating the observations into a new time period. In pandas, you can resample within groups by passing a pd.Grouper object to the groupby() function, which bins the rows by a time period (e.g. week, month), and then applying an aggregation (e.g. sum, mean) to each bin. Aggregating by time period in this way makes trends in the data easier to analyze and visualize.

If you’d like to resample a time series in pandas as part of a groupby operation, you can use the following basic syntax:

grouper = df.groupby([pd.Grouper(freq='W'), 'store'])

result = grouper['sales'].sum().unstack('store').fillna(0)

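This particular example groups the rows in the DataFrame by week (freq='W') and by the store column, then calculates the sum of values in the sales column for each week and store. The unstack('store') call moves the stores into columns, and fillna(0) fills in 0 for any week in which a store has no rows. Note that pd.Grouper(freq='W') requires the DataFrame to have a datetime index.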

Note that we can resample the time series data by various time periods, including:

S: Seconds
min: Minutes
H: Hours
D: Day
W: Week
M: Month
Q: Quarter
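A: Year

Note that pandas 2.2 and later deprecate several of these aliases in favor of newer spellings: s for seconds, h for hours, ME for month end, QE for quarter end, and YE for year end.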
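As a rough illustration of how the frequency alias changes the result, the sketch below applies the same grouping with two different aliases. The hourly data and the helper name resample_by_store are made up for illustration and are not part of the example that follows. Only the D and W aliases are used here, since those are spelled the same way across pandas versions.

```python
import pandas as pd

# Hypothetical hourly data: 48 readings starting Monday 2023-01-02,
# alternating between two stores, one sale per reading
idx = pd.date_range('2023-01-02', periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({'sales': [1] * 48, 'store': ['A', 'B'] * 24}, index=idx)

def resample_by_store(df, freq):
    """Sum sales per store for each period of the given frequency alias."""
    totals = df.groupby([pd.Grouper(freq=freq), 'store'])['sales'].sum()
    return totals.unstack('store').fillna(0)

daily = resample_by_store(hourly, 'D')   # one row per calendar day
weekly = resample_by_store(hourly, 'W')  # one row per week (ending Sunday)

print(daily)
print(weekly)
```

With this made-up data, the daily version should contain two rows (one per day) and the weekly version a single row, since both days fall in the same week.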

The following example shows how to resample time series data with a groupby operation in practice.

Example: Resample Time Series with groupby in Pandas

Suppose we have the following pandas DataFrame that shows the total sales made each day at two different stores:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'sales': [13, 14, 17, 17, 16, 22, 28, 10, 17, 10, 11],
                   'store': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B']},
                   index=pd.date_range('2023-01-06', '2023-01-16', freq='D'))

#view DataFrame
print(df)

            sales store
2023-01-06     13     A
2023-01-07     14     A
2023-01-08     17     A
2023-01-09     17     A
2023-01-10     16     A
2023-01-11     22     B
2023-01-12     28     B
2023-01-13     10     B
2023-01-14     17     B
2023-01-15     10     B
2023-01-16     11     B

Suppose we would like to group the rows by store, resample the time series by week, and calculate the sum of values in the sales column.

We can use the following syntax to do so:

#group by store and resample time series by week
grouper = df.groupby([pd.Grouper(freq='W'), 'store'])

#calculate sum of sales each week by store
result = grouper['sales'].sum().unstack('store').fillna(0)

#view results
print(result)

store          A     B
2023-01-08  44.0   0.0
2023-01-15  33.0  87.0
2023-01-22   0.0  11.0

From the output we can see:

The sum of sales for the week ending 1/8/2023 at store A is 44.
The sum of sales for the week ending 1/8/2023 at store B is 0.

And so on.
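One way to double-check these weekly totals is to compute them a second way. The sketch below is an alternative formulation, not the method used above: it groups by store first and then calls resample('W') on each group. The variable names via_grouper and via_resample are chosen here for illustration.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'sales': [13, 14, 17, 17, 16, 22, 28, 10, 17, 10, 11],
                   'store': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B']},
                   index=pd.date_range('2023-01-06', '2023-01-16', freq='D'))

# Approach from the example: weekly Grouper plus the store column
via_grouper = (df.groupby([pd.Grouper(freq='W'), 'store'])['sales']
                 .sum().unstack('store').fillna(0))

# Alternative: group by store, then resample each store's series by week
via_resample = (df.groupby('store')['sales']
                  .resample('W').sum()
                  .unstack('store').fillna(0))

# Compare the two tables cell by cell
print((via_grouper.values == via_resample.values).all())
```

Both formulations bin the rows into weeks ending on Sunday, so they should agree on this data. The Grouper version tends to be more convenient when you want to group by several keys at once.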

Note that in this example we chose to calculate the sum of values in the sales column.

Simply replace sum() in the code above with count(), mean(), median(), etc. to calculate whatever metric you’d like.
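As a minimal sketch of swapping in a different metric, the code below computes the weekly median per store and then uses agg() to calculate several metrics in one pass. The variable names median_sales and summary are chosen here for illustration.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'sales': [13, 14, 17, 17, 16, 22, 28, 10, 17, 10, 11],
                   'store': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B']},
                   index=pd.date_range('2023-01-06', '2023-01-16', freq='D'))

grouper = df.groupby([pd.Grouper(freq='W'), 'store'])

# Weekly median of sales by store instead of the sum
median_sales = grouper['sales'].median().unstack('store').fillna(0)
print(median_sales)

# Several metrics at once, one row per (week, store) pair
summary = grouper['sales'].agg(['sum', 'mean', 'median', 'count'])
print(summary)
```

Filling missing weeks with 0 is a natural choice for sums and counts. For a mean or median, a 0 could be mistaken for a real value, so you may prefer to drop the fillna(0) call and leave those cells as NaN.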
