Table of Contents
Resampling a time series involves changing the frequency of the data, such as converting hourly data to daily data. This can be done using the groupby() function in Pandas, which allows you to group the data by a certain time period (e.g. month, year) and then apply a resampling method (e.g. sum, mean) to the grouped data. This allows you to aggregate the data based on a specific time period, making it easier to analyze and visualize trends in the data.
To resample time series data means to aggregate the data by a new time period.
If you’d like to resample a time series in pandas while using the groupby operator, you can use the following basic syntax:
grouper = df.groupby([pd.Grouper(freq='W'), 'store']) result = grouper['sales'].sum().unstack('store').fillna(0)
This particular example groups the rows in the DataFrame by the store column, then resamples the time series by week (freq=’W’), then calculates the sum of values in the sales column.
Note that we can resample the time series data by various time periods, including:
- S: Seconds
- min: Minutes
- H: Hours
- D: Day
- W: Week
- M: Month
- Q: Quarter
- A: Year
The following example shows how to resample time series data with a groupby operation in practice.
Example: Resample Time Series with groupby in Pandas
Suppose we have the following pandas DataFrame that shows the total sales made each day at two different stores:
import pandas as pd #create DataFrame df = pd.DataFrame({'sales': [13, 14, 17, 17, 16, 22, 28, 10, 17, 10, 11], 'store': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B']}, index=pd.date_range('2023-01-06', '2023-01-16', freq='d')) #view DataFrame print(df) sales store 2023-01-06 13 A 2023-01-07 14 A 2023-01-08 17 A 2023-01-09 17 A 2023-01-10 16 A 2023-01-11 22 B 2023-01-12 28 B 2023-01-13 10 B 2023-01-14 17 B 2023-01-15 10 B 2023-01-16 11 B
Suppose we would like to group the rows by store, then resamples the time series by week, then calculates the sum of values in the sales column.
We can use the following syntax to do so:
#group by store and resample time series by week grouper = df.groupby([pd.Grouper(freq='W'), 'store']) #calculate sum of sales each week by store result = grouper['sales'].sum().unstack('store').fillna(0) #view results print(result) store A B 2023-01-08 14.0 0.0 2023-01-15 16.5 17.0 2023-01-22 0.0 11.0
From the output we cans see:
- The sum of sales on the week ending 1/8/2023 at store A is 14.
- The sum of sales on the week ending 1/8/2023 at store B is 0.
And so on.
Note that in this example we chose to calculate the sum of values in the sales column.
Simply replace sum() in the code above with count(), mean(), median(), etc. to calculate whatever metric you’d like.
Additional Resources
The following tutorials explain how to perform other common operations in Python: