What Is the Best Way to Resample Time Series Data in Python?

The most common way to resample time series data in Python is with pandas. Its resample() method groups a Series or DataFrame that has a datetime index into new time periods (for example, hours into weeks) so that an aggregation such as a sum or mean can be applied to each period. pandas also supports related operations such as shifting, interpolating, and filling missing values.


To resample time series data means to summarize or aggregate the data by a new time period.

We can use the following basic syntax to resample time series data in Python:

#df must have a datetime index; monthly_df and weekly_df are existing DataFrames

#find sum of values in column1 by month
monthly_df['column1'] = df['column1'].resample('M').sum()

#find mean of values in column1 by week
weekly_df['column1'] = df['column1'].resample('W').mean()
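To make the idea of aggregating by a new time period concrete, here is a minimal sketch on a small, hypothetical dataset (the values 1 through 14 and the variable names are made up for illustration). It groups 14 daily values into two calendar weeks and computes both a sum and a mean for each week:

```python
import pandas as pd

# Hypothetical toy data: 14 daily values (1 through 14), starting on Monday 2020-01-06
index = pd.date_range('2020-01-06', periods=14, freq='D')
daily = pd.Series(range(1, 15), index=index)

# Group the days into calendar weeks (each labeled by its Sunday) and aggregate
weekly_sum = daily.resample('W').sum()
weekly_mean = daily.resample('W').mean()

print(weekly_sum)
print(weekly_mean)
```

The first week contains the values 1 through 7, so its sum is 28 and its mean is 4. The second week contains 8 through 14, so its sum is 77 and its mean is 11.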

Note that we can resample the time series data by various time periods, including:

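  • S: Seconds
  • min: Minutes
  • H: Hours
  • D: Days
  • W: Weeks
  • M: Months
  • Q: Quarters
  • A: Years

Note: Starting with pandas 2.2, the aliases S, H, M, Q, and A are deprecated in favor of s, h, ME, QE, and YE, respectively. The older spellings may produce a warning in recent versions.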

The following example shows how to resample time series data in practice.

Example: Resample Time Series Data in Python

Suppose we have the following pandas DataFrame that shows the total sales made each hour by some company during a one-year period:

import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame with hourly index
df = pd.DataFrame(index=pd.date_range('2020-01-06', '2020-12-27', freq='h'))

#add column to show sales by hour
df['sales'] = np.random.randint(low=0, high=20, size=len(df.index))

#view first five rows of DataFrame
df.head()

                     sales
2020-01-06 00:00:00     12
2020-01-06 01:00:00     15
2020-01-06 02:00:00      0
2020-01-06 03:00:00      3
2020-01-06 04:00:00      3

If we create a line plot to visualize the sales data, it would look like this:

import matplotlib.pyplot as plt

#plot time series data
plt.plot(df.index, df.sales, linewidth=3)

This plot is difficult to interpret, so we may instead summarize the sales data by week:

#create new DataFrame
weekly_df = pd.DataFrame()

#create 'sales' column that summarizes total sales by week
weekly_df['sales'] = df['sales'].resample('W').sum()

#view first five rows of DataFrame
weekly_df.head()

            sales
2020-01-12   1519
2020-01-19   1589
2020-01-26   1540
2020-02-02   1562
2020-02-09   1614

This new DataFrame shows the sum of sales by week.
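A weekly sum is only one possible summary. As a sketch, the same hourly data can be summarized with several statistics at once by passing a list of names to agg() on the resampled object. The choice of 'sum', 'mean', and 'count' here is an illustrative assumption, not part of the original example:

```python
import numpy as np
import pandas as pd

# Recreate the hourly sales data from the example above
np.random.seed(0)
df = pd.DataFrame(index=pd.date_range('2020-01-06', '2020-12-27', freq='h'))
df['sales'] = np.random.randint(low=0, high=20, size=len(df.index))

# Compute several statistics per week in one pass
weekly_stats = df['sales'].resample('W').agg(['sum', 'mean', 'count'])

print(weekly_stats.head())
print(weekly_stats.tail(1))  # the final week holds fewer hourly rows than a full week
```

The count column is a useful sanity check. A full week has 168 hourly rows, but the index stops at midnight on 2020-12-27, so the final week contains only 145 rows and its sum is correspondingly smaller.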

We can then create a time series plot using this weekly data:

import matplotlib.pyplot as plt

#plot weekly sales data
plt.plot(weekly_df.index, weekly_df.sales, linewidth=3)

This plot is much easier to read because we only plot sales data for 51 individual weeks as opposed to sales data for 8,545 individual hours in the first example.

Note: In this example, we summarized the sales data by week but we could also summarize by month or quarter if we would like to plot even fewer data points.
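As a rough sketch of that idea, the same hourly data can be summarized by month and by quarter. This sketch uses the start-anchored aliases 'MS' (month start) and 'QS' (quarter start), which do not appear in the list above. They are chosen here as an assumption because they label each period by its first day and because the month-end and quarter-end spellings differ between pandas versions:

```python
import numpy as np
import pandas as pd

# Recreate the hourly sales data from the example above
np.random.seed(0)
df = pd.DataFrame(index=pd.date_range('2020-01-06', '2020-12-27', freq='h'))
df['sales'] = np.random.randint(low=0, high=20, size=len(df.index))

# 'MS' and 'QS' label each bin by the first day of the month or quarter
monthly_df = df[['sales']].resample('MS').sum()
quarterly_df = df[['sales']].resample('QS').sum()

# Number of data points left to plot at each level of aggregation
print(len(monthly_df), len(quarterly_df))
```

This leaves 12 monthly points or 4 quarterly points to plot instead of 51 weekly ones. The first and last periods cover only part of a month or quarter, because the data starts on January 6 and ends on December 27.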
