Table of Contents
The rolling median in Pandas is a statistical measure used to calculate the median of the previous n number of data points in a series. It is used to smooth out short term fluctuations and highlight longer term trends in the data. It is calculated with the rolling() method, which takes an argument for the window size. This is the number of data points to consider for calculating the median.
A rolling median is the median of a certain number of previous periods in a time series.
To calculate the rolling median for a column in a pandas DataFrame, we can use the following syntax:
#calculate rolling median of previous 3 periods df['column_name'].rolling(3).median()
The following example shows how to use this function in practice.
Example: Calculate Rolling Median of Column
Suppose we have the following pandas DataFrame:
import pandas as pd #create DataFrame df = pd.DataFrame({'month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'leads': [13, 15, 16, 15, 17, 20, 22, 24, 25, 26, 23, 24], 'sales': [22, 24, 23, 27, 26, 26, 27, 30, 33, 32, 27, 25]}) #view DataFrame df month leads sales 0 1 13 22 1 2 15 24 2 3 16 23 3 4 15 27 4 5 17 26 5 6 20 26 6 7 22 27 7 8 24 30 8 9 25 33 9 10 26 32 10 11 23 27 11 12 24 25
We can use the following syntax to create a new column that contains the rolling median of ‘sales’ for the previous 3 periods:
#calculate 3-month rolling median df['sales_rolling3'] = df['sales'].rolling(3).median() #view updated data frame df month leads sales sales_rolling3 0 1 13 22 NaN 1 2 15 24 NaN 2 3 16 23 23.0 3 4 15 27 24.0 4 5 17 26 26.0 5 6 20 26 26.0 6 7 22 27 26.0 7 8 24 30 27.0 8 9 25 33 30.0 9 10 26 32 32.0 10 11 23 27 32.0 11 12 24 25 27.0
We can manually verify that the rolling median sales displayed for month 3 is the median of the previous 3 months:
- Median of 22, 24, 23 = 23.0
Similarly, we can verify the rolling median sales of month 4:
- Median of 24, 23, 27 = 24.0
We can use similar syntax to calculate the rolling 6-month median:
#calculate 6-month rolling median df['sales_rolling6'] = df['sales'].rolling(6).median() #view updated data frame df month leads sales sales_rolling3 sales_rolling6 0 1 13 22 NaN NaN 1 2 15 24 NaN NaN 2 3 16 23 23.0 NaN 3 4 15 27 24.0 NaN 4 5 17 26 26.0 NaN 5 6 20 26 26.0 25.0 6 7 22 27 26.0 26.0 7 8 24 30 27.0 26.5 8 9 25 33 30.0 27.0 9 10 26 32 32.0 28.5 10 11 23 27 32.0 28.5 11 12 24 25 27.0 28.5