Pandas: How to Use ffill Based on Condition

Forward filling based on a condition means replacing missing values in a pandas DataFrame with the most recent non-missing value, but only among rows that satisfy some condition, such as belonging to the same group. This is useful when a dataset interleaves records from several groups, because a plain forward fill would carry values from one group over into the next.


You can use the following basic syntax with the pandas ffill() method to forward fill values in one column based on a condition in another column:

df['sales'] = df.groupby('store')['sales'].ffill()

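This particular example forward fills each NaN in the sales column with the most recent non-missing sales value from an earlier row that has the same value in the store column. The matching row does not need to be directly above the NaN, and a NaN with no earlier non-missing value for its store stays NaN.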
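To see what the grouping changes, here is a minimal sketch on a small made-up DataFrame. The name demo and its values are illustrative assumptions, not part of the example that follows. It compares a plain ffill() with the grouped version:

```python
import numpy as np
import pandas as pd

# illustrative data: two stores with interleaved rows
demo = pd.DataFrame({'store': ['A', 'B', 'A', 'B'],
                     'sales': [10, np.nan, np.nan, 5]})

# plain forward fill ignores the store column entirely
plain = demo['sales'].ffill()

# grouped forward fill only looks at earlier rows from the same store
grouped = demo.groupby('store')['sales'].ffill()

print(plain.tolist())    # [10.0, 10.0, 10.0, 5.0]
print(grouped.tolist())  # [10.0, nan, 10.0, 5.0]
```

In the plain version, store B's first row picks up store A's value of 10. In the grouped version that row stays NaN, because store B has no earlier value to carry forward.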

The following example shows how to use this syntax in practice.

Example: Use ffill Based on Condition in Pandas

Suppose we have the following pandas DataFrame that contains information about the total sales made by two different retail stores during four business quarters:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'store': ['A', 'A', 'B', 'A', 'B', 'A', 'B', 'B'],
                   'quarter': [1, 2, 1, 3, 2, 4, 3, 4],
                   'sales': [12, 22, 30, np.nan, 24, np.nan, np.nan, np.nan]})

#view DataFrame
print(df)

  store  quarter  sales
0     A        1   12.0
1     A        2   22.0
2     B        1   30.0
3     A        3    NaN
4     B        2   24.0
5     A        4    NaN
6     B        3    NaN
7     B        4    NaN

Notice that there are multiple NaN values in the sales column.

Suppose we would like to fill in these NaN values using the previous value in the sales column, but we want to make sure that each filled value comes from the same store.

We can use the following syntax to do so:

#group by store and forward fill values in sales column
df['sales'] = df.groupby('store')['sales'].ffill()

#view updated DataFrame
print(df)

  store  quarter  sales
0     A        1   12.0
1     A        2   22.0
2     B        1   30.0
3     A        3   22.0
4     B        2   24.0
5     A        4   22.0
6     B        3   24.0
7     B        4   24.0

Notice that the NaN values in the sales column have been replaced by the previous sales value and that the values correspond to the correct store.

For example:

  • The NaN value in row index position 3 has been replaced by the value 22, which was the most recent value in the sales column that corresponded to store A.
  • The NaN value in row index position 6 has been replaced by the value 24, which was the most recent value in the sales column that corresponded to store B.

And so on.
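A related case is forward filling only the rows that meet a condition and leaving all other rows untouched. The sketch below assumes, purely for illustration, that only store A's gaps should be filled. The names df_partial and mask are hypothetical, and the data is the same as in the example above:

```python
import numpy as np
import pandas as pd

# same data as the example above, under a separate name
df_partial = pd.DataFrame({'store': ['A', 'A', 'B', 'A', 'B', 'A', 'B', 'B'],
                           'quarter': [1, 2, 1, 3, 2, 4, 3, 4],
                           'sales': [12, 22, 30, np.nan, 24, np.nan, np.nan, np.nan]})

# assumed condition: only rows for store A should be forward filled
mask = df_partial['store'].eq('A')

# forward fill within the selected rows and write the result back to those rows only
df_partial.loc[mask, 'sales'] = df_partial.loc[mask, 'sales'].ffill()

print(df_partial)
```

Here the NaN values for store A (row index positions 3 and 5) become 22, while the NaN values for store B (row index positions 6 and 7) remain NaN.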

Note: You can find the complete documentation for the pandas ffill() function in the official pandas documentation.
