Pandas: How to coalesce values from multiple columns into one

Pandas has no built-in ‘coalesce’ method (unlike the COALESCE function in SQL), but you can get the same result by back-filling across each row with bfill(axis=1) and keeping the first column. The result is a new column that holds, for each row, the first non-null value found across the chosen columns. This is useful when related information is spread across several columns and you want a single column for analysis.


You can use the following methods to coalesce the values from multiple columns of a pandas DataFrame into one column:

Method 1: Coalesce Values by Default Column Order

df['coalesce'] = df.bfill(axis=1).iloc[:, 0]

Method 2: Coalesce Values Using Specific Column Order

df['coalesce'] = df[['col3', 'col1', 'col2']].bfill(axis=1).iloc[:, 0]

The examples below show how to use each method in practice with this pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'points': [np.nan, np.nan, 19, np.nan, 14],
                   'assists': [np.nan, 7, 7, 9, np.nan],
                   'rebounds': [3, 4, np.nan, np.nan, 6]})

#view DataFrame
print(df)

   points  assists  rebounds
0     NaN      NaN       3.0
1     NaN      7.0       4.0
2    19.0      7.0       NaN
3     NaN      9.0       NaN
4    14.0      NaN       6.0

Method 1: Coalesce Values by Default Column Order

The following code shows how to coalesce the values in the points, assists, and rebounds columns into one column. For each row, the columns are checked from left to right and the first non-null value becomes the coalesced value:

#create new column that contains first non-null value from three existing columns 
df['coalesce'] = df.bfill(axis=1).iloc[:, 0]

#view updated DataFrame
print(df)

   points  assists  rebounds  coalesce
0     NaN      NaN       3.0       3.0
1     NaN      7.0       4.0       7.0
2    19.0      7.0       NaN      19.0
3     NaN      9.0       NaN       9.0
4    14.0      NaN       6.0      14.0

Here’s how the value in the coalesce column was chosen:

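  • First row: points and assists were both null, so the value from rebounds (3.0) was used.
  • Second row: points was null, so the value from assists (7.0) was used.
  • Third row: points was non-null, so its value (19.0) was used.
  • Fourth row: points was null, so the value from assists (9.0) was used.
  • Fifth row: points was non-null, so its value (14.0) was used.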
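To make the row-by-row reasoning above concrete, here is a minimal sketch that computes the coalesced column on a fresh copy of the example DataFrame and compares it with a slower, explicit per-row lookup. The names coalesced and row_by_row are illustrative only and do not appear elsewhere in this article.

```python
import numpy as np
import pandas as pd

# same example data as above
df = pd.DataFrame({'points': [np.nan, np.nan, 19, np.nan, 14],
                   'assists': [np.nan, 7, 7, 9, np.nan],
                   'rebounds': [3, 4, np.nan, np.nan, 6]})

# back-fill along each row, then keep the leftmost column
coalesced = df.bfill(axis=1).iloc[:, 0]

# explicit check: take the first non-null value in each row, left to right
row_by_row = df.apply(
    lambda row: row.dropna().iloc[0] if row.notna().any() else np.nan,
    axis=1,
)

print(coalesced.tolist())
print(row_by_row.tolist())
```

Both lists should match the coalesce column shown above. The bfill approach is usually preferable because it avoids calling a Python function on every row.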

Method 2: Coalesce Values Using Specific Column Order

The following code shows how to coalesce the values in the three columns by checking the columns in the following order: assists, rebounds, points.

#coalesce values in specific column order
df['coalesce'] = df[['assists', 'rebounds', 'points']].bfill(axis=1).iloc[:, 0]

#view updated DataFrame
print(df)

   points  assists  rebounds  coalesce
0     NaN      NaN       3.0       3.0
1     NaN      7.0       4.0       7.0
2    19.0      7.0       NaN       7.0
3     NaN      9.0       NaN       9.0
4    14.0      NaN       6.0       6.0

Here’s the logic that was used to decide which value to place in the coalesce column:

  • If the value in the assists column is non-null then use that value.
  • Otherwise, if the value in the rebounds column is non-null then use that value.
  • Otherwise, if the value in the points column is non-null then use that value.
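The same priority logic can also be sketched as a chain of Series.combine_first calls, which fill the nulls of one column with values from the next. The helper name coalesce_columns below is hypothetical (it is not part of pandas) and is shown only as one possible way to wrap the pattern.

```python
from functools import reduce

import numpy as np
import pandas as pd

# same example data as above
df = pd.DataFrame({'points': [np.nan, np.nan, 19, np.nan, 14],
                   'assists': [np.nan, 7, 7, 9, np.nan],
                   'rebounds': [3, 4, np.nan, np.nan, 6]})

def coalesce_columns(frame, columns):
    """Hypothetical helper: first non-null value per row, checking columns in the given order."""
    return reduce(lambda left, right: left.combine_first(right),
                  (frame[col] for col in columns))

order = ['assists', 'rebounds', 'points']

# chained combine_first version
priority = coalesce_columns(df, order)

# bfill version from Method 2, for comparison
via_bfill = df[order].bfill(axis=1).iloc[:, 0]

print(priority.tolist())
print(via_bfill.tolist())
```

Both approaches should give the same values as the coalesce column in Method 2. With either approach, a row in which every listed column is null produces NaN in the result.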
