Compare Two Columns in Pandas (With Examples)

Pandas provides an efficient way to compare two columns with the eq() method. This method compares the values of two columns element by element and returns a Boolean Series indicating where they are equal. For example, to find the rows where a column in one DataFrame equals a column in another DataFrame, you could use df1['col1'].eq(df2['col2']). Pandas aligns the two columns on their index labels before comparing, so the result is a Boolean Series that is True at each index label where the two columns hold equal values.
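As a minimal sketch of the eq() approach, the snippet below compares two columns of a single DataFrame. The column names col1 and col2, the values, and the match column are made up for illustration:

```python
import pandas as pd

# hypothetical data, used only to illustrate eq()
df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [1, 5, 3]})

# element-wise equality check; the result is a Boolean Series
is_equal = df['col1'].eq(df['col2'])

# optionally store the result in a new column
df['match'] = is_equal
```

Here is_equal is True for the first and third rows and False for the second, since only those rows hold the same value in both columns.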


Often you may want to compare two columns in a Pandas DataFrame and write the results of the comparison to a third column.

You can easily do this by using the following syntax:

conditions = [(condition1), (condition2)]
choices = ["choice1", "choice2"]

df["new_column_name"] = np.select(conditions, choices, default)

Here’s what this code does:

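  • conditions is a list of Boolean conditions to check between the two columns
  • choices is a list of the values to return, one for each condition
  • default is the value to return when none of the conditions is met
  • np.select evaluates the conditions row by row and returns the matching choices, which are assigned to the new column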

The following example shows how to use this code in practice.

Example: Compare Two Columns in Pandas

Suppose we have the following DataFrame that shows the number of goals scored by two soccer teams in five different matches:

import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})
             
#view DataFrame      
df

   A_points  B_points
0         1         4
1         3         5
2         3         2
3         3         3
4         5         2

We can use the following code to compare the number of goals by row and output the winner of the match in a third column:

#define conditions
conditions = [df['A_points'] > df['B_points'], 
              df['A_points'] < df['B_points']]

#define choices
choices = ['A', 'B']

#create new column in DataFrame that displays results of comparisons
df['winner'] = np.select(conditions, choices, default='Tie')

#view the DataFrame
df

   A_points  B_points winner
0         1         4      B
1         3         5      B
2         3         2      A
3         3         3    Tie
4         5         2      A

The results of the comparison are shown in the new column called winner.
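With only two conditions, the same result can also be produced by nesting np.where calls. This is an alternative sketch rather than the method used above; it rebuilds the same DataFrame so that it runs on its own:

```python
import numpy as np
import pandas as pd

# same data as in the example above
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})

# outer np.where handles "A wins"; inner np.where separates "B wins" from a tie
df['winner'] = np.where(df['A_points'] > df['B_points'], 'A',
                        np.where(df['A_points'] < df['B_points'], 'B', 'Tie'))
```

Nested np.where calls become hard to read as the number of conditions grows, which is where np.select tends to be the clearer choice.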

Notes

Here are a few things to keep in mind when comparing two columns in a pandas DataFrame:

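  • The number of conditions and choices must be equal; otherwise np.select raises a ValueError.
  • If more than one condition is met for a row, np.select uses the choice for the first condition that is met.
  • The default value specifies the value to display in the new column if none of the conditions are met. If it is omitted, np.select uses 0.
  • Both NumPy and Pandas must be imported to make this code work.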
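The short sketch below illustrates two of these points with made-up values: how np.select behaves when conditions overlap, and what happens when the number of conditions and choices differs. The scores array and the labels are hypothetical:

```python
import numpy as np

# hypothetical values, used only for illustration
scores = np.array([1, 5, 10])

# the value 10 satisfies both conditions; the first matching condition is used
conditions = [scores > 3, scores > 8]
choices = ['medium', 'high']
labels = np.select(conditions, choices, default='low')

# passing fewer choices than conditions raises a ValueError
try:
    np.select(conditions, ['medium'], default='low')
    raised = False
except ValueError:
    raised = True
```

Because scores > 3 is listed first, the value 10 is labeled medium rather than high. Listing the most specific condition first avoids this.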
