How to compare columns in two different data frames in Pandas?

Pandas provides a built-in function, .merge(), which allows to easily compare columns of two different data frames. This function can be used to join, merge, and concatenate data frames based on one or more keys. The function takes in arguments that specify the left and right data frames, the column to be used for the comparison, and the how argument to specify the kind of join to be performed. The result is a new data frame containing only the rows that match the specified criteria.


You can use the following methods to compare columns in two different pandas DataFrames:

Method 1: Count Matching Values Between Columns

df1['my_column'].isin(df2['my_column']).value_counts()

Method 2: Display Matching Values Between Columns

pd.merge(df1, df2, on=['my_column'], how='inner')

The following examples show how to use each method with the following pandas DataFrames:

import numpy as np
import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team': ['Mavs', 'Rockets', 'Spurs', 'Heat', 'Nets'],
                    'points': [22, 30, 15, 17, 14]})

#view DataFrame
print(df1)

      team  points
0     Mavs      22
1  Rockets      30
2    Spurs      15
3     Heat      17
4     Nets      14

#create second DataFrame
df2 = pd.DataFrame({'team': ['Mavs', 'Thunder', 'Spurs', 'Nets', 'Cavs'],
                    'points': [25, 40, 31, 32, 22]})

#view DataFrame
print(df2)

      team  points
0     Mavs      25
1  Thunder      40
2    Spurs      31
3     Nets      32
4     Cavs      22

Example 1: Count Matching Values Between Columns

The following code shows how to count the number of matching values between the team columns in each DataFrame:

#count matching values in team columns
df1['team'].isin(df2['team']).value_counts()

True     3
False    2
Name: team, dtype: int64

We can see that the two DataFrames have 3 team names in common and 2 team names that are different.

Example 2: Display Matching Values Between Columns

The following code shows how to display the actual matching values between the team columns in each DataFrame:

#display matching values between team columns
pd.merge(df1, df2, on=['team'], how='inner')

	team	points_x  points_y
0	Mavs	22	  25
1	Spurs	15	  31
2	Nets	14	  32

From the output we can see that the two DataFrames have the following values in common in the team columns:

  • Mavs
  • Spurs
  • Nets

Related:

 

x