Table of Contents
Pandas provides a built-in function, .merge(), which allows to easily compare columns of two different data frames. This function can be used to join, merge, and concatenate data frames based on one or more keys. The function takes in arguments that specify the left and right data frames, the column to be used for the comparison, and the how argument to specify the kind of join to be performed. The result is a new data frame containing only the rows that match the specified criteria.
You can use the following methods to compare columns in two different pandas DataFrames:
Method 1: Count Matching Values Between Columns
df1['my_column'].isin(df2['my_column']).value_counts()
Method 2: Display Matching Values Between Columns
pd.merge(df1, df2, on=['my_column'], how='inner')
The following examples show how to use each method with the following pandas DataFrames:
import numpy as np import pandas as pd #create first DataFrame df1 = pd.DataFrame({'team': ['Mavs', 'Rockets', 'Spurs', 'Heat', 'Nets'], 'points': [22, 30, 15, 17, 14]}) #view DataFrame print(df1) team points 0 Mavs 22 1 Rockets 30 2 Spurs 15 3 Heat 17 4 Nets 14 #create second DataFrame df2 = pd.DataFrame({'team': ['Mavs', 'Thunder', 'Spurs', 'Nets', 'Cavs'], 'points': [25, 40, 31, 32, 22]}) #view DataFrame print(df2) team points 0 Mavs 25 1 Thunder 40 2 Spurs 31 3 Nets 32 4 Cavs 22
Example 1: Count Matching Values Between Columns
The following code shows how to count the number of matching values between the team columns in each DataFrame:
#count matching values in team columns
df1['team'].isin(df2['team']).value_counts()
True 3
False 2
Name: team, dtype: int64
We can see that the two DataFrames have 3 team names in common and 2 team names that are different.
Example 2: Display Matching Values Between Columns
The following code shows how to display the actual matching values between the team columns in each DataFrame:
#display matching values between team columns
pd.merge(df1, df2, on=['team'], how='inner')
team points_x points_y
0 Mavs 22 25
1 Spurs 15 31
2 Nets 14 32
From the output we can see that the two DataFrames have the following values in common in the team columns:
- Mavs
- Spurs
- Nets
Related: