How can I compare strings between two columns in Pandas?

Pandas offers various methods for comparing strings between two columns. For example, you can use the str.contains() method to check if a string is present in another string. You can also use the str.match() method to check for exact matches between strings in two columns. Other methods such as str.startswith() and str.endswith() can also be used to compare strings between two columns.


You can use the following basic syntax to compare strings between two columns in a pandas DataFrame:

df['col1'].str.strip().str.lower() == df['col2'].str.strip().str.lower()

The str.strip() function strips the whitespace from each string and the str.lower() function converts each string to lowercase before performing the comparison.

The following example shows how to use this syntax in practice.

Example: Compare Strings Between Two Columns in Pandas

Suppose we have the following pandas DataFrame that contains two columns with basketball team names:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team1': ['Mavs', 'Hawks', 'Nets', 'Hornets', 'Lakers'],
                   'team2': [' Mavs ', 'Jazz', 'Nets', 'Hornets ', 'LAKERS']})

#view DataFrame
print(df)

     team1     team2
0     Mavs     Mavs 
1    Hawks      Jazz
2     Nets      Nets
3  Hornets  Hornets 
4   Lakers    LAKERS

Notice that some team names contain whitespaces in random places and some team names are capitalized.

Suppose that we would like to compare the strings in each row to see if the team names are equal in each row.

If we only use the == symbol when comparing the strings, pandas will only return True if they are the same case and contain the whitespaces in the exact same positions:

#create new column that tests if strings in team columns are equal
df['equal'] = df['team1'] == df['team2']

#view updated DataFrame
print(df)

     team1     team2  equal
0     Mavs     Mavs   False
1    Hawks      Jazz  False
2     Nets      Nets   True
3  Hornets  Hornets   False
4   Lakers    LAKERS  False

Notice that only one row returns True, since this is the only row in which the strings contain the same case and the same whitespaces positions.

However, we can use the str.strip() function to strip the whitespace from each string and the str.lower() function to convert each string to lowercase before performing the comparison:

#remove whitespace and convert each string to lowercase, then compare strings
df['equal'] = df['team1'].str.strip().str.lower()==df['team2'].str.strip().str.lower()

#view updated DataFrame
print(df)

     team1     team2  equal
0     Mavs     Mavs    True
1    Hawks      Jazz  False
2     Nets      Nets   True
3  Hornets  Hornets    True
4   Lakers    LAKERS   True

Now each row returns True except for the row where the team names are “Hawks” and “Jazz” because even after stripping the whitespace and converting the strings to lowercase, these strings are not equal.

 

x