How do you perform an Anti-Join in Pandas?

An anti-join in Pandas is a method of joining two DataFrames to exclude information from one based on the information in the other. This is done by using the merge() function with the “indicator=True” argument and the “how=’left_anti’” option. This allows for the exclusion of the information from the right DataFrame based on the information found in the left DataFrame. The result is a DataFrame that contains only the information that isn’t present in the right DataFrame.


An anti-join allows you to return all rows in one dataset that do not have matching values in another dataset.

You can use the following syntax to perform an anti-join between two pandas DataFrames:

outer = df1.merge(df2, how='outer', indicator=True)

anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)

The following example shows how to use this syntax in practice.

Example: Perform an Anti-Join in Pandas

Suppose we have the following two pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})

print(df1)

  team  points
0    A      18
1    B      22
2    C      19
3    D      14
4    E      30

#create second DataFrame
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

print(df2)

  team  points
0    A      18
1    B      22
2    C      19
3    F      22
4    G      29

We can use the following code to return all rows in the first DataFrame that do not have a matching team in the second DataFrame:

#perform outer join
outer = df1.merge(df2, how='outer', indicator=True)

#perform anti-join
anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)

#view results
print(anti_join)

  team  points
3    D      14
4    E      30

We can see that there are exactly two teams from the first DataFrame that do not have a matching team name in the second DataFrame.

The anti-join worked as expected.

The end result is one DataFrame that only contains the rows where the team name belongs to the first DataFrame but not the second DataFrame.

x