An anti-join in pandas combines two DataFrames in a way that excludes rows from one of them based on the contents of the other. One way to do this is to call merge() with how='outer' and indicator=True, which adds a _merge column recording whether each row was found in the left DataFrame only, the right DataFrame only, or both. Keeping only the rows labelled 'left_only' drops every row of the left DataFrame that also appears in the right DataFrame, so the result contains only the rows that are not present in the right DataFrame.
An anti-join allows you to return all rows in one dataset that do not have matching values in another dataset.
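You can use the following syntax to perform an anti-join between two pandas DataFrames. Because merge() joins on every column the two DataFrames share by default, a row of df1 counts as matched only if df2 contains a row with the same values in all of those columns: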
outer = df1.merge(df2, how='outer', indicator=True)
anti_join = outer[outer['_merge'] == 'left_only'].drop(columns='_merge')
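If you need this in more than one place, the two lines above can be wrapped in a small function. The name `anti_join` below is a hypothetical helper, not part of the pandas API, and the DataFrames `a` and `b` are made-up data; like the syntax above, the helper matches rows on every column the two DataFrames share.

```python
import pandas as pd


def anti_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `left` that have no match in `right`.

    Hypothetical helper (not a pandas function). Rows are matched on all
    columns the two DataFrames have in common, as merge() does by default.
    """
    # indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'
    outer = left.merge(right, how='outer', indicator=True)
    # keep rows that exist only in `left`, then discard the indicator column
    return outer[outer['_merge'] == 'left_only'].drop(columns='_merge')


# usage with two small, made-up DataFrames
a = pd.DataFrame({'id': [1, 2, 3], 'val': ['x', 'y', 'z']})
b = pd.DataFrame({'id': [2], 'val': ['y']})
print(anti_join(a, b))  # rows with id 1 and 3
```

Note that the outer merge builds a new index, so the returned rows do not keep their original index labels from `left`.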
The following example shows how to use this syntax in practice.
Example: Perform an Anti-Join in Pandas
Suppose we have the following two pandas DataFrames:
import pandas as pd
#create first DataFrame
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})
print(df1)
team points
0 A 18
1 B 22
2 C 19
3 D 14
4 E 30
#create second DataFrame
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})
print(df2)
team points
0 A 18
1 B 22
2 C 19
3 F 22
4 G 29
We can use the following code to return all rows in the first DataFrame that do not have a matching team in the second DataFrame:
#perform outer join
outer = df1.merge(df2, how='outer', indicator=True)
#perform anti-join
anti_join = outer[outer['_merge'] == 'left_only'].drop(columns='_merge')
#view results
print(anti_join)
team points
3 D 14
4 E 30
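To see what the filter operates on, it can help to look at the intermediate outer DataFrame before any rows are dropped. The sketch below rebuilds the two example DataFrames and counts how many rows carry each _merge label:

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

# outer merge on the shared columns ('team' and 'points')
outer = df1.merge(df2, how='outer', indicator=True)

# every row is labelled 'both', 'left_only' or 'right_only'
print(outer)

# number of rows per label
counts = outer['_merge'].value_counts()
print(counts)
```

Teams A, B and C are labelled 'both', D and E 'left_only', and F and G 'right_only'; the anti-join simply keeps the 'left_only' rows.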
We can see that exactly two teams from the first DataFrame, D and E, do not have a matching team name in the second DataFrame.
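Because merge() matches on every shared column, a team that appears in both DataFrames with different points would still be returned by the merge-based anti-join. If the match should depend on the team column alone, one option (a different technique from the merge-based one, sketched here under the assumption that team is the only key) is a boolean mask built with Series.isin(). The DataFrame df2_changed, where team A has 25 points instead of 18, is made-up data for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})
# hypothetical variant of df2: team A now has 25 points instead of 18
df2_changed = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                            'points': [25, 22, 19, 22, 29]})

# merge-based anti-join: matches on both 'team' and 'points'
outer = df1.merge(df2_changed, how='outer', indicator=True)
by_all_columns = outer[outer['_merge'] == 'left_only'].drop(columns='_merge')

# isin-based anti-join: keeps rows of df1 whose team is absent from df2_changed
by_team = df1[~df1['team'].isin(df2_changed['team'])]

print(by_all_columns)  # includes team A, because its points differ
print(by_team)         # only teams D and E
```

The isin() version also keeps the original index labels of df1, since it filters df1 directly instead of building a merged DataFrame.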
The anti-join worked as expected.
The end result is a single DataFrame containing only the rows whose team appears in the first DataFrame but not in the second.
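The same outer merge also gives the opposite anti-join, the rows of the second DataFrame with no match in the first, by filtering on 'right_only' instead of 'left_only'. A short sketch using the example DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

outer = df1.merge(df2, how='outer', indicator=True)

# rows that appear only in df2 (the right DataFrame)
right_anti_join = outer[outer['_merge'] == 'right_only'].drop(columns='_merge')
print(right_anti_join)  # teams F and G
```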