How can an anti-join be performed in Pandas?

An anti-join returns the rows of one DataFrame that have no matching row in another DataFrame, where a match means equal values in the join column(s). In pandas it can be performed with the "merge" function: set the "how" parameter to "outer" and the "indicator" parameter to True, which adds a _merge column labeling each row 'left_only', 'right_only', or 'both'. Keeping only the rows labeled 'left_only' (and then dropping the _merge column) leaves the rows that appear in the first DataFrame but not in the second. This is useful for finding differences between datasets or identifying records that are unique to one of them.
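A minimal sketch of the indicator column in action, using two small hypothetical frames (left and right) that share a single column, id. Filtering on 'left_only' gives the anti-join, while keeping every row not marked 'both' gives the rows found in only one of the two frames.

```python
import pandas as pd

# Hypothetical example frames sharing a single column, 'id'
left = pd.DataFrame({'id': [1, 2, 3, 4]})
right = pd.DataFrame({'id': [3, 4, 5]})

# Outer merge on the shared column; indicator=True adds the '_merge' column
merged = left.merge(right, how='outer', indicator=True)

# Anti-join: rows of `left` with no match in `right`
left_anti = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Rows that appear in exactly one of the two frames (either direction)
either_only = merged[merged['_merge'] != 'both'].drop(columns='_merge')

print(left_anti['id'].tolist())    # [1, 2]
print(either_only['id'].tolist())  # [1, 2, 5]
```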

Perform an Anti-Join in Pandas


An anti-join allows you to return all rows in one dataset that do not have matching values in another dataset.
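When the match is decided by a single key column, a common alternative (a different technique from the merge approach used in this article) is a boolean mask built with Series.isin. The sketch below reuses the team/points data from the example further down and, as an assumption, treats only the team column as the key.

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

# True where df1's team does NOT appear anywhere in df2's team column
mask = ~df1['team'].isin(df2['team'])

# Keep only the unmatched rows; df1's original index is preserved
anti_join = df1[mask]
print(anti_join)
```

Unlike the merge approach, this keeps df1's own index and never adds columns, but it only compares one column at a time.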

You can use the following syntax to perform an anti-join between two pandas DataFrames (with no "on" argument, merge matches rows on every column the two DataFrames have in common):

outer = df1.merge(df2, how='outer', indicator=True)

anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)
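The same idea can be wrapped in a small reusable function. This is a sketch under a few assumptions: anti_join is a hypothetical helper (not part of pandas), the caller names the key column(s) explicitly with on, and how='left' is used instead of 'outer' because only the left frame's rows are needed.

```python
import pandas as pd

def anti_join(left, right, on):
    """Return rows of `left` whose key(s) in `on` have no match in `right`.

    Hypothetical helper for illustration; not a pandas API.
    """
    keys = [on] if isinstance(on, str) else list(on)
    # Keep only right's key columns and de-duplicate them, so the merge
    # cannot add right-hand columns or multiply rows of `left`
    right_keys = right[keys].drop_duplicates()
    merged = left.merge(right_keys, on=keys, how='left', indicator=True)
    # 'left_only' marks left rows that found no partner in right_keys
    return merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

result = anti_join(df1, df2, on='team')
print(result)
```

Because a left merge never produces right-only rows, the points column keeps its integer dtype instead of picking up NaN values.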

The following example shows how to use this syntax in practice.

Example: Perform an Anti-Join in Pandas

Suppose we have the following two pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})

print(df1)

  team  points
0    A      18
1    B      22
2    C      19
3    D      14
4    E      30

#create second DataFrame
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

print(df2)

  team  points
0    A      18
1    B      22
2    C      19
3    F      22
4    G      29

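We can use the following code to return all rows in the first DataFrame that do not have a matching row in the second DataFrame. Because no "on" argument is given, merge matches on both shared columns, team and points; in this data every team that appears in both DataFrames has the same points value, so the result is the same as matching on team alone: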

#perform outer join
outer = df1.merge(df2, how='outer', indicator=True)

#perform anti-join
anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)

#view results
print(anti_join)

  team  points
3    D      14
4    E      30
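To see why the filter on 'left_only' is needed, it can help to inspect the intermediate outer DataFrame before filtering. The sketch below rebuilds the same two DataFrames and prints the merged result, including the _merge column.

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

# Outer merge keeps rows from both sides; _merge records where each came from
outer = df1.merge(df2, how='outer', indicator=True)
print(outer)
```

Teams A, B and C are labeled 'both', D and E are 'left_only', and F and G (present only in df2) are 'right_only'. The anti-join keeps just the 'left_only' rows.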

Exactly two teams from the first DataFrame, D and E, have no matching team in the second DataFrame, so the anti-join worked as expected: the result contains only the rows whose team name appears in the first DataFrame but not in the second.
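Because merge defaults to matching on every shared column, a team whose points differ between the two DataFrames is treated as unmatched. The sketch below uses a hypothetical modified copy of the second DataFrame (df2_changed, where team A has 99 points) to show the difference, and then restricts the key to team with on='team'.

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})
# Hypothetical variant of df2: team A has a different points value
df2_changed = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                            'points': [99, 22, 19, 22, 29]})

# Default: match on ALL shared columns (team and points), so A no longer matches
outer_all = df1.merge(df2_changed, how='outer', indicator=True)
by_all = outer_all[outer_all['_merge'] == 'left_only']
print(by_all['team'].tolist())   # ['A', 'D', 'E']

# Restrict the key to 'team' so only the team name decides the match
outer_team = df1.merge(df2_changed[['team']], on='team', how='outer',
                       indicator=True)
by_team = outer_team[outer_team['_merge'] == 'left_only']
print(by_team['team'].tolist())  # ['D', 'E']
```

Selecting only df2_changed[['team']] before merging also avoids the points_x and points_y suffix columns that merge would otherwise create for the overlapping points column.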

Cite this article

stats writer (2024). How can an anti-join be performed in Pandas?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-an-anti-join-be-performed-in-pandas/
