How to Find Duplicates in Pandas DataFrame (With Examples)

To find duplicates in a pandas DataFrame, you can use the duplicated() method. This method returns a Boolean Series indicating whether each row is a duplicate of another row. You can pass the subset argument to check only specific columns for duplicates, and the keep argument to control which occurrence, if any, is left unmarked: 'first' (the default), 'last', or False. You can also use the drop_duplicates() method to drop the duplicate rows altogether.

You can use the duplicated() function to find duplicate values in a pandas DataFrame.

This function uses the following basic syntax:

#find duplicate rows across all columns
duplicateRows = df[df.duplicated()]

#find duplicate rows across specific columns
duplicateRows = df[df.duplicated(['col1', 'col2'])]

The following examples show how to use this function in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [10, 10, 12, 12, 15, 17, 20, 20],
                   'assists': [5, 5, 7, 9, 12, 9, 6, 6]})

#view DataFrame
print(df)

  team  points  assists
0    A      10        5
1    A      10        5
2    A      12        7
3    A      12        9
4    B      15       12
5    B      17        9
6    B      20        6
7    B      20        6

Example 1: Find Duplicate Rows Across All Columns

The following code shows how to find duplicate rows across all of the columns of the DataFrame:

#identify duplicate rows
duplicateRows = df[df.duplicated()]

#view duplicate rows
print(duplicateRows)

  team  points  assists
1    A      10        5
7    B      20        6

There are two rows that are exact duplicates of other rows in the DataFrame.
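
The filter above works because duplicated() returns a Boolean mask with one entry per row. The following is a minimal sketch that prints this mask for the same DataFrame and counts the flagged rows; the names mask and num_duplicates are illustrative and do not come from the original example.

```python
import pandas as pd

# Same DataFrame as in the examples above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [10, 10, 12, 12, 15, 17, 20, 20],
                   'assists': [5, 5, 7, 9, 12, 9, 6, 6]})

# Boolean Series: True where a row repeats an earlier row (default keep='first')
mask = df.duplicated()
print(mask.tolist())
# [False, True, False, False, False, False, False, True]

# True counts as 1, so summing the mask gives the number of flagged rows
num_duplicates = int(mask.sum())
print(num_duplicates)
# 2
```

Passing this mask to df[...] keeps only the rows where the mask is True, which is why rows 1 and 7 are returned.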

Note that we can also use the argument keep='last' to display the first occurrence of each duplicated row instead of the last:

#identify duplicate rows
duplicateRows = df[df.duplicated(keep='last')]

#view duplicate rows
print(duplicateRows)

  team  points  assists
0    A      10        5
6    B      20        6
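
The keep argument also accepts False, which marks every occurrence of a duplicated row rather than leaving one unmarked. The sketch below, using the same DataFrame, shows this option; the variable name all_copies is illustrative.

```python
import pandas as pd

# Same DataFrame as in the examples above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [10, 10, 12, 12, 15, 17, 20, 20],
                   'assists': [5, 5, 7, 9, 12, 9, 6, 6]})

# keep=False flags all copies of a repeated row, including the first one
all_copies = df[df.duplicated(keep=False)]
print(all_copies.index.tolist())
# [0, 1, 6, 7]
```

This can be useful when you want to inspect every copy side by side before deciding which one to keep.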

Example 2: Find Duplicate Rows Across Specific Columns

The following code shows how to find duplicate rows across just the ‘team’ and ‘points’ columns of the DataFrame:

#identify duplicate rows across 'team' and 'points' columns
duplicateRows = df[df.duplicated(['team', 'points'])]

#view duplicate rows
print(duplicateRows)

  team  points  assists
1    A      10        5
3    A      12        9
7    B      20        6

There are three rows where the values for the ‘team’ and ‘points’ columns are exact duplicates of previous rows.
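
To see how many times each 'team' and 'points' combination occurs, one possible complement to duplicated() is to group by those columns and count the rows in each group. This is a sketch of an alternative approach and not part of the original example; the names counts and repeated are illustrative.

```python
import pandas as pd

# Same DataFrame as in the examples above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [10, 10, 12, 12, 15, 17, 20, 20],
                   'assists': [5, 5, 7, 9, 12, 9, 6, 6]})

# Number of rows for each (team, points) combination
counts = df.groupby(['team', 'points']).size()

# Keep only the combinations that appear more than once
repeated = counts[counts > 1]
print(repeated.to_dict())
# {('A', 10): 2, ('A', 12): 2, ('B', 20): 2}
```

Each of the three repeated combinations appears twice, which matches the three rows flagged by duplicated(['team', 'points']).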

Example 3: Find Duplicate Rows in One Column

The following code shows how to find duplicate rows in just the ‘team’ column of the DataFrame:

#identify duplicate rows in 'team' column
duplicateRows = df[df.duplicated(['team'])]

#view duplicate rows
print(duplicateRows)

  team  points  assists
1    A      10        5
2    A      12        7
3    A      12        9
5    B      17        9
6    B      20        6
7    B      20        6

There are six total rows where the values in the ‘team’ column are exact duplicates of previous rows.
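
The introduction mentions that drop_duplicates() removes duplicate rows instead of just flagging them. The following is a minimal sketch using the same DataFrame; the names deduped and deduped_subset are illustrative.

```python
import pandas as pd

# Same DataFrame as in the examples above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [10, 10, 12, 12, 15, 17, 20, 20],
                   'assists': [5, 5, 7, 9, 12, 9, 6, 6]})

# Drop rows that repeat an earlier row across all columns
deduped = df.drop_duplicates()
print(deduped.index.tolist())
# [0, 2, 3, 4, 5, 6]

# Drop rows that repeat an earlier row in the 'team' and 'points' columns only
deduped_subset = df.drop_duplicates(subset=['team', 'points'])
print(deduped_subset.index.tolist())
# [0, 2, 4, 5, 6]
```

Note that drop_duplicates() returns a new DataFrame and keeps the original index labels; the original df is left unchanged.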
