Counting duplicates in pandas means identifying and quantifying the repeated values or rows in a dataset. Two methods are commonly used: duplicated(), which flags each repeat of an earlier value as True and every first occurrence as False, and value_counts(), which returns how many times each distinct value occurs in a specified column. Both offer a convenient and efficient way to inspect and handle duplicates in a pandas DataFrame.
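As a minimal sketch of the two methods named above, the following snippet applies duplicated() and value_counts() to a small Series. The sample values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# hypothetical sample data, used only for illustration
s = pd.Series(['a', 'b', 'a', 'c', 'a'])

# duplicated() marks every repeat of an earlier value as True
dup_flags = s.duplicated()

# summing the Boolean flags counts the repeats
n_repeats = int(dup_flags.sum())

# value_counts() returns the number of occurrences of each distinct value
counts = s.value_counts()

print(dup_flags.tolist())  # [False, False, True, False, True]
print(n_repeats)           # 2
print(counts.to_dict())
```

Here 'a' occurs three times, so two of its occurrences are flagged as repeats of the first one.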
Count Duplicates in Pandas (With Examples)
You can use the following methods to count duplicates in a pandas DataFrame:
Method 1: Count Duplicate Values in One Column
len(df['my_column'])-len(df['my_column'].drop_duplicates())
Method 2: Count Duplicate Rows
len(df)-len(df.drop_duplicates())
Method 3: Count Duplicates for Each Unique Row
df.groupby(df.columns.tolist(), as_index=False).size()
The following examples show how to use each method in practice with the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 5, 8, 10, 5, 7, 10, 10]})

#view DataFrame
print(df)

  team position  points
0    A        G       5
1    A        G       5
2    A        G       8
3    A        F      10
4    B        G       5
5    B        G       7
6    B        F      10
7    B        F      10
Example 1: Count Duplicate Values in One Column
The following code shows how to count the number of duplicate values in the points column:
#count duplicate values in points column
len(df['points'])-len(df['points'].drop_duplicates())
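4
We can see that there are 4 duplicate values in the points column: the column holds 8 values but only 4 distinct ones (5, 8, 10, and 7), so 4 values repeat an earlier one.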
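The same count can also be obtained with duplicated(), which makes it explicit what is being counted. The sketch below is an alternative to the drop_duplicates() approach, not the method used above. It rebuilds the points column as a standalone Series and contrasts the default keep='first' behavior with keep=False.

```python
import pandas as pd

# same points values as the example DataFrame
points = pd.Series([5, 5, 8, 10, 5, 7, 10, 10], name='points')

# default keep='first': flags only the second and later occurrences
extra_copies = int(points.duplicated().sum())

# keep=False: flags every value that occurs more than once
all_repeated = int(points.duplicated(keep=False).sum())

print(extra_copies)  # 4
print(all_repeated)  # 6
```

The first figure matches the result above. The second is larger because it also counts the first occurrence of each repeated value (three 5s and three 10s).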
Example 2: Count Duplicate Rows
The following code shows how to count the number of duplicate rows in the DataFrame:
#count number of duplicate rows
len(df)-len(df.drop_duplicates())
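2
We can see that there are 2 duplicate rows in the DataFrame.
By default, duplicated() treats the first occurrence of a row as the original and flags only later copies, so the following code displays just the repeated rows:
#display duplicated rows
df[df.duplicated()]
  team position  points
1    A        G       5
7    B        F      10
Example 3: Count Duplicates for Each Unique Row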
The following code shows how to count the number of duplicates for each unique row in the DataFrame:
#display number of duplicates for each unique row
df.groupby(df.columns.tolist(), as_index=False).size()
  team position  points  size
0    A        F      10     1
1    A        G       5     2
2    A        G       8     1
3    B        F      10     2
4    B        G       5     1
5    B        G       7     1
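The size column displays the number of times each unique row occurs in the DataFrame. A value of 1 means the row has no duplicates, while a value of 2 means the row appears twice.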
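To keep only the rows that actually have duplicates, the grouped result can be filtered on the size column. This is a minimal sketch that builds on the example DataFrame. The names row_counts and repeated are hypothetical and introduced only for illustration.

```python
import pandas as pd

# same DataFrame as in the examples
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 5, 8, 10, 5, 7, 10, 10]})

# count how many times each unique row occurs
row_counts = df.groupby(df.columns.tolist(), as_index=False).size()

# keep only the rows that occur more than once
repeated = row_counts[row_counts['size'] > 1]

print(repeated)
```

The column is selected with row_counts['size'] rather than row_counts.size, because .size on a DataFrame is an attribute that returns the total number of elements.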
Cite this article
stats writer (2024, June 27). How can I count duplicates in Pandas? PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-count-duplicates-in-pandas/
