How can I select only unique rows in a Pandas DataFrame?

pandas is a popular Python library for data manipulation, and its DataFrame object stores and manipulates tabular data efficiently. One common problem when working with data is the presence of duplicate rows. To address this, pandas provides the drop_duplicates() method, which returns a DataFrame with duplicate rows removed so that only unique rows remain. Removing duplicates keeps repeated entries from being counted more than once in an analysis, which makes drop_duplicates() a useful tool for data cleaning.

Select Unique Rows in a Pandas DataFrame


You can use the following syntax to select unique rows in a pandas DataFrame:

df = df.drop_duplicates()

And you can use the following syntax to select unique rows across specific columns in a pandas DataFrame:

df = df.drop_duplicates(subset=['col1', 'col2', ...])

The following examples show how to use this syntax in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

#view DataFrame
df

	a	b	c
0	4	2	2
1	4	2	2
2	3	6	9
3	8	8	9

Example 1: Select Unique Rows Across All Columns

The following code shows how to select unique rows across all columns of the pandas DataFrame:

#drop duplicates from DataFrame
df = df.drop_duplicates()

#view DataFrame
df

	a	b	c
0	4	2	2
2	3	6	9
3	8	8	9

The first and second rows (index 0 and 1) were identical, so pandas kept the first and dropped the second.
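As a quick check on the result above, the following minimal sketch confirms that drop_duplicates() returns a new DataFrame and leaves the original untouched unless you reassign it. The variable names unique_rows and dup_mask are illustrative and not part of the original example; duplicated() is the related pandas method that flags repeated rows.

```python
import pandas as pd

#re-create the example DataFrame
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

#drop_duplicates() returns a new DataFrame; df itself is unchanged
unique_rows = df.drop_duplicates()

#boolean mask that is True for rows repeating an earlier row
dup_mask = df.duplicated()

print(len(df), len(unique_rows))  #4 3
print(dup_mask.tolist())          #[False, True, False, False]
```

Storing the result under a new name keeps the original df available for the later examples.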

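By default, drop_duplicates() keeps the first occurrence of each duplicated row. You can keep the last occurrence instead by passing keep='last'. The output below comes from applying this to the original four-row DataFrame, not to the already de-duplicated result from the previous step:

#drop duplicates from the original DataFrame, keep last duplicate
df = df.drop_duplicates(keep='last')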

#view DataFrame
df

	a	b	c
1	4	2	2
2	3	6	9
3	8	8	9
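Besides 'first' and 'last', the keep argument also accepts False, which drops every row that has a duplicate rather than keeping one copy. The sketch below assumes the same example DataFrame; the name strict_unique is illustrative only.

```python
import pandas as pd

#re-create the example DataFrame
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

#keep=False removes all copies of any duplicated row
strict_unique = df.drop_duplicates(keep=False)

print(strict_unique.index.tolist())  #[2, 3]
```

Rows 0 and 1 are both removed because each duplicates the other, so only rows 2 and 3 remain.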

Example 2: Select Unique Rows Across Specific Columns

The following code shows how to select unique rows across just column 'c', starting again from the original four-row DataFrame:

#drop rows with duplicate values in column 'c'
df = df.drop_duplicates(subset=['c'])

#view DataFrame
df

	a	b	c
0	4	2	2
2	3	6	9

Two rows were dropped from the DataFrame: rows 1 and 3, whose values in column 'c' (2 and 9) had already appeared in rows 0 and 2.
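The subset argument also accepts several columns, in which case a row counts as a duplicate only if it matches an earlier row in all of them. The sketch below shows this on the same example DataFrame, together with ignore_index=True (available in pandas 1.0 and later), which renumbers the result from 0. The names by_b_c and by_c_reset are illustrative only.

```python
import pandas as pd

#re-create the example DataFrame
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

#rows are duplicates only if both 'b' and 'c' match an earlier row
by_b_c = df.drop_duplicates(subset=['b', 'c'])
print(by_b_c.index.tolist())      #[0, 2, 3]

#same as Example 2, but with the index renumbered from 0
by_c_reset = df.drop_duplicates(subset=['c'], ignore_index=True)
print(by_c_reset.index.tolist())  #[0, 1]
```

With both 'b' and 'c' in the subset, row 3 is kept because its 'b' value (8) differs from row 2's, even though they share the same 'c' value.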

Cite this article

stats writer (2024). How can I select only unique rows in a Pandas DataFrame?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-select-only-unique-rows-in-a-pandas-dataframe/

