How do I find unique values in Pandas?

In pandas, you can use the unique() method of a Series (or the top-level pd.unique() function) to find the unique values in a column. A DataFrame has no unique() method of its own, so select a column first. For standard NumPy-backed columns the result is a NumPy array, with values listed in order of first appearance. To count the unique values in a column, use nunique(). You can also use drop_duplicates() to remove duplicate rows from a DataFrame, or call it on a single column to keep only the first occurrence of each value.
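As a quick illustration of the two approaches described above, here is a minimal sketch. The DataFrame df_demo and its column names are made up for this example only.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df_demo = pd.DataFrame({'city': ['Oslo', 'Rome', 'Oslo', 'Rome'],
                        'visits': [3, 1, 3, 2]})

# unique() is called on a single column (a Series), not on the DataFrame
unique_visits = df_demo['visits'].unique()

# drop_duplicates() removes repeated rows and keeps the first occurrence
unique_rows = df_demo.drop_duplicates()

print(unique_visits)
print(unique_rows)
```

Here unique_visits holds 3, 1 and 2 in order of first appearance, and unique_rows keeps three of the four rows because the third row repeats the first.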


You can define the following custom function to find unique values in pandas and ignore NaN values:

def unique_no_nan(x):
    return x.dropna().unique()

This function returns a NumPy array that contains each unique value except for NaN values.

The following examples show how to use this function in different scenarios with the following pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'team': ['Mavs', 'Mavs', 'Mavs', 'Celtics', 'Celtics', 'Celtics'],
                   'points': [95, 95, 100, 113, 100, np.nan]})

#view DataFrame
print(df)

      team  points
0     Mavs    95.0
1     Mavs    95.0
2     Mavs   100.0
3  Celtics   113.0
4  Celtics   100.0
5  Celtics     NaN

Example 1: Find Unique Values in Pandas Column and Ignore NaN Values

Suppose we use the pandas unique() function to display all of the unique values in the points column of the DataFrame:

#display unique values in 'points' column
df['points'].unique()

array([ 95., 100., 113.,  nan])

Notice that the unique() function includes nan in the results by default.

However, suppose we instead use our custom function unique_no_nan() to display the unique values in the points column:

#display unique values in 'points' column and ignore NaN
unique_no_nan(df['points'])

array([ 95., 100., 113.])

Our function returns each unique value in the points column, not including NaN.
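If you only need the number of unique values rather than the values themselves, the nunique() method may be enough. The sketch below rebuilds the points column as a standalone Series so that it can run on its own; the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same values as the 'points' column in the DataFrame above
points = pd.Series([95, 95, 100, 113, 100, np.nan])

# nunique() ignores NaN by default (dropna=True)
n_without_nan = points.nunique()

# Pass dropna=False to count NaN as its own value
n_with_nan = points.nunique(dropna=False)

print(n_without_nan, n_with_nan)
```

This prints 3 and 4: three distinct point totals, plus one more when NaN is counted.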

Example 2: Find Unique Values in Pandas Groupby and Ignore NaN Values

Suppose we use the pandas groupby() and agg() functions to display all of the unique values in the points column, grouped by the team column:

#display unique values in 'points' column grouped by team
df.groupby('team')['points'].agg(['unique'])

	unique
team	
Celtics	[113.0, 100.0, nan]
Mavs	[95.0, 100.0]

Notice that the array for the Celtics includes nan, because unique() is applied to each group without removing missing values.

However, suppose we instead use our custom function unique_no_nan() to display the unique values in the points column, grouped by the team column:

#display unique values in 'points' column grouped by team and ignore NaN
df.groupby('team')['points'].apply(unique_no_nan)

team
Celtics    [113.0, 100.0]
Mavs        [95.0, 100.0]
Name: points, dtype: object

Our function returns each unique value in the points column for each team, not including NaN values.
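Each entry in the grouped result is a NumPy array. If plain Python lists are easier to work with downstream, one possible variation is to call tolist() inside the grouped function. The helper name unique_no_nan_list is hypothetical, and the DataFrame is rebuilt here so that the sketch runs on its own.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the examples above
df = pd.DataFrame({'team': ['Mavs', 'Mavs', 'Mavs', 'Celtics', 'Celtics', 'Celtics'],
                   'points': [95, 95, 100, 113, 100, np.nan]})

def unique_no_nan_list(x):
    # Hypothetical variant of unique_no_nan() that returns a plain list
    return x.dropna().unique().tolist()

# One list of unique, non-NaN values per team
unique_by_team = df.groupby('team')['points'].apply(unique_no_nan_list)

print(unique_by_team)
```

The values match the grouped output above, but each team now maps to a list of Python floats instead of a NumPy array.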
