How can the “NOT IN” filter be used in Pandas, and what are some examples of its implementation?

Pandas has no dedicated “NOT IN” operator. Instead, a “NOT IN” filter is built by combining the isin() method with the ~ (negation) operator: isin() flags the rows whose values match a list, and ~ inverts that result. The “NOT IN” filter therefore returns the rows that do not match the specified values, while the plain “IN” filter (isin() on its own) returns the rows that do.

For example, if we have a dataset of student grades and we want to retrieve all the grades except for A and B, we can use the “NOT IN” filter to exclude those grades. This would be done by specifying the values “A” and “B” in the filter, and the resulting dataset would only contain grades other than A and B.
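The grades scenario above can be sketched as follows. The student names, the grade values, and the variable names (grades, excluded, other_grades) are invented for illustration and are not part of the original dataset.

```python
import pandas as pd

# Hypothetical student data, invented for illustration
grades = pd.DataFrame({'student': ['Ann', 'Ben', 'Cal', 'Dee', 'Eli'],
                       'grade': ['A', 'C', 'B', 'D', 'C']})

# Grades we want to exclude
excluded = ['A', 'B']

# Keep only the rows whose grade is NOT in the excluded list
other_grades = grades[~grades['grade'].isin(excluded)]
print(other_grades)
```

The result keeps the original index labels of the surviving rows, so the rows for Ben, Dee and Eli retain labels 1, 3 and 4.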

Another example is using the “NOT IN” filter to filter out data from multiple categories. For instance, if we have a dataset of sales by product category and we want to retrieve all the sales except for those from the categories “Electronics” and “Toys”, we can use the “NOT IN” filter to exclude those categories and get the desired result.
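A minimal sketch of the sales scenario, assuming a DataFrame with a 'category' column and an 'amount' column. The column names and figures are made up for this example.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration
sales = pd.DataFrame({'category': ['Electronics', 'Toys', 'Books', 'Garden', 'Toys'],
                      'amount': [250.0, 40.0, 15.5, 80.0, 22.0]})

# Categories we want to leave out
excluded_categories = ['Electronics', 'Toys']

# Keep only the sales whose category is NOT in the excluded list
remaining = sales[~sales['category'].isin(excluded_categories)]
print(remaining)
```

Only the 'Books' and 'Garden' rows remain. If a fresh 0-based index is preferred afterwards, remaining.reset_index(drop=True) is one option.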

In summary, the “NOT IN” filter is a useful tool in Pandas for creating more specific and refined filtering conditions. It allows for the exclusion of certain values or categories from a dataset, providing more flexibility in data analysis and manipulation.

How to Use a “NOT IN” Filter in Pandas (With Examples)


You can use the following syntax to perform a “NOT IN” filter in a pandas DataFrame:

df[~df['col_name'].isin(values_list)]

Note that the values in values_list can be either numeric values or string values.
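To see why this syntax works, it may help to look at the intermediate boolean masks. The sketch below uses small throwaway Series (the names s, nums, in_mask and not_in_mask are illustrative only) to show isin() producing a mask and ~ inverting it, for both string and numeric values.

```python
import pandas as pd

# String values
s = pd.Series(['A', 'B', 'C', 'A'])
in_mask = s.isin(['A', 'B'])   # True where the value is in the list
not_in_mask = ~in_mask         # ~ flips every True/False

print(in_mask.tolist())        # [True, True, False, True]
print(not_in_mask.tolist())    # [False, False, True, False]
print(s[not_in_mask].tolist()) # ['C']

# Numeric values work the same way
nums = pd.Series([1, 2, 3])
print(nums[~nums.isin([2])].tolist())  # [1, 3]
```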

The following examples show how to use this syntax in practice.

Example 1: Perform “NOT IN” Filter with One Column

The following code shows how to filter a pandas DataFrame for rows where a team name is not in a list of names:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#define list of teams we don't want
values_list = ['A', 'B']
#filter for rows where team name is not in list
df[~df['team'].isin(values_list)]

  team  points  assists  rebounds
6    C      25        9         9
7    C      29        4        12

And the following code shows how to filter a pandas DataFrame for rows where the ‘points’ column does not contain certain values:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#define list of values we don't want
values_list = [12, 15, 25]

#filter for rows where points value is not in list
df[~df['points'].isin(values_list)]

  team  points  assists  rebounds
3    B      14        9         6
4    B      19       12         6
5    B      23        9         5
7    C      29        4        12
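As an alternative to the ~ and isin() combination, DataFrame.query() accepts a not in expression. The sketch below reuses the DataFrame from this example and checks that both approaches select the same rows. The names via_isin and via_query are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Approach used in this article: negate isin()
via_isin = df[~df['team'].isin(['A', 'B'])]

# Alternative: express the same filter as a query string
via_query = df.query("team not in ['A', 'B']")

print(via_query)
print(via_query.equals(via_isin))  # True
```

Which form to use is largely a matter of taste. The query string reads closer to SQL, while the isin() form keeps everything as ordinary Python expressions.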

Example 2: Perform “NOT IN” Filter with Multiple Columns

The following code shows how to filter a pandas DataFrame for rows where certain team names do not appear in any of several columns:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'star_team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'backup_team': ['B', 'B', 'C', 'C', 'D', 'D', 'D', 'E'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#define list of teams we don't want
values_list = ['C', 'E']

#filter for rows where none of the listed teams appear in any of the columns
df[~df[['star_team', 'backup_team']].isin(values_list).any(axis=1)]

  star_team backup_team  points  assists  rebounds
0         A           B      25        5        11
1         A           B      12        7         8
4         B           D      19       12         6
5         B           D      23        9         5

Notice that we filtered out every row where teams ‘C’ or ‘E’ appeared in either the ‘star_team’ column or the ‘backup_team’ column.
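The multi-column filter can be easier to follow when broken into steps. The sketch below uses a shortened three-row version of the DataFrame (an assumption made to keep the output small) and the illustrative names cell_mask, row_has_match, filtered and filtered_all. It also contrasts .any(axis=1) with .all(axis=1), which drops a row only when every checked column matches.

```python
import pandas as pd

# Shortened, hypothetical version of the DataFrame above
df = pd.DataFrame({'star_team': ['A', 'B', 'C'],
                   'backup_team': ['B', 'C', 'E'],
                   'points': [25, 15, 29]})

values_list = ['C', 'E']

# Step 1: True for every cell whose value is in the list
cell_mask = df[['star_team', 'backup_team']].isin(values_list)

# Step 2: True for every row with at least one matching cell
row_has_match = cell_mask.any(axis=1)

# Step 3: keep the rows with no match at all
filtered = df[~row_has_match]
print(filtered.index.tolist())      # [0]

# Contrast: with all(axis=1), a row is dropped only if BOTH columns match
filtered_all = df[~cell_mask.all(axis=1)]
print(filtered_all.index.tolist())  # [0, 1]
```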

The following tutorials explain how to perform other common filtering operations in pandas:

How to Use “Is Not Null” in Pandas

Cite this article

stats writer (2024). How can the “NOT IN” filter be used in Pandas, and what are some examples of its implementation?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-the-not-in-filter-be-used-in-pandas-and-what-are-some-examples-of-its-implementation/

stats writer. "How can the “NOT IN” filter be used in Pandas, and what are some examples of its implementation?." PSYCHOLOGICAL SCALES, 12 May. 2024, https://scales.arabpsychology.com/stats/how-can-the-not-in-filter-be-used-in-pandas-and-what-are-some-examples-of-its-implementation/.

