How can I filter a pandas dataframe using a boolean column?

In order to filter a pandas dataframe using a boolean column, you can use the .loc method to select the rows and columns of the dataframe that correspond to the values of the boolean column. This will allow you to select only the values that meet the criteria of the boolean column and create a new dataframe with only those values remaining.


You can use the following methods to filter the rows of a pandas DataFrame based on the values in Boolean columns:

Method 1: Filter DataFrame Based on One Boolean Column

#filter for rows where value in 'my_column' is True
df.loc[df.my_column]

Method 2: Filter DataFrame Based on Multiple Boolean Columns

#filter for rows where value in 'column1' or 'column2' is True
df.loc[df.column1 | df.column2]

The following examples show how to use each method in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                   'points': [18,20, 25, 40, 34, 32, 19],
                   'all_star': [True, False, True, True, True, False, False],
                   'starter': [False, True, True, True, False, False, False]})

#view DataFrame
print(df)

  team  points  all_star  starter
0    A      18      True    False
1    B      20     False     True
2    C      25      True     True
3    D      40      True     True
4    E      34      True    False
5    F      32     False    False
6    G      19     False    False

Example 1: Filter DataFrame Based on One Boolean Column

We can use the following syntax to filter the pandas DataFrame to only contain rows where the value in the all_star column is True:

#filter for rows where 'all_star' is True
df.loc[df.all_star]

	team	points	all_star  starter
0	A	18	True	  False
2	C	25	True	  True
3	D	40	True	  True
4	E	34	True	  False

Notice that the DataFrame has been filtered to only contain rows where the value in the all_star column is True.

If you would instead like to filter for rows where all_star is False, simply type a tilde (~) in front of the column name:

#filter for rows where 'all_star' is False
df.loc[~df.all_star]

        team	points	all_star  starter
1	B	20	False	  True
5	F	32	False	  False
6	G	19	False	  False

Now the DataFrame has been filtered to only contain rows where the value in the all_star column is False.

Example 2: Filter DataFrame Based on Multiple Boolean Columns

We can use the following syntax to filter the pandas DataFrame to only contain rows where the value in the all_star column or the starter column is True:

#filter for rows where 'all_star' or 'starter' is True
df.loc[df.all_star | df.starter]

        team	points	all_star  starter
0	A	18	True	  False
1	B	20	False	  True
2	C	25	True	  True
3	D	40	True	  True
4	E	34	True	  False

If you would like to filter for rows where the value in both the all_star and starter column is True, you can use the & operator instead of the | operator:

#filter for rows where 'all_star' and 'starter' is True
df.loc[df.all_star & df.starter]

	team	points	all_star	starter
2	C	25	True	True
3	D	40	True	True

Now that the DataFrame has been filtered to only contain rows where the value in the all_star and starter column is True.

Related:

x