Table of Contents
To search for a string in all columns of a Pandas DataFrame, one can use the DataFrame.eq() function to check if the string is present in each column’s values. This function will return a boolean DataFrame with True values where the string is found in the column. The boolean DataFrame can be used to subset the original DataFrame to the desired rows.
You can use the following syntax to search for a particular string in each column of a pandas DataFrame and filter for rows that contain the string in at least one column:
#define filter mask = np.column_stack([df[col].str.contains(r"my_string", na=False) for col in df]) #filter for rows where any column contains 'my_string' df.loc[mask.any(axis=1)]
The following example shows how to use this syntax in practice.
Example: Search for String in All Columns of Pandas DataFrame
Suppose we have the following pandas DataFrame that contains information about the first role and second role of various basketball players on a team:
import pandas as pd #create DataFrame df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'], 'first_role': ['P Guard', 'P Guard', 'S Guard', 'S Forward', 'P Forward', 'Center', 'Center', 'Center'], 'second_role': ['S Guard', 'S Guard', 'Forward', 'S Guard', 'S Guard', 'S Forward', 'P Forward', 'P Forward']}) #view DataFrame print(df) player first_role second_role 0 A P Guard S Guard 1 B P Guard S Guard 2 C S Guard Forward 3 D S Forward S Guard 4 E P Forward S Guard 5 F Center S Forward 6 G Center P Forward 7 H Center P Forward
The following code shows how to filter the pandas DataFrame for rows where the string “Guard” occurs in any column:
import numpy as np
#define filter
mask = np.column_stack([df[col].str.contains(r"Guard", na=False) for col in df])
#filter for rows where any column contains 'Guard'
df.loc[mask.any(axis=1)]
player first_role second_role
0 A P Guard S Guard
1 B P Guard S Guard
2 C S Guard Forward
3 D S Forward S Guard
4 E P Forward S Guard
Notice that each row in the resulting DataFrame contains the string “Guard” in at least one column.
You could also filter for rows where one of several strings occurs in at least one column by using the “OR” ( | ) operator in pandas.
For example, the following code shows how to filter for rows where either “P Guard” or “Center” occurs in at least one column:
import numpy as np
#define filter
mask = np.column_stack([df[col].str.contains(r"P Guard|Center", na=False) for col in df])
#filter for rows where any column contains 'P Guard' or 'Center'
df.loc[mask.any(axis=1)]
player first_role second_role
0 A P Guard S Guard
1 B P Guard S Guard
5 F Center S Forward
6 G Center P Forward
7 H Center P Forward
Notice that each row in the resulting DataFrame contains “P Guard” or “Center” in at least one column.
Note: It’s important to include the argument na=False within the contains() function or else you will encounter if NaN values are present in the DataFrame.
The following tutorials explain how to perform other common filtering operations in pandas: