Pandas provides a convenient way to filter rows that contain a specific string: the str.contains() method. Called on a string column, it takes a pattern (interpreted as a regular expression by default) and returns a boolean Series indicating whether each value in the column contains a match. Using that Series as a mask keeps only the rows where the result is True.
You can use the following syntax to filter for rows that contain a certain string in a pandas DataFrame:
df[df["col"].str.contains("this string")]
This tutorial explains several examples of how to use this syntax in practice with the following DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5]})

#view DataFrame
df

  team conference  points
0    A       East      11
1    A       East       8
2    A       East      10
3    B       West       6
4    B       West       6
5    C       East       5
Example 1: Filter Rows that Contain a Specific String
The following code shows how to filter for rows in the DataFrame that contain ‘A’ in the team column:
df[df["team"].str.contains("A")]

  team conference  points
0    A       East      11
1    A       East       8
2    A       East      10
Only the rows where the team column contains ‘A’ are kept.
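In real data the column may mix upper and lower case or contain missing values. The following is a minimal sketch of how the case and na parameters of str.contains() can handle both; the df_demo DataFrame is hypothetical and is not part of this tutorial's data.

```python
import pandas as pd

# Hypothetical data with mixed case and a missing value (not the tutorial's DataFrame)
df_demo = pd.DataFrame({'team': ['A', 'a', None, 'B'],
                        'points': [11, 8, 10, 6]})

# case=False ignores letter case; na=False treats missing values as non-matches
mask = df_demo['team'].str.contains('a', case=False, na=False)

# keep only the rows where the mask is True
result = df_demo[mask]
print(result)
```

Without na=False, a missing value produces a missing entry in the mask, which typically cannot be used directly for boolean indexing.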
Example 2: Filter Rows that Contain a String in a List
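The following code shows how to filter for rows in the DataFrame that contain ‘A’ or ‘B’ in the team column. Because str.contains() interprets its pattern as a regular expression by default, the | character acts as an “or”: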
df[df["team"].str.contains("A|B")]

  team conference  points
0    A       East      11
1    A       East       8
2    A       East      10
3    B       West       6
4    B       West       6
Only the rows where the team column contains ‘A’ or ‘B’ are kept.
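When the strings to match are stored in a Python list, the pattern can be built programmatically. The sketch below assumes the list is named teams (a hypothetical name) and uses re.escape from the standard library so that any regex special characters in the list are matched literally. It also shows what happens if the same pattern is passed with regex=False.

```python
import re
import pandas as pd

# same DataFrame as in the tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5]})

# hypothetical list of strings to look for
teams = ['A', 'B']

# escape each string, then join with the regex "or" operator
pattern = '|'.join(re.escape(t) for t in teams)

# rows whose team contains any string in the list
filtered = df[df['team'].str.contains(pattern)]
print(filtered)

# with regex=False the pattern is a literal string, so "A|B" matches nothing here
literal = df[df['team'].str.contains('A|B', regex=False)]
print(len(literal))
```

Escaping makes no difference for ‘A’ and ‘B’, but it matters for strings that contain characters such as . or ( that have a special meaning in regular expressions.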
Example 3: Filter Rows that Contain a Partial String
In the previous examples, each search string happened to match the entire value in the team column.
However, str.contains() matches substrings, so the same approach works if we’d like to filter for rows that contain only part of a value:
#identify partial string to look for
keep = ["Wes"]

#filter for rows that contain the partial string "Wes" in the conference column
df[df.conference.str.contains('|'.join(keep))]

  team conference  points
3    B       West       6
4    B       West       6
Only the rows where the conference column contains “Wes” are kept.
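To make the difference between partial and exact matching concrete, the sketch below compares str.contains() with isin(), which keeps only values that equal one of the listed strings in full. It also shows the ~ operator for keeping the rows that do not contain the string. The variable names partial, exact, and excluded are chosen here for illustration.

```python
import pandas as pd

# same DataFrame as in the tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5]})

# partial match: "Wes" appears inside "West"
partial = df[df['conference'].str.contains('Wes')]

# exact match: no conference is exactly equal to "Wes", so nothing is kept
exact = df[df['conference'].isin(['Wes'])]

# negation: keep the rows that do NOT contain "Wes"
excluded = df[~df['conference'].str.contains('Wes')]

print(partial)
print(exact)
print(excluded)
```

If whole-value equality is what is needed, isin() (or ==) is usually the clearer choice, while str.contains() is suited to substring and pattern searches.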