How can I drop rows in a Pandas dataframe that contain a specific string?

To drop rows that contain a specific string, first identify them with the “str.contains” function, which returns a Boolean mask that is True for every row whose value contains the string. You can then either keep only the rows where the mask is False, or pass the index labels of the matching rows to the “drop” function. Both approaches return a new dataframe and are useful for data cleaning and filtering out unwanted data.

Pandas: Drop Rows that Contain a Specific String


You can use the following syntax to drop rows that contain a certain string in a pandas DataFrame:

df[df["col"].str.contains("this string")==False]

This tutorial explains several examples of how to use this syntax in practice with the following DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5]})

#view DataFrame
df

  team conference  points
0    A       East      11
1    A       East       8
2    A       East      10
3    B       West       6
4    B       West       6
5    C       East       5

Example 1: Drop Rows that Contain a Specific String

The following code shows how to drop all rows in the DataFrame that contain ‘A’ in the team column:

df[df["team"].str.contains("A")==False]

  team conference  points
3    B       West       6
4    B       West       6
5    C       East       5
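If the column can contain missing values, str.contains may return a missing value rather than True or False for those rows, and how such rows are then filtered can depend on the pandas version and the column dtype. The following is a minimal sketch, assuming a hypothetical DataFrame named df_missing with one missing team value. It passes na=False so that missing values are explicitly treated as non-matches and the rows are kept:

```python
import pandas as pd

# hypothetical DataFrame (not from the tutorial) with one missing team value
df_missing = pd.DataFrame({'team': ['A', None, 'B', 'C'],
                           'points': [11, 8, 6, 5]})

# na=False treats missing values as "does not contain 'A'",
# so the mask holds only True/False values
mask = df_missing['team'].str.contains('A', na=False)

# ~ inverts the mask: keep every row that does not contain 'A'
kept = df_missing[~mask]
```

Here the row with the missing team value is kept; passing na=True instead would drop it along with the rows that contain ‘A’.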

Example 2: Drop Rows that Contain a String in a List

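The following code shows how to drop all rows in the DataFrame that contain ‘A’ or ‘B’ in the team column. Because str.contains treats its pattern as a regular expression by default, the | character means “or”: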

df[df["team"].str.contains("A|B")==False]

  team conference  points
5    C       East       5
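When the strings to drop come from an actual Python list, they may include characters that have a special meaning in regular expressions, such as + or a period. One cautious way to handle this is to escape each string with re.escape before joining them with |. The sketch below assumes a hypothetical DataFrame df_names and a hypothetical list to_drop; neither is part of the tutorial data:

```python
import re
import pandas as pd

# hypothetical DataFrame whose team names include a regex special character
df_names = pd.DataFrame({'team': ['A+', 'B', 'C', 'AB'],
                         'points': [11, 6, 5, 7]})

# hypothetical list of literal strings to look for
to_drop = ['A+', 'C']

# escape each string so it is matched literally, then join with | ("or")
pattern = '|'.join(re.escape(s) for s in to_drop)

# keep only the rows whose team does not contain any of the strings
result = df_names[~df_names['team'].str.contains(pattern)]
```

Without the escaping step, the pattern ‘A+’ would be read as “one or more A” and would also match the team ‘AB’.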

Example 3: Drop Rows that Contain a Partial String

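In the previous examples, the strings we searched for happened to match entire values in the team column, but str.contains actually checks whether the string appears anywhere in the value.

So if we’d like to drop rows that contain a partial string, we can use the same approach. The following code uses the ~ operator, which inverts the Boolean mask in the same way as ==False does in the earlier examples: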

#identify partial string to look for
discard = ["Wes"]

#drop rows that contain the partial string "Wes" in the conference column
df[~df.conference.str.contains('|'.join(discard))]

  team conference  points
0    A       East      11
1    A       East       8
2    A       East      10
5    C       East       5
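The same result can also be produced with the “drop” function mentioned at the start of this tutorial, by first collecting the index labels of the matching rows. This is a minimal sketch using the tutorial DataFrame; the variable names labels and result are chosen here for illustration:

```python
import pandas as pd

# recreate the tutorial DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5]})

# index labels of the rows whose conference contains "Wes"
labels = df[df['conference'].str.contains('Wes')].index

# drop returns a new DataFrame without those rows; df itself is unchanged
result = df.drop(labels)
```

Boolean indexing is usually shorter, but drop can be convenient when the labels of the rows to remove are already known or are needed elsewhere.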
