In pandas, you can drop every column whose name contains a specific string by first selecting those columns with the “filter” method (using its regex argument) and then passing their names to the “drop” method. Alternatively, you can use “loc” with a boolean mask built from df.columns.str.contains() to keep only the columns whose names do not contain the string.
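The “loc” approach mentioned above can be sketched as follows. This assumes a small sample DataFrame, and the names mask and df_no_team are illustrative. It builds a boolean mask with df.columns.str.contains() and keeps the columns where the mask is False. Unlike inplace=True, this returns a new DataFrame and leaves df unchanged.

```python
import pandas as pd

# Small sample DataFrame (illustrative data)
df = pd.DataFrame({'team_name': ['A', 'B'],
                   'team_location': ['AU', 'EU'],
                   'player_name': ['Andy', 'Bob'],
                   'points': [22, 29]})

# Boolean mask: True for every column whose name contains 'team'
mask = df.columns.str.contains('team')

# Keep only the columns where the mask is False, i.e. drop the 'team' columns
df_no_team = df.loc[:, ~mask]

print(df_no_team.columns.tolist())  # ['player_name', 'points']
```

Note that str.contains() also treats its pattern as a regular expression by default; pass regex=False to match the text literally.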
You can use the following methods to drop columns from a pandas DataFrame whose name contains specific strings:
Method 1: Drop Columns if Name Contains Specific String
df.drop(list(df.filter(regex='this_string')), axis=1, inplace=True)
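Because filter(regex=...) treats its argument as a regular expression, characters such as . or ( have special meaning. Here is a minimal sketch, using hypothetical column names like sales.q1, of escaping the search text with Python's re.escape so it is matched literally before dropping.

```python
import re
import pandas as pd

# Hypothetical columns: one with a literal '.', one that only matches if '.' is a wildcard
df = pd.DataFrame({'sales.q1': [1, 2], 'salesXq2': [3, 4], 'region': ['N', 'S']})

# Unescaped, '.' matches any character, so 'sales.q' also selects 'salesXq2'
print(list(df.filter(regex='sales.q')))  # ['sales.q1', 'salesXq2']

# re.escape turns the search text into a literal pattern ('sales\\.q')
literal = re.escape('sales.q')
df.drop(list(df.filter(regex=literal)), axis=1, inplace=True)

print(list(df))  # ['salesXq2', 'region']
```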
Method 2: Drop Columns if Name Contains One of Several Specific Strings
df.drop(list(df.filter(regex='string1|string2|string3')), axis=1, inplace=True)
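If the strings to search for are stored in a list, the OR pattern can be built from it instead of typed by hand. This is a sketch under the assumption that each string should be matched literally; to_drop and pattern are illustrative names.

```python
import re
import pandas as pd

df = pd.DataFrame({'team_name': ['A', 'B'],
                   'team_location': ['AU', 'EU'],
                   'player_name': ['Andy', 'Bob'],
                   'points': [22, 29]})

# Strings to search for: escape each one, then join them with the regex OR operator
to_drop = ['location', 'player']
pattern = '|'.join(re.escape(s) for s in to_drop)  # 'location|player'

df.drop(list(df.filter(regex=pattern)), axis=1, inplace=True)

print(list(df))  # ['team_name', 'points']
```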
The following examples show how to use each method in practice with the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team_name': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'team_location': ['AU', 'AU', 'EU', 'EU', 'AU', 'EU'],
                   'player_name': ['Andy', 'Bob', 'Chad', 'Dan', 'Ed', 'Fran'],
                   'points': [22, 29, 35, 30, 18, 12]})

#view DataFrame
print(df)

  team_name team_location player_name  points
0         A            AU        Andy      22
1         B            AU         Bob      29
2         C            EU        Chad      35
3         D            EU         Dan      30
4         E            AU          Ed      18
5         F            EU        Fran      12
Example 1: Drop Columns if Name Contains Specific String
We can use the following syntax to drop all columns in the DataFrame that contain ‘team’ anywhere in the column name:
#drop columns whose name contains 'team'
df.drop(list(df.filter(regex='team')), axis=1, inplace=True)

#view updated DataFrame
print(df)

  player_name  points
0        Andy      22
1         Bob      29
2        Chad      35
3         Dan      30
4          Ed      18
5        Fran      12
Notice that both columns that contained ‘team’ in the name have been dropped from the DataFrame.
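Because this example uses inplace=True, df itself is modified. If you want to keep the original DataFrame (for instance, to run Example 2 on it), a variant that returns a new DataFrame might look like the sketch below; team_cols and df_no_team are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'team_name': ['A', 'B'],
                   'team_location': ['AU', 'EU'],
                   'player_name': ['Andy', 'Bob'],
                   'points': [22, 29]})

# Names of the columns whose name contains 'team'
team_cols = df.filter(regex='team').columns

# drop() without inplace returns a new DataFrame; df is left untouched
df_no_team = df.drop(columns=team_cols)

print(list(df_no_team))  # ['player_name', 'points']
print(list(df))          # ['team_name', 'team_location', 'player_name', 'points']
```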
Example 2: Drop Columns if Name Contains One of Several Specific Strings
Starting again from the original DataFrame (Example 1 modified df in place), we can use the following syntax to drop all columns that contain ‘player’ or ‘points’ anywhere in the column name:
#drop columns whose name contains 'player' or 'points'
df.drop(list(df.filter(regex='player|points')), axis=1, inplace=True)

#view updated DataFrame
print(df)

  team_name team_location
0         A            AU
1         B            AU
2         C            EU
3         D            EU
4         E            AU
5         F            EU
Notice that both columns that contained either ‘player’ or ‘points’ in the name have been dropped from the DataFrame.
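Note: The | symbol is the regular-expression “OR” (alternation) operator, not a pandas feature, so the pattern 'player|points' matches any column name that contains either string.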
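Since filter(regex=...) looks for a match anywhere in each column name, you may sometimes want a stricter pattern. This sketch uses hypothetical column names to show how grouping with the ^ anchor restricts matches to the start of the name, and how Python's inline (?i) flag makes the pattern case-insensitive.

```python
import pandas as pd

# Hypothetical columns chosen to show the differences between patterns
df = pd.DataFrame({'player_name': ['Andy'], 'points': [22],
                   'Points_Allowed': [10], 'max_points': [35]})

# Either alternative, anywhere in the name (case-sensitive)
anywhere = list(df.filter(regex='player|points'))
print(anywhere)    # ['player_name', 'points', 'max_points']

# Only names that start with one of the alternatives
at_start = list(df.filter(regex='^(player|points)'))
print(at_start)    # ['player_name', 'points']

# Same anchored pattern, but case-insensitive via the inline (?i) flag
ignore_case = list(df.filter(regex='(?i)^(player|points)'))
print(ignore_case) # ['player_name', 'points', 'Points_Allowed']
```

Any of these patterns can be passed to df.drop(list(df.filter(regex=...)), axis=1) in the same way as above.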