The pandas dropna() function drops rows that contain NaN values from a DataFrame. By passing a list of column names to the subset argument, you can restrict the check to those columns: a row is dropped only if it has a missing value in one of the listed columns, and missing values in any other column are ignored.
Here are the most common ways to use this function in practice:
Method 1: Drop Rows with Missing Values in One Specific Column
df.dropna(subset=['column1'], inplace=True)
Method 2: Drop Rows with Missing Values in One of Several Specific Columns
df.dropna(subset=['column1', 'column2', 'column3'], inplace=True)
The examples below show how to use each method in practice with this pandas DataFrame:
import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, 28],
                   'assists': [5, np.nan, np.nan, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, np.nan]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A    18.0      5.0      11.0
1    B     NaN      NaN       8.0
2    C    19.0      NaN      10.0
3    D    14.0      9.0       6.0
4    E    14.0     12.0       6.0
5    F    11.0      9.0       5.0
6    G    20.0      9.0       9.0
7    H    28.0      4.0       NaN
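Before dropping anything, it can help to check how many missing values each column holds. The sketch below rebuilds the same DataFrame and counts the NaN values per column with isna().sum(). The variable name missing_counts is only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the example DataFrame above
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, 28],
                   'assists': [5, np.nan, np.nan, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, np.nan]})

# Count the missing values in each column
# (missing_counts is an illustrative name, not part of the original example)
missing_counts = df.isna().sum()
print(missing_counts)
```

For this data, the 'points' and 'rebounds' columns each hold one missing value and the 'assists' column holds two, which is why the choice of subset changes which rows are dropped.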
Example 1: Drop Rows with Missing Values in One Specific Column
We can use the following syntax to drop rows with missing values in the ‘assists’ column:
#drop rows with missing values in 'assists' column
df.dropna(subset=['assists'], inplace=True)

#view updated DataFrame
print(df)

  team  points  assists  rebounds
0    A    18.0      5.0      11.0
3    D    14.0      9.0       6.0
4    E    14.0     12.0       6.0
5    F    11.0      9.0       5.0
6    G    20.0      9.0       9.0
7    H    28.0      4.0       NaN
Notice that the two rows with missing values in the ‘assists’ column (index 1 and 2) have both been removed from the DataFrame.
Also note that the last row (team H) is kept even though its ‘rebounds’ value is missing, because dropna() only checked the ‘assists’ column.
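The example above modifies df directly because of inplace=True, and the surviving rows keep their original index labels. As a hedged alternative, the sketch below leaves df untouched by assigning the result of dropna() to a new variable, then renumbers the rows with reset_index(drop=True). The name cleaned is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the example DataFrame above
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, 28],
                   'assists': [5, np.nan, np.nan, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, np.nan]})

# Without inplace=True, dropna() returns a new DataFrame and leaves df unchanged
# (cleaned is an illustrative name)
cleaned = df.dropna(subset=['assists'])

# Optionally renumber the remaining rows from 0 instead of keeping labels 0, 3, 4, ...
cleaned = cleaned.reset_index(drop=True)

print(cleaned)
print(len(df), len(cleaned))
```

Keeping the original DataFrame around this way makes it easy to try a different subset later without rebuilding the data.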
Example 2: Drop Rows with Missing Values in One of Several Specific Columns
Starting again from the original DataFrame (Example 1 modified df in place), we can use the following syntax to drop rows with a missing value in either the ‘points’ or the ‘rebounds’ column:
#drop rows with missing values in 'points' or 'rebounds' column
df.dropna(subset=['points', 'rebounds'], inplace=True)

#view updated DataFrame
print(df)

  team  points  assists  rebounds
0    A    18.0      5.0      11.0
2    C    19.0      NaN      10.0
3    D    14.0      9.0       6.0
4    E    14.0     12.0       6.0
5    F    11.0      9.0       5.0
6    G    20.0      9.0       9.0
Notice that the two rows with a missing value in the ‘points’ or ‘rebounds’ column (index 1 and 7) have been removed from the DataFrame, while row 2 is kept because its only missing value is in the ‘assists’ column.
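By default, dropna() uses how='any', so a row is dropped when any one of the subset columns is missing. If you only want to drop rows where every listed column is missing, dropna() also accepts how='all'. The sketch below compares the two on the original data, using the ‘points’ and ‘assists’ columns as an illustrative choice. The names any_missing and all_missing are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as the example DataFrame above
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, np.nan, 19, 14, 14, 11, 20, 28],
                   'assists': [5, np.nan, np.nan, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, np.nan]})

# Default how='any': drop a row if 'points' OR 'assists' is missing
any_missing = df.dropna(subset=['points', 'assists'])

# how='all': drop a row only if 'points' AND 'assists' are both missing
all_missing = df.dropna(subset=['points', 'assists'], how='all')

print(any_missing['team'].tolist())
print(all_missing['team'].tolist())
```

With this data, the default drops teams B and C, while how='all' drops only team B, the one row where both columns are missing.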