Table of Contents
In pandas, you can drop rows with NaN values by using the DataFrame.dropna() method. This will drop all rows containing at least one NaN value. This can be further specified by providing the axis, how=’all’ or thresh parameters which will respectively drop all rows containing all NaN values or drop all rows containing a certain threshold of NaN values.
Often you may be interested in dropping rows that contain NaN values in a pandas DataFrame. Fortunately this is easy to do using the pandas function.
This tutorial shows several examples of how to use this function on the following pandas DataFrame:
import numpy as np import scipy.stats as stats #create DataFrame with some NaN values df = pd.DataFrame({'rating': [np.nan, 85, np.nan, 88, 94, 90, 76, 75, 87, 86], 'points': [np.nan, 25, 14, 16, 27, 20, 12, 15, 14, 19], 'assists': [5, 7, 7, np.nan, 5, 7, 6, 9, 9, 5], 'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]}) #view DataFrame df rating points assists rebounds 0 NaN NaN 5.0 11 1 85.0 25.0 7.0 8 2 NaN 14.0 7.0 10 3 88.0 16.0 NaN 6 4 94.0 27.0 5.0 6 5 90.0 20.0 7.0 9 6 76.0 12.0 6.0 6 7 75.0 15.0 9.0 10 8 87.0 14.0 9.0 10 9 86.0 19.0 5.0 7
Example 1: Drop Rows with Any NaN Values
We can use the following syntax to drop all rows that have any NaN values:
df.dropna()
rating points assists rebounds
1 85.0 25.0 7.0 8
4 94.0 27.0 5.0 6
5 90.0 20.0 7.0 9
6 76.0 12.0 6.0 6
7 75.0 15.0 9.0 10
8 87.0 14.0 9.0 10
9 86.0 19.0 5.0 7
Example 2: Drop Rows with All NaN Values
We can use the following syntax to drop all rows that have all NaN values in each column:
df.dropna(how='all') rating points assists rebounds 0 NaN NaN 5.0 11 1 85.0 25.0 7.0 8 2 NaN 14.0 7.0 10 3 88.0 16.0 NaN 6 4 94.0 27.0 5.0 6 5 90.0 20.0 7.0 9 6 76.0 12.0 6.0 6 7 75.0 15.0 9.0 10 8 87.0 14.0 9.0 10 9 86.0 19.0 5.0 7
There were no rows with all NaN values in this particular DataFrame, so none of the rows were dropped.
Example 3: Drop Rows Below a Certain Threshold
We can use the following syntax to drop all rows that don’t have a certain at least a certain number of non-NaN values:
df.dropna(thresh=3) rating points assists rebounds 1 85.0 25.0 7.0 8 2 NaN 14.0 7.0 10 3 88.0 16.0 NaN 6 4 94.0 27.0 5.0 6 5 90.0 20.0 7.0 9 6 76.0 12.0 6.0 6 7 75.0 15.0 9.0 10 8 87.0 14.0 9.0 10 9 86.0 19.0 5.0 7
The very first row in the original DataFrame did not have at least 3 non-NaN values, so it was the only row that got dropped.
Example 4: Drop Row with Nan Values in a Specific Column
We can use the following syntax to drop all rows that have a NaN value in a specific column:
df.dropna(subset=['assists']) rating points assists rebounds 0 NaN NaN 5.0 11 1 85.0 25.0 7.0 8 2 NaN 14.0 7.0 10 4 94.0 27.0 5.0 6 5 90.0 20.0 7.0 9 6 76.0 12.0 6.0 6 7 75.0 15.0 9.0 10 8 87.0 14.0 9.0 10 9 86.0 19.0 5.0 7
Example 5: Reset Index After Dropping Rows with NaNs
We can use the following syntax to reset the index of the DataFrame after dropping the rows with the NaN values:
#drop all rows that have any NaN values df = df.dropna() #reset index of DataFrame df = df.reset_index(drop=True) #view DataFrame df rating points assists rebounds 0 85.0 25.0 7.0 8 1 94.0 27.0 5.0 6 2 90.0 20.0 7.0 9 3 76.0 12.0 6.0 6 4 75.0 15.0 9.0 10 5 87.0 14.0 9.0 10 6 86.0 19.0 5.0 77
You can find the complete documentation for the dropna() function .