In pandas, you can fill NaN values with the median of a column by calling the fillna() function and passing that column's median as the fill value. Every NaN in the column is then replaced with the median. The median is the middle value of the sorted non-missing values, or the average of the two middle values when there is an even number of them. By default, pandas ignores NaN values when computing it. Filling with the median instead of the mean is useful when the data contains outliers, because a few extreme values can pull the mean far away from typical values while barely moving the median.
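As a quick illustration of that definition, the sketch below uses made-up values (not the article's data) to show how Series.median() handles odd and even counts of non-missing values, how NaN is skipped, and why the median resists an outlier that drags the mean upward.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
odd = pd.Series([3, np.nan, 1, 2])      # non-missing values sorted: [1, 2, 3]
even = pd.Series([4, 1, np.nan, 3, 2])  # non-missing values sorted: [1, 2, 3, 4]

# Odd count: the median is the single middle value
print(odd.median())   # 2.0
# Even count: the median is the mean of the two middle values, (2 + 3) / 2
print(even.median())  # 2.5

# NaN is skipped by default (skipna=True); with skipna=False the result is NaN
print(even.median(skipna=False))  # nan

# One extreme value pulls the mean much further than the median
with_outlier = pd.Series([10, 12, 11, np.nan, 1000])
print(with_outlier.mean())    # 258.25
print(with_outlier.median())  # 11.5
```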
You can use the fillna() function to replace NaN values in a pandas DataFrame.
Here are three common ways to use this function:
Method 1: Fill NaN Values in One Column with Median
df['col1'] = df['col1'].fillna(df['col1'].median())
Method 2: Fill NaN Values in Multiple Columns with Median
df[['col1', 'col2']] = df[['col1', 'col2']].fillna(df[['col1', 'col2']].median())
Method 3: Fill NaN Values in All Columns with Median
df = df.fillna(df.median())
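Method 3 asks df.median() for the median of every column. Since pandas 2.0, DataFrame.median() defaults to numeric_only=False, so if the DataFrame also holds a text column the call typically raises a TypeError instead of skipping that column. A minimal sketch, assuming a hypothetical DataFrame with a 'team' text column, passes numeric_only=True so that only the numeric columns are filled:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame mixing a text column with numeric columns
df = pd.DataFrame({'team': ['A', 'B', None, 'D'],
                   'points': [25, np.nan, 14, 16],
                   'assists': [5, 7, np.nan, 9]})

# Medians of the numeric columns only; 'team' is left out of the result
medians = df.median(numeric_only=True)

# fillna() with a Series only fills columns named in its index,
# so the missing 'team' value is left untouched
df = df.fillna(medians)
print(df)
```

The same call also works on an all-numeric DataFrame, where it behaves exactly like Method 3.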
The following examples show how to use each method in practice with the following pandas DataFrame:
import numpy as np
import pandas as pd

#create DataFrame with some NaN values
df = pd.DataFrame({'rating': [np.nan, 85, np.nan, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, np.nan, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, np.nan, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#view DataFrame
df

   rating  points  assists  rebounds
0     NaN    25.0      5.0        11
1    85.0     NaN      7.0         8
2     NaN    14.0      7.0        10
3    88.0    16.0      NaN         6
4    94.0    27.0      5.0         6
5    90.0    20.0      7.0         9
6    76.0    12.0      6.0         6
7    75.0    15.0      9.0        10
8    87.0    14.0      9.0        10
9    86.0    19.0      5.0         7
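Before filling anything, it can be useful to check how many values are missing in each column and which median each column will receive. A short sketch that rebuilds the same DataFrame:

```python
import numpy as np
import pandas as pd

# Same DataFrame as above
df = pd.DataFrame({'rating': [np.nan, 85, np.nan, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, np.nan, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, np.nan, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

# Number of NaN values in each column: rating 2, points 1, assists 1, rebounds 0
print(df.isna().sum())

# Median of each column from its non-missing values: 86.5, 16.0, 7.0, 8.5
print(df.median())
```

The medians of rating, points, and assists (86.5, 16.0, and 7.0) are the values that end up in the filled cells in the examples that follow; rebounds has no missing values, so its median is never used.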
Example 1: Fill NaN Values in One Column with Median
The following code shows how to fill the NaN values in the rating column with the median value of the rating column:
#fill NaNs with column median in 'rating' column
df['rating'] = df['rating'].fillna(df['rating'].median())

#view updated DataFrame
df

   rating  points  assists  rebounds
0    86.5    25.0      5.0        11
1    85.0     NaN      7.0         8
2    86.5    14.0      7.0        10
3    88.0    16.0      NaN         6
4    94.0    27.0      5.0         6
5    90.0    20.0      7.0         9
6    76.0    12.0      6.0         6
7    75.0    15.0      9.0        10
8    87.0    14.0      9.0        10
9    86.0    19.0      5.0         7
The median of the non-missing values in the rating column is 86.5 (the average of the two middle values, 86 and 87), so each NaN value in the rating column was filled with 86.5.
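By default, fillna() returns a new Series (or DataFrame) rather than modifying the original, which is why each method assigns the result back to df. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# fillna() returns a new Series; s itself is not modified
filled = s.fillna(s.median())

print(s.isna().sum())   # 1
print(filled.tolist())  # [1.0, 2.0, 3.0]
```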
Example 2: Fill NaN Values in Multiple Columns with Median
The following code shows how to fill the NaN values in both the rating and points columns with their respective column medians:
#fill NaNs with column medians in 'rating' and 'points' columns
df[['rating', 'points']] = df[['rating', 'points']].fillna(df[['rating', 'points']].median())

#view updated DataFrame
df

   rating  points  assists  rebounds
0    86.5    25.0      5.0        11
1    85.0    16.0      7.0         8
2    86.5    14.0      7.0        10
3    88.0    16.0      NaN         6
4    94.0    27.0      5.0         6
5    90.0    20.0      7.0         9
6    76.0    12.0      6.0         6
7    75.0    15.0      9.0        10
8    87.0    14.0      9.0        10
9    86.0    19.0      5.0         7
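The fill value in Method 2 is a Series indexed by column name, and fillna() matches it to columns by label. Because of that alignment, passing the same Series to fillna() on the whole DataFrame gives the same result while leaving columns outside the list (such as assists) untouched. The sketch below is an alternative way to write Method 2, not the form used above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'rating': [np.nan, 85, np.nan, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, np.nan, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, np.nan, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})
cols = ['rating', 'points']

# Method 2 as written above: fill a column subset, then assign it back
a = df.copy()
a[cols] = a[cols].fillna(a[cols].median())

# Alternative: the median Series only has 'rating' and 'points' in its index,
# so fillna() on the full DataFrame fills just those two columns
b = df.fillna(df[cols].median())

print(a.equals(b))                # True
print(b['assists'].isna().sum())  # 1
```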
Example 3: Fill NaN Values in All Columns with Median
The following code shows how to fill the NaN values in every column with that column's median:
#fill NaNs with column medians in each column
df = df.fillna(df.median())

#view updated DataFrame
df

   rating  points  assists  rebounds
0    86.5    25.0      5.0        11
1    85.0    16.0      7.0         8
2    86.5    14.0      7.0        10
3    88.0    16.0      7.0         6
4    94.0    27.0      5.0         6
5    90.0    20.0      7.0         9
6    76.0    12.0      6.0         6
7    75.0    15.0      9.0        10
8    87.0    14.0      9.0        10
9    86.0    19.0      5.0         7
Notice that the NaN values in each column were filled with that column's median.
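To confirm this programmatically rather than by eye, a short check like the one below records where values were missing before the fill and compares those cells with each column's median. It rebuilds the original DataFrame, so it does not depend on the earlier examples having been run.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'rating': [np.nan, 85, np.nan, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, np.nan, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, np.nan, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

medians = df.median()
missing = df.isna()  # True where a value was missing before filling
filled = df.fillna(medians)

# No NaN values remain anywhere in the DataFrame
print(filled.isna().sum().sum())  # 0

# Each previously missing cell now holds its column's median
for col in df.columns:
    assert (filled.loc[missing[col], col] == medians[col]).all()
```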
You can find the complete online documentation for the fillna() function in the pandas API reference.