How do I use groupby to calculate the mean, and not ignore NaN values in Pandas?

In Pandas, groupby() combined with mean() skips NaN values by default. In pandas 2.x and earlier, GroupBy.mean() does not accept a skipna argument, so df.groupby('column_name').mean(skipna=False) raises an error. Instead, aggregate each group with Series.mean(skipna=False), which does accept the parameter. For example, df.groupby('column_name').agg({'value_column': lambda x: x.mean(skipna=False)}) returns NaN as the mean of any group that contains a NaN, and the ordinary mean for every other group.
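If you only need the mean of a single column, a minimal sketch (assuming pandas 1.x or later, and reusing the sample data from the example below) is to select the column before aggregating. This returns a Series indexed by the group labels instead of a one-column DataFrame:

```python
import numpy as np
import pandas as pd

# Sample data matching the basketball example used later in this answer
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [15, np.nan, 24, 25, 20, 35, 34, 19, 14, 12]})

# Select the column first, then call Series.mean(skipna=False) on each group;
# any group containing a NaN gets NaN as its mean
means = df.groupby('team')['points'].agg(lambda x: x.mean(skipna=False))
print(means)
```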


When using the pandas groupby() function to group by one column and calculate the mean value of another column, pandas will ignore NaN values by default.

If you would instead like the result to be NaN for any group that contains a NaN value, you can use the following basic syntax:

df.groupby('team').agg({'points': lambda x: x.mean(skipna=False)})

This particular example will group the rows of the DataFrame by the team column and then calculate the mean value of the points column without ignoring NaN values.
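The dictionary form applies the lambda only to the columns you list. As a sketch of the multi-column case, passing the lambda directly to agg() applies it to every non-grouping column. The assists column below is hypothetical and only added for illustration:

```python
import numpy as np
import pandas as pd

# 'points' follows the article; 'assists' is a hypothetical extra column
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'points': [15, np.nan, 35, 34],
                   'assists': [5, 7, np.nan, 3]})

# A single function (not a dict) is applied to every non-grouping column,
# so each column gets its own NaN-propagating mean per team
result = df.groupby('team').agg(lambda x: x.mean(skipna=False))
print(result)
```

Here team A gets NaN for points but 6.0 for assists, because the NaN check is done per column, not per row.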

The following example shows how to use this syntax in practice.

Example: Use pandas groupby() and Don’t Ignore NaNs

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [15, np.nan, 24, 25, 20, 35, 34, 19, 14, 12]})

#view DataFrame
print(df)

  team  points
0    A    15.0
1    A     NaN
2    A    24.0
3    A    25.0
4    A    20.0
5    B    35.0
6    B    34.0
7    B    19.0
8    B    14.0
9    B    12.0

Suppose we use the following syntax to calculate the mean value of points, grouped by team:

#calculate mean of points, grouped by team
df.groupby('team')['points'].mean()

team
A    21.0
B    22.8
Name: points, dtype: float64

Notice that the mean value of points for each team is returned, even though there is a NaN value for team A in the points column.

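By default, pandas simply ignores the NaN value when calculating the mean, so the mean for team A is computed from its four non-missing values: (15 + 24 + 25 + 20) / 4 = 21.0.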
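To check the arithmetic behind this default, a quick sketch using the same example DataFrame compares each group's sum, non-missing count ('count'), total row count ('size') and mean:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [15, np.nan, 24, 25, 20, 35, 34, 19, 14, 12]})

# 'count' counts only non-missing values, 'size' counts every row,
# and the default 'mean' equals sum / count, so the NaN is left out entirely
summary = df.groupby('team')['points'].agg(['sum', 'count', 'size', 'mean'])
print(summary)
```

The gap between count and size is exactly the number of NaNs in each group.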

If you would instead like the mean to be NaN for any team whose points include a NaN, you can use the following syntax:

#calculate mean points value grouped by team and don't ignore NaNs
df.groupby('team').agg({'points': lambda x: x.mean(skipna=False)})

      points
team
A        NaN
B       22.8

Notice that a NaN value is returned as the mean points value for team A this time.

Because skipna=False is passed to Series.mean() inside the lambda, pandas does not ignore the NaN values when calculating each team's mean.
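An alternative that avoids calling a Python lambda once per group (a different technique from the one above, offered only as a sketch) is to compute the default mean and then set it to NaN for any group whose non-missing count is smaller than its row count:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [15, np.nan, 24, 25, 20, 35, 34, 19, 14, 12]})

g = df.groupby('team')['points']

# count() skips NaN while size() counts every row, so they differ exactly
# for groups containing at least one NaN; where() replaces those means with NaN
means = g.mean().where(g.count() == g.size())
print(means)
```

On DataFrames with many groups this vectorized form can be faster than agg() with a lambda, though the difference is worth measuring on your own data.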
