How do I get the mean and standard deviation for the dataframe without using describe()?

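To calculate the mean and standard deviation of a DataFrame without using describe(), call the mean() and std() methods on it. Each returns a Series with one value per column. If the DataFrame contains non-numeric columns, pass numeric_only=True so that only the numeric columns are included. Note that std() computes the sample standard deviation (ddof=1) by default, which is the same value describe() reports.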
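As a minimal sketch of this approach, the snippet below calls mean() and std() directly and combines the two Series into one table. The DataFrame, its column names, and the variable names (means, stds, summary) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data for illustration; 'team' is a non-numeric column
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D'],
                   'points': [10, 20, 30, 40],
                   'assists': [2, 4, 6, 8]})

# numeric_only=True skips the non-numeric 'team' column
means = df.mean(numeric_only=True)
stds = df.std(numeric_only=True)   # sample standard deviation (ddof=1)

# Combine the two Series into one table, one row per column of df
summary = pd.DataFrame({'mean': means, 'std': stds})
print(summary)
```

Here each numeric column of df becomes a row of summary, with its mean and standard deviation side by side.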


You can use the describe() function to generate descriptive statistics for variables in a pandas DataFrame.

By default, the describe() function calculates the following metrics for each numeric variable in a DataFrame:

  • count (number of values)
  • mean (mean value)
  • std (standard deviation)
  • min (minimum value)
  • 25% (25th percentile)
  • 50% (50th percentile)
  • 75% (75th percentile)
  • max (max value)

However, you can use the following syntax to calculate only the mean and standard deviation for each numeric variable:

df.describe().loc[['mean', 'std']]

The following example shows how to use this syntax in practice.

Example: Use describe() in Pandas to Only Calculate Mean and Std

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12

If we use the describe() function, we can calculate descriptive statistics for each numeric variable in the DataFrame:

#calculate descriptive statistics for each numeric variable
df.describe()

          points   assists   rebounds
count   8.000000   8.00000   8.000000
mean   18.250000   7.75000   8.375000
std     5.365232   2.54951   2.559994
min    11.000000   4.00000   5.000000
25%    14.000000   6.50000   6.000000
50%    18.500000   8.00000   8.500000
75%    20.500000   9.00000  10.250000
max    28.000000  12.00000  12.000000

However, we can use the following syntax to calculate only the mean and standard deviation for each numeric variable:

#only calculate mean and standard deviation of each numeric variable
df.describe().loc[['mean', 'std']]

         points  assists  rebounds
mean  18.250000  7.75000  8.375000
std    5.365232  2.54951  2.559994

Notice that the output only includes the mean and standard deviation for each numeric variable.

Note that describe() still calculated every descriptive statistic, just as before. We then used the loc indexer to select only the rows labeled mean and std from its output.
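If computing the unused statistics is a concern, one possible alternative (not part of the original example) is to request only the two statistics with agg(). The sketch below rebuilds the same basketball DataFrame so that it runs on its own; the variable name stats is an assumption made for illustration.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Keep only the numeric columns, then compute just mean and std
stats = df.select_dtypes(include='number').agg(['mean', 'std'])
print(stats)
```

The result has the same two rows, mean and std, as the describe().loc[...] approach, without calculating the count, min, max, or percentiles along the way.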
