How to Calculate Mean, Median, and Mode Using Pandas: A Step-by-Step Guide

Pandas is an indispensable library within the Python ecosystem, specifically designed to handle and analyze data efficiently. It offers robust tools for performing complex calculations, particularly in the realm of descriptive statistics. Among the most frequently sought measures are the indicators of central tendency: the mean, the median, and the mode. This powerful library simplifies the process of obtaining these metrics, transforming raw data into meaningful insights with minimal code.

For data analysts and scientists, understanding the central characteristics of a dataset is a foundational requirement. Pandas provides dedicated methods, namely .mean(), .median(), and .mode(), that calculate these measures across entire columns or specific subsets of the data. Relying on these built-in methods avoids hand-written loops and the mistakes that come with them, which makes Pandas a common choice for initial data exploration and statistical summarization in Python.


Accessing Central Tendency Measures

When working with a Pandas DataFrame, you often need to calculate the measures of central tendency for multiple columns at once. The methods .mean(), .median(), and .mode() can be called directly on the DataFrame object. Passing the argument numeric_only=True restricts the calculation to the columns that contain numerical data. This matters most for .mean() and .median(), which raise an error in recent versions of Pandas (2.0 and later) when the DataFrame contains text columns. The .mode() method can process text columns, but the argument keeps its output limited to the numerical columns as well.

The following snippet demonstrates the standard syntax used to calculate the mean, median, and mode for every applicable column within your dataset. For .mean() and .median(), the output is a Pandas Series whose index holds the column names and whose values hold the calculated statistic. For .mode(), the output is a DataFrame, because a single column can have more than one mode.

print(df.mean(numeric_only=True))
print(df.median(numeric_only=True))
print(df.mode(numeric_only=True))
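
As a rough illustration of what numeric_only=True does, the sketch below compares it with selecting the numerical columns explicitly before aggregating. The tiny DataFrame named demo and its points column are hypothetical and exist only for this comparison.

```python
import pandas as pd

# Hypothetical frame: one text column and one numerical column
demo = pd.DataFrame({'player': ['A', 'B', 'C'],
                     'points': [10, 20, 60]})

# Option 1: ask pandas to skip the non-numeric column
mean_flag = demo.mean(numeric_only=True)

# Option 2: keep only numerical columns first, then aggregate
mean_select = demo.select_dtypes(include='number').mean()

print(mean_flag)                      # a Series with one entry, for 'points'
print(mean_flag.equals(mean_select))  # True: both approaches agree
```

Either form works; numeric_only=True is simply the shorter of the two when the DataFrame mixes text and numbers.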

To illustrate these concepts effectively, we will walk through a complete, hands-on example, starting with the creation of a sample DataFrame tailored for this statistical analysis.

Example: Setting up the Sample Data

For our practical demonstration, we will analyze hypothetical performance data. Imagine a scenario where we track the points scored by several basketball players across four different games. This data structure naturally fits into a Pandas DataFrame, allowing us to easily apply statistical functions to the numerical game columns.

The code below initializes the DataFrame named df. It includes one categorical column (player) and four quantitative columns (game1 through game4), which represent the individual scores achieved by eight distinct players.

import pandas as pd

# Create the DataFrame containing player scores across four games
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})
                   
# Display the complete DataFrame
print(df)

  player  game1  game2  game3  game4
0      A     18      5     11      9
1      B     22      7      8      8
2      C     19      7     10     10
3      D     14      9      6      9
4      E     14     12      6     14
5      F     11      9      5     15
6      G     20      9      9     10
7      H     28      4     12     11

Calculating the Mean (Average) Score

The arithmetic mean is perhaps the most common measure of central tendency. It is calculated by summing all the values in a dataset and dividing by the total count of observations. In Pandas, the .mean() function calculates this average across all specified numerical columns swiftly, providing a single typical value for each game’s performance.

We apply the .mean() function to our DataFrame df, again specifying numeric_only=True to target only the scoring columns. The resulting output clearly shows the average points scored across all players for each respective game, giving us a baseline understanding of player performance for each matchup.

# Calculate the arithmetic mean of each numerical column
print(df.mean(numeric_only=True))

game1    18.250
game2     7.750
game3     8.375
game4    10.750
dtype: float64

Based on the results generated by the .mean() function, we can draw the following conclusions regarding the average performance in each game:

  • The average score (mean value) recorded in the game1 column is 18.25 points.
  • The average score (mean value) recorded in the game2 column is 7.75 points.
  • The average score (mean value) recorded in the game3 column is 8.375 points.
  • The average score (mean value) recorded in the game4 column is 10.75 points.
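
To make the definition concrete, the following minimal sketch recomputes the game1 mean by hand (sum divided by count) and compares it with .mean(). The Series named game1 is just a standalone copy of that column's values.

```python
import pandas as pd

# Standalone copy of the game1 scores from the example DataFrame
game1 = pd.Series([18, 22, 19, 14, 14, 11, 20, 28], name='game1')

# Mean by definition: total of all values divided by the number of values
manual_mean = game1.sum() / game1.count()

print(manual_mean)    # 18.25
print(game1.mean())   # 18.25
```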

Determining the Median (Middle Value) Score

While the mean is sensitive to outliers and skewed distributions, the median provides a more robust measure of central tendency. The median is defined as the value that separates the higher half from the lower half of a data sample once the data is sorted.

When dealing with an even number of observations, the median is calculated as the average of the two central values. Each game column here holds eight scores, so every median below is the average of the fourth and fifth sorted scores. To compute this midpoint for each game, we use the .median() function. The result is a center point that is not disproportionately influenced by extremely high or low individual scores, giving a clearer picture of typical player performance.

# Calculate the median value of each numerical column
print(df.median(numeric_only=True))

game1    18.5
game2     8.0
game3     8.5
game4    10.0
dtype: float64

Examining the output of the .median() function reveals the following middle scores for our dataset:

  • The median score in the game1 column is 18.5 points.
  • The median score in the game2 column is 8 points.
  • The median score in the game3 column is 8.5 points.
  • The median score in the game4 column is 10 points.
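
The sketch below works through the even-count rule for game1 by hand and then shows the robustness claim in action. The outlier value of 280 is a hypothetical score introduced only for this demonstration; it is not part of the original data.

```python
import pandas as pd

# Standalone copy of the game1 scores from the example DataFrame
game1 = pd.Series([18, 22, 19, 14, 14, 11, 20, 28])

# Even number of values: average the two middle values after sorting
ordered = game1.sort_values().tolist()
middle = len(ordered) // 2
manual_median = (ordered[middle - 1] + ordered[middle]) / 2

print(manual_median)     # 18.5
print(game1.median())    # 18.5

# Hypothetical outlier: replace the top score of 28 with 280
skewed = game1.replace(28, 280)

print(skewed.mean())     # 49.75 (the mean jumps)
print(skewed.median())   # 18.5  (the median does not move)
```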

Finding the Mode (Most Frequent Value) Score

The mode represents the value that appears most frequently in a dataset. It is the only one of the three measures that applies to categorical data, but it also highlights the most common occurrences in numerical data. Unlike the mean and median, the mode is not necessarily a single value: a dataset can be unimodal, multimodal (having multiple modes), or, by some conventions, have no mode at all when every value is unique. The .mode() function in Pandas handles this by returning every value tied for the highest frequency; when all values are unique, that means it returns all of them.

Because a column might possess multiple modes, the output structure of .mode() differs from the single-value results of the mean and median functions. It returns a DataFrame with as many rows as the column with the most modes requires. Columns with fewer modes are padded with NaN values in the remaining rows.

# Calculate the mode(s) of each numerical column
print(df.mode(numeric_only=True))

   game1  game2  game3  game4
0   14.0    9.0    6.0      9
1    NaN    NaN    NaN     10

Analyzing the resulting DataFrame provides a comprehensive view of the most common scores for each game:

  • The mode in the game1 column is 14.
  • The mode in the game2 column is 9.
  • The mode in the game3 column is 6.
  • The game4 column is bimodal, possessing two modes: 9 and 10.

Notice that game4 required two rows to display its results, because the values 9 and 10 each occur twice, which is the highest frequency in that column. The other three columns have a single mode, so their second row is filled with NaN (Not a Number). This padding is also why those columns are displayed as floats (14.0 rather than 14).
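
If the NaN padding is inconvenient, one possible approach is to call .mode() on each column separately and collect the results as plain lists. The dictionary name modes is an arbitrary choice for this sketch; the DataFrame repeats the article's sample data.

```python
import pandas as pd

# Same sample data as the example DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})

# Series.mode() returns only that column's modes, so no NaN padding appears
game_columns = ['game1', 'game2', 'game3', 'game4']
modes = {col: df[col].mode().tolist() for col in game_columns}
print(modes)
# {'game1': [14], 'game2': [9], 'game3': [6], 'game4': [9, 10]}

# Every player label is unique, so pandas reports all of them as modes
player_modes = df['player'].mode().tolist()
print(len(player_modes))   # 8
```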

Summary and Further Exploration

We have successfully demonstrated how to leverage the core statistical functions in Pandas to rapidly calculate the three primary measures of central tendency across a DataFrame containing numerical data. The functions .mean(), .median(), and .mode() are essential tools for any data analysis workflow, providing immediate, crucial insights into the typical values within a dataset.
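
When both the mean and the median are needed side by side, one option is to request them together with .agg(). This is a sketch of one way to do it rather than the only one; the variable names scores and center are arbitrary, and the DataFrame repeats the article's sample data.

```python
import pandas as pd

# Same sample data as the example DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})

# Keep only the numerical columns, then compute both statistics at once
scores = df.select_dtypes(include='number')
center = scores.agg(['mean', 'median'])

# One row per statistic, one column per game
print(center)
```

The mode is left out here because it can return several values per column and therefore does not fit into a single row.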

For a broader overview that goes beyond central tendency, Pandas provides the .describe() function. By default it generates a table of eight descriptive statistics for each numerical column in your DataFrame: the count, mean, standard deviation, minimum, the 25th, 50th, and 75th percentiles, and the maximum. Because the 50th percentile is the median, this one call covers two of the three measures discussed here, which makes it a useful first step in data profiling.
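
A minimal sketch of .describe() applied to the same sample data follows. The variable name summary is an arbitrary choice.

```python
import pandas as pd

# Same sample data as the example DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})

# With mixed column types, describe() summarizes the numerical columns only
summary = df.describe()
print(summary)

# The 'mean' and '50%' rows match the .mean() and .median() results
print(summary.loc['mean', 'game1'])   # 18.25
print(summary.loc['50%', 'game1'])    # 18.5
```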

Cite this article

stats writer (2025). How to Calculate Mean, Median, and Mode Using Pandas: A Step-by-Step Guide. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/calculate-mean-median-mode-with-pandas/

stats writer. "How to Calculate Mean, Median, and Mode Using Pandas: A Step-by-Step Guide." PSYCHOLOGICAL SCALES, 21 Nov. 2025, https://scales.arabpsychology.com/stats/calculate-mean-median-mode-with-pandas/.
