How to Calculate Sample & Population Variance in Python

Sample and population variance can be calculated in Python with the built-in statistics module, which provides the variance() function for the sample variance and the pvariance() function for the population variance. Both functions accept a data set, typically a list of numbers, and return its variance. Note that variance() requires at least two data points, while pvariance() requires at least one.


The variance is a way to measure the spread of values in a dataset.

The formula to calculate population variance is:

σ² = Σ (xᵢ – μ)² / N

where:

  • Σ: A symbol that means “sum”
  • μ: Population mean
  • xᵢ: The ith element from the population
  • N: Population size
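
To make the population formula concrete, here is a minimal sketch that applies it term by term. The helper name population_variance and the small data list are made up for illustration; the result is compared against pvariance() from the statistics module.

```python
from statistics import pvariance

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)                                  # N: population size
    mu = sum(values) / n                             # population mean
    squared_devs = [(x - mu) ** 2 for x in values]   # squared deviation of each element
    return sum(squared_devs) / n

# hypothetical data, chosen so the arithmetic is easy to follow
values = [2, 4, 4, 4, 5, 5, 7, 9]

manual = population_variance(values)
builtin = pvariance(values)
print(manual, builtin)
```

For this data the mean is 5 and the squared deviations sum to 32, so both approaches give 32 / 8 = 4.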

The formula to calculate sample variance is:

s² = Σ (xᵢ – x̄)² / (n – 1)

where:

  • x̄: Sample mean
  • xᵢ: The ith element from the sample
  • n: Sample size
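
The sample formula can be sketched the same way; the only change from the population version is the divisor, n - 1 instead of N. The helper name sample_variance and the data are hypothetical, and the result is checked against variance() from the statistics module.

```python
from math import isclose
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)                                     # n: sample size
    x_bar = sum(values) / n                             # sample mean
    squared_devs = [(x - x_bar) ** 2 for x in values]   # squared deviation of each element
    return sum(squared_devs) / (n - 1)

# hypothetical data, chosen so the arithmetic is easy to follow
values = [2, 4, 4, 4, 5, 5, 7, 9]

manual = sample_variance(values)
builtin = variance(values)
print(isclose(manual, builtin))
```

Here the squared deviations sum to 32 and n - 1 is 7, so the sample variance is 32 / 7.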

We can use the variance() and pvariance() functions from the statistics library in Python to quickly calculate the sample variance and population variance (respectively) for a given list of values. In the following syntax, x is a list of numbers:

from statistics import variance, pvariance

#calculate sample variance
variance(x)

#calculate population variance
pvariance(x)
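
One practical difference between the two functions is the minimum amount of data they accept. The sketch below, based on the documented behavior of the statistics module, uses a hypothetical one-element list: pvariance() returns zero, while variance() raises a StatisticsError because dividing by n - 1 is impossible when n is 1.

```python
from statistics import StatisticsError, variance, pvariance

single = [5]  # hypothetical one-element data set

# the population variance of a single value is defined and equals zero
pop_var = pvariance(single)

# the sample variance divides by n - 1, so one value is not enough
try:
    variance(single)
    sample_error = None
except StatisticsError as err:
    sample_error = str(err)

print(pop_var)
print(sample_error)
```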

The following examples show how to use each function in practice.

Example 1: Calculating Sample Variance in Python

The following code shows how to calculate the sample variance of an array in Python:

from statistics import variance 

#define data
data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]

#calculate sample variance
variance(data)

22.067

The sample variance turns out to be 22.067, rounded to three decimal places.

Example 2: Calculating Population Variance in Python

The following code shows how to calculate the population variance of an array in Python:

from statistics import pvariance 

#define data
data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]

#calculate population variance
pvariance(data)

20.596

The population variance turns out to be 20.596, rounded to three decimal places.

Notes on Calculating Sample & Population Variance

Keep in mind the following when calculating the sample and population variance:

  • You should calculate the population variance when the dataset you’re working with represents an entire population, i.e. every value that you’re interested in.
  • You should calculate the sample variance when the dataset you’re working with represents a sample taken from a larger population of interest.
  • The sample variance of a given array of data will be larger than the population variance for the same array whenever the values are not all identical. Both formulas use the same sum of squared deviations, but the sample variance divides it by n-1 instead of n to account for the extra uncertainty of estimating the mean from a sample.
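
The relationship between the two results can be checked directly. The sketch below reuses the data from the examples above and shows that the sample variance equals the population variance multiplied by n / (n - 1). The constant list is a hypothetical edge case: with no spread at all, both variances are zero.

```python
from math import isclose
from statistics import variance, pvariance

# same data as in the examples above
data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]
n = len(data)

sample_var = variance(data)
population_var = pvariance(data)

# the two results differ only by the factor n / (n - 1)
rescaled = population_var * n / (n - 1)
same = isclose(sample_var, rescaled)
print(same)

# hypothetical constant data: no spread, so both variances are zero
constant = [3, 3, 3, 3]
print(variance(constant), pvariance(constant))
```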
