How to Calculate Sample & Population Variance in R?

In R, the var() function can be used to calculate the sample variance, while the var.test() function is used to calculate the population variance. To calculate the sample variance, the user must provide the data set as an input to the function. To calculate the population variance, the user must provide the data set, the degrees of freedom, and the expected mean as inputs to the var.test() function. Both functions return the variance of the data set as output.


The variance is a way to measure how spread out data values are around the mean.

The formula to find the variance of a population is:

σ2 = Σ (xi – μ)2 / N

where μ is the population mean, xi is the ith element from the population, N is the population size, and Σ is just a fancy symbol that means “sum.”

The formula to find the variance of a sample is:

s2 = Σ (xix)2 / (n-1)

where x is the sample mean, xi is the ith element in the sample, and n is the sample size.

Example: Calculate Sample & Population Variance in R

Suppose we have the following dataset in R:

#define dataset
data <- c(2, 4, 4, 7, 8, 12, 14, 15, 19, 22)

We can calculate the sample variance by using the var() function in R:

#calculate sample variance
var(data)

[1] 46.01111

And we can calculate the population variance by simply multiplying the sample variance by (n-1)/n as follows:

#determine length of data
n <- length(data)

#calculate population variance
var(data) * (n-1)/n

[1] 41.41

Note that the population variance will always be smaller than the sample variance.

In practice, we typically calculate sample variances for datasets since it’s unusual to collect data for an entire population.

Example: Calculate Sample Variance of Multiple Columns

#create data frame
data <- data.frame(a=c(1, 3, 4, 4, 6, 7, 8, 12),
                   b=c(2, 4, 4, 5, 5, 6, 7, 16),
                   c=c(6, 6, 7, 8, 8, 9, 9, 12))

#view data frame
data

   a  b  c
1  1  2  6
2  3  4  6
3  4  4  7
4  4  5  8
5  6  5  8
6  7  6  9
7  8  7  9
8 12 16 12

We can use the sapply() function to calculate the sample variance of each column in the data frame:

#find sample variance of each column
sapply(data, var)

        a         b         c 
11.696429 18.125000  3.839286 

And we can use the following code to calculate the sample standard deviation of each column, which is simply the square root of the sample variance:

#find sample standard deviation of each column
sapply(data, sd)

       a        b        c 
3.420004 4.257347 1.959410 

You can find more R tutorials here.

x