How can we calculate the standard deviation in R? Can you provide some examples?

The standard deviation measures how far the values in a dataset are spread around their mean. In R, it can be calculated with the “sd()” function, which takes a numeric vector as input and returns the sample standard deviation. To use it with a data frame, pass a single column or apply “sd()” to each column, because calling “sd()” on a whole data frame produces an error. Some examples of calculating the standard deviation in R include:

1. Calculating the standard deviation of a vector “x”:
sd(x)

2. Calculating the standard deviation of a column “y” in a data frame “df”:
sd(df$y)

3. Calculating the standard deviation of a subset of data in a data frame “df” using a condition “x>5”:
sd(df$x[df$x>5])

The “sd()” function returns the sample standard deviation in a single call, which summarizes how much the values vary around their mean and helps when comparing the variability of different variables or subsets.
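As a minimal sketch of the third case, assuming a small made-up data frame “df” (the values are purely illustrative), the conditional subset can be computed in two equivalent ways:

```r
# Hypothetical data frame, for illustration only
df <- data.frame(x = c(2, 4, 6, 8, 10, 12))

# Keep only the values of x greater than 5, then take their standard deviation
subset_values <- df$x[df$x > 5]
subset_sd <- sd(subset_values)

# The same calculation using subset() from base R
subset_sd_alt <- sd(subset(df, x > 5)$x)

print(subset_values)
print(subset_sd)
print(all.equal(subset_sd, subset_sd_alt))
```

If the column contains missing values, the logical condition also yields NA for those rows and the subset will contain NA, so sd() would need na.rm = TRUE in that case.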

Calculate Standard Deviation in R (With Examples)


You can use the following syntax to calculate the standard deviation of a vector in R:

sd(x)

Note that this function calculates the sample standard deviation using the following formula:

s = √( Σ (xi – μ)² / (n – 1) )

where:

  • Σ: A fancy symbol that means “sum”
  • xi: The ith value in the dataset
  • μ: The mean value of the dataset
  • n: The sample size
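To make the formula concrete, here is a minimal sketch that computes the sample standard deviation step by step (sum of squared deviations from the mean, divided by n - 1, then the square root) and compares it with sd(). The variable names are chosen for illustration; the vector is the one used in Example 1.

```r
# Same vector as in Example 1
x <- c(1, 3, 4, 6, 11, 14, 17, 20, 22, 23)

n <- length(x)               # sample size
mu <- mean(x)                # mean of the values
sum_sq <- sum((x - mu)^2)    # sum of squared deviations from the mean

# Divide by n - 1, then take the square root
manual_sd <- sqrt(sum_sq / (n - 1))

# Compare the manual result with the built-in function
print(manual_sd)
print(sd(x))
print(all.equal(manual_sd, sd(x)))
```

Dividing by n - 1 rather than n is what makes this the sample (rather than population) standard deviation.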

The following examples show how to use this function in practice.

Example 1: Calculate Standard Deviation of Vector

The following code shows how to calculate the standard deviation of a single vector in R:

#create dataset
data <- c(1, 3, 4, 6, 11, 14, 17, 20, 22, 23)

#find standard deviation
sd(data)

[1] 8.279157

Note that you must use na.rm = TRUE to calculate the standard deviation if there are missing values in the dataset:

#create dataset with missing values
data <- c(1, 3, 4, 6, NA, 14, NA, 20, 22, 23)

#attempt to find standard deviation
sd(data)

[1] NA

#find standard deviation and specify to ignore missing values
sd(data, na.rm = TRUE)

[1] 9.179753
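As a quick check of what na.rm = TRUE does, the following sketch counts the missing values and confirms that the result matches dropping them by hand first. The helper variable names are illustrative.

```r
data <- c(1, 3, 4, 6, NA, 14, NA, 20, 22, 23)

n_missing <- sum(is.na(data))    # number of missing values
n_used <- sum(!is.na(data))      # number of values that enter the calculation

# na.rm = TRUE should give the same result as removing NA values manually
sd_na_rm <- sd(data, na.rm = TRUE)
sd_dropped <- sd(data[!is.na(data)])

print(n_missing)
print(n_used)
print(all.equal(sd_na_rm, sd_dropped))
```

Because the missing values are removed before the calculation, n in the formula is the number of non-missing values, not the length of the original vector.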

Example 2: Calculate Standard Deviation of Column in Data Frame

The following code shows how to calculate the standard deviation of a single column in a data frame:

#create data frame
data <- data.frame(a=c(1, 3, 4, 6, 8, 9),
                   b=c(7, 8, 8, 7, 13, 16),
                   c=c(11, 13, 13, 18, 19, 22),
                   d=c(12, 16, 18, 22, 29, 38))

#find standard deviation of column a
sd(data$a)

[1] 3.060501

Example 3: Calculate Standard Deviation of Several Columns in Data Frame

The following code shows how to calculate the standard deviation of several columns in a data frame:

#create data frame
data <- data.frame(a=c(1, 3, 4, 6, 8, 9),
                   b=c(7, 8, 8, 7, 13, 16),
                   c=c(11, 13, 13, 18, 19, 22),
                   d=c(12, 16, 18, 22, 29, 38))

#find standard deviation of specific columns in data frame
apply(data[ , c('a', 'c', 'd')], 2, sd)

       a        c        d 
3.060501 4.289522 9.544632 
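As an alternative sketch, sapply() applies sd() to each selected column directly, without first converting the data frame to a matrix as apply() does. For an all-numeric data frame like this one, both approaches should give the same values.

```r
# Same data frame as above
data <- data.frame(a=c(1, 3, 4, 6, 8, 9),
                   b=c(7, 8, 8, 7, 13, 16),
                   c=c(11, 13, 13, 18, 19, 22),
                   d=c(12, 16, 18, 22, 29, 38))

# Standard deviation of selected columns, one column at a time
sd_by_column <- sapply(data[ , c('a', 'c', 'd')], sd)

# Result from apply(), for comparison
sd_by_apply <- apply(data[ , c('a', 'c', 'd')], 2, sd)

print(sd_by_column)
print(all.equal(sd_by_column, sd_by_apply))
```

To cover every column in the data frame, sapply(data, sd) works the same way as long as all columns are numeric.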
