The standard deviation measures how far the values in a dataset spread around their mean. In R, the standard deviation can be calculated using the “sd()” function. This function takes a numeric vector as its input and returns the sample standard deviation of the data. To use it with a data frame, pass a single column rather than the whole data frame. Some examples of calculating the standard deviation in R include:
1. Calculating the standard deviation of a vector “x”:
sd(x)
2. Calculating the standard deviation of a column “y” in a data frame “df”:
sd(df$y)
3. Calculating the standard deviation of a subset of data in a data frame “df” using a condition “x>5”:
sd(df$x[df$x>5])
With the “sd()” function, calculating the standard deviation in R takes a single call, which makes it easy to check the variability of the data during an analysis.
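The three calls above assume that “x” and “df” already exist. As a minimal sketch, the snippet below builds a small data frame so the column and subset cases can be run as-is. The data frame and its values are made up for illustration and are not part of the original examples.

```r
# Hypothetical data frame, invented for illustration only
df <- data.frame(x = c(2, 4, 6, 8, 10, 12),
                 y = c(5, 7, 9, 11, 13, 15))

# Standard deviation of the whole y column
sd_y <- sd(df$y)

# Standard deviation of only the x values greater than 5
# (df$x > 5 builds a logical vector that selects 6, 8, 10, 12)
sd_subset <- sd(df$x[df$x > 5])

print(sd_y)
print(sd_subset)
```

Note that the subset is taken before sd() is called, so the sample size in the calculation is the number of values that pass the condition, not the number of rows in the data frame.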
Calculate Standard Deviation in R (With Examples)
You can use the following syntax to calculate the standard deviation of a vector in R:
sd(x)
Note that this function calculates the sample standard deviation using the following formula:
$$\sqrt{\frac{\sum (x_i - \mu)^2}{n - 1}}$$
where:
- Σ: A fancy symbol that means “sum”
- xi: The ith value in the dataset
- μ: The mean value of the dataset
- n: The sample size
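To make the formula concrete, the sketch below applies it term by term and compares the result with sd(). The vector and the variable names (manual_sd, builtin_sd) are chosen here for illustration only.

```r
# Hypothetical example vector (not taken from the examples in this article)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)   # sample size
m <- mean(x)     # mean value of the dataset

# Sum of squared deviations from the mean, divided by (n - 1), then square root
manual_sd <- sqrt(sum((x - m)^2) / (n - 1))

# Built-in function for comparison
builtin_sd <- sd(x)

# The two results should agree up to floating-point tolerance
print(isTRUE(all.equal(manual_sd, builtin_sd)))
```

Dividing by n instead of n - 1 would give the population standard deviation, which sd() does not compute.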
The following examples show how to use this function in practice.
Example 1: Calculate Standard Deviation of Vector
The following code shows how to calculate the standard deviation of a single vector in R:
#create dataset
data <- c(1, 3, 4, 6, 11, 14, 17, 20, 22, 23)

#find standard deviation
sd(data)

[1] 8.279157
Note that you must use na.rm = TRUE to calculate the standard deviation if there are missing values in the dataset:
#create dataset with missing values
data <- c(1, 3, 4, 6, NA, 14, NA, 20, 22, 23)

#attempt to find standard deviation
sd(data)

[1] NA

#find standard deviation and specify to ignore missing values
sd(data, na.rm = TRUE)

[1] 9.179753
Example 2: Calculate Standard Deviation of Column in Data Frame
The following code shows how to calculate the standard deviation of a single column in a data frame:
#create data frame
data <- data.frame(a=c(1, 3, 4, 6, 8, 9),
                   b=c(7, 8, 8, 7, 13, 16),
                   c=c(11, 13, 13, 18, 19, 22),
                   d=c(12, 16, 18, 22, 29, 38))

#find standard deviation of column a
sd(data$a)

[1] 3.060501
Example 3: Calculate Standard Deviation of Several Columns in Data Frame
The following code shows how to calculate the standard deviation of several columns in a data frame:
#create data frame
data <- data.frame(a=c(1, 3, 4, 6, 8, 9),
                   b=c(7, 8, 8, 7, 13, 16),
                   c=c(11, 13, 13, 18, 19, 22),
                   d=c(12, 16, 18, 22, 29, 38))

#find standard deviation of specific columns in data frame
apply(data[ , c('a', 'c', 'd')], 2, sd)

       a        c        d 
3.060501 4.289522 9.544632
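As an alternative sketch, sapply() can be used instead of apply() for the same task. apply() first converts the selected columns to a matrix, while sapply() calls sd() on each column directly. The variable name col_sds is chosen here for illustration, and this is one possible approach rather than the only one.

```r
# Same data frame as in the example above
data <- data.frame(a = c(1, 3, 4, 6, 8, 9),
                   b = c(7, 8, 8, 7, 13, 16),
                   c = c(11, 13, 13, 18, 19, 22),
                   d = c(12, 16, 18, 22, 29, 38))

# Apply sd() to each selected column; the result is a named numeric vector
col_sds <- sapply(data[c("a", "c", "d")], sd)

print(col_sds)
```

If the columns contain missing values, extra arguments can be passed through to sd(), for example sapply(data[c("a", "c", "d")], sd, na.rm = TRUE).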