What is the purpose of apply(), lapply(), sapply(), and tapply() in R and how can they be used in data analysis?

The apply(), lapply(), sapply(), and tapply() functions in R are useful tools for data analysis. They all serve the purpose of applying a function to a vector or data frame in a structured manner. This allows for efficient and consistent data manipulation and analysis.

The apply() function takes in a data frame or matrix and applies a function to each row or column, depending on the specified argument. This can be useful for summarizing or transforming data.

The lapply() function applies a function to each element of a list and returns a list of the results. This can be helpful for performing the same operation on multiple elements of a list.

The sapply() function is similar to lapply(), but it simplifies the output by converting it into a vector or matrix. This can be useful for creating summary statistics or applying a function to multiple elements and returning a single vector or matrix.

Lastly, the tapply() function applies a function to subsets of a vector or data frame based on a specified factor variable. This can be useful for performing calculations or analyses on different groups within a dataset.

Overall, these functions are powerful tools for data analysis as they allow for efficient and consistent application of functions to large datasets, making it easier to extract meaningful insights and draw conclusions.

A Guide to apply(), lapply(), sapply(), and tapply() in R


This tutorial explains the differences between the built-in R functions apply(), sapply(), lapply(), and tapply() along with examples of when and how to use each function.

apply()

Use the apply() function when you want to apply a function to the rows or columns of a matrix or data frame.

The basic syntax for the apply() function is as follows:

apply(X, MARGIN, FUN)

  • X is the name of the matrix or data frame
  • MARGIN indicates which dimension to perform an operation across (1 = row, 2 = column)
  • FUN is the specific operation you want to perform (e.g. min, max, sum, mean, etc.)

The following code illustrates several examples of apply() in action.

#create a data frame with three columns and five rows
data <- data.frame(a = c(1, 3, 7, 12, 9),
                   b = c(4, 4, 6, 7, 8),
                   c = c(14, 15, 11, 10, 6))
data

#   a b  c
#1  1 4 14
#2  3 4 15
#3  7 6 11
#4 12 7 10
#5  9 8  6

#find the sum of each row
apply(data, 1, sum)

#[1] 19 22 24 29 23

#find the sum of each column
apply(data, 2, sum)

# a  b  c 
#32 29 56 

#find the mean of each row
apply(data, 1, mean)

#[1] 6.333333 7.333333 8.000000 9.666667 7.666667

#find the mean of each column, rounded to one decimal place
round(apply(data, 2, mean), 1)

#  a    b     c 
#6.4  5.8  11.2 

#find the standard deviation of each row
apply(data, 1, sd)

#[1] 6.806859 6.658328 2.645751 2.516611 1.527525

#find the standard deviation of each column
apply(data, 2, sd)

#       a        b        c 
#4.449719 1.788854 3.563706 

lapply()

Use the lapply() function when you want to apply a function to each element of a list, vector, or data frame and obtain a list as a result.

The basic syntax for the lapply() function is as follows:

lapply(X, FUN)

  • X is the name of the list, vector, or data frame
  • FUN is the specific operation you want to perform

The following code illustrates several examples of using lapply() on the columns of a data frame.

#create a data frame with three columns and five rows
data <- data.frame(a = c(1, 3, 7, 12, 9),
                   b = c(4, 4, 6, 7, 8),
                   c = c(14, 15, 11, 10, 6))
data

#   a b  c
#1  1 4 14
#2  3 4 15
#3  7 6 11
#4 12 7 10
#5  9 8  6

#find mean of each column and return results as a list
lapply(data, mean)

# $a
# [1] 6.4
#
# $b
# [1] 5.8
#
# $c
# [1] 11.2

#multiply values in each column by 2 and return results as a list
lapply(data, function(data) data*2)

# $a
# [1]  2  6 14 24 18
#
# $b
# [1]  8  8 12 14 16
#
# $c
# [1] 28 30 22 20 12

We can also use lapply() to perform operations on lists. The following examples show how to do so.

#create a list
x <- list(a=1, b=1:5, c=1:10) 
x

# $a
# [1] 1
#
# $b
# [1] 1 2 3 4 5
#
# $c
# [1]  1  2  3  4  5  6  7  8  9 10

#find the sum of each element in the list
lapply(x, sum)

# $a
# [1] 1
#
# $b
# [1] 15
#
# $c
# [1] 55

#find the mean of each element in the list
lapply(x, mean)

# $a
# [1] 1
#
# $b
# [1] 3
#
# $c
# [1] 5.5

#multiply values of each element by 5 and return results as a list
lapply(x, function(x) x*5)

# $a
# [1] 5
#
# $b
# [1]  5 10 15 20 25
#
# $c
# [1]  5 10 15 20 25 30 35 40 45 50

sapply()

Use the sapply() function when you want to apply a function to each element of a list, vector, or data frame and obtain a vector instead of a list as a result.

The basic syntax for the sapply() function is as follows:

sapply(X, FUN)

  • X is the name of the list, vector, or data frame
  • FUN is the specific operation you want to perform

The following code illustrates several examples of using sapply() on the columns of a data frame.

#create a data frame with three columns and five rows
data <- data.frame(a = c(1, 3, 7, 12, 9),
                   b = c(4, 4, 6, 7, 8),
                   c = c(14, 15, 11, 10, 6))
data

#   a b  c
#1  1 4 14
#2  3 4 15
#3  7 6 11
#4 12 7 10
#5  9 8  6

#find mean of each column and return results as a vector
sapply(data, mean)

#  a   b    c 
#6.4 5.8 11.2 

#multiply values in each column by 2 and return results as a matrix
sapply(data, function(data) data*2)

#      a  b  c
#[1,]  2  8 28
#[2,]  6  8 30
#[3,] 14 12 22
#[4,] 24 14 20
#[5,] 18 16 12

We can also use sapply() to perform operations on lists. The following examples show how to do so.

#create a list
x <- list(a=1, b=1:5, c=1:10) 
x

# $a
# [1] 1
#
# $b
# [1] 1 2 3 4 5
#
# $c
# [1]  1  2  3  4  5  6  7  8  9 10

#find the sum of each element in the list
sapply(x, sum)

# a  b  c 
# 1 15 55 

#find the mean of each element in the list
sapply(x, mean)

#  a   b   c 
#1.0 3.0 5.5

tapply()

Use the tapply() function when you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.

The basic syntax for the tapply() function is as follows:

tapply(X, INDEX, FUN)

  • X is the name of the object, typically a vector
  • INDEX is a list of one or more factors
  • FUN is the specific operation you want to perform

The following code illustrates an example of using tapply() on the built-in R dataset iris.

#view first six lines of iris dataset
head(iris)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

#find the max Sepal.Length of each of the three Species
tapply(iris$Sepal.Length, iris$Species, max)

#setosa versicolor  virginica 
#   5.8        7.0        7.9 

#find the mean Sepal.Width of each of the three Species
tapply(iris$Sepal.Width, iris$Species, mean)

# setosa versicolor virginica 
#  3.428      2.770     2.974 

#find the minimum Petal.Width of each of the three Species
tapply(iris$Petal.Width, iris$Species, min)

#  setosa versicolor virginica 
#     0.1        1.0       1.4 
x