How can I loop through column names in R and perform operations on each column? Can you provide some examples?

In R, you can loop through the column names of a data frame to perform an operation on each column. A common approach is a for loop over colnames(df), which runs the same block of code once per column name. For example, you can call mean() inside the loop to calculate the mean of each column, or use the + operator inside the loop to create new columns that combine existing ones. This lets you apply the same operation to many columns without repeating code.
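
As a minimal sketch of the second idea, the loop below adds var1 to every other column and stores each sum in a new column. The small data frame and the "var1_plus_" naming scheme are assumptions made up for illustration, not part of any required convention.

```r
# hypothetical example data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, 8, 3, 2),
                 var3=c(3, 3, 6, 6, 8))

# for every column except var1, add a new column holding var1 + that column
# (colnames(df) is evaluated once, so the new columns are not looped over)
for (i in setdiff(colnames(df), "var1")) {
    df[[paste0("var1_plus_", i)]] <- df[["var1"]] + df[[i]]
}

# view the result
df
```

Using df[[i]] rather than df$i matters here, because the column name is held in a variable rather than typed literally.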

Loop Through Column Names in R (With Examples)


Often you may want to loop through the column names of a data frame in R and perform some operation on each column. There are two common ways to do this:

Method 1: Use a For Loop

for (i in colnames(df)) {
   # some operation on df[[i]]
}

Method 2: Use sapply()

sapply(df, some operation)

This tutorial shows an example of how to use each of these methods in practice.

Method 1: Use a For Loop

The following code shows how to loop through the column names of a data frame using a for loop and output the mean value of each column:

#create data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, 8, 3, 2),
                 var3=c(3, 3, 6, 6, 8),
                 var4=c(1, 1, 2, 8, 9))

#view data frame
df

  var1 var2 var3 var4
1    1    7    3    1
2    3    7    3    1
3    3    8    6    2
4    4    3    6    8
5    5    2    8    9

#loop through each column and print mean of column
for (i in colnames(df)){
    print(mean(df[[i]]))
}

[1] 3.2
[1] 5.4
[1] 5.2
[1] 4.2

Method 2: Use sapply()

The following code shows how to use sapply() to apply a function to each column of a data frame and output the mean value of each column. Note that sapply() iterates over the columns themselves rather than over their names:

#create data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, 8, 3, 2),
                 var3=c(3, 3, 6, 6, 8),
                 var4=c(1, 1, 2, 8, 9))

#view data frame
df

  var1 var2 var3 var4
1    1    7    3    1
2    3    7    3    1
3    3    8    6    2
4    4    3    6    8
5    5    2    8    9

#apply mean() to each column and return the results as a named vector
sapply(df, mean)

var1 var2 var3 var4 
 3.2  5.4  5.2  4.2 

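Notice that the two methods produce the same mean values. The for loop prints each mean on its own line, while sapply() returns a single named numeric vector.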
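
If the operation needs the column name itself, or is more involved than a single built-in function, sapply() can also be given the column names together with an anonymous function. The sketch below assumes the same example data frame and computes the range (maximum minus minimum) of each column. The names nm and col_ranges are illustrative only:

```r
# same example data frame as above
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, 8, 3, 2),
                 var3=c(3, 3, 6, 6, 8),
                 var4=c(1, 1, 2, 8, 9))

# iterate over the column names and compute max - min for each column
col_ranges <- sapply(colnames(df), function(nm) {
    max(df[[nm]]) - min(df[[nm]])
})

# view ranges (named by column)
col_ranges
```

Because the input is a character vector, sapply() uses the column names as the names of the result by default.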

