How Do I Sum Columns Based on a Condition in R?

In R, you can sum a column for only the rows that meet a condition by subsetting the data frame with which() and passing the result to sum(). To get a conditional sum for every group at once, you can use aggregate() instead, which groups the data by one or more variables and sums the values within each group. Both approaches are useful for summarizing data.


You can use the following basic syntax to sum columns based on a condition in R:

#sum values in column 3 where col1 is equal to 'A'
sum(df[which(df$col1=='A'), 3])

The examples below show how to use this syntax in practice with the following data frame:

#create data frame
df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))

#view data frame
df

  conference team points rebounds
1       East    A     11        7
2       East    A      8        7
3       East    A     10        6
4       West    B      6        9
5       West    B      6       12
6       East    C      5        8

Example 1: Sum One Column Based on One Condition

The following code shows how to find the sum of the points column for the rows where team is equal to ‘A’:

#sum values in column 3 (points column) where team is equal to 'A'
sum(df[which(df$team=='A'), 3])

[1] 29

The following code shows how to find the sum of the rebounds column for the rows where points is greater than 9:

#sum values in column 4 (rebounds column) where points is greater than 9
sum(df[which(df$points > 9), 4])

[1] 13
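The examples above select the column by position (3 or 4), which gives the wrong result if the columns are ever reordered. The following is a minimal sketch of the same approach using the column name instead. It also illustrates why which() is wrapped around the condition: it drops rows where the condition is NA. The df_na data frame and the variable names are hypothetical and used only for illustration.

```r
# same data frame as above
df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))

# refer to the column by name instead of by position
points_a <- sum(df[which(df$team == 'A'), 'points'])
points_a  # 29

# hypothetical variant of df with a missing team value in row 2
df_na <- df
df_na$team[2] <- NA

# without which(), the NA in the condition selects an NA value, so the sum is NA
sum_without_which <- sum(df_na[df_na$team == 'A', 'points'])
sum_without_which  # NA

# which() keeps only the rows where the condition is TRUE (rows 1 and 3 here)
sum_with_which <- sum(df_na[which(df_na$team == 'A'), 'points'])
sum_with_which  # 21
```

If the summed column itself can contain missing values, sum() also accepts na.rm = TRUE to ignore them.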

Example 2: Sum One Column Based on Multiple Conditions

The following code shows how to find the sum of the points column for the rows where team is equal to ‘A’ and conference is equal to ‘East’:

#sum values in column 3 (points column) where team is 'A' and conference is 'East'
sum(df[which(df$team=='A' & df$conference=='East'), 3])

[1] 29

Note that & is the element-wise “and” operator in R, so a row is included only when both conditions are TRUE.
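When the sum is needed for every team and conference combination rather than for a single one, aggregate() may be a better fit than repeating the condition by hand. The following is a minimal sketch under that assumption; the name group_sums is hypothetical.

```r
# same data frame as above
df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))

# sum points within every team/conference combination that occurs in the data
group_sums <- aggregate(points ~ team + conference, data = df, FUN = sum)

# view one row per combination with its summed points
group_sums
```

The row for team 'A' in conference 'East' should match the value of 29 computed above.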

Example 3: Sum One Column Based on One of Several Conditions

The following code shows how to find the sum of the points column for the rows where team is equal to ‘A’ or ‘C’:

#sum values in column 3 (points column) where team is 'A' or 'C'
sum(df[which(df$team == 'A' | df$team =='C'), 3])

[1] 34

Note that | is the element-wise “or” operator in R, so a row is included when at least one of the conditions is TRUE.
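When the list of accepted values grows, chaining many | comparisons becomes hard to read. One possible alternative is the %in% operator, sketched below for the same 'A' or 'C' condition. The name points_ac is hypothetical.

```r
# same data frame as above
df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))

# sum points where team matches any value in the vector
points_ac <- sum(df[df$team %in% c('A', 'C'), 'points'])
points_ac  # 34
```

Because %in% returns FALSE rather than NA for missing values, wrapping the condition in which() is optional here.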
