In base R, you can find the maximum value by group with the aggregate() function. This function splits the data into groups defined by one or more variables and applies a summary function to each group. The basic syntax is aggregate(x, by, FUN), where x is the data to summarize, by is a list of grouping variables, and FUN is the function applied to each group (here, max). The result is a new data frame with one row per group that holds the maximum value for that group. This is useful for comparing groups and identifying the highest value within each one.
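As a minimal sketch of the aggregate(x, by, FUN) syntax described above, the following uses a small data frame of teams, positions, and points. The names max_by_group and max_points are chosen here for illustration only.

```r
# Illustrative data frame of players
df <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position = c('G', 'F', 'F', 'G', 'G', 'G', 'F'),
                 points = c(12, 15, 19, 22, 34, 34, 39))

# x: values to summarize
# by: a named list of grouping vectors
# FUN: the summary function applied to each group
max_by_group <- aggregate(x = df$points,
                          by = list(team = df$team, position = df$position),
                          FUN = max)

# The summarized column is called "x" by default, so give it a clearer name
names(max_by_group)[names(max_by_group) == "x"] <- "max_points"

max_by_group
```

Naming the elements of the by list keeps the grouping columns readable in the result. Without names they come back as Group.1 and Group.2.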
Find the Maximum Value by Group in R
Often you may want to find the maximum value of each group in a data frame in R. Fortunately this is easy to do using functions from the dplyr package.
This tutorial explains how to do so using the following data frame:
#create data frame
df <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position = c('G', 'F', 'F', 'G', 'G', 'G', 'F'),
                 points = c(12, 15, 19, 22, 34, 34, 39))

#view data frame
df

  team position points
1    A        G     12
2    A        F     15
3    A        F     19
4    B        G     22
5    B        G     34
6    B        G     34
7    B        F     39
Example 1: Find Max Value by Group
The following code shows how to find the max value by team and position:
library(dplyr)

#find max value by team and position
df %>%
  group_by(team, position) %>%
  summarise(max = max(points, na.rm=TRUE))

# A tibble: 4 x 3
# Groups:   team [?]
  team  position   max
1 A     F         19.0
2 A     G         12.0
3 B     F         39.0
4 B     G         34.0
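The na.rm=TRUE argument has no visible effect above because the data frame has no missing values. The sketch below, using a hypothetical data frame df_na that is not part of the original example, shows what the argument changes when a group contains an NA.

```r
library(dplyr)

# Hypothetical data with one missing score on team A
df_na <- data.frame(team = c('A', 'A', 'B', 'B'),
                    points = c(12, NA, 22, 34))

result <- df_na %>%
  group_by(team) %>%
  summarise(max_default = max(points),              # NA if any value in the group is missing
            max_na_rm = max(points, na.rm = TRUE))  # missing values are dropped first

result
```

For team A, max_default is NA while max_na_rm is 12. Team B has no missing values, so both columns agree.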
Example 2: Return Rows that Contain Max Value by Group
The following code shows how to return the rows that contain the max value by team and position:
library(dplyr)

#find rows that contain max points by team and position
df %>%
  group_by(team, position) %>%
  filter(points == max(points, na.rm=TRUE))

# A tibble: 5 x 3
# Groups:   team, position [4]
  team  position points
1 A     G          12.0
2 A     F          19.0
3 B     G          34.0
4 B     G          34.0
5 B     F          39.0
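If you would rather not load dplyr, one possible base R equivalent uses ave() to compute each row's group maximum and then keeps the rows that match it. This is a sketch of an alternative, not the method used in this tutorial. The names group_max and rows_with_max are illustrative.

```r
# Same data frame as in the tutorial
df <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position = c('G', 'F', 'F', 'G', 'G', 'G', 'F'),
                 points = c(12, 15, 19, 22, 34, 34, 39))

# For every row, ave() returns the max of that row's team/position group
group_max <- ave(df$points, df$team, df$position, FUN = max)

# Keep the rows whose points equal their group maximum (ties are kept)
rows_with_max <- df[df$points == group_max, ]

rows_with_max
```

As with filter(), tied rows are all returned, so both players on team B in position G with 34 points appear in the result.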
Example 3: Return a Single Row that Contains Max Value by Group
In the previous example, two players on team B who were both in position G tied for the max number of points, so both rows were returned. If you only want to return the first player with the max value in each group, you can use the slice() function with which.max() as follows:
library(dplyr)

#return only the first row that contains max points by team and position
df %>%
  group_by(team, position) %>%
  slice(which.max(points))

# A tibble: 4 x 3
# Groups:   team, position [4]
  team  position points
1 A     F          19.0
2 A     G          12.0
3 B     F          39.0
4 B     G          34.0
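In dplyr 1.0.0 and later, slice_max() offers another way to get a single row per group. The sketch below assumes such a version is installed. The name top_rows is illustrative.

```r
library(dplyr)

# Same data frame as in the tutorial
df <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position = c('G', 'F', 'F', 'G', 'G', 'G', 'F'),
                 points = c(12, 15, 19, 22, 34, 34, 39))

top_rows <- df %>%
  group_by(team, position) %>%
  slice_max(points, n = 1, with_ties = FALSE) %>%  # keep one row even when values tie
  ungroup()

top_rows
```

Setting with_ties = FALSE is what limits each group to one row. With the default with_ties = TRUE, both tied rows for team B in position G would be returned.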
Additional Resources
The Complete Guide: How to Group & Summarize Data in R
How to Filter Rows in R
How to Remove Duplicate Rows in R