How can I group data by year in R, using the example provided?

Grouping data by year in R lets you summarize observations by calendar year. A common approach is the group_by() function from the dplyr package: you group on a year variable derived from a date column, then compute a statistic such as the sum or mean for each group. The result contains one row per year with the corresponding statistic, which makes trends and patterns over time easier to see. The same method applies to any data frame that has a date column.

Group Data by Year in R (With Example)


You can use the year() function from the lubridate package in R to quickly group data by year.

This function is typically combined with group_by() and summarize() from dplyr, using the following basic syntax:

library(tidyverse)

df %>% 
    group_by(year = lubridate::year(date_column)) %>%
    summarize(sum = sum(value_column))
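To make the grouping key in the syntax above concrete, here is a minimal sketch of what year() returns on its own. The vector name dates and its values are made up for illustration; the only assumption is that the lubridate package is installed.

```r
library(lubridate)

# Hypothetical Date vector, used only for this illustration
dates <- as.Date(c("2021-01-04", "2022-02-10", "2023-03-27"))

# year() extracts the calendar year from each date as a number
yrs <- year(dates)
yrs
```

Because year() returns one value per row, passing it to group_by() creates one group for each distinct year.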

The following example shows how to use this function in practice.

Example: Group Data by Year in R

Suppose we have the following data frame in R that shows the total sales of some item on various dates:

#create data frame 
df <- data.frame(date=as.Date(c('1/4/2021', '1/9/2021', '2/10/2022', '2/15/2022',
                                '3/5/2022', '3/22/2023', '3/27/2023'), '%m/%d/%Y'),
                 sales=c(8, 14, 22, 23, 16, 17, 23))

#view data frame
df

        date sales
1 2021-01-04     8
2 2021-01-09    14
3 2022-02-10    22
4 2022-02-15    23
5 2022-03-05    16
6 2023-03-22    17
7 2023-03-27    23

We can use the following code to calculate the sum of sales, grouped by year:

library(tidyverse)

#group data by year and sum sales
df %>% 
    group_by(year = lubridate::year(date)) %>%
    summarize(sum_sales = sum(sales))

# A tibble: 3 x 2
   year sum_sales
  <dbl>     <dbl>
1  2021        22
2  2022        61
3  2023        40

From the output we can see:

  • A total of 22 sales were made in 2021.
  • A total of 61 sales were made in 2022.
  • A total of 40 sales were made in 2023.

We can also aggregate the data using some other metric.

For example, we could calculate the max sales made in one day, grouped by year:

library(tidyverse)

#group data by year and find max sales
df %>% 
    group_by(year = lubridate::year(date)) %>%
    summarize(max_sales = max(sales))

# A tibble: 3 x 2
   year max_sales
  <dbl>     <dbl>
1  2021        14
2  2022        23
3  2023        23

From the output we can see:

  • The max sales made in one day in 2021 was 14.
  • The max sales made in one day in 2022 was 23.
  • The max sales made in one day in 2023 was 23.

Feel free to use whatever metric you’d like within the summarize() function.
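For instance, summarize() accepts several metrics in a single call. The sketch below is one possible variation, not part of the original example: it loads dplyr and lubridate directly instead of the full tidyverse, rebuilds the same data frame so it runs on its own, and the column names mean_sales and n_days are chosen here purely for illustration.

```r
library(dplyr)
library(lubridate)

# Same sales data as in the example, written with ISO-formatted dates
df <- data.frame(
  date  = as.Date(c("2021-01-04", "2021-01-09", "2022-02-10", "2022-02-15",
                    "2022-03-05", "2023-03-22", "2023-03-27")),
  sales = c(8, 14, 22, 23, 16, 17, 23)
)

# Group by year and compute two summaries in one summarize() call
yearly <- df %>%
  group_by(year = year(date)) %>%
  summarize(
    mean_sales = mean(sales),  # average sales per recorded date
    n_days     = n()           # number of rows (dates) in each year
  )

yearly
```

Each named argument to summarize() becomes one column in the output, so further metrics such as min() or median() can be added in the same way.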
