How to Group Data by Year in R (With Example)

In R, data can be grouped by year using the group_by() function from the dplyr package. This function groups rows by the values of one or more variables, in this case the year. For example, if we have a dataset containing sales data from the past 10 years that already includes a year column, we can group the data by year using the following code:

library(dplyr)
#group sales data by its existing year column
sales_data %>%
    group_by(year)

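This returns a grouped data frame with one group per year. group_by() does not change the rows themselves; it tells subsequent functions such as summarize() to operate on each year separately, which makes analyzing the data by year easier.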
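The snippet above assumes sales_data already has a year column. If your data only has a date, one option is to derive the year with mutate() before grouping. The sketch below uses a small, made-up sales_data (columns date and amount are hypothetical) and base R's format() to extract the year; lubridate::year() would work equally well. n_groups() and group_vars() then confirm that the grouping was recorded.

```r
library(dplyr)

# Hypothetical sales data: a Date column and a sale amount, no year column yet
sales_data <- data.frame(
  date   = as.Date(c("2021-03-01", "2021-07-15", "2022-01-10", "2023-05-20")),
  amount = c(100, 250, 175, 300)
)

# Derive a year column from the date, then group by it
grouped <- sales_data %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(year)

# group_by() leaves the rows unchanged and only records the grouping
n_groups(grouped)    # number of distinct years in the data
group_vars(grouped)  # name of the grouping column
```

The grouped data frame still has all four rows; only functions applied afterward, like summarize(), see the three year groups.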


You can also use the year() function from the lubridate package to group data by year when your data only contains a date column.

This approach uses the following basic syntax:

library(tidyverse)

#extract the year from each date, then sum values within each year
df %>% 
    group_by(year = lubridate::year(date_column)) %>%
    summarize(sum = sum(value_column))
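To see what the grouping key looks like, here is a small sketch on a few made-up dates: lubridate::year() returns each date's year as a number, and base R's format() with "%Y" gives the same values if you'd rather avoid the lubridate dependency.

```r
# A few example dates (hypothetical)
dates <- as.Date(c("2021-01-04", "2022-02-10", "2023-03-27"))

# lubridate::year() extracts the year as a number
years_lubridate <- lubridate::year(dates)

# Base R equivalent: format each date as a four-digit year, then convert to integer
years_base <- as.integer(format(dates, "%Y"))

years_lubridate
years_base
```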

The following example shows how to use this function in practice.

Example: Group Data by Year in R

Suppose we have the following data frame in R that shows the total sales of some item on various dates:

#create data frame 
df <- data.frame(date=as.Date(c('1/4/2021', '1/9/2021', '2/10/2022', '2/15/2022',
                                '3/5/2022', '3/22/2023', '3/27/2023'), '%m/%d/%Y'),
                 sales=c(8, 14, 22, 23, 16, 17, 23))

#view data frame
df

        date sales
1 2021-01-04     8
2 2021-01-09    14
3 2022-02-10    22
4 2022-02-15    23
5 2022-03-05    16
6 2023-03-22    17
7 2023-03-27    23

We can use the following code to calculate the sum of sales, grouped by year:

library(tidyverse)

#group data by year and sum sales
df %>% 
    group_by(year = lubridate::year(date)) %>%
    summarize(sum_sales = sum(sales))

# A tibble: 3 x 2
   year sum_sales
  <dbl>     <dbl>
1  2021        22
2  2022        61
3  2023        40

From the output we can see:

  • A total of 22 sales were made in 2021.
  • A total of 61 sales were made in 2022.
  • A total of 40 sales were made in 2023.
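If you'd rather not load the tidyverse, base R's aggregate() function is one alternative that produces the same yearly totals. This is a swapped-in base R technique rather than the approach used above; the sketch rebuilds the example df and extracts the year with format(date, "%Y") instead of lubridate::year().

```r
# Recreate the example data frame
df <- data.frame(date=as.Date(c('1/4/2021', '1/9/2021', '2/10/2022', '2/15/2022',
                                '3/5/2022', '3/22/2023', '3/27/2023'), format='%m/%d/%Y'),
                 sales=c(8, 14, 22, 23, 16, 17, 23))

# Add a numeric year column extracted from each date
df$year <- as.integer(format(df$date, "%Y"))

# Sum sales within each year using aggregate()'s formula interface
yearly <- aggregate(sales ~ year, data = df, FUN = sum)
yearly
```

aggregate() returns a plain data frame with one row per year, so it fits well in scripts that avoid extra package dependencies.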

We can also aggregate the data using a different metric.

For example, we could calculate the maximum sales made on a single day, grouped by year:

library(tidyverse)

#group data by year and find max sales
df %>% 
    group_by(year = lubridate::year(date)) %>%
    summarize(max_sales = max(sales))

# A tibble: 3 x 2
   year max_sales
  <dbl>     <dbl>
1  2021        14
2  2022        23
3  2023        23

From the output we can see:

  • The max sales made in one day in 2021 was 14.
  • The max sales made in one day in 2022 was 23.
  • The max sales made in one day in 2023 was 23.
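If you also want to know which date each yearly maximum occurred on, one option (an extension beyond the original example) is dplyr's slice_max(), which keeps the top row(s) within each group. This sketch sets with_ties = FALSE so exactly one row is returned per year even if two days share the maximum.

```r
library(dplyr)

# Recreate the example data frame
df <- data.frame(date=as.Date(c('1/4/2021', '1/9/2021', '2/10/2022', '2/15/2022',
                                '3/5/2022', '3/22/2023', '3/27/2023'), format='%m/%d/%Y'),
                 sales=c(8, 14, 22, 23, 16, 17, 23))

# Keep the single highest-sales row within each year
best_days <- df %>%
  group_by(year = lubridate::year(date)) %>%
  slice_max(sales, n = 1, with_ties = FALSE) %>%
  ungroup()

best_days
```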

Feel free to use whatever metric you’d like within the summarize() function, such as mean(), median(), or n().
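You can also compute several metrics at once by listing them inside a single summarize() call. The sketch below reuses the example df; the output column names (sum_sales, max_sales, mean_sales, n_days) are chosen for illustration.

```r
library(dplyr)

# Recreate the example data frame
df <- data.frame(date=as.Date(c('1/4/2021', '1/9/2021', '2/10/2022', '2/15/2022',
                                '3/5/2022', '3/22/2023', '3/27/2023'), format='%m/%d/%Y'),
                 sales=c(8, 14, 22, 23, 16, 17, 23))

# Compute several metrics per year in one summarize() call
yearly_stats <- df %>%
  group_by(year = lubridate::year(date)) %>%
  summarize(
    sum_sales  = sum(sales),
    max_sales  = max(sales),
    mean_sales = mean(sales),
    n_days     = n()  # number of rows (dates) recorded in each year
  )

yearly_stats
```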
