How to Split Data into Equal Sized Groups in R

In R, the split() function divides a vector or data frame into groups defined by a factor (or by a vector that can be coerced to one). It takes two main arguments: the data to be split and the grouping variable. The result is a list with one element per group. Because split() does not compute equal sized groups on its own, you first need a grouping variable that assigns about the same number of observations to each group. The resulting groups can then be used for further analysis.


You can use the cut_number() function from the ggplot2 package in R to split a numeric vector into groups that each contain approximately the same number of observations.

This function uses the following basic syntax:

cut_number(x, n)

where:

  • x: Name of numeric vector to split
  • n: Number of groups
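To make the idea behind this syntax more concrete, here is a minimal base R sketch of roughly what such a call does: it places break points at evenly spaced quantiles of x and then bins the values with cut(). This is an illustration under that assumption, not the package's actual source code, and the variable names are chosen here for the example only.

```r
# points values from the example in this tutorial
x <- c(1, 2, 2, 2, 4, 5, 7, 9, 12, 14, 15, 22)
n <- 3

# n + 1 break points placed at evenly spaced quantiles of x
breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), names = FALSE)

# assign each value to the interval it falls into;
# include.lowest = TRUE keeps the minimum value in the first group
groups <- cut(x, breaks = breaks, include.lowest = TRUE)

# count the number of values in each group
table(groups)
```

Because the breaks are quantiles, each interval captures about the same number of values, even though the intervals themselves have different widths.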

The following example shows how to use this function in practice.

Example: How to Split Data into Equal Sized Groups in R

Suppose we have the following data frame in R that contains information about the points scored by 12 different basketball players:

#create data frame
df <- data.frame(player=LETTERS[1:12],
                 points=c(1, 2, 2, 2, 4, 5, 7, 9, 12, 14, 15, 22))

#view data frame
df

   player points
1       A      1
2       B      2
3       C      2
4       D      2
5       E      4
6       F      5
7       G      7
8       H      9
9       I     12
10      J     14
11      K     15
12      L     22


We can use the cut_number() function from the ggplot2 package to create a new column called group that assigns each row in the data frame to one of three groups based on the value in the points column:

library(ggplot2)

#create new column that splits data into three equal sized groups based on points
df$group <- cut_number(df$points, 3)

#view updated data frame
df

   player points     group
1       A      1  [1,3.33]
2       B      2  [1,3.33]
3       C      2  [1,3.33]
4       D      2  [1,3.33]
5       E      4 (3.33,10]
6       F      5 (3.33,10]
7       G      7 (3.33,10]
8       H      9 (3.33,10]
9       I     12   (10,22]
10      J     14   (10,22]
11      K     15   (10,22]
12      L     22   (10,22]

Each of the 12 players has been placed into one of three groups based on the value in the points column.

From the output we can see that there are 3 distinct groups:

  • group 1: points value is between 1 and 3.33 (both endpoints included).
  • group 2: points value is greater than 3.33 and at most 10.
  • group 3: points value is greater than 10 and at most 22.

We can see that four players have been placed into each group.
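Rather than counting rows by eye, the group sizes can be checked with table(). The sketch below rebuilds the same data frame so that it runs on its own; the name group_counts is introduced here for illustration only.

```r
library(ggplot2)

# same data frame as in the example above
df <- data.frame(player=LETTERS[1:12],
                 points=c(1, 2, 2, 2, 4, 5, 7, 9, 12, 14, 15, 22))

# split the points column into three groups
df$group <- cut_number(df$points, 3)

# count the number of players in each group
group_counts <- table(df$group)
group_counts
```

For this data each count should be 4. With data that contains many tied values the groups may not come out exactly equal, so a quick check like this is worth running.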

If you would like the group column to display the groups as integer values instead, you can wrap the cut_number() function in a call to as.numeric(), which returns the position of each row's group among the factor levels:

library(ggplot2)

#create new column that splits data into three equal sized groups based on points
df$group <- as.numeric(cut_number(df$points, 3))

#view updated data frame
df

   player points group
1       A      1     1
2       B      2     1
3       C      2     1
4       D      2     1
5       E      4     2
6       F      5     2
7       G      7     2
8       H      9     2
9       I     12     3
10      J     14     3
11      K     15     3
12      L     22     3

The new group column now contains the values 1, 2 and 3 to indicate which group the player belongs to.

Once again, each group contains four players.

Note: To split the points column into more than three groups, simply change the 3 in the cut_number() function to a different number.
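Once a group column exists, it can be passed to split() to break the data frame into a list with one data frame per group. The following is a minimal sketch that rebuilds the example data so it runs on its own; the name groups is introduced here for illustration only.

```r
library(ggplot2)

# same data frame as in the example above
df <- data.frame(player=LETTERS[1:12],
                 points=c(1, 2, 2, 2, 4, 5, 7, 9, 12, 14, 15, 22))

# integer group labels based on points
df$group <- as.numeric(cut_number(df$points, 3))

# split() returns a named list with one data frame per group
groups <- split(df, df$group)

# view the rows assigned to the first group
groups[["1"]]

# number of rows in each group
sapply(groups, nrow)
```

Each element of the list can then be analyzed separately, for example by passing the list to lapply().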
