How to Use the split() Function in R to Split Data

The split() function in R divides a vector or data frame into groups according to a factor. It takes two main arguments, the data and a grouping factor, and returns a list with one element per group. The grouping is usually based on the levels of an existing factor, but the same call can also produce k folds or random subsets of a data set when the grouping vector is generated for that purpose.

This function uses the following basic syntax:

split(x, f, ...)

where:

  • x: The vector or data frame to divide into groups
  • f: A factor, or a list of factors, that defines the groupings (other vectors, such as character vectors, are coerced to a factor)
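
As a minimal sketch of how these arguments behave (the values, labels, and variable names below are made up for illustration), the following shows that a plain character vector works as f, that the result is a named list, and that the drop argument of split() removes groups for factor levels that never occur:

```r
# hypothetical data and group labels (not from the examples below)
values <- c(10, 20, 30)
labels <- c("a", "a", "b")   # a character vector, coerced to a factor

res <- split(values, labels)
is.list(res)   # TRUE: split() returns a list
names(res)     # one element per group: "a" "b"

# a factor with an unused level "c" yields an empty group by default
f <- factor(labels, levels = c("a", "b", "c"))
with_empty <- split(values, f)
without_empty <- split(values, f, drop = TRUE)

names(with_empty)      # "a" "b" "c"
names(without_empty)   # "a" "b"
```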

The following examples show how to use this function to split vectors and data frames into groups.

Example 1: Use split() to Split Vector Into Groups

The following code shows how to split a vector of data values into groups based on a second vector of group labels:

#create vector of data values
data <- c(1, 2, 3, 4, 5, 6)

#create vector of groupings
groups <- c('A', 'B', 'B', 'B', 'C', 'C')

#split vector of data values into groups
split(x = data, f = groups)

$A
[1] 1

$B
[1] 2 3 4

$C
[1] 5 6

The result is a list with three groups: A, B, and C.

Note that you can use indexing to retrieve specific groups as well. Single brackets return a list that contains only the selected group:

#split vector of data values into groups and only display second group
split(x = data, f = groups)[2]

$B
[1] 2 3 4
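
Single brackets keep the list wrapper, which is why the output above still shows the $B label. As a short sketch (the name parts is introduced here only for illustration), double brackets or the $ operator return the underlying vector instead:

```r
data <- c(1, 2, 3, 4, 5, 6)
groups <- c('A', 'B', 'B', 'B', 'C', 'C')

# store the split result so it can be indexed in different ways
parts <- split(x = data, f = groups)

one_elem_list <- parts[2]    # a list of length 1 holding group B
by_position   <- parts[[2]]  # the numeric vector for the second group
by_name       <- parts$B     # the same vector, selected by group name

is.list(one_elem_list)   # TRUE
is.list(by_position)     # FALSE
```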

Example 2: Use split() to Split Data Frame Into Groups

Suppose we have the following data frame in R:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 44),
                 assists=c(30, 28, 24, 24, 28, 19))

#view data frame
df

  team position points assists
1    A        G     33      30
2    A        G     28      28
3    A        F     31      24
4    B        G     39      24
5    B        F     34      28
6    B        F     44      19

We can use the following code to split the data frame into groups based on the ‘team’ variable:

#split data frame into groups based on 'team'
split(df, f = df$team)

$A
  team position points assists
1    A        G     33      30
2    A        G     28      28
3    A        F     31      24

$B
  team position points assists
4    B        G     39      24
5    B        F     34      28
6    B        F     44      19
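
Because the result is an ordinary list, each group can then be processed on its own. As a minimal sketch (the names by_team and avg_points and the choice of mean() are illustrative, not part of the original example), the following computes the average points for each team:

```r
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 44),
                 assists=c(30, 28, 24, 24, 28, 19))

# split into one data frame per team
by_team <- split(df, f = df$team)

# apply a function to each group; the result is a named numeric vector
avg_points <- sapply(by_team, function(g) mean(g$points))
avg_points   # team A is about 30.67, team B is 39
```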

Note that we can also split the data into groups using multiple factor variables. For example, the following code shows how to split the data into groups based on the ‘team’ and ‘position’ variables:

#split data frame into groups based on 'team' and 'position' variables
split(df, f = list(df$team, df$position))

$A.F
  team position points assists
3    A        F     31      24

$B.F
  team position points assists
5    B        F     34      28
6    B        F     44      19

$A.G
  team position points assists
1    A        G     33      30
2    A        G     28      28

$B.G
  team position points assists
4    B        G     39      24

The result is four groups, one for each combination of team and position. Each group name joins the two levels with a period.
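
The grouping vector does not have to come from a column in the data. As a rough sketch of the k-fold use mentioned in the introduction (the seed, the value of k, and the names fold_id and folds are assumptions made for illustration), a randomly shuffled vector of fold labels can be passed as f:

```r
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 44),
                 assists=c(30, 28, 24, 24, 28, 19))

set.seed(1)   # arbitrary seed so the shuffle is reproducible
k <- 3        # assumed number of folds

# repeat the labels 1..k to cover every row, then shuffle them
fold_id <- sample(rep(seq_len(k), length.out = nrow(df)))

# split the rows into k folds based on the random labels
folds <- split(df, f = fold_id)

sapply(folds, nrow)   # two rows in each of the three folds
```

Which rows land in which fold depends on the seed, but every row is assigned to exactly one fold.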
