
The split() function in R divides a data set into subsets according to a user-defined factor. It takes two required arguments, the data (a vector or data frame) and a grouping factor, and returns a list with one element per group. Depending on how the grouping vector is built, split() can produce subsets based on the levels of a factor, k-folds of a data set, or random samples from a data set.

The **split()** function in R can be used to split data into groups based on factor levels.

This function uses the following basic syntax:

**split(x, f, ...)**

where:

**x**: Name of the vector or data frame to divide into groups

**f**: A factor that defines the groupings
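Beyond **x** and **f**, the default method of split() in base R also accepts a **drop** argument, which is FALSE by default. The sketch below uses made-up values and a hypothetical factor with an unused level 'C' to show what that argument changes; it is an illustration, not part of the examples that follow.

```r
# Hypothetical data: the factor declares level 'C', but no value belongs to it
values <- c(10, 20, 30, 40)
grp <- factor(c('A', 'A', 'B', 'B'), levels = c('A', 'B', 'C'))

# Default (drop = FALSE): the unused level 'C' is kept as an empty group
with_empty <- split(values, grp)
names(with_empty)

# drop = TRUE: levels that do not occur in the data are omitted
without_empty <- split(values, grp, drop = TRUE)
names(without_empty)
```

Keeping empty groups can be useful when every level must appear in the result, while dropping them avoids zero-length elements in later steps.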

The following examples show how to use this function to split vectors and data frames into groups.

**Example 1: Use split() to Split Vector Into Groups**

The following code shows how to split a vector of data values into groups based on a vector of factor levels:

    #create vector of data values
    data <- c(1, 2, 3, 4, 5, 6)

    #create vector of groupings
    groups <- c('A', 'B', 'B', 'B', 'C', 'C')

    #split vector of data values into groups
    split(x = data, f = groups)

    $A
    [1] 1

    $B
    [1] 2 3 4

    $C
    [1] 5 6

The result is three groups.

Note that you can use indexing to retrieve specific groups as well:

    #split vector of data values into groups and only display second group
    split(x = data, f = groups)[2]

    $B
    [1] 2 3 4
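Because split() returns a list, the usual list-indexing rules apply. The following sketch reuses the vector from Example 1 and contrasts single brackets, which return a one-element list, with double brackets and `$`, which return the group itself. The variable names `parts`, `as_list`, `by_position`, and `by_name` are chosen here only for illustration.

```r
data <- c(1, 2, 3, 4, 5, 6)
groups <- c('A', 'B', 'B', 'B', 'C', 'C')
parts <- split(x = data, f = groups)

# Single brackets keep the list wrapper: a list of length 1 named 'B'
as_list <- parts[2]
class(as_list)

# Double brackets or $ extract the group itself as a numeric vector
by_position <- parts[[2]]
by_name <- parts$B

# The extracted vector can be passed directly to numeric functions
sum(by_name)
```

Extracting by name is usually the safer choice, since the position of a group depends on how the factor levels are ordered.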

**Example 2: Use split() to Split Data Frame Into Groups**

Suppose we have the following data frame in R:

    #create data frame
    df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                     position=c('G', 'G', 'F', 'G', 'F', 'F'),
                     points=c(33, 28, 31, 39, 34, 44),
                     assists=c(30, 28, 24, 24, 28, 19))

    #view data frame
    df

      team position points assists
    1    A        G     33      30
    2    A        G     28      28
    3    A        F     31      24
    4    B        G     39      24
    5    B        F     34      28
    6    B        F     44      19

We can use the following code to split the data frame into groups based on the ‘team’ variable:

    #split data frame into groups based on 'team'
    split(df, f = df$team)

    $A
      team position points assists
    1    A        G     33      30
    2    A        G     28      28
    3    A        F     31      24

    $B
      team position points assists
    4    B        G     39      24
    5    B        F     34      28
    6    B        F     44      19
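A common reason to split a data frame is to compute something for each group. As a minimal sketch, assuming the same data frame as above, the pieces returned by split() can be passed to sapply() to get the mean points per team. The names `by_team` and `mean_points` are introduced here for illustration only.

```r
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 44),
                 assists=c(30, 28, 24, 24, 28, 19))

# One data frame per team, stored in a named list
by_team <- split(df, f = df$team)

# Apply a function to each piece; the result is a named numeric vector
mean_points <- sapply(by_team, function(d) mean(d$points))
mean_points
```

Each element of `mean_points` is named after the team it summarizes, so the values can be looked up by name.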

Note that we can also split the data into groups using multiple factor variables. For example, the following code shows how to split the data into groups based on the ‘team’ and ‘position’ variables:

    #split data frame into groups based on 'team' and 'position' variables
    split(df, f = list(df$team, df$position))

    $A.F
      team position points assists
    3    A        F     31      24

    $B.F
      team position points assists
    5    B        F     34      28
    6    B        F     44      19

    $A.G
      team position points assists
    1    A        G     33      30
    2    A        G     28      28

    $B.G
      team position points assists
    4    B        G     39      24

The result is four groups.
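The grouping vector does not have to come from a column of the data. As a rough sketch of the k-fold and random-sample uses mentioned in the introduction, a shuffled vector of labels can be passed as **f**. The seed, the number of folds `k`, and the 4/2 train/test sizes below are assumptions chosen for this illustration.

```r
set.seed(1)  # assumed seed, used only so the shuffle is repeatable

df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 44),
                 assists=c(30, 28, 24, 24, 28, 19))

# k-folds: give each row a fold label from 1 to k, in shuffled order
k <- 3  # hypothetical number of folds
fold_id <- sample(rep(seq_len(k), length.out = nrow(df)))
folds <- split(df, f = fold_id)

# Number of rows that landed in each fold
sapply(folds, nrow)

# Random sample: shuffle 4 'train' labels and 2 'test' labels across the rows
part_id <- sample(rep(c('train', 'test'), times = c(4, 2)))
parts <- split(df, f = part_id)
sapply(parts, nrow)
```

Building the labels with rep() before shuffling fixes the group sizes in advance, so only the assignment of rows to groups is random.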
