How to Split a Data Frame in R (With Examples)

In R, a data frame can be split into smaller data frames by subsetting its rows directly or with the split() function, which divides a data frame into a list of subsets according to the values of a vector or factor. This is useful for creating separate data frames based on row position or on the values of a specific column. The examples below illustrate each approach.


You can use one of the following three methods to split a data frame into several smaller data frames in R:

Method 1: Split Data Frame Manually Based on Row Values

#define number of rows to include in first data frame
n <- 4

#split data frame into two smaller data frames
df1 <- df[row.names(df) %in% 1:n, ]
df2 <- df[row.names(df) %in% (n+1):nrow(df), ]

Method 2: Split Data Frame into n Equal-Sized Data Frames

#define number of data frames to split into
n <- 3

#split data frame into n equal-sized data frames
split(df, factor(sort(rank(row.names(df))%%n)))

Method 3: Split Data Frame Based on Column Value

#split data frame based on particular column value
df1 <- df[df$column_name == 0, ]
df2 <- df[df$column_name != 0, ]

The examples below show how to use each method in practice with this data frame:

#create data frame
df <- data.frame(ID=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 sales=c(7, 8, 8, 7, 9, 7, 8, 9, 3, 3, 14, 10),
                 leads=c(0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0))

#view data frame
df

   ID sales leads
1   1     7     0
2   2     8     0
3   3     8     1
4   4     7     1
5   5     9     0
6   6     7     1
7   7     8     1
8   8     9     0
9   9     3     1
10 10     3     0
11 11    14     1
12 12    10     0

Method 1: Split Data Frame Manually Based on Row Values

The following code shows how to split a data frame into two smaller data frames where the first one contains rows 1 through 4 and the second contains rows 5 through the last row:

#define row to split on
n <- 4

#split into two data frames
df1 <- df[row.names(df) %in% 1:n, ]
df2 <- df[row.names(df) %in% (n+1):nrow(df), ]

#view resulting data frames
df1

  ID sales leads
1  1     7     0
2  2     8     0
3  3     8     1
4  4     7     1

df2

   ID sales leads
5   5     9     0
6   6     7     1
7   7     8     1
8   8     9     0
9   9     3     1
10 10     3     0
11 11    14     1
12 12    10     0
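
The row-name matching above relies on the data frame having the default row names 1 through nrow(df). As a minimal sketch of an alternative, the same split can be done by row position, which does not depend on row names. It assumes 1 <= n < nrow(df), and the names df1_pos and df2_pos are only illustrative.

```r
#example data frame from this tutorial
df <- data.frame(ID=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 sales=c(7, 8, 8, 7, 9, 7, 8, 9, 3, 3, 14, 10),
                 leads=c(0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0))

#number of rows to include in first data frame (assumes 1 <= n < nrow(df))
n <- 4

#split by row position rather than by row name
df1_pos <- df[seq_len(n), ]
df2_pos <- df[(n + 1):nrow(df), ]

#check the number of rows in each piece
nrow(df1_pos)
nrow(df2_pos)
```

For this data frame the two pieces contain the same rows as df1 and df2 above.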

Method 2: Split Data Frame into n Equal-Sized Data Frames

The following code shows how to split a data frame into n equal-sized data frames:

#define number of data frames to split into
n <- 3

#split data frame into n equal-sized data frames
split(df, factor(sort(rank(row.names(df))%%n)))

$`0`
  ID sales leads
1  1     7     0
2  2     8     0
3  3     8     1
4  4     7     1

$`1`
  ID sales leads
5  5     9     0
6  6     7     1
7  7     8     1
8  8     9     0

$`2`
   ID sales leads
9   9     3     1
10 10     3     0
11 11    14     1
12 12    10     0

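The result is three data frames of equal size, since the 12 rows divide evenly by 3. When the number of rows is not a multiple of n, the sizes of the resulting data frames differ by at most one row.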
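
Because split() returns a named list, the pieces can be stored and pulled out by name. The sketch below uses a shorter grouping expression, sort(seq_len(nrow(df)) %% n), which gives the same grouping for this data frame. The names parts, df10 and sizes10 are only illustrative.

```r
#example data frame from this tutorial
df <- data.frame(ID=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 sales=c(7, 8, 8, 7, 9, 7, 8, 9, 3, 3, 14, 10),
                 leads=c(0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0))

#define number of data frames to split into
n <- 3

#store the list of data frames returned by split()
parts <- split(df, sort(seq_len(nrow(df)) %% n))

#count the rows in each piece
sapply(parts, nrow)

#extract one piece by its list name
parts[["1"]]

#with 10 rows the pieces cannot all be the same size
df10 <- df[1:10, ]
sizes10 <- sapply(split(df10, sort(seq_len(nrow(df10)) %% n)), nrow)
sizes10
```

Each element of parts is an ordinary data frame, so it can be assigned to its own variable or passed to lapply() for further processing.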

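Method 3: Split Data Frame Based on Column Value

The following code shows how to split a data frame into two smaller data frames based on the values in the leads column: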

#split data frame based on particular column value
df1 <- df[df$leads == 0, ]
df2 <- df[df$leads != 0, ]

#view resulting data frames
df1

   ID sales leads
1   1     7     0
2   2     8     0
5   5     9     0
8   8     9     0
10 10     3     0
12 12    10     0

df2

   ID sales leads
3   3     8     1
4   4     7     1
6   6     7     1
7   7     8     1
9   9     3     1
11 11    14     1

Note that df1 contains all rows where ‘leads’ is equal to zero in the original data frame and df2 contains all rows where ‘leads’ is not equal to zero, which in this data frame means every row where ‘leads’ is equal to one.
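
The same two groups can also be produced in one call with split(), using the column itself as the grouping vector. This is a minimal sketch. The name groups is only illustrative, and the list elements are named after the distinct values of leads.

```r
#example data frame from this tutorial
df <- data.frame(ID=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 sales=c(7, 8, 8, 7, 9, 7, 8, 9, 3, 3, 14, 10),
                 leads=c(0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0))

#split data frame into a list with one data frame per value of leads
groups <- split(df, df$leads)

#view the names of the list elements
names(groups)

#extract the rows where leads is 0 and the rows where leads is 1
groups[["0"]]
groups[["1"]]
```

This approach scales to columns with more than two distinct values, since split() creates one data frame per value without any extra subsetting code.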
