How to Use the separate() Function in R

The separate() function from the tidyr package can be used to separate a data frame column into multiple columns. It splits each value of a character column at a specified separator and returns a data frame in which the original column is replaced by the new columns. This makes it a quick way to parse delimited text data into useful components.

This function uses the following basic syntax:

separate(data, col, into, sep)

where:

  • data: Name of the data frame
  • col: Name of the column to separate
  • into: Character vector of names for the new columns
  • sep: The separator to split the column at
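
As a supplementary sketch of the sep argument: in tidyr, a character sep is treated as a regular expression, while a numeric sep is treated as a character position to split at. The data frame codes and its columns below are hypothetical and exist only for illustration.

```r
library(tidyr)

#hypothetical data frame, used only for illustration
codes <- data.frame(id=c('AB12', 'CD34'),
                    version=c('1.5', '2.10'))

#numeric sep: split each value after the 2nd character
by_position <- separate(codes, col=id, into=c('prefix', 'number'), sep=2)

#character sep is a regular expression, so a literal period must be escaped
by_regex <- separate(codes, col=version, into=c('major', 'minor'), sep='\\.')

by_position
by_regex
```

An unescaped sep='.' would match every character rather than the literal period, which is why the escape is needed in the second call.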

The following examples show how to use this function in practice.

Example 1: Separate Column into Two Columns

Suppose we have the following data frame in R:

#create data frame
df <- data.frame(player=c('A', 'A', 'B', 'B', 'C', 'C'),
                 year=c(1, 2, 1, 2, 1, 2),
                 stats=c('22-2', '29-3', '18-6', '11-8', '12-5', '19-2'))

#view data frame
df

  player year stats
1      A    1  22-2
2      A    2  29-3
3      B    1  18-6
4      B    2  11-8
5      C    1  12-5
6      C    2  19-2

We can use the separate() function to separate the stats column into two new columns called “points” and “assists” as follows:

library(tidyr)

#separate stats column into points and assists columns
separate(df, col=stats, into=c('points', 'assists'), sep='-')

  player year points assists
1      A    1     22       2
2      A    2     29       3
3      B    1     18       6
4      B    2     11       8
5      C    1     12       5
6      C    2     19       2
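
Note that the new columns are character columns by default, even though the values look numeric. The sketch below reuses the data frame from this example and shows one way to get numeric columns, using the convert argument of separate(). The names df_chr and df_num are introduced here only for illustration.

```r
library(tidyr)

#same data frame as in the example above
df <- data.frame(player=c('A', 'A', 'B', 'B', 'C', 'C'),
                 year=c(1, 2, 1, 2, 1, 2),
                 stats=c('22-2', '29-3', '18-6', '11-8', '12-5', '19-2'))

#default behavior: points and assists are stored as character
df_chr <- separate(df, col=stats, into=c('points', 'assists'), sep='-')
class(df_chr$points)

#convert=TRUE asks tidyr to convert the new columns to a suitable type
df_num <- separate(df, col=stats, into=c('points', 'assists'), sep='-', convert=TRUE)
class(df_num$points)
```

With convert=TRUE the new columns can be used directly in calculations such as sum(df_num$points).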

Example 2: Separate Column into More Than Two Columns

Suppose we have the following data frame in R:

#create data frame
df2 <- data.frame(player=c('A', 'A', 'B', 'B', 'C', 'C'),
                  year=c(1, 2, 1, 2, 1, 2),
                  stats=c('22/2/3', '29/3/4', '18/6/7', '11/1/2', '12/1/1', '19/2/4'))

#view data frame
df2

  player year   stats
1      A    1  22/2/3
2      A    2  29/3/4
3      B    1  18/6/7
4      B    2  11/1/2
5      C    1  12/1/1
6      C    2  19/2/4

We can use the separate() function to separate the stats column into three separate columns:

library(tidyr)

#separate stats column into three new columns
separate(df2, col=stats, into=c('points', 'assists', 'steals'), sep='/')

  player year points assists steals
1      A    1     22       2      3
2      A    2     29       3      4
3      B    1     18       6      7
4      B    2     11       1      2
5      C    1     12       1      1
6      C    2     19       2      4
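
In real data, not every row contains the same number of pieces. The sketch below shows one way to handle this with the extra and fill arguments of separate(). The data frame df3 is hypothetical and deliberately contains one row with too few values and one row with too many.

```r
library(tidyr)

#hypothetical data frame with an uneven number of values per row
df3 <- data.frame(player=c('A', 'B', 'C'),
                  stats=c('22/2/3', '29/3', '18/6/7/9'))

#fill='right' pads missing pieces with NA on the right
#extra='merge' keeps any surplus pieces in the last column
out <- separate(df3, col=stats, into=c('points', 'assists', 'steals'),
                sep='/', extra='merge', fill='right')

out
```

Without these arguments, separate() still returns a result but issues a warning for the uneven rows, so setting them explicitly documents the intended behavior.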

The goal of the tidyr package is to create “tidy” data, which has the following characteristics:

  • Every column is a variable.
  • Every row is an observation.
  • Every cell is a single value.

The tidyr package uses four core functions to create tidy data:

1. The spread() function.

2. The gather() function.

3. The separate() function.

4. The unite() function.

If you can master these four functions, you will be able to create “tidy” data from any data frame.
