In R, the separate() function from the tidyr package splits a single character column into multiple columns. You specify the names of the new columns and, optionally, the separator to split on, such as an underscore, a comma, or a space. By default the function splits on any run of non-alphanumeric characters, so a column that mixes delimiters such as hyphens and spaces can still be split in a single call. This tutorial provides examples of how to split a column with separate() and with the str_split_fixed() function from the stringr package.
You can use one of the following two methods to split one column into multiple columns in R:
Method 1: Use str_split_fixed()
library(stringr)
df[c('col1', 'col2')] <- str_split_fixed(df$original_column, 'sep', 2)
Method 2: Use separate()
library(dplyr)
library(tidyr)
df %>% separate(original_column, c('col1', 'col2'))
The following examples show how to use each method in practice.
Method 1: Use str_split_fixed()
Suppose we have the following data frame:
#create data frame
df <- data.frame(player=c('John_Wall', 'Dirk_Nowitzki', 'Steve_Nash'),
                 points=c(22, 29, 18),
                 assists=c(8, 4, 15))
#view data frame
df
         player points assists
1     John_Wall     22       8
2 Dirk_Nowitzki     29       4
3    Steve_Nash     18      15
We can use the str_split_fixed() function from the stringr package to separate the ‘player’ column into two new columns called ‘First’ and ‘Last’ as follows:
library(stringr)
#split 'player' column using '_' as the separator
df[c('First', 'Last')] <- str_split_fixed(df$player, '_', 2)
#view updated data frame
df
         player points assists First     Last
1     John_Wall     22       8  John     Wall
2 Dirk_Nowitzki     29       4  Dirk Nowitzki
3    Steve_Nash     18      15 Steve     Nash
Notice that two new columns are added at the end of the data frame.
Feel free to rearrange the columns and drop the original ‘player’ column if you’d like:
#rearrange columns and leave out original 'player' column
df_final <- df[c('First', 'Last', 'points', 'assists')]
#view updated data frame
df_final
  First     Last points assists
1  John     Wall     22       8
2  Dirk Nowitzki     29       4
3 Steve     Nash     18      15
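Because str_split_fixed() always returns a character matrix with exactly the number of columns requested, it can help to see how it treats strings that contain more or fewer pieces than expected. The sketch below uses a made-up vector (players is hypothetical and not part of the data frame above) to illustrate that behavior:

```r
library(stringr)

# hypothetical input: one name has an extra '_' and one has no '_' at all
players <- c('John_Wall', 'Karl_Anthony_Towns', 'Nene')

# request exactly 2 pieces per string
parts <- str_split_fixed(players, '_', 2)

# the result is a character matrix with one row per input string
dim(parts)

# extra delimiters stay in the last piece; missing pieces become ''
parts
```

If a column might contain values like these, it is worth inspecting the matrix before assigning it back to the data frame.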
Method 2: Use separate()
The following code shows how to use the separate() function from the tidyr package to separate the ‘player’ column into ‘First’ and ‘Last’ columns:
library(dplyr)
library(tidyr)
#create data frame
df <- data.frame(player=c('John_Wall', 'Dirk_Nowitzki', 'Steve_Nash'),
                 points=c(22, 29, 18),
                 assists=c(8, 4, 15))
#separate 'player' column into 'First' and 'Last'
df %>% separate(player, c('First', 'Last'))
  First     Last points assists
1  John     Wall     22       8
2  Dirk Nowitzki     29       4
3 Steve     Nash     18      15
Note that, by default, the separate() function splits strings on any non-alphanumeric character. For example, if the first and last names were separated by a comma, the separate() function would automatically split based on the location of the comma:
library(dplyr)
library(tidyr)
#create data frame
df <- data.frame(player=c('John,Wall', 'Dirk,Nowitzki', 'Steve,Nash'),
                 points=c(22, 29, 18),
                 assists=c(8, 4, 15))
#separate 'player' column into 'First' and 'Last'
df %>% separate(player, c('First', 'Last'))
  First     Last points assists
1  John     Wall     22       8
2  Dirk Nowitzki     29       4
3 Steve     Nash     18      15
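Relying on the default separator can give surprising results when a value contains more than one kind of punctuation. As a minimal sketch, assuming hypothetical hyphenated last names that do not appear in the data above, the sep argument of separate() can be set explicitly so that only the underscore is treated as a delimiter, and remove = FALSE keeps the original column:

```r
library(tidyr)

# hypothetical data: last names contain a hyphen
df2 <- data.frame(player = c('John_Wall-Smith', 'Steve_Nash-Jones'),
                  points = c(22, 18))

# split on '_' only (not on '-') and keep the original 'player' column
out <- separate(df2, player, into = c('First', 'Last'),
                sep = '_', remove = FALSE)

#view result
out
```

With the default separator, the hyphen would also count as a split point, so each of these names would yield three pieces for only two target columns.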
You can find the complete documentation for the separate() function in the online tidyr reference.