I want to use strsplit() to split a string into a list using multiple delimiters in R

The strsplit() function in R splits each string in a character vector into pieces wherever a delimiter pattern matches. By default the pattern is interpreted as a regular expression, so a single pattern can match several different delimiters at once. The result is a list containing one character vector of pieces for each input string, which makes the function handy for text data that mixes several delimiters.


You can use the following basic syntax with the strsplit() function in R to split a string into pieces based on multiple delimiters:

strsplit(my_string, '[,& ]+')

This particular example splits the string called my_string whenever it encounters one of the following three delimiters:

  • A comma ( , )
  • An ampersand (&)
  • A space

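Note that the square brackets define a regular-expression character class that matches any one of the listed characters, and the + sign means "one or more in a row." As a result, a run of consecutive delimiters (e.g. a comma followed by a space, or several spaces) is treated as a single split point.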
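To see what the + sign contributes, here is a minimal sketch that compares the pattern with and without it. The string s is a made-up example used only for illustration:

```r
# hypothetical input: a comma immediately followed by a space
s <- 'a, b'

# without +, the comma and the space are two separate split points,
# which leaves an empty string between them
no_plus <- strsplit(s, '[, ]')[[1]]
no_plus
# [1] "a" ""  "b"

# with +, the run ", " is treated as a single split point
with_plus <- strsplit(s, '[, ]+')[[1]]
with_plus
# [1] "a" "b"
```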

The following example shows how to use this syntax in practice.

Example: Use strsplit() with Multiple Delimiters in R

Suppose we have the following string in R:

#create string
my_string <- 'this is a, string & with   seven words'

If we use the strsplit() function to split the string whenever a space is encountered, this will produce the following output:

#split string based on spaces
strsplit(my_string, ' ')

[[1]]
 [1] "this"   "is"     "a,"     "string" "&"      "with"   ""       ""      
 [9] "seven"  "words"

The strsplit() function splits the string at every single space, but the result is not clean: the comma stays attached to "a,", the ampersand comes back as its own element, and the run of three spaces produces two empty strings.

To split the string based on each of these delimiters, we can use the following syntax:

#split string based on multiple delimiters
strsplit(my_string, '[,& ]+')

[[1]]
[1] "this"   "is"     "a"      "string" "with"   "seven"  "words" 

This call splits the string on all three delimiters and correctly returns only the seven words that we’re interested in.
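Because strsplit() always returns a list, you may want a plain character vector instead. One option is to take the first list element with [[1]] (unlist() would also work for a single input string). A short sketch, re-creating my_string so it runs on its own; the name words is only illustrative:

```r
# re-create the string from the example
my_string <- 'this is a, string & with   seven words'

# take the first (and only) list element to get a character vector
words <- strsplit(my_string, '[,& ]+')[[1]]

length(words)
# [1] 7

words[4]
# [1] "string"
```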

Note that in this example we included three delimiters within the brackets, but you can list as many delimiter characters as you’d like.
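As a hedged illustration of adding more delimiters, the sketch below extends the character class with a semicolon and a pipe, and then with a hyphen. The input strings and variable names are made up for this example. One detail to keep in mind: a hyphen is safest placed last inside the brackets so that it is read as a literal character rather than as part of a range.

```r
# hypothetical string mixing five delimiters: ; | , & and space
more_delims <- 'a;b|c,d & e'

pieces <- strsplit(more_delims, '[,;|& ]+')[[1]]
pieces
# [1] "a" "b" "c" "d" "e"

# hyphen as a delimiter: placed last in the brackets so it is literal
hyphenated <- 'one-two, three'

hyphen_pieces <- strsplit(hyphenated, '[, -]+')[[1]]
hyphen_pieces
# [1] "one"   "two"   "three"
```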
