In R, the substr() function extracts a substring by position. It takes three arguments: the string, the start index, and the end index. The start index is the position of the first character of the substring, and the end index is the position of the last character. Because substr() works with positions rather than with the characters themselves, extracting the text between two specific characters is usually easier with a regular expression, which is the approach both methods below take.
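As a minimal sketch of the positional approach, the snippet below uses substr() directly and then locates the two markers with regexpr() so the indices do not have to be hard-coded. The variable names (my_string, start, end, between) are illustrative only, and the sketch assumes both markers occur in the string.

```r
my_string <- "team Mavs pro"

# Extract by fixed positions: characters 6 through 9
substr(my_string, 6, 9)
# [1] "Mavs"

# Locate the markers instead of hard-coding the positions
# (assumes "team " and " pro" both occur in my_string)
start <- as.integer(regexpr("team ", my_string, fixed = TRUE)) + nchar("team ")
end   <- as.integer(regexpr(" pro", my_string, fixed = TRUE)) - 1

between <- substr(my_string, start, end)
between
# [1] "Mavs"
```

This works, but it takes several steps per string, which is why a single regular expression is often more convenient.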
You can use the following methods to extract a string between specific characters in R:
Method 1: Extract String Between Specific Characters Using Base R
gsub(".*char1 (.+) char2.*", "\\1", my_string)
Method 2: Extract String Between Specific Characters Using stringr
library(stringr)
str_match(my_string, "char1\\s*(.*?)\\s*char2")[,2]
Both of these examples extract the string between the characters char1 and char2 within my_string.
The following examples show how to use each method in practice with the following data frame:
#create data frame
df <- data.frame(team=c('team Mavs pro', 'team Heat pro', 'team Nets pro'),
                 points=c(114, 135, 119))
#view data frame
df
           team points
1 team Mavs pro    114
2 team Heat pro    135
3 team Nets pro    119
Example 1: Extract String Between Specific Characters Using Base R
The following code shows how to extract the string between the characters team and pro for each row in the team column of the data frame:
#create new column that extracts string between team and pro
#"\\1" replaces the whole match with the text captured by the parentheses
df$team_name <- gsub(".*team (.+) pro.*", "\\1", df$team)
#view updated data frame
df
           team points team_name
1 team Mavs pro    114      Mavs
2 team Heat pro    135      Heat
3 team Nets pro    119      Nets
Notice that the new column called team_name contains the string between the characters team and pro for each row in the team column of the data frame.
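One caveat with the gsub() approach: when the pattern does not match, gsub() returns the input string unchanged rather than a missing value. The sketch below illustrates this with a hypothetical vector (teams) that contains one non-matching entry, and shows one possible way to flag such entries with grepl(). The names teams, extracted, and cleaned are assumptions made for illustration.

```r
# Hypothetical input: the second element lacks the "team ... pro" markers
teams <- c("team Mavs pro", "Lakers")

# gsub() leaves non-matching strings untouched
extracted <- gsub(".*team (.+) pro.*", "\\1", teams)
extracted
# returns "Mavs" and "Lakers"

# One way to mark non-matching rows as NA instead
cleaned <- ifelse(grepl("team (.+) pro", teams), extracted, NA_character_)
cleaned
# returns "Mavs" and NA
```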
Example 2: Extract String Between Specific Characters Using stringr Package
The following code shows how to extract the string between the characters team and pro for each row in the team column of the data frame by using the str_match() function from the stringr package in R:
library(stringr)
#create new column that extracts string between team and pro
df$team_name <- str_match(df$team, "team\\s*(.*?)\\s*pro")[,2]
#view updated data frame
df
           team points team_name
1 team Mavs pro    114      Mavs
2 team Heat pro    135      Heat
3 team Nets pro    119      Nets
Notice that the new column called team_name contains the string between the characters team and pro for each row in the team column of the data frame.
Note that the str_match() function returns a character matrix in which the first column contains the full text matched by the pattern (here, the entire original string) and the second column contains the captured substring we’re interested in.
Thus, we must use [,2] to extract only the second column from the matrix returned by the str_match() function.
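To make the shape of that matrix concrete, here is a small sketch that stores the result of str_match() before indexing it. The input vector and the name m are illustrative. The first string deliberately has extra text around the markers to show that column 1 holds the portion matched by the pattern. It assumes the stringr package is installed.

```r
library(stringr)

# Illustrative input; the first element has text outside the markers
x <- c("the team Mavs pro squad", "team Heat pro")

m <- str_match(x, "team\\s*(.*?)\\s*pro")

# One row per input string, one column for the full match
# plus one column per capture group
dim(m)
# [1] 2 2

m[, 1]   # text matched by the whole pattern
# [1] "team Mavs pro" "team Heat pro"

m[, 2]   # text captured by the parentheses
# [1] "Mavs" "Heat"
```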