How do I extract a string after a specific character in R?

In R, you can extract the text after a specific character with the strsplit() function, passing that character as the split argument. For example, strsplit("Hello World!", split = " ") returns a list containing one character vector, c("Hello", "World!"). Its second element, "World!", is the text after the space character.
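A minimal sketch of the strsplit() approach, assuming the goal is the text after the first space. Because strsplit() returns a list with one character vector per input string, the second piece of each vector is pulled out with sapply(). The vector x and its contents are illustrative only.

```r
# Hypothetical input vector
x <- c("Hello World!", "Good Morning")

# strsplit() returns a list: one character vector of pieces per input string
pieces <- strsplit(x, split = " ")
print(pieces[[1]])   # "Hello"  "World!"

# Take the second piece of each vector, i.e. the text after the first space
after_space <- sapply(pieces, `[`, 2)
print(after_space)   # "World!"  "Morning"
```

If a string contains several spaces, this returns only the text between the first and second space, and a string with no space yields NA. The sub() method below is often more convenient in those cases.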


You can use the following methods to extract a string after a specific character in R:

Method 1: Extract String After Specific Characters Using Base R

sub('.*the', '', my_string)

Method 2: Extract String After Specific Characters Using stringr

library(stringr)

str_replace(my_string, '(.*?)the(.*?)', '')

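Both of these examples remove everything up to and including the pattern “the”, leaving the text that follows it in my_string. If “the” occurs more than once, Method 1 (greedy .*) keeps only the text after the last occurrence, while Method 2 (lazy .*?) keeps everything after the first occurrence.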
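The difference between the greedy and lazy patterns only shows up when “the” appears more than once. The sketch below uses base R only, assuming sub() with perl = TRUE as a stand-in for the lazy stringr pattern (a lazy .*? should behave the same way in both regex engines for a pattern this simple). The sample string is made up for illustration.

```r
# Hypothetical string containing "the" twice
my_string <- "theMavs beat theHeat"

# Greedy: .* runs to the LAST "the", so only the final fragment remains
greedy <- sub(".*the", "", my_string)
print(greedy)   # "Heat"

# Lazy: .*? stops at the FIRST "the" (perl = TRUE for PCRE syntax)
lazy <- sub(".*?the", "", my_string, perl = TRUE)
print(lazy)     # "Mavs beat theHeat"
```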

The following examples show how to use each method in practice with this data frame:

#create data frame
df <- data.frame(team=c('theMavs', 'theHeat', 'theNets', 'theRockets'),
                 points=c(114, 135, 119, 140))

#view data frame
df

        team points
1    theMavs    114
2    theHeat    135
3    theNets    119
4 theRockets    140

Example 1: Extract String After Specific Characters Using Base R

The following code shows how to extract the string after “the” for each row in the team column of the data frame:

#create new column that extracts string after "the" in team column
df$team_name <- sub('.*the', '', df$team)

#view updated data frame
df

        team points team_name
1    theMavs    114      Mavs
2    theHeat    135      Heat
3    theNets    119      Nets
4 theRockets    140   Rockets

Notice that the new column called team_name contains the string after “the” for each row in the team column of the data frame.
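Because the first argument of sub() is a regular expression, a delimiter that is also a regex metacharacter (such as ., | or $) has to be escaped, or the pattern matches more than intended. A short sketch, assuming a hypothetical vector of file names and the dot as the delimiter. It also shows that a string without the delimiter comes back unchanged rather than as an empty string or NA.

```r
# Hypothetical file names; we want the text after the last "."
files <- c("report.final.csv", "notes.txt", "README")

# "\\." matches a literal dot; greedy .* runs to the last one
ext <- sub(".*\\.", "", files)
print(ext)     # "csv"  "txt"  "README"

# Unescaped, "." matches any character, so the whole string is removed
wrong <- sub(".*.", "", "notes.txt")
print(wrong)   # ""
```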


Example 2: Extract String After Specific Characters Using stringr Package

The following code shows how to extract the string after “the” for each row in the team column of the data frame by using the str_replace() function from the stringr package in R:

library(stringr)

#create new column that extracts string after "the" in team column
df$team_name <- str_replace(df$team, '(.*?)the(.*?)', '')

#view updated data frame
df

        team points team_name
1    theMavs    114      Mavs
2    theHeat    135      Heat
3    theNets    119      Nets
4 theRockets    140   Rockets

Notice that the new column called team_name contains the string after “the” for each row in the team column of the data frame.
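If some values in the team column might not contain “the”, both methods return those values unchanged. One way to flag them instead is to test for the pattern first with grepl() and return NA otherwise. A minimal base R sketch, assuming a hypothetical extra row ("Lakers", with made-up points) that has no “the” prefix:

```r
# Same data frame as above, plus a hypothetical team without "the"
df <- data.frame(team = c('theMavs', 'theHeat', 'theNets', 'theRockets', 'Lakers'),
                 points = c(114, 135, 119, 140, 102))

# Keep the text after "the" only where "the" occurs; otherwise NA
has_the <- grepl('the', df$team, fixed = TRUE)
df$team_name <- ifelse(has_the, sub('.*the', '', df$team), NA)

df
```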
