Extracting a substring from the end of a string means selecting characters by counting back from the last character rather than forward from the first. This is useful when the characters you need always sit in the last few positions of the string. For example, the last three characters of a U.S. zip code could be extracted and stored as their own string.
You can use the following methods to extract a substring in R starting from the end of the string:
Method 1: Use Base R
#define function to extract n characters starting from end
substr_end <- function(x, n){
  substr(x, nchar(x)-n+1, nchar(x))
}

#extract 3 characters starting from end
substr_end(my_string, 3)
Method 2: Use stringr Package
library(stringr)

#extract 3 characters starting from end
str_sub(my_string, start = -3)
Both of these examples extract the last three characters from the string called my_string.
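As a quick check of the base R approach, the sketch below applies the helper to a character vector. The vector name teams is made up for illustration. Because substr() and nchar() are vectorized, the function handles every element at once. It also relies on substr() treating a start position below 1 as 1, so asking for more characters than a string has returns the whole string.

```r
# Helper from Method 1: extract the last n characters of each string
substr_end <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

# Hypothetical example vector (not part of the original tutorial data)
teams <- c("Mavericks", "Nets")

# Works on a whole vector at once
last3 <- substr_end(teams, 3)
print(last3)   # "cks" "ets"

# Requesting more characters than exist returns the full string,
# because substr() treats a start position below 1 as 1
whole <- substr_end("Nets", 10)
print(whole)   # "Nets"
```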
The following examples show how to use each method in practice with the following data frame:
#create data frame
df <- data.frame(team=c('Mavericks', 'Lakers', 'Hawks', 'Nets', 'Warriors'),
                 points=c(100, 143, 129, 113, 123))

#view data frame
df

       team points
1 Mavericks    100
2    Lakers    143
3     Hawks    129
4      Nets    113
5  Warriors    123
Example 1: Extract Substring Starting from End Using Base R
The following code shows how to define a custom function in base R and then use the function to extract the last three characters from each string in the team column:
#define function to extract n characters starting from end
substr_end <- function(x, n){
  substr(x, nchar(x)-n+1, nchar(x))
}

#create new column that extracts last 3 characters from team column
df$team_last3 <- substr_end(df$team, 3)

#view updated data frame
df

       team points team_last3
1 Mavericks    100        cks
2    Lakers    143        ers
3     Hawks    129        wks
4      Nets    113        ets
5  Warriors    123        ors
Notice that the new column called team_last3 contains the last three characters of each string in the team column of the data frame.
Example 2: Extract Substring Starting from End Using stringr Package
The following code shows how to use the str_sub() function from the stringr package in R to extract the last three characters from each string in the team column:
library(stringr)

#create new column that extracts last 3 characters from team column
df$team_last3 <- str_sub(df$team, start = -3)

#view updated data frame
df

       team points team_last3
1 Mavericks    100        cks
2    Lakers    143        ers
3     Hawks    129        wks
4      Nets    113        ets
5  Warriors    123        ors
Once again, the new column called team_last3 contains the last three characters of each string in the team column of the data frame, which matches the results from the previous method using base R.
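To confirm that the two methods agree without comparing the output by eye, you could check the results with identical(). The sketch below is one way to do this. It rebuilds the team names as a standalone vector and only runs the stringr comparison if the package is installed.

```r
# Team names from the tutorial's data frame, as a standalone vector
teams <- c("Mavericks", "Lakers", "Hawks", "Nets", "Warriors")

# Base R helper: extract the last n characters of each string
substr_end <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

base_result <- substr_end(teams, 3)
print(base_result)

# Compare against stringr only when the package is available
if (requireNamespace("stringr", quietly = TRUE)) {
  stringr_result <- stringr::str_sub(teams, start = -3)
  print(identical(base_result, stringr_result))
}
```

If the comparison prints TRUE, both approaches returned exactly the same character vector.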