grepl() is a base R function that tests whether each element of a character vector contains a match for a pattern, returning TRUE or FALSE for each element. Its pattern argument accepts only a single pattern (if you pass a vector of patterns, only the first element is used), so to search for several patterns at once, such as multiple words in a text, you combine them into one regular expression separated by the | symbol. This lets you search through strings for multiple patterns in a single call.
You can use the following basic syntax with the grepl() function in R to filter for rows in a data frame that contain one of several string patterns in a specific column:
library(dplyr)

new_df <- filter(df, grepl(paste(my_patterns, collapse='|'), my_column))
This particular syntax filters the data frame for rows where the value in the column called my_column contains one of the string patterns in the vector called my_patterns.
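To see what the two pieces of this syntax do on their own, here is a minimal sketch in base R. The values stored in my_patterns and my_column are made up purely for illustration.

```r
# Hypothetical values, chosen only to illustrate the syntax
my_patterns <- c('cat', 'dog')
my_column <- c('catalog', 'bird', 'hotdog', 'fish')

# paste() joins the patterns into a single regular expression
pattern <- paste(my_patterns, collapse='|')
pattern
# [1] "cat|dog"

# grepl() returns one TRUE/FALSE value per element of my_column
matches <- grepl(pattern, my_column)
matches
# [1]  TRUE FALSE  TRUE FALSE
```

The filter() function then keeps only the rows where this logical vector is TRUE.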
The following example shows how to use this syntax in practice.
Example: How to Use grepl() with Multiple Patterns in R
Suppose we have the following data frame in R that contains information about various basketball teams:
#create data frame
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs'),
                 points=c(104, 115, 124, 120, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad'))

#view data frame
df

   team points    status
1  Mavs    104       Bad
2 Hawks    115      Good
3  Nets    124 Excellent
4  Heat    120     Great
5  Cavs    112       Bad
Suppose we would like to filter the data frame to only contain rows where the string in the status column contains one of the following string patterns:
- ‘Good’
- ‘Gre’
- ‘Ex’
We can use the following syntax with the grepl() function to do so:
library(dplyr)

#define patterns to search for
my_patterns <- c('Good', 'Gre', 'Ex')

#filter for rows where status column contains one of several strings
new_df <- filter(df, grepl(paste(my_patterns, collapse='|'), status))

#view results
new_df

   team points    status
1 Hawks    115      Good
2  Nets    124 Excellent
3  Heat    120     Great
Notice that the data frame has been filtered to only contain the rows where the string in the status column contains one of the three patterns that we specified.
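If you would rather not load dplyr, the same filter can likely be written with base R subsetting. The sketch below recreates the example data frame so that it runs on its own; the name new_df_base is an assumption introduced only for this illustration.

```r
#recreate the example data frame
df <- data.frame(team=c('Mavs', 'Hawks', 'Nets', 'Heat', 'Cavs'),
                 points=c(104, 115, 124, 120, 112),
                 status=c('Bad', 'Good', 'Excellent', 'Great', 'Bad'))

#define patterns to search for
my_patterns <- c('Good', 'Gre', 'Ex')

#keep the rows where grepl() returns TRUE (base R, no dplyr needed)
new_df_base <- df[grepl(paste(my_patterns, collapse='|'), df$status), ]

#view results (the Hawks, Nets and Heat rows)
new_df_base
```

One difference to be aware of: base R subsetting keeps the original row names (2, 3 and 4 here), while filter() renumbers the rows starting from 1.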
Note that by using the paste() function with the argument collapse='|', we actually searched for the regular expression ‘Good|Gre|Ex’ in the status column.
Since the | symbol stands for “OR” in a regular expression, we were able to search for rows that contained ‘Good’ or ‘Gre’ or ‘Ex’ in the status column.
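Because the collapsed string is treated as a regular expression, patterns that contain special characters such as . or ( may match more than intended. The sketch below shows one possible way to match each pattern literally instead, using fixed=TRUE and combining the results with OR. The strings in x and my_patterns are hypothetical and are not part of the example data.

```r
# Hypothetical strings containing regex special characters
x <- c('a.b', 'axb', 'c(d)', 'none')
my_patterns <- c('a.b', '(d)')

# As a regular expression, '.' matches any character and '(d)' is a group
regex_match <- grepl(paste(my_patterns, collapse='|'), x)
regex_match
# [1]  TRUE  TRUE  TRUE FALSE

# Literal matching: test each pattern with fixed=TRUE, then combine with OR
literal_match <- Reduce(`|`, lapply(my_patterns, function(p) grepl(p, x, fixed=TRUE)))
literal_match
# [1]  TRUE FALSE  TRUE FALSE
```

For plain words like ‘Good’, ‘Gre’ and ‘Ex’, both approaches give the same result, so the simpler paste() syntax is usually enough.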