How can I use the `filter()` function in `dplyr` to select only the rows where a specific column starts with a certain character or string?

The `filter()` function in `dplyr` keeps the rows of a data frame that satisfy a logical condition. To keep only the rows where a specific column starts with a certain character or string, pass `filter()` a condition built with `str_detect()` from the `stringr` package and a regular expression anchored with `^`. Note that `starts_with()` is not the right tool for this: it is a selection helper that picks columns by name (for example inside `select()`), not rows by value.

dplyr: Use a “starts with” Filter

You can use the following basic syntax in dplyr to filter for rows where a column starts with a certain pattern:


library(dplyr)
library(stringr)
df %>% 
  filter(str_detect(position, "^back"))

This particular example filters the data frame named df to only show the rows where the position column starts with the string “back.”

Note: In regex, the ^ symbol indicates the beginning of a string. 
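
To see what the ^ anchor changes, here is a minimal sketch that applies `str_detect()` to a small character vector with and without the anchor. The vector `positions` and its values are made up for illustration and are not part of the example data used later.

```r
library(stringr)

# hypothetical values for illustration: only the first one begins with "back"
positions <- c("backup_guard", "fullback", "starting_guard")

# without the anchor, "back" may match anywhere in the string
anywhere <- str_detect(positions, "back")

# with the anchor, "back" must appear at the very start of the string
at_start <- str_detect(positions, "^back")

anywhere   # TRUE TRUE FALSE
at_start   # TRUE FALSE FALSE
```

Without the anchor, "fullback" would also pass the filter, which is usually not what a "starts with" filter should do.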

The following example shows how to use this syntax in practice.

Example: How to Use “starts with” Filter in dplyr

Suppose we have the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(player=c('A', 'B', 'C', 'D', 'E', 'F'),
                 position=c('starting_guard', 'starting_center', 'backup_guard',
                            'backup_center', 'starting_forward', 'backup_forward'))

#view data frame
df

  player         position
1      A   starting_guard
2      B  starting_center
3      C     backup_guard
4      D    backup_center
5      E starting_forward
6      F   backup_forward

Suppose that we would like to filter the data frame to only show rows where the string in the position column starts with “back.”

We can use the following syntax to do so:


library(dplyr)
library(stringr)

#filter data frame to only contain rows where position column starts with "back"
df %>% 
  filter(str_detect(position, "^back"))

  player       position
1      C   backup_guard
2      D  backup_center
3      F backup_forward

We can see that the resulting data frame only contains rows where the string in the position column starts with “back.”

Note that we could also filter for rows that start with a single specific character.

For example, we could use the following syntax to filter for rows where the string in the position column starts with the letter s:


#filter data frame to only contain rows where position column starts with "s"
df %>% 
  filter(str_detect(position, "^s"))

  player         position
1      A   starting_guard
2      B  starting_center
3      E starting_forward

We can see that the resulting data frame only contains rows where the string in the position column starts with the letter s.
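
If the prefix is a plain string rather than a regular expression, base R's `startsWith()` or stringr's `str_starts()` can likely express the same filter without writing the ^ anchor yourself. The sketch below rebuilds the example data frame and assumes that dplyr is installed along with a version of stringr recent enough to provide `str_starts()`. The object names `by_base` and `by_stringr` and the `stringsAsFactors = FALSE` argument are additions for illustration.

```r
library(dplyr)
library(stringr)

# same example data as above; stringsAsFactors = FALSE keeps position as
# character, which startsWith() requires (it is the default from R 4.0 on)
df <- data.frame(player=c('A', 'B', 'C', 'D', 'E', 'F'),
                 position=c('starting_guard', 'starting_center', 'backup_guard',
                            'backup_center', 'starting_forward', 'backup_forward'),
                 stringsAsFactors = FALSE)

# base R: startsWith() compares a literal prefix, no regex involved
by_base <- df %>%
  filter(startsWith(position, "back"))

# stringr: str_starts() only matches the pattern at the start of the string
by_stringr <- df %>%
  filter(str_starts(position, "back"))

by_base
```

Both calls should keep the same three backup players as the `str_detect()` version. `startsWith()` may be the safer choice when the prefix contains characters such as `.` or `(` that have a special meaning in regex, because it treats the prefix literally.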


