Use %in% to Filter for Rows with Value in List

The %in% operator in R checks whether each element of one vector appears in a second vector and returns a logical vector of TRUE/FALSE values, one per element. Used inside a filtering function such as subset() in base R or filter() in dplyr, it keeps only the rows of a data frame whose value in a given column matches one of several specified values. This is a quick way to filter a data frame against a list of values.
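To make the TRUE/FALSE behavior concrete, here is a minimal sketch. The vectors teams and keep and the data frame scores are hypothetical names made up for illustration; they are not the data used later in this tutorial.

```r
# Hypothetical example data, made up for illustration
teams <- c('Mavs', 'Celtics', 'Nets')
keep <- c('Mavs', 'Pacers', 'Nets')

# %in% returns one logical value per element of the left-hand vector
in_keep <- teams %in% keep
in_keep
# [1]  TRUE FALSE  TRUE

# the same test used inside base R's subset() function
scores <- data.frame(team = teams, points = c(104, 125, 114))
kept <- subset(scores, team %in% keep)
nrow(kept)
# [1] 2
```

The result always has the same length as the vector on the left side of %in%, which is why it can be used directly as a row filter.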


You can use the following basic syntax with the %in% operator in R to filter for rows that contain a value in a list:

library(dplyr)

#specify team names to keep
team_names <- c('Mavs', 'Pacers', 'Nets')

#select all rows where team is in list of team names to keep
df_new <- df %>% filter(team %in% team_names)

This particular syntax filters a data frame to only keep the rows where the value in the team column is equal to one of the three values in the team_names vector that we specified.

The following example shows how to use this syntax in practice.

Example: Using %in% to Filter for Rows with Value in List

Suppose we have the following data frame in R that contains information about various basketball teams:

#create data frame
df <- data.frame(team=c('Mavs', 'Pacers', 'Mavs', 'Celtics', 'Nets', 'Pacers'),
                 points=c(104, 110, 134, 125, 114, 124),
                 assists=c(22, 30, 35, 35, 20, 27))

#view data frame
df

     team points assists
1    Mavs    104      22
2  Pacers    110      30
3    Mavs    134      35
4 Celtics    125      35
5    Nets    114      20
6  Pacers    124      27

Suppose we would like to filter the data frame to only contain rows where the value in the team column is equal to one of the following team names:

  • Mavs
  • Pacers
  • Nets

We can use the following syntax with the %in% operator to do so:

library(dplyr)

#specify team names to keep
team_names <- c('Mavs', 'Pacers', 'Nets')

#select all rows where team is in list of team names to keep
df_new <- df %>% filter(team %in% team_names)

#view updated data frame
df_new

    team points assists
1   Mavs    104      22
2 Pacers    110      30
3   Mavs    134      35
4   Nets    114      20
5 Pacers    124      27

Notice that only the rows with a value of Mavs, Pacers or Nets in the team column are kept.
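As a side note on why %in% is used here rather than ==, the sketch below compares the two on the same team values. The variable names eq_match and in_match are chosen only for this illustration. The == operator compares element by element and recycles the shorter vector, so it can silently miss matching rows:

```r
# same team values and team names as in the example above
team <- c('Mavs', 'Pacers', 'Mavs', 'Celtics', 'Nets', 'Pacers')
team_names <- c('Mavs', 'Pacers', 'Nets')

# == recycles team_names along team and compares position by position
eq_match <- team == team_names
eq_match
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

# %in% checks each team against every value in team_names
in_match <- team %in% team_names
in_match
# [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```

Only %in% flags all five matching rows, which is the behavior we want when filtering against several values.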

If you would like to filter for rows where the team name is not in a list of team names, simply add an exclamation point (!) in front of the column name:

library(dplyr)

#specify team names to exclude
team_names <- c('Mavs', 'Pacers', 'Nets')

#select all rows where team is not in list of team names to exclude
df_new <- df %>% filter(!team %in% team_names)

#view updated data frame
df_new

     team points assists
1 Celtics    125      35

Notice that only the rows with a value not equal to Mavs, Pacers or Nets in the team column are kept.
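The negation works without extra parentheses because ! has lower precedence than %in% in R, so the membership test is evaluated first and then negated. A short sketch to confirm this, using the illustrative names not_a and not_b:

```r
# same team values and team names as in the example above
team <- c('Mavs', 'Pacers', 'Mavs', 'Celtics', 'Nets', 'Pacers')
team_names <- c('Mavs', 'Pacers', 'Nets')

# without parentheses: %in% is evaluated first, then negated
not_a <- !team %in% team_names

# with explicit parentheses: same result, arguably easier to read
not_b <- !(team %in% team_names)

identical(not_a, not_b)
# [1] TRUE

team[not_b]
# [1] "Celtics"
```

Either form can be passed to filter(); the parenthesized version simply makes the order of evaluation explicit.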

Note: You can find the complete documentation for the filter() function in the dplyr package reference.
