The %in% operator in R checks whether each element of one vector appears in another vector and returns a TRUE/FALSE value for each element. Combined with a filtering function such as filter() from dplyr or subset() from base R, it offers a quick way to keep only the rows of a data frame whose value in a given column matches one of several specified values.
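To see what the operator returns on its own, here is a minimal sketch in base R. The vectors x and keep are made up for illustration and are not part of the example that follows.

```r
# hypothetical example vectors (names chosen for illustration only)
x <- c('Mavs', 'Celtics', 'Nets')
keep <- c('Mavs', 'Pacers', 'Nets')

# %in% returns one TRUE/FALSE value per element of the left-hand vector
in_keep <- x %in% keep
in_keep
# [1]  TRUE FALSE  TRUE
```

The result has the same length as x, which is what makes it usable as a row filter.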
You can use the following basic syntax with the %in% operator in R to filter for rows that contain a value in a list:
library(dplyr)

#specify team names to keep
team_names <- c('Mavs', 'Pacers', 'Nets')

#select all rows where team is in list of team names to keep
df_new <- df %>% filter(team %in% team_names)
This particular syntax filters a data frame to only keep the rows where the value in the team column is equal to one of the three values in the team_names vector that we specified.
The following example shows how to use this syntax in practice.
Example: Using %in% to Filter for Rows with Value in List
Suppose we have the following data frame in R that contains information about various basketball teams:
#create data frame
df <- data.frame(team=c('Mavs', 'Pacers', 'Mavs', 'Celtics', 'Nets', 'Pacers'),
                 points=c(104, 110, 134, 125, 114, 124),
                 assists=c(22, 30, 35, 35, 20, 27))
#view data frame
df
     team points assists
1    Mavs    104      22
2  Pacers    110      30
3    Mavs    134      35
4 Celtics    125      35
5    Nets    114      20
6  Pacers    124      27
Suppose we would like to filter the data frame to only contain rows where the value in the team column is equal to one of the following team names:
- Mavs
- Pacers
- Nets
We can use the following syntax with the %in% operator to do so:
library(dplyr)

#specify team names to keep
team_names <- c('Mavs', 'Pacers', 'Nets')

#select all rows where team is in list of team names to keep
df_new <- df %>% filter(team %in% team_names)

#view updated data frame
df_new

    team points assists
1   Mavs    104      22
2 Pacers    110      30
3   Mavs    134      35
4   Nets    114      20
5 Pacers    124      27
Notice that only the rows with a value of Mavs, Pacers or Nets in the team column are kept.
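The same filter can also be written without dplyr. The following is a minimal base R sketch using subset() and logical indexing. It rebuilds the example data frame so that it runs on its own, and the names df_base and df_idx are introduced here for illustration.

```r
# recreate the example data frame so this block is self-contained
df <- data.frame(team=c('Mavs', 'Pacers', 'Mavs', 'Celtics', 'Nets', 'Pacers'),
                 points=c(104, 110, 134, 125, 114, 124),
                 assists=c(22, 30, 35, 35, 20, 27))

# team names to keep
team_names <- c('Mavs', 'Pacers', 'Nets')

# option 1: subset() evaluates the condition inside the data frame
df_base <- subset(df, team %in% team_names)

# option 2: index rows directly with the logical vector
df_idx <- df[df$team %in% team_names, ]
```

Unlike the dplyr output above, both base R versions keep the original row names (1, 2, 3, 5, 6) rather than renumbering the rows.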
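If you would like to filter for rows where the team name is not in a list of team names, simply add an exclamation point (!) in front of the column name. Because %in% is evaluated before !, this negates the result of the whole team %in% team_names expression: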
library(dplyr)

#specify team names to exclude
team_names <- c('Mavs', 'Pacers', 'Nets')

#select all rows where team is not in list of team names
df_new <- df %>% filter(!team %in% team_names)

#view updated data frame
df_new

     team points assists
1 Celtics    125      35
Notice that only the rows with a value not equal to Mavs, Pacers or Nets in the team column are kept.
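If the placement of the exclamation point looks ambiguous, the negation can be written with explicit parentheses instead. The sketch below uses a plain vector named team, standing in for the column, to check that both spellings give the same result.

```r
# plain vector standing in for the team column (illustration only)
team <- c('Mavs', 'Pacers', 'Mavs', 'Celtics', 'Nets', 'Pacers')
team_names <- c('Mavs', 'Pacers', 'Nets')

# %in% binds more tightly than !, so these two expressions are equivalent
without_parens <- !team %in% team_names
with_parens <- !(team %in% team_names)

identical(without_parens, with_parens)
# [1] TRUE

# only the team that is not in team_names remains
team[with_parens]
# [1] "Celtics"
```

The parenthesized form is a matter of style; some readers find it easier to scan.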
Note: You can find the complete documentation for the filter function in the dplyr reference.