How to use a conditional filter in dplyr

A conditional filter in dplyr keeps only the rows of a data frame that meet a condition. It uses the filter() function from the dplyr package, which takes the data frame as its first argument, followed by one or more logical expressions. Each expression typically combines a column name, a comparison operator, and a value. The resulting data frame contains only the rows for which the expression evaluates to TRUE. When the condition itself depends on the value in another column, case_when() can be used inside filter().
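
As a minimal sketch of the basic filter() call described above, the snippet below filters on a single condition. The scores data frame and its player column are hypothetical and exist only for illustration.

```r
library(dplyr)

#hypothetical data frame, used only for illustration
scores <- data.frame(player=c('p1', 'p2', 'p3'),
                     points=c(12, 25, 31))

#first argument: the data frame
#second argument: a logical expression built from a column name (points),
#an operator (>) and a value (20)
high_scores <- filter(scores, points > 20)

#view the rows that satisfy the condition
high_scores
```

Only the rows where points is greater than 20 should remain in high_scores.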


You can use the following basic syntax to apply a conditional filter on a data frame using functions from the dplyr package in R:

library(dplyr)

#filter data frame where points is greater than some value (based on team)
df %>% 
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20,
                   TRUE ~ points > 30))

This particular example keeps only the rows where the value in the points column exceeds a threshold, and that threshold depends on the value in the team column.

The following example shows how to use this syntax in practice.

Example: How to Use Conditional Filter in dplyr

Suppose we have the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
                 points=c(10, 12, 17, 18, 24, 29, 29, 34, 35))

#view data frame
df

  team points
1    A     10
2    A     12
3    A     17
4    B     18
5    B     24
6    B     29
7    C     29
8    C     34
9    C     35

Now suppose we would like to apply the following conditional filter:

  • Only keep rows for players on team A where points is greater than 15
  • Only keep rows for players on team B where points is greater than 20
  • Only keep rows for players on team C where points is greater than 30

We can use the filter() and case_when() functions from the dplyr package to apply this conditional filter on the data frame:

library(dplyr)

#filter data frame where points is greater than some value (based on team)
df %>% 
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20,
                   TRUE ~ points > 30))

  team points
1    A     17
2    B     24
3    B     29
4    C     34
5    C     35

The data frame now contains only the rows where the value in the points column exceeds the threshold for that row's team.
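
One way to check the result is to write the same conditions with plain logical operators and compare the two outputs. The sketch below assumes the df from this example. The names with_case_when and with_logic are introduced here for illustration only.

```r
library(dplyr)

#recreate the data frame from the example
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
                 points=c(10, 12, 17, 18, 24, 29, 29, 34, 35))

#conditional filter written with case_when()
with_case_when <- df %>%
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20,
                   TRUE ~ points > 30))

#the same conditions written with & and |
with_logic <- df %>%
  filter((team=='A' & points > 15) |
         (team=='B' & points > 20) |
         (!team %in% c('A', 'B') & points > 30))

#compare the two results
all.equal(with_case_when, with_logic)
```

The case_when() form tends to be easier to read as the number of groups grows, since each line pairs one group with its own threshold.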

Note #1: In the case_when() function, TRUE in the last argument acts as a catch-all for any values in the team column that are not equal to 'A' or 'B'. In this data frame, the only such value is 'C'.
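
Because TRUE matches every remaining row, a team that is not listed would also be tested against the last threshold. The sketch below illustrates this with a hypothetical data frame df2 that includes a team 'D', which is not part of the original example, and shows one possible way to drop unlisted teams by naming team 'C' explicitly.

```r
library(dplyr)

#hypothetical data frame: team 'D' does not appear in the original example
df2 <- data.frame(team=c('A', 'B', 'C', 'D'),
                  points=c(17, 24, 34, 40))

#TRUE is a catch-all, so the team 'D' row is tested against points > 30
catch_all <- df2 %>%
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20,
                   TRUE ~ points > 30))

#naming team 'C' explicitly and returning FALSE otherwise drops unlisted teams
explicit <- df2 %>%
  filter(case_when(team=='A' ~ points > 15,
                   team=='B' ~ points > 20,
                   team=='C' ~ points > 30,
                   TRUE ~ FALSE))

#compare which teams survive each filter
catch_all$team
explicit$team
```

In dplyr 1.1.0 and later, case_when() also accepts a .default argument that can take the place of the final TRUE ~ line.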

Note #2: You can find the complete documentation for the case_when() function in the dplyr package reference.
