How to Subset a Data Frame by Factor Levels in R

In R, you can subset a data frame by factor level either with bracket indexing or with the subset() function. Both approaches take a logical condition on the factor column, such as team == 'B', and return a new data frame containing only the rows where that condition is TRUE. Levels are matched by their labels (for example 'B'), not by their internal integer codes. This is a quick way to extract and analyze a subset of a larger data frame.
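As a minimal sketch of the subset() approach, the calls below reproduce both methods using the same team/points data frame that appears later in this tutorial. Because the condition is evaluated inside the data frame, the column can be referenced as team rather than df$team. The result names df_b and df_ac are only illustrative.

```r
#create the example data frame
df <- data.frame(team=factor(c('A', 'A', 'B', 'B', 'B', 'C')),
                 points=c(22, 35, 19, 15, 29, 23))

#subset rows where team is equal to 'B'
df_b <- subset(df, team == 'B')

#subset rows where team is equal to 'A' or 'C'
df_ac <- subset(df, team %in% c('A', 'C'))

#view results
df_b
df_ac
```

One practical difference: subset() treats a condition that evaluates to NA as FALSE, while df[condition, ] returns a row of NAs for it. R's help page for subset() also describes it as a convenience function for interactive use and recommends standard [ indexing inside functions.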


You can use one of the following methods to subset a data frame by factor levels in R:

Method 1: Subset by One Factor Level

#subset rows where team is equal to 'B'
df_sub <- df[df$team == 'B', ]

Method 2: Subset by Multiple Factor Levels

#subset rows where team is equal to 'A' or 'C'
df_sub <- df[df$team %in% c('A', 'C'), ]

The following examples show how to use each of these methods in practice with the following data frame in R:

#create data frame
df <- data.frame(team=factor(c('A', 'A', 'B', 'B', 'B', 'C')),
                 points=c(22, 35, 19, 15, 29, 23))

#view data frame
df

  team points
1    A     22
2    A     35
3    B     19
4    B     15
5    B     29
6    C     23

Example 1: Subset by One Factor Level

The following code shows how to create a new data frame containing only the rows where the value in the team column is equal to ‘B’:

#subset rows where team is equal to 'B'
df_sub <- df[df$team == 'B', ]

#view updated data frame
df_sub

  team points
3    B     19
4    B     15
5    B     29

Notice that the new data frame only contains rows where the value in the team column is equal to ‘B’.
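Subsetting rows does not remove unused levels from the factor itself: team in the subset above still carries the levels A, B and C even though only B remains. A minimal sketch using base R's droplevels(), assuming you want the factor to reflect only the levels still present (for example, before calling table() or plotting):

```r
#create the example data frame and subset it
df <- data.frame(team=factor(c('A', 'A', 'B', 'B', 'B', 'C')),
                 points=c(22, 35, 19, 15, 29, 23))
df_sub <- df[df$team == 'B', ]

#the unused levels 'A' and 'C' are still attached to the factor
levels(df_sub$team)
#[1] "A" "B" "C"

#drop factor levels that no longer appear in the data
df_sub <- droplevels(df_sub)
levels(df_sub$team)
#[1] "B"
```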

Example 2: Subset by Multiple Factor Levels

The following code shows how to create a new data frame containing only the rows where the value in the team column is equal to ‘A’ or ‘C’:

#subset rows where team is equal to 'A' or 'C'
df_sub <- df[df$team %in% c('A', 'C'), ]

#view updated data frame
df_sub

  team points
1    A     22
2    A     35
6    C     23

Notice that the new data frame only contains rows where the value in the team column is equal to ‘A’ or ‘C’.

Using this syntax, you can include as many factor levels as you’d like in the vector following the %in% operator.
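As a sketch of the same idea with a longer list, the levels can be stored in a vector first, and negating the %in% test with ! keeps every row whose level is not in that vector. The name keep_teams is a hypothetical variable name used only for illustration.

```r
#create the example data frame
df <- data.frame(team=factor(c('A', 'A', 'B', 'B', 'B', 'C')),
                 points=c(22, 35, 19, 15, 29, 23))

#levels to match (illustrative; list as many as needed)
keep_teams <- c('A', 'C')

#subset rows where team is any of the levels in keep_teams
df_keep <- df[df$team %in% keep_teams, ]

#subset rows where team is NOT any of those levels
df_drop <- df[!(df$team %in% keep_teams), ]
```

Because %in% returns FALSE rather than NA for missing values, this form does not produce the all-NA rows that == can produce when the factor contains NA.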
