How to Create New Variables in R with mutate() and case_when()

The mutate() and case_when() functions from the dplyr package let you create new variables from existing data in R. mutate() adds a new column to a data frame (or overwrites an existing one) and fills it with the result of an expression, which can refer to existing columns. case_when() returns a value based on a series of conditions: you list each case and the value to return for it, and the first condition that is true for a row determines the result. Used together, the two functions let you create a new variable whose value depends on one or more existing columns.


Often you may want to create a new variable in a data frame in R based on some condition. Fortunately this is easy to do using the mutate() and case_when() functions from the dplyr package.

This tutorial shows several examples of how to use these functions with the following data frame:

#create data frame
df <- data.frame(player = c('a', 'b', 'c', 'd', 'e'),
                 position = c('G', 'F', 'F', 'G', 'G'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

#view data frame
df

  player position points rebounds
1      a        G     12        5
2      b        F     15        7
3      c        F     19        7
4      d        G     22       12
5      e        G     32       11
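Before adding any conditions, it may help to see mutate() on its own. The following is a minimal sketch using the same data frame; the column name 'total' and the object name df_total are made up for illustration and are not used elsewhere in this tutorial:

```r
library(dplyr)

#same data frame as above
df <- data.frame(player = c('a', 'b', 'c', 'd', 'e'),
                 position = c('G', 'F', 'F', 'G', 'G'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

#add a hypothetical 'total' column computed from two existing columns
df_total <- df %>%
  mutate(total = points + rebounds)

#view the new column
df_total$total
```

Here the expression is evaluated row by row, so each player's total is that player's points plus rebounds.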

Example 1: Create New Variable Based on One Existing Variable

The following code shows how to create a new variable called ‘scorer’ based on the value in the points column:

library(dplyr)

#define new variable 'scorer' using mutate() and case_when()
df %>%
  mutate(scorer = case_when(points < 15 ~ 'low',
                           points < 25 ~ 'med',
                           points < 35 ~ 'high'))

  player position points rebounds scorer
1      a        G     12        5    low
2      b        F     15        7    med
3      c        F     19        7    med
4      d        G     22       12    med
5      e        G     32       11   high
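The cutoffs in this example work because case_when() checks the conditions in order and uses the first one that is true, so 'med' only applies to players with 15 to 24 points. A row that matches none of the conditions gets NA. The sketch below illustrates this with a small hypothetical data frame (df2, including a made-up player 'f' with 40 points) and shows one common way to add a catch-all case:

```r
library(dplyr)

#hypothetical data: player 'f' scores 40, above every cutoff in Example 1
df2 <- data.frame(player = c('a', 'e', 'f'),
                  points = c(12, 32, 40))

result <- df2 %>%
  mutate(scorer = case_when(points < 15 ~ 'low',
                            points < 25 ~ 'med',
                            points < 35 ~ 'high'),
         #TRUE is true for every remaining row, so it acts as a catch-all
         scorer_all = case_when(points < 15 ~ 'low',
                                points < 25 ~ 'med',
                                TRUE ~ 'high'))

#view result: scorer is NA for player 'f', scorer_all is 'high'
result
```

In dplyr 1.1.0 and later, the .default argument of case_when() can be used instead of the TRUE ~ line.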

Example 2: Create New Variable Based on Several Existing Variables

The following code shows how to create a new variable called ‘type’ based on the values in the player and position columns:

library(dplyr)

#define new variable 'type' using mutate() and case_when()
df %>%
  mutate(type = case_when(player == 'a' | player == 'b' ~ 'starter',
                          player == 'c' | player == 'd' ~ 'backup',
                          position == 'G' ~ 'reserve'))

  player position points rebounds    type
1      a        G     12        5 starter
2      b        F     15        7 starter
3      c        F     19        7  backup
4      d        G     22       12  backup
5      e        G     32       11 reserve
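When a condition compares one column against several values, the %in% operator is a possible alternative to chaining == with |. The following sketch should give the same 'type' column as above; the object name df_type is made up for illustration:

```r
library(dplyr)

#same data frame as above
df <- data.frame(player = c('a', 'b', 'c', 'd', 'e'),
                 position = c('G', 'F', 'F', 'G', 'G'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

#player %in% c('a', 'b') is TRUE when player is 'a' or 'b'
df_type <- df %>%
  mutate(type = case_when(player %in% c('a', 'b') ~ 'starter',
                          player %in% c('c', 'd') ~ 'backup',
                          position == 'G' ~ 'reserve'))

#view the new column
df_type$type
```

This form tends to be easier to read and extend as the list of values grows.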

The following code shows how to create a new variable called ‘valueAdded’ based on the value in the points and rebounds columns:

library(dplyr)

#define new variable 'valueAdded' using mutate() and case_when()
df %>%
  mutate(valueAdded = case_when(points <= 15 & rebounds <= 5 ~ 2,
                                points <= 15 & rebounds > 5 ~ 4,
                                points < 25 & rebounds < 8 ~ 6,
                                points < 25 & rebounds >= 8 ~ 7,
                                points >= 25 ~ 9))

  player position points rebounds valueAdded
1      a        G     12        5          2
2      b        F     15        7          4
3      c        F     19        7          6
4      d        G     22       12          7
5      e        G     32       11          9
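Note that each example above prints the result without changing df, because mutate() returns a new data frame rather than modifying the original. To keep the new variable, assign the result to a name. A minimal sketch, reusing the 'scorer' definition from Example 1:

```r
library(dplyr)

#same data frame as above
df <- data.frame(player = c('a', 'b', 'c', 'd', 'e'),
                 position = c('G', 'F', 'F', 'G', 'G'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

#overwrite df with the version that includes 'scorer'
df <- df %>%
  mutate(scorer = case_when(points < 15 ~ 'low',
                            points < 25 ~ 'med',
                            points < 35 ~ 'high'))

#the new column is now part of df
names(df)

#count how many players fall in each category
count(df, scorer)
```

Assigning to a different name instead of df keeps the original data frame intact, which can be safer while experimenting.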

