The dplyr package in R provides tools for manipulating data frames. A common task is to transform only the columns whose names contain a certain string. This can be done by combining a mutate function with the contains() selection helper, which selects columns by name. The transformed columns replace the originals in the returned data frame, and all other columns are left untouched, so the result can be passed straight on to further analysis or visualization.
dplyr: Mutate Variable if Column Contains String
You can use the following basic syntax in R to mutate a variable if its column name contains a particular string:
library(dplyr)

df %>%
  mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))
This particular syntax applies the scale() function to each variable in the data frame that contains the string ‘starter’ in the column name.
The following example shows how to use this syntax in practice.
Example: Mutate Variable if Column Contains String
Suppose we have the following data frame in R:
#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

#view data frame
df

  team starter_points starter_assists bench_points bench_assists
1    A             22               4            7             2
2    B             26               5            7             5
3    C             25              10            9             5
4    D             13              14           14             4
5    E             15              12           13             9
6    F             22              10           10            14
We can use the following syntax to apply the scale() function to each variable in the data frame that contains the string ‘starter’ in the column name:
library(dplyr)

#apply scale() function to each variable that contains 'starter' in the name
df %>%
  mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))

  team starter_points starter_assists bench_points bench_assists
1    A      0.2819668      -1.3180158            7             2
2    B      1.0338784      -1.0629159            7             5
3    C      0.8459005       0.2125832            9             5
4    D     -1.4098342       1.2329825           14             4
5    E     -1.0338784       0.7227828           13             9
6    F      0.2819668       0.2125832           10            14
Using this syntax, we applied the scale() function to each column whose name contains ‘starter’, so that the values in each of these columns now have a mean of 0 and a standard deviation of 1.
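As a quick sanity check on the claim about the mean and standard deviation, the sketch below recomputes the scaling by hand for one column and inspects the scaled columns. It assumes the same df as in the example; the names scaled and manual are introduced here only for illustration.

```r
library(dplyr)

# same data frame as in the example
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

# scale every column whose name contains 'starter'
scaled <- df %>%
  mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))

# with default arguments, scale() computes (x - mean(x)) / sd(x)
manual <- (df$starter_points - mean(df$starter_points)) / sd(df$starter_points)

# compare the dplyr result with the manual calculation
all.equal(scaled$starter_points, manual)

# means should be 0 up to floating-point error, standard deviations should be 1
starter_cols <- scaled[, c('starter_points', 'starter_assists')]
colMeans(starter_cols)
sapply(starter_cols, sd)
```

The means may print as very small numbers rather than exactly 0 because of floating-point rounding.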
Notice that the following columns were modified:
- starter_points
- starter_assists
All other columns remained unchanged.
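In dplyr 1.0.0 and later, mutate_at() is superseded by across(), although it still works. The sketch below shows what is assumed to be an equivalent call using across() on the same df; the names scaled_at and scaled_across are introduced here only for illustration.

```r
library(dplyr)

# same data frame as in the example
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

# original approach from the example
scaled_at <- df %>%
  mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))

# across() approach: select columns by name inside mutate()
scaled_across <- df %>%
  mutate(across(contains('starter'), ~ as.vector(scale(.x))))

# check that both approaches give the same data frame
all.equal(scaled_at, scaled_across)
```

With across(), the column selection and the function sit inside a regular mutate() call, so the same step can also create or modify other columns.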
Also note we can apply any function we’d like using this syntax.
In the previous example, we chose to scale each column with the string ‘starter’ in the name.
However, we could do something simpler such as multiply the values by two for each column with ‘starter’ in the name:
library(dplyr)

#multiply values by two for each variable that contains 'starter' in the name
df %>%
  mutate_at(vars(contains('starter')), ~ (. * 2))

  team starter_points starter_assists bench_points bench_assists
1    A             44               8            7             2
2    B             52              10            7             5
3    C             50              20            9             5
4    D             26              28           14             4
5    E             30              24           13             9
6    F             44              20           10            14
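A different reading of "column contains a string" is that the values stored in a column contain the string, and a new variable should depend on that. The sketch below is one way this might be done and is not part of the example above: the players data frame, its role column, and the new columns is_starter and points_adj are all hypothetical. It uses base R's grepl() to test each value and dplyr's if_else() to pick the result.

```r
library(dplyr)

# hypothetical data, invented only for this illustration
players <- data.frame(player=c('P1', 'P2', 'P3', 'P4'),
                      role=c('starter guard', 'bench guard',
                             'starter center', 'bench forward'),
                      points=c(22, 7, 25, 14))

result <- players %>%
  mutate(
    # TRUE where the text in 'role' contains the literal string 'starter'
    is_starter = grepl('starter', role, fixed = TRUE),
    # double the points for starters, keep them unchanged otherwise
    points_adj = if_else(is_starter, points * 2, points)
  )

result
```

Here the condition is evaluated row by row on the column's values, whereas contains() in the earlier examples selects whole columns by their names.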
Cite this article
stats writer (2024). How can I use the dplyr package in R to create a new variable that is based on a condition, specifically if a certain column contains a certain string?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-use-the-dplyr-package-in-r-to-create-a-new-variable-that-is-based-on-a-condition-specifically-if-a-certain-column-contains-a-certain-string/
