A left join in R combines two data frames by keeping every row of the left data frame and adding the matching columns from the right data frame. Where a row in the left data frame has no match in the right one, the added columns are filled with NA. In base R, a left join is performed with the merge() function and the all.x=TRUE argument. The examples below show how to perform a left join in practice.
You can use the merge() function to perform a left join in base R:
#left join using base R
merge(df1, df2, all.x=TRUE)
You can also use the left_join() function from the dplyr package to perform a left join:
#left join using dplyr
dplyr::left_join(df1, df2)
Note: If you’re working with extremely large datasets, the left_join() function will tend to be faster than the merge() function.
The following examples show how to use each of these functions in practice with the following data frames:
#define first data frame
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  points=c(99, 93, 96, 104))

df1

   team points
1  Mavs     99
2 Hawks     93
3 Spurs     96
4  Nets    104

#define second data frame
df2 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  rebounds=c(25, 32, 38, 30),
                  assists=c(19, 18, 22, 25))

df2

   team rebounds assists
1  Mavs       25      19
2 Hawks       32      18
3 Spurs       38      22
4  Nets       30      25
Example 1: Left Join Using Base R
We can use the merge() function in base R to perform a left join, using the ‘team’ column as the column to join on:
#perform left join using base R
merge(df1, df2, by='team', all.x=TRUE)

   team points rebounds assists
1 Hawks     93       32      18
2  Mavs     99       25      19
3  Nets    104       30      25
4 Spurs     96       38      22
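In this example every team in df1 also appears in df2, so no NA values show up in the result. As a minimal sketch of the unmatched case, assume a hypothetical data frame df3 (not part of the data above) that lacks two of the teams in df1 and contains one team that df1 does not have:

```r
#df1 as defined earlier in the article
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  points=c(99, 93, 96, 104))

#hypothetical right-hand data frame: no 'Spurs' or 'Nets', plus an extra 'Kings' row
df3 <- data.frame(team=c('Mavs', 'Hawks', 'Kings'),
                  rebounds=c(25, 32, 41))

#left join: all four rows of df1 are kept
joined <- merge(df1, df3, by='team', all.x=TRUE)

#rebounds is NA for the teams that have no match in df3
joined[is.na(joined$rebounds), ]

#'Kings' exists only in the right data frame, so it is dropped
'Kings' %in% joined$team
```

The rows for Spurs and Nets should come back with NA in the rebounds column, and the Kings row from df3 should not appear at all, since a left join only keeps rows that exist in the left data frame.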
Example 2: Left Join Using dplyr
We can use the left_join() function from the dplyr package to perform a left join, using the ‘team’ column as the column to join on:
library(dplyr)

#perform left join using dplyr
left_join(df1, df2, by='team')

   team points rebounds assists
1  Mavs     99       25      19
2 Hawks     93       32      18
3 Spurs     96       38      22
4  Nets    104       30      25
One difference you’ll notice between these two functions is that the merge() function automatically sorts the rows alphabetically based on the column you used to perform the join.
Conversely, the left_join() function preserves the original order of the rows from the first data frame.
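If the original row order matters and you prefer to stay in base R, one possible workaround is to record each row's position before merging and reorder afterwards. The sketch below is an assumption rather than something from the examples above, and the helper column name row_id is hypothetical:

```r
#data frames as defined earlier in the article
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  points=c(99, 93, 96, 104))
df2 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  rebounds=c(25, 32, 38, 30),
                  assists=c(19, 18, 22, 25))

#default behavior: rows come back sorted by the join column
sorted <- merge(df1, df2, by='team', all.x=TRUE)

#hypothetical helper column that remembers the original row position
df1_indexed <- df1
df1_indexed$row_id <- seq_len(nrow(df1_indexed))

#merge, restore the original order, then drop the helper column
restored <- merge(df1_indexed, df2, by='team', all.x=TRUE)
restored <- restored[order(restored$row_id), ]
restored$row_id <- NULL
rownames(restored) <- NULL

restored
```

Here sorted lists the teams alphabetically, while restored should list them in the same order as df1 (Mavs, Hawks, Spurs, Nets), matching what left_join() returns.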