How to Do a Left Join in R (With Examples)

A left join in R combines two data frames by keeping every row from the left (first) data frame and adding the matching columns from the right (second) data frame. Where a row in the left data frame has no match in the right one, the columns that come from the right data frame are filled with NA. In base R, this is done with the merge() function and the all.x=TRUE argument.


You can use the merge() function to perform a left join in base R:

#left join using base R
merge(df1,df2, all.x=TRUE)

You can also use the left_join() function from the dplyr package to perform a left join:

#left join using dplyr
dplyr::left_join(df1, df2)

Note: If you’re working with extremely large datasets, the left_join() function will tend to be faster than the merge() function.

The following examples show how to use each of these functions in practice with these two data frames:

#define first data frame
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  points=c(99, 93, 96, 104))

df1

   team points
1  Mavs     99
2 Hawks     93
3 Spurs     96
4  Nets    104

#define second data frame
df2 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  rebounds=c(25, 32, 38, 30),
                  assists=c(19, 18, 22, 25))

df2

   team rebounds assists
1  Mavs       25      19
2 Hawks       32      18
3 Spurs       38      22
4  Nets       30      25
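
The short forms shown earlier omit the by argument. In that case merge() joins on every column name the two data frames share, so it can help to check which columns those are before joining. The following is a minimal sketch using the data frames above; the variable name common_cols is only illustrative:

```r
#re-create the two data frames from above so this block runs on its own
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  points=c(99, 93, 96, 104))
df2 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  rebounds=c(25, 32, 38, 30),
                  assists=c(19, 18, 22, 25))

#find the column names that appear in both data frames
common_cols <- intersect(names(df1), names(df2))
common_cols
# [1] "team"

#omitting 'by' gives the same result as naming the shared column explicitly
same_result <- identical(merge(df1, df2, all.x=TRUE),
                         merge(df1, df2, by=common_cols, all.x=TRUE))
same_result
# [1] TRUE
```

Naming the join column explicitly, as the examples below do, avoids surprises if the data frames later gain another column with a shared name.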

Example 1: Left Join Using Base R

We can use the merge() function in base R to perform a left join, using the ‘team’ column as the column to join on:

#perform left join using base R
merge(df1, df2, by='team', all.x=TRUE)

   team points rebounds assists
1 Hawks     93       32      18
2  Mavs     99       25      19
3  Nets    104       30      25
4 Spurs     96       38      22
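
Because every team in df1 also appears in df2, the result above contains no NA values. To see how unmatched rows are handled, here is a small sketch that uses a hypothetical second data frame, df2_partial (not part of the original example), which is missing two of the teams:

```r
#re-create the left data frame so this block runs on its own
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  points=c(99, 93, 96, 104))

#hypothetical right data frame that has no rows for 'Spurs' or 'Nets'
df2_partial <- data.frame(team=c('Mavs', 'Hawks'),
                          rebounds=c(25, 32))

#all rows from df1 are kept; unmatched teams get NA for rebounds
result <- merge(df1, df2_partial, by='team', all.x=TRUE)
result

#    team points rebounds
# 1 Hawks     93       32
# 2  Mavs     99       25
# 3  Nets    104       NA
# 4 Spurs     96       NA
```

All four rows from df1 are kept, and the teams that are absent from df2_partial receive NA in the rebounds column.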

Example 2: Left Join Using dplyr

We can use the left_join() function from the dplyr package to perform a left join, using the ‘team’ column as the column to join on:

library(dplyr)

#perform left join using dplyr 
left_join(df1, df2, by='team')

   team points rebounds assists
1  Mavs     99       25      19
2 Hawks     93       32      18
3 Spurs     96       38      22
4  Nets    104       30      25

One difference you’ll notice between these two functions is that the merge() function, by default (sort=TRUE), sorts the rows based on the column you used to perform the join. In this example, that means the rows are sorted alphabetically by team.

Conversely, the left_join() function preserves the original order of the rows from the first data frame.
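
If you want the base R result in the original row order of the first data frame, one possible approach is to record that order in a temporary index column before merging and sort on it afterward. This is a sketch rather than the only way to do it, and the column name row_id is an arbitrary choice made for illustration:

```r
#re-create the two data frames so this block runs on its own
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  points=c(99, 93, 96, 104))
df2 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  rebounds=c(25, 32, 38, 30),
                  assists=c(19, 18, 22, 25))

#add a temporary index column recording the original row order of df1
df1$row_id <- seq_len(nrow(df1))

#perform the left join, then restore the original order and drop the index
merged <- merge(df1, df2, by='team', all.x=TRUE)
merged <- merged[order(merged$row_id), ]
merged$row_id <- NULL
rownames(merged) <- NULL

merged

#    team points rebounds assists
# 1  Mavs     99       25      19
# 2 Hawks     93       32      18
# 3 Spurs     96       38      22
# 4  Nets    104       30      25
```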
