How to do Left Join in dplyr with Different Column Names

Using the left_join() function in dplyr allows users to join two datasets together based on different column names. The left_join() function requires two arguments, the first dataset and the second dataset, and the names of the columns that should be used to join the datasets. The left_join() function will apply an inner join to the datasets and will only keep the values that match the columns specified. However, if one of the datasets does not contain a value that is in the other dataset, the left_join() function will keep the value from the first dataset and fill in the empty value with NA. This allows users to combine two datasets with different column names and easily access the data that matches between them.


You can use the following basic syntax in dplyr to perform a left join on two data frames when the columns you’re joining on have different names in each data frame:

library(dplyr)

final_df <- left_join(df_A, df_B, by = c('team' = 'team_name'))

This particular example will perform a left join on the data frames called df_A and df_B, joining on the column in df_A called team and the column in df_B called team_name.

The following example shows how to use this syntax in practice.

Example: Perform Left Join with Different Column Names in dplyr

Suppose we have the following two data frames in R:

#create first data frame
df_A <- data.frame(team=c('A', 'B', 'C', 'D', 'E'),
                   points=c(22, 25, 19, 14, 38))

df_A

  team points
1    A     22
2    B     25
3    C     19
4    D     14
5    E     38

#create second data frame
df_B <- data.frame(team=c('A', 'C', 'D', 'F', 'G'),
                   rebounds=c(14, 8, 8, 6, 9))

df_B

  team_name rebounds
1         A       14
2         C        8
3         D        8
4         F        6
5         G        9

We can use the following syntax in dplyr to perform a left join based on matching values in the team column of df_A and the team_name column of df_B:

library(dplyr)

#perform left join based on different column names in df_A and df_B
final_df <- left_join(df_A, df_B, by = c('team' = 'team_name'))

#view final data frame
final_df

  team points rebounds
1    A     22       14
2    B     25       NA
3    C     19        8
4    D     14        8
5    E     38       NA

The resulting data frame contains all rows from df_A and only the rows in df_B where the team values matched the team_name values.

Note that you can also match on multiple columns with different names by using the following basic syntax:

library(dplyr)

#perform left join based on multiple different column names
final_df <- left_join(df_A, df_B, by = c('A1' = 'B1', 'A2' = 'B2', 'A3' = 'B3'))

Note: You can find the complete documentation for the left_join() function in dplyr .

The following tutorials explain how to perform other common operations in R:

x