How to Remove Duplicate Rows in R (With Examples)

In R, the built-in duplicated() function identifies duplicate elements of a vector or duplicate rows of a data frame. Applied to a data frame, it returns a logical vector with one value per row: TRUE for a row that repeats an earlier row, and FALSE otherwise. Negating this vector with ! and using it to subset the data frame keeps only the first occurrence of each row. The dplyr package offers the same operation through its distinct() function. The examples below show the syntax for both approaches so you can apply it to your own data.
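
To see what that logical vector looks like, here is a minimal sketch that calls duplicated() on a small made-up vector x. The vector is purely illustrative and is not part of the examples below. Each element is TRUE if the same value appeared earlier, and negating the result with ! selects the first occurrence of each value:

```r
# a small made-up vector, for illustration only
x <- c(1, 2, 2, 3, 1)

# TRUE marks a value that already appeared earlier in the vector
dup <- duplicated(x)
dup
#> [1] FALSE FALSE  TRUE FALSE  TRUE

# negate the mask to keep only the first occurrence of each value
x[!dup]
#> [1] 1 2 3
```

Subsetting a data frame works the same way. The mask simply has one entry per row instead of one per element.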


You can use one of the following two methods to remove duplicate rows from a data frame in R:

Method 1: Use Base R

#remove duplicate rows across entire data frame
df[!duplicated(df), ]

#remove duplicate rows across specific columns of data frame
df[!duplicated(df[c('var1')]), ]

Method 2: Use dplyr

library(dplyr)

#remove duplicate rows across entire data frame
df %>%
  distinct(.keep_all = TRUE)

#remove duplicate rows across specific columns of data frame
df %>%
  distinct(var1, .keep_all = TRUE)

The following examples show how to use this syntax in practice with this data frame:

#define data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

#view data frame
df

  team position
1    A    Guard
2    A    Guard
3    A  Forward
4    B    Guard
5    B   Center
6    B   Center

Example 1: Remove Duplicate Rows Using Base R

The following code shows how to remove duplicate rows from a data frame using functions from base R:

#remove duplicate rows from data frame
df[!duplicated(df), ]

  team position
1    A    Guard
3    A  Forward
4    B    Guard
5    B   Center
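
To check which rows the subset above drops, you can inspect the logical mask directly. This minimal sketch re-creates the example data frame so that it runs on its own. It uses which() to list the positions of the duplicate rows and sum() to count them:

```r
# re-create the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

# TRUE marks a row that is identical to an earlier row
duplicated(df)
#> [1] FALSE  TRUE FALSE FALSE FALSE  TRUE

# count the duplicate rows and list their positions
sum(duplicated(df))
#> [1] 2
which(duplicated(df))
#> [1] 2 6
```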

The following code shows how to use base R to remove rows that have duplicate values in specific columns of a data frame. Only the first row for each value is kept:

#keep only the first row for each value in the 'team' column
df[!duplicated(df[c('team')]), ]

  team position
1    A    Guard
4    B    Guard
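
By default, duplicated() keeps the first occurrence of each value. If you would rather keep the last row for each team, duplicated() also accepts a fromLast argument. Here is a minimal sketch using the same data frame:

```r
# re-create the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

# fromLast = TRUE checks for duplicates from the bottom up,
# so the last row for each team is the one that is kept
kept <- df[!duplicated(df[c('team')], fromLast = TRUE), ]
kept
#>   team position
#> 3    A  Forward
#> 6    B   Center
```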

Example 2: Remove Duplicate Rows Using dplyr

The following code shows how to remove duplicate rows from a data frame using the distinct() function from the dplyr package:

library(dplyr)

#remove duplicate rows from data frame
df %>%
  distinct(.keep_all = TRUE)

  team position
1    A    Guard
2    A  Forward
3    B    Guard
4    B   Center

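Note that because no columns are passed to distinct() here, it compares all columns and returns all of them, so .keep_all = TRUE has no effect in this call. The .keep_all argument matters when you name specific columns. In that case, it tells R to keep all of the other columns from the original data frame too, taking their values from the first row of each distinct group.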
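
The effect of .keep_all is easiest to see when distinct() is given a specific column. This minimal sketch runs the same call on the example data frame with and without the argument:

```r
library(dplyr)

# re-create the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

# without .keep_all, only the named column is returned
df %>% distinct(team)
#>   team
#> 1    A
#> 2    B

# with .keep_all = TRUE, the first row for each team keeps every column
df %>% distinct(team, .keep_all = TRUE)
#>   team position
#> 1    A    Guard
#> 2    B    Guard
```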

The following code shows how to use the distinct() function to remove rows that have duplicate values in specific columns of a data frame:

library(dplyr)

#keep only the first row for each value in the 'team' column
df %>%
  distinct(team, .keep_all = TRUE)

  team position
1    A    Guard
2    B    Guard
