
The duplicated() function in R identifies duplicate values in a data set. It returns a logical vector indicating which elements of a vector, or which rows of a data frame, are duplicates of earlier ones. The function does not remove anything by itself; to drop duplicates, subset the data with the negated result, !duplicated().

This is useful in many scenarios, such as checking a data frame for duplicate entries before an analysis, identifying duplicated records taken from a database, or removing repeated values from a vector. For example, in a data frame of customer information, we can flag entries whose names or IDs repeat and drop them so that each customer is counted once. Similarly, in a vector of stock prices, we can flag repeated values before performing a statistical analysis on the data.
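As a minimal sketch of the behavior described above, the snippet below applies duplicated() to a small vector. The vector x and its values are made up for illustration and do not come from the examples in this tutorial.

```r
# hypothetical vector of values, for illustration only
x <- c(5, 3, 5, 8, 3, 5)

# TRUE marks an element that already appeared earlier in the vector
dup_flags <- duplicated(x)
dup_flags
# [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

# negate the flags to keep only the first occurrence of each value
unique_x <- x[!dup_flags]
unique_x
# [1] 5 3 8
```

The first occurrence of each value is never flagged, which is why subsetting with !duplicated() keeps exactly one copy of every value.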

You can use the duplicated() function in R to identify duplicate rows in a data frame.

The following examples show how to use this function in practice with this data frame:

#create data frame
df <- data.frame(team=c('Mavs', 'Mavs', 'Mavs', 'Nets', 'Nets', 'Kings', 'Hawks'),
                 position=c('G', 'G', 'F', 'F', 'F', 'C', 'G'),
                 points=c(23, 18, 14, 14, 13, 34, 22))

#view data frame
df

   team position points
1  Mavs        G     23
2  Mavs        G     18
3  Mavs        F     14
4  Nets        F     14
5  Nets        F     13
6 Kings        C     34
7 Hawks        G     22

Example 1: Use duplicated() to Find Duplicate Rows in Data Frame

We can use the following code to find all rows with a duplicate value in the team column of the data frame:

#view rows with duplicate values in 'team' column
df[duplicated(df$team), ]

  team position points
2 Mavs        G     18
3 Mavs        F     14
5 Nets        F     13

The output shows the three rows in the data frame that have duplicate values in the team column.

For example, row 1 in the data frame contains Mavs in the team column, which means the occurrences of Mavs in rows 2 and 3 are both duplicates.

Similarly, row 4 in the data frame contained Nets in the team column, which means the occurrence of Nets in row 5 is a duplicate.
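The same logical vector can also be negated to drop the duplicate rows, or computed on several columns at once. The sketch below assumes the df defined earlier and recreates it so that it runs on its own. The names first_rows and dup_pairs are illustrative only.

```r
# recreate the data frame from above so the snippet runs on its own
df <- data.frame(team=c('Mavs', 'Mavs', 'Mavs', 'Nets', 'Nets', 'Kings', 'Hawks'),
                 position=c('G', 'G', 'F', 'F', 'F', 'C', 'G'),
                 points=c(23, 18, 14, 14, 13, 34, 22))

# keep only the first row for each team (rows 1, 4, 6 and 7)
first_rows <- df[!duplicated(df$team), ]
first_rows

# rows that repeat an earlier team AND position combination (rows 2 and 5)
dup_pairs <- df[duplicated(df[, c('team', 'position')]), ]
dup_pairs
```

Passing a data frame rather than a single column makes duplicated() compare whole rows of the selected columns. That is why row 3 (Mavs, F) is not flagged in the second call even though Mavs has already appeared.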

Example 2: Use duplicated() to Count Number of Duplicated Rows in Data Frame

We can use the following code to count the number of rows with a duplicate value in the team column of the data frame:

#count rows with duplicate values in 'team' column
sum(duplicated(df$team))

[1] 3

The output tells us that there are 3 rows with duplicate values in the team column.
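Note that this count excludes the first occurrence of each team. As a rough sketch, the snippet below breaks the count down by team and also counts every row that shares a team with another row, using the fromLast argument of duplicated(). The names dup_counts and in_dup_group are illustrative only.

```r
# recreate the data frame from above so the snippet runs on its own
df <- data.frame(team=c('Mavs', 'Mavs', 'Mavs', 'Nets', 'Nets', 'Kings', 'Hawks'),
                 position=c('G', 'G', 'F', 'F', 'F', 'C', 'G'),
                 points=c(23, 18, 14, 14, 13, 34, 22))

# number of duplicate rows per team (first occurrences are not counted)
dup_counts <- table(df$team[duplicated(df$team)])
dup_counts
# Mavs has 2 duplicate rows, Nets has 1

# flag every row whose team appears more than once, first occurrences included
in_dup_group <- duplicated(df$team) | duplicated(df$team, fromLast = TRUE)
sum(in_dup_group)
# [1] 5
```

Scanning once from the front and once from the back (fromLast = TRUE) flags all members of a repeated group, so the total here is 5: three Mavs rows plus two Nets rows.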
