How can the duplicated function be used in R, and what are some examples of its application?

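The duplicated function in R identifies duplicate values in a vector or duplicate rows in a data frame. It returns a logical vector that marks every element or row that repeats an earlier one. It does not remove anything by itself, but negating its result lets you filter the duplicates out. This makes it a useful tool for data cleaning and analysis, since it reveals recurring values and helps produce more accurate and concise data sets.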
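
As a minimal sketch of the behavior described above, the following code applies duplicated() to a small vector. The vector x and its values are hypothetical and serve only as an illustration:

```r
#create a hypothetical vector with repeated values (illustration only)
x <- c('a', 'b', 'a', 'c', 'b', 'a')

#logical vector: TRUE marks a value that already appeared earlier
dup <- duplicated(x)
dup
# [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

#negate the result to keep only the first occurrence of each value
x_unique <- x[!dup]
x_unique
# [1] "a" "b" "c"
```

Note that the first occurrence of each value is never flagged, so filtering with !duplicated() keeps exactly one copy of every value.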

One example of its application is finding duplicate entries in a large data set, such as a customer database. The duplicated function flags repeated customer records so that they can be reviewed or filtered out, which keeps repeated information from skewing the data.

Another example is market research, where the duplicated function can flag repeated responses in survey data so that they can be dropped before analysis. This keeps the analysis based on unique responses and leads to more accurate insights.

Overall, the duplicated function in R is a valuable tool for data management and analysis, helping to streamline processes and improve the quality of data.

Use the duplicated Function in R (With Examples)


You can use the duplicated() function in R to identify duplicate rows in a data frame.

The examples below show how to use this function in practice with the following data frame in R:

#create data frame
df <- data.frame(team=c('Mavs', 'Mavs', 'Mavs', 'Nets', 'Nets', 'Kings', 'Hawks'),
                 position=c('G', 'G', 'F', 'F', 'F', 'C', 'G'),
                 points=c(23, 18, 14, 14, 13, 34, 22))

#view data frame
df

   team position points
1  Mavs        G     23
2  Mavs        G     18
3  Mavs        F     14
4  Nets        F     14
5  Nets        F     13
6 Kings        C     34
7 Hawks        G     22

Example 1: Use duplicated() to Find Duplicate Rows in Data Frame

We can use the following code to find all rows whose value in the team column already appeared in an earlier row of the data frame:

#view rows with duplicate values in 'team' column
df[duplicated(df$team), ]

  team position points
2 Mavs        G     18
3 Mavs        F     14
5 Nets        F     13

The output shows the three rows in the data frame that repeat a value in the team column. The first occurrence of each value is not counted as a duplicate.

For example, row 1 in the data frame contained Mavs in the team column, which means the occurrences of Mavs in rows 2 and 3 are both duplicates.

Similarly, row 4 in the data frame contained Nets in the team column, which means the occurrence of Nets in row 5 is a duplicate.
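
Because duplicated() skips the first occurrence of each value, it can help to see two common variations. The sketch below re-creates the same data frame and shows one possible way to flag every row of a repeated team by combining a forward scan with a backward scan (the fromLast argument), and one way to keep only the first row per team. The names all_dups and df_first are introduced here for illustration:

```r
#re-create the data frame from the start of this tutorial
df <- data.frame(team=c('Mavs', 'Mavs', 'Mavs', 'Nets', 'Nets', 'Kings', 'Hawks'),
                 position=c('G', 'G', 'F', 'F', 'F', 'C', 'G'),
                 points=c(23, 18, 14, 14, 13, 34, 22))

#flag every row whose team appears more than once, first occurrences included
all_dups <- duplicated(df$team) | duplicated(df$team, fromLast=TRUE)

#view those rows (rows 1 through 5: all Mavs and Nets rows)
df[all_dups, ]

#keep only the first row for each team (rows 1, 4, 6, and 7)
df_first <- df[!duplicated(df$team), ]
df_first
```

The backward scan flags everything except the last occurrence of each value, so joining the two scans with | covers all occurrences of any repeated team.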

Example 2: Use duplicated() to Count Number of Duplicated Rows in Data Frame

We can use the following code to count the number of rows with a duplicate value in the team column of the data frame:

#count rows with duplicate values in 'team' column
sum(duplicated(df$team))

[1] 3

The output tells us that there are 3 rows with duplicate values in the team column.
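
The same counting approach also works when duplicated() is given several columns or the whole data frame, in which case a row counts as a duplicate only if all of the supplied columns match an earlier row. The following sketch re-creates the same data frame; the names n_pair_dups and n_full_dups are introduced here for illustration:

```r
#re-create the data frame from the start of this tutorial
df <- data.frame(team=c('Mavs', 'Mavs', 'Mavs', 'Nets', 'Nets', 'Kings', 'Hawks'),
                 position=c('G', 'G', 'F', 'F', 'F', 'C', 'G'),
                 points=c(23, 18, 14, 14, 13, 34, 22))

#count rows that repeat an earlier team and position combination
n_pair_dups <- sum(duplicated(df[, c('team', 'position')]))
n_pair_dups
# [1] 2

#count rows that repeat an earlier row across all columns
n_full_dups <- sum(duplicated(df))
n_full_dups
# [1] 0
```

Rows 2 and 5 repeat the team and position of an earlier row, while no row is identical to another across all three columns.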
