How to Find Duplicates in MongoDB

You can find duplicate documents in MongoDB by using the aggregation pipeline to group documents on the values of one or more fields and then count the documents in each group. Any group that holds more than one document contains duplicates. Once a duplicated value is known, the find() method can retrieve the full documents that share that value in the specified field(s).


You can use the following syntax to find documents with duplicate values in MongoDB:

db.collection.aggregate([
    {"$group" : { "_id": "$field1", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$project": {"name" : "$_id", "_id" : 0} }
])

Here’s what this syntax does:

  • Group all documents that have the same value in field1 and count the documents in each group
  • Match the groups whose value is not null and that contain more than one document
  • Project the value of each remaining group as name and hide the _id field
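
To make the three stages concrete, here is a minimal plain JavaScript sketch of the same logic applied to an in-memory array. It is only an illustration of what the pipeline computes, not how MongoDB implements it. The helper name findDuplicateValues and the sample documents are made up for this sketch, and it assumes the field holds simple scalar values such as strings or numbers.

```javascript
// Plain JavaScript sketch of what the three pipeline stages compute.
// findDuplicateValues is a hypothetical helper, not a MongoDB API.
function findDuplicateValues(docs, field) {
  // $group: count documents per distinct value of `field`.
  // A missing field is treated as null, which is how $group handles it.
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field] ?? null;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  const result = [];
  for (const [key, count] of counts) {
    // $match: skip the null group and keep values seen more than once.
    if (key !== null && count > 1) {
      // $project: expose the grouped value as `name`.
      result.push({ name: key });
    }
  }
  return result;
}

// Made-up sample documents for illustration only.
const sample = [
  { team: "Mavs" },
  { team: "Mavs" },
  { team: "Cavs" },
  { position: "Guard" },  // no team field, so it lands in the null group
  { position: "Center" }  // same here
];

console.log(findDuplicateValues(sample, "team")); // [ { name: 'Mavs' } ]
```

The two documents without a team field end up in the null group, which is why the pipeline filters on "$ne": null. Without that condition, missing values would be reported as a duplicate.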

This particular query finds duplicate values in the field1 field. To check a different field, replace field1 with the name of that field.

The following example shows how to use this syntax with a collection named teams that contains the following documents:

db.teams.insertOne({team: "Mavs", position: "Guard", points: 31})
db.teams.insertOne({team: "Mavs", position: "Guard", points: 22})
db.teams.insertOne({team: "Rockets", position: "Center", points: 19})
db.teams.insertOne({team: "Rockets", position: "Forward", points: 26})
db.teams.insertOne({team: "Cavs", position: "Guard", points: 33})

Example: Find Documents with Duplicate Values

We can use the following code to find all of the duplicate values in the ‘team’ field:

db.teams.aggregate([
    {"$group" : { "_id": "$team", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$project": {"name" : "$_id", "_id" : 0} }
])

This query returns the following results:

{ name: 'Rockets' }
{ name: 'Mavs' }

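This tells us that the values ‘Rockets’ and ‘Mavs’ each occur more than once in the ‘team’ field. The order of the results may differ, because the $group stage does not guarantee any particular output order.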

To search for duplicate values in the ‘position’ field instead, change $team to $position:

db.teams.aggregate([
    {"$group" : { "_id": "$position", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$project": {"name" : "$_id", "_id" : 0} }
])

This query returns the following results:

{ name: 'Guard' }
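
The same approach extends to duplicates across more than one field. The following is a minimal sketch, assuming the same teams collection, that groups on the combination of team and position and also collects the _id of each document in the group. The names pipeline and ids are illustrative choices, and the guard around db only exists so that the snippet does nothing outside the MongoDB shell.

```javascript
// Pipeline sketch: duplicates on the combination of team and position.
// The names `pipeline` and `ids` are illustrative choices.
const pipeline = [
  // Group on a compound key and remember the _id of every member document
  { "$group": {
      "_id": { "team": "$team", "position": "$position" },
      "ids": { "$push": "$_id" },
      "count": { "$sum": 1 }
  } },
  // Keep only the combinations that occur more than once
  { "$match": { "count": { "$gt": 1 } } }
];

// `db` is only defined inside the MongoDB shell, so the call is guarded
if (typeof db !== "undefined") {
  db.teams.aggregate(pipeline).forEach(doc => printjson(doc));
}
```

For the five documents inserted earlier, only the combination of ‘Mavs’ and ‘Guard’ should match, with two values in ids. Once a duplicated value is known, a query such as db.teams.find({team: "Mavs", position: "Guard"}) returns the full documents that share it.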
