How to Group Data by Multiple Fields in MongoDB: A Step-by-Step Guide

MongoDB, a leading NoSQL database, provides powerful mechanisms for processing and analyzing large volumes of semi-structured data. One of the most important is data aggregation: performing calculations across many documents to produce summary results. With real-world collections, you frequently need to group documents by the combination of several fields rather than by a single attribute. This is what makes context-specific insights possible, for example analyzing employee performance by both office location and department or, as this guide explores in detail, aggregating sports statistics by both team and player position.

The central component for this multi-dimensional grouping is the $group operator, which runs as a stage within the broader aggregation pipeline framework. It consumes the incoming stream of documents and creates one new output document for every unique combination of values in its grouping key. By defining a composite key in the mandatory _id field of the $group stage, we instruct MongoDB to treat documents as belonging to the same group only if every specified key field has an identical value. This approach allows granular analysis across multi-dimensional datasets.
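To make the "every key field must match" rule concrete, here is a plain JavaScript sketch (an in-memory emulation, no MongoDB server involved) that counts how many groups a single-field key and a two-field key produce. The employee documents and the office/department field names are hypothetical, chosen to echo the employee example in the introduction.

```javascript
// Hypothetical employee documents (illustration only).
const employees = [
  { office: "Austin", department: "Sales" },
  { office: "Austin", department: "Sales" },
  { office: "Austin", department: "Support" },
  { office: "Boston", department: "Sales" },
];

// Mimics how $group forms groups: two documents share a group only when
// every field listed in the key has an identical value.
function countGroups(docs, keyFields) {
  const keys = new Set();
  for (const doc of docs) {
    // Build the composite _id, e.g. { office: "Austin", department: "Sales" }.
    const id = {};
    for (const field of keyFields) id[field] = doc[field];
    keys.add(JSON.stringify(id));
  }
  return keys.size;
}

console.log(countGroups(employees, ["office"]));               // 2
console.log(countGroups(employees, ["office", "department"])); // 3
```

Adding the second key field splits the Austin documents into Sales and Support groups, which is exactly the extra granularity a composite key buys.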

The $group operator does more than separate documents into distinct groups; it also applies accumulator operators to the documents within each group. Common accumulators include $sum, $avg, $min, and $max, and counting is typically done with { $sum: 1 } ($count is also available as a $group accumulator in MongoDB 5.0 and later). With these tools, a developer could calculate the average score of all ‘Guards’ on the ‘Mavs’ team, or total the ‘Q4’ sales within the ‘East’ region. Mastering the syntax for multi-field grouping is therefore an essential skill for anyone who wants to get the full analytical value out of MongoDB.
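As a sketch of what these accumulators return, the plain JavaScript below computes them by hand for the ‘Mavs’/‘Guard’ group, using the two point values (31 and 22) from the sample dataset introduced later in this guide. The comment shows a $group stage that would request the same figures in the MongoDB shell; output field names such as avgPoints are illustrative choices, not required names.

```javascript
// In mongosh, a single $group stage can request all of these at once, e.g.:
//   { $group: { _id: { team: "$team", position: "$position" },
//               sumPoints: { $sum: "$points" }, avgPoints: { $avg: "$points" },
//               minPoints: { $min: "$points" }, maxPoints: { $max: "$points" },
//               count: { $sum: 1 } } }

// Points of the two 'Mavs' / 'Guard' documents in the sample data.
const mavsGuardPoints = [31, 22];

const total = mavsGuardPoints.reduce((acc, p) => acc + p, 0);
const summary = {
  sumPoints: total,                              // $sum
  avgPoints: total / mavsGuardPoints.length,     // $avg
  minPoints: Math.min(...mavsGuardPoints),       // $min
  maxPoints: Math.max(...mavsGuardPoints),       // $max
  count: mavsGuardPoints.length,                 // { $sum: 1 }
};

// Resulting values: sumPoints 53, avgPoints 26.5, minPoints 22, maxPoints 31, count 2
console.log(summary);
```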

Demystifying the MongoDB Aggregation Pipeline

To fully appreciate the power of multi-field grouping, it is essential to understand the context in which the $group operator operates: the Aggregation pipeline. The pipeline processes data records through a series of stages, each of which performs a specific data manipulation task. Documents flow sequentially from one stage to the next, similar to an assembly line, where the output of one stage becomes the input for the next. This modular design makes complex data transformations manageable, allowing users to filter data using $match, reshape documents using $project, and, critically, summarize data using $group.

The structure of the pipeline dictates how the $group stage must be implemented. Since $group consumes documents and outputs new aggregated documents, its placement within the pipeline significantly impacts performance and results. For instance, placing a $match stage before $group can dramatically reduce the number of documents processed during aggregation, leading to faster query times. Conversely, placing a $sort stage after $group allows the developer to organize the final calculated results based on the aggregated metrics, such as ordering groups by total sum or average value.
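The effect of stage order can be sketched in plain JavaScript by treating each stage as a function over an array of documents, with each stage's output fed to the next. This is an in-memory emulation, not MongoDB itself. It uses the sample teams data from later in this guide, filters on team ‘Mavs’, and shows that filtering before grouping hands fewer documents to the grouping step while producing the same groups as filtering afterwards.

```javascript
const teams = [
  { team: "Mavs", position: "Guard", points: 31 },
  { team: "Mavs", position: "Guard", points: 22 },
  { team: "Mavs", position: "Forward", points: 19 },
  { team: "Rockets", position: "Guard", points: 26 },
  { team: "Rockets", position: "Forward", points: 33 },
];

let docsGrouped = 0; // how many documents the grouping step had to read

// Emulates { $group: { _id: { team: "$team", position: "$position" }, sumPoints: { $sum: "$points" } } }.
function groupStage(docs) {
  docsGrouped += docs.length;
  const groups = new Map();
  for (const { team, position, points } of docs) {
    const key = JSON.stringify({ team, position });
    if (!groups.has(key)) groups.set(key, { _id: { team, position }, sumPoints: 0 });
    groups.get(key).sumPoints += points;
  }
  return [...groups.values()];
}

// Stages run in order, like an assembly line: each receives the previous output.
const runPipeline = (docs, stages) => stages.reduce((input, stage) => stage(input), docs);

// Order A: filter on team first (like $match), then group.
docsGrouped = 0;
const matchFirst = runPipeline(teams, [(docs) => docs.filter((d) => d.team === "Mavs"), groupStage]);
const groupedWhenMatchFirst = docsGrouped;

// Order B: group first, then filter the groups on _id.team.
docsGrouped = 0;
const groupFirst = runPipeline(teams, [groupStage, (docs) => docs.filter((d) => d._id.team === "Mavs")]);
const groupedWhenGroupFirst = docsGrouped;

console.log(groupedWhenMatchFirst, groupedWhenGroupFirst); // 3 5
```

Both orders yield the same two ‘Mavs’ groups, but the first one groups three documents instead of five; on a large collection that difference is what makes an early $match worthwhile.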

When defining the aggregation pipeline in MongoDB, we always pass an array of stage definitions to the db.collection.aggregate() method. Each element in this array is a document defining a single stage operation. For grouping by multiple fields, the $group stage is the most critical component. It requires careful construction of the _id field to correctly define the composite key that will drive the grouping logic, ensuring that the database engine correctly identifies which documents should be bundled together for subsequent calculation.

Implementing the $group Operator with Composite Keys

The fundamental requirement for executing any grouping operation in MongoDB is defining the _id field within the $group stage. When grouping by a single field, _id references that field directly. However, to implement multi-field grouping, the _id field must be set as a document (an embedded object) containing key-value pairs, where each key represents a desired grouping dimension and the value is a reference to the corresponding document field, prefixed by a dollar sign ($). This nested structure forms the composite key.

For example, if we wish to group sales data by both region and productCategory, the _id structure would look like { _id: { regionKey: "$region", categoryKey: "$productCategory" } }. The combination of values for region and productCategory must be identical across multiple input documents for them to be placed into the same output group. Once the unique groups are established based on this composite key, we can then define accumulator fields outside of the _id document, which utilize operators like $sum or $avg to calculate metrics specific to that group.
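The sketch below emulates that region/category stage in plain JavaScript on a few hypothetical sales documents (the region, productCategory and amount fields and their values are invented for illustration). It also shows that the key names inside _id, here regionKey and categoryKey, are labels of your choosing and need not match the source field names.

```javascript
// Hypothetical sales documents.
const sales = [
  { region: "East", productCategory: "Books", amount: 120 },
  { region: "East", productCategory: "Books", amount: 80 },
  { region: "East", productCategory: "Games", amount: 50 },
  { region: "West", productCategory: "Books", amount: 70 },
];

// Emulates:
// { $group: { _id: { regionKey: "$region", categoryKey: "$productCategory" },
//             totalAmount: { $sum: "$amount" } } }
function groupSales(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Composite key: the output labels differ from the input field names.
    const id = { regionKey: doc.region, categoryKey: doc.productCategory };
    const key = JSON.stringify(id);
    if (!groups.has(key)) groups.set(key, { _id: id, totalAmount: 0 });
    // The accumulator lives outside _id and is evaluated once per document in the group.
    groups.get(key).totalAmount += doc.amount;
  }
  return [...groups.values()];
}

groupSales(sales).forEach((doc) => console.log(JSON.stringify(doc)));
```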

The following syntax template illustrates the standard approach for grouping documents by multiple fields and simultaneously performing an aggregation function, such as counting the occurrences of documents within each unique group combination. This pattern is foundational for all multi-field aggregation tasks in MongoDB:

db.collection.aggregate([
    {$group : {_id:{field1:"$field1", field2:"$field2"}, count:{$sum:1}}}
])
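Because a pipeline is just an array of plain documents, the template can also be generated from a list of field names. The helper below, buildGroupStage, is a hypothetical convenience function (not part of MongoDB or its drivers); it returns the same stage object as the template above, which could then be passed to aggregate().

```javascript
// Hypothetical helper: builds { $group: { _id: { f: "$f", ... }, count: { $sum: 1 } } }
// for any list of top-level field names.
function buildGroupStage(fields) {
  const id = {};
  for (const field of fields) {
    id[field] = "$" + field; // "$field" is a field path referring to the input document's value
  }
  return { $group: { _id: id, count: { $sum: 1 } } };
}

const stage = buildGroupStage(["field1", "field2"]);
console.log(JSON.stringify(stage));
// {"$group":{"_id":{"field1":"$field1","field2":"$field2"},"count":{"$sum":1}}}
```

In mongosh, db.teams.aggregate([buildGroupStage(["team", "position"])]) would then run the same count as Example 1 below, assuming the helper has been defined in the shell session.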

Setting Up Our Sports Teams Dataset

To provide a concrete, reproducible demonstration of multi-field grouping, we will utilize a sample dataset representing sports team statistics. This collection, named teams, contains documents where each record tracks a player’s performance, specifically focusing on their team affiliation, their position on the field, and the points they scored in a given game. The primary objective is to group these records to analyze performance aggregated by the combination of Team and Position.

The structure of our data is intentionally simple, featuring three key fields: team (string), position (string), and points (numeric). This straightforward structure allows us to focus entirely on the aggregation logic without the complexity of nested documents, making the resulting groupings and calculations clear and easy to trace. We insert a small but illustrative set of five documents into the teams collection, ensuring that there are deliberate overlaps in both the team and position fields to create groups that can be aggregated.

The following commands insert the example documents into the teams collection. Notice how the first two documents share the exact same team (‘Mavs’) and position (‘Guard’), which will lead to them being combined into a single group during the aggregation process, enabling us to sum their individual point scores:

db.teams.insertOne({team: "Mavs", position: "Guard", points: 31})
db.teams.insertOne({team: "Mavs", position: "Guard", points: 22})
db.teams.insertOne({team: "Mavs", position: "Forward", points: 19})
db.teams.insertOne({team: "Rockets", position: "Guard", points: 26})
db.teams.insertOne({team: "Rockets", position: "Forward", points: 33})

Example 1: Counting Documents and Calculating Totals

Our first example focuses on using the multi-field grouping functionality to count the number of records (player entries) associated with each unique combination of team and position. This is achieved by setting the composite key using $team and $position within the _id field, and then defining an accumulator field named count that utilizes the $sum operator with a value of 1. Summing 1 for every document processed ensures that the result accurately reflects the total number of documents belonging to that specific group.

The following code snippet runs this counting aggregation. Notice the structure of the _id field: it is an embedded document that combines the values of the team and position fields into a single grouping key. A document with ‘Mavs’ and ‘Guard’ belongs to the same group as every other document with ‘Mavs’ and ‘Guard’, regardless of its points value.

db.teams.aggregate([
    {$group : {_id:{team:"$team", position:"$position"}, count:{$sum:1}}}
])

Running this pipeline returns four result documents, one for each unique team/position pairing in the dataset. The ‘Mavs’/‘Guard’ combination appears twice, while the other three combinations (‘Mavs’/‘Forward’, ‘Rockets’/‘Guard’, ‘Rockets’/‘Forward’) appear once each. Note that $group does not guarantee the order of its output documents, so your results may be listed in a different order. This type of grouping is useful for deriving frequencies and distributions within multi-dimensional categorical data.

{ _id: { team: 'Rockets', position: 'Forward' }, count: 1 }
{ _id: { team: 'Mavs', position: 'Guard' }, count: 2 }
{ _id: { team: 'Mavs', position: 'Forward' }, count: 1 }
{ _id: { team: 'Rockets', position: 'Guard' }, count: 1 }
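To trace the counting logic without a running server, the plain JavaScript below emulates the same $group stage on the five sample documents. It is an in-memory sketch, not how MongoDB implements $group internally. It lists groups in first-seen order, which differs from the shell output above and is a reminder that $group output order is not guaranteed.

```javascript
const teams = [
  { team: "Mavs", position: "Guard", points: 31 },
  { team: "Mavs", position: "Guard", points: 22 },
  { team: "Mavs", position: "Forward", points: 19 },
  { team: "Rockets", position: "Guard", points: 26 },
  { team: "Rockets", position: "Forward", points: 33 },
];

// Emulates { $group: { _id: { team: "$team", position: "$position" }, count: { $sum: 1 } } }.
const groups = new Map();
for (const { team, position } of teams) {
  // points is not part of the key, so it has no effect on which group a document joins.
  const key = JSON.stringify({ team, position });
  if (!groups.has(key)) groups.set(key, { _id: { team, position }, count: 0 });
  groups.get(key).count += 1; // { $sum: 1 } adds 1 for each document in the group
}

const countResults = [...groups.values()];
countResults.forEach((doc) => console.log(JSON.stringify(doc)));
```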

Expanding Aggregation: Calculating Total Points

While counting documents is useful, $group becomes more informative when accumulators are applied to numeric fields. Instead of merely counting, we can calculate sums, averages, or standard deviations. In the next iteration, we keep the exact same multi-field grouping key (team and position) but calculate the total number of points scored by all players within each team/position group.

To achieve this, we replace the previous count: {$sum: 1} accumulator with a new field, sumPoints, which utilizes the $sum operator referencing the $points field. By passing "$points" as the argument to $sum, we instruct MongoDB to aggregate the numeric values found in the points field across all input documents that fall into the current group. This provides a tangible metric for assessing performance across the defined dimensions.

db.teams.aggregate([
    {$group : {_id:{team:"$team", position:"$position"}, sumPoints:{$sum:"$points"}}}
])

The result set contains the sum of points for each unique combination, turning individual per-game records into summary data. For instance, the two ‘Mavs’/‘Guard’ records (31 points and 22 points) are combined, and their points are summed to 53, the total scoring output from that position on that team.

{ _id: { team: 'Rockets', position: 'Forward' }, sumPoints: 33 }
{ _id: { team: 'Mavs', position: 'Guard' }, sumPoints: 53 }
{ _id: { team: 'Mavs', position: 'Forward' }, sumPoints: 19 }
{ _id: { team: 'Rockets', position: 'Guard' }, sumPoints: 26 }
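One detail worth knowing about $sum: according to the MongoDB documentation, it ignores non-numeric values, so a document whose points field is missing adds nothing to its group's total instead of corrupting it. The plain JavaScript sketch below emulates that behaviour; the sixth document without a points field is hypothetical and is not part of the guide's dataset.

```javascript
const teams = [
  { team: "Mavs", position: "Guard", points: 31 },
  { team: "Mavs", position: "Guard", points: 22 },
  { team: "Mavs", position: "Forward", points: 19 },
  { team: "Rockets", position: "Guard", points: 26 },
  { team: "Rockets", position: "Forward", points: 33 },
  // Hypothetical document with no points field (not in the guide's data).
  { team: "Mavs", position: "Guard" },
];

// Emulates { $group: { _id: { team: "$team", position: "$position" }, sumPoints: { $sum: "$points" } } }.
function sumPointsByTeamPosition(docs) {
  const groups = new Map();
  for (const { team, position, points } of docs) {
    const key = JSON.stringify({ team, position });
    if (!groups.has(key)) groups.set(key, { _id: { team, position }, sumPoints: 0 });
    // Mirror $sum: add numbers only; missing or non-numeric values are skipped,
    // whereas a plain "+= undefined" would turn the total into NaN.
    if (typeof points === "number") groups.get(key).sumPoints += points;
  }
  return [...groups.values()];
}

const sumResults = sumPointsByTeamPosition(teams);
const mavsGuard = sumResults.find((d) => d._id.team === "Mavs" && d._id.position === "Guard");
console.log(mavsGuard.sumPoints); // 53
```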

Interpreting the Aggregated Results

The output of this aggregation is a set of result documents in which the _id object identifies the group and the sumPoints field holds the calculated metric. Reading these results correctly matters for data-driven decisions, because the aggregated data reveals patterns that are not apparent in the individual raw documents. The aggregation has condensed five original documents into four summary documents, each representing a unique team/position combination.

By examining the output, we can quickly draw precise conclusions about how scoring is distributed across teams and positions. A summary like this is often the final output needed for reports or dashboards, because it can be consumed directly without further processing. The results immediately highlight the highest-scoring group: the ‘Mavs’ Guards.

Specifically, the aggregated data tells us the following key facts:

  • The total sum of points scored by players on the ‘Rockets’ holding the ‘Forward’ position is 33.
  • The combined sum of points scored by all players on the ‘Mavs’ categorized as ‘Guard’ is 53.
  • The ‘Mavs’ ‘Forward’ position contributed 19 points to the total score.
  • The ‘Rockets’ ‘Guard’ position contributed 26 points to the total score.
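Two quick checks help when reading a summary like this: which group has the largest total, and whether the group totals add back up to the raw data. The plain JavaScript sketch below runs both on the four summary documents shown above; the raw point values are the five from the sample dataset (31 + 22 + 19 + 26 + 33).

```javascript
const summaries = [
  { _id: { team: "Rockets", position: "Forward" }, sumPoints: 33 },
  { _id: { team: "Mavs", position: "Guard" }, sumPoints: 53 },
  { _id: { team: "Mavs", position: "Forward" }, sumPoints: 19 },
  { _id: { team: "Rockets", position: "Guard" }, sumPoints: 26 },
];
const rawPoints = [31, 22, 19, 26, 33];

// Group with the largest total.
const top = summaries.reduce((best, doc) => (doc.sumPoints > best.sumPoints ? doc : best));

// Grouping only partitions the documents, so the group totals must equal the raw total.
const groupedTotal = summaries.reduce((acc, doc) => acc + doc.sumPoints, 0);
const rawTotal = rawPoints.reduce((acc, p) => acc + p, 0);

console.log(`${top._id.team} ${top._id.position}: ${top.sumPoints}`); // Mavs Guard: 53
console.log(groupedTotal === rawTotal); // true (both are 131)
```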

Example 2: Refining Results with Sorting ($sort)

While aggregation performs the calculations, $group does not guarantee any particular order for its output documents. In real-world analysis, results usually need to be sorted, typically by the newly calculated metrics, such as the highest total points or the lowest average cost. This is done by adding a subsequent stage to the aggregation pipeline: $sort. Here the $sort stage must follow the $group stage, because it sorts on a field (sumPoints) that only exists after aggregation.

To demonstrate this, we will take the exact aggregation code used to calculate sumPoints and append a $sort stage. We instruct the pipeline to sort the output documents based on the sumPoints field in ascending order, where 1 signifies ascending sorting (from lowest to highest value). This makes it easy to identify the lowest-performing groups based on the calculated metric.

db.teams.aggregate([
    {$group : {_id:{team:"$team", position:"$position"}, sumPoints:{$sum:"$points"}}},
    {$sort : {sumPoints:1}}
])

The resulting output confirms that the documents are now ordered based on the value in the sumPoints field, starting with 19 and ending with 53. This organized presentation significantly enhances readability and facilitates quick identification of extremes within the aggregated data, allowing users to efficiently pinpoint the highest and lowest scoring groups.

{ _id: { team: 'Mavs', position: 'Forward' }, sumPoints: 19 }
{ _id: { team: 'Rockets', position: 'Guard' }, sumPoints: 26 }
{ _id: { team: 'Rockets', position: 'Forward' }, sumPoints: 33 }
{ _id: { team: 'Mavs', position: 'Guard' }, sumPoints: 53 }
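The meaning of 1 and -1 can be sketched in plain JavaScript: the direction simply multiplies the comparison result. The helper name sortBy is hypothetical; the input array holds the four unsorted summary documents from the previous example.

```javascript
const summaries = [
  { _id: { team: "Rockets", position: "Forward" }, sumPoints: 33 },
  { _id: { team: "Mavs", position: "Guard" }, sumPoints: 53 },
  { _id: { team: "Mavs", position: "Forward" }, sumPoints: 19 },
  { _id: { team: "Rockets", position: "Guard" }, sumPoints: 26 },
];

// Emulates { $sort: { <field>: direction } } for a numeric field:
// direction 1 = ascending, -1 = descending.
function sortBy(docs, field, direction) {
  // Copy first so the input array is left untouched.
  return [...docs].sort((a, b) => direction * (a[field] - b[field]));
}

const ascending = sortBy(summaries, "sumPoints", 1);
console.log(ascending.map((d) => d.sumPoints)); // [ 19, 26, 33, 53 ]
```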

Alternatively, sorting can be performed in descending order to immediately highlight the groups with the highest aggregated values. To achieve this, we simply change the sort value from 1 to -1. The descending sort order arranges results from the highest value down to the lowest, immediately showcasing the top performers in the dataset without requiring manual scanning.

db.teams.aggregate([
    {$group : {_id:{team:"$team", position:"$position"}, sumPoints:{$sum:"$points"}}},
    {$sort : {sumPoints:-1}}
])

This modification yields the results ordered from the highest total points downwards, confirming that the ‘Mavs’ ‘Guard’ position is the top-scoring group in this sample set.

{ _id: { team: 'Mavs', position: 'Guard' }, sumPoints: 53 }
{ _id: { team: 'Rockets', position: 'Forward' }, sumPoints: 33 }
{ _id: { team: 'Rockets', position: 'Guard' }, sumPoints: 26 }
{ _id: { team: 'Mavs', position: 'Forward' }, sumPoints: 19 }
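The sample data has no ties, but real data often does. MongoDB's documentation notes that $sort does not guarantee a consistent order among documents with equal sort values and recommends including a field with unique values in the sort. After $group, the fields of _id are a natural choice, for example { $sort: { sumPoints: -1, "_id.team": 1, "_id.position": 1 } }. The plain JavaScript sketch below emulates that compound sort; the extra ‘Spurs’ group that ties at 33 points is hypothetical.

```javascript
const summaries = [
  { _id: { team: "Rockets", position: "Forward" }, sumPoints: 33 },
  { _id: { team: "Mavs", position: "Guard" }, sumPoints: 53 },
  { _id: { team: "Mavs", position: "Forward" }, sumPoints: 19 },
  { _id: { team: "Rockets", position: "Guard" }, sumPoints: 26 },
  // Hypothetical extra group that ties with Rockets/Forward at 33 points.
  { _id: { team: "Spurs", position: "Guard" }, sumPoints: 33 },
];

// Simple ascending comparison for strings (plain ASCII names here).
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

// Emulates { $sort: { sumPoints: -1, "_id.team": 1, "_id.position": 1 } }:
// highest total first, ties broken by team, then by position.
const descending = [...summaries].sort(
  (a, b) =>
    b.sumPoints - a.sumPoints ||
    cmp(a._id.team, b._id.team) ||
    cmp(a._id.position, b._id.position)
);

descending.forEach((d) => console.log(`${d._id.team} ${d._id.position}: ${d.sumPoints}`));
```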

Advanced Considerations and Best Practices

When working with extensive datasets and complex multi-field groupings, performance considerations become paramount. A crucial best practice is to ensure that you minimize the number of documents passed into the $group stage. If possible, utilize the $match stage early in the pipeline to filter out irrelevant records based on criteria such as date ranges or status flags. Reducing the data volume processed by the $group operator often translates into significantly faster aggregation times, especially when dealing with collections containing millions of documents.

Furthermore, while our examples focused on $sum, the $group operator supports a comprehensive set of accumulator operators, including $avg, $push (to create an array of values), $addToSet (to create an array of unique values), $min, and $max. Choose the accumulator that fits the analytical question being asked. For instance, finding the earliest order date for each customer/product group would use $min on the date field, while listing all unique product IDs purchased in each region would use $addToSet.
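The difference between $push and $addToSet, and what $min and $max return, can be sketched in plain JavaScript. For brevity, this in-memory emulation groups the sample teams data by team alone (a single-field key), and the output field names are illustrative. A comparable mongosh stage is shown in the comment. Keep in mind that MongoDB does not specify the order of elements in an $addToSet array.

```javascript
// Comparable mongosh stage:
//   { $group: { _id: "$team", positions: { $push: "$position" },
//               uniquePositions: { $addToSet: "$position" },
//               minPoints: { $min: "$points" }, maxPoints: { $max: "$points" } } }
const teams = [
  { team: "Mavs", position: "Guard", points: 31 },
  { team: "Mavs", position: "Guard", points: 22 },
  { team: "Mavs", position: "Forward", points: 19 },
  { team: "Rockets", position: "Guard", points: 26 },
  { team: "Rockets", position: "Forward", points: 33 },
];

const groups = new Map();
for (const { team, position, points } of teams) {
  if (!groups.has(team)) {
    groups.set(team, { _id: team, positions: [], uniquePositions: new Set(), minPoints: Infinity, maxPoints: -Infinity });
  }
  const g = groups.get(team);
  g.positions.push(position);      // $push keeps duplicates
  g.uniquePositions.add(position); // $addToSet drops duplicates
  g.minPoints = Math.min(g.minPoints, points);
  g.maxPoints = Math.max(g.maxPoints, points);
}

// Convert each Set to an array so the results look like MongoDB output documents.
const accResults = [...groups.values()].map((g) => ({ ...g, uniquePositions: [...g.uniquePositions] }));
accResults.forEach((doc) => console.log(JSON.stringify(doc)));
```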

In summary, the ability to group by multiple fields using a composite key within the $group operator is a cornerstone of advanced data analysis in MongoDB. By combining this technique with subsequent pipeline stages like $sort, developers can efficiently transform raw transactional data into structured, meaningful, and actionable business intelligence. For complete and detailed information regarding all supported functions and advanced grouping techniques, always refer to the official $group operator documentation provided by MongoDB.



Cite this article

stats writer (2025). How to Group Data by Multiple Fields in MongoDB: A Step-by-Step Guide. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/mongodb-group-by-multiple-fieldshow-to-group-by-multiple-fields-in-mongodb/

