Introduction
MongoDB is a document-oriented NoSQL database built for flexible schemas and horizontal scalability. One of its most powerful features is the aggregation framework, which lets you process and analyze data by defining a pipeline of stages. At the heart of this framework lies the $group stage, which groups input documents by a specified key and applies accumulators to each group.
This tutorial dives into the $group aggregation stage in MongoDB. We begin with basic concepts and gradually move to more complex operations, with code examples along the way.
Understanding the $group Stage
The $group stage is akin to the GROUP BY clause in SQL: it groups documents according to one or more fields. The grouped documents can then be processed using accumulator operators such as $sum, $avg, and $max. The syntax of the $group stage is:
{
  $group: {
    _id: <expression>, // Group key: a field path or any expression
    <field1>: { <accumulator1>: <expression1> },
    ...
    <fieldN>: { <accumulatorN>: <expressionN> }
  }
}
Here, _id is mandatory and specifies the group identifier. Each field after _id applies an accumulator to the set of documents that share the same group identifier. Setting _id to null (or any other constant) places all input documents in a single group.
Basic Usage of $group
Let’s start with a basic example that uses the $group stage to group documents by a single field and count the number of documents in each group.
// Group by category and count products in each category
db.products.aggregate([
  {
    $group: {
      _id: '$category', // Group by 'category' field
      count: { $sum: 1 } // Count documents in each category
    }
  }
]);
In the example above, the documents in the products collection are grouped by the category field. The $sum accumulator adds 1 for each document encountered, effectively counting the number of documents in each category.
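To make the result concrete, the sketch below mimics this stage in plain JavaScript on a few hypothetical documents. The sample data and the countByCategory helper are illustrative assumptions, not part of MongoDB, and the server's own implementation differs. In particular, $group does not guarantee any output order.

```javascript
// Hypothetical sample documents standing in for the products collection.
const products = [
  { name: 'Laptop', category: 'electronics', price: 1200 },
  { name: 'Phone', category: 'electronics', price: 800 },
  { name: 'Desk', category: 'furniture', price: 300 }
];

// In-memory equivalent of { $group: { _id: '$category', count: { $sum: 1 } } }.
function countByCategory(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Documents without a category fall into the null group, as in $group.
    const key = doc.category ?? null;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

console.log(countByCategory(products));
// [ { _id: 'electronics', count: 2 }, { _id: 'furniture', count: 1 } ]
```

Each output document carries the group key in _id, which is the same shape the aggregation returns.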
Grouping by Multiple Fields
You can also group documents by multiple fields by using a compound key (an embedded document) in the _id field, as shown in the example below:
// Group by category and tags
db.products.aggregate([
  {
    $group: {
      _id: { category: '$category', tags: '$tags' },
      count: { $sum: 1 }
    }
  }
]);
In this query, documents are grouped by both category and tags, producing a count for every distinct combination of the two values. If tags holds an array, the entire array acts as the key, so ['a', 'b'] and ['b', 'a'] fall into different groups.
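When tags holds an array, grouping on it compares whole arrays rather than individual tags. One common way to count per individual tag is to place an $unwind stage before $group. The following is a minimal sketch, assuming each product has a tags array; the variable name perTagPipeline and the output field name tag are hypothetical.

```javascript
// Assumes each product document has a 'tags' array, e.g. tags: ['sale', 'new'].
const perTagPipeline = [
  // Emit one document per array element, so each tag is grouped on its own.
  { $unwind: '$tags' },
  {
    $group: {
      _id: { category: '$category', tag: '$tags' },
      count: { $sum: 1 }
    }
  },
  // $group output is unordered; sort to list the most common pairs first.
  { $sort: { count: -1 } }
];

// In mongosh: db.products.aggregate(perTagPipeline);
```

By default, $unwind drops documents whose tags field is missing, null, or an empty array, so such products are not counted by this pipeline.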
Using Accumulators in $group
Accumulators are operations that process data (e.g., sums, averages) for documents within a group. Here’s an example where we calculate the average price of products in each category:
// Calculate average price for each category
db.products.aggregate([
  {
    $group: {
      _id: '$category',
      averagePrice: { $avg: '$price' }
    }
  }
]);
The $avg accumulator calculates the average of all the price fields within each group identified by category.
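One detail worth knowing is that $avg skips non-numeric and missing values, and it returns null for a group with no numeric values at all. The in-memory sketch below imitates that behavior on hypothetical data. The averagePriceByCategory helper is an illustration, not MongoDB's implementation.

```javascript
// Hypothetical documents, including one without a price and one non-numeric price.
const products = [
  { category: 'electronics', price: 1200 },
  { category: 'electronics', price: 800 },
  { category: 'furniture', price: 300 },
  { category: 'furniture' },              // no price: skipped by the average
  { category: 'books', price: 'n/a' }     // non-numeric: skipped as well
];

// In-memory imitation of { $group: { _id: '$category', averagePrice: { $avg: '$price' } } }.
function averagePriceByCategory(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.category ?? null;
    const group = groups.get(key) ?? { sum: 0, n: 0 };
    if (typeof doc.price === 'number') {
      group.sum += doc.price;
      group.n += 1;
    }
    groups.set(key, group);
  }
  // A group with no numeric prices averages to null.
  return [...groups].map(([_id, { sum, n }]) => ({
    _id,
    averagePrice: n > 0 ? sum / n : null
  }));
}

console.log(averagePriceByCategory(products));
// [
//   { _id: 'electronics', averagePrice: 1000 },
//   { _id: 'furniture', averagePrice: 300 },
//   { _id: 'books', averagePrice: null }
// ]
```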
Combining Multiple Accumulators
You can also combine multiple accumulators within the same $group stage to produce several calculations at once. For example, you can calculate the average, maximum, and minimum price for each category:
// Combine multiple accumulators
db.products.aggregate([
  {
    $group: {
      _id: '$category',
      averagePrice: { $avg: '$price' },
      maxPrice: { $max: '$price' },
      minPrice: { $min: '$price' }
    }
  }
]);
This outputs one document per category, with the category value in _id and the average, maximum, and minimum prices in their own fields.
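Because $group emits its groups in no particular order, pipelines like this one are often followed by an explicit $sort. The sketch below assumes you want the categories with the highest average price first; the variable name priceStatsPipeline is hypothetical.

```javascript
// Same accumulators as above, followed by a sort on one of the computed fields.
const priceStatsPipeline = [
  {
    $group: {
      _id: '$category',
      averagePrice: { $avg: '$price' },
      maxPrice: { $max: '$price' },
      minPrice: { $min: '$price' }
    }
  },
  // Later stages can reference the fields that $group produced.
  { $sort: { averagePrice: -1 } }
];

// In mongosh: db.products.aggregate(priceStatsPipeline);
```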
Advanced Grouping with Arbitrary Expressions
Besides grouping by direct field references, $group can use expressions that transform the input values. For instance, you could create groups based on the length of a string field:
// Group by string length of the 'name' field
db.products.aggregate([
  {
    $group: {
      _id: { length: { $strLenCP: '$name' } }, // Group key computed by a string expression
      count: { $sum: 1 }
    }
  }
]);
This aggregation groups the documents based on the computed length of the names of the products and counts how many products have names of the same length.
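$strLenCP counts code points, which can differ from JavaScript's String.prototype.length (UTF-16 code units) for characters such as emoji. The sketch below imitates the grouping in memory on hypothetical names, assuming every document has a string name. The helper functions are illustrative only.

```javascript
// Hypothetical product names; '\u{1F600}' is one code point but two UTF-16 code units.
const products = [
  { name: 'Desk' },
  { name: 'Lamp' },
  { name: 'Mug\u{1F600}' },
  { name: 'Chair' }
];

// Spreading a string iterates by code point, comparable to what $strLenCP counts.
const codePointLength = (s) => [...s].length;

// In-memory imitation of grouping on { length: { $strLenCP: '$name' } }.
function countByNameLength(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const len = codePointLength(doc.name);
    counts.set(len, (counts.get(len) ?? 0) + 1);
  }
  return [...counts].map(([length, count]) => ({ _id: { length }, count }));
}

console.log(countByNameLength(products));
// [ { _id: { length: 4 }, count: 3 }, { _id: { length: 5 }, count: 1 } ]
```

Note that 'Mug\u{1F600}' lands in the length-4 group even though its JavaScript .length is 5.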
Chaining $group Stages
In some cases, you might need to run several grouping stages one after another. This is useful for aggregating the results of an earlier grouping, for example rolling per-category totals up into a single overall total. Here’s an example:
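// Chain two $group stages
db.products.aggregate([
  {
    // Stage 1: total quantity sold per category
    $group: {
      _id: '$category',
      totalSales: { $sum: '$quantitySold' }
    }
  },
  {
    // Stage 2: sum the per-category totals into one overall figure
    $group: {
      _id: null,
      overallSales: { $sum: '$totalSales' }
    }
  }
]);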
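The first $group stage computes the total quantity sold for each category. The second stage receives those per-category documents as its input, not the original collection. Because its _id is null, it places them all in one group and sums their totals to give cumulative sales across all categories. Note that _id cannot be omitted: it is required in every $group stage, so a single overall group is requested with _id: null.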
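The data flow between the two stages can be traced with a small in-memory sketch. The sample documents, helper functions, and the overallSales field name are hypothetical, and this is an imitation of the pipeline's behavior rather than MongoDB's implementation.

```javascript
// Hypothetical documents standing in for the products collection.
const products = [
  { category: 'electronics', quantitySold: 5 },
  { category: 'electronics', quantitySold: 3 },
  { category: 'furniture', quantitySold: 2 }
];

// Stage 1: one document per category with its total quantity sold.
function totalSalesByCategory(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.category, (totals.get(doc.category) ?? 0) + doc.quantitySold);
  }
  return [...totals].map(([_id, totalSales]) => ({ _id, totalSales }));
}

// Stage 2: _id: null collapses every stage-1 document into a single group.
function overallSales(groups) {
  // Like $group, produce no output document when there is no input.
  if (groups.length === 0) return [];
  const sum = groups.reduce((acc, g) => acc + g.totalSales, 0);
  return [{ _id: null, overallSales: sum }];
}

const perCategory = totalSalesByCategory(products);
console.log(perCategory);
// [ { _id: 'electronics', totalSales: 8 }, { _id: 'furniture', totalSales: 2 } ]
console.log(overallSales(perCategory));
// [ { _id: null, overallSales: 10 } ]
```

The second function only ever sees the output of the first, which mirrors how each pipeline stage consumes the documents emitted by the stage before it.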
Conclusion
MongoDB’s $group aggregation stage is a versatile tool for summarizing the data in your collections. Whether you’re counting documents, calculating averages, or rolling grouped results up into overall totals, $group provides a convenient and powerful way to do it. The examples here cover only a fraction of what’s possible, but they give you a foundation for more complex data analysis and manipulation.