MongoDB Aggregation Pipeline
A Comprehensive Guide
Introduction
The MongoDB aggregation pipeline is a framework for processing the documents in a collection. It lets you express complex data transformations as a sequence of stages, where each stage processes the output of the previous one.
Aggregation Pipeline
The core concept of aggregation in MongoDB is the aggregation pipeline. It is a sequence of stages that operate on a stream of documents. Each stage takes the output from the previous stage as input and transforms it into a new output.
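To make the stage-by-stage flow concrete, here is a minimal plain-JavaScript sketch that mimics the idea. It is not how MongoDB implements aggregation: the `runPipeline` helper and the sample documents are hypothetical, introduced only for illustration, and each stage is modeled as a function from an array of documents to a new array.

```javascript
// Hypothetical helper: feed the output of each stage into the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Illustrative sample documents (not from a real collection).
const docs = [
  { name: "Laptop", price: 1200 },
  { name: "Mouse", price: 20 },
  { name: "Keyboard", price: 50 }
];

// Two stand-in stages: a filter (like $match) and a field selection (like $project).
const stages = [
  (input) => input.filter((d) => d.price > 30),
  (input) => input.map((d) => ({ name: d.name }))
];

const result = runPipeline(docs, stages);
console.log(result); // the Laptop and Keyboard documents, name field only
```

The second stage never sees the Mouse document, because the first stage already removed it from the stream.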
Stages
Here are some common stages used in the aggregation pipeline:
- $match: Filters documents based on a query condition, much like a WHERE clause in SQL. Placing it early in the pipeline reduces the amount of data processed by later stages.
- Usage:
{$match: {<query>}}
- Example:
{$match: {price: {$gt: 100}}}
(finds documents where price is greater than 100)
- $project: Selects or renames fields in documents. This stage allows you to choose which fields to include in the output and even create new fields from existing ones.
- Usage:
{$project: {<projection>}}
- Example:
{$project: {name: 1, price: 1, _id: 0}}
(selects name and price, excludes _id)
- $group: Groups documents based on a field and calculates aggregate values. It's used for tasks like calculating sums, averages, counts, etc., per group.
- Usage:
{$group: {_id: <grouping_field>, <accumulator_expressions>}}
- Example:
{$group: {_id: "$category", totalValue: {$sum: "$price"}}}
(groups by category, sums prices)
- $sort: Orders the documents by one or more fields, each in ascending or descending order.
- Usage:
{$sort: {<field>: <1 or -1>}}
(1 for ascending, -1 for descending)
- Example:
{$sort: {price: 1}}
(sorts by price in ascending order)
- $limit: Passes only the first N documents it receives to the next stage. Use this to control the size of the result set.
- Usage:
{$limit: <number>}
- Example:
{$limit: 10}
(keeps only the first 10 documents; these are the "top" 10 only if a $sort stage comes before it)
- $skip: Skips a specified number of documents from the pipeline. Useful for pagination or offsetting results.
- Usage:
{$skip: <number>}
- Example:
{$skip: 10}
(skips the first 10 documents)
- $unwind: Expands an array field into multiple documents. For each input document, it outputs one copy per array element, with the array field replaced by that single element.
- Usage:
{$unwind: "$<arrayField>"}
- Example:
{$unwind: "$tags"}
(outputs one document per element of the tags array)
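Several of these stages are often combined. As one hedged sketch, the hypothetical helper below (`buildPagePipeline` is not a MongoDB API) assembles $match, $sort, $skip, $limit, and $project into a pipeline that returns one page of results. The field names price and name come from the stage examples above; page numbering starting at 1 and the tie-break on _id are assumptions made for illustration.

```javascript
// Hypothetical helper: build a pipeline that returns one page of results.
// `page` is assumed to start at 1; `pageSize` is the number of documents per page.
function buildPagePipeline(page, pageSize) {
  return [
    { $match: { price: { $gt: 100 } } },         // filter first to shrink the stream
    { $sort: { price: 1, _id: 1 } },             // fixed order; _id breaks ties
    { $skip: (page - 1) * pageSize },            // drop the earlier pages
    { $limit: pageSize },                        // keep only this page
    { $project: { name: 1, price: 1, _id: 0 } }  // trim fields last
  ];
}

const pagePipeline = buildPagePipeline(3, 10);
// In mongosh, an array like this could be passed to aggregate() on a collection.
```

For page 3 with 10 documents per page, the $skip stage drops the first 20 sorted documents and $limit keeps the next 10.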
Do You Know?
The order of stages in an aggregation pipeline is crucial. The output of one stage becomes the input of the next, so the same stages arranged in a different order can produce different results.
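The effect of ordering can be seen with $sort and $limit. The plain-JavaScript sketch below only mimics those two stages on a small in-memory array (the sample documents are made up for illustration). It is not MongoDB code, but the same reasoning applies to a real pipeline.

```javascript
// Illustrative documents in arbitrary insertion order.
const items = [
  { name: "Keyboard", price: 50 },
  { name: "Laptop", price: 1200 },
  { name: "Mouse", price: 20 }
];

// Stand-ins for { $sort: { price: 1 } } and { $limit: 2 }.
const sortByPrice = (input) => [...input].sort((a, b) => a.price - b.price);
const limitTwo = (input) => input.slice(0, 2);

// Sort first, then limit: the two cheapest items overall.
const sortThenLimit = limitTwo(sortByPrice(items)).map((d) => d.name);

// Limit first, then sort: only the first two items received, then sorted.
const limitThenSort = sortByPrice(limitTwo(items)).map((d) => d.name);

console.log(sortThenLimit); // [ 'Mouse', 'Keyboard' ]
console.log(limitThenSort); // [ 'Keyboard', 'Laptop' ]
```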
Example
Let's say we have a collection called products with documents like this:
{ "_id": "1", "name": "Laptop", "price": 1200, "category": "Electronics" },
{ "_id": "2", "name": "Mouse", "price": 20, "category": "Electronics" },
{ "_id": "3", "name": "Keyboard", "price": 50, "category": "Electronics" },
{ "_id": "4", "name": "Shirt", "price": 30, "category": "Clothing" }
We want to calculate the average price of products in each category. We can do this using the aggregation pipeline:
db.products.aggregate([
  { $group: { _id: "$category", averagePrice: { $avg: "$price" } } }
])
This pipeline groups the documents by category and calculates the average price for each group. The output would look something like this (the order of the groups is not guaranteed unless a $sort stage follows):
{ "_id": "Electronics", "averagePrice": 423.3333333333333 },
{ "_id": "Clothing", "averagePrice": 30 }
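To see where these numbers come from, the following plain-JavaScript sketch recomputes the per-category averages from the four sample documents. It only imitates what $group with $avg does and is not MongoDB code; the helper name `averageByCategory` is hypothetical.

```javascript
const products = [
  { _id: "1", name: "Laptop", price: 1200, category: "Electronics" },
  { _id: "2", name: "Mouse", price: 20, category: "Electronics" },
  { _id: "3", name: "Keyboard", price: 50, category: "Electronics" },
  { _id: "4", name: "Shirt", price: 30, category: "Clothing" }
];

// Hypothetical helper imitating
// { $group: { _id: "$category", averagePrice: { $avg: "$price" } } }.
function averageByCategory(docs) {
  const totals = new Map(); // category -> { sum, count }
  for (const doc of docs) {
    const entry = totals.get(doc.category) || { sum: 0, count: 0 };
    entry.sum += doc.price;
    entry.count += 1;
    totals.set(doc.category, entry);
  }
  return [...totals].map(([category, { sum, count }]) => ({
    _id: category,
    averagePrice: sum / count
  }));
}

// Electronics: (1200 + 20 + 50) / 3, Clothing: 30 / 1
const averages = averageByCategory(products);
console.log(averages);
```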
Important Note
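Error handling and efficient pipeline design are essential for large datasets. Filter as early as possible, and consider using indexes: a $match or $sort stage at the start of the pipeline can use an index on the fields it references, which avoids scanning the whole collection.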
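One way to check whether a pipeline benefits from an index is to place the filter first and inspect the query plan. The sketch below assumes the products collection from the example and a mongosh session; the index on category is an assumption chosen for illustration, and the mongosh calls are left as comments because they need a running MongoDB instance.

```javascript
// Pipeline with $match as the first stage, so the filter can use an index.
const indexedPipeline = [
  { $match: { category: "Electronics" } },
  { $group: { _id: "$category", averagePrice: { $avg: "$price" } } }
];

// In mongosh (not executed here):
// db.products.createIndex({ category: 1 });
// db.products.explain("executionStats").aggregate(indexedPipeline);
```

In the explain output, an index scan rather than a full collection scan for the $match stage would suggest the index is being used.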
Summary
- Aggregation pipelines are sequences of stages that transform data.
- Stages like $match, $project, $group, $sort, $limit, $skip, and $unwind provide various data manipulation capabilities.
- Careful planning and efficient use of stages are key for optimal performance.
- MongoDB Compass can assist in visualizing and building aggregation pipelines.