When working with large datasets in MongoDB, you will often need to retrieve only the data that matches specific criteria. As those criteria grow more complex, the operation becomes more resource-intensive and costly.
To improve the efficiency of your database operations, MongoDB has an interesting feature called an "index".
In MongoDB, an index is like a detailed map of data. It helps the database quickly find specific information, similar to an index in a book guiding you to the exact pages where certain topics are discussed. This speeds up data retrieval, making queries faster and more efficient.
Imagine a well-organized library where the librarian has created an index for every book, detailing its title, author, genre, and page numbers. When you ask for a specific book, instead of scanning every shelf, the librarian refers to the index, quickly pinpointing the exact location of the book. In the same way, MongoDB's index works as a sophisticated librarian, efficiently directing the database to the precise location of data based on predefined criteria.
To demonstrate this, we will import a large JSON file containing an array of over 60,000 jobs to a MongoDB database, and then we will attempt to run a resource-intensive command before and after we create an index for the jobs collection.
To follow along, you need:
- MongoDB set up and running on your PC or in the cloud
- Hemlock, a database management tool
- A Large List of Jobs
Step 1: Set Up Hemlock and Connect to your Database
After downloading all dependencies, you can follow these instructions on how to Connect Hemlock to a Database.
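If you would rather follow along from a plain shell instead of Hemlock, you can connect with mongosh. The connection string below is an assumption for a local, unauthenticated MongoDB instance; adjust it for your own setup.
mongosh "mongodb://localhost:27017"
All the db.* commands in the rest of this article can be run there as well.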
Step 2: Create a Collection and Import the Jobs
Here, I will use the Commands tab in Hemlock to run a createCollection command, which will create my jobs collection.
db.createCollection("jobs")
Now switch to the Databases tab and refresh to reveal the new collection.
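If you are working from a shell instead of the Hemlock UI, a quick sanity check is:
db.getCollectionNames()
which should return an array that includes "jobs".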
To import the large jobs JSON list, click jobs, then click "Import" and select "JSON". Choose the file you downloaded and wait a few seconds; the import may take longer if your database is in the cloud.
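As an alternative to Hemlock's import dialog, you can use MongoDB's mongoimport command-line tool. The database and file names below are placeholder assumptions; --jsonArray is needed because the download is a single JSON array rather than newline-delimited documents.
mongoimport --db mydb --collection jobs --jsonArray --file jobs.json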
After the import, you should see the jobs list along with stats about the collection.
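The exact shape of each document depends on the file you downloaded, but the query in the next step relies on the commute_type, job_location, and date_created fields. A job document might look roughly like this (the values are made up for illustration):
{
  title: "Backend Engineer",
  commute_type: "Remote",
  job_location: "Remote - Worldwide",
  date_created: ISODate("2023-11-02T09:30:00.000Z")
}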
Step 3: Run our aggregate Command
To show the issue with running resource-intensive commands on this dataset, switch to the "Commands" tab in Hemlock and run the following command:
db.jobs.aggregate([
  {
    $match: {
      $and: [
        {
          $or: [
            { commute_type: { $regex: '.*remote.*', $options: 'i' } },
            { job_location: { $regex: '.*remote.*', $options: 'i' } },
          ],
        },
      ],
    },
  },
  { $sort: { date_created: -1 } },
  { $skip: (3375 - 1) * 12 },
  { $limit: 12 },
])
This command finds jobs where commute_type or job_location includes "remote", sorts them in descending order by the date_created field, and paginates with 12 items per page, returning the items on page 3,375. Note the $skip value: the server must sort everything just to skip past (3375 - 1) * 12 = 40,488 documents before it can return that page.
Running the command will result in the following error:
Uncaught
MongoServerError: Error in $cursor stage :: caused by :: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.
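As the error message hints, one way around this is to opt in to external sorting by passing allowDiskUse: true as an option to aggregate, roughly like this (pipeline omitted for brevity):
db.jobs.aggregate(
  [ /* ...same pipeline as above... */ ],
  { allowDiskUse: true }
)
This lets the query finish by spilling the sort to disk, but it is a workaround rather than a fix: every page request still sorts the entire collection. The better fix is an index, which we will create next.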
Step 4: Create an Index
To resolve this, we will create an index on our "jobs" collection that keeps entries sorted in descending order by the date_created field.
db.jobs.createIndex({ date_created: -1 })
You can verify its creation by running:
db.jobs.getIndexes()
This should show a list of indexes including the one you just created.
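In mongosh, the output should look roughly like this; the default _id index is always present, and the index name is auto-generated from the key:
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { date_created: -1 }, name: 'date_created_-1' }
]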
Step 5: Run the aggregate Command Again
Trigger the aggregate command again, the same as before. This time, you will get an array of 12 jobs meeting the requirements we specified.
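If you want to confirm that the index is what fixed it, you can ask the server for its query plan. A trimmed-down pipeline that keeps just the sort is enough; the winning plan should now show an index scan (IXSCAN) on date_created instead of an in-memory sort:
db.jobs.explain("executionStats").aggregate([
  { $sort: { date_created: -1 } },
  { $limit: 12 }
])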
Conclusion
MongoDB's index feature plays a pivotal role in optimizing data retrieval, much like a librarian's well-organized index in a library. As we saw with a dataset of over 60,000 jobs, the value of indexes became evident when a resource-intensive command hit the server's sort memory limit. Used strategically, indexes enable smoother and more efficient database operations, which makes them essential when handling large and complex datasets.
Enjoyed this article? Check out these other great learning materials:
Explore our MongoDB module: https://stackbay.org/modules/60b9235ad88c0300220a4a73
Join our Discord community: https://discord.com/invite/HNbNZDZWhN
Enroll in our FREE MongoDB Course: https://www.udemy.com/course/mongodb-unleashed-a-complete-guide/