I’ve been working on projects with larger data sets recently and have begun to notice that not all queries resolve in one or two milliseconds.
As I wrote in Indexing Databases – A Postgres Example, adding indexes can speed up queries, but they come at the cost of slower inserts and updates.
This time around, I was working with MongoDB and wanted to understand how indexing worked. While the principles hold, the details vary.
Still – the conclusion’s the same: if you’re reading more than you’re writing and can come up with a sound index target, it can be a valuable tool in your tool belt.
Let’s dig in.
Common Commands
The three most useful beginner commands I found were create, get, and drop.
Create an Index
db.collection.createIndex( { <field_name> : <order> [, <field_name> : <order> ...] } )
When creating an index, you specify the field(s) and a sort order for each: 1 for ascending, -1 for descending.
In a compound index, the ordering is applied sequentially: documents are sorted by the first field listed, then by the second, and so on.
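As a concrete example, a compound index on a hypothetical users collection (the collection and field names here are made up for illustration) might look like:

```javascript
// Assumes a mongosh session connected to a running MongoDB instance.
// Index users by last name ascending, then signup date descending.
db.users.createIndex({ lastName: 1, signupDate: -1 })
```

Because the ordering is sequential, queries that filter or sort on lastName (or on lastName plus signupDate) can use this index, while a query on signupDate alone generally cannot.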
Review Indexes
db.collection.getIndexes()
To see the indexes on a collection, use the getIndexes() method.
This is helpful for confirming that the index is on the expected attribute of the document and for seeing which queries have been optimized.
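Continuing with the hypothetical users collection from above, the result is an array of index definitions. The shape below is illustrative, not exact server output; note that MongoDB auto-generates index names by joining each field with its sort order:

```javascript
db.users.getIndexes()
// Illustrative shape of the result:
// [
//   { v: 2, key: { _id: 1 }, name: "_id_" },
//   { v: 2, key: { lastName: 1, signupDate: -1 }, name: "lastName_1_signupDate_-1" }
// ]
```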
Drop Indexes
db.collection.dropIndexes(["<index_name>"])
To remove an index, use the dropIndexes() method.
The optional parameter specifies which index should be dropped; the name can be retrieved using the getIndexes() method.
If no name is specified, all indexes except the default _id index are dropped.
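Both forms, again on the hypothetical users collection (the index name shown is the auto-generated one from the earlier example):

```javascript
// Drop a single index by the name reported by getIndexes()
db.users.dropIndexes("lastName_1_signupDate_-1")

// Drop every index except the built-in _id index
db.users.dropIndexes()
```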
Case Study
So what do indexes actually get us?
To test this, I created a sample database (trailblazers) and collection (descriptions) and generated 10 million documents.
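The post doesn't include the seeding script, but a minimal sketch could look like the following. It uses the trailblazers database's descriptions collection from the case study; the field names are hypothetical:

```javascript
// Insert in batches -- one giant insertMany would be slow and memory-hungry.
// 10,000 batches x 1,000 docs = 10 million documents.
for (let batch = 0; batch < 10000; batch++) {
  const docs = [];
  for (let i = 0; i < 1000; i++) {
    docs.push({
      trailblazerId: batch * 1000 + i,          // hypothetical field
      description: `description ${batch * 1000 + i}`,
    });
  }
  db.descriptions.insertMany(docs);
}
```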
I then compared the performance of the same lookup query pre- and post-index. This is not a scientific study, but when querying on the indexed attribute, query times fell by 99.99%, from 15,000+ ms to ~1 ms.
Let’s take a look.
These times were found using the .explain("executionStats") method.
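To reproduce the measurement, run the query through explain and compare a few fields before and after indexing. The query below assumes trailblazerId is the indexed attribute; the actual field isn't named in the post:

```javascript
db.descriptions.find({ trailblazerId: 12345 }).explain("executionStats")
// Fields worth comparing:
//   executionStats.totalDocsExamined   -- documents scanned
//   executionStats.executionTimeMillis -- wall-clock query time
//   winningPlan stage: COLLSCAN (full scan) vs IXSCAN (index used)
```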
Without An Index
With An Index
Conclusion
The index created a much more efficient query because Mongo knew exactly where to go to find the documents in question.
Without the index, the query scanned all 10 million+ documents; with the index, that number was cut to 10.
Another way to think about this is that the index changed the time complexity of the query from linear to constant.
As the database continues to grow, indexed fields can be found in constant time because the index offers a pointer.
That pointer comes at the cost of slower inserts and updates, but if the database sees more reads than writes, the trade-off can quickly be worth it.
Top comments (7)
Out of curiosity. Are indexes in MongoDB global or local?
According to DZone, NoSQL is typically local:
Source
That said, this was the first time I had to look into it.
Pity. I remember looking into Couchbase and it supported global secondary indexes. Thought that MongoDB has caught up.
Can I ask, how much data are we talking about that you've got a distributed setup and need the global indices?
I don't really have one. I did quite a lot of research about Cassandra and wanted to have data there for a project but it never worked out.
Project was supposed to be a rewrite of legacy app that had to run in multiple regions and distributed database felt like a natural choice.
Cassandra, like MongoDB, had only local indexes at that time.
Got it! For what it's worth, I was recently working on projects with ~10m records - I went with Mongo, some peers went with Cassandra. We had comparable query times (post indexing), but the setup of Cassandra was apparently a pain whereas Mongo is a breeze.
Gee! Cassandra seems like a massive overkill for 10M records. Even MongoDB does, actually.
Also yeah, Cassandra requires way more thinking in terms of data modeling and setting up infrastructure, but if done correctly it pays off.