A MongoDB index is a special data structure that stores the values of the indexed field together with a reference to the location of each matching document. Indexing allows MongoDB to find and retrieve data quickly, reducing query execution time.
Imagine looking up a word in a dictionary. Instead of flipping through every page, you jump to the section for the letter the word begins with, greatly narrowing your search.
MongoDB indexing is essential for fast document retrieval, sorting, and filtering. Without it, databases become slow, especially with large collections, and response times increase.
Well-chosen MongoDB indexing strategies play a crucial role in reducing application response times.
MongoDB uses a B-tree index, organizing entries in sorted order for efficient insertion, deletion, and search. Creating an index adds a structure with document keys and links to the corresponding documents.
Index Key: It is simply the field or fields in a MongoDB document.
Index Direction: The index direction, commonly referred to as index order, determines whether the field is sorted in ascending (1) or descending (-1) order.
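To make the idea of sorted index entries more concrete, here is a minimal sketch in plain JavaScript. It is not how MongoDB implements its B-tree: it uses a flat sorted array and binary search purely for illustration, and the function names (buildIndex, lookup) and sample documents are hypothetical.

```javascript
// Illustrative only: MongoDB uses a B-tree, not a flat sorted array.
// Build index entries of the form { key, position }, sorted by key.
// direction is 1 for ascending, -1 for descending.
function buildIndex(docs, field, direction = 1) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => direction * (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search over an ASCENDING index; returns the positions of all
// documents whose key equals the requested value.
function lookup(index, value) {
  let low = 0;
  let high = index.length;
  // Find the first entry whose key is >= value.
  while (low < high) {
    const mid = (low + high) >> 1;
    if (index[mid].key < value) low = mid + 1;
    else high = mid;
  }
  const positions = [];
  for (let i = low; i < index.length && index[i].key === value; i++) {
    positions.push(index[i].position);
  }
  return positions;
}

// Hypothetical sample documents.
const docs = [
  { reviewer_name: "Kristen" },
  { reviewer_name: "Altay" },
  { reviewer_name: "Christopher" },
  { reviewer_name: "Kristen" },
];

const ascending = buildIndex(docs, "reviewer_name", 1);
const descending = buildIndex(docs, "reviewer_name", -1);
console.log(ascending.map((entry) => entry.key));
console.log(descending.map((entry) => entry.key));
console.log(lookup(ascending, "Kristen"));
```

The lookup only touches a handful of entries instead of every document, which is the same reason an indexed query examines far fewer documents than a full collection scan.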
To proceed with this tutorial, you will need to:
Have a MongoDB database installation or a free MongoDB Atlas account.
This tutorial assumes you have some familiarity with mongosh. You should know how to switch to different databases and query collections.
Download the test database:
wget https://raw.githubusercontent.com/ozlerhakan/mongodb-json-files/refs/heads/master/datasets/companies.json
Some parts of the tutorial use the Airbnb review database.
Import the data. If you have a local installation of MongoDB, you can import both datasets with the mongoimport utility.
mongoimport --collection="companies" --file='companies.json' --db hostman-tutorial
mongoimport --collection="reviews" --file='reviews.csv' --type csv --db hostman-tutorial --headerline
To show indexes in MongoDB:
db.reviews.getIndexes()
To create an index in MongoDB, call the createIndex method on the collection and pass the field name along with its sort direction.
db.reviews.createIndex({ reviewer_name: 1 })
To drop the index in MongoDB:
db.reviews.dropIndex("reviewer_name_1")
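The name passed to dropIndex above, reviewer_name_1, is the default name MongoDB gives an index when you don't supply one: each indexed field and its direction (or type), joined with underscores. The helper below is a small sketch of that naming rule in plain JavaScript; defaultIndexName is a hypothetical function written for this tutorial, not part of mongosh.

```javascript
// Derive the default index name from a key specification such as
// { reviewer_name: 1 }. Hypothetical helper, for illustration only.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ reviewer_name: 1 })); // reviewer_name_1
console.log(defaultIndexName({ comments: "text" })); // comments_text
```

If you pass a custom name option to createIndex, use that name when dropping the index instead.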
Whether an index on a single field or on a combination of fields is more efficient depends on the scenario. The kind of information the field stores also matters.
Here is a list of different techniques.
A single field index is useful when MongoDB frequently needs to query data by one particular field.
It is not a viable option, however, when you need to support searching across multiple fields.
In the reviews dataset, it might be interesting to list only the properties reviewed by a particular person.
db.reviews.createIndex({ reviewer_name : 1 })
To verify if index creation has benefitted the database queries:
db.reviews.find({ reviewer_name: "Kristen" }).explain("executionStats")
Thanks to the index, executionTimeMillis dropped from 31 ms to 1 ms. Similarly, totalDocsExamined dropped from 24752 to just 47.
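When comparing explain output before and after creating an index, it can help to pull out just the fields that matter. The sketch below does this in plain JavaScript. The two explain objects are hand-written, heavily trimmed stand-ins that reuse the figures quoted above; real output has many more fields and its plan shape can vary by MongoDB version. The helper names (planStages, summarizeExplain) are hypothetical, and the sketch only follows simple plans linked through inputStage.

```javascript
// Collect the stage names of a winning plan by following inputStage links.
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) {
    stages.push(node.stage);
  }
  return stages;
}

// Reduce an explain("executionStats") result to a few key figures.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = planStages(explain.queryPlanner.winningPlan);
  return {
    executionTimeMillis: stats.executionTimeMillis,
    totalDocsExamined: stats.totalDocsExamined,
    usesIndexScan: stages.includes("IXSCAN"),
    usesCollectionScan: stages.includes("COLLSCAN"),
  };
}

// Trimmed, hand-written stand-ins for real explain output (assumption).
const beforeIndex = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { executionTimeMillis: 31, totalDocsExamined: 24752 },
};
const afterIndex = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "reviewer_name_1" },
    },
  },
  executionStats: { executionTimeMillis: 1, totalDocsExamined: 47 },
};

console.log(summarizeExplain(beforeIndex));
console.log(summarizeExplain(afterIndex));
```

A COLLSCAN stage means every document was read, while an IXSCAN stage means the query was served from an index.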
To retrieve the comments of multiple reviewers, use the MongoDB $in operator.
db.reviews.find(
{ reviewer_name : { $in: ["Christopher", "Altay", "Kristen"] } },
{ reviewer_name:1, comments: 1 }
)
What if a database frequently needs to query by three different fields? That’s where a compound index comes to the rescue.
db.companies.createIndex({
category_code: 1,
number_of_employees: 1,
founded_year: 1
})
Now, let's verify how the compound index improves our query using explain('executionStats').
db.companies.find({ category_code: "enterprise", number_of_employees: { $gte: 500, $lte: 1000 }, founded_year: { $gte: 1990 } }).explain("executionStats")
Remember, if you have hundreds of compound indexes, write performance can drop significantly. The reason is that every index must be updated on each insert, update, and delete, and each index also consumes memory and disk space.
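Field order matters in a compound index: MongoDB can use the index for queries that filter on its leading fields (an index prefix), but not for queries that skip the first field. The sketch below illustrates that rule for the index created above. It is a deliberate simplification that ignores sorting, covered queries, and the optimizer's cost estimates, and usablePrefixLength is a hypothetical helper.

```javascript
// Count how many leading index fields appear in the query's filter.
// 0 means the index cannot help narrow down this filter.
function usablePrefixLength(indexFields, queryFields) {
  const queried = new Set(queryFields);
  let length = 0;
  while (length < indexFields.length && queried.has(indexFields[length])) {
    length++;
  }
  return length;
}

// Field order of the compound index created above.
const indexFields = ["category_code", "number_of_employees", "founded_year"];

console.log(usablePrefixLength(indexFields, ["category_code"]));
console.log(usablePrefixLength(indexFields, ["category_code", "founded_year"]));
console.log(usablePrefixLength(indexFields, ["founded_year"]));
```

A query on founded_year alone gets no help from this index, so it would need its own index if it runs often.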
What if the MongoDB field that needs to be indexed is an array? For example, a quick inspection of the database with the following command reveals that relationships is an array field.
db.companies.find().limit(1)
The multikey index really shines here. Suppose you want to find the people who still hold their positions. For this purpose, you can create a multikey index on the is_past field.
db.companies.createIndex({ "relationships.is_past": 1 })
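Conceptually, a multikey index stores an entry for each distinct value found in the array, all pointing back to the same document. The plain JavaScript sketch below imitates that behavior under simplifying assumptions: it ignores edge cases such as empty arrays, and the helper names and sample companies are hypothetical.

```javascript
// Build one entry per distinct array value per document: { key, docId }.
function multikeyEntries(docs, arrayField, subField) {
  const entries = [];
  for (const doc of docs) {
    const elements = doc[arrayField] || [];
    const keys = new Set(elements.map((element) => element[subField]));
    for (const key of keys) {
      entries.push({ key, docId: doc._id });
    }
  }
  return entries;
}

// Return the ids of documents that have at least one matching array element.
function findDocIds(entries, value) {
  return entries.filter((entry) => entry.key === value).map((entry) => entry.docId);
}

// Hypothetical sample documents shaped like the companies collection.
const companies = [
  { _id: "a", relationships: [{ is_past: false }, { is_past: true }, { is_past: true }] },
  { _id: "b", relationships: [{ is_past: true }] },
  { _id: "c", relationships: [] },
];

const entries = multikeyEntries(companies, "relationships", "is_past");
console.log(entries.length);
console.log(findDocIds(entries, false));
```

A query such as db.companies.find({ "relationships.is_past": false }) matches a company if any element of its relationships array has is_past set to false.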
For full-text search in MongoDB, use a text index, like in the Airbnb review database sample.
db.reviews.createIndex({ comments: "text" })
[
{ v: 2, key: { _id: 1 }, name: '_id_' },
{
v: 2,
key: { _fts: 'text', _ftsx: 1 },
name: 'comments_text',
weights: { comments: 1 },
default_language: 'english',
language_override: 'language',
textIndexVersion: 3
}
]
Now, let’s search for a large-bedroom apartment.
db.reviews.find({ $text: { $search: "large bedroom" } }).limit(20)
If you need to sort the results by relevance, MongoDB provides the sort method together with the textScore metadata.
db.reviews.find({ $text: { $search: "large bedroom" } })
.sort({ score: { $meta: "textScore" } })
If the field covered by a text index is an array, MongoDB indexes and searches across each element of the array.
A hashed index stores a hash of the indexed field’s value instead of the value itself.
If you’re using MongoDB sharding, a hashed index on the shard key helps distribute documents more evenly across shards.
db.users.createIndex({ password: "hashed" })
db.users.find({ password: "very-long-hash" })
While the hashed index is great for equality matches, it has a few limitations. For instance, it can’t support range queries that use operators like $gte, $lte, or $gt.
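The reason range queries don't work is that hashing discards the ordering of the original values. The sketch below shows the idea in plain JavaScript. It uses FNV-1a as a stand-in hash (MongoDB uses its own internal hash function), and the helper names and sample users are hypothetical.

```javascript
// FNV-1a, a simple 32-bit string hash used here only as a stand-in.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map each hash value to the positions of the documents that produced it.
function buildHashedIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    const bucket = fnv1a(String(doc[field]));
    if (!index.has(bucket)) index.set(bucket, []);
    index.get(bucket).push(position);
  });
  return index;
}

// Equality match: hash the search value and jump straight to its bucket.
function findEqual(index, docs, field, value) {
  const candidates = index.get(fnv1a(String(value))) || [];
  // Re-check the real value, since different values can share a hash.
  return candidates.filter((position) => docs[position][field] === value);
}

// Hypothetical sample documents.
const users = [
  { password: "another-hash" },
  { password: "very-long-hash" },
  { password: "yet-another-hash" },
];

const hashedIndex = buildHashedIndex(users, "password");
console.log(findEqual(hashedIndex, users, "password", "very-long-hash"));
console.log(users.map((user) => fnv1a(user.password)));
```

Because neighboring values end up with unrelated hashes, a range of values does not correspond to a contiguous range of index entries, so only exact matches can be answered from the hashed index.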
When many documents in a MongoDB collection lack a particular field, a sparse index is worth considering. A sparse index only contains entries for documents that have the indexed field, even if the value is null, so it is smaller, cheaper to maintain, and can improve query performance.
Let’s create an index for the documents that have a phone number field.
db.customers.createIndex({ phone: 1 }, { sparse: true })
Suppose that out of 1 million customers, only 20% provided a phone number. The sparse index will then contain entries for only 0.2 million documents, which keeps it much smaller than a regular index.
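The arithmetic above can be sketched in a few lines of plain JavaScript. The example compares how many documents a regular and a sparse index would cover for a small, hypothetical customers collection in which 20% of the documents have a phone field. indexedDocCount is a hypothetical helper, not a MongoDB API.

```javascript
// Count the documents an index on `field` would hold entries for.
// A sparse index skips documents that lack the field entirely; a regular
// index still stores an entry (with a null key) for them.
function indexedDocCount(docs, field, { sparse = false } = {}) {
  return sparse ? docs.filter((doc) => field in doc).length : docs.length;
}

// Hypothetical collection: 10 customers, only the first 2 have a phone.
const customers = Array.from({ length: 10 }, (_, i) =>
  i < 2 ? { _id: i, phone: `555-010${i}` } : { _id: i }
);

console.log(indexedDocCount(customers, "phone"));
console.log(indexedDocCount(customers, "phone", { sparse: true }));

// Scaling the same ratio up to 1 million customers.
const sparseEntriesAtScale = (1000000 * 20) / 100;
console.log(sparseEntriesAtScale);
```

One trade-off to keep in mind: because the sparse index leaves out documents without the field, queries that need those documents cannot be answered from it alone.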
Mongoose plays a role for MongoDB in Node.js similar to the one SQLAlchemy plays in Flask applications. It makes working with MongoDB databases a lot easier.
Here are two different approaches.
Index with Mongoose Schema: Mongoose schema determines the structure of a collection.
Mongoose provides an index method to create a new index on the desired schema. Every Mongoose schema is tied to a model.
const mongoose = require('mongoose');
const reviewSchema = new mongoose.Schema({
property: String,
comment: String
});
reviewSchema.index({ comment: 'text' });
const Review = mongoose.model('Review', reviewSchema);
Index with MongoDB Collection: The second strategy is to retrieve the collection, the Mongoose way, and then set up an index.
const mongoose = require('mongoose');
mongoose.connect('mongodb://localhost:27017/hostman-tutorial', { useNewUrlParser: true, useUnifiedTopology: true });
mongoose.connection.once('open', function() {
const reviewsCollection = mongoose.connection.collection('reviews');
// createIndex returns a promise; recent driver versions no longer accept callbacks
reviewsCollection.createIndex({ email: 1 })
.then((result) => console.log('Index created successfully:', result))
.catch((err) => console.error('Error creating index:', err));
});
Index intersection is a technique in which MongoDB combines multiple indexes to satisfy a single query. The benefit is improved read performance without having to maintain an additional compound index.
Consider the scenario:
db.reviews.createIndex({ listing_id : 1 })
db.reviews.createIndex({ reviewer_id : 1 })
Perform the following query:
db.reviews.find({ listing_id: 2992450, reviewer_id: 16827297 })
With these two indexes, the query will use index intersection, but only if MongoDB’s query optimizer finds it more efficient.
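Conceptually, index intersection means looking up each condition in its own index and keeping only the documents found in both. The plain JavaScript sketch below shows that idea. The two Map objects are hypothetical stand-ins for the single-field indexes, with made-up document ids, and the sketch says nothing about how or when MongoDB's optimizer actually chooses this plan.

```javascript
// Keep only the ids that appear in both lists.
function intersect(idsA, idsB) {
  const seen = new Set(idsB);
  return idsA.filter((id) => seen.has(id));
}

// Hypothetical single-field indexes: indexed value -> matching document ids.
const listingIndex = new Map([[2992450, [1, 4, 7, 9]]]);
const reviewerIndex = new Map([[16827297, [4, 9, 12]]]);

const matches = intersect(
  listingIndex.get(2992450),
  reviewerIndex.get(16827297)
);
console.log(matches); // [ 4, 9 ]
```

If this combination of fields is queried very frequently, a compound index on both fields is worth comparing with explain, since it avoids the extra intersection step.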
Indexing undoubtedly improves application response time, but don’t overdo it. Too many indexes slow down writes and become hard to maintain as data grows. Here are a few pointers:
MongoDB indexes are a great way to improve query times for document retrieval, and they’re crucial for high-availability setups. Understanding how indexing works, its trade-offs, and the challenges it can create for a maintenance team will help you get the most out of it.
At Hostman, you can deploy a MongoDB cloud database in a few seconds and start working in no time.