Zum Inhalt springen

How to Do MongoDB Indexes

You just finished building an awesome feature that involves some new database queries. You have added unit tests, you carefully validated it in your test environment, and you deployed it to production, waiting for cheers and happy feedback.

Except that… your database suddenly implodes, because one of your new queries is extremely slow, taking down the whole database cluster (and your application) with it as well.

Sound familiar? Here’s how we avoid it.

What Happened?

But first of all: What was the issue, actually? You tested everything, even in a real environment, right? Right, but not with a real data volume.

In this case, your query was using a collection scan because it was not able to use an index. And while this worked locally and in your test environment, data volume on production was orders of magnitude larger, resulting in significantly higher load on the database and exponentially slower response times.

Surely, you could duplicate data from production to your test environment. But not only does that come with a certain price tag, you would also have to consider data anonymization and data access, as test environments are usually less restricted – but still must not leak production data.

So what’s the better way?

Requirements

To follow our approach, you need to maintain your MongoDB indexes in your application code. This feels like a natural choice when you are already maintaining your queries and data model there, and helps to avoid other problems as well. But more on that later.

Let’s assume you have a simple DataMongoRepository to manage a collection of Data documents (yes, naming things is one of the two challenges in software engineering):

class Data(val id: Long, val name: String)

class DataMongoRepository(mongoClient: MongoClient) {
    private val database = mongoClient.getDatabase("database")
    private val collection = database.getCollection<Data>("datas")

    suspend fun insert(data: Data) {
        collection.insertOne(data)
    }

    suspend fun fetchById(id: Long): Data? =
        collection.find(Filters.eq(Data::id.name, id)).firstOrNull()
}

In that case, a typical test could look like this:

class DataMongoRepositoryTest {
    private val mongoContainer = MongoDbTestContainer().also { it.start() }
    private val mongoClient = MongoClient.create(
        mongoContainer.getReplicaSetUrl("database")
    )
    private val repository = DataMongoRepository(mongoClient)

    @Test
    fun `test simple insert`() {
        runBlocking {
            // Given
            val data = Data(1, "data1")
            repository.insert(data)

            // When
            val fetched = repository.fetchById(data.id)

            // Then
            assertEquals(data, fetched)
        }
    }
}

It starts an actual MongoDB via testcontainers, inserts some test data and then fetches it to see if it is still the same.

Using an Index

Now, to ensure that this query works with higher loads, let’s add an index in our DataMongoRepository:

class DataMongoRepository(mongoClient: MongoClient) {
    ...

    init {
        runBlocking {
            collection.createIndex(Indexes.ascending(Data::id.name))
        }
    }

   ...
}

If we run the test again, it still succeeds (as expected, of course).

But what if we forgot to add this index? Can we somehow make this more robust?
Yes, we can!

Ensuring Index Usage

The magic lies in MongoDB’s notablescan parameter. When enabled, it prevents queries from running collection scans, and returns an error instead.

The parameter can be configured using the setParameter command on the admin database:

mongoClient
    .getDatabase("admin")
    .runCommand(
        Document.parse("{ setParameter: 1, notablescan: 1 }")
    )

In our test setup (here simplified as @BeforeTest) we enable this as default for all tests, which allows us to quickly catch new queries that would try to do a collection scan, and react accordingly:

class DataMongoRepositoryTest {
    ...

    @BeforeTest
    fun setUp() {
        runBlocking {
            mongoClient.getDatabase("admin").runCommand(
                Document.parse("{ setParameter: 1, notablescan: 1 }")
            )
        }
    }

    ...
}

Let’s see how this works by adding a query by name:

class DataMongoRepository(mongoClient: MongoClient) {
    ...

    suspend fun fetchAllByName(name: String): List<Data> =
        collection.find(Filters.eq(Data::name.name, name)).toList()
}

And a corresponding test:

class DataMongoRepositoryTest {
    ...

    @Test
    fun `test fetch by name`() {
        runBlocking {
            // Given
            val data1 = Data(1, "data")
            val data2 = Data(2, "data")
            repository.insert(data1)
            repository.insert(data2)

            // When
            val fetched = repository.fetchAllByName("data")

            // Then
            assertEquals(listOf(data1, data2), fetched)
        }
    }
}

Running the new test fails with an exception:

com.mongodb.MongoQueryException: Command failed with error 291 (NoQueryExecutionPlans)

Nice, isn’t it?

To make that test work, we could of course add an index for the name field. But that’s not always desired, so let’s see how we can disable validation for a particular test:

class DataMongoRepositoryTest {
    ...

    @Test
    fun `test fetch by name`() {
        runBlocking {
            // Allow table scan
            mongoClient.getDatabase("admin").runCommand(
                Document.parse("{ setParameter: 1, notablescan: 0 }")
            )

            // Given
            val data1 = Data(1, "data")
            val data2 = Data(2, "data")
            repository.insert(data1)
            repository.insert(data2)

            // When
            val fetched = repository.fetchAllByName("data")

            // Then
            assertEquals(listOf(data1, data2), fetched)
        }
    }
}

Et voilà.

By using a Kotlin extension function we can make this even more concise:

class DataMongoRepositoryTest {
    ...

    // Runs the given [block] with `notablescan` disabled
    private fun allowTableScan(block: suspend CoroutineScope.() -> Unit) = runBlocking {
        mongoClient.getDatabase("admin").let {
            it.runCommand(Document.parse("{ setParameter: 1, notablescan: 0 }"))
            block()
            it.runCommand(Document.parse("{ setParameter: 1, notablescan: 1 }"))
        }
    }

    @Test
    fun `test fetch by name`() {
        allowTableScan {
            // Given
            val data1 = Data(1, "data")
            val data2 = Data(2, "data")
            repository.insert(data1)
            repository.insert(data2)

            // When
            val fetched = repository.fetchAllByName("data")

            // Then
            assertEquals(listOf(data1, data2), fetched)
        }
    }
}

This can be useful for e.g. complex search queries with multiple parameters, where you either cannot or don’t want to create an index for every possible combination of fields. Indexes are not for free, after all – they consume memory and need to be maintained when fields are updated, incurring additional overhead.

Unique Constraint Validation

As mentioned, indexes are not only created for performance reasons. Let’s get back to our example: Isn’t an id actually supposed to be unique?

The following test runs without error, so obviously there is no guarantee:

class DataMongoRepositoryTest {
    ...

    @Test
    fun `test duplicate insert`() {
        runBlocking {
            val data1 = Data(1, "data1")
            repository.insert(data)
            repository.insert(data)
        }
    }
}

In fact, we now have two Data documents in our collection with id=1:

[
  Document{{_id=688b83c7bc39ec12e9b2a15a, id=1, name=data1}},
  Document{{_id=688b83c7bc39ec12e9b2a15b, id=1, name=data1}}
]

This is especially dangerous because fetchById will simply ignore all except the first document, so it may take a while to actually notice that issue.

Let’s fix the problem by making our index unique using IndexOptions().unique(true):

class DataMongoRepository(mongoClient: MongoClient) {
    private val database = mongoClient.getDatabase("database")
    private val collection = database.getCollection<Data>("datas")

    init {
        runBlocking {
            collection.createIndex(
                Indexes.ascending("id"), IndexOptions().unique(false)
            )
        }
    }

    ...
}

Now the test succeeds and we can validate the expected error code:

class DataMongoRepositoryTest {
    ...

    @Test
    fun `test duplicate insert`() {
        runBlocking {
            // Given
            val data = Data(1, "data1")
            repository.insert(data)

            // When/Then
            val exception = assertFailsWith<MongoWriteException> {
                repository.insert(data)
            }
            assertEquals(ErrorCategory.DUPLICATE_KEY, ErrorCategory.fromErrorCode(exception.code))
        }
    }
}

Summary

We now have an easy way of validating database indexes, and an automated reminder that breaks our build if we accidentally add a new query without corresponding index.

Naturally, this cannot ensure that all queries use an optimal index. But while choosing the optimal index for a given query can have significant impact on performance, it is not always trivial, and you’ve already come a long way when you can at least guarantee that a query uses any index in order to avoid worst case scenarios.

And sometimes, all it needs is a little reminder, so there is really no reason to not include index validation in your build – especially, if it is such a trivial implementation.

get to know us 👉 https://mms.tech 👈

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert