You just finished building an awesome feature that involves some new database queries. You have added unit tests, you carefully validated it in your test environment, and you deployed it to production, waiting for cheers and happy feedback.
Except that… your database suddenly implodes, because one of your new queries is extremely slow, taking down the whole database cluster (and your application) with it.
Sound familiar? Here’s how we avoid it.
What Happened?
But first of all: What was the issue, actually? You tested everything, even in a real environment, right? Right, but not with a real data volume.
In this case, your query was using a collection scan because it was not able to use an index. And while this worked locally and in your test environment, the data volume in production was orders of magnitude larger, resulting in significantly higher load on the database and drastically slower response times.
Sure, you could duplicate data from production to your test environment. But not only does that come with a certain price tag, you would also have to consider data anonymization and data access, as test environments are usually less restricted – but still must not leak production data.
So what’s the better way?
Requirements
To follow our approach, you need to maintain your MongoDB indexes in your application code. This feels like a natural choice when you are already maintaining your queries and data model there, and helps to avoid other problems as well. But more on that later.
Let’s assume you have a simple DataMongoRepository to manage a collection of Data documents (yes, naming things is one of the two challenges in software engineering):
class Data(val id: Long, val name: String)

class DataMongoRepository(mongoClient: MongoClient) {
    private val database = mongoClient.getDatabase("database")
    private val collection = database.getCollection<Data>("datas")

    suspend fun insert(data: Data) {
        collection.insertOne(data)
    }

    suspend fun fetchById(id: Long): Data? =
        collection.find(Filters.eq(Data::id.name, id)).firstOrNull()
}
In that case, a typical test could look like this:
class DataMongoRepositoryTest {
    private val mongoContainer = MongoDbTestContainer().also { it.start() }
    private val mongoClient = MongoClient.create(
        mongoContainer.getReplicaSetUrl("database")
    )
    private val repository = DataMongoRepository(mongoClient)

    @Test
    fun `test simple insert`() {
        runBlocking {
            // Given
            val data = Data(1, "data1")
            repository.insert(data)

            // When
            val fetched = repository.fetchById(data.id)

            // Then
            assertEquals(data, fetched)
        }
    }
}
It starts an actual MongoDB via Testcontainers, inserts some test data, and then fetches it to check that it is still the same.
Using an Index
Now, to ensure that this query works with higher loads, let’s add an index in our DataMongoRepository:
class DataMongoRepository(mongoClient: MongoClient) {
    ...

    init {
        runBlocking {
            collection.createIndex(Indexes.ascending(Data::id.name))
        }
    }

    ...
}
If we run the test again, it still succeeds (as expected, of course).
But what if we forgot to add this index? Can we somehow make this more robust?
Yes, we can!
Ensuring Index Usage
The magic lies in MongoDB’s notablescan parameter. When enabled, it prevents queries from running collection scans and returns an error instead.
The parameter can be configured using the setParameter command on the admin database:
mongoClient
    .getDatabase("admin")
    .runCommand(
        Document.parse("{ setParameter: 1, notablescan: 1 }")
    )
In our test setup (here simplified as @BeforeTest) we enable this as the default for all tests, which allows us to quickly catch new queries that would try to do a collection scan, and react accordingly:
class DataMongoRepositoryTest {
    ...

    @BeforeTest
    fun setUp() {
        runBlocking {
            mongoClient.getDatabase("admin").runCommand(
                Document.parse("{ setParameter: 1, notablescan: 1 }")
            )
        }
    }

    ...
}
Let’s see how this works by adding a query by name:
class DataMongoRepository(mongoClient: MongoClient) {
    ...

    suspend fun fetchAllByName(name: String): List<Data> =
        collection.find(Filters.eq(Data::name.name, name)).toList()
}
And a corresponding test:
class DataMongoRepositoryTest {
    ...

    @Test
    fun `test fetch by name`() {
        runBlocking {
            // Given
            val data1 = Data(1, "data")
            val data2 = Data(2, "data")
            repository.insert(data1)
            repository.insert(data2)

            // When
            val fetched = repository.fetchAllByName("data")

            // Then
            assertEquals(listOf(data1, data2), fetched)
        }
    }
}
Running the new test fails with an exception:
com.mongodb.MongoQueryException: Command failed with error 291 (NoQueryExecutionPlans)
Nice, isn’t it?
To make that test work, we could of course add an index for the name field. But that’s not always desirable, so let’s see how we can disable the validation for a particular test:
class DataMongoRepositoryTest {
    ...

    @Test
    fun `test fetch by name`() {
        runBlocking {
            // Allow collection scans for this test
            mongoClient.getDatabase("admin").runCommand(
                Document.parse("{ setParameter: 1, notablescan: 0 }")
            )

            // Given
            val data1 = Data(1, "data")
            val data2 = Data(2, "data")
            repository.insert(data1)
            repository.insert(data2)

            // When
            val fetched = repository.fetchAllByName("data")

            // Then
            assertEquals(listOf(data1, data2), fetched)
        }
    }
}
Et voilà.
Using a small Kotlin helper function we can make this even more concise:
class DataMongoRepositoryTest {
    ...

    // Runs the given [block] with `notablescan` disabled and re-enables it afterwards
    private fun allowTableScan(block: suspend CoroutineScope.() -> Unit) = runBlocking {
        val admin = mongoClient.getDatabase("admin")
        admin.runCommand(Document.parse("{ setParameter: 1, notablescan: 0 }"))
        try {
            block()
        } finally {
            // Re-enable even if the block throws, so other tests are not affected
            admin.runCommand(Document.parse("{ setParameter: 1, notablescan: 1 }"))
        }
    }

    @Test
    fun `test fetch by name`() {
        allowTableScan {
            // Given
            val data1 = Data(1, "data")
            val data2 = Data(2, "data")
            repository.insert(data1)
            repository.insert(data2)

            // When
            val fetched = repository.fetchAllByName("data")

            // Then
            assertEquals(listOf(data1, data2), fetched)
        }
    }
}
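The try/finally shape of allowTableScan generalizes to any "temporarily change a setting, then restore it" scenario. A minimal, MongoDB-free sketch of the same pattern (withTemporarySetting is a hypothetical name, not part of any library):

```kotlin
// Hypothetical generic helper: applies a temporary value, runs the block,
// and restores the previous value even if the block throws.
fun <T> withTemporarySetting(
    get: () -> T,
    set: (T) -> Unit,
    temporary: T,
    block: () -> Unit
) {
    val previous = get()
    set(temporary)
    try {
        block()
    } finally {
        set(previous) // always restore, mirroring allowTableScan
    }
}

fun main() {
    var notablescan = 1 // stand-in for the server parameter
    withTemporarySetting({ notablescan }, { notablescan = it }, temporary = 0) {
        check(notablescan == 0) // disabled inside the block
    }
    check(notablescan == 1) // restored after the block
}
```

Restoring in finally rather than after the block is what keeps one failing test from silently disabling the validation for all subsequent tests.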
This can be useful, for example, for complex search queries with multiple parameters, where you either cannot or don’t want to create an index for every possible combination of fields. Indexes are not free, after all – they consume memory and need to be maintained on every write, incurring additional overhead.
Unique Constraint Validation
As mentioned, indexes are not only created for performance reasons. Let’s get back to our example: Isn’t an id actually supposed to be unique?
The following test runs without error, so obviously there is no guarantee:
class DataMongoRepositoryTest {
    ...

    @Test
    fun `test duplicate insert`() {
        runBlocking {
            val data = Data(1, "data1")
            repository.insert(data)
            repository.insert(data)
        }
    }
}
In fact, we now have two Data documents in our collection with id=1:
[
    Document{{_id=688b83c7bc39ec12e9b2a15a, id=1, name=data1}},
    Document{{_id=688b83c7bc39ec12e9b2a15b, id=1, name=data1}}
]
This is especially dangerous because fetchById will simply ignore all but the first document, so it may take a while to actually notice the issue.
Let’s fix the problem by making our index unique using IndexOptions().unique(true):
class DataMongoRepository(mongoClient: MongoClient) {
    private val database = mongoClient.getDatabase("database")
    private val collection = database.getCollection<Data>("datas")

    init {
        runBlocking {
            collection.createIndex(
                Indexes.ascending(Data::id.name), IndexOptions().unique(true)
            )
        }
    }

    ...
}
Now the second insert fails as expected, and we can adjust the test to validate the error code:
class DataMongoRepositoryTest {
    ...

    @Test
    fun `test duplicate insert`() {
        runBlocking {
            // Given
            val data = Data(1, "data1")
            repository.insert(data)

            // When/Then
            val exception = assertFailsWith<MongoWriteException> {
                repository.insert(data)
            }
            assertEquals(ErrorCategory.DUPLICATE_KEY, ErrorCategory.fromErrorCode(exception.code))
        }
    }
}
Summary
We now have an easy way of validating database indexes, and an automated reminder that breaks our build if we accidentally add a new query without a corresponding index.
Naturally, this cannot ensure that all queries use an optimal index. But while choosing the optimal index for a given query can have a significant impact on performance, it is not always trivial, and you have already come a long way when you can guarantee that a query uses at least some index, avoiding the worst-case scenarios.
And sometimes, all it takes is a little reminder, so there is really no reason not to include index validation in your build – especially when the implementation is this trivial.
get to know us 👉 https://mms.tech 👈