Amazon DocumentDB is a scalable, fully managed document database service designed to work with MongoDB workloads. While it behaves similarly to MongoDB in many ways, it has some critical limitations and behavioral nuances — especially when you’re dealing with large datasets and heavy aggregations.
In this post, I’ll walk you through how to:
- Populate a large collection with test data
- Run a heavy $lookup operation multiple times
- Monitor long-running operations with db.currentOp()
- Filter and analyze slow queries
- Kill operations directly from the shell
📚 Step 1: Insert 100,000 Books into DocumentDB
To simulate load and test join performance, we first populate the books collection with randomized data.
// Choose the database
use mylibrary

// Helper functions for random data
function getRandomElement(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

var genres = ['Fiction', 'Non-Fiction', 'Science', 'Fantasy', 'Biography', 'History', 'Mystery', 'Romance'];
var formats = ['epub', 'pdf', 'txt', 'audio'];
var firstNames = ['John', 'Mary', 'Alice', 'Robert', 'Linda', 'Michael', 'Sarah', 'David'];
var lastNames = ['Smith', 'Johnson', 'Williams', 'Brown', 'Jones', 'Miller', 'Davis', 'Garcia'];

// Insert 100,000 randomized book documents
for (var i = 0; i < 100000; i++) {
  var author = getRandomElement(firstNames) + " " + getRandomElement(lastNames);
  var title = "Book Title " + (i + 1);
  var book = {
    author: author,
    title: title,
    genre: getRandomElement(genres),
    format: getRandomElement(formats),
    num_of_pages: getRandomInt(50, 1000),
    year_when_published: getRandomInt(1900, 2025)
  };
  db.books.insertOne(book);
}
After running the script, verify the document count:
db.books.countDocuments()
100000
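A note on speed: 100,000 individual insertOne() calls pay a network round trip per document, so this script takes a while. If you just need the data in place, batching with insertMany() is much faster. Here is a sketch using the same helpers as above (the batch size of 1,000 is an arbitrary choice):

var batch = [];
for (var i = 0; i < 100000; i++) {
  batch.push({
    author: getRandomElement(firstNames) + " " + getRandomElement(lastNames),
    title: "Book Title " + (i + 1),
    genre: getRandomElement(genres),
    format: getRandomElement(formats),
    num_of_pages: getRandomInt(50, 1000),
    year_when_published: getRandomInt(1900, 2025)
  });
  if (batch.length === 1000) {  // flush every 1,000 documents
    db.books.insertMany(batch);
    batch = [];
  }
}
if (batch.length > 0) db.books.insertMany(batch);  // flush the remainder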
🔁 Step 2: Run a Heavy Self-Join Using $lookup Three Times
We now run a self-join query that joins the books collection to itself three times on the title field.
db.books.aggregate([
  {
    $lookup: {
      from: "books",
      localField: "title",
      foreignField: "title",
      as: "join1"
    }
  },
  {
    $lookup: {
      from: "books",
      localField: "title",
      foreignField: "title",
      as: "join2"
    }
  },
  {
    $lookup: {
      from: "books",
      localField: "title",
      foreignField: "title",
      as: "join3"
    }
  }
])
Each $lookup stage rescans the books collection for every document that reaches it, so three unindexed self-joins over 100,000 documents produce a very long-running operation, which is exactly what we want for this exercise.
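Aside: if you ever want this pipeline to actually finish quickly rather than serve as a stress test, an index on the join key should let each $lookup resolve through index lookups instead of collection scans. A minimal sketch:

// Index the field used as both localField and foreignField
db.books.createIndex({ title: 1 })

For this post, though, we leave the collection unindexed. Now let's look at how to monitor the operation.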
🔍 Step 3: Use db.currentOp() to Monitor Active Queries
While the aggregation from Step 2 is still running, open a second mongosh session and check the active operations:
db.currentOp()
Example Output:
rs0 [direct: primary] mylibrary> db.currentOp()
{
  inprog: [
    {
      active: true,
      killPending: false,
      effectiveUsers: [ { user: 'myadmin', db: 'admin' } ],
      opid: 3427458,
      ns: 'mylibrary.books',
      command: { cursor: Long('337448504967') },
      lsid: {
        id: UUID('54c4f794-c707-4779-8e37-be9ff53a1d3d'),
        uid: Binary.createFromBase64('H8snps2xdBYX5lwliJwTwIf7Mh1/ajgry8q8kHMbKvs=', 0)
      },
      originatingCommand: {
        aggregate: 'books',
        pipeline: [
          {
            '$lookup': {
              from: 'books',
              localField: 'title',
              foreignField: 'title',
              as: 'join1'
            }
          },
          {
            '$lookup': {
              from: 'books',
              localField: 'title',
              foreignField: 'title',
              as: 'join2'
            }
          },
          {
            '$lookup': {
              from: 'books',
              localField: 'title',
              foreignField: 'title',
              as: 'join3'
            }
          }
        ],
        cursor: {},
        lsid: { id: UUID('54c4f794-c707-4779-8e37-be9ff53a1d3d') },
        '$readPreference': { mode: 'primaryPreferred' },
        '$db': 'mylibrary'
      },
      op: 'command',
      '$db': 'mylibrary',
      secs_running: 15,
      microsecs_running: Long('15646511'),
      threadId: 48599,
      desc: 'Cursor',
      WaitState: 'Other'
    },
    {
      client: '172.31.80.163:56514',
      desc: 'Conn',
      active: true,
      killPending: false,
      effectiveUsers: [ { user: 'myadmin', db: 'admin' } ],
      opid: 3427498,
      ns: 'admin.$cmd',
      command: {
        aggregate: 1,
        pipeline: [
          {
            '$currentOp': {
              allUsers: true,
              idleConnections: false,
              truncateOps: false
            }
          },
          { '$match': {} }
        ],
        cursor: {},
        lsid: { id: UUID('d2c06e64-4793-4e0a-88de-e98fb24eb98e') },
        '$readPreference': { mode: 'primaryPreferred' },
        '$db': 'admin'
      },
      lsid: {
        id: UUID('d2c06e64-4793-4e0a-88de-e98fb24eb98e'),
        uid: Binary.createFromBase64('H8snps2xdBYX5lwliJwTwIf7Mh1/ajgry8q8kHMbKvs=', 0)
      },
      op: 'command',
      '$db': 'admin',
      secs_running: 0,
      microsecs_running: Long('220'),
      threadId: 51335,
      clientMetaData: {
        application: { name: 'mongosh 2.5.1' },
        driver: { name: 'nodejs|mongosh', version: '6.16.0|2.5.1' },
        platform: 'Node.js v20.19.1, LE',
        os: {
          name: 'linux',
          architecture: 'x64',
          version: '3.10.0-327.22.2.el7.x86_64',
          type: 'Linux'
        }
      }
    }
  ],
  ok: 1
}
rs0 [direct: primary] mylibrary>
The first entry is our aggregation: opid: 3427458, already running for 15 seconds (secs_running: 15). That is the opid we will target in Step 6.
⏱️ Step 4: Filter for Long-Running Queries (>10 Seconds)
db.currentOp({
  active: true,
  secs_running: { $gt: 10 }
})
This is useful if you want to isolate only the slow or stuck operations.
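As the Step 3 output shows, db.currentOp() is itself executed as an aggregation over the admin database with a $currentOp stage. If you prefer, you can run that pipeline directly and filter or project server-side. A sketch (the projected fields are just the ones I find most useful):

db.getSiblingDB("admin").aggregate([
  { $currentOp: { allUsers: true, idleConnections: false } },   // must be the first stage
  { $match: { active: true, secs_running: { $gt: 10 } } },      // same filter as above
  { $project: { opid: 1, ns: 1, secs_running: 1, originatingCommand: 1 } }
])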
🧠 Step 5: Extract Only Key Fields from Long Queries
To get a more readable view, you can map just the relevant fields:
db.currentOp({
  active: true,
  secs_running: { $gt: 10 }
}).inprog.map(op => ({
  active: op.active,
  opid: op.opid,
  ns: op.ns,
  originatingCommand: op.originatingCommand,
  secs_running: op.secs_running
}));
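Building on the same idea, a small helper that prints the slowest operations first can be handy during an incident. A sketch (the output format here is my own choice):

db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog
  .sort((a, b) => b.secs_running - a.secs_running)   // slowest first
  .forEach(op => print(op.opid + "  " + op.secs_running + "s  " + op.ns));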
🛑 Step 6: Kill a Long-Running Operation
Once you identify the opid of the slow query, you can terminate it with:
db.killOp(opid)
Example:
db.killOp(3427458)
This flags the operation for termination; the server stops it at the next safe interruption point and frees the resources it was holding.
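To clean up several runaway operations at once, you can combine the filter from Step 4 with db.killOp(). A minimal sketch, assuming a 60-second threshold and that you only want to touch your application's namespace (both are arbitrary choices here):

db.currentOp({ active: true, secs_running: { $gt: 60 } }).inprog
  .filter(op => op.ns && op.ns.startsWith("mylibrary."))  // only our collections
  .forEach(op => {
    print("Killing opid " + op.opid + " (" + op.secs_running + "s)");
    db.killOp(op.opid);
  });

Double-check the list before running something like this: db.killOp() does not ask for confirmation.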
🚀 Final Thoughts
Amazon DocumentDB can handle large-scale document operations, but heavy aggregation pipelines (especially self-joins with $lookup) can easily cause bottlenecks. By combining $lookup, db.currentOp(), and db.killOp(), you can simulate and manage real-world production issues like slow queries and runaway operations.
Use this workflow to monitor your workloads and keep your DocumentDB clusters responsive.