
Monitoring and Managing Heavy Aggregations in AWS DocumentDB

Amazon DocumentDB is a scalable, fully managed document database service designed to work with MongoDB workloads. While it behaves similarly to MongoDB in many ways, it has some critical limitations and behavioral nuances — especially when you’re dealing with large datasets and heavy aggregations.

In this post, I’ll walk you through how to:

  • Populate a large collection with test data
  • Run a heavy $lookup operation multiple times
  • Monitor long-running operations with db.currentOp()
  • Filter and analyze slow queries
  • Kill operations directly from the shell

📚 Step 1: Insert 100,000 Books into DocumentDB

To simulate load and test joins and performance, we first populate the books collection with randomized data.

// Choose the database
use mylibrary

// Define helper functions (basic randomizer)
function getRandomElement(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

var genres = ['Fiction', 'Non-Fiction', 'Science', 'Fantasy', 'Biography', 'History', 'Mystery', 'Romance'];
var formats = ['epub', 'pdf', 'txt', 'audio'];
var firstNames = ['John', 'Mary', 'Alice', 'Robert', 'Linda', 'Michael', 'Sarah', 'David'];
var lastNames = ['Smith', 'Johnson', 'Williams', 'Brown', 'Jones', 'Miller', 'Davis', 'Garcia'];

for (var i = 0; i < 100000; i++) {
  var author = getRandomElement(firstNames) + " " + getRandomElement(lastNames);
  var title = "Book Title " + (i + 1);
  var book = {
    author: author,
    title: title,
    genre: getRandomElement(genres),
    format: getRandomElement(formats),
    num_of_pages: getRandomInt(50, 1000),
    year_when_published: getRandomInt(1900, 2025)
  };

  db.books.insertOne(book);
}

After running the script:

db.books.countDocuments();
100000
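Inserting documents one at a time means 100,000 round trips to the cluster. A faster variant is to batch the documents and call insertMany once per batch. The sketch below keeps the batching logic in plain JavaScript (the documents are abbreviated to the title field; the full script would add the same random fields as above), with the mongosh call shown in a comment:

```javascript
// Split an array of documents into fixed-size batches.
function toBatches(docs, batchSize) {
  var batches = [];
  for (var i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Build the 100,000 book documents in memory first
// (only the title is shown here; add the other random fields as above).
var docs = [];
for (var i = 0; i < 100000; i++) {
  docs.push({ title: "Book Title " + (i + 1) });
}

// Then, in mongosh, insert them 1,000 at a time:
// toBatches(docs, 1000).forEach(function (batch) { db.books.insertMany(batch); });
```

With a batch size of 1,000 this turns 100,000 network round trips into 100.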

🔁 Step 2: Run a Heavy Self-Join Using $lookup Three Times

We now run a self-join query that joins the books collection to itself three times based on the title field.

db.books.aggregate([
  {
    $lookup: {
      from: "books",
      localField: "title",
      foreignField: "title",
      as: "join1"
    }
  },
  {
    $lookup: {
      from: "books",
      localField: "title",
      foreignField: "title",
      as: "join2"
    }
  },
  {
    $lookup: {
      from: "books",
      localField: "title",
      foreignField: "title",
      as: "join3"
    }
  }
])
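A back-of-the-envelope cost model shows why this pipeline hurts (a rough estimate, not a query-planner figure): if the title field is not indexed, each $lookup stage effectively scans the joined collection once per input document.

```javascript
// Rough cost model for an unindexed $lookup: for each of the N input
// documents, the server must scan the N documents of the joined
// collection, so each stage costs on the order of N * N comparisons.
var N = 100000;
var perStage = N * N;      // 10,000,000,000 comparisons per stage
var total = 3 * perStage;  // three $lookup stages
```

An index on title would reduce each probe from a collection scan to an index lookup, which is why indexing the foreignField is the first fix to try.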

This can result in long-running operations, especially on large collections like ours. Now let’s inspect how to monitor them.

🔍 Step 3: Use db.currentOp() to Monitor Active Queries

db.currentOp()

Example Output:

rs0 [direct: primary] mylibrary> db.currentOp()
{
  inprog: [
    {
      active: true,
      killPending: false,
      effectiveUsers: [ { user: 'myadmin', db: 'admin' } ],
      opid: 3427458,
      ns: 'mylibrary.books',
      command: { cursor: Long('337448504967') },
      lsid: {
        id: UUID('54c4f794-c707-4779-8e37-be9ff53a1d3d'),
        uid: Binary.createFromBase64('H8snps2xdBYX5lwliJwTwIf7Mh1/ajgry8q8kHMbKvs=', 0)
      },
      originatingCommand: {
        aggregate: 'books',
        pipeline: [
          {
            '$lookup': {
              from: 'books',
              localField: 'title',
              foreignField: 'title',
              as: 'join1'
            }
          },
          {
            '$lookup': {
              from: 'books',
              localField: 'title',
              foreignField: 'title',
              as: 'join2'
            }
          },
          {
            '$lookup': {
              from: 'books',
              localField: 'title',
              foreignField: 'title',
              as: 'join3'
            }
          }
        ],
        cursor: {},
        lsid: { id: UUID('54c4f794-c707-4779-8e37-be9ff53a1d3d') },
        '$readPreference': { mode: 'primaryPreferred' },
        '$db': 'mylibrary'
      },
      op: 'command',
      '$db': 'mylibrary',
      secs_running: 15,
      microsecs_running: Long('15646511'),
      threadId: 48599,
      desc: 'Cursor',
      WaitState: 'Other'
    },
    {
      client: '172.31.80.163:56514',
      desc: 'Conn',
      active: true,
      killPending: false,
      effectiveUsers: [ { user: 'myadmin', db: 'admin' } ],
      opid: 3427498,
      ns: 'admin.$cmd',
      command: {
        aggregate: 1,
        pipeline: [
          {
            '$currentOp': {
              allUsers: true,
              idleConnections: false,
              truncateOps: false
            }
          },
          { '$match': {} }
        ],
        cursor: {},
        lsid: { id: UUID('d2c06e64-4793-4e0a-88de-e98fb24eb98e') },
        '$readPreference': { mode: 'primaryPreferred' },
        '$db': 'admin'
      },
      lsid: {
        id: UUID('d2c06e64-4793-4e0a-88de-e98fb24eb98e'),
        uid: Binary.createFromBase64('H8snps2xdBYX5lwliJwTwIf7Mh1/ajgry8q8kHMbKvs=', 0)
      },
      op: 'command',
      '$db': 'admin',
      secs_running: 0,
      microsecs_running: Long('220'),
      threadId: 51335,
      clientMetaData: {
        application: { name: 'mongosh 2.5.1' },
        driver: { name: 'nodejs|mongosh', version: '6.16.0|2.5.1' },
        platform: 'Node.js v20.19.1, LE',
        os: {
          name: 'linux',
          architecture: 'x64',
          version: '3.10.0-327.22.2.el7.x86_64',
          type: 'Linux'
        }
      }
    }
  ],
  ok: 1
}
rs0 [direct: primary] mylibrary>

⏱️ Step 4: Filter for Long-Running Queries (>10 Seconds)

db.currentOp({
  active: true,
  secs_running: { $gt: 10 }
})

This is useful if you want to isolate only the slow or stuck operations.

🧠 Step 5: Extract Only Key Fields from Long Queries

To get a more readable view, you can map just the relevant fields:

db.currentOp({
  active: true,
  secs_running: { $gt: 10 }
}).inprog.map(op => ({
  active: op.active,
  opid: op.opid,
  ns: op.ns,
  originatingCommand: op.originatingCommand,
  secs_running: op.secs_running
}));
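The same projection also works as a standalone helper, which is handy when you have saved currentOp output to a file and want to triage it outside the shell. A minimal sketch (the sample document below is abbreviated from the output in Step 3):

```javascript
// Keep only the fields that matter when triaging slow operations:
// active ops running longer than minSecs, reduced to opid, namespace,
// runtime, and the collection the aggregation started on.
function summarizeOps(inprog, minSecs) {
  return inprog
    .filter(function (op) { return op.active && op.secs_running > minSecs; })
    .map(function (op) {
      return {
        opid: op.opid,
        ns: op.ns,
        secs_running: op.secs_running,
        command: op.originatingCommand && op.originatingCommand.aggregate
      };
    });
}

// Abbreviated currentOp result from Step 3:
var sample = [
  { active: true, opid: 3427458, ns: 'mylibrary.books', secs_running: 15,
    originatingCommand: { aggregate: 'books' } },
  { active: true, opid: 3427498, ns: 'admin.$cmd', secs_running: 0 }
];
summarizeOps(sample, 10);  // → only opid 3427458 survives
```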

🛑 Step 6: Kill a Long-Running Operation

Once you identify the opid of the slow query, you can terminate it with:

db.killOp(opid)

Example:

db.killOp(3427458)

This flags the operation for termination; the server stops it at its next interrupt point and releases the resources it was holding, so a very heavy operation may take a moment to disappear from db.currentOp().
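Steps 4 through 6 can be combined into one sweep that kills every runaway operation over a threshold. In this sketch the kill action is passed in as a callback so the selection logic stays plain JavaScript; in mongosh you would pass `function (opid) { db.killOp(opid); }`:

```javascript
// Kill every active operation running longer than maxSecs.
// killFn receives each opid; ops already flagged for termination
// (killPending) are skipped. Returns the list of killed opids.
function killLongRunning(inprog, maxSecs, killFn) {
  var killed = [];
  inprog.forEach(function (op) {
    if (op.active && op.secs_running > maxSecs && !op.killPending) {
      killFn(op.opid);
      killed.push(op.opid);
    }
  });
  return killed;
}

// In mongosh:
// killLongRunning(db.currentOp().inprog, 30,
//                 function (opid) { db.killOp(opid); });
```

Be deliberate with the threshold: this kills every matching operation, including legitimate long-running jobs.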

🚀 Final Thoughts

Amazon DocumentDB can handle large-scale document operations, but heavy aggregation pipelines (especially self-joins with $lookup) can easily cause bottlenecks. By combining $lookup, db.currentOp(), and db.killOp(), you can simulate and manage real-world production issues like slow queries and runaway operations.

Use this workflow to monitor your workloads and keep your DocumentDB clusters responsive.
