Performance tuning in MongoDB follows a systematic methodology. Hardware upgrades cannot fix missing indexes or poor schema design — start with measurement, then optimize.

Tuning Methodology

  1. Measure — identify bottlenecks with metrics, profiler, and explain()
  2. Hypothesize — missing index? cache too small? lock contention? hot shard?
  3. Change one thing — isolate impact
  4. Verify — compare before/after with explain("executionStats") and benchmarks
  5. Document — record settings and rationale

Never tune randomly — every change should trace to measured evidence.
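
The verify step can be made mechanical. A minimal sketch, assuming both inputs are results of explain("executionStats") captured before and after a single change; the helper name compareExplain and the sample numbers are hypothetical, not measurements:

```javascript
// Hypothetical helper: compare two explain("executionStats") results.
function compareExplain(before, after) {
  const b = before.executionStats;
  const a = after.executionStats;
  return {
    // Negative deltas mean the change reduced the metric.
    millisDelta: a.executionTimeMillis - b.executionTimeMillis,
    docsExaminedDelta: a.totalDocsExamined - b.totalDocsExamined,
    // Sanity check that both runs answered the same query.
    sameResultSize: a.nReturned === b.nReturned,
  };
}

// Illustrative values only.
const before = { executionStats: { executionTimeMillis: 420, totalDocsExamined: 100000, nReturned: 25 } };
const after = { executionStats: { executionTimeMillis: 3, totalDocsExamined: 25, nReturned: 25 } };
const diff = compareExplain(before, after);
console.log(diff);
```

Recording the returned object next to the change description covers the Document step as well.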

WiredTiger Cache

The WiredTiger cache holds MongoDB's in-memory working set; its size is the single most important setting on dedicated servers.

  # mongod.conf
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4
  

Rule of thumb: on a dedicated DB server, keep the default, which is the larger of 50% of (RAM minus 1 GB) and 256 MB. Lower it when other processes share the host.
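
The sizing rule can be turned into a quick calculation. The sketch below assumes the documented default of 50% of (RAM minus 1 GB) with a 256 MB floor; the function name defaultCacheSizeGB is hypothetical:

```javascript
// Hypothetical helper: default WiredTiger cache size for a host with ramGB of RAM.
// Assumes the default rule: larger of 50% of (RAM - 1 GB) and 256 MB (0.25 GB).
function defaultCacheSizeGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

for (const ram of [1, 8, 16, 64]) {
  console.log(`${ram} GB RAM -> ${defaultCacheSizeGB(ram)} GB cache`);
}
```

Compare the result with the cacheSizeGB value in mongod.conf before overriding the default.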

  // Monitor cache usage
const wt = db.serverStatus().wiredTiger.cache;
print("Cache used:", wt["bytes currently in the cache"] / 1e9, "GB");
print("Cache dirty:", wt["tracked dirty bytes in the cache"] / 1e9, "GB");
  

If cache hit ratio is low and disk I/O is high, increase cache or reduce working set size.
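
One way to estimate the hit ratio is from two counters in the same wiredTiger.cache section. This is a sketch, assuming the counters "pages requested from the cache" and "pages read into cache" are present; the function name is hypothetical. Both counters are cumulative since startup, so comparing two samples taken some minutes apart gives a more current picture.

```javascript
// Hypothetical helper: fraction of page requests served without a disk read.
function cacheHitRatio(cache) {
  const requested = cache["pages requested from the cache"];
  const readFromDisk = cache["pages read into cache"];
  if (!requested) return null; // no traffic recorded yet
  return 1 - readFromDisk / requested;
}

// In mongosh: cacheHitRatio(db.serverStatus().wiredTiger.cache)
// Illustrative sample object:
const sampleCache = {
  "pages requested from the cache": 1000000,
  "pages read into cache": 50000,
};
console.log(cacheHitRatio(sampleCache));
```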

Index Strategy

  // Order compound index fields as Equality, Sort, Range (ESR rule)
db.orders.createIndex({ userId: 1, createdAt: -1 })

// Verify index usage
db.orders.find({ userId: ObjectId("...") })
  .sort({ createdAt: -1 })
  .explain("executionStats")
// Look for: stage: "IXSCAN", totalDocsExamined ≈ nReturned
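
The two checks in the comment above can be automated. A minimal sketch, assuming the classic explain shape in which queryPlanner.winningPlan nests stages under inputStage or inputStages (newer server versions may wrap the plan differently); the helper names are hypothetical:

```javascript
// Collect every stage name in a plan tree.
function planStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage, ...children.flatMap((child) => planStages(child))];
}

// Hypothetical helper: summarize an explain("executionStats") result.
function assessExplain(explain) {
  const stats = explain.executionStats;
  const stages = planStages(explain.queryPlanner.winningPlan);
  return {
    usesIndex: stages.includes("IXSCAN"),
    collectionScan: stages.includes("COLLSCAN"),
    // Close to 1 is ideal; large values mean documents were read and discarded.
    docsPerResult:
      stats.nReturned === 0 ? stats.totalDocsExamined : stats.totalDocsExamined / stats.nReturned,
  };
}

// Illustrative explain output, trimmed to the fields used here.
const sampleExplain = {
  queryPlanner: { winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } } },
  executionStats: { nReturned: 20, totalDocsExamined: 20, totalKeysExamined: 20 },
};
const verdict = assessExplain(sampleExplain);
console.log(verdict);
```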
  

Index Anti-Patterns

Problem | Symptom | Fix
Missing index | COLLSCAN, high docsExamined | Create matching index
Wrong field order | IXSCAN but high docsExamined | Reorder compound index
Too many indexes | Slow writes | Drop unused via $indexStats
Index not in RAM | Slow queries despite index | Reduce index size or add RAM
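
For the "too many indexes" case, the output of $indexStats can be filtered for candidates. A sketch, assuming each entry has the name and accesses.ops fields that $indexStats reports; the function name is hypothetical. The counters reset when mongod restarts and are tracked per node, so check every replica set member over a representative period before dropping anything.

```javascript
// Hypothetical helper: names of indexes with zero recorded accesses.
function unusedIndexes(indexStats) {
  return indexStats
    // _id_ cannot be dropped; Number() also handles driver Long values via toString.
    .filter((s) => s.name !== "_id_" && Number(s.accesses.ops) === 0)
    .map((s) => s.name);
}

// In mongosh: unusedIndexes(db.orders.aggregate([{ $indexStats: {} }]).toArray())
// Illustrative sample:
const sampleStats = [
  { name: "_id_", accesses: { ops: 0 } },
  { name: "userId_1_createdAt_-1", accesses: { ops: 5321 } },
  { name: "legacyField_1", accesses: { ops: 0 } },
];
console.log(unusedIndexes(sampleStats));
```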

Query Optimization

Use Projection

  // Bad — returns full 50 KB documents
db.users.find({ status: "active" })

// Good — returns only needed fields
db.users.find({ status: "active" }, { name: 1, email: 1, _id: 0 })
  

Keyset Pagination

  // Bad: skip(10000) walks and discards 10,000 entries before returning 50
db.logs.find().sort({ _id: 1 }).skip(10000).limit(50)

// Good — cursor-based
db.logs.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(50)
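
When the sort key is not unique, the cursor needs a tie-breaker. A sketch of the same keyset idea for a sort on { createdAt: -1, _id: -1 }; the createdAt field on this collection and the function name are assumptions for illustration:

```javascript
// Hypothetical helper: filter for the page that follows the document `last`
// when sorting by { createdAt: -1, _id: -1 }.
function keysetFilter(last) {
  if (!last) return {}; // first page: no lower bound yet
  return {
    $or: [
      // Strictly older documents...
      { createdAt: { $lt: last.createdAt } },
      // ...or same timestamp, broken by _id.
      { createdAt: last.createdAt, _id: { $lt: last._id } },
    ],
  };
}

// In mongosh:
//   db.logs.find(keysetFilter(last)).sort({ createdAt: -1, _id: -1 }).limit(50)
// Illustrative values:
const pageFilter = keysetFilter({ createdAt: 1700000000, _id: 42 });
console.log(JSON.stringify(pageFilter));
```

A compound index on { createdAt: -1, _id: -1 } lets both branches of the $or use index bounds.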
  

Avoid Unanchored Regex

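  // Bad: unanchored and case-insensitive, so every document or index key is scanned
db.users.find({ name: { $regex: "alice", $options: "i" } })

// Better: anchored, case-sensitive prefix can use tight bounds on an index on name
// (the "i" option prevents efficient index use even when the pattern is anchored)
db.users.find({ name: { $regex: "^alice" } })

// Best for word search: text index (required for $text) or Atlas Search
db.users.find({ $text: { $search: "alice" } })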
  

Aggregation Optimization

  db.orders.aggregate([
  { $match: { status: "completed", createdAt: { $gte: startDate } } },  // filter first
  { $project: { userId: 1, total: 1 } },                                // reduce size
  { $group: { _id: "$userId", revenue: { $sum: "$total" } } },
  { $sort: { revenue: -1 } },
  { $limit: 100 }
], { allowDiskUse: true })
  

Rules:

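  • $match as early as possible so it can use an index; trim fields with $project before memory-heavy stages
  • $sort and $limit before $lookup when you need top-N
  • allowDiskUse: true for sorts/groups that exceed the 100 MB per-stage memory limit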
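
The top-N rule can be sketched as a pipeline builder. This assumes a users collection joined on _id, which is not part of the original example; the function name is hypothetical:

```javascript
// Hypothetical builder: top-N customers by revenue, joined to user details.
function topCustomersPipeline(startDate, n) {
  return [
    { $match: { status: "completed", createdAt: { $gte: startDate } } }, // filter first
    { $group: { _id: "$userId", revenue: { $sum: "$total" } } },
    { $sort: { revenue: -1 } },
    { $limit: n }, // cut to N rows before the join
    // Assumed collection and join key; only N lookups are performed.
    { $lookup: { from: "users", localField: "_id", foreignField: "_id", as: "user" } },
  ];
}

// In mongosh: db.orders.aggregate(topCustomersPipeline(startDate, 100), { allowDiskUse: true })
const pipeline = topCustomersPipeline(new Date("2024-01-01"), 100);
console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(" -> "));
```

Placing $lookup last means the join runs against at most n grouped rows instead of every matching order.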

Write Performance

Bulk Operations

  db.products.bulkWrite(operations, { ordered: false })
// ordered: false: operations may run in parallel (not guaranteed), and the rest still run after an error
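
The operations array is a list of single-key documents such as updateOne or insertOne. A sketch of building one from plain records, assuming hypothetical sku and price fields on the products collection:

```javascript
// Hypothetical helper: turn price changes into bulkWrite operations.
function toBulkOps(changes) {
  return changes.map(({ sku, price }) => ({
    updateOne: {
      filter: { sku },
      update: { $set: { price } },
      upsert: true, // insert the product if it does not exist yet
    },
  }));
}

// In mongosh: db.products.bulkWrite(toBulkOps(changes), { ordered: false })
const ops = toBulkOps([
  { sku: "A-1", price: 10 },
  { sku: "B-2", price: 25 },
]);
console.log(JSON.stringify(ops[0]));
```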
  

Write Concern Trade-offs

Concern | Latency | Durability
{ w: 1 } | Lowest | Primary only
{ w: "majority" } | Medium | Majority replicated
{ w: "majority", j: true } | Highest | Journaled on majority

Use { w: 1 } for high-throughput logging where occasional loss is acceptable.

Unordered Bulk Inserts

  db.events.insertMany(documents, { ordered: false, writeConcern: { w: 1 } })
  

Fastest ingestion pattern for analytics and logging pipelines.
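
For very large inputs, feeding insertMany in fixed-size batches keeps memory bounded and limits how much work a single failed call affects. A sketch; the generator name and the batch size are assumptions to tune against your own measurements:

```javascript
// Hypothetical helper: yield consecutive slices of at most `size` documents.
function* batches(docs, size) {
  for (let i = 0; i < docs.length; i += size) {
    yield docs.slice(i, i + size);
  }
}

// In mongosh:
//   for (const batch of batches(documents, 1000)) {
//     db.events.insertMany(batch, { ordered: false, writeConcern: { w: 1 } })
//   }
const demo = [...batches([1, 2, 3, 4, 5], 2)];
console.log(JSON.stringify(demo));
```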

Connection Management

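  // Node.js: one shared client per process, backed by a connection pool
const { MongoClient } = require("mongodb")
const client = new MongoClient(uri, {  // uri: your connection string
  maxPoolSize: 100,
  minPoolSize: 10,
  maxIdleTimeMS: 30000
})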
  

Do not create a new MongoClient per request; create one at startup and reuse its pool. Monitor connection counts on the server:

  db.serverStatus().connections
// { current: 45, available: 819, totalCreated: 1234 }
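
The connections document can be reduced to a single utilization figure for alerting. A sketch, assuming only the current and available fields shown above; the function name is hypothetical:

```javascript
// Hypothetical helper: share of the server's connection capacity in use.
function connectionUtilization(conn) {
  return conn.current / (conn.current + conn.available);
}

// In mongosh: connectionUtilization(db.serverStatus().connections)
const utilization = connectionUtilization({ current: 45, available: 819, totalCreated: 1234 });
console.log((utilization * 100).toFixed(1) + "%");
```

A totalCreated value that climbs much faster than current suggests clients are reconnecting instead of reusing pooled connections.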
  

Monitoring Tools

mongostat — Live Throughput

  mongostat --uri "mongodb://host:27017" 5
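# insert, query, update, delete per second, plus dirty/used cache %, flushes
  

On WiredTiger, watch the used and dirty columns (percent of cache in use and percent dirty): a cache that stays full while disk reads climb means the working set no longer fits. The faults column is reported only for the legacy MMAPv1 engine.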

mongotop — Collection-Level I/O

  mongotop --uri "mongodb://host:27017" 5
# Time spent reading/writing per collection
  

currentOp — Active Operations

  db.currentOp({
  active: true,
  secs_running: { $gt: 5 },
  op: { $in: ["query", "command", "update", "insert"] }
})
  

Kill long-running operations:

  db.killOp(opid)
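
Before killing anything, it helps to list the candidates. A sketch that works on the inprog array returned by db.currentOp(), assuming the opid, op, ns, active and secs_running fields; the function name is hypothetical. Review the list by hand and kill only client-initiated operations.

```javascript
// Hypothetical helper: active operations running longer than minSeconds, slowest first.
function longRunningOps(inprog, minSeconds) {
  return inprog
    .filter((op) => op.active && op.secs_running > minSeconds)
    .sort((a, b) => b.secs_running - a.secs_running)
    .map(({ opid, op, ns, secs_running }) => ({ opid, op, ns, secs_running }));
}

// In mongosh: longRunningOps(db.currentOp().inprog, 5)
// Illustrative sample:
const sampleOps = [
  { opid: 101, op: "query", ns: "shop.orders", active: true, secs_running: 12 },
  { opid: 102, op: "update", ns: "shop.users", active: true, secs_running: 2 },
  { opid: 103, op: "command", ns: "shop.logs", active: true, secs_running: 90 },
  { opid: 104, op: "query", ns: "shop.orders", active: false, secs_running: 0 },
];
const candidates = longRunningOps(sampleOps, 5);
console.log(candidates);
```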
  

Profiler

  db.setProfilingLevel(1, { slowms: 100 })  // log queries > 100ms
db.system.profile.find().sort({ ts: -1 }).limit(10)
db.setProfilingLevel(0)  // disable when done
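
Raw profile entries are easier to act on when grouped. A sketch that aggregates entries by namespace, assuming the ns and millis fields that system.profile documents carry; the function name is hypothetical:

```javascript
// Hypothetical helper: total, count, and worst-case time per namespace, heaviest first.
function slowByNamespace(profileDocs) {
  const byNs = new Map();
  for (const { ns, millis } of profileDocs) {
    const entry = byNs.get(ns) || { ns, count: 0, totalMillis: 0, maxMillis: 0 };
    entry.count += 1;
    entry.totalMillis += millis;
    entry.maxMillis = Math.max(entry.maxMillis, millis);
    byNs.set(ns, entry);
  }
  return [...byNs.values()].sort((a, b) => b.totalMillis - a.totalMillis);
}

// In mongosh: slowByNamespace(db.system.profile.find().toArray())
// Illustrative sample:
const summary = slowByNamespace([
  { ns: "shop.orders", millis: 150 },
  { ns: "shop.users", millis: 400 },
  { ns: "shop.orders", millis: 350 },
]);
console.log(summary);
```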
  

Production Scenarios

High-Read Application

  • Read from secondaries with secondaryPreferred
  • Ensure indexes cover all query patterns
  • Cache hot data in Redis for sub-millisecond reads
  • Use projection aggressively

High-Write Pipeline

  • { w: 1 } write concern for ingestion
  • insertMany with ordered: false
  • Bucket pattern or time-series collections
  • Shard by hashed key for even write distribution

Mixed Workload

  • Separate analytics to secondary or dedicated analytics node
  • Schedule index builds during low-traffic windows
  • Monitor globalLock.currentQueue in serverStatus; a persistently high queue length indicates contention
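
The queue check from the last bullet can be sketched as follows, assuming the readers, writers and total fields under globalLock.currentQueue in serverStatus output. The function name and the threshold are hypothetical; a sensible limit depends on your own baseline.

```javascript
// Hypothetical helper: flag contention when queued operations exceed maxQueued.
function queueReport(globalLock, maxQueued) {
  const { readers, writers, total } = globalLock.currentQueue;
  return { readers, writers, total, contended: total > maxQueued };
}

// In mongosh: queueReport(db.serverStatus().globalLock, 10)
// Illustrative sample:
const report = queueReport({ currentQueue: { total: 14, readers: 3, writers: 11 } }, 10);
console.log(report);
```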

Common Mistakes

  • Scaling hardware before fixing query patterns
  • Creating indexes without testing with explain()
  • Using skip() for pagination at scale
  • Not monitoring replication lag when reading from secondaries
  • Setting WiredTiger cache to 90%+ of RAM — OS needs memory too
  • Running profiler level 2 (all ops) in production — massive overhead

Troubleshooting Slow Queries

  // Step 1: explain
db.collection.find(filter).sort(sort).explain("executionStats")

// Step 2: check if index exists
db.collection.getIndexes()

// Step 3: check index usage stats
db.collection.aggregate([{ $indexStats: {} }])

// Step 4: check collection size vs cache
db.collection.stats().size / db.serverStatus().wiredTiger.cache["maximum bytes configured"]
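
Indexes compete for the same cache, so step 4 can include them. A sketch, assuming the size and totalIndexSize fields from collection stats and the "maximum bytes configured" cache counter; the function name is hypothetical:

```javascript
// Hypothetical helper: data plus index bytes relative to the configured cache.
// Values above 1 mean the collection cannot fit in cache all at once.
function cacheFitRatio(collStats, cacheStats) {
  const needed = collStats.size + collStats.totalIndexSize;
  return needed / cacheStats["maximum bytes configured"];
}

// In mongosh:
//   cacheFitRatio(db.collection.stats(), db.serverStatus().wiredTiger.cache)
// Illustrative sample: 6 GB data, 2 GB indexes, 4 GB cache.
const fit = cacheFitRatio(
  { size: 6e9, totalIndexSize: 2e9 },
  { "maximum bytes configured": 4e9 }
);
console.log(fit);
```

A ratio above 1 is not a problem by itself; what matters is whether the frequently accessed portion fits.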
  

Best Practices

  1. Design schema and indexes together from access patterns
  2. Set slowms in profiler and review weekly
  3. Monitor cache hit ratio and page faults
  4. Load test with production-scale data before launch
  5. Document performance baselines for regression detection
  6. Use Atlas Performance Advisor for automated index suggestions

What Comes Next

Change streams and transactions add real-time and ACID capabilities — understand their performance implications before adopting widely.