to navigate

to select

to close

On this page

MongoDB Sharding Deep Dive

Sharding distributes data across multiple replica sets (shards) for horizontal scale. Getting the shard key wrong is the most expensive mistake in MongoDB — resharding is disruptive and slow. This guide covers advanced sharding operations.

When to Shard

Shard when a single replica set cannot meet requirements:

Signal	Threshold
Working set exceeds RAM	Hot data + indexes > available memory
Write throughput	Sustained writes exceeding single primary capacity
Storage	Approaching single-node disk limits
I/O saturation	Disk or network bottleneck on primary

Do not shard prematurely — a well-indexed replica set handles terabytes and thousands of ops/sec.

Sharded Cluster Components

                      ┌──────────────┐
                    │    mongos    │  ← clients connect here
                    │ (query router)│
                    └──────┬───────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
    ┌────────────┐  ┌────────────┐  ┌────────────┐
    │  Shard A   │  │  Shard B   │  │  Shard C   │
    │ (repl set) │  │ (repl set) │  │ (repl set) │
    └────────────┘  └────────────┘  └────────────┘
           │               │               │
           └───────────────┼───────────────┘
                           ▼
                  ┌──────────────┐
                  │Config Servers│  ← metadata (repl set of 3)
                  └──────────────┘

Shard Key Selection

The shard key is immutable for a collection (pre-resharding). Choose carefully.

Good Shard Key Properties

High cardinality — many distinct values
Even distribution — writes spread across shards
Query isolation — common queries target single shard

Shard Key Strategies

Hashed — even write distribution

  sh.shardCollection("analytics.events", { userId: "hashed" })
// Pros: even distribution
// Cons: range queries scatter to all shards

Compound — range queries + distribution

  sh.shardCollection("analytics.events", { userId: 1, timestamp: 1 })
// Pros: user-scoped queries hit one shard
// Cons: hot users create hotspots

Hashed compound (MongoDB 5.0+)

  sh.shardCollection("analytics.events", { userId: "hashed", timestamp: 1 })

Anti-Patterns

Bad Key	Problem
`{ _id: 1 }` (default)	Monotonic ObjectIds → all writes to last chunk
`{ createdAt: 1 }`	Time-based hotspot on latest shard
`{ status: 1 }`	Low cardinality — 3 values, 3 chunks max
`{ isActive: 1 }`	Boolean — two shards at best

Enabling Sharding

  // Connect to mongos
mongosh --host mongos:27017

// Add shards (each is a replica set)
sh.addShard("rs-shard1/mongo1:27017,mongo2:27017,mongo3:27017")
sh.addShard("rs-shard2/mongo4:27017,mongo5:27017,mongo6:27017")

// Enable sharding on database
sh.enableSharding("myapp")

// Shard collections
sh.shardCollection("myapp.users", { userId: "hashed" })
sh.shardCollection("myapp.orders", { userId: 1, orderDate: 1 })

sh.status()

Chunk Management

Data is divided into chunks (default 128 MB range per shard key):

  // View chunks for a collection
use config
db.chunks.find({ ns: "myapp.orders" }).pretty()

// Split a chunk manually (rare — balancer handles this)
sh.splitAt("myapp.orders", { userId: ObjectId("..."), orderDate: ISODate("2024-06-01") })

// Move a chunk to specific shard
sh.moveChunk("myapp.orders", { userId: MinKey }, "rs-shard2")

Chunk Migration

The balancer migrates chunks between shards for even distribution:

  sh.getBalancerState()        // true = enabled
sh.isBalancerRunning()
sh.setBalancerState(false)   // disable during maintenance

// Configure migration window (off-peak only)
use config
db.settings.updateOne(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "01:00", stop: "05:00" } } },
  { upsert: true }
)

Excessive migrations indicate poor shard key distribution.

Zone Sharding

Pin data ranges to specific shards for compliance and locality:

  // Define zones
sh.addShardToZone("rs-shard-us", "US")
sh.addShardToZone("rs-shard-eu", "EU")

// Assign ranges
sh.updateZoneKeyRange(
  "myapp.users",
  { country: MinKey, country: MaxKey },
  "US"  // default zone
)
sh.updateZoneKeyRange(
  "myapp.users",
  { country: "DE", country: "DE" },
  "EU"
)
sh.updateZoneKeyRange(
  "myapp.users",
  { country: "FR", country: "FR" },
  "EU"
)

Pre-split chunks in each zone before loading data to avoid migration storms.

Query Routing

  // Targeted query — single shard
db.orders.find({ userId: ObjectId("..."), orderDate: { $gte: date } })

// Scatter-gather — all shards
db.orders.find({ status: "pending" })

// Check which shards a query hits
db.orders.find({ userId: ObjectId("...") }).explain("executionStats")
// shardedDataDistribution shows targeted vs scatter

Design queries to include the shard key prefix for targeted routing.

Resharding (MongoDB 4.4+)

Change shard key without dump/restore:

  db.adminCommand({
  reshardingCollection: "myapp.orders",
  key: { orderId: "hashed" }
})

Resharding:

Runs in background but consumes I/O
Requires free disk space (~2x collection size)
Brief performance impact during cutover
Plan during maintenance window

Hotspot Diagnosis

Symptoms: One shard at 90% CPU while others idle.

  // Check per-shard stats
db.orders.getShardDistribution()

// Check chunk distribution
use config
db.chunks.aggregate([
  { $match: { ns: "myapp.orders" } },
  { $group: { _id: "$shard", count: { $sum: 1 } } }
])

// Current operations per shard
db.adminCommand({ currentOp: 1, $all: true })

Fixes:

Reshard with better key (hashed vs range)
Pre-split chunks for known hot ranges
Application-level sharding (separate collections per tenant)

Production Scenario: Multi-Tenant SaaS

  // Option 1: Tenant ID as prefix of shard key
sh.shardCollection("myapp.data", { tenantId: 1, createdAt: 1 })

// Option 2: Database per tenant (small tenants)
// myapp_tenant_abc, myapp_tenant_xyz — no cross-tenant queries

// Option 3: Hashed tenantId for even distribution
sh.shardCollection("myapp.data", { tenantId: "hashed" })

For large tenants, consider dedicated shards via zone sharding.

Config Server Operations

Config servers store cluster metadata — protect them:

  // Config server replica set (always 3 members)
rs.initiate({
  _id: "configReplSet",
  configsvr: true,
  members: [
    { _id: 0, host: "cfg1:27019" },
    { _id: 1, host: "cfg2:27019" },
    { _id: 2, host: "cfg3:27019" }
  ]
})

Never run production sharded clusters with fewer than 3 config servers.

Common Mistakes

Sharding before exhausting replica set capacity
Monotonic shard key causing write hotspots
Queries without shard key prefix — scatter-gather on every request
Disabling balancer permanently — uneven data distribution
Not pre-splitting chunks before bulk load — migration storm
Resharding without disk space planning

Troubleshooting

Balancer not running

  sh.getBalancerState()
// Check config server connectivity
// Check for active migrations blocking balancer
db.getSiblingDB("config").locks.find()

Chunk too large (> maxChunkSize)

  // Force split
sh.splitFind("myapp.orders", { userId: ObjectId("...") })

Jumbo chunk (cannot migrate)

  // Identify jumbo chunks
db.getSiblingDB("config").chunks.find({ ns: "myapp.orders", jumbo: true })
// Split manually or redesign shard key

Best Practices

Choose shard key before loading production data
Include shard key in all high-frequency queries
Pre-split chunks for bulk imports
Schedule balancer during off-peak hours
Monitor per-shard metrics independently
Test with production-scale data in staging
Plan resharding strategy before going live

What Comes Next

Production operations covers backup, monitoring, upgrades, and incident response for sharded and replica set deployments.

Backup and Recovery

MongoDB Production Operations

MongoDB Sharding Deep Dive

When to Shard link

Sharded Cluster Components link

Shard Key Selection link

Good Shard Key Properties link

Shard Key Strategies link

Anti-Patterns link

Enabling Sharding link

Chunk Management link

Chunk Migration link

Zone Sharding link

Query Routing link

Resharding (MongoDB 4.4+) link

Hotspot Diagnosis link

Production Scenario: Multi-Tenant SaaS link

Config Server Operations link

Common Mistakes link

Troubleshooting link

Best Practices link

What Comes Next link