Sharding distributes data across multiple replica sets (shards) for horizontal scale. Getting the shard key wrong is the most expensive mistake in MongoDB — resharding is disruptive and slow. This guide covers advanced sharding operations.

When to Shard

Shard when a single replica set cannot meet requirements:

Signal                    Threshold
Working set exceeds RAM   Hot data + indexes > available memory
Write throughput          Sustained writes exceeding single primary capacity
Storage                   Approaching single-node disk limits
I/O saturation            Disk or network bottleneck on primary

Do not shard prematurely — a well-indexed replica set handles terabytes and thousands of ops/sec.
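
The signals in the table can be turned into a rough checklist. This is a minimal sketch, not a sizing tool: the metric names on the input object and the two ratio constants are assumptions made for illustration, and real thresholds depend on your hardware and workload.

```javascript
// Hypothetical thresholds; tune them for your own deployment.
const DISK_LIMIT_RATIO = 0.8;  // assumed meaning of "approaching" disk limits
const SATURATION_RATIO = 0.9;  // assumed utilisation treated as a bottleneck

// m is a plain object of metrics you collect yourself (field names are illustrative).
function shardingSignals(m) {
  const signals = [];
  if (m.hotDataBytes + m.indexBytes > m.ramBytes) {
    signals.push("working set exceeds RAM");
  }
  if (m.writesPerSec > m.primaryWriteCapacity) {
    signals.push("write throughput");
  }
  if (m.diskUsedBytes / m.diskTotalBytes >= DISK_LIMIT_RATIO) {
    signals.push("storage");
  }
  if (Math.max(m.diskUtil, m.netUtil) >= SATURATION_RATIO) {
    signals.push("I/O saturation");
  }
  return signals;
}
```

An empty result suggests the replica set still has headroom; one or more entries is a prompt to investigate, not an automatic decision to shard.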

Sharded Cluster Components

                    ┌──────────────┐
                    │    mongos    │  ← clients connect here
                    │(query router)│
                    └──────┬───────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
    ┌────────────┐  ┌────────────┐  ┌────────────┐
    │  Shard A   │  │  Shard B   │  │  Shard C   │
    │ (repl set) │  │ (repl set) │  │ (repl set) │
    └────────────┘  └────────────┘  └────────────┘
           │               │               │
           └───────────────┼───────────────┘
                           ▼
                  ┌──────────────┐
                  │Config Servers│  ← metadata (repl set of 3)
                  └──────────────┘
  

Shard Key Selection

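On older releases the shard key cannot be changed once a collection is sharded: MongoDB 4.4 added refineCollectionShardKey (which only appends suffix fields) and MongoDB 5.0 added full resharding. Even with those tools, changing the key is costly, so choose carefully.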

Good Shard Key Properties

  1. High cardinality — many distinct values
  2. Even distribution — writes spread across shards
  3. Query isolation — common queries target single shard
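
The first two properties can be estimated from a sample of candidate key values before committing to a key. The sketch below is a rough aid under stated assumptions: the function name and the returned fields are made up for illustration, and it says nothing about query isolation, which depends on your query patterns.

```javascript
// Summarise a sample of candidate shard key values.
// distinct : number of distinct values (cardinality)
// topShare : fraction of the sample taken by the most common value (skew)
function shardKeyStats(values) {
  const counts = new Map();
  for (const v of values) {
    const k = JSON.stringify(v);
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  let maxCount = 0;
  for (const c of counts.values()) {
    if (c > maxCount) maxCount = c;
  }
  return {
    distinct: counts.size,
    topShare: values.length === 0 ? 0 : maxCount / values.length
  };
}
```

A low distinct count or a topShare close to 1 suggests the field will struggle to spread writes, whichever strategy is layered on top.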

Shard Key Strategies

Hashed — even write distribution

  sh.shardCollection("analytics.events", { userId: "hashed" })
// Pros: even distribution
// Cons: range queries scatter to all shards
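
To see why hashing evens out writes, the sketch below buckets keys by a hash value. It is a simplified model: FNV-1a is a stand-in and is not the hash MongoDB uses internally, and real clusters assign ranges of the hash space to chunks instead of taking a modulo.

```javascript
// FNV-1a (32-bit): illustrative stand-in, NOT MongoDB's internal hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force an unsigned 32-bit result
}

// Count how many keys land on each of shardCount buckets.
function spreadByHash(keys, shardCount) {
  const perShard = new Array(shardCount).fill(0);
  for (const key of keys) {
    perShard[fnv1a(String(key)) % shardCount] += 1;
  }
  return perShard;
}

// Sequential keys are adjacent in key order but scatter once hashed.
const sampleKeys = Array.from({ length: 3000 }, (_, i) => "user" + i);
console.log(spreadByHash(sampleKeys, 3));
```

The same scattering that balances writes is what forces range queries on the key to visit every shard.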
  

Compound — range queries + distribution

  sh.shardCollection("analytics.events", { userId: 1, timestamp: 1 })
// Pros: user-scoped queries hit one shard
// Cons: hot users create hotspots
  

Hashed compound (MongoDB 4.4+)

  sh.shardCollection("analytics.events", { userId: "hashed", timestamp: 1 })
  

Anti-Patterns

Bad Key            Problem
{ _id: 1 }         Monotonic ObjectIds → all inserts land in the last chunk
{ createdAt: 1 }   Time-based hotspot on latest shard
{ status: 1 }      Low cardinality: 3 values, 3 chunks max
{ isActive: 1 }    Boolean: two shards at best
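
The monotonic-key problem is easy to reproduce with a toy range router. This sketch assumes numeric keys and a hypothetical list of chunk lower bounds; it models only the routing rule, not MongoDB's chunk splitting or balancing.

```javascript
// lowerBounds is sorted ascending; chunk i covers [lowerBounds[i], lowerBounds[i + 1]).
// The last chunk is open-ended, standing in for "up to MaxKey".
function chunkIndexFor(lowerBounds, key) {
  let idx = 0;
  for (let i = 0; i < lowerBounds.length; i++) {
    if (key >= lowerBounds[i]) idx = i;
  }
  return idx;
}

// Count how many writes each chunk receives.
function writesPerChunk(lowerBounds, keys) {
  const counts = new Array(lowerBounds.length).fill(0);
  for (const k of keys) {
    counts[chunkIndexFor(lowerBounds, k)] += 1;
  }
  return counts;
}

// Every new key is larger than all existing ones, as with ObjectIds or timestamps.
const bounds = [-Infinity, 1000, 2000];
const newKeys = Array.from({ length: 100 }, (_, i) => 2001 + i);
console.log(writesPerChunk(bounds, newKeys)); // all 100 writes fall in the last chunk
```

Because the newest values are always the largest, the chunk holding the upper end of the range takes every insert until it splits, and then the new last chunk takes over.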

Enabling Sharding

  // Connect to mongos first (run this from your terminal, not inside mongosh):
//   mongosh --host mongos:27017

// Add shards (each is a replica set)
sh.addShard("rs-shard1/mongo1:27017,mongo2:27017,mongo3:27017")
sh.addShard("rs-shard2/mongo4:27017,mongo5:27017,mongo6:27017")

// Enable sharding on database
sh.enableSharding("myapp")

// Shard collections
sh.shardCollection("myapp.users", { userId: "hashed" })
sh.shardCollection("myapp.orders", { userId: 1, orderDate: 1 })

sh.status()
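
The argument to sh.addShard follows the pattern replica set name, a slash, then a comma-separated host list. When shards are added from automation, a small helper can build that string; the function name here is hypothetical and the sketch does no validation of host syntax.

```javascript
// Build the "<replSetName>/<host1>,<host2>,..." string that sh.addShard expects.
function shardConnectionString(replSetName, hosts) {
  if (!replSetName || hosts.length === 0) {
    throw new Error("replica set name and at least one host are required");
  }
  return `${replSetName}/${hosts.join(",")}`;
}

// In mongosh the result would be passed as: sh.addShard(connectionString)
const connectionString = shardConnectionString(
  "rs-shard1",
  ["mongo1:27017", "mongo2:27017", "mongo3:27017"]
);
console.log(connectionString);
```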
  

Chunk Management

Data is divided into chunks, each covering a contiguous range of shard key values (default chunk size: 128 MB in MongoDB 6.0+, 64 MB in earlier releases):

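  // View chunks for a collection
// MongoDB 5.0+ identifies chunks by collection UUID; older releases use { ns: "myapp.orders" }
use config
db.chunks.find({ uuid: db.collections.findOne({ _id: "myapp.orders" }).uuid }).pretty()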

// Split a chunk manually (rare — balancer handles this)
sh.splitAt("myapp.orders", { userId: ObjectId("..."), orderDate: ISODate("2024-06-01") })

// Move a chunk to specific shard
sh.moveChunk("myapp.orders", { userId: MinKey, orderDate: MinKey }, "rs-shard2")  // find document uses the full shard key
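
For capacity planning it helps to know roughly how many chunks a data set implies. The arithmetic is a sketch: it assumes chunks fill to the configured size, which real clusters only approximate, and the function name is made up for illustration.

```javascript
const MiB = 1024 * 1024;

// Rough chunk count for a data size, assuming chunks fill to chunkSizeMiB.
function estimateChunkCount(dataBytes, chunkSizeMiB = 128) {
  return Math.max(1, Math.ceil(dataBytes / (chunkSizeMiB * MiB)));
}

const oneTiB = 1024 * 1024 * MiB;
console.log(estimateChunkCount(oneTiB)); // 8192 chunks at 128 MiB each
```

The estimate is also a reasonable starting point when deciding how many chunks to pre-split before a bulk load.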
  

Chunk Migration

The balancer migrates chunks between shards for even distribution:

  sh.getBalancerState()        // true = enabled
sh.isBalancerRunning()
sh.setBalancerState(false)   // disable during maintenance

// Configure migration window (off-peak only)
use config
db.settings.updateOne(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "01:00", stop: "05:00" } } },
  { upsert: true }
)
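
A window such as 23:00 to 02:00 wraps past midnight, which is easy to get wrong when reasoning about it. The sketch below shows the comparison; the function names are hypothetical, times are HH:MM strings as in the setting above, and treating start equal to stop as an empty window is an assumption of this sketch.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True when now falls inside [start, stop), including windows that wrap past midnight.
function inBalancerWindow(now, start, stop) {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  if (s <= e) {
    return t >= s && t < e;   // same-day window, e.g. 01:00 to 05:00
  }
  return t >= s || t < e;     // wrapping window, e.g. 23:00 to 02:00
}

console.log(inBalancerWindow("02:30", "01:00", "05:00")); // true
console.log(inBalancerWindow("12:00", "23:00", "02:00")); // false
```

When checking against a live cluster, compare using the clock of the config server primary, since that is where the balancer runs.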
  

Excessive migrations often indicate a poorly distributed shard key.

Zone Sharding

Pin data ranges to specific shards for compliance and locality:

  // Define zones
sh.addShardToZone("rs-shard-us", "US")
sh.addShardToZone("rs-shard-eu", "EU")

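// Assign ranges. Zone ranges must use the shard key (or a prefix of it), so this
// assumes myapp.users is sharded on { country: 1, userId: 1 }.
// Each range is min-inclusive and max-exclusive, and ranges must not overlap,
// so there is no catch-all "default" range here.
sh.updateZoneKeyRange(
  "myapp.users",
  { country: "US", userId: MinKey },
  { country: "US", userId: MaxKey },
  "US"
)
sh.updateZoneKeyRange(
  "myapp.users",
  { country: "DE", userId: MinKey },
  { country: "DE", userId: MaxKey },
  "EU"
)
sh.updateZoneKeyRange(
  "myapp.users",
  { country: "FR", userId: MinKey },
  { country: "FR", userId: MaxKey },
  "EU"
)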
  

Pre-split chunks in each zone before loading data to avoid migration storms.
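
Zone ranges on a collection may not overlap, so it can be worth checking a planned layout before applying it. The sketch below works on simplified ranges with a single string bound per side; real zone ranges are shard key documents, and the function name and range shape are assumptions for illustration.

```javascript
// ranges: [{ min, max, zone }] with min inclusive and max exclusive.
// Returns adjacent pairs that overlap; an empty array means no overlaps at all.
function findZoneOverlaps(ranges) {
  const sorted = [...ranges].sort((a, b) => (a.min < b.min ? -1 : a.min > b.min ? 1 : 0));
  const overlaps = [];
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].min < sorted[i - 1].max) {
      overlaps.push([sorted[i - 1], sorted[i]]);
    }
  }
  return overlaps;
}

const planned = [
  { min: "DE", max: "DF", zone: "EU" },
  { min: "FR", max: "FS", zone: "EU" },
  { min: "US", max: "UT", zone: "US" }
];
console.log(findZoneOverlaps(planned).length); // 0
```

A catch-all range layered over specific ones would be reported here, which is the same layout the server rejects.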

Query Routing

  // Targeted query — single shard
db.orders.find({ userId: ObjectId("..."), orderDate: { $gte: ISODate("2024-06-01") } })

// Scatter-gather — all shards
db.orders.find({ status: "pending" })

// Check which shards a query hits
db.orders.find({ userId: ObjectId("...") }).explain("executionStats")
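// queryPlanner.winningPlan.stage is SINGLE_SHARD when targeted and SHARD_MERGE when
// scatter-gather; winningPlan.shards lists the shards that were contacted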
  

Design queries to include the shard key prefix for targeted routing.
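
The prefix rule can be checked mechanically during code review. This is a deliberately simplified sketch: it only looks at which fields a filter mentions, the function names are hypothetical, and it ignores details such as hashed keys needing equality matches or a range on the prefix still spanning several shards.

```javascript
// Count how many leading shard key fields appear in the query filter.
function shardKeyPrefixLength(shardKeyFields, filter) {
  let n = 0;
  for (const field of shardKeyFields) {
    if (!Object.prototype.hasOwnProperty.call(filter, field)) break;
    n += 1;
  }
  return n;
}

// "targeted" here means mongos can narrow the set of shards; otherwise it must ask all of them.
function routingFor(shardKeyFields, filter) {
  return shardKeyPrefixLength(shardKeyFields, filter) > 0 ? "targeted" : "scatter-gather";
}

const ordersKey = ["userId", "orderDate"];
console.log(routingFor(ordersKey, { userId: "u1", orderDate: { $gte: 0 } })); // targeted
console.log(routingFor(ordersKey, { status: "pending" }));                    // scatter-gather
console.log(routingFor(ordersKey, { orderDate: { $gte: 0 } }));               // scatter-gather
```

The last case is the one that surprises people: orderDate is part of the shard key, but without the leading userId field the query cannot be routed.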

Resharding (MongoDB 5.0+)

Change shard key without dump/restore:

  db.adminCommand({
  reshardCollection: "myapp.orders",
  key: { orderId: "hashed" }
})
  

Resharding:

  • Runs in background but consumes I/O
  • Requires free disk space (~2x collection size)
  • Brief performance impact during cutover
  • Plan during maintenance window
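
The disk space point is worth turning into a pre-flight check. The sketch below applies the rough 2x rule from the list above; the factor is this guide's rule of thumb and not an exact server requirement, and the function name and return shape are illustrative.

```javascript
// Compare free disk space against a multiple of the collection size.
function reshardHeadroom(collectionBytes, freeBytes, factor = 2) {
  const required = collectionBytes * factor;
  return {
    required,
    ok: freeBytes >= required,
    shortfall: Math.max(0, required - freeBytes)
  };
}

const GiB = 1024 ** 3;
console.log(reshardHeadroom(100 * GiB, 150 * GiB)); // ok is false, 50 GiB short
```

Collection and index sizes can be read from collection stats in mongosh and fed into a check like this before scheduling the operation.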

Hotspot Diagnosis

Symptoms: One shard at 90% CPU while others idle.

  // Check per-shard stats
db.orders.getShardDistribution()

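// Check chunk distribution
// (MongoDB 5.0+ keys config.chunks by collection UUID; older releases match on { ns: "myapp.orders" })
use config
db.chunks.aggregate([
  { $match: { uuid: db.collections.findOne({ _id: "myapp.orders" }).uuid } },
  { $group: { _id: "$shard", count: { $sum: 1 } } }
])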

// Current operations per shard
db.adminCommand({ currentOp: 1, $all: true })
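
The per-shard counts from the aggregation are easier to act on as a single skew figure. This sketch assumes input in the same shape the $group stage produces (an _id naming the shard and a count); the function name and the idea of comparing the busiest shard with the mean are illustrative choices.

```javascript
// perShard: [{ _id: "<shard name>", count: <number of chunks> }]
// Returns the shard with the most chunks and how far it sits above the mean.
function chunkSkew(perShard) {
  if (perShard.length === 0) return null;
  let total = 0;
  let busiest = perShard[0];
  for (const s of perShard) {
    total += s.count;
    if (s.count > busiest.count) busiest = s;
  }
  const mean = total / perShard.length;
  return { busiest: busiest._id, ratio: mean === 0 ? 0 : busiest.count / mean };
}

console.log(chunkSkew([
  { _id: "rs-shard1", count: 90 },
  { _id: "rs-shard2", count: 30 },
  { _id: "rs-shard3", count: 30 }
]));
```

A ratio near 1 means chunks are evenly placed. Note that even chunk counts do not rule out a traffic hotspot, since one chunk can receive most of the writes.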
  

Fixes:

  • Reshard with better key (hashed vs range)
  • Pre-split chunks for known hot ranges
  • Application-level sharding (separate collections per tenant)

Production Scenario: Multi-Tenant SaaS

  // Option 1: Tenant ID as prefix of shard key
sh.shardCollection("myapp.data", { tenantId: 1, createdAt: 1 })

// Option 2: Database per tenant (small tenants)
// myapp_tenant_abc, myapp_tenant_xyz — no cross-tenant queries

// Option 3: Hashed tenantId for even distribution
sh.shardCollection("myapp.data", { tenantId: "hashed" })
  

For large tenants, consider dedicated shards via zone sharding.
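
For Option 2, tenant identifiers have to be mapped to valid database names. The sketch below is one conservative way to do that; the prefix, the allowed character set, and the function name are assumptions, and the 63-character cap reflects MongoDB's rule that database names must be shorter than 64 characters.

```javascript
// Map a tenant id to a database name such as "myapp_tenant_abc".
// Anything outside [a-z0-9_] is replaced with "_" (a conservative, assumed policy).
function tenantDatabaseName(tenantId, prefix = "myapp_tenant_") {
  const safe = String(tenantId).toLowerCase().replace(/[^a-z0-9_]/g, "_");
  if (safe.length === 0) {
    throw new Error("tenant id must not be empty");
  }
  const name = prefix + safe;
  if (name.length > 63) {
    throw new Error("database name too long: " + name.length + " characters");
  }
  return name;
}

console.log(tenantDatabaseName("abc"));       // myapp_tenant_abc
console.log(tenantDatabaseName("Acme Corp")); // myapp_tenant_acme_corp
```

Replacing characters can map two different tenant ids to the same name, so a real system would also need a uniqueness check or a stable lookup table.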

Config Server Operations

Config servers store cluster metadata — protect them:

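  // Config server replica set (3 members is the usual production setup).
// Run this while connected to one of the config server mongod processes, not to mongos.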
rs.initiate({
  _id: "configReplSet",
  configsvr: true,
  members: [
    { _id: 0, host: "cfg1:27019" },
    { _id: 1, host: "cfg2:27019" },
    { _id: 2, host: "cfg3:27019" }
  ]
})
  

Never run production sharded clusters with fewer than 3 config servers.
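
The three-member minimum follows from majority arithmetic: a replica set needs a majority of voting members to elect a primary. The sketch below shows the calculation; the function names are illustrative.

```javascript
// Votes needed to elect a primary in a replica set of n voting members.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// How many members can be lost while a majority remains.
function faultTolerance(n) {
  return n - majority(n);
}

for (const n of [1, 2, 3, 5]) {
  console.log(n, "members tolerate", faultTolerance(n), "failure(s)");
}
```

One or two members tolerate no failures at all, while three members survive the loss of one, which is why three is the practical floor for config servers.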

Common Mistakes

  • Sharding before exhausting replica set capacity
  • Monotonic shard key causing write hotspots
  • Queries without shard key prefix — scatter-gather on every request
  • Disabling balancer permanently — uneven data distribution
  • Not pre-splitting chunks before bulk load — migration storm
  • Resharding without disk space planning

Troubleshooting

Balancer not running

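  sh.getBalancerState()                    // is the balancer enabled?
db.adminCommand({ balancerStatus: 1 })   // reports mode and inBalancerRound
// Check config server connectivity
// Check whether an activeWindow setting excludes the current time
// On older releases, distributed locks held by migrations are visible here:
db.getSiblingDB("config").locks.find()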
  

Chunk too large (> maxChunkSize)

  // Force split
sh.splitFind("myapp.orders", { userId: ObjectId("...") })
  

Jumbo chunk (cannot migrate)

  // Identify jumbo chunks
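// MongoDB 5.0+ keys chunks by collection UUID; older releases use { ns: "myapp.orders", jumbo: true }
const cfg = db.getSiblingDB("config")
cfg.chunks.find({ uuid: cfg.collections.findOne({ _id: "myapp.orders" }).uuid, jumbo: true })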
// Split manually or redesign shard key
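
Whether a manual split can help depends on the keys inside the chunk: a chunk holding a single shard key value has no valid split point. The sketch below picks a median-style split point from a sorted list of key values; it is a simplified model using plain comparable values, and the function name is hypothetical.

```javascript
// sortedKeys: shard key values in the chunk, sorted ascending.
// Returns a value that leaves keys on both sides, or null if the chunk cannot be split.
function pickSplitPoint(sortedKeys) {
  if (sortedKeys.length < 2) return null;
  const first = sortedKeys[0];
  const median = sortedKeys[Math.floor(sortedKeys.length / 2)];
  if (median !== first) return median;
  // The median equals the lowest key, so fall back to the first larger value.
  const next = sortedKeys.find((k) => k !== first);
  return next === undefined ? null : next;
}

console.log(pickSplitPoint([1, 2, 3, 4])); // 3
console.log(pickSplitPoint([7, 7, 7]));    // null: one value, nothing to split on
```

A null result is the case where only a shard key redesign helps, because every document in the chunk shares the same key value.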
  

Best Practices

  1. Choose shard key before loading production data
  2. Include shard key in all high-frequency queries
  3. Pre-split chunks for bulk imports
  4. Schedule balancer during off-peak hours
  5. Monitor per-shard metrics independently
  6. Test with production-scale data in staging
  7. Plan resharding strategy before going live

What Comes Next

Production operations covers backup, monitoring, upgrades, and incident response for sharded and replica set deployments.