Production MongoDB runs on replica sets (high availability) and optionally sharded clusters (horizontal scale). This guide covers configuration, failover behavior, read scaling, and sharding fundamentals.

Replica Set Architecture

                    ┌─────────────┐
                    │   PRIMARY   │ ← writes, default reads
                    │ mongo1:27017│
                    └──────┬──────┘
                           │ replication
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │SECONDARY │ │SECONDARY │ │ ARBITER  │
        │mongo2    │ │mongo3    │ │ (opt.)   │
        └──────────┘ └──────────┘ └──────────┘
              ↑            ↑
         read scaling   failover candidate
  
  • Primary — accepts all writes, replicates oplog to secondaries
  • Secondary — applies oplog, can serve reads with read preference
  • Arbiter — votes in elections only, holds no data (use sparingly)

Minimum production replica set: 3 data-bearing members (or 2 + arbiter for dev only).
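
Some numbers make the arbiter caveat concrete. The sketch below is a simplified model, not MongoDB code: the function and field names are hypothetical, and it assumes one vote per member and that w: "majority" can only be acknowledged by data-bearing members. It compares three data-bearing members against two plus an arbiter after one data-bearing member goes down.

```javascript
// Votes needed to elect a primary in a set with this many voting members.
function electionMajority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// State of the set after losing exactly one data-bearing member.
function afterOneDataFailure({ dataBearing, arbiters }) {
  const voting = dataBearing + arbiters;
  const majority = electionMajority(voting);
  // Arbiters hold no data, so they cannot acknowledge a write.
  const writeMajority = Math.min(majority, dataBearing);
  const dataUp = dataBearing - 1;
  const votesUp = dataUp + arbiters;
  return {
    canElectPrimary: votesUp >= majority,
    canAckMajorityWrites: dataUp >= writeMajority,
  };
}

console.log(afterOneDataFailure({ dataBearing: 3, arbiters: 0 }));
// { canElectPrimary: true, canAckMajorityWrites: true }
console.log(afterOneDataFailure({ dataBearing: 2, arbiters: 1 }));
// { canElectPrimary: true, canAckMajorityWrites: false }
```

Under these assumptions the arbiter keeps a primary elected, but majority writes stall until the failed member returns.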

Initialize a Replica Set

  // Connect to the first member from the OS shell; the commands after it run inside mongosh
mongosh --host mongo1:27017

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", priority: 1 }
  ]
})

rs.status()
// Wait until one member shows "PRIMARY", others "SECONDARY"
  

Each member needs matching replSetName in mongod.conf:

  replication:
    replSetName: rs0
  

Connection Strings

  mongodb://mongo1:27017,mongo2:27017,mongo3:27017/myapp?replicaSet=rs0
  

The driver discovers the primary automatically and fails over on election.
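
Listing several seed hosts matters because the driver only needs to reach one of them to discover the rest of the set. A minimal sketch of assembling such a URI from configuration follows; buildReplicaSetUri and its parameter names are hypothetical helpers, not part of any driver.

```javascript
// Build a replica set connection string from a seed list.
// The helper name and its parameters are illustrative only.
function buildReplicaSetUri({ hosts, database, replicaSet, options = {} }) {
  if (!hosts || hosts.length === 0) {
    throw new Error("at least one seed host is required");
  }
  const params = new URLSearchParams({ replicaSet, ...options });
  return `mongodb://${hosts.join(",")}/${database}?${params.toString()}`;
}

const uri = buildReplicaSetUri({
  hosts: ["mongo1:27017", "mongo2:27017", "mongo3:27017"],
  database: "myapp",
  replicaSet: "rs0",
});
console.log(uri);
// mongodb://mongo1:27017,mongo2:27017,mongo3:27017/myapp?replicaSet=rs0
```

Extra URI options such as readPreference or w can be passed through the options object.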

Read Preferences

Preference           Behavior                         Use Case
primary              Primary only (default)           Strong consistency
primaryPreferred     Primary, fallback to secondary   HA with consistency preference
secondary            Secondary only                   Analytics, reporting
secondaryPreferred   Secondary, fallback to primary   Read scaling
nearest              Lowest latency member            Multi-region

  // mongosh
db.users.find().readPref("secondaryPreferred")

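// Node.js driver (uri is the connection string from the previous section)
const { MongoClient, ReadPreference } = require("mongodb")
const client = new MongoClient(uri, {
  readPreference: ReadPreference.SECONDARY_PREFERRED
})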
  

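Caution: reads from secondaries may be stale because of replication lag. To read your own writes from a secondary, use a causally consistent session (the driver then sends afterClusterTime with each read) together with majority read and write concerns.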
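
To show why a plain secondary read can miss a write that was just acknowledged, here is a toy model in plain JavaScript. It is not driver code: Member and read are made-up names, cluster time is reduced to an integer, and a real server waits for the member to catch up rather than refusing the read.

```javascript
// Toy model: each member remembers the last oplog entry it has applied.
class Member {
  constructor(name) {
    this.name = name;
    this.appliedTime = 0; // time of the last applied oplog entry
    this.docs = new Map();
  }
  apply(entry) {
    this.docs.set(entry.key, entry.value);
    this.appliedTime = entry.time;
  }
}

// A read carrying afterClusterTime is only served by a member that has caught up.
function read(member, key, afterClusterTime = 0) {
  if (member.appliedTime < afterClusterTime) {
    return { served: false, reason: `${member.name} has not reached time ${afterClusterTime}` };
  }
  return { served: true, value: member.docs.get(key) };
}

const primary = new Member("mongo1");
const secondary = new Member("mongo2");
const write = { time: 1, key: "order:1", value: "paid" };

primary.apply(write);                          // written on the primary only
const stale = read(secondary, "order:1");      // served, but value is undefined
const gated = read(secondary, "order:1", 1);   // not served: secondary is behind

secondary.apply(write);                        // replication delivers the entry
const fresh = read(secondary, "order:1", 1);   // served with value "paid"
console.log(stale, gated, fresh);
```

The first read succeeds but returns nothing, which is the stale-read case; passing the write's time turns that silent staleness into an explicit wait.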

Write Concern

  db.orders.insertOne(
  { total: 99.99 },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)
  
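Setting          Meaning
w: 1             Acknowledged by primary only
w: "majority"    Acknowledged by a majority of data-bearing voting members
j: true          Written to journal on disk
wtimeout: 5000   Return an error if not acknowledged within 5 seconds (the write is not rolled back)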

Use w: "majority" for durability; w: 1 for lower latency with accepted risk.

Failover and Elections

When the primary fails, secondaries hold an election:

  rs.status()  // identify current PRIMARY

// Adjust member priority
const cfg = rs.conf()
cfg.members[1].priority = 3  // prefer mongo2 as primary
rs.reconfig(cfg)             // the helper is rs.reconfig(), run on the primary
  

Election triggers:

  • Primary unreachable for ~10 seconds (the default electionTimeoutMillis)
  • Manual rs.stepDown()
  • Priority changes

Typical failover: 10–30 seconds for election + client rediscovery.
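
Because a failover can take tens of seconds, application code usually needs a retry window at least that long. Recent drivers already retry some operations once on their own; the sketch below is an additional application-level wrapper under stated assumptions. backoffSchedule and withFailoverRetry are hypothetical names, and the 30-second window is taken from the upper bound above.

```javascript
// Exponential backoff delays (ms), capped, whose sum covers the failover window.
function backoffSchedule({ baseMs = 500, capMs = 5000, windowMs = 30000 } = {}) {
  const delays = [];
  let total = 0;
  for (let attempt = 0; total < windowMs; attempt++) {
    const delay = Math.min(capMs, baseMs * 2 ** attempt);
    delays.push(delay);
    total += delay;
  }
  return delays;
}

// Run an async operation, retrying after each delay in the schedule.
async function withFailoverRetry(operation, schedule = backoffSchedule()) {
  let lastError;
  for (const delay of [0, ...schedule]) {
    if (delay > 0) await new Promise(resolve => setTimeout(resolve, delay));
    try {
      return await operation();
    } catch (err) {
      lastError = err; // a real application should retry only transient errors
    }
  }
  throw lastError;
}

console.log(backoffSchedule());
// [ 500, 1000, 2000, 4000, 5000, 5000, 5000, 5000, 5000 ]
```

Usage would look like withFailoverRetry(() => collection.insertOne(doc)), where collection is whatever driver collection handle the application already holds. Retried writes should be idempotent, since the first attempt may have succeeded before the error surfaced.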

Adding and Removing Members

  // Add a secondary
rs.add("mongo4:27017")

// Remove a member (shut down its mongod first in production)
rs.remove("mongo4:27017")

// Step down primary for maintenance
rs.stepDown(60)  // the old primary stays ineligible for re-election for 60 seconds
  

Always add/remove members during maintenance windows with monitoring.

Monitoring Replication

  rs.printSecondaryReplicationInfo()
// Shows lag per secondary

rs.status().members.forEach(m => {
  print(m.name, m.stateStr, m.optimeDate)
})

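// Lag in seconds behind the primary (assumes a primary is currently elected)
const members = db.adminCommand({ replSetGetStatus: 1 }).members
const primary = members.find(m => m.stateStr === "PRIMARY")
members.filter(m => m.stateStr === "SECONDARY").forEach(m => {
  print(m.name, (primary.optimeDate - m.optimeDate) / 1000)
})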
  

High lag causes:

  • Network issues between members
  • Secondary disk slower than primary
  • Heavy writes overwhelming replication
  • Index builds on secondary
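
For alerting, the same status document can be checked against a threshold. This is a minimal sketch over a hand-written sample object shaped like replSetGetStatus output; laggingSecondaries is a hypothetical helper, the sample values are made up, and the 30-second default is only an example threshold.

```javascript
// status: an object shaped like the result of replSetGetStatus.
// Returns the secondaries whose lag behind the primary exceeds the threshold.
function laggingSecondaries(status, thresholdSeconds = 30) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary: lag cannot be measured against it");
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({ name: m.name, lagSeconds: (primary.optimeDate - m.optimeDate) / 1000 }))
    .filter(m => m.lagSeconds > thresholdSeconds);
}

// Made-up sample data standing in for a live cluster.
const sample = {
  members: [
    { name: "mongo1:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:01:00Z") },
    { name: "mongo2:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:58Z") },
    { name: "mongo3:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:15Z") },
  ],
};

console.log(laggingSecondaries(sample));
// [ { name: 'mongo3:27017', lagSeconds: 45 } ]
```

Throwing when no primary is present is a deliberate choice here: a missing primary is its own alert and should not be reported as zero lag.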

Sharded Cluster Overview

For datasets exceeding single-node capacity:

  Clients → mongos (query router) → Shards (replica sets)
                ↓
         Config servers (metadata)
  
Component        Role
mongos           Query router; clients connect here
Config servers   Store cluster metadata (replica set of 3)
Shards           Replica sets holding data subsets

Enable Sharding

  // Connect to mongos
mongosh --host mongos:27017

sh.enableSharding("analytics")

// Hashed shard key — even distribution
sh.shardCollection("analytics.events", { userId: "hashed" })

// Range shard key — range queries, hotspot risk
sh.shardCollection("analytics.logs", { timestamp: 1 })

sh.status()
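
The difference between the two key types shows up when inserts carry ever-increasing values. The simulation below is a rough sketch, not MongoDB's implementation: it uses 32-bit FNV-1a as a stand-in hash (MongoDB uses its own hash function), maps hashes to shards with a simple modulo instead of hashed chunk ranges, and the chunk boundaries are invented.

```javascript
const SHARDS = 3;

// Stand-in hash (32-bit FNV-1a); not the function MongoDB uses.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Range sharding: chunks split at the given boundaries; the last chunk is open-ended.
function rangeShard(timestamp, boundaries) {
  const index = boundaries.findIndex(b => timestamp < b);
  return index === -1 ? boundaries.length : index;
}

// Hashed sharding, simplified to "hash modulo shard count".
function hashedShard(userId) {
  return fnv1a(String(userId)) % SHARDS;
}

// Count how many keys land on each shard.
function distribution(assign, keys) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of keys) counts[assign(key)]++;
  return counts;
}

const boundaries = [1000, 2000]; // invented chunk split points
const newTimestamps = Array.from({ length: 1000 }, (_, i) => 5000 + i);
const userIds = Array.from({ length: 1000 }, (_, i) => i);

const rangeCounts = distribution(t => rangeShard(t, boundaries), newTimestamps);
const hashedCounts = distribution(hashedShard, userIds);
console.log(rangeCounts);  // [ 0, 0, 1000 ]: every new write hits the last chunk
console.log(hashedCounts); // writes spread across all three shards
```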
  

Choosing a Shard Key

Good shard keys:

  • High cardinality (many distinct values)
  • Even distribution of reads and writes
  • Supports common query patterns (targeted queries hit one shard)

Bad shard keys:

  • Monotonically increasing (_id, timestamp alone) — all writes to one shard
  • Low cardinality (boolean, country with 5 values)
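
  // Alternative to the hashed key above (a collection can be sharded only once):
// a compound key spreads writes across users and keeps each user's events in time order
sh.shardCollection("analytics.events", { userId: 1, timestamp: 1 })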
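
Whether a query is targeted depends on whether its filter includes the leading shard key field. A deliberately simplified check is sketched below; routingFor is a hypothetical helper, and the real router also considers ranges, hashed keys, and chunk placement.

```javascript
// "targeted" if the filter constrains the first shard key field, else "broadcast".
// Simplified: the real router's decision involves more than field presence.
function routingFor(filter, shardKeyFields) {
  const leadingField = shardKeyFields[0];
  return Object.prototype.hasOwnProperty.call(filter, leadingField)
    ? "targeted"
    : "broadcast";
}

const shardKey = ["userId", "timestamp"];

console.log(routingFor({ userId: 42 }, shardKey));                        // targeted
console.log(routingFor({ userId: 42, timestamp: { $gte: 0 } }, shardKey)); // targeted
console.log(routingFor({ timestamp: { $gte: 0 } }, shardKey));            // broadcast
```

With the compound key above, queries by user stay on one shard, while a query by time alone fans out to every shard.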
  

Zone Sharding

Route data to geographic shards for data locality and compliance:

  sh.addShardToZone("shard-us", "US")
sh.addShardToZone("shard-eu", "EU")

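// Assumes myapp.users is sharded on { country: 1, userId: 1 } (userId is illustrative).
// Signature: sh.updateZoneKeyRange(namespace, min, max, zone); min is inclusive, max is exclusive.
sh.updateZoneKeyRange(
  "myapp.users",
  { country: "US", userId: MinKey() },
  { country: "US", userId: MaxKey() },
  "US"
)
sh.updateZoneKeyRange(
  "myapp.users",
  { country: "DE", userId: MinKey() },
  { country: "DE", userId: MaxKey() },
  "EU"
)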
  

Data subject to GDPR stays on EU shards; US data stays on US shards.
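
Zone ranges are half-open intervals over the shard key: the minimum is inclusive and the maximum is exclusive. The lookup below sketches that matching in plain JavaScript, assuming a compound shard key of country plus a hypothetical userId field. MIN and MAX are local stand-ins for MinKey and MaxKey, and zoneFor is not a MongoDB function.

```javascript
// Local stand-ins for MinKey / MaxKey.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Compare two key values, treating MIN as lowest and MAX as highest.
function compareValue(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : 1;
}

// Compare compound keys field by field, like a compound index would.
function compareKey(a, b) {
  for (let i = 0; i < a.length; i++) {
    const c = compareValue(a[i], b[i]);
    if (c !== 0) return c;
  }
  return 0;
}

// Find the zone whose [min, max) range contains the key, or null if none does.
function zoneFor(key, ranges) {
  const hit = ranges.find(r => compareKey(r.min, key) <= 0 && compareKey(key, r.max) < 0);
  return hit ? hit.zone : null;
}

const zoneRanges = [
  { min: ["US", MIN], max: ["US", MAX], zone: "US" },
  { min: ["DE", MIN], max: ["DE", MAX], zone: "EU" },
];

console.log(zoneFor(["DE", "user-17"], zoneRanges)); // EU
console.log(zoneFor(["US", "user-3"], zoneRanges));  // US
console.log(zoneFor(["JP", "user-9"], zoneRanges));  // null
```

A key that matches no range, like the JP example, is not pinned to any zone and may be placed on any shard, so every country that needs pinning requires its own range.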

Balancer

The balancer migrates chunks between shards for even distribution:

  sh.getBalancerState()       // true = enabled
sh.isBalancerRunning()
sh.setBalancerState(false)  // disable during maintenance
  

Monitor chunk migration — excessive migrations indicate poor shard key choice.

Production Scenario: 3-Region Deployment

  Region US-East:  Primary + Secondary (rs0)
Region US-West:  Secondary (rs0)
Region EU:       Secondary (rs0) + zone sharding
  
  • Writes go to US-East primary
  • Local reads from nearest secondary
  • EU data zone-sharded to EU shard

Common Mistakes

  • Running standalone mongod in production — no failover, no transactions
  • Using arbiters instead of data-bearing members in production
  • Reading from secondaries without handling replication lag
  • Choosing monotonic shard key — creates write hotspots
  • Sharding too early — replica set handles most workloads to TB scale
  • Not monitoring replication lag — stale reads and delayed failover

Troubleshooting

Replica set won’t elect primary

  rs.status()  // check for RECOVERING, STARTUP2, or DOWN members
// Ensure majority of members are reachable
  

StepDown fails — “not primary”

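rs.stepDown() succeeds only on the current primary. This error means the member you are connected to is not (or is no longer) primary; run rs.status() to find the current primary and connect to it.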

Chunk migration stuck

  sh.stopBalancer()
// Fix underlying issue (disk space, network)
sh.setBalancerState(true)
  

Best Practices

  1. Always deploy 3+ data-bearing replica set members
  2. Use w: "majority" for critical writes
  3. Monitor replication lag with alerts (> 30 seconds)
  4. Test failover quarterly — verify application reconnects
  5. Shard only when single replica set exceeds capacity
  6. Choose shard key before loading data — resharding is costly

What Comes Next

Performance optimization builds on replication awareness — read preferences and write concerns directly affect latency and consistency.