Life of a Sharded Write by Randolph Tan
Sharding Concepts
Shard0: [minKey, -200), [0, 100)
Shard1: [100, 200), [200, maxKey)
Shard2: [-200, -100), [-100, 0)
Routing table
min      max      owner
MinKey   -200     shard0
-200     -100     shard2
-100     0        shard2
0        100      shard0
100      200      shard1
200      MaxKey   shard1
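The lookup a router performs against this table can be sketched as below. This is only an illustration: the rows are transcribed from the routing table slide, the owners of [-100, 0) and [200, MaxKey) are read off the chunk diagram, MinKey/MaxKey are modeled as -Infinity/Infinity, and `findOwner` is a hypothetical helper, not a MongoDB API.

```javascript
// Routing table from the slide. MinKey/MaxKey are modeled as -Infinity/Infinity
// (an assumption for this sketch; MongoDB uses dedicated BSON types).
const routingTable = [
  { min: -Infinity, max: -200, owner: "shard0" },
  { min: -200, max: -100, owner: "shard2" },
  { min: -100, max: 0, owner: "shard2" },
  { min: 0, max: 100, owner: "shard0" },
  { min: 100, max: 200, owner: "shard1" },
  { min: 200, max: Infinity, owner: "shard1" },
];

// Hypothetical helper: return the shard whose chunk [min, max)
// contains the given shard key value.
function findOwner(table, key) {
  const chunk = table.find((c) => c.min <= key && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.owner;
}

console.log(findOwner(routingTable, 1));    // shard0
console.log(findOwner(routingTable, -150)); // shard2
console.log(findOwner(routingTable, 200));  // shard1
```

Ranges are half-open, so a key equal to a chunk's max belongs to the next chunk.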
Sharded Cluster Architecture
Components: Client App, mongos (router), shards (sh0, sh1, sh2), config servers
Sharded Single Doc Write
1. Client sends db.foo.insert({ x: 1 }); to mongos.
2. mongos gets the routing table from the config servers.
3. Routing table lookup: the key 1 falls in [0, 100), owned by shard0.
4. mongos forwards insert({ x: 1 }); to shard0.
Chunk Migration
Chunk [0, 100) is copied from Shard0 to Shard1.
Shard0: [minKey, -200), [0, 100)
Shard1: [100, 200), [200, maxKey), [0, 100)
Shard2: [-200, -100), [-100, 0)
Sharding Version Protocol
version  min      max      owner
1        MinKey   -200     shard0
2        -200     -100     shard2
3        -100     0        shard2
4        0        100      shard0
5        100      200      shard1
6        200      MaxKey   shard1
Sharding Version Protocol
nextVersion = max(routingTableVersions) + 1
Sharding Version Protocol
version  min      max      owner
1        MinKey   -200     shard0
2        -200     -100     shard2
3        -100     0        shard2
4 -> 7   0        100      shard0 -> shard1
5        100      200      shard1
6        200      MaxKey   shard1
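The version bump on migration can be sketched as follows. This is a simplified model that uses one integer version per chunk, as the slides do; `migrateChunk` is a hypothetical helper, and MinKey/MaxKey are again modeled as -Infinity/Infinity.

```javascript
// Chunk versions before the migration, as on the slide.
const chunks = [
  { version: 1, min: -Infinity, max: -200, owner: "shard0" },
  { version: 2, min: -200, max: -100, owner: "shard2" },
  { version: 3, min: -100, max: 0, owner: "shard2" },
  { version: 4, min: 0, max: 100, owner: "shard0" },
  { version: 5, min: 100, max: 200, owner: "shard1" },
  { version: 6, min: 200, max: Infinity, owner: "shard1" },
];

// Hypothetical helper: reassign the chunk that starts at `min` and stamp it
// with nextVersion = max(routingTableVersions) + 1.
function migrateChunk(table, min, toShard) {
  const nextVersion = Math.max(...table.map((c) => c.version)) + 1;
  return table.map((c) =>
    c.min === min ? { ...c, owner: toShard, version: nextVersion } : c
  );
}

const after = migrateChunk(chunks, 0, "shard1");
const moved = after.find((c) => c.min === 0);
console.log(moved.version, moved.owner); // 7 shard1
```

Because the new version is always larger than every existing one, any router still holding version 6 or lower can be detected as stale.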
Sharded Single Doc Write
State after the migration: config servers, shard0, and shard1 have routing table v7; mongos still has routing table v6.
1. Client sends db.foo.insert({ x: 1 }); to mongos.
2. mongos looks up the key in its v6 table ([0, 100) -> shard0) and sends insert({ x: 1 }); @ v6 to shard0.
3. shard0 compares versions: v6 != v7, so the write is rejected.
4. mongos updates its routing table from the config servers; [0, 100) is now version 7, owned by shard1.
5. mongos sends insert({ x: 1 }); @ v7 to shard1. Every node now has routing table v7.
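A minimal in-memory sketch of this stale-version retry, assuming a single integer routing table version as on the slides. The objects `configServer`, `shards`, and `mongos` are hypothetical stand-ins, not MongoDB APIs, and only two chunks are modeled.

```javascript
// Hypothetical stand-in for the config servers: holds the current (v7) table.
const configServer = {
  version: 7,
  table: [
    { min: -100, max: 0, owner: "shard2" },
    { min: 0, max: 100, owner: "shard1" }, // owner after the migration
  ],
};

// A shard rejects any write stamped with a version other than its own.
function makeShard(version) {
  const docs = [];
  return {
    docs,
    insert(doc, senderVersion) {
      if (senderVersion !== version) return { ok: false, wanted: version };
      docs.push(doc);
      return { ok: true };
    },
  };
}

const shards = { shard0: makeShard(7), shard1: makeShard(7), shard2: makeShard(7) };

// mongos starts with the stale v6 table, where [0, 100) still maps to shard0.
const mongos = {
  version: 6,
  table: [
    { min: -100, max: 0, owner: "shard2" },
    { min: 0, max: 100, owner: "shard0" },
  ],
  refresh() {
    this.version = configServer.version;
    this.table = configServer.table;
  },
  insert(doc) {
    for (let attempt = 1; attempt <= 2; attempt++) {
      const chunk = this.table.find((c) => c.min <= doc.x && doc.x < c.max);
      const reply = shards[chunk.owner].insert(doc, this.version);
      if (reply.ok) return { target: chunk.owner, attempts: attempt };
      this.refresh(); // v6 != v7: reload the routing table, then retry
    }
    throw new Error("still stale after refresh");
  },
};

const result = mongos.insert({ x: 1 });
console.log(result); // { target: 'shard1', attempts: 2 }
```

The first attempt goes to shard0 at v6 and is rejected; the second goes to shard1 at v7 and succeeds.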
Mirrored Config Servers
mongos sends "update x" to each of the three config servers in turn (1, 2, 3).
CSRS (Config Server as Replica Sets) Config servers mongos
Mirrored Config vs CSRS Recap
Mirrored config                        CSRS
Lost write inconsistency               Single source of truth
Read only when special server dies     New primary gets elected
Complicated distributed lock           Single server maintains lock
CSRS Challenges
Possible to read data that may roll back
Data on secondaries may be too stale
Rollback Example
(P, R, S are the node labels shown on the slides; one state per line, in slide order)
P: rTable v6  [0, 100) -> sh0, [-100, 0) -> sh2
P: rTable v7  [0, 100) -> sh0, [-100, 0) -> sh1
R: rTable v7  [0, 100) -> sh0, [-100, 0) -> sh1
S: rTable v6  [0, 100) -> sh0, [-100, 0) -> sh2
S: rTable v7  [0, 100) -> sh1, [-100, 0) -> sh2
(unlabeled): rTable v7  [0, 100) -> sh0, [-100, 0) -> sh1
New Feature
db.runCommand({ find: 'foo', filter: { x: 1 }, readConcern: { level: 'majority' } });
Rollback Example
Committed view: rTable v6, [0, 100) -> sh0
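Why the committed view stays at v6 can be illustrated with a simplified model: a value counts as majority-committed once more than half of the replica set members have replicated it. The function `majorityCommitted` is a hypothetical helper, and plain integers stand in for the routing table version each member has replicated; this is not MongoDB's implementation.

```javascript
// Each entry is the newest routing table version one replica set member
// has replicated (a stand-in for that member's opTime in this sketch).
function majorityCommitted(memberVersions) {
  const sorted = [...memberVersions].sort((a, b) => b - a); // newest first
  const majority = Math.floor(sorted.length / 2) + 1;
  // The majority-th newest value is held by at least `majority` members.
  return sorted[majority - 1];
}

// Only the primary has v7: a majority read still sees v6.
console.log(majorityCommitted([7, 6, 6])); // 6
// Once a second member replicates v7, it becomes the committed view.
console.log(majorityCommitted([7, 7, 6])); // 7
```

In this model, a version held by only one of three members is never returned by a majority read, so it can roll back without any reader having observed it.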
CSRS Challenges
Possible to read data that may roll back -> readConcern majority
Data on secondaries may be too stale
Stale Reads Example
sh0 has rTable v7; mongos has rTable v6. sh0 rejects the request: v6 != v7.
mongos tries to update its routing table from a config server node that still has rTable v6: "Already up to date???"
New Feature (internal use only)
db.runCommand({ find: 'foo', filter: { x: 1 }, readConcern: { level: 'majority', afterOpTime: <opTime> } });
Read after OpTime Example
1. mongos sends its request with rTable v6, opTime t9; sh0 has rTable v7.
2. sh0 replies with rTable v7, opTime t11.
3. mongos asks a config server node to update its routing table; the node still has rTable v6, so it waits until t >= t11.
4. The node reaches rTable v7 and mongos updates its routing table to v7.
5. mongos sends insert({ x: 1 }); with rTable v7.
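The wait in this example can be sketched with a simulated config server secondary. This is a rough model only: `makeSecondary`, `applyNext`, and `readAfter` are hypothetical names, opTimes are plain integers (t9 -> 9, t11 -> 11), and applying the next oplog entry stands in for waiting on replication.

```javascript
// Hypothetical config server secondary: applies oplog entries in order and
// serves the routing table version as of the last applied entry.
function makeSecondary(oplog) {
  let applied = 0; // number of oplog entries applied so far
  return {
    applyNext() {
      if (applied < oplog.length) applied++;
    },
    opTime() {
      return applied === 0 ? 0 : oplog[applied - 1].opTime;
    },
    rTableVersion() {
      return applied === 0 ? null : oplog[applied - 1].version;
    },
    // Do not answer until this node has caught up to afterOpTime.
    readAfter(afterOpTime) {
      while (this.opTime() < afterOpTime) {
        if (applied === oplog.length) throw new Error("opTime never reached");
        this.applyNext(); // stands in for waiting on replication
      }
      return { version: this.rTableVersion(), opTime: this.opTime() };
    },
  };
}

const oplog = [
  { opTime: 9, version: 6 },
  { opTime: 11, version: 7 },
];
const secondary = makeSecondary(oplog);
secondary.applyNext(); // the node is at t9 with rTable v6

const staleVersion = secondary.rTableVersion();
console.log(staleVersion); // 6

const fresh = secondary.readAfter(11);
console.log(fresh); // { version: 7, opTime: 11 }
```

A plain read returns the stale v6, while the read that carries opTime t11 cannot return before the node has applied v7.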
CSRS Challenges
Possible to read data that may roll back -> readConcern majority
Data on secondaries may be too stale -> readConcern afterOpTime
New in v3.2
Config servers can be replica sets (required in v3.4)
New feature: readConcern
Future Possibilities
afterOpTime functionality outside of internal code
Further reading
https://docs.mongodb.com
read concern
replica set rollback
sharding concepts
Questions?