Rex: Replication at the Speed of Multi-core. Zhenyu Guo, Chuntao Hong, Dong Zhou*, Mao Yang, Lidong Zhou, Li Zhuang. Microsoft Research, *CMU.
Tension between Replication and Multi-core: Most applications are multi-threaded, but to replicate them you can only use a single thread, which sacrifices performance for replication. Examples shown: database, lock server, file server, key-value stores. [Figure labels: Multi-core, Replication]
Rex: Replication at the Speed of Multi-core. [Figure labels: Replication, Multi-core]
Outline: Motivation, System Overview, Implementation, Evaluation
State Machine Replication. To replicate a service: (1) model it as a deterministic state machine; (2) order requests with a consensus protocol; (3) execute with a single thread. [Figure: requests are ordered by consensus; sequential execution on each server yields consistent states, while parallel execution on a multi-core server yields inconsistent states]
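The three steps above can be sketched in a few lines. This is only an illustration, not Rex code: the `KVStateMachine` class and the request tuples are hypothetical, and the agreed order is handed in as a plain list instead of being produced by a consensus protocol.

```python
class KVStateMachine:
    """Hypothetical deterministic state machine: a tiny key-value store."""

    def __init__(self):
        self.state = {}

    def apply(self, request):
        # The new state depends only on the current state and the request.
        op, key, value = request
        if op == "put":
            self.state[key] = value
        elif op == "append":
            self.state[key] = self.state.get(key, "") + value
        else:
            raise ValueError(f"unknown op: {op}")


def run_replica(ordered_log):
    """Execute the agreed request order sequentially, on a single thread."""
    machine = KVStateMachine()
    for request in ordered_log:
        machine.apply(request)
    return machine.state


# Assumed output of the consensus step: one total order shared by all replicas.
agreed_log = [("put", "x", "a"), ("append", "x", "b"), ("put", "y", "c")]
replica_states = [run_replica(agreed_log) for _ in range(3)]
```

Every replica ends in the same state because nothing but the agreed order influences execution. That is exactly the property that a multi-threaded server gives up.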
Why Multi-threading Breaks State Machine Replication: Non-deterministic decisions (locking order, etc.) are made by each replica independently, so replicas can diverge. [Figure: Server 1 and Server 2; performance vs. consistency]
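A toy illustration of the divergence, assuming two threads whose critical sections append to a shared list. The function name and the thread labels are made up for this sketch, and the lock acquisition order is passed in explicitly instead of being left to a real scheduler, so the result is reproducible.

```python
def run_with_lock_order(order):
    """Run each thread's critical section in the given lock-acquisition order."""
    shared = []
    critical_sections = {
        "t1": lambda: shared.append("A"),
        "t2": lambda: shared.append("B"),
    }
    for thread in order:
        critical_sections[thread]()
    return shared


# Same requests, but each server happens to grant the lock in a different order.
server1_state = run_with_lock_order(["t1", "t2"])
server2_state = run_with_lock_order(["t2", "t1"])
```

Both servers processed the same two requests, yet their states differ, which is the inconsistency the slide refers to.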
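Rex: Execute-Agree-Follow. Execute: the primary runs requests and records traces. Agree: replicas reach consensus on the traces. Follow: secondaries follow the agreed traces. [Figure: primary -> traces -> consensus -> traces -> secondary]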
Programming With Rex: (1) model the app as a RexRSM; (2) use Rex to make non-deterministic decisions: RexLocks, RexCond, …; RexTimeStamp, RexRand, etc.
Outline: Motivation, System Overview, Implementation, Evaluation
Normal Execution: Primary. [Figure: thread t1 runs events 1 request 1, 2 lockA, 3 unlockA, 4 reply 1; thread t2 runs events 1 request 2, 2 lockA, 3 unlockA, 4 reply 2] Primary trace: (t1, 1, request 1) … causal edge ((t1, 3) -> (t2, 2)) … (t1, 4, reply 1) …
Normal Execution: Secondary. [Figure: the secondary replays the same events on t1 and t2; event 2 on t2 (lockA) is a waited event that blocks until event 3 on t1 (unlockA) has been replayed] Trace followed by the secondary: (t1, 1, request 1) … causal edge ((t1, 3) -> (t2, 2)) … (t1, 4, reply 1) …
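The record and follow steps on the two slides above can be approximated with a small single-process simulation. This is a sketch under stated assumptions, not the Rex implementation: threads are modelled as lists of operations, the primary's interleaving is supplied as an explicit (valid) schedule, and the names `record`, `replay`, `programs`, and `preference` are hypothetical.

```python
def record(programs, schedule):
    """Primary: execute events in `schedule` order and record causal edges.

    Returns (edges, lock_order). `edges` maps a lock event (thread, index) to
    the unlock event (thread, index) of another thread that it must follow.
    """
    clock = {t: 0 for t in programs}
    last_unlock = {}            # lock name -> (thread, index) of latest unlock
    edges, lock_order = {}, []
    for t in schedule:          # assumed to be a valid interleaving
        clock[t] += 1
        kind, name = programs[t][clock[t] - 1]
        if kind == "lock":
            lock_order.append((name, t))
            src = last_unlock.get(name)
            if src is not None and src[0] != t:
                edges[(t, clock[t])] = src
        elif kind == "unlock":
            last_unlock[name] = (t, clock[t])
    return edges, lock_order


def replay(programs, edges, preference):
    """Secondary: run any ready thread, but hold back waited events."""
    clock = {t: 0 for t in programs}
    lock_order = []
    remaining = sum(len(ops) for ops in programs.values())
    while remaining:
        for t in preference:
            nxt = clock[t] + 1
            if nxt > len(programs[t]):
                continue        # thread already finished
            src = edges.get((t, nxt))
            if src is not None and clock[src[0]] < src[1]:
                continue        # waited event: its source has not run yet
            kind, name = programs[t][nxt - 1]
            if kind == "lock":
                lock_order.append((name, t))
            clock[t] = nxt
            remaining -= 1
            break
        else:
            raise RuntimeError("trace cannot be followed")
    return lock_order


programs = {
    "t1": [("request", "1"), ("lock", "A"), ("unlock", "A"), ("reply", "1")],
    "t2": [("request", "2"), ("lock", "A"), ("unlock", "A"), ("reply", "2")],
}
edges, primary_order = record(
    programs, ["t1", "t1", "t1", "t2", "t2", "t2", "t1", "t2"])
# The secondary's scheduler prefers t2, yet the causal edge forces t1 first.
followed_order = replay(programs, edges, preference=["t2", "t1"])
unconstrained_order = replay(programs, {}, preference=["t2", "t1"])
```

With the recorded edge the secondary grants lock A in the same order as the primary. Without it, the secondary's own scheduling preference wins and the order flips.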
Primary Failover. The failed primary restarts from a checkpoint and rejoins. A secondary upgrades to primary and switches from replay to record. [Figure: committed and uncommitted parts of the trace at the time of the crash]
Unique Challenges: Integrating Replication and Record/Replay. Inconsistent cuts; “holes” in logs; causal edge pruning; hybrid execution; …
The Inconsistent Cut Problem: Logs are collected at each thread asynchronously. An inconsistent cut contains the destination node of a causal edge without its source node. Problem: a secondary is not able to follow such a cut.
Solving the Inconsistent Cut Problem: Define consensus on the last consistent cut. Drop C1-C2 when the primary fails. Reply only when the reply is contained in a committed consistent cut. Use a vector clock to track consistent cuts.
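One way to picture "the last consistent cut" is to shrink whatever has been collected until no causal edge has its destination inside the cut and its source outside. The sketch below does this with an explicit edge list rather than the vector clocks mentioned on the slide. The function name and the data layout are assumptions made for illustration.

```python
def last_consistent_cut(collected, edges):
    """Return the largest consistent cut contained in `collected`.

    collected: {thread: number of events collected from that thread}
    edges: list of ((src_thread, src_index), (dst_thread, dst_index))
    """
    cut = dict(collected)
    changed = True
    while changed:
        changed = False
        for (src_t, src_i), (dst_t, dst_i) in edges:
            # Destination is in the cut but its source is not: drop the
            # destination and everything after it on that thread.
            if dst_i <= cut[dst_t] and src_i > cut[src_t]:
                cut[dst_t] = dst_i - 1
                changed = True
    return cut


edges = [(("t1", 3), ("t2", 2)), (("t2", 3), ("t3", 1))]
# t1's log lags behind: event (t1, 3) has not been collected yet.
shrunk_cut = last_consistent_cut({"t1": 2, "t2": 3, "t3": 2}, edges)
# Once (t1, 3) has been collected, nothing needs to be dropped.
full_cut = last_consistent_cut({"t1": 3, "t2": 3, "t3": 2}, edges)
```

In the first call the missing source event on t1 forces t2 back to event 1, which in turn forces t3 back to 0, so the effect cascades along the causal chain.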
Outline: Motivation, System Overview, Implementation, Evaluation
Experiment Setup. Real-world applications; a micro-benchmark for lock contention ratio; servers: 12-core, 24-thread, 10GE network.
App | Description
Thumbnail | Generating and storing thumbnails
XLock | Lock server similar to Chubby
File Server | File server
Kyoto Cabinet | Key-value store
LevelDB | Local storage behind BigTable
MemCached | Cache server
Performance Overview: Rex scales as the non-replicated version does, with < 24% overhead.
LevelDB in Detail. [Figure: x-axis # cores] Plot annotations: waited events grow with the number of threads, and so does overhead; overhead drops with more threads to schedule.
Lock Conflict Ratio: overhead < 15%.
Summary. Rex: execute-agree-follow. Applied to six real-world applications. Preserves scalability with low overhead.
Thanks! Q&A
Backups
Dealing with Data Races: reply logging and comparison; resource version checking; lock-free data structures: NATIVE_EXEC. Experience shows that getting rid of data races is doable.
Workloads. Thumbnail: 1 picture per request. K-V stores: 1M pairs, 16-byte keys, 100-byte values, 10% writes. File system: 16 KB random requests, 20% writes. XLock: 90% lease renewals, 100 B to 5 KB files.
Lock Granularity
Request Granularity: 10% computation in locks, 1% conflict ratio.
Experimental Results: Scalability
Causal Events & Performance
Improving Performance: Causal Edge Pruning with Vector Clocks. More causal edges mean more overhead. Causal edge pruning trades primary performance for secondary performance. It removes 58% to 99% of causal edges.
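A rough sketch of how vector clocks can identify prunable edges: an edge is redundant when the destination thread already knows, through earlier kept edges, about the source event. The `PruningRecorder` class and its methods are hypothetical, and the sketch assumes edges are observed in the order the primary executed their destination events.

```python
class PruningRecorder:
    """Keeps only causal edges that are not implied by earlier kept edges."""

    def __init__(self, threads):
        self.threads = list(threads)
        zero = {t: 0 for t in self.threads}
        # history[t]: (event index, vector clock) at each point where t
        # learned something new, in increasing event index.
        self.history = {t: [(0, dict(zero))] for t in self.threads}
        self.kept, self.pruned = [], []

    def _clock_at(self, thread, index):
        """Vector clock of `thread` as of its event number `index`."""
        clock = None
        for i, snapshot in self.history[thread]:
            if i <= index:
                clock = snapshot
        clock = dict(clock)
        clock[thread] = index
        return clock

    def observe(self, src, dst):
        """Record edge src -> dst unless dst already depends on src."""
        (src_t, src_i), (dst_t, dst_i) = src, dst
        known = self._clock_at(dst_t, dst_i)
        if known[src_t] >= src_i:
            self.pruned.append((src, dst))      # implied transitively
            return False
        src_clock = self._clock_at(src_t, src_i)
        merged = {t: max(known[t], src_clock[t]) for t in self.threads}
        self.history[dst_t].append((dst_i, merged))
        self.kept.append((src, dst))
        return True


recorder = PruningRecorder(["t1", "t2", "t3"])
recorder.observe(("t1", 3), ("t2", 2))   # kept: t2 knows nothing about t1 yet
recorder.observe(("t2", 3), ("t3", 2))   # kept: t3 knows nothing about t2 yet
recorder.observe(("t1", 1), ("t3", 4))   # pruned: implied via t1 -> t2 -> t3
```

The bookkeeping runs on the primary for every candidate edge, while the secondary has fewer waited events to enforce. This matches the trade-off named on the slide.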
Replicated State Machine
Rex: Causal Order Replication
Correctness. Correctness is guaranteed by: (1) capturing all non-determinism with Rex; (2) consensus on traces; (3) the agreed trace being a continuous sequence (no holes).
Inconsistent Cut: Why Is It Bad? Trace: t1 unlock -> t2 lock -> t2 unlock -> t3 lock; reply: 0. Replay: t1 unlock -> t3 lock -> t3 unlock -> t2 lock; reply: 1. Should we reply 0 or 1?
Inconsistent Cut: Solving the Reply Problem. Reply only when the reply and all its dependencies are committed. Use a vector clock to detect this.
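The rule above amounts to a componentwise comparison between a reply's vector clock and the committed cut. A minimal sketch, assuming each pending reply carries a clock that maps every thread it depends on to the latest event index it depends on. The function names and the `pending` layout are hypothetical.

```python
def can_reply(reply_clock, committed_cut):
    """True if every event the reply depends on is inside the committed cut."""
    return all(committed_cut.get(thread, 0) >= index
               for thread, index in reply_clock.items())


def release_replies(pending, committed_cut):
    """Return ids of replies that may be sent now, and drop them from pending."""
    ready = sorted(rid for rid, clock in pending.items()
                   if can_reply(clock, committed_cut))
    for rid in ready:
        del pending[rid]
    return ready


pending = {
    "reply 1": {"t1": 4},            # produced at (t1, 4)
    "reply 2": {"t1": 3, "t2": 4},   # produced at (t2, 4), depends on (t1, 3)
}
first_batch = release_replies(pending, {"t1": 3, "t2": 4})
second_batch = release_replies(pending, {"t1": 4, "t2": 4})
```

Reply 2 can go out as soon as the cut covers (t1, 3) and (t2, 4), even though reply 1 is still held back until (t1, 4) is committed.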