Karma: Scalable Deterministic Record-Replay
Arkaprava Basu, Jayaram Bobba, Mark D. Hill
Work done at University of Wisconsin-Madison
UW-Madison Computer Sciences Multifacet Group © 2011
Executive Summary
Applications of deterministic record-replay
–Debugging
–Fault tolerance
–Security
Existing hardware record-replayers
–Fast record, but
–Slow replay, or
–Require major hardware changes
Karma: faster replay with nearly-conventional hardware
–Extends Rerun
–Records more parallelism
Outline
Background & Motivation
Rerun Overview
Karma Insights
Karma Implementation
Evaluation
Conclusion
Deterministic Record-Replay
Multi-threaded execution is non-deterministic
Deterministic record-replay reincarnates a past execution
Record:
–Log selected events
Replay:
–Use the log to reincarnate the past execution
Key challenge: memory races
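A toy model of the record and replay steps, for illustration only: it assumes a sequentially consistent machine and logs the thread that performs every memory access, which is exactly the per-access overhead that episode-based recorders such as Rerun avoid. The function names `run`, `record`, and `replay` are hypothetical.

```python
import random

def run(programs, order):
    """Execute per-thread programs, one operation at a time, in the given thread order.

    Each operation is ("ST", addr, value) or ("LD", addr, None). Returns the
    values each thread loaded, which a replayer must reproduce exactly.
    """
    mem = {}
    pcs = [0] * len(programs)
    loaded = [[] for _ in programs]
    for tid in order:
        op, addr, value = programs[tid][pcs[tid]]
        pcs[tid] += 1
        if op == "ST":
            mem[addr] = value
        else:
            loaded[tid].append(mem.get(addr, 0))
    return loaded

def record(programs, rng):
    """Pick a random interleaving (stand-in for real non-determinism) and log it."""
    remaining = [len(p) for p in programs]
    log = []
    while any(remaining):
        tid = rng.choice([t for t, n in enumerate(remaining) if n])
        remaining[tid] -= 1
        log.append(tid)
    return log, run(programs, log)

def replay(programs, log):
    """Re-execute following the logged order; no randomness is involved."""
    return run(programs, log)

# Two threads racing on A and B: the loaded values depend on the interleaving.
programs = [
    [("ST", "A", 1), ("LD", "B", None)],
    [("ST", "B", 2), ("LD", "A", None)],
]
log, recorded = record(programs, random.Random())
assert replay(programs, log) == recorded
```

Whatever interleaving the recording happens to take, replaying the log reproduces the same loaded values, i.e. the memory races resolve the same way.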
Record-Replay Motivation
Debugging
–Ensures bugs reappear faithfully (no heisenbugs)
Fault tolerance
–Enables a hot backup that shadows the primary server and takes over on failure
Security
–Real-time intrusion detection & attack analysis
Replay speed matters
Previous Work
Record Dependence
–Wisconsin Flight Data Recorder [ISCA’03, etc.]: too much state
–UCSD Strata [ASPLOS’06]: log size grows rapidly with the number of cores
Record Independence
–UIUC DeLorean [ISCA’08]: non-conventional BulkSC hardware
–Wisconsin Rerun [ISCA’08]: sequential replay
–Intel MRR [MICRO’09]: only for snoop-based systems
–TimeTraveler [ISCA’10]: extends Rerun to lower log size
Our Goal
–Retain Rerun’s near-conventional hardware
–Enable faster replay
Rerun’s Recording
Most code executes without races
–Use race-free regions for ordering
Episodes: independent execution regions
–Defined per thread
[Figure: loads and stores of threads T0, T1, and T2 partitioned into per-thread episodes]
Partially adopted from ISCA’08 talk
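A sketch of how a global access trace might be cut into per-thread episodes. It assumes a simplified rule: a thread's current episode ends when another thread's access conflicts with that episode's read or write set. Real hardware also ends episodes for other reasons (size limits, for example), which are not modeled; `split_episodes` is a hypothetical name.

```python
def split_episodes(trace, nthreads):
    """Split a global access trace into per-thread episodes.

    trace: list of (tid, op, addr) in global order, op in {"LD", "ST"}.
    Simplified rule (assumption): thread u's current episode ends when
    another thread's access conflicts with u's read or write set.
    """
    episodes = [[[]] for _ in range(nthreads)]   # per thread: list of episodes
    reads = [set() for _ in range(nthreads)]
    writes = [set() for _ in range(nthreads)]
    for tid, op, addr in trace:
        for u in range(nthreads):
            if u == tid or not episodes[u][-1]:
                continue
            # Read-after-write, write-after-write, or write-after-read conflict.
            if addr in writes[u] or (op == "ST" and addr in reads[u]):
                episodes[u].append([])           # close u's episode
                reads[u].clear()
                writes[u].clear()
        (writes if op == "ST" else reads)[tid].add(addr)
        episodes[tid][-1].append((op, addr))
    # Drop the empty episode left open after a trailing conflict.
    return [[ep for ep in eps if ep] for eps in episodes]
```

In the usage below, T1's load of A conflicts with T0's store to A, so T0's first episode ends there; T0's later store to D starts a new episode.

```python
trace = [(0, "ST", "A"), (0, "LD", "B"), (1, "LD", "A"), (1, "ST", "C"), (0, "ST", "D")]
split_episodes(trace, 2)
# [[[("ST", "A"), ("LD", "B")], [("ST", "D")]], [[("LD", "A"), ("ST", "C")]]]
```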
Rerun’s Recording (Contd.)
Capturing causality:
–Timestamp each episode via a Lamport scalar clock [Lamport ’78]
Replay in timestamp order
–Episodes with the same timestamp can be replayed in parallel
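A simplified Lamport-style assignment, as an illustration rather than Rerun's actual hardware update rule: each episode gets a timestamp one greater than the largest timestamp among its predecessors (the previous episode on the same thread plus any cross-thread predecessors), so every dependence points to a strictly larger timestamp. Grouping by timestamp then yields the sets that can replay in parallel. All names are hypothetical.

```python
from collections import defaultdict

def lamport_timestamps(episodes):
    """episodes: list of (episode_id, thread, [cross-thread predecessor ids]),
    given in an order where every predecessor appears before its successors."""
    ts, last_on_thread = {}, {}
    for eid, thread, preds in episodes:
        deps = list(preds)
        if thread in last_on_thread:
            deps.append(last_on_thread[thread])      # program order
        ts[eid] = 1 + max((ts[p] for p in deps), default=0)
        last_on_thread[thread] = eid
    return ts

def replay_groups(ts):
    """Episodes sharing a timestamp may replay in parallel; groups run in timestamp order."""
    groups = defaultdict(list)
    for eid, t in ts.items():
        groups[t].append(eid)
    return [sorted(groups[t]) for t in sorted(groups)]

episodes = [("a", 0, []), ("b", 1, []), ("c", 2, []),
            ("d", 0, ["b"]), ("e", 1, []), ("f", 2, ["e"])]
print(replay_groups(lamport_timestamps(episodes)))  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```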
Rerun’s Replay
[Figure: episodes of T0, T1, and T2 replayed in timestamp order; timestamps shown: 22, 43, 44, 45, 60, 61]
Karma’s Insight 1: Capture order with a DAG (not a scalar clock)
Recording: the DAG is captured with per-episode predecessor & successor sets
[Figure: episodes of T0, T1, and T2 connected by DAG arcs]
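A minimal sketch of the predecessor and successor sets, assuming the recorder has already identified conflict arcs between episodes (each arc `(src, dst)` meaning `dst` must replay after `src`). Program order within a thread contributes arcs as well. `build_dag` and its input format are hypothetical.

```python
def build_dag(episodes, conflict_arcs):
    """episodes: {thread: [episode ids in program order]}
    conflict_arcs: [(src, dst)], meaning dst must replay after src.
    Returns (preds, succs): predecessor and successor sets per episode."""
    preds = {e: set() for eps in episodes.values() for e in eps}
    succs = {e: set() for e in preds}
    arcs = list(conflict_arcs)
    for eps in episodes.values():
        arcs += list(zip(eps, eps[1:]))      # program order within a thread
    for src, dst in arcs:
        preds[dst].add(src)
        succs[src].add(dst)
    return preds, succs

preds, succs = build_dag({0: ["a", "c"], 1: ["b", "d"]}, [("a", "d")])
# preds["d"] == {"a", "b"}; succs["a"] == {"c", "d"}
```

Unlike a single scalar timestamp, these sets keep only the orderings that were actually observed, so unrelated episodes (here `c` and `b`) stay unordered.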
Karma’s Insight 1 (Contd.)
[Figure: Rerun’s replay vs. Karma’s replay of the same episodes of T0, T1, and T2]
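An idealized comparison of the two replay disciplines, assuming episode lengths are known and communication costs are zero. Timestamp-ordered replay behaves like a sequence of barriers (a timestamp group lasts as long as its longest episode), while DAG replay lets each episode start as soon as its own predecessors finish. Function names and the example lengths are hypothetical.

```python
def timestamp_replay_time(ts, length):
    """Rerun-style: replay timestamp groups in order; a group lasts as long as its longest episode."""
    group_len = {}
    for ep, t in ts.items():
        group_len[t] = max(group_len.get(t, 0), length[ep])
    return sum(group_len.values())

def dag_replay_time(preds, length):
    """Karma-style: an episode starts as soon as all of its DAG predecessors finish."""
    finish = {}

    def finish_time(ep):
        if ep not in finish:
            start = max((finish_time(p) for p in preds[ep]), default=0)
            finish[ep] = start + length[ep]
        return finish[ep]

    return max(finish_time(ep) for ep in preds)

# Two threads with no cross-thread arcs: T0 runs a then c, T1 runs b then d.
preds = {"a": [], "c": ["a"], "b": [], "d": ["b"]}
length = {"a": 10, "c": 1, "b": 1, "d": 10}
print(timestamp_replay_time({"a": 1, "b": 1, "c": 2, "d": 2}, length))  # 20
print(dag_replay_time(preds, length))                                    # 11
```

With distinct timestamps for every episode, as in the Rerun replay figure, timestamp-ordered replay degenerates to fully sequential replay (22 in this example).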
Karma’s Insight 1 (Contd.)
Naïve approach: DAG arcs point to episodes
–Episodes represented by integers
–Too much log size overhead!
Our approach: DAG arcs point to cores
–Recording: only one “active” episode per core
–Replay: send wakeup message(s) to the core(s) of the successor episode(s)
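A software simulation of core-pointing arcs, under stated assumptions: each core's log lists its episodes in program order with the sets of predecessor and successor cores, and wakeups from a given sender core are consumed in arrival order. An episode runs once its core holds a wakeup from every predecessor core; when it finishes, it wakes each successor core. The round structure only shows which episodes could replay in parallel; `replay_rounds` is hypothetical and not Karma's hardware protocol.

```python
def replay_rounds(logs):
    """Simulate replay of per-core logs whose DAG arcs point to cores.

    logs: {core: [(refs, pred_cores, succ_cores), ...]} in program order.
    An episode may start once its core holds a pending wakeup from every core
    in pred_cores; when it finishes it sends one wakeup to each core in
    succ_cores. Returns the (core, episode index) pairs run in each round.
    """
    inbox = {c: {} for c in logs}      # inbox[receiver][sender] -> pending wakeups
    nxt = {c: 0 for c in logs}         # index of each core's next episode
    rounds = []
    while any(nxt[c] < len(logs[c]) for c in logs):
        ready = [c for c in logs
                 if nxt[c] < len(logs[c])
                 and all(inbox[c].get(p, 0) > 0 for p in logs[c][nxt[c]][1])]
        if not ready:
            raise RuntimeError("replay stalled: inconsistent log")
        for c in ready:                # consume wakeups, then "execute" the episode
            for p in logs[c][nxt[c]][1]:
                inbox[c][p] -= 1
        for c in ready:                # episode done: wake successor cores
            for s in logs[c][nxt[c]][2]:
                inbox[s][c] = inbox[s].get(c, 0) + 1
            nxt[c] += 1
        rounds.append([(c, nxt[c] - 1) for c in ready])
    return rounds

logs = {
    0: [(5, set(), {1}), (3, set(), set())],   # core 0's first episode precedes core 1's
    1: [(4, {0}, set())],
    2: [(7, set(), set())],
}
print(replay_rounds(logs))  # [[(0, 0), (2, 0)], [(0, 1), (1, 0)]]
```

Each arc costs only a core identifier rather than a full episode identifier, which is the log-size argument on this slide.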
Karma’s Insight 1 (Contd.): Anatomy of a log entry
[Figure: example log entries for T0, T1, and T2]
Karma’s Insight 1 (Contd.)
Each log entry:
–REFS count
–Predecessor
–Successor
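One way such an entry could be packed, assuming one predecessor bit and one successor bit per core (16 cores, matching the base system on the per-core-state slide) and a 16-bit REFS field. Both widths are assumptions chosen for illustration, not values from the slides; `pack_entry` and `unpack_entry` are hypothetical.

```python
NUM_CORES = 16   # assumption: one PRED/SUCC bit per core in a 16-core system
REFS_BITS = 16   # assumption: hypothetical width of the REFS count field

def _bits(cores):
    """Encode a set of core IDs as a bit vector (bit i set = core i)."""
    vec = 0
    for c in cores:
        if not 0 <= c < NUM_CORES:
            raise ValueError(f"core {c} out of range")
        vec |= 1 << c
    return vec

def _cores(vec):
    """Decode a bit vector back into a set of core IDs."""
    return {c for c in range(NUM_CORES) if vec >> c & 1}

def pack_entry(refs, preds, succs):
    """Pack one log entry as [REFS count | PRED vector | SUCC vector]."""
    if not 0 <= refs < 1 << REFS_BITS:
        raise ValueError("REFS count does not fit")
    return (refs << 2 * NUM_CORES) | (_bits(preds) << NUM_CORES) | _bits(succs)

def unpack_entry(word):
    """Inverse of pack_entry: returns (refs, pred_cores, succ_cores)."""
    mask = (1 << NUM_CORES) - 1
    return word >> 2 * NUM_CORES, _cores(word >> NUM_CORES & mask), _cores(word & mask)

ENTRY_BITS = REFS_BITS + 2 * NUM_CORES   # 48 bits under these assumptions
```

The entry size depends only on the core count, not on how many episodes the execution has produced.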
Karma’s Insight 2:
Not necessary to end the episode on every conflict
–As long as the episodes can still be ordered during replay
[Figure: the T0, T1, T2 example with episodes extended past conflicts]
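A simplified check of the condition stated on this slide, offered as an illustration rather than Karma's hardware rule: "can still be ordered" is read as "the episode graph stays acyclic". On a conflict that requires `src` to replay before `dst`, the arc is added and the episode extended only if `dst` cannot already reach `src`; otherwise the episode must end. `reachable` and `on_conflict` are hypothetical names.

```python
def reachable(succs, start, goal):
    """Iterative depth-first search over successor arcs."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in succs.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def on_conflict(succs, src, dst):
    """A conflict requires src's episode to replay before dst's.

    Returns "extend" (and records the arc) if the graph stays acyclic, so the
    episodes remain orderable; returns "end" if the arc would close a cycle.
    """
    if reachable(succs, dst, src):
        return "end"
    succs.setdefault(src, set()).add(dst)
    return "extend"

succs = {"a": {"b"}}
print(on_conflict(succs, "b", "c"))  # extend: a -> b -> c is still orderable
print(on_conflict(succs, "c", "a"))  # end: c -> a would form a cycle
```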
Karma’s Per-Core State
[Figure: base system of Core 0 to Core 15, each with a pipeline, L1 I and L1 D caches, and a coherence controller, connected by an interconnect to L2 banks (L2 0 to L2 15) and DRAM; labels include data, tags, directory, Karma hardware, and Rerun L2/memory state]
Karma hardware per core:
–Address Filter (FLT)
–References (REFS)
–Predecessor (PRED)
–Successor (SUCC)
–Timestamp (TS)
Total state: 148 bytes/core
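A sketch of an address filter, assuming it is a Bloom-filter-style signature of the cache blocks an episode has touched: inserting is cheap, there are no false negatives, and false positives only cause extra (safe) conflicts. The class name, filter size, hash count, block size, and the use of `hashlib.blake2b` are all assumptions for illustration; hardware would use simple hash functions.

```python
import hashlib

class AddressFilter:
    """Hypothetical Bloom-filter signature of block addresses touched by an episode."""

    def __init__(self, bits=256, hashes=2, block_bytes=64):
        self.bits, self.hashes, self.block_bytes = bits, hashes, block_bytes
        self.vector = 0

    def _positions(self, addr):
        """Map an address's cache block to `hashes` bit positions."""
        block = addr // self.block_bytes
        for i in range(self.hashes):
            digest = hashlib.blake2b(f"{i}:{block}".encode(), digest_size=4).digest()
            yield int.from_bytes(digest, "little") % self.bits

    def insert(self, addr):
        for pos in self._positions(addr):
            self.vector |= 1 << pos

    def may_contain(self, addr):
        """False means definitely absent; True may be a false positive."""
        return all(self.vector >> pos & 1 for pos in self._positions(addr))

    def clear(self):
        """Reset when a new episode begins."""
        self.vector = 0

flt = AddressFilter()
flt.insert(0x1000)
flt.may_contain(0x103F)   # True: same 64-byte block as 0x1000
```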
Evaluation: Were we able to speed up replay?
On average, ~4x faster replay than Rerun
Evaluation: Did we blow up the log size?
No: on average, Karma does not increase log size; because it allows larger episodes, it reduces log size by as much as 40%
Conclusion
Applications of deterministic replay
–Debugging
–Fault tolerance
–Security
Existing hardware record-replayers
–Slow replay, or
–Require major hardware changes
Karma: faster replay with nearly-conventional hardware
–Extends Rerun
–Uses a DAG instead of a scalar clock
–Extends episodes past conflicts
Wider applicability + lower cost: more attractive
Questions?