
1 Robust Non-Intrusive Record-Replay with Processor Extraction
Filippo Gioachin, Gengbin Zheng, Laxmikant Kalé
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign

2 Outline
● Motivations
● Three-step Procedure
  – Detecting failure
● Performance
  – Benchmarks and real applications
● Conclusions
13 April 2010, Filippo Gioachin - UIUC

3 Motivations
● Intermittent problem
  – Only certain event orderings cause the problem
● Problems may not appear at small scale
  – Races between messages
    ● Latencies in the underlying hardware
  – Incorrect messaging
  – Data decomposition

4 Problems at Large Scale
● Infeasible
  – Debugger needs to handle many processors
  – Human can be overwhelmed by information
  – Long waiting time in queue
  – Machine not available
● Expensive
  – Large machine allocations consume a lot of computational resources

5 Do we need all the processors?
● The problem manifests itself on a single processor
  – If more than one, they are equivalent
● The cause can span multiple processors (causally related)
  – The subset is generally much smaller than the whole system
● Select the interesting processors and ignore the others

6 Fighting non-determinism
● Record all data processed by each processor
  – Huge volume of data stored
  – High interference with application
    ● Likely the bug will not appear...
  – Need to run a non-optimized code
● Record only message ordering
  – Based on piecewise deterministic assumption
  – Must re-execute using the same machine
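
The "record only message ordering" option can be illustrated with a minimal Python sketch. It is not the Charm++ implementation: the class names, the (sender, sequence number) key, and the handler callback are assumptions made for illustration. The recording run logs only a small key per delivered message; the replay run buffers early arrivals and releases them in the logged order.

```python
from collections import deque


class OrderRecorder:
    """Record run: log only a small key per delivered message."""

    def __init__(self):
        self.log = []  # (sender, seq) pairs in delivery order

    def deliver(self, sender, seq, payload, handler):
        self.log.append((sender, seq))  # a few bytes; the payload is not stored
        handler(sender, payload)


class OrderReplayer:
    """Replay run: hold back early arrivals until the log allows them."""

    def __init__(self, log):
        self.expected = deque(log)
        self.pending = {}  # (sender, seq) -> payload that arrived too early

    def deliver(self, sender, seq, payload, handler):
        self.pending[(sender, seq)] = payload
        # Release every buffered message that is next in the recorded order.
        while self.expected and self.expected[0] in self.pending:
            key = self.expected.popleft()
            handler(key[0], self.pending.pop(key))


def demo():
    recorded, replayed = [], []
    rec = OrderRecorder()
    for sender, seq, payload in [(1, 0, "a"), (2, 0, "b"), (1, 1, "c")]:
        rec.deliver(sender, seq, payload, lambda s, p: recorded.append((s, p)))
    rep = OrderReplayer(rec.log)
    # Same messages, different arrival order (a different race outcome).
    for sender, seq, payload in [(2, 0, "b"), (1, 1, "c"), (1, 0, "a")]:
        rep.deliver(sender, seq, payload, lambda s, p: replayed.append((s, p)))
    return recorded, replayed
```

In this sketch the replayed run hands messages to the handler in the recorded order even though they arrive in a different one, which is what the piecewise deterministic assumption relies on.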

7 Three-step Procedure for Processor Extraction
Step 1: Execute program recording message ordering
Decision: Has bug appeared?
User: Select processors to record
Step 2: Replay application with detailed recording enabled
Step 3: Replay selected processors as stand-alone
Decision: Is problem solved? If so, done
● Minimize perturbation (few bytes per message)
● Iterate for incremental extraction
● Use message ordering to guarantee determinism
● Can execute in the virtualized environment
* F. Gioachin, G. Zheng, L.V. Kalé: "Robust Record-Replay with Processor Extraction" in Proceedings of the Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging (PADTAD – VIII), 2010
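
Steps 2 and 3 can be sketched in Python under stated assumptions. The names DetailRecorder, full_replay, replay_standalone and the toy handler are hypothetical and do not come from Charm++. The idea shown is that a replay of the whole application saves the full content of messages delivered to the selected processors, and a selected processor is then re-executed alone by feeding it that saved trace.

```python
class DetailRecorder:
    """Step 2 (sketch): save full messages, but only for selected processors."""

    def __init__(self, selected):
        self.trace = {pe: [] for pe in selected}

    def on_deliver(self, pe, sender, payload):
        if pe in self.trace:
            self.trace[pe].append((sender, payload))


def handler(state, sender, payload):
    # Order-sensitive toy computation standing in for application code.
    return state * 31 + sender + payload


def full_replay(deliveries, recorder):
    """Replay all processors; deliveries are (pe, sender, payload) in replay order."""
    states = {}
    for pe, sender, payload in deliveries:
        recorder.on_deliver(pe, sender, payload)
        states[pe] = handler(states.get(pe, 0), sender, payload)
    return states


def replay_standalone(trace, initial_state=0):
    """Step 3 (sketch): re-run one processor alone from its saved trace."""
    state = initial_state
    for sender, payload in trace:
        state = handler(state, sender, payload)
    return state
```

A possible use: record processor 3 during a replay of five deliveries, then check that the stand-alone run reaches the same state, for example `replay_standalone(rec.trace[3]) == states[3]`.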

8 What if the piecewise deterministic assumption is not met?
● Make sure to detect it, and notify the user
● If all messages are identical, then we can assume the non-determinism was captured
● Methods to detect failure:
  – Message size and destination
  – Checksum of the whole message (XOR, CRC32)
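
The detection methods listed above can be compared with a small Python sketch. The function names, the byte-wise XOR, and the ReplayDivergence exception are assumptions for illustration; the slides do not specify how the XOR is computed or how the user is notified. `zlib.crc32` is the standard-library CRC32.

```python
import zlib
from functools import reduce


class ReplayDivergence(Exception):
    """Raised when a replayed message does not match what was recorded."""


def xor_checksum(data: bytes) -> int:
    # Byte-wise XOR: cheap, but blind to reordered bytes.
    return reduce(lambda acc, b: acc ^ b, data, 0)


def fingerprint(dest: int, payload: bytes, method: str = "crc32"):
    if method == "size":
        return (dest, len(payload))
    if method == "xor":
        return (dest, len(payload), xor_checksum(payload))
    if method == "crc32":
        return (dest, len(payload), zlib.crc32(payload))
    raise ValueError(f"unknown method: {method}")


def check_replay(recorded, dest: int, payload: bytes, method: str = "crc32"):
    """Compare a replayed message against the recorded fingerprint."""
    if fingerprint(dest, payload, method) != recorded:
        raise ReplayDivergence(f"message to {dest} differs from recorded run")
```

In this sketch, two payloads such as b"ab" and b"ba" have the same size and the same byte-wise XOR, so only the CRC32 fingerprint tells them apart; the stronger check costs more per message.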

9 Computing Checksums
● Checksum considers memory as raw data, ignores what it contains
  – Pointers
  – Garbage
    ● Uninitialized fields
    ● Compiler padding
● Use Charm++ memory allocator
  – Intercept calls to malloc and pre-fill memory
[Figure: message memory layout with fields labelled double, int, short, double, int, double]
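
Why pre-filling matters can be shown with a Python simulation. This is only a sketch: the 24-byte layout (double, int, short, two padding bytes, double), the allocate function, and the stale-byte patterns are assumptions that mimic a C struct, not the Charm++ allocator. Fields are written at fixed offsets and the padding bytes are never touched, so whatever the allocator left there ends up in the checksum.

```python
import struct
import zlib

# Assumed layout: double at 0, int at 8, short at 12, padding at 14..15, double at 16.
SIZE = 24


def allocate(stale: bytes, prefill: bool) -> bytearray:
    """Return a buffer holding stale heap bytes, optionally zero-filled first."""
    buf = bytearray(stale[:SIZE].ljust(SIZE, b"\x00"))
    if prefill:
        buf[:] = bytes(SIZE)  # what an intercepted malloc could do
    return buf


def build_message(buf: bytearray, d: float, i: int, s: int, e: float) -> bytes:
    struct.pack_into("<d", buf, 0, d)
    struct.pack_into("<i", buf, 8, i)
    struct.pack_into("<h", buf, 12, s)
    # Bytes 14..15 are padding and are never written.
    struct.pack_into("<d", buf, 16, e)
    return bytes(buf)


def message_crc(stale: bytes, prefill: bool) -> int:
    buf = allocate(stale, prefill)
    return zlib.crc32(build_message(buf, 1.5, 42, 7, 2.5))
```

With different stale contents, for example `b"\x11" * 24` in one run and `b"\x22" * 24` in another, the checksums of logically identical messages differ unless `prefill=True`.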

10 Message Order Recording Performance (on NCSA's Abe)

11 kNeighbor

12 ChaNGa (dwf1.2048 on NCSA's BluePrint)

13 Replaying the Application

14 [Diagram: BigSim Emulator; labels: Virtual Processor, Message Queue, Converse Main Thread, Worker Thread, Communication Thread]

15 Replaying under BigSim Emulation: NAMD

16 Amount of Data Saved (ChaNGa dwf1.2048, numbers in MB)

17 Debugging Case Study
● Message race during particle exchange
  – Fixed with tedious print statements (while trying to avoid hiding the bug...)
    ../charmdebug +p16 ../ChaNGa cube300.param +record +recplay-crc
    ../charmdebug +p16 ../ChaNGa cube300.param +replay +recplay-crc +record-detail 7
    gdb ../ChaNGa
    >> run cube300.param +replay-detail 7/16

18 CharmDebug
● Debugger tailored to Charm++ applications
  – Show information pertinent to the user
    ● Messages in the queue
    ● Chare elements and their state
● Effective scalability to large applications
  – Uses single connection to interact with application

19 CharmDebug Integration
● Enable record/replay
  – Select which processors to record/replay
  – Select which detection mechanism to use

20 Summary
● Capture non-determinism present in applications
● Resources are expensive and should not be wasted
  – Processor Extraction to capture non-determinism of parallel application
    ● Must not interfere too much with the application timing
    ● Save only needed information
  – Use virtualization to extract using fewer resources

21 Future Extensions
● Multi-replay with CharmDebug
● Replay on different architectures
  – Message translation (envelope and content)
● Replay in isolation of single virtual entities
  – Extract single chares from the application
  – Conditions of validity

