Debugging Large Scale Applications in a Virtualized Environment
Filippo Gioachin, Gengbin Zheng, Laxmikant Kalé
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign
LCPC 2010, Phil Miller - PPL - UIUC
Motivations
● Debugging is a fundamental part of software development
● Parallel programs have all the sequential bugs plus others specific to parallelism
● Problems may not appear at small scale
  – Races between messages
    ● Latencies in the underlying hardware
  – Incorrect messaging
  – Data decomposition
Problems at Large Scale
● Infeasible
  – Debugger needs to handle many processors
  – Human can be overwhelmed by information
  – Long waiting time in queue
  – Machine not available
● Expensive
  – Large machine allocations consume a lot of computational resources
Virtualize application processors and allocate fewer physical processors
CharmDebug Overview
● Single connection to the application as a whole
  – Uses the same communication infrastructure as the application
BigSim Emulator
[Figure labels: Virtual Processor, Message Queue, Worker Thread, Communication Thread, Converse Main Thread]
Virtualized Emulation
● Use emulation techniques to provide virtual processors to display to the user
  – Different scenario from BigSim's performance analysis
  – Debugger needs to communicate with application
Converse Client-Server under Emulated Environment
[Figure labels: CCS Host, Real PE 12, VP 513, VP 87, Virtual Processor, Worker Thread, Communication Thread, Message Queue, Converse Main Thread]
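The idea of a debugger request addressed to a virtual processor (VP) being delivered through the real processing element (PE) that hosts it can be sketched as follows. This is only an illustration: the class `EmulatedMachine`, its methods, the block mapping of VPs onto PEs, and the request string are all hypothetical and are not the CCS or BigSim API. The mapping in particular is an assumption and does not reproduce the PE and VP numbers shown in the figure.

```python
from collections import deque


class EmulatedMachine:
    """Hypothetical model of VPs hosted on a smaller set of real PEs."""

    def __init__(self, num_real: int, num_virtual: int) -> None:
        if num_real <= 0 or num_virtual <= 0:
            raise ValueError("processor counts must be positive")
        self.num_real = num_real
        self.num_virtual = num_virtual
        # Assumed block mapping: consecutive VPs share one real PE.
        self.per_real = -(-num_virtual // num_real)  # ceiling division
        # One message queue per virtual processor.
        self.queues = {vp: deque() for vp in range(num_virtual)}

    def real_pe_of(self, vp: int) -> int:
        """Return the real PE that hosts the given virtual processor."""
        if not 0 <= vp < self.num_virtual:
            raise IndexError(f"no such virtual processor: {vp}")
        return vp // self.per_real

    def deliver(self, vp: int, request: str) -> int:
        """Enqueue a request on the target VP and report the hosting PE."""
        pe = self.real_pe_of(vp)
        self.queues[vp].append(request)
        return pe


# 1024 virtual processors emulated on 8 real ones (illustrative sizes).
machine = EmulatedMachine(num_real=8, num_virtual=1024)
pe = machine.deliver(513, "list-breakpoints")
print(f"request for VP 513 handled on real PE {pe}")
```

The user only ever names virtual processors; the translation to a real PE stays inside the emulation layer, which is what lets the debugger present the illusion of the large machine.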
Usage: Starting
Usage: Debugging
[Figure labels: Real Processor, Virtualized MPI, Converse/Charm++, MPI process (eight instances)]
Resource Consumption: Jacobi (on NCSA's BluePrint)
● User thinks for one minute about what to do:
  – 8 processors
    ● 86 sec.
    ● ~0.2 SU
  – 1024 processors
    ● 60.5 sec.
    ● ~17 SU
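The SU figures above are consistent with charging one service unit per processor-hour. That billing rule is an assumption here, since the slide does not define SU; the processor counts and times are taken from the slide. A minimal sketch of the arithmetic:

```python
def service_units(processors: int, seconds: float) -> float:
    """Cost in SU, assuming 1 SU = 1 processor-hour (an assumption)."""
    return processors * seconds / 3600.0


# (processors, wall-clock seconds) as reported on the slide.
small_allocation = service_units(8, 86.0)
large_allocation = service_units(1024, 60.5)

print(f"8 processors:    {small_allocation:.2f} SU")
print(f"1024 processors: {large_allocation:.2f} SU")
print(f"cost ratio:      {large_allocation / small_allocation:.0f}x")
```

Under this assumption the smaller allocation takes somewhat longer in wall-clock time but costs roughly 90 times less.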
Performance
Limitations – Small Machine
● Small memory footprint
  – Many virtual processors need to fit into a single physical processor
● Session should be constrained by human speed
  – Allocation is idle most of the time, waiting for user input
  – Bad for computation-intensive applications
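The memory constraint can be made concrete with a back-of-the-envelope calculation. The function names and all numbers below are illustrative assumptions, not measurements from the talk: the sketch simply asks how many virtual processors of a given footprint fit in the memory available to one physical processor, and how many physical processors an emulated machine then needs.

```python
def max_virtual_per_physical(memory_mb: float, footprint_mb: float,
                             reserved_mb: float = 0.0) -> int:
    """Virtual processors that fit in one physical processor's memory."""
    if footprint_mb <= 0:
        raise ValueError("footprint must be positive")
    usable = memory_mb - reserved_mb
    return max(0, int(usable // footprint_mb))


def physical_needed(num_virtual: int, per_physical: int) -> int:
    """Physical processors required to host all virtual processors."""
    if per_physical <= 0:
        raise ValueError("no virtual processor fits in memory")
    return -(-num_virtual // per_physical)  # ceiling division


# Hypothetical figures: 2048 MB per physical processor, 16 MB per VP.
per_physical = max_virtual_per_physical(memory_mb=2048, footprint_mb=16)
print(per_physical, "virtual processors per physical processor")
print(physical_needed(1024, per_physical), "physical processors for 1024 VPs")
```

If the per-VP footprint grows, the number of physical processors needed grows with it, which is why the approach suits applications with a small memory footprint.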
Limitations – Buggy Program
● Cannot assume correctness of program
● Single address space
● Race conditions
Memory Tagging in Charm++ *
* F. Gioachin, L.V. Kalé: "Memory Tagging in Charm++". Proceedings of the 6th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging (PADTAD '08)
Summary
● Resources are expensive and should not be wasted
  – Allocate fewer processors than requested
  – Virtualize application to emulate a large machine
  – Present to the user the illusion of a large machine
● Feasible and practical
Poor Man's Virtualization
● Could just run more processes than processors
● Would suffer from
  – Preemptive multitasking
  – Inability to share interconnect
  – Target machines may not allow multiprocessing