Debugging Large Scale Applications in a Virtualized Environment
Filippo Gioachin, Gengbin Zheng, Laxmikant Kalé
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign

Presentation transcript:

Slide 1: Debugging Large Scale Applications in a Virtualized Environment
Filippo Gioachin, Gengbin Zheng, Laxmikant Kalé
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign
LCPC 2010 (presenter: Phil Miller, PPL, UIUC)

Slide 2: Motivations
● Debugging is a fundamental part of software development
● Parallel programs have all the sequential bugs, plus others specific to parallelism
● Problems may not appear at small scale:
  – Races between messages (exposed by latencies in the underlying hardware)
  – Incorrect messaging
  – Data decomposition

Slide 3: Problems at Large Scale
● Infeasible
  – Debugger needs to handle many processors
  – Human can be overwhelmed by information
  – Long waiting time in the queue
  – Machine not available
● Expensive
  – Large machine allocations consume a lot of computational resources
The proposed solution: virtualize application processors and allocate fewer physical processors.

Slide 4: CharmDebug Overview
● Single connection to the application as a whole
  – Uses the same communication infrastructure as the application
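For concreteness, the single connection referred to above is Converse's Client-Server (CCS) interface. Below is a minimal sketch of what a server-side debug handler could look like, assuming the Converse CCS API from conv-ccs.h (CcsRegisterHandler, CcsSendReply); the handler name "debug/ping", the registerDebugHandlers hook, and the reply payload are invented for illustration and are not part of CharmDebug.

    /* Sketch of a Converse CCS handler registered on every (virtual) processor.
     * Assumes the conv-ccs.h API; the specific names are illustrative only. */
    #include "converse.h"
    #include "conv-ccs.h"

    static void pingHandler(char *msg) {
        /* Any request payload would follow the Converse message header:
         *   char *payload = msg + CmiMsgHeaderSizeBytes;                    */
        int pe = CmiMyPe();             /* in the scheme above, ideally the virtual PE */
        CcsSendReply(sizeof(pe), &pe);  /* reply travels back over the single CCS socket */
        CmiFree(msg);
    }

    void registerDebugHandlers(void) {  /* hypothetical startup hook */
        CcsRegisterHandler("debug/ping", (CmiHandler)pingHandler);
    }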

Slide 5: BigSim Emulator
[Diagram: the BigSim Emulator on one real processor: virtual processors with per-VP message queues, running on the Converse main thread with worker threads and communication threads]

Slide 6: Virtualized Emulation
● Use emulation techniques to provide virtual processors to display to the user
  – Different scenario from BigSim's performance analysis
  – Debugger needs to communicate with the application

Slide 7: Converse Client-Server under the Emulated Environment
[Diagram: the CCS host connects to real PE 12, which hosts virtual processors such as VP 513 and VP 87; each virtual processor has its own worker thread, communication thread, and message queue on top of the Converse main thread]
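On the other end of that connection, the debugger addresses its request to a processor number that may be a virtual PE, and the runtime routes it to the real processor hosting that VP. A rough client-side sketch, assuming the C client API in ccs-client.h (CcsConnect, CcsSendRequest, CcsRecvResponse; exact signatures, the host/port values, and the timeout unit are assumptions), reusing the hypothetical "debug/ping" handler from the earlier sketch:

    #include <stdio.h>
    #include "ccs-client.h"

    int main(void) {
        CcsServer svr;
        int reply = -1;
        /* host and port are whatever charmrun reports for the CCS server */
        CcsConnect(&svr, "localhost", 1234, NULL);
        /* address the request to "PE" 513, which is actually a virtual processor */
        CcsSendRequest(&svr, "debug/ping", 513, 0, NULL);
        CcsRecvResponse(&svr, sizeof(reply), &reply, 60 /* timeout, assumed seconds */);
        printf("reply from VP 513: %d\n", reply);
        return 0;
    }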

Slide 8: Usage: Starting

Slide 9: Usage: Debugging

Slide 10: Virtualized MPI
[Diagram: a single real processor running Converse/Charm++, on top of which many MPI processes execute as virtual processors]
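The MPI-on-Charm++ layering in this diagram corresponds to AMPI, where each MPI rank runs as a migratable user-level thread. A minimal sketch of that workflow, assuming the usual AMPI tools (ampicc) and charmrun's +p/+vp options; the processor and rank counts are arbitrary:

    /* hello.c: an ordinary MPI program. Under AMPI each rank becomes a
     * user-level thread, so many ranks can share one real processor.
     *
     * Build/run sketch (assumed AMPI workflow):
     *   ampicc -o hello hello.c
     *   ./charmrun +p8 ./hello +vp1024    (1024 MPI ranks on 8 real processors)
     */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* reports the virtual count, e.g. 1024 */
        printf("virtual rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }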

Slide 11: Resource Consumption: Jacobi (on NCSA's BluePrint)
● User thinks for one minute about what to do:
  – 8 processors: 86 sec., ~0.2 SU
  – 1024 processors: 60.5 sec., ~17 SU
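These SU figures are consistent with charging one service unit per processor-hour (an assumption about BluePrint's accounting): 8 × 86 s / 3600 ≈ 0.19 SU versus 1024 × 60.5 s / 3600 ≈ 17.2 SU, so the same one-minute think time costs roughly 90 times more on the full allocation.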

Slide 12: Performance

Slide 13: Limitations – Small Machine
● Small memory footprint
  – Many virtual processors need to fit into a single physical processor
● Session should be constrained by human speed
  – Allocation is idle most of the time, waiting for user input
  – Bad for computation-intensive applications
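To make the memory constraint concrete with purely hypothetical numbers: emulating 1,024 virtual processors on one physical processor, each holding 100 MB of application state, would require about 100 GB of memory, so the approach is practical only when the per-VP footprint is small relative to what one physical processor can hold.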

Slide 14: Limitations – Buggy Program
● Cannot assume correctness of the program
● Single address space
● Race conditions
Memory Tagging in Charm++ *

* F. Gioachin, L. V. Kalé: "Memory Tagging in Charm++". Proceedings of the 6th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging (PADTAD '08).

Slide 15: Summary
● Resources are expensive and should not be wasted
  – Allocate fewer processors than requested
  – Virtualize the application to emulate a large machine
  – Present to the user the illusion of a large machine
● Feasible and practical

Slide 16: Poor Man's Virtualization
● Could just run more processes than processors
● Would suffer from:
  – Preemptive multitasking
  – Inability to share the interconnect
  – Target machines may not allow multiprocessing
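For comparison, this naive approach is plain OS-level oversubscription, e.g. launching something like mpirun --oversubscribe -np 1024 ./app on an 8-core node (flag from Open MPI; other MPI implementations differ). The operating system then time-slices over a thousand heavyweight processes, which runs into exactly the drawbacks listed above.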