

Multilingual Debugging Support for Data-driven Parallel Languages
Parthasarathy Ramachandran, Laxmikant Kale
Parallel Programming Laboratory, Dept. of Computer Science
University of Illinois at Urbana-Champaign

Data-driven execution: motivation
- Latency
  - OpenMP: remote access
  - MPI: a blocking receive
- Response delay: the remote processor may be busy
- Latency tolerance
  - Basic idea: have multiple items on your agenda
  - Overlap "wait for data" with useful computation

Data-driven systems
- Data-flow machines
- Functional and logic languages (1983-)
  - Rediflow (Keller/Lindstrom), ...
  - Prolog: Conery, Kale, ...
- Actors
- Charm/Chare Kernel (1987): first implementation on parallel machines
- Multipol, Cid, Chant, Tulip, ...

What is data-driven execution?
- Multiple entities per processor: objects, threads, handlers, ...
- Scheduled based on availability of data

Data-driven execution
(diagram: a scheduler dispatching work from a message queue)

Data-driven execution
- Existence of a scheduler is a prerequisite for DDE
- Implicit control transfer: e.g., when a thread blocks on a receive, control transfers to a module of the system's choice
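The scheduler at the heart of DDE can be sketched as a loop that pops messages off a queue and dispatches each to a registered handler. This is a minimal illustration, not the actual Converse API; the names `Message`, `Scheduler`, and `registerHandler` are all hypothetical:

```cpp
#include <functional>
#include <queue>
#include <string>
#include <vector>

// A message carries a handler index in its header, so the scheduler can
// dispatch it without knowing which language runtime it belongs to.
struct Message {
    int handlerIndex;     // which registered handler processes this entry
    std::string payload;  // opaque data interpreted only by that handler
};

class Scheduler {
public:
    // Each language runtime registers its handlers up front; the returned
    // index is what goes into message headers.
    int registerHandler(std::function<void(const std::string&)> h) {
        handlers.push_back(std::move(h));
        return static_cast<int>(handlers.size()) - 1;
    }
    void enqueue(Message m) { q.push(std::move(m)); }
    // Run until the queue drains: control transfers implicitly to whichever
    // entity's data is available next.
    void run() {
        while (!q.empty()) {
            Message m = std::move(q.front());
            q.pop();
            handlers.at(m.handlerIndex)(m.payload);
        }
    }
private:
    std::vector<std::function<void(const std::string&)>> handlers;
    std::queue<Message> q;
};
```

Because all control flow funnels through this one loop, both the debugging difficulties (unpredictable jumps) and the debugging opportunities (freeze, inspect the queue, step one event) described in the following slides live here.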

Data-driven execution: advantages
- Adaptive overlap of useful computation with idle time
- Multiple modules: efficiency and modularity
  - Overlap idle time in one module with useful computation from another
  - MPI/PVM modules can't do this (as well) without sacrificing modularity

The debugging problem
- Implicit control transfer leads to debugging difficulties
  - Unpredictable "jumps"
  - Need to observe the scheduler's queue
  - Need to use scheduling quanta, i.e., step to the next event
- Here: a general-purpose framework for debugging data-driven programs

Converse
- Many data-driven languages/paradigms
  - Which one is "right"?
  - Acceptance barriers
  - Coexistence with monsters (MPI/OpenMP)
- Our solution: interoperability
  - Not the same as C++/Fortran interoperability
  - Ability to integrate modules written in different paradigms in a single program
  - E.g., MPI, Charm++, Chant

Converse
- Supports multi-paradigm programming
- Provides portability
- Makes it easy to implement the RTS for a new paradigm
- Several languages/libraries: Charm++, Java, threaded MPI, PVM, md-perl, pc++, Nexus, Path, Tempo, Cid, CC++, ...
- More info:

Example application:

Debugging multi-paradigm, multilingual programs
- New problems:
  - Scheduler entities belong to different languages
  - Need the ability to set language-specific breakpoints
- Also:
  - Integration with "normal" debugging methods: source-level debugging, e.g., gdb
  - Need to freeze execution properly

Inadequacy of current methods
- Can we use gdb, for example?
  - The scheduler code is inaccessible
  - Cannot focus on specific breakpoints if we set a breakpoint in the scheduler itself
  - Scheduler entities belong to different languages: how do we view them uniformly?
- Need support for language implementors

Converse debugging framework
- Freeze and thaw:
  - External intervention
  - Freeze schedulers (on one or all processors)
  - Ensure bookkeeping tasks continue
  - A frozen scheduler processes debugging commands only
- Scheduler-level, language-specific breakpoints
- Language-specific viewing
  - of scheduler entries (e.g., method invocations, messages)
  - of resident entities (e.g., objects, threads)
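The freeze/thaw idea can be sketched as a scheduler whose dispatch loop, while frozen, defers every ordinary message but still executes debugger traffic, so bookkeeping and debugging commands keep flowing. A minimal sketch under assumed names (`Msg`, `DebuggableScheduler`, `isDebugCommand` are illustrative, not the real Converse interface):

```cpp
#include <deque>
#include <functional>

struct Msg {
    bool isDebugCommand;        // debugger traffic is exempt from freeze
    std::function<void()> run;  // the work performed when scheduled
};

class DebuggableScheduler {
public:
    void enqueue(Msg m) { q.push_back(std::move(m)); }
    void freeze() { frozen = true; }
    void thaw()   { frozen = false; }
    // Scan the queue once and run the first runnable message.
    // While frozen, only debugging commands are runnable; ordinary
    // messages are rotated back to the end of the queue untouched.
    bool step() {
        for (std::size_t i = 0, n = q.size(); i < n; ++i) {
            Msg m = std::move(q.front());
            q.pop_front();
            if (frozen && !m.isDebugCommand) {
                q.push_back(std::move(m));
                continue;
            }
            m.run();
            return true;
        }
        return false;  // nothing runnable in this pass
    }
private:
    bool frozen = false;
    std::deque<Msg> q;
};
```

The key property: freezing never stops the loop itself, it only narrows what the loop is willing to dispatch, which is why the frozen program can still answer the debugger client.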

Architecture
- Uses Converse's client-server interface
  - Can inject messages into running programs
  - Can send responses via TCP sockets
  - Used for other purposes as well, e.g., web-based job submission/monitoring

Architecture
- Debugger client: Java-based; GUI and communication
- Debugger runtime library: a module in the Converse runtime; provides hooks for languages/libraries
- GDB interface: integrates source-level debugging; can be customized for other debuggers

Viewing multilingual entities
- The scheduler's view of each entry: a header containing a handler index, followed by language-specific data that is gibberish to the scheduler itself
- Handlers are registered by languages
- Language runtimes register callback functions:
  - showHeader: returns a short string
  - showContent: returns a long string with newlines
- On "freeze", the scheduler sends the list of headers
- On demand, it sends the result of showContent
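The callback scheme above could look like the following sketch: each language registers a viewer pair keyed by handler index, the debugger collects all the short headers on freeze, and fetches the long content string only on demand. The callback names follow the slide (`showHeader`/`showContent`), but the registry itself is hypothetical:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Viewing callbacks a language runtime supplies for its scheduler entries.
struct EntryViewer {
    std::function<std::string(const std::string&)> showHeader;   // one line
    std::function<std::string(const std::string&)> showContent;  // full dump
};

class ViewRegistry {
public:
    void registerLanguage(int handlerIndex, EntryViewer v) {
        viewers[handlerIndex] = std::move(v);
    }
    // Called when the scheduler freezes: one short header per queued entry.
    // Each entry is (handlerIndex, opaque payload).
    std::vector<std::string>
    headers(const std::vector<std::pair<int, std::string>>& queue) const {
        std::vector<std::string> out;
        for (const auto& entry : queue)
            out.push_back(viewers.at(entry.first).showHeader(entry.second));
        return out;
    }
    // Called on demand when the user expands a single entry.
    std::string content(int handlerIndex, const std::string& payload) const {
        return viewers.at(handlerIndex).showContent(payload);
    }
private:
    std::map<int, EntryViewer> viewers;
};
```

Sending only headers eagerly keeps the freeze response small; the expensive `showContent` string is produced per entry only when requested.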

Non-scheduler entities
- Objects and other data structures are not in the scheduler's queue, but are still important to view
- Same problem: multiple languages
- Solution:
  - Each language registers each of its objects
  - For C++ objects, use virtual inheritance
  - For others, registered callbacks
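For the C++ case, registration via inheritance could be sketched as below: resident objects derive from a common viewing interface, so the debugger runtime can enumerate them uniformly. The base-class name `Viewable` and the `ObjectRegistry` are illustrative, not the real library:

```cpp
#include <string>
#include <vector>

// Common viewing interface inherited by C++ objects that want to be
// inspectable; mirrors the showHeader/showContent callback pair used
// for scheduler entries.
class Viewable {
public:
    virtual ~Viewable() = default;
    virtual std::string showHeader() const = 0;
    virtual std::string showContent() const = 0;
};

// Each language registers its resident entities here.
class ObjectRegistry {
public:
    void add(const Viewable* o) { objects.push_back(o); }
    std::vector<std::string> headers() const {
        std::vector<std::string> out;
        for (const Viewable* o : objects) out.push_back(o->showHeader());
        return out;
    }
private:
    std::vector<const Viewable*> objects;
};

// Example of an application object opting in by inheriting the interface.
class Particle : public Viewable {
public:
    explicit Particle(double x) : x(x) {}
    std::string showHeader()  const override { return "Particle"; }
    std::string showContent() const override { return "x=" + std::to_string(x); }
private:
    double x;
};
```

Languages without virtual dispatch would register a plain callback pair instead, exactly as for scheduler entries.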

Breakpoints
- Breakpoints are language-specific
  - Charm++: the list of remotely invokable methods
  - Multipol: handler function names; Chant: code
- Each language supplies a list of breakpoints (names and indices)
- The RTS maintains the list of set breakpoints
  - Note: a handler may represent multiple breakpoints
- Given a scheduler entry, the language RTS identifies the breakpoint number
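The per-language breakpoint bookkeeping could be sketched as a table of named breakpoints with an enabled flag, consulted by the scheduler before dispatching each entry. All names here (`BreakpointTable`, `shouldStop`) are illustrative:

```cpp
#include <string>
#include <vector>

// One table per language runtime: the language supplies the names
// (e.g., Charm++ remotely invokable methods); the RTS tracks which
// breakpoints the user has set.
class BreakpointTable {
public:
    // Register a breakpoint; the returned index is what the language RTS
    // derives from a scheduler entry.
    int add(const std::string& name) {
        names.push_back(name);
        enabled.push_back(false);
        return static_cast<int>(names.size()) - 1;
    }
    void set(int idx)   { enabled.at(idx) = true; }
    void clear(int idx) { enabled.at(idx) = false; }
    // Asked by the scheduler just before executing an entry.
    bool shouldStop(int idx) const { return enabled.at(idx); }
    const std::string& name(int idx) const { return names.at(idx); }
private:
    std::vector<std::string> names;
    std::vector<bool> enabled;
};
```

When `shouldStop` returns true, the scheduler would freeze itself and notify the debugger client, giving event-level stepping without touching the language's generated code.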

Optimizations
Data transfer between the runtime and the client:
- Chunking: send only a viewable screenful of information at a time
- Caching: use the idle time while the user inspects data to collect and prepare further information
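The chunking idea amounts to paging: rather than shipping the entire header list to the client, the runtime sends one screenful per request. A minimal illustrative helper (not the actual protocol):

```cpp
#include <string>
#include <vector>

// Return the `page`-th screenful of headers, `pageSize` entries per page.
// Pages past the end come back empty.
std::vector<std::string> chunk(const std::vector<std::string>& headers,
                               std::size_t page, std::size_t pageSize) {
    std::vector<std::string> out;
    std::size_t begin = page * pageSize;
    for (std::size_t i = begin; i < headers.size() && i < begin + pageSize; ++i)
        out.push_back(headers[i]);
    return out;
}
```

Caching is the complementary trick: while the user reads one page, the runtime can precompute the `showContent` strings for the entries on that page, so expanding an entry appears instantaneous.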

Summary
- The debugger is in operational use on workstation clusters and parallel machines (Origin 2000, T3E)
- It complements source-level debuggers
- It provides useful functionality for all data-driven languages/libraries
- If you use Converse for your RTS, you get this functionality for free...