Dynamic Tuning of Parallel Programs with DynInst
Anna Morajko, Tomàs Margalef, Emilio Luque
Universitat Autònoma de Barcelona
Paradyn/Condor Week, March 2002



UAB 2 Content
- Our goals: dynamic performance tuning
- Steering loop
  - Dynamic tracing
  - Performance analysis
  - Dynamic tuning
- DynInst
- Application development framework
- Conclusions and future work

UAB 3 Our goals
The objective is to create a system that improves the performance of parallel applications during their execution, without recompiling and rerunning them.
Our approach to performance improvement, in phases:
- Tracing
- Performance analysis
- Tuning
All done during run time.

UAB 4 Dynamic performance tuning
[Diagram: the user develops the application from source; during execution, the tool instruments the application (tracing), turns problems and events into solutions (performance analysis), and applies modifications (tuning), feeding suggestions back to application development.]

UAB 5 Dynamic performance tuning
[Diagram: three machines, each running pvmd and application tasks alongside a Tracer and a Tuner; tracers send events to a central Analyzer, which changes instrumentation in the tracers and tells the tuners to apply solutions.]

UAB 6 Dynamic performance tuning
Requirements:
- all optimization phases done during run time
- performance improved by changing the execution of an application without source code access, recompiling, relinking or rerunning
Dynamic instrumentation (DynInst) is used for:
- instrumentation and tracing
- tuning

UAB 7 Dynamic performance tuning
Analysis:
- application-dependent performance model: what to measure, how to detect bottlenecks, how to resolve them
- simple analysis: decisions must be taken in a short time
Execution modifications:
- what can be changed, and how: the complexity of changing an application without knowing its internal code
Intrusion:
- minimization of the intrusion caused by system modules, instrumentation, event generation and communication

UAB 8 Content
- Our goals: dynamic performance tuning
- Steering loop
  - Dynamic tracing
  - Performance analysis
  - Dynamic tuning
- DynInst
- Application development framework
- Conclusions and future work

UAB 9 Dynamic tracer
Goals of the dynamic tracing tool:
- control of the distributed application
- application instrumentation management
- event generation
- intrusion minimization
- portability and extensibility
Current version: implemented in C++ for PVM-based applications on Sun Solaris 2.x / SPARC

UAB 10 Dynamic tracer
Distributed application control:
- process control with the PVM Tasker service
- virtual machine control with the Hoster service
- master tracer / slave tracers
- clock synchronization (event timestamps with global order)
[Diagram: a Master Tracer on Machine 1 and Slave Tracers on Machines 2 and 3, each next to pvmd and the application tasks.]
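The slides do not show how the clock synchronization is implemented. A minimal sketch of one common approach, offset-based timestamp correction with an NTP-style round trip, is below; all names are hypothetical and this is not the tracer's actual code:

```cpp
#include <cstdint>

// Hypothetical sketch: each slave tracer measures its offset from the
// master clock with a request/reply exchange and corrects local event
// timestamps before reporting them, giving events a global order.
struct ClockSync {
    int64_t offset_us = 0;  // master_time - local_time, in microseconds

    // One measurement round: local send/receive times bracket the
    // master's reply timestamp; assume a symmetric network delay.
    void measure(int64_t local_send, int64_t master_reply, int64_t local_recv) {
        int64_t midpoint = (local_send + local_recv) / 2;
        offset_us = master_reply - midpoint;
    }

    // Convert a local event timestamp to the global (master) timeline.
    int64_t to_global(int64_t local_ts) const { return local_ts + offset_us; }
};
```

In practice several rounds would be taken and the one with the smallest round-trip time kept, since it bounds the error of the symmetric-delay assumption.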

UAB 11 Dynamic tracer
Application instrumentation management:
- instrumentation is inserted into a configurable set of selected functions; this set can be changed during execution (insert, remove)
- different snippets: event generation (function entry, exit), line number information (function call points)
- snippets call a tracing library dynamically loaded into the running task
[Diagram: on machine N, the Tracer reads a configuration and loads the library into each task.]

UAB 12 Dynamic tracer
Event generation: each event records
- when: global timestamp
- where: task
- what: type of event
- additional information (e.g. function call parameters, source code line number)
[Diagram: snippets at the entry and exit of pvm_send call LogEvent in the tracing library; events go to a trace file and to the Analyzer.]
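The when/where/what fields above can be sketched as a simple event record; this is an illustration of the slide's structure, not the tracer's actual on-the-wire format:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical event record matching the slide: when (global
// timestamp), where (task id), what (event type), plus optional extra
// information such as the source line of the call point.
enum class EventType { FunctionEntry, FunctionExit };

struct Event {
    int64_t timestamp;     // global timestamp
    int taskId;            // PVM task that produced the event
    EventType type;
    std::string function;  // instrumented function, e.g. "pvm_send"
    int line;              // source line of the call point, -1 if unknown

    std::string format() const {
        std::ostringstream os;
        os << timestamp << ' ' << taskId << ' '
           << (type == EventType::FunctionEntry ? "entry" : "exit")
           << ' ' << function << ':' << line;
        return os.str();
    }
};
```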

UAB 13 Dynamic tracer
Performance measurements: the tracer generates trace files; the measured overhead is 1.1% - 3%.
[Chart: average execution time in seconds per matrix type, with and without tracing.]

UAB 14 Dynamic tracer
Event collection:
- a centralized event collector receives the events
- global ordering
- the flow of ordered events is passed to the analyzer
- to consider: e.g. event buffering, periodic flush
[Diagram: tasks on three machines send events through the tracing library to the Event collector, which feeds the Analyzer.]

UAB 15 Performance analysis
Key questions:
- What should and should not happen during execution (what kinds of problems and performance bottlenecks)?
- What events are needed (what should be collected to help discover what really happened)?
- How to deduce what really happened (finding bottlenecks and their causes)?
- How to improve performance (finding solutions)?

UAB 16 Performance analysis
Performance model:
- a fixed model for an application; we need to determine the desired performance characteristics
- the model determines what to measure
Steps:
- analyzing the events that come from the event collector
- evaluating the model
- producing solutions: the best possible changes that will improve performance
- sending solutions to the tuner
WORK IN PROGRESS

UAB 17 Dynamic tuning
What can be changed?
- parameters: changing a parameter value; the application contains well-known variables and is prepared for the changes (tested)
- set of pre-defined snippets: choosing a strategy via function replacement, changing the order of functions (in progress)
- dynamic snippet creation: depends on the solutions generated by the analyzer; very difficult (future)
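The tested technique relies on the application cooperating: it polls a well-known variable, so a tuner that overwrites that variable from outside the process (via DynInst, as slide 27 shows) changes behavior at the next iteration. A minimal sketch of this pattern from the application's side, with hypothetical names, might be:

```cpp
#include <atomic>

// Hypothetical "well-known variable" pattern: the application reads a
// tunable parameter at every round, so an external write to the
// variable (in the real system, BPatch_variableExpr::writeValue from
// the tuner process) takes effect on the next round.
std::atomic<int> tuneWorkSize{64};  // well-known variable the tuner writes

int currentWorkSize = 64;

// Called at the top of each master/worker distribution round.
int refreshWorkSize() {
    int requested = tuneWorkSize.load();
    if (requested != currentWorkSize)
        currentWorkSize = requested;  // apply the requested change
    return currentWorkSize;
}
```

std::atomic is anachronistic for 2002 PVM code and stands in here for "a plain global the tuner can safely overwrite"; the point is only that the application is written to notice the change.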

UAB 18 Dynamic tuning
Where can changes be applied?
- all tuners receive the changes (e.g. SPMD)
- one concrete tuner tunes one concrete process (e.g. master/worker)
[Diagram: the Analyzer sends changes to Tuners attached to tasks on three machines.]

UAB 19 Steering loop
[Diagram: the full steering loop. Tracers on each machine send events from the instrumented tasks to the Event collector (Event Collection Protocol); the Analyzer processes the ordered events and, via the Performance Tuning Protocol, instructs tracers to change instrumentation and tuners to apply solutions.]

UAB 20 Steering loop
Example: master/worker application
What shouldn't happen during execution (Model):
- frequent communication
- improper size of the data sent to a worker
How to prepare the application:
- it should be adapted for possible changes of the work-size parameter; if the parameter is changed, apply that change
What events are needed (Tracing):
- insert instrumentation: pvm_send, pvm_recv, pk_xxx

UAB 21 Steering loop
- How to deduce what really happened (Analysis)
- How to improve performance (Tuning): change the work-size parameter
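The slides do not spell out the analysis rule for this example. One plausible heuristic, entirely hypothetical and not from the original, grows the work size when per-message overhead dominates and shrinks it when compute dominates:

```cpp
#include <algorithm>

// Hypothetical analyzer heuristic for the master/worker example: if the
// measured communication time per work unit is large relative to the
// compute time, send bigger work units (fewer, larger messages);
// if communication is negligible, smaller units balance load better.
int nextWorkSize(int current, double commTime, double computeTime,
                 int minSize = 1, int maxSize = 4096) {
    int proposed = current;
    if (commTime > 0.5 * computeTime)
        proposed = current * 2;   // communication-bound: batch more work
    else if (commTime < 0.1 * computeTime)
        proposed = current / 2;   // compute-bound: improve load balance
    return std::clamp(proposed, minSize, maxSize);
}
```

The timing inputs would come from the pvm_send/pvm_recv events gathered by the tracer, and the result would be written into the application's work-size parameter by the tuner.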

UAB 22 Content
- Our goals: dynamic performance tuning
- Steering loop
  - Dynamic tracing
  - Performance analysis
  - Dynamic tuning
- DynInst
- Application development framework
- Conclusions and future work

UAB 23 DynInst
Two principal reasons for using DynInst:
- instrumentation and tracing: collect information about application behavior
- tuning: insert/remove/change code in order to improve application performance

UAB 24 DynInst
Instrumentation and tracing: the tracer (mutator) loads a dynamic library into the PVM application and inserts a snippet at the entry of pvm_send.

PVM application:
foo_1 (...) { pvm_send (); }
pvm_send (params) { ... }

Tracer:
thread.loadLibrary ("libLog.so");
BPatch_function f = image.findFunction (logEvent);
BPatch_funcCallExpr log (f);
BPatch_point point (image, pvm_send, BPatch_entry);
thread.insertSnippet (log, point);

Snippet:
log () { logEvent (format, params) }

Dynamic library (libLog.so):
logEvent (format, ...) { // transfer event }

UAB 25 DynInst
Event line number: the PVM application contains several call points of pvm_send at different source lines, and each one is instrumented with logLineNumber.

for (int i = 0; i < 100; i++) j = DIM % (nunw);
pvm_pkint (x, 1, 2);
pvm_send (...);

numr = DIM / (nunw);
extra = DIM % (nunw);
pvm_pkint (x, 1, 2);
pvm_send (...);

i = DIM * 2 / (nunw);
DIM = j;
pvm_pkint (x, 3, 1);
pvm_send (...);

UAB 26 DynInst
Event line number:

Tracer:
BPatch_function f = image.findFunction (logLineNumber);
BPatch_funcCallExpr logLine (f);
CallPointList cp ("pvm_send");
for each cp:
  addr = cp.getAddress ();
  thread.GetLineAndFile (addr);
  thread.insertSnippetBefore (logLineNumber, cp);

PVM application:
foo_1 (...) { pvm_send (); pvm_recv (); }

Snippet:
logLine () { logLineNumber (params) }

UAB 27 DynInst
Tuning: the analyzer requests a parameter change, and the tuner writes the variable in the running PVM application.

Analyzer:
changeParam (param, value) { send_req (tuner, param, value); }

Tuner:
recv_req (param, value) {
  BPatch_variable variable = image.FindVariable (param);
  variable.writeValue (value);
}

PVM application:
main () {
  int tuneParam;
  int param;
  if (param != tuneParam) ...
}

UAB 28 DynInst
Problems when using DynInst:
- Line numbers: we need to scan all functions to be instrumented and additionally instrument them before execution, inserting a snippet that logs the line number before each function call; it would be nice to be able to access the callee address (call stack?)
- Limited number of parameters that can be passed to a library function call (max. 5 on Sun Solaris)
- Problems scanning function call points (e.g. a C++ mutatee with operator<<, the function sleep): BPatch_function::findPoint (subroutine), BPatch_point::getCalledFunction ()

UAB 29 Content
- Our goals: dynamic performance tuning
- Steering loop
  - Dynamic tracing
  - Performance analysis
  - Dynamic tuning
- DynInst
- Application development framework
- Conclusions and future work

UAB 30 Application framework
Framework for parallel application development:
- based on patterns, e.g. master/worker, pipeline, divide and conquer
- support for the dynamic tuning system: measure points, performance model, tuning points
- support for the parallel application developer: hides implementation details; the developer provides only the specific functionality
[Diagram: the application is built on the Framework API and a parallel pattern, exposing measure points, tuning points and a performance model.]

UAB 31 Dynamic performance tuning
[Diagram: the steering loop of slide 4 (tracing, performance analysis, tuning, suggestions), extended so that application development goes through the Application Framework API, which provides the measure points, tuning points and performance model.]

UAB 32 Content
- Our goals: dynamic performance tuning
- Steering loop
  - Dynamic tracing
  - Performance analysis
  - Dynamic tuning
- DynInst
- Application development framework
- Conclusions and future work

UAB 33 Conclusions
- Dynamic tuning: design of the dynamic tuning tool, its characteristics, the steering loop
- Design details: implementation of the dynamic tracer, bases of the performance analysis, tuning methods
- How we use DynInst
- Parallel application development framework

UAB 34 Future work
- Tracer: event buffering, changing instrumentation on demand, performance measurements
- Analyzer: performance analysis
- Tuner: other solutions that can be applied when changing the application
- Find and analyze more examples of what to measure, how to analyze and what to change

Thank you very much