Paradyn/Condor Week 2004 MATE: Monitoring, Analysis and Tuning Environment Anna Morajko, Tomàs Margalef and Emilio Luque Universitat Autònoma de Barcelona.

Presentation transcript:

MATE: Monitoring, Analysis and Tuning Environment
Anna Morajko, Tomàs Margalef and Emilio Luque
Universitat Autònoma de Barcelona
Paradyn/Condor Week 2004, April 2004

Content
1. Introduction
2. Dynamic Performance Tuning
3. MATE
4. Tuning Techniques
5. Conclusions and future work

Introduction: Application performance
- Demand for high-performance computation
- The main goal of parallel/distributed applications: solve a given problem as fast as possible
- Performance is one of the most important issues
- Developers must optimize application performance to provide efficient and useful applications

Introduction: Application performance optimization
- Steps: monitoring, analysis, tuning
[Figure: the optimization cycle linking application development, monitored execution, monitoring, performance analysis, and tuning. Edge labels: Source, Instrumentation, Performance data, Measurements, Bottlenecks, Source code relation, Solutions, Modifications, Changes.]

Introduction: Application performance optimization
- Finding bottlenecks and determining their solutions is difficult for parallel/distributed applications
  - Many tasks that cooperate with each other
- A high degree of expertise is required
- Application behavior may change with input data or environment
- A difficult task, especially for non-expert users

Introduction: Our goals
- Investigate whether the performance of parallel/distributed applications can be optimized dynamically, without user intervention
- Investigate the applicability of dynamic tuning
- Create a tool that is able to dynamically optimize applications:
  - automatically improve application performance
  - improve the application execution during run time
  - tune without recompiling and rerunning
  - adapt the application to existing conditions
- Practically evaluate the profitability of dynamic tuning

Introduction: Dynamic automatic tuning
[Figure: dynamic automatic tuning loop. Boxes: User, Application development, Application, Execution, and a Tool comprising Monitoring, Performance analysis, and Tuning. Edge labels: Source, Instrumentation, Events, Performance data, Problem / Solution, Modifications.]

Dynamic Performance Tuning: Requirements
- No user intervention
- No source recompilation
- Performance analysis on the fly
  - Global analysis
  - Decisions taken in a short time
  - No complex analysis or modifications
- Run time monitoring
- Run time tuning
  - Modifications performed carefully
- Parallel/distributed application control
- Low intrusion

Dynamic Performance Tuning: Key question
- What can be tuned in an application?
- Application knowledge
- Limited information about the application
- Tuning layers
- Approaches to tuning

Dynamic Performance Tuning: Tuning layers
- Application-specific code
- Standard and custom libraries (API + code)
- Operating system libraries (API + code)
[Figure: layer stack: Hardware, Operating System kernel, OS API, Libraries code, API, Application code.]

Dynamic Performance Tuning: Tuning layers in detail
- Application
  - Application code changes
  - Different bottlenecks that depend on the application implementation
- Libraries
  - Library code changes
  - API usage
  - Standard C/C++ library -> memory management, dynamic containers
  - Custom PVM, MPI -> communication
- OS
  - Kernel code changes
  - API usage
  - Adjustment of options (e.g. TCP/IP socket), I/O request grouping
- More bottlenecks common to a wider group of applications
[Figure: layer stack: Hardware, Operating System kernel, OS API, Libraries code, API, Application code.]

Dynamic Performance Tuning: Approaches to tuning
- Cooperative
  - Application must be prepared for tuning
  - Application-specific knowledge is provided
- Automatic (black-box)
  - Tuning of any application
  - No application-specific knowledge is required
  - Knowledge about the bottleneck is required
  - No changes are introduced into the application source code
[Figure: layer stack (Hardware, Operating System kernel, OS API, Libraries code, API, Application code) with two annotations: "More automatic, more generic information available" and "More cooperative, more application-specific".]

Dynamic Performance Tuning: Knowledge representation
- Measure points
  - Where the instrumentation must be inserted to provide measurements
- Performance model
  - Determines the minimal execution time of the entire application
- Tuning points/actions/synchronization
  - What can be changed in the application, and when
  - point: element that may be changed
  - action: what to invoke on a point
  - synchronization: when a tuning action can be invoked to ensure application correctness
- Formulas and conditions for optimal behavior: measurements -> optimal values
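
The three kinds of knowledge listed on this slide can be pictured as plain data plus one function. The following is only a minimal sketch under stated assumptions: every type and function name (MeasurePoint, TuningSpec, TuningKnowledge, makeBufferSizeKnowledge, optimalValue) is hypothetical and does not come from MATE, and the toy model (the best buffer size is the largest message observed) is invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// All names below are hypothetical; they are not part of MATE.

// Measure point: where instrumentation is inserted to obtain a measurement.
struct MeasurePoint {
    std::string function;   // function to instrument, e.g. "pvm_send"
    bool atEntry;           // true: function entry, false: function exit
    std::string attribute;  // what is recorded at that point
};

// Tuning point, action, and synchronization.
struct TuningSpec {
    std::string point;      // element that may be changed
    std::string action;     // what to invoke on the point
    std::string syncPoint;  // when the action may be applied safely
};

// Performance model: maps measurements to an optimal value.
using PerformanceModel = std::function<double(const std::vector<double>&)>;

struct TuningKnowledge {
    std::vector<MeasurePoint> measurePoints;
    PerformanceModel model;
    TuningSpec tuning;
};

// Toy knowledge (an assumption, not a MATE technique): the optimal buffer
// size is the largest message size observed so far.
TuningKnowledge makeBufferSizeKnowledge() {
    TuningKnowledge k;
    k.measurePoints.push_back(MeasurePoint{"pvm_send", true, "message size"});
    k.model = [](const std::vector<double>& sizes) {
        return sizes.empty() ? 0.0 : *std::max_element(sizes.begin(), sizes.end());
    };
    k.tuning = TuningSpec{"bufferSize", "set variable", "pvm_send entry"};
    return k;
}

// Evaluates the model on a set of measurements.
double optimalValue(const TuningKnowledge& k, const std::vector<double>& measurements) {
    assert(static_cast<bool>(k.model));  // a model must have been provided
    return k.model(measurements);
}
```

Keeping the model as a function separate from the measure points and the tuning spec mirrors the slide's split between what is measured, how the optimum is computed, and what is changed.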

Dynamic Performance Tuning: Application knowledge
- Measure points
- Performance model
- Tuning point, action, sync
[Figure: layer stack (Hardware, Operating System kernel, OS API, Libraries code, API, Application code) annotated with which knowledge is provided by the user and which is provided automatically by a tuning system.]

Dynamic Performance Tuning: Manipulation of a running application
- monitoring: collect information about the behavior of a running application
- tuning: insert tuning code into a running application that improves its performance
- Dynamic instrumentation: DynInst

Dynamic Performance Tuning: Dynamic modifications of a running application with DynInst
- Function replacement
- Function invocation
- One-time function invocation
- Function call elimination
- Function parameter changes
- Variable changes

MATE
- MATE: Monitoring, Analysis and Tuning Environment
- Prototype implementation in C++
- For PVM-based applications
- Sun Solaris 2.x / SPARC

MATE: Components
- Application Controller (AC)
- Dynamic Monitoring Library (DMLib)
- Analyzer
[Figure: MATE components distributed over Machines 1 to 3: pvmd daemons, ACs, Tasks 1 to 3 with DMLib, and the Analyzer. Edge labels: instr., events, modif.]

MATE: Application Controller
Services
- Distributed application control
  - Startup/exit of tasks (Tasker)
  - Startup/exit of PVM daemons, slave ACs (Hoster)
  - Clock synchronization
- Application model management (Task Manager)
- Performance monitoring (Monitors)
  - Manage monitoring instrumentation
  - Provide monitoring API for Analyzer
- Performance tuning (Tuners)
  - Manage tuning instrumentation
  - Provide tuning API for Analyzer

MATE: Application Controller
Monitors
[Figure: the Analyzer (Machine 2) sends add event / remove event requests to the AC Monitor (Machine 1), which instruments Tasks 1 and 2 via DynInst; DMLib is loaded into the tasks.]
- Instrumentation management via DynInst
  - Dynamically load DMLib
  - Generate monitoring snippets that call appropriate library functions
  - Insert/remove snippets in/from requested points
- API
  - AddEventTrace(tid, eventId, funcName, instrPlace, attrs)
  - RemoveEventTrace(tid, eventId)
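
To make the AddEventTrace / RemoveEventTrace pair concrete, here is a minimal bookkeeping sketch. It only models which traces are active per task; it performs no DynInst instrumentation. The class MonitorRegistry, the InstrPlace enum, and all parameter types are assumptions, since the slides give only the parameter names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types; the slides do not specify the real ones.
enum class InstrPlace { Entry, Exit };

struct EventTrace {
    std::string funcName;            // function whose calls are traced
    InstrPlace place;                // instrument at entry or at exit
    std::vector<std::string> attrs;  // attributes to record with each event
};

// Tracks which event traces are active per (task id, event id).
// A real monitor would insert or remove a snippet at this point.
class MonitorRegistry {
public:
    // Returns false if this (tid, eventId) pair is already traced.
    bool addEventTrace(int tid, int eventId, const std::string& funcName,
                       InstrPlace place, std::vector<std::string> attrs) {
        assert(!funcName.empty());
        return traces_.emplace(std::make_pair(tid, eventId),
                               EventTrace{funcName, place, std::move(attrs)}).second;
    }

    // Returns false if no such trace was active.
    bool removeEventTrace(int tid, int eventId) {
        return traces_.erase(std::make_pair(tid, eventId)) == 1;
    }

    std::size_t activeTraces() const { return traces_.size(); }

private:
    std::map<std::pair<int, int>, EventTrace> traces_;
};
```

Keying on the (tid, eventId) pair matches the two arguments that RemoveEventTrace takes, so a trace can be removed without repeating the function name or attributes.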

MATE: Application Controller
Tuners
[Figure: the Analyzer (Machine 2) sends apply-tuning requests to the AC Tuner (Machine 1), which tunes Tasks 1 and 2 via DynInst.]
- Tuning via DynInst
  - Generate tuning snippet according to the request
  - Insert tuning snippet
- API
  - LoadLibrary(tid, path)
  - SetVariableValue(tid, params, brkpt)
  - ReplaceFunction(…)
  - InsertFunctionCall(…)
  - OneTimeFunctionCall(…)
  - RemoveFunctionCall(…)
  - FunctionParamChange(…)

MATE: Dynamic Monitoring Library
Services
- Register event
  - What: event type (id, place)
  - When: global timestamp
  - Where: task identifier
  - Requested attributes: e.g. function call parameters, return value
- Deliver event to the Analyzer
API
- DMLib_InitLogger(tid, analyzerHost, port, clockDiff)
- DMLib_OpenEvent(id, nAttrs)
- DMLib_AddIntAttr(value)
- DMLib_AddFloatAttr(value)
- DMLib_AddCharAttr(value)
- DMLib_AddStringAttr(value)
- DMLib_CloseEvent()
- DMLib_DoneLogger()
[Figure: on Machine 1, the entry of pvm_send(p1, p2) in Task 1 is instrumented with DMLib_OpenEvent(); DMLib_AddIntAttr(); DMLib_CloseEvent(); and DMLib delivers the event to the Analyzer via TCP/IP.]
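
The open / add attribute / close call sequence can be illustrated with a small stand-in. This is a sketch under assumptions, not DMLib itself: the class EventLogger and its text format are invented, events are kept in memory instead of being sent over TCP/IP, the local time is passed in explicitly so the example is deterministic, and the global timestamp is assumed to be the local time plus the clockDiff given at initialization.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for DMLib. Real events travel to the Analyzer over
// TCP/IP; here they are stored in memory as text lines.
class EventLogger {
public:
    // clockDiff: offset that converts this machine's clock to global time
    // (an assumed reading of the clockDiff parameter on the slide).
    EventLogger(int tid, std::int64_t clockDiff) : tid_(tid), clockDiff_(clockDiff) {}

    // Starts a new event record: what (id), where (tid), when (global time).
    void openEvent(int id, std::int64_t localTime) {
        assert(!open_);  // events must not be nested
        open_ = true;
        current_.str("");
        current_.clear();
        current_ << "event=" << id << " tid=" << tid_
                 << " ts=" << (localTime + clockDiff_);
    }

    void addIntAttr(int value) {
        assert(open_);
        current_ << " int=" << value;
    }

    void addStringAttr(const std::string& value) {
        assert(open_);
        current_ << " str=" << value;
    }

    // Finishes the record and "delivers" it.
    void closeEvent() {
        assert(open_);
        delivered_.push_back(current_.str());
        open_ = false;
    }

    const std::vector<std::string>& delivered() const { return delivered_; }

private:
    int tid_;
    std::int64_t clockDiff_;
    bool open_ = false;
    std::ostringstream current_;
    std::vector<std::string> delivered_;
};
```

A monitoring snippet at the entry of pvm_send would then amount to openEvent, one addIntAttr per requested parameter, and closeEvent, which is the sequence shown in the slide's figure.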

MATE: Analyzer
Services
- Automatic performance analysis on the fly
  - Request events
  - Collect incoming events
  - Find bottlenecks among events by applying the performance model
  - Find solutions that overcome the bottlenecks
  - Send tuning requests
- The Analyzer is provided with application knowledge about performance problems
- The information related to one problem is called a tuning technique
- A tuning technique describes a complete performance optimization scenario

MATE: Analyzer
Tunlets
- Each technique is implemented in MATE as a tunlet
- A tunlet contains specific code (analysis logic) related to one concrete performance problem
  - measure points: what events are needed
  - performance model: how to determine bottlenecks and solutions
  - tuning actions/points/synchronization: what to change, where, when
- A tunlet is a C/C++ library dynamically loaded into the Analyzer process
[Figure: the Analyzer hosting a Tunlet composed of measure points, a performance model, and tuning point, action, sync.]
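
One way to picture a tunlet is as an interface that declares its measure points and turns incoming events into tuning requests. The sketch below is an assumption throughout: the Tunlet interface, the Event and TuningRequest structs, the GrowBufferTunlet example, and the analyze loop are all hypothetical, and the real MATE interface is not shown in the slides. Dynamic loading of the library is also left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical event and request shapes.
struct Event { int taskId; std::string function; double value; };
struct TuningRequest { int taskId; std::string variable; double newValue; };

// Hypothetical tunlet interface: one implementation per performance problem.
class Tunlet {
public:
    virtual ~Tunlet() = default;
    // Measure points: which functions must be instrumented.
    virtual std::vector<std::string> measurePoints() const = 0;
    // Performance model plus tuning decision: appends requests to `out`.
    virtual void onEvent(const Event& e, std::vector<TuningRequest>& out) = 0;
};

// Toy tunlet (invented for illustration): grow a "bufferSize" variable
// whenever a larger message than any seen before is sent.
class GrowBufferTunlet : public Tunlet {
public:
    explicit GrowBufferTunlet(double initialSize) : current_(initialSize) {
        assert(initialSize >= 0.0);
    }
    std::vector<std::string> measurePoints() const override { return {"pvm_send"}; }
    void onEvent(const Event& e, std::vector<TuningRequest>& out) override {
        if (e.function != "pvm_send" || e.value <= current_) return;
        current_ = e.value;
        out.push_back(TuningRequest{e.taskId, "bufferSize", current_});
    }
private:
    double current_;
};

// Analyzer loop: feed every collected event to every loaded tunlet and
// gather the tuning requests that would be sent to the tuners.
std::vector<TuningRequest> analyze(const std::vector<Event>& events,
                                   const std::vector<std::unique_ptr<Tunlet>>& tunlets) {
    std::vector<TuningRequest> requests;
    for (const Event& e : events)
        for (const auto& t : tunlets)
            t->onEvent(e, requests);
    return requests;
}
```

With this shape the Analyzer needs no knowledge of any particular performance problem; adding a technique means adding another Tunlet implementation.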

MATE: Analyzer
[Figure: Analyzer internals. Components: Event Collector thread, Event Repository, Application model, DTAPI, Controller, Tunlets, AC Proxy. Inputs: events (from DMLibs) and metadata (from ACs) via TCP/IP. Outputs: tuning requests (to tuner) and instrumentation requests (to monitor) via TCP/IP.]

Tuning techniques
Catalog (set of tuning techniques)
- OS
  - Message aggregation
  - Send/receive TCP/IP buffer size
- Standard library
  - Memory allocation
- PVM library
  - Communication mode
  - Data encoding mode
  - Message fragment size
- Application
  - Workload balancing
  - Number of workers
- Slide annotations: Automatic approach, Cooperative approach
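
The send/receive TCP/IP buffer size entry in the catalog corresponds to standard POSIX socket options. The sketch below shows only the underlying system calls (setsockopt and getsockopt with SO_SNDBUF and SO_RCVBUF); how MATE injects such a change into a running task is not described in the slides. The helper names are hypothetical, and the kernel is free to round or cap the requested size.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Requests send and receive buffers of `bytes` for socket `fd`.
// Returns true if both setsockopt calls succeed.
bool setSocketBuffers(int fd, int bytes) {
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) == 0 &&
           setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) == 0;
}

// Reads back the effective send buffer size, or -1 on error.
// The value may differ from the one requested.
int sendBufferSize(int fd) {
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &value, &len) != 0) return -1;
    return value;
}

// Creates a TCP socket with tuned buffers. Returns the descriptor,
// or -1 if the socket could not be created or tuned.
int openTunedTcpSocket(int bytes) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (!setSocketBuffers(fd, bytes)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Reading the value back with getsockopt is worthwhile because the effective size is decided by the operating system, not by the caller.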

Tuning Example: Workload balancing (application layer)
- Imbalance problem:
  - Heterogeneous computing and communication powers
  - Varying amount of distributed work
- Goal:
  - Minimize the idle time by balancing the work among the processes, taking the efficiency of the machines into account
- Balancing -> faster machines process more work than slower ones
- The work cannot be balanced statically before program execution (different input data, network load, machine power and load)

Tuning Example: Workload balancing (application layer)
- Many scheduling methods -> Factoring Scheduling method
  - Work is divided into different-size tuples according to the factor
- Application must be tunable:
  - a well-known variable represents the factor
  - the factor must be checked before each iteration of the work distribution
  - the work tuples are calculated using the factoring scheduling method and according to the current factor value
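
The slides do not give the factoring formula, so the following sketch assumes one common formulation: in each batch, a fraction `factor` of the remaining work is split into equal chunks, one per worker, and the process repeats on what is left. The function names nextBatch and distributeAll are hypothetical. The factor is re-read before every batch, which is the property the slide requires of a tunable application.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One batch of factoring scheduling (assumed formulation): a fraction
// `factor` of the remaining work is divided into equal chunks, one per
// worker. Chunks are at least 1 unit so the distribution always progresses.
std::vector<long> nextBatch(long remaining, int workers, double factor) {
    assert(remaining >= 0 && workers > 0 && factor > 0.0 && factor <= 1.0);
    std::vector<long> chunks;
    if (remaining == 0) return chunks;
    long chunk = std::max(1L, static_cast<long>(std::ceil(remaining * factor / workers)));
    for (int w = 0; w < workers && remaining > 0; ++w) {
        long size = std::min(chunk, remaining);
        chunks.push_back(size);
        remaining -= size;
    }
    return chunks;
}

// Master loop: `factor` is read through a reference before every batch, so
// a change made to the variable from outside takes effect on the next batch.
std::vector<long> distributeAll(long totalWork, int workers, const double& factor) {
    std::vector<long> all;
    long remaining = totalWork;
    while (remaining > 0) {
        for (long size : nextBatch(remaining, workers, factor)) {
            all.push_back(size);
            remaining -= size;
        }
    }
    return all;
}
```

With 100 work units, 4 workers, and a factor of 0.5, the first batch is four chunks of 13 units and the second is four chunks of 6, so chunks shrink as the work runs out and late stragglers cost less idle time.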

Tuning Example: Example application
- Forest fire propagation: Xfire
- High computation cost
- Scenarios and benefits:
  1) homogeneous and dedicated: up to 2%
  2) heterogeneous and dedicated: up to 49%
  3) heterogeneous and non-dedicated: up to 48%

Conclusions
- The principal conclusion: dynamic tuning works and is applicable, effective, and useful under certain conditions
- Limits of such tuning -> incomplete application information
- Classification of layers where tuning can be performed (OS, libraries, applications)
- Approaches to tuning: automatic and cooperative
- Application knowledge representation:
  - measure points, performance model, tuning point/action/sync

Conclusions
- A working prototype environment, MATE, that automatically monitors, analyzes, and tunes running applications
- Practical experiments conducted with MATE on parallel/distributed applications demonstrate that it automatically adapts application behavior to existing conditions during run time

Future work
- Global and local analysis
  - Scalability (problems with global analysis)
  - Some problems can be treated locally
- Performance analysis
  - How tuning techniques influence other techniques
  - Approaches other than the performance model
- Metrics
  - Complementary information provided by metrics
- Provision of the application knowledge
  - Tunlet provided externally in a declarative manner
- Instrumentation evaluation
  - Prediction of monitoring and tuning instrumentation cost

Future work
- Tuning techniques
  - OS layer
    - TCP/IP options (e.g. sending without delay, Nagle's algorithm)
    - I/O operations (e.g. read/write operations, I/O buffer size)
  - Library layer
    - Investigation of problems in MPI, numerical libraries
  - Application layer
    - Automatic selection of algorithm (e.g. sorting algorithm)
- Recommendations
  - Provision of a good explanation to the user
- Towards grid

Thesis, March 2004
Thank you very much