Published by Mercy Bell. Modified over 9 years ago.
1
Dynamic Tuning of Master/Worker Applications
Anna Morajko, Paola Caymes Scutari, Tomàs Margalef, Eduardo Cesar, Joan Sorribes and Emilio Luque
Universitat Autònoma de Barcelona (UAB)
Paradyn/Condor Week 2005, March 2005
2
Outline
- Introduction
- MATE
- Number of workers
- Data distribution
- Conclusions
4
Introduction
Application performance
- The main goal of parallel/distributed applications is to solve a given problem as fast as possible
- Performance is one of the most important issues
- Developers must optimize application performance to provide efficient and useful applications
5
Introduction (II)
- Finding bottlenecks and determining their solutions is difficult for parallel/distributed applications
  - Many tasks cooperate with each other
  - Application behavior may change depending on input data or environment
- The task is especially difficult for non-expert users
7
MATE
Monitoring, Analysis and Tuning Environment
- Dynamic automatic tuning of parallel/distributed applications
[Diagram labels: User, Application development, Source, Application, Execution, Instrumentation, Events, Monitoring, Performance data, Performance analysis, Problem / Solution, Tuning, Modifications, Tool, DynInst]
8
MATE (II)
[Diagram labels: Machine 1, Machine 2, Machine 3; pvmd; Analyzer; AC; DMLib; Task 1, Task 2, Task 3; arrows labeled instr., events, modif.]
Components:
- Application Controller (AC)
- Dynamic Monitoring Library (DMLib)
- Analyzer
9
MATE (II)
Analyzer
- Carries out the application performance analysis
- Detects problems “on the fly” and requests changes
10
MATE (II)
Application Controller (AC)
- Controls the execution of the application
- Has a Monitor module to manage instrumentation via DynInst and gather execution information
- Has a Tuner module to perform tuning via DynInst
11
MATE (II)
Dynamic Monitoring Library (DMLib)
- Facilitates the instrumentation and data collection
- Responsible for registration of events
12
MATE (III)
Automatic performance analysis on the fly
- Find bottlenecks among events by applying a performance model
- Find solutions that overcome the bottlenecks
The Analyzer is provided with application knowledge about performance problems
- Information related to one problem is called a tuning technique
- A tuning technique describes a complete performance optimization scenario
13
MATE (IV)
Each tuning technique is implemented in MATE as a “tunlet”
A tunlet is a C/C++ library dynamically loaded into the Analyzer process. It defines:
- measure points: what events are needed
- performance model: how to determine bottlenecks and solutions
- tuning actions/points/synchronization: what to change, where, when
[Diagram: Analyzer containing a Tunlet made up of measure points, a performance model, and tuning point, action, sync]
14
MATE (V)
[Diagram components: Event Collector thread, Event Repository, Application model, DTAPI, Controller, Tunlets, AC Proxy]
- Events (from DMLibs) via TCP/IP
- MetaData (from ACs) via TCP/IP
- Instrumentation requests (to monitor) via TCP/IP
- Tuning requests (to tuner) via TCP/IP
16
Number of Workers
Master/Worker paradigm
- Easy-to-understand concept, but with some bottlenecks
- Example: inadequate number of workers
  - fewer workers: master idle
  - more workers: more communication
[Diagram: Master and Worker]
17
Number of Workers (II)
[Figure: execution trace of a homogeneous Master/Worker application (homogeneous in message size and worker execution time), with Master and Workers timelines]
[Formula residue: if tl > ... then ... + ... else ...]
Where:
- tl = latency
- λ = inverse bandwidth
- v_i = size of the tasks sent to worker i, in bytes
- n = current number of workers in the application
18
Number of Workers (II)
[Figure: execution trace of a homogeneous Master/Worker application (homogeneous in message size and worker execution time), with the segment tc_i marked]
Where:
- tc_i = time that worker i spends processing a task
19
Number of Workers (II)
[Figure: execution trace of a homogeneous Master/Worker application (homogeneous in message size and worker execution time), with the segment tl + λ*v_m marked]
Where:
- tl = latency
- λ = inverse bandwidth
- v_m = size of the results sent back to the master
20
Number of Workers (III)
21
Number of Workers (IV)
22
Number of Workers: Tunlet
Measure points:
- The amount of data sent to the workers and received by the master
- The total computational time of the workers
- The network overhead and bandwidth
[Diagram: timeline between Machine A (master) and Machine B (worker), with events at send (entry), send (exit), receive (entry) and receive (exit) on both machines]
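The network parameters listed among the measure points could be estimated from the send/receive timestamps. The following is a minimal sketch, assuming each message costs tl + λ*v (the cost form shown on the execution-trace slides) and that (message size, elapsed time) pairs have already been collected. The function name `fit_network_params` and the sample values are hypothetical and not part of MATE.

```python
def fit_network_params(samples):
    """Least-squares fit of elapsed = tl + lam * size.

    samples: list of (message_size_bytes, elapsed_seconds) pairs.
    Returns (tl, lam): estimated latency and inverse bandwidth.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_size = sum(size for size, _ in samples) / n
    mean_time = sum(elapsed for _, elapsed in samples) / n
    var = sum((size - mean_size) ** 2 for size, _ in samples)
    if var == 0:
        raise ValueError("need at least two distinct message sizes")
    cov = sum((size - mean_size) * (elapsed - mean_time)
              for size, elapsed in samples)
    lam = cov / var                    # slope: seconds per byte
    tl = mean_time - lam * mean_size   # intercept: latency in seconds
    return tl, lam


# Hypothetical measurements generated from tl = 0.001 s, lam = 1e-7 s/byte.
samples = [(1000, 0.0011), (2000, 0.0012), (4000, 0.0014)]
tl, lam = fit_network_params(samples)
```

A straight-line fit is used because the assumed cost is linear in the message size; real measurements would be noisy, so more samples over a wider range of sizes give a steadier estimate.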
23
Number of Workers: Tunlet (II)
Performance function:
Calculation of the optimal number of workers:
Tuning actions:
- Change the value of “numworkers” to add or remove as many workers as needed
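The slide does not spell out the performance function. As an illustration only, the sketch below uses a simplified model that is an assumption and not the authors' model: the master sends one task message per worker sequentially (each costing tl + λ*v_i), the workers compute in parallel on an even split of the work, and the last result comes back at cost tl + λ*v_m. All function and parameter names are hypothetical.

```python
def iteration_time(n, tl, lam, total_bytes, total_compute, result_bytes):
    """Estimated time of one iteration with n workers (assumed model)."""
    send = n * tl + lam * total_bytes   # n sequential task messages
    compute = total_compute / n         # work split evenly among workers
    reply = tl + lam * result_bytes     # last result returned to the master
    return send + compute + reply


def optimal_workers(max_workers, **model):
    """Brute-force search over 1..max_workers for the lowest estimate."""
    return min(range(1, max_workers + 1),
               key=lambda n: iteration_time(n, **model))


# Hypothetical values: latency 1 time unit, 100 time units of total compute.
n_opt = optimal_workers(
    64, tl=1.0, lam=0.0, total_bytes=0, total_compute=100.0, result_bytes=0
)
```

Under this assumed model the estimate is minimized near sqrt(total_compute / tl), which reflects the trade-off stated earlier: too few workers leave compute time on the table, and too many add communication cost.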
24
Experimentation
Example application
- Forest Fire Propagation simulator: Xfire
- Compute-intensive Master/Worker application
- Simulates the propagation of the fireline
- Calculates the next position of the fireline from the current fireline position, weather factors, vegetation, etc.
Platform
- Cluster of Pentium 4, 1.8 GHz, SuSE Linux 8.0, connected by a 100 Mb/s network
25
Experimentation (II)
Load in the system
- We designed different external load patterns
- They simulate the system’s time-sharing
- They allow us to reproduce experiments
Case studies
- Xfire executed with different fixed numbers of workers without any tuning, introducing external loads
- Xfire executed under MATE, introducing external loads
26
Experimentation (III)
[Chart: execution time in seconds (axis 0 to 1400) per case study: Xfire with a fixed number of workers (1, 2, 4, ..., 26) and Xfire under MATE (Xf+MATE)]
Note that:
- The execution time of Xfire under MATE is close to the best execution times obtained
- With MATE, resources are devoted to the application only when they are really needed
- The MATE run starts with 1 worker and adapts the number
27
Experimentation (IV)
- Statically, the model fits
- Dynamically, there are some problems
  - N_opt could be extremely high
  - The computation power added or removed may not be significant compared with the previous computational power
Solution
- Find a “reasonable” number of workers that defines a trade-off between resource utilization and execution time
29
Data Distribution
Imbalance problem:
- Heterogeneous computing and communication powers
- Varying amount of distributed work
[Diagram: Master and Workers timelines showing an unbalanced iteration and a balanced iteration]
30
Data Distribution (II)
Goal: minimize the idle time by balancing the work among the processes, considering the efficiency of the machines
Performance model
- Factoring scheduling method
- Work is divided into different-size tuples according to the factor

Work size (N) | Number of workers (P) | Factor (f) | Tuples
1000          | 2                     | 1          | 500, 500
1000          | 2                     | 0.5        | 250, 250, 125, 125, 63, 63, 32, 32, 16, 16, 8, 8, 4, 4, 2, 2, 1, 1
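The way a factor turns N work units into tuples can be sketched as follows. This is a minimal illustration and not MATE's implementation: each batch hands out a fraction f of the remaining work, split evenly among the P workers. The name `factoring_chunks` is hypothetical. Sizes are recomputed from the work actually remaining so that they sum to N, which is why the f = 0.5 case yields 31, 31 where the table shows the rounded 32, 32.

```python
import math


def factoring_chunks(total_work, num_workers, factor):
    """Split total_work units into batches of num_workers equal chunks.

    Each batch distributes `factor` of the work still remaining.
    """
    if total_work < 0 or num_workers < 1 or not 0 < factor <= 1:
        raise ValueError("invalid arguments")
    chunks = []
    remaining = total_work
    while remaining > 0:
        # Chunk size for this batch, at least one unit so the loop ends.
        size = max(1, math.ceil(factor * remaining / num_workers))
        for _ in range(num_workers):
            take = min(size, remaining)
            if take == 0:
                break
            chunks.append(take)
            remaining -= take
    return chunks


one_batch = factoring_chunks(1000, 2, 1)
halving = factoring_chunks(1000, 2, 0.5)
```

With f = 1 all the work goes out in a single batch (few messages, no chance to rebalance); smaller factors keep smaller chunks in reserve for whichever worker finishes first, at the price of more messages.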
31
Data Distribution: Tunlet
Measure points:
- The work unit processing time
- The latency and bandwidth
Performance function:
- Calculation of the factor: the Analyzer simulates the execution considering different factors and then decides the best factor
- Currently we are working on an analytical model to determine the factor
Tuning actions:
- Change the value of “TheFactorF”
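The idea of simulating the execution for several candidate factors and keeping the best one can be sketched as below. The simulation model is an assumption made for illustration, not the Analyzer's actual one: each chunk goes to the worker that becomes idle first and costs a fixed latency plus the chunk size times that worker's time per work unit. All names are hypothetical.

```python
import heapq
import math


def factoring_chunks(total_work, num_workers, factor):
    """Batches of num_workers chunks, each batch taking `factor` of the rest."""
    chunks, remaining = [], total_work
    while remaining > 0:
        size = max(1, math.ceil(factor * remaining / num_workers))
        for _ in range(num_workers):
            take = min(size, remaining)
            if take == 0:
                break
            chunks.append(take)
            remaining -= take
    return chunks


def simulated_makespan(total_work, unit_times, latency, factor):
    """Time at which the last worker finishes under the assumed model."""
    idle_at = [(0.0, w) for w in range(len(unit_times))]
    heapq.heapify(idle_at)
    finish = 0.0
    for chunk in factoring_chunks(total_work, len(unit_times), factor):
        start, w = heapq.heappop(idle_at)        # earliest idle worker
        end = start + latency + chunk * unit_times[w]
        finish = max(finish, end)
        heapq.heappush(idle_at, (end, w))
    return finish


def best_factor(total_work, unit_times, latency, candidates):
    """Candidate factor with the smallest simulated makespan."""
    return min(candidates,
               key=lambda f: simulated_makespan(total_work, unit_times,
                                                latency, f))


# Hypothetical heterogeneous pair: worker 1 is four times slower than worker 0.
slow_pair = [1.0, 4.0]
single_batch = simulated_makespan(1000, slow_pair, 0.0, 1.0)
factored = simulated_makespan(1000, slow_pair, 0.0, 0.5)
```

In this example a single batch leaves the fast worker idle while the slow one finishes its half, whereas f = 0.5 lets the fast worker absorb the later, smaller chunks. With identical workers and a noticeable per-message latency, the same search prefers f = 1, since extra messages then only add cost.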
32
Experimentation
Example application
- Forest Fire Propagation simulator: Xfire
Platform
- Cluster of Pentium 4, 1.8 GHz, SuSE Linux 8.0, connected by a 100 Mb/s network
33
Experimentation (II)
Load in the system
- We designed different external load patterns
- They simulate the system’s time-sharing
- They allow us to reproduce experiments
Case studies
- Xfire executed without any tuning
- Xfire, introducing controlled variable external loads
- Xfire executed under MATE, introducing variable external loads
34
Experimentation (III)
Note that:
- Introducing an extra load increases the execution time
- Execution with MATE corrects the factor value to improve the execution time
36
Conclusions and open lines
Conclusions
- The prototype environment, MATE, automatically monitors, analyses and tunes running applications
- Practical experiments conducted with MATE and parallel/distributed applications prove that it automatically adapts application behavior to existing conditions at run time
- In particular, MATE is able to tune Master/Worker applications and overcome two possible bottlenecks: number of workers and data distribution
- Dynamic tuning works and is applicable, effective and useful under certain conditions
37
Conclusions and open lines
Open lines
- Determining the “reasonable” number of workers
- Considering the interaction between different tunlets
- Providing the system with other tuning techniques
38
Thank you…