Published by Shanon Chapman. Modified over 9 years ago.
1
Trace Generation to Simulate Large Scale Distributed Applications
Olivier Dalle, Emilio P. Mancini
Mar. 8th, 2012
2
O. Dalle, E.P. Mancini - Trace generation for large scale distributed applications
Outline
Introduction
The trace collection
The hierarchical architecture
The components
An example
Conclusion
3
Introduction
Most distributed systems, such as Grids, offer massively parallel but loosely coupled resources; an accurate model of the application can help the scheduling decisions.
Simulators of parallel and distributed applications need an accurate model of the application's behavior, but the size of the traces for long-running parallel applications tends to explode.
4
Introduction
One solution is to buffer the data locally and gather them after the end of the program (post-mortem), but this approach has scalability issues.
We need to minimize the perturbation: the instrumentation competes with the application for the system’s resources.
5
Introduction
A distributed application is composed of a set of cooperating tasks.
The connections between them are in general not homogeneous.
Networks may present some hierarchy, e.g. fat trees, multiple switch hops...
Can we exploit that hierarchy for trace generation and instrumentation?
6
The Trace Collection: a Simplified Schema
The classical computational cluster execution model: several tasks on several nodes (e.g., MPI).
[Figure: cluster nodes, with the labels CPU, GPU, Core]
7
The Trace Collection: a Simplified Schema
We need to measure some parameters on each task, collect the local data, and gather them.
8
The Trace Collection: a Simplified Schema
We gather the data hierarchically, using local collectors, possibly performing local decimation or pre-processing.
We use the locality principle.
In a Grid it is common to have a low-quality link connecting the V.O. sites.
In HPC the bandwidth of the upper levels is shared among more hosts than that of the lower levels.
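The effect of local pre-processing on the upper-level links can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the averaging policy and the sample data are assumptions made for illustration only. Each local collector replaces every window of raw samples with its mean, so fewer records cross the slower link toward the top.

```python
from statistics import mean

def local_collect(samples, window):
    """Pre-process at a local collector: replace each window of raw
    samples with its mean, so fewer records cross the upper-level link.
    (Averaging is an assumed policy, chosen only for illustration.)"""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

def gather(sites, window):
    """Top-level gather: concatenate what each local collector forwards."""
    forwarded = [local_collect(site, window) for site in sites]
    return [value for site in forwarded for value in site]

# Hypothetical data: 3 sites, 8 raw samples each.
sites = [[float(i + 10 * s) for i in range(8)] for s in range(3)]
top = gather(sites, window=4)

raw_records = sum(len(site) for site in sites)
print("records measured locally:", raw_records)
print("records sent to the top :", len(top))
```

With a window of 4, the traffic on the upper-level link shrinks by the same factor, at the price of losing the individual samples.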
9
The Trace Collection
1. Infrastructure initialization
2. Execution with instrumentation
   a. Environment update (e.g., LD_PRELOAD)
   b. Middleware launcher (e.g., mpiexec, qsub …)
3. Data collection
   a. Overhead estimation
   b. Events’ measurement
4. Processing and Propagation
   a. Decimation
   b. Compression
   c. Buffering
   d. …
5. Trace generation
   a. Data collection
   b. Post-processing
   c. Simulator’s trace generation
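Step 2 (environment update, then launch through the middleware) could look like the following Python sketch. The helper name, the launcher line and the library path are hypothetical, and nothing is executed here: the function only prepares the argument vector and the environment. Whether a given launcher forwards LD_PRELOAD to remote tasks depends on that launcher, so this is a local illustration only.

```python
import os
import shlex

def instrumented_command(launcher, sensor_lib, app, base_env=None):
    """Return (argv, env) for running `app` under `launcher` with the
    sensor library preloaded. Nothing is executed here."""
    env = dict(os.environ if base_env is None else base_env)
    # Put the sensor first and keep any library already listed.
    previous = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = ":".join(p for p in (sensor_lib, previous) if p)
    argv = shlex.split(launcher) + [app]
    return argv, env

argv, env = instrumented_command(
    "mpiexec -n 4",                # hypothetical launcher line
    "/opt/dt/libdt_sensor.so",     # hypothetical install path
    "./bench",                     # hypothetical application
    base_env={"PATH": "/usr/bin"},
)
print(argv)
print(env["LD_PRELOAD"])
# To actually run it: subprocess.run(argv, env=env, check=True)
```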
10
The architecture
11
The sensors
The sensors:
Instrument the application’s tasks
Compute the instrumentation’s overhead
Collect the raw data
Send them to the first-level collectors
12
The sensors
We assume the system to be heterogeneous.
Every sensor performs an overhead analysis.
It then propagates the information to the management unit.
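One simple way a sensor could estimate its own overhead is to time many back-to-back probe calls and average them. The Python sketch below shows that idea only; the slides do not say how the overhead analysis is actually done, and all names here are hypothetical. Because the system is assumed heterogeneous, such an estimate would be taken on each node rather than once for the whole system.

```python
import time

def probe():
    """Stand-in for a sensor's measurement: read a high-resolution clock."""
    return time.perf_counter_ns()

def estimate_overhead_ns(repeats=10_000):
    """Estimate the cost of one probe call, in nanoseconds, by timing many
    back-to-back calls. The loop's own cost is included in the estimate."""
    start = time.perf_counter_ns()
    for _ in range(repeats):
        probe()
    end = time.perf_counter_ns()
    return (end - start) / repeats

def corrected_duration_ns(t_begin, t_end, overhead_ns, probes_inside=1):
    """Subtract the estimated probe cost from a measured interval."""
    return max(0.0, (t_end - t_begin) - probes_inside * overhead_ns)

overhead = estimate_overhead_ns()
t0 = probe()
sum(range(100_000))            # the "event" being measured
t1 = probe()
print("estimated overhead per probe (ns):", overhead)
print("corrected event duration (ns):", corrected_duration_ns(t0, t1, overhead))
```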
13
The collectors
The collectors:
Gather data from sensors and from other collectors
Buffer incoming data
Process collected data before sending them to upper levels
   Decimation
   Compression
   …
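The buffer, decimate and compress steps of a collector can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: the class name, the "keep every k-th record" decimation policy, the JSON serialization and the buffer size are all hypothetical choices.

```python
import json
import zlib

class Collector:
    """Buffers incoming records and forwards a decimated, compressed batch."""

    def __init__(self, keep_every=4, flush_size=16):
        self.keep_every = keep_every   # decimation factor (assumed policy)
        self.flush_size = flush_size   # records buffered before a flush
        self.buffer = []

    def receive(self, record):
        """Accept one record; return a compressed batch when the buffer is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_size:
            return self.flush()
        return None

    def flush(self):
        """Decimate, serialize and compress the buffer, then empty it."""
        kept = self.buffer[::self.keep_every]
        self.buffer = []
        return zlib.compress(json.dumps(kept).encode("utf-8"))

def unpack(batch):
    """Inverse of the serialization used in flush(), for the upper level."""
    return json.loads(zlib.decompress(batch).decode("utf-8"))

collector = Collector()
batches = []
for i in range(32):
    batch = collector.receive({"t": i, "bytes": 4096})
    if batch is not None:
        batches.append(batch)

print(len(batches), [r["t"] for r in unpack(batches[0])])
```

An upper-level collector could feed the unpacked records into its own `receive`, which gives the collector-to-collector chaining described on the slide.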
14
The Management Unit
Launches the collector daemons
Launches the application
Gathers the data from the top collector
Converts and stores the data in the required format
Managed with scripts or a graphical interface
15
An Example of Data Collection
We want to analyze the I/O of a parallel synthetic benchmark.
We want to check the overhead.
The benchmark is an MPI application of n tasks.
Every task runs on a different node and writes random data to the local file system.
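What a single task of such a benchmark does can be approximated by the Python sketch below. It is a single-process stand-in, not the MPI benchmark from the talk: the file name, the block size and the number of blocks are hypothetical, and the timing is done in the task itself rather than by a preloaded sensor.

```python
import os
import tempfile
import time

def write_task(directory, n_blocks=8, block_size=64 * 1024):
    """Write n_blocks of random bytes to a file in `directory` and return
    one (elapsed_seconds, bytes_written) record per write."""
    records = []
    path = os.path.join(directory, "bench.dat")   # hypothetical file name
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            data = os.urandom(block_size)
            start = time.perf_counter()
            f.write(data)
            f.flush()
            os.fsync(f.fileno())    # ask the OS to push the block to storage
            records.append((time.perf_counter() - start, len(data)))
    return records

with tempfile.TemporaryDirectory() as d:
    trace = write_task(d)

total = sum(n for _, n in trace)
print("writes:", len(trace), "bytes:", total)
```

Running the same task with and without instrumentation, and comparing the elapsed times, is one way to check the overhead mentioned on the slide.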
16
An Example of Data Collection
We use the management unit to:
1. Create a hierarchical schema
2. Create the MPI launch scripts
3. Launch the collectors and the instrumented application
4. Collect the results

    $MPIDIR/mpiexec.hydra \
        -env LD_PRELOAD \
        $DTDIR/libdt_sensor.so \
        $HOME/bench/bench

[Figure labels: mpiexec, qsub …]
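Step 1, creating the hierarchical schema, can be pictured as grouping task ranks under first-level collectors and then grouping collectors under higher-level ones until a single top collector remains. The Python sketch below assumes a fixed fan-out per collector; that policy, the function name and the data layout are illustrative assumptions, since the slides do not describe how the management unit builds the schema.

```python
def build_schema(n_tasks, fan_out):
    """Return a list of levels. Each level maps a collector id to the ids of
    its children; the children of level 0 are task ranks."""
    if n_tasks < 1 or fan_out < 2:
        raise ValueError("need n_tasks >= 1 and fan_out >= 2")
    levels = []
    children = list(range(n_tasks))
    while True:
        groups = [children[i:i + fan_out] for i in range(0, len(children), fan_out)]
        levels.append({cid: group for cid, group in enumerate(groups)})
        if len(groups) == 1:          # a single top collector: done
            return levels
        children = list(range(len(groups)))

schema = build_schema(n_tasks=8, fan_out=3)
for depth, level in enumerate(schema):
    print("level", depth, level)
```

A schema matched to the real network would group tasks by switch or site instead of by rank order.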
17
Conclusion
Collecting large traces in distributed systems may perturb the application’s execution.
We presented a system that efficiently collects traces at run-time or post-mortem.
We use a hierarchical schema matching the network links’ capacity, with distributed buffering and processing.
Future improvements will include the automatic discovery of the network topology.
18
Thank you