SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS Sukhdeep Sodhi Microsoft Corp Jaspal Subhlok University of Houston.

1 SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS Sukhdeep Sodhi Microsoft Corp Jaspal Subhlok University of Houston

2 Resource Selection for Network/Grid Applications
Where on the network will the application get the best performance?
[Figure: an application (Data, Sim 1, GUI, Model, Pre, Stream) and a network]

3 Current Approaches to Node Selection
1. Measure and model network properties, such as available bandwidth and CPU loads (with tools like NWS)
2. Find the “best” nodes for execution based on network status
But expected application performance based on measured resource status may not be accurate:
– it depends on application characteristics
– the translation is hard to model, e.g., unused bandwidth vs. expected throughput
– data may be stale, as frequent measurements are expensive

4 Our Approach
Predict application performance by running a small program representative of the actual distributed application
[Figure: an application (Data, Sim 1, GUI, Model, Pre, Stream) and a network]

5 Performance Skeleton
A performance skeleton is a synthetic, short-running program whose execution characteristics mirror those of the application it represents.
An application and its skeleton have similar:
– communication pattern
– CPU usage
– memory usage
– synchronization pattern
Goal: The performance of a skeleton is directly related to the performance of the application under any condition, e.g., a skeleton executes in 0.1% of the time the application takes to execute on any part of a shared network.
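As a rough illustration of the goal above, the sketch below assumes the relationship between skeleton and application run time is a plain scaling factor K (0.1% corresponds to K = 1000). The function name and the sample timing are hypothetical and not taken from the presentation.

```python
def predict_application_time(skeleton_seconds: float, scale_factor: float) -> float:
    """Predict application run time from a skeleton run.

    Assumption: under the same network conditions the skeleton runs
    `scale_factor` times faster than the application it represents.
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return skeleton_seconds * scale_factor


# A skeleton that takes 0.1% of the application time has K = 1000.
K = 1000
# Hypothetical measurement: the skeleton ran for 0.45 s on the candidate nodes.
predicted = predict_application_time(0.45, K)
```

Running the skeleton on each candidate set of nodes and comparing the predicted times would then rank the candidates.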

6 Central Contribution of This Paper
Framework for automatic construction of performance skeletons
[Figure: CREATE SKELETON turns the application (Data, Sim 1, GUI, Model, Pre, Stream) into a skeleton]

7 Automatic Construction of Skeletons
1. Record execution trace
2. Compress execution trace into execution signature
3. Construct skeleton program from execution signature
[Figure: CREATE SKELETON turns the application (Data, Sim 1, GUI, Model, Pre, Stream) into a skeleton]

8 Automatic Construction of Skeletons
1. Record execution trace
2. Compress execution trace into execution signature
3. Construct skeleton program from execution signature

9 Recording of Execution Trace
Implemented for MPI applications
Link the MPI application with a PMPI-based profiling library
– no source code modification / analysis required
Execute on a dedicated testbed
Records all MPI function calls
– call name, start time, stop time, parameters passed
– timing done to microsecond granularity
CPU busy = time between two consecutive MPI calls
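The "CPU busy" rule can be sketched on a recorded trace as follows. The record layout (`MpiCall`) and the sample timestamps are hypothetical; the presentation only states that call name, start time, stop time, and parameters are recorded.

```python
from typing import List, NamedTuple


class MpiCall(NamedTuple):
    name: str       # MPI function name, e.g. "MPI_Send"
    start_us: int   # call entry time in microseconds
    stop_us: int    # call return time in microseconds


def cpu_busy_intervals(trace: List[MpiCall]) -> List[int]:
    """Time between the end of one MPI call and the start of the next."""
    return [nxt.start_us - cur.stop_us for cur, nxt in zip(trace, trace[1:])]


# Hypothetical three-call trace.
trace = [
    MpiCall("MPI_Send", 100, 150),
    MpiCall("MPI_Recv", 400, 900),
    MpiCall("MPI_Barrier", 1000, 1020),
]
busy = cpu_busy_intervals(trace)  # [250, 100]
```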

10 Automatic Construction of Skeletons
1. Record execution trace
2. Compress execution trace into execution signature
3. Construct skeleton program from execution signature

11 Generation of Execution Signature …1
Application execution typically follows cyclic patterns
Goal: Determine cyclic patterns and form loop structure by identifying repeating execution behavior
– Repeating patterns should be broadly similar
Step 1: Execution trace to symbol strings
– Cluster similar execution events
– Replace all events in a cluster by the average event
– Each cluster is then assigned a unique symbol
– Execution trace is replaced by a string of symbols …
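Step 1 might look roughly like the sketch below. It is not the authors' clustering method: it assumes a simplified event (call name plus message size), a greedy first-fit clustering rule, and a hypothetical `tolerance` parameter.

```python
from typing import List, Tuple

Event = Tuple[str, float]  # (call name, message size in bytes): a simplified event


def events_to_symbols(events: List[Event], tolerance: float = 0.1):
    """Greedy clustering: an event joins the first cluster with the same call
    name whose mean size is within `tolerance` (relative) of the event's size.
    Returns the symbol string and the average event of each cluster."""
    clusters = []  # each cluster: {"name": str, "sizes": [float, ...]}
    labels = []    # cluster index assigned to each event, in trace order
    for name, size in events:
        for idx, cluster in enumerate(clusters):
            mean = sum(cluster["sizes"]) / len(cluster["sizes"])
            if cluster["name"] == name and abs(size - mean) <= tolerance * max(mean, 1.0):
                cluster["sizes"].append(size)
                labels.append(idx)
                break
        else:
            clusters.append({"name": name, "sizes": [size]})
            labels.append(len(clusters) - 1)
    # One lowercase letter per cluster (enough for up to 26 clusters in this sketch).
    symbols = "".join(chr(ord("a") + idx) for idx in labels)
    averages = [(c["name"], sum(c["sizes"]) / len(c["sizes"])) for c in clusters]
    return symbols, averages


# Hypothetical events: two similar sends, two similar receives, one large send.
events = [("send", 1000), ("recv", 1000), ("send", 1040), ("recv", 990), ("send", 5000)]
symbols, averages = events_to_symbols(events)  # symbols == "ababc"
```

Raising `tolerance` merges more events into the same cluster, which shortens the alphabet and makes repeating patterns easier to find.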

12 Generation of Execution Signature …2
Step 2: Compress string by identifying cycles
– Similar to the longest substring matching problem
– Algorithm builds loop structure recursively from symbol strings
– e.g., the symbol string is replaced by a loop of three symbols executed 4 times, followed by a loop executed 2 times whose body is a symbol, an inner one-symbol loop executed 2 times, and another symbol
– Typically the signature is multiple orders of magnitude smaller than the trace
Step 3: Adaptively increase degree of clustering
– until the signature is compact enough
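A minimal sketch of cycle compression is shown below. It uses a simple greedy search for adjacent repeats, which is an assumption for illustration and not necessarily the algorithm used by the authors. The input string is hypothetical, chosen only to produce a nested loop structure.

```python
def compress(symbols: str) -> list:
    """Return a nested loop structure: each item is either a single symbol
    or a (body, count) pair meaning `body` repeated `count` times."""
    result = []
    i, n = 0, len(symbols)
    while i < n:
        best_period, best_count = 0, 1
        for period in range(1, (n - i) // 2 + 1):
            unit = symbols[i:i + period]
            count = 1
            while symbols[i + count * period:i + (count + 1) * period] == unit:
                count += 1
            # Keep the repeat covering the most symbols; ties favor the shorter body.
            if count > 1 and count * period > best_count * best_period:
                best_period, best_count = period, count
        if best_count > 1:
            # Compress the loop body recursively to find nested loops.
            result.append((compress(symbols[i:i + best_period]), best_count))
            i += best_period * best_count
        else:
            result.append(symbols[i])
            i += 1
    return result


def render(structure: list) -> str:
    """Print a loop structure in the bracket notation [body]count."""
    parts = []
    for item in structure:
        if isinstance(item, tuple):
            body, count = item
            parts.append(f"[{render(body)}]{count}")
        else:
            parts.append(item)
    return ",".join(parts)


signature = render(compress("abcabcabcabcdeeadeea"))  # "[a,b,c]4,[d,[e]2,a]2"
```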

13 Automatic Construction of Skeletons
1. Record execution trace
2. Compress execution trace into execution signature
3. Construct skeleton program from execution signature

14 Generate Performance Skeleton Program
Goal: Execution time of the performance skeleton should be smaller than the application execution time by a fixed factor K
Reduce iterations of each loop by a factor K
– Add remainder iterations to events outside of all loops
Process events outside loops as follows:
– Reduce execution time of compute operations by a factor K
– Reduce execution time of message exchanges by reducing bytes exchanged by a factor K
Communication operations are not scaled linearly due to latency. Considering latency would make the approach architecture-specific
Replace symbols by C language statements
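The scaling rules above can be sketched as follows. The function names and sample numbers are hypothetical, and integer division for message sizes is an assumption of this sketch.

```python
def scale_loop(iterations: int, k: int):
    """Split a loop's iteration count into the skeleton's iteration count and
    the remainder iterations that are moved outside the loop."""
    return iterations // k, iterations % k


def scale_event(compute_seconds: float, message_bytes: int, k: int):
    """Scale an event outside all loops: compute time and bytes exchanged
    both shrink by the factor k (latency is deliberately ignored)."""
    return compute_seconds / k, message_bytes // k


# Hypothetical loop of 205 iterations with K = 10:
# the skeleton keeps 20 iterations and 5 iterations are left over.
loop_iterations, leftover = scale_loop(205, 10)

# Hypothetical event outside all loops: 2 s of compute and a 1 MB message.
compute, size = scale_event(2.0, 1_000_000, 10)
```

The leftover iterations would then be treated like any other events outside loops and shrunk with `scale_event`.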

15 Experimental Validation
Skeletons constructed for Class B NAS MPI benchmarks are executed in the following sharing scenarios:
– Competing processes on one node
– Competing processes on all nodes
– Competing traffic on one link
– Competing traffic on all links
– Competing process and traffic on one node and link
Skeleton execution time is used to predict application execution time.
Setup: Intel Xeon dual-CPU 1.7 GHz nodes running Linux 2.4.7, Gigabit crossbar switch, iproute to simulate link sharing

16 Prediction Accuracy
Graph shows error between predicted and measured application execution time
Skeleton execution time is 1/10th of application execution time
Average error: 6%, max error: 18%
Error is higher for scenarios with competing traffic
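The slide does not define the error metric. The sketch below assumes relative error with respect to the measured time, and the sample values are hypothetical rather than results from the presentation.

```python
def prediction_error_percent(predicted: float, measured: float) -> float:
    """Relative prediction error in percent (assumed definition)."""
    return abs(predicted - measured) / measured * 100.0


def summarize(errors):
    """Return (average error, maximum error) over a list of percentages."""
    return sum(errors) / len(errors), max(errors)


# Hypothetical (predicted, measured) execution times in seconds.
runs = [(110.0, 100.0), (190.0, 200.0), (300.0, 300.0)]
errors = [prediction_error_percent(p, m) for p, m in runs]
average_error, max_error = summarize(errors)
```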

17 Comparison with Other Methods
Average Prediction: The average slowdown of the entire benchmark suite is used to predict execution time for each program.
Class S Prediction: Class S benchmark (~1 s) programs are used as skeletons for Class B (30-900 s) benchmarks.
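The Average Prediction baseline could be computed along these lines. The sketch assumes slowdown means shared-network time divided by dedicated time and that an arithmetic mean is used; the program names and timings are hypothetical.

```python
def average_slowdown_prediction(dedicated: dict, shared: dict) -> dict:
    """Predict each program's time on the shared network by multiplying its
    dedicated time by the suite-wide average slowdown."""
    slowdowns = [shared[name] / dedicated[name] for name in dedicated]
    average = sum(slowdowns) / len(slowdowns)
    return {name: dedicated[name] * average for name in dedicated}


# Hypothetical timings in seconds for two programs.
dedicated = {"A": 100.0, "B": 200.0}
shared = {"A": 150.0, "B": 500.0}
predictions = average_slowdown_prediction(dedicated, shared)
```

Here the individual slowdowns are 1.5 and 2.5, so both programs are predicted with the average slowdown of 2.0, which overestimates A and underestimates B.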

18 Preliminary Conclusions
Performance estimation with skeletons has high accuracy
Need to incorporate memory access patterns and fine-grain CPU behavior for execution across architectures
Implementation limited to MPI applications
– basic approach should work for other paradigms
Skeletons may have other uses as a fast way of estimating application performance
– e.g., on a slow simulated future system

19 Questions
Contact: jaspal@uh.edu, ssodhi@microsoft.com

