
1 Dynamic Feedback: An Effective Technique for Adaptive Computing Pedro Diniz and Martin Rinard Department of Computer Science University of California, Santa Barbara http://www.cs.ucsb.edu/~{pedro,martin}

2 Basic Issue: Efficient Implementation of Atomic Operations in Object-Based Languages
- Approach: Reduce Lock Overhead by Coarsening the Lock Granularity
- Problem: Coarsening the Lock Granularity May Reduce the Available Concurrency

3 Solution: Dynamic Feedback
- Multiple Lock Coarsening Policies
- Dynamic Feedback:
  - Generate Multiple Versions of the Code
  - Measure the Dynamic Overhead of Each Policy
  - Dynamically Select the Best Version
- Context:
  - Parallelizing Compiler
  - Irregular Object-Based Programs
  - Pointer-Based Data Structures
  - Commutativity Analysis

4 Talk Outline
- Lock Coarsening
- Dynamic Feedback
- Experimental Results
- Related Work
- Conclusions

5 Model of Computation
- Parallel Programs: Serial Phases and Parallel Phases
- Atomic Operations on Shared Objects
- Mutual Exclusion Locks: Acquire Constructs and Release Constructs
[Diagram: serial and parallel phases; each atomic operation is a mutual exclusion region bracketed by L.acquire() and L.release()]
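In C++ terms, an atomic operation on a shared object corresponds to a mutual exclusion region bracketed by an acquire and a release on the object's lock. A minimal sketch; the `Node` class and its operation are illustrative, not from the talk:

```cpp
#include <mutex>

// A shared object with its own mutual exclusion lock; each atomic
// operation brackets its body with acquire/release on that lock.
struct Node {
    std::mutex lock;  // "L" in the slides
    int value = 0;

    void atomicAdd(int delta) {
        lock.lock();     // L.acquire()
        value += delta;  // mutual exclusion region
        lock.unlock();   // L.release()
    }
};
```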

6 Problem: Lock Overhead
[Diagram: adjacent mutual exclusion regions, each paying for its own L.acquire()/L.release() pair]

7 Solution: Lock Coarsening
[Diagram: the original code executes two L.acquire()/L.release() pairs; after lock coarsening, a single pair encloses both regions]
Reference: Diniz and Rinard, "Synchronization Transformations for Parallel Computing", POPL '97
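The transformation can be pictured in C++ as follows; `Obj` and the update routines are illustrative, assuming both regions already synchronize on the same lock:

```cpp
#include <mutex>

struct Obj {
    std::mutex lock;
    int x = 0, y = 0;
};

// Original: two adjacent mutual exclusion regions on the same lock,
// costing two acquire/release pairs.
void updateOriginal(Obj& o) {
    o.lock.lock();  o.x += 1;  o.lock.unlock();
    o.lock.lock();  o.y += 1;  o.lock.unlock();
}

// After lock coarsening: the intervening release/acquire pair is
// eliminated, leaving one larger mutual exclusion region.
void updateCoarsened(Obj& o) {
    o.lock.lock();
    o.x += 1;
    o.y += 1;
    o.lock.unlock();
}
```

The coarsened version does the same updates with half the lock operations; the cost is that the lock is now held across both updates.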

8 Lock Coarsening Trade-Off
- Advantage: Reduces the Number of Executed Acquires and Releases, Reducing the Acquire and Release Overhead
- Disadvantage: May Introduce False Exclusion
  - Multiple Processors Attempt to Acquire the Same Lock
  - The Processor Holding the Lock Is Executing Code that Was Originally in No Mutual Exclusion Region

9 False Exclusion
[Diagram: after lock coarsening, code that was originally outside any mutual exclusion region executes while holding the lock, forcing other processors to wait]

10 Lock Coarsening Policy
- Goal: Limit the Potential Severity of False Exclusion
- Mechanism: Multiple Lock Coarsening Policies
  - Original: Never Coarsen Granularity
  - Bounded: Coarsen Granularity Only Within Cycle-Free Subgraphs of the ICFG
  - Aggressive: Always Coarsen Granularity

11 Choosing the Best Policy
- The Best Lock Coarsening Policy May Depend On: the Topology of the Data Structures, the Dynamic Schedule of the Computation
- The Information Required to Choose the Best Policy Is Unavailable at Compile Time
- Complications:
  - Different Phases May Have Different Best Policies
  - Within the Same Phase, the Best Policy May Change Over Time

12 Solution: Dynamic Feedback
- The Generated Code Executes:
  - Sampling Phases: Measure the Performance of the Different Policies
  - Production Phases: Use the Best Policy from the Sampling Phase
- Periodically Resample to Discover Changes in the Best Policy
[Diagram: overhead over time for the Aggressive, Original, and Bounded code versions during a sampling phase; the production phase then runs the Aggressive version until the next sampling phase]
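A minimal sketch of the sampling/production alternation, assuming each code version can report its measured overhead as a fraction of execution time; the names and structure are illustrative, not the paper's implementation:

```cpp
#include <cstddef>
#include <limits>

// Each version runs one slice of the computation under one lock
// coarsening policy and returns its measured overhead in [0, 1].
using Version = double (*)();

// Sampling phase: run every version briefly and remember the one
// with the lowest measured overhead.
std::size_t samplingPhase(const Version* versions, std::size_t n) {
    double best = std::numeric_limits<double>::infinity();
    std::size_t bestIdx = 0;
    for (std::size_t i = 0; i < n; ++i) {
        double overhead = versions[i]();
        if (overhead < best) { best = overhead; bestIdx = i; }
    }
    return bestIdx;
}

// Production phase: keep running the selected version. The real
// system alternates sampling and production phases indefinitely,
// so it can adapt when the best policy changes over time.
void productionPhase(const Version* versions, std::size_t best,
                     int productionRuns) {
    for (int r = 0; r < productionRuns; ++r)
        versions[best]();
}
```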

13 Guaranteed Performance Bounds
- Assumption: Overhead Changes Are Bounded by Exponential Decay Functions
- Worst Case Scenario:
  - No Useful Work During the Sampling Phase
  - The Sampled Overheads Are the Same for All Versions
  - The Overhead of the Selected Version Increases at the Maximum Rate
  - The Overhead of the Other Versions Decreases at the Maximum Rate
[Diagram: overhead over time across the sampling (S) and production (P) intervals for the selected version V0]

14 Guaranteed Performance Bound
Definition 1. Policy p_i is at most ε worse than policy p_j over a time interval T if
    Work_j(T) - Work_i(T) ≤ εT,  where Work_i(T) = ∫_0^T (1 - o_i(t)) dt
and o_i(t) is the overhead of policy p_i at time t.
Definition 2. Dynamic feedback is at most ε worse than the optimal policy if
    Work_opt(P+SN) - Work(P+SN) ≤ ε(P+SN),  where Work_opt(P+SN) = ∫_0^{P+SN} (1 - o_1(t)) dt
Result 1. To guarantee this bound:
    (1 - ε)P + (1/λ)e^(-λP) ≤ (ε - 1)SN + 1/λ
(P: production interval, S: sampling interval, N: number of code versions, λ: overhead decay rate)

15 Guaranteed Performance Bounds
Constraint: (1 - ε)P + (1/λ)e^(-λP) ≤ (ε - 1)SN + 1/λ
[Plot: constraint value as a function of the production interval P, showing the feasible region]
- Production Interval Too Long: May Execute a Suboptimal Policy for a Long Time
- Production Interval Too Short: Unable to Amortize the Sampling Overhead
- Basic Constraint: the Decay Rate λ Must Be Small Enough
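To make the constraint concrete, a small helper can check whether a candidate production interval P satisfies the reconstructed bound; all parameter values used in testing are illustrative, not taken from the paper:

```cpp
#include <cmath>

// Slack of the bound (1 - eps)P + (1/lambda)e^{-lambda P}
//                       <= (eps - 1)SN + 1/lambda.
// A production interval P is feasible when the slack is <= 0.
double constraintSlack(double eps,    // allowed performance-loss fraction
                       double lambda, // maximum overhead decay rate
                       double S,      // sampling interval (seconds)
                       int N,         // number of code versions
                       double P)      // candidate production interval
{
    double lhs = (1.0 - eps) * P + std::exp(-lambda * P) / lambda;
    double rhs = (eps - 1.0) * S * N + 1.0 / lambda;
    return lhs - rhs;
}
```

For fixed ε, S, and N the bound becomes unsatisfiable as λ grows, matching the slide's observation that the decay rate must be small enough.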

16 Dynamic Feedback: Implementation
- Code Generation
- Measuring Policy Overhead
- Interval Selection
- Interval Expiration
- Policy Switch

17 Code Generation
- Statically Generate a Different Code Version for Each Policy
- Alternative: Dynamic Code Generation
- Advantages of Static Code Generation: Simplicity of Implementation, Fast Policy Switching
- Potential Drawback of Static Code Generation: Code Size (In Practice Not a Problem)

18 Measuring Policy Overhead
- Sources of Overhead: Locking Overhead, Waiting Overhead
- Compute the Locking Overhead: Count the Number of Executed Acquire/Release Constructs
- Estimate the Waiting Overhead: Count the Number of Spins on Locks Waiting to Be Released
Sampled Overhead = (Number of Acquire/Release × Acquire/Release Execution Time + Number of Spins × Spin Time) / Sampling Time
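The formula on the slide translates directly; the per-operation costs used below are illustrative constants, since the real system measures them on the target machine:

```cpp
// Sampled overhead for one policy over one sampling interval:
//   (acquire/release count * acquire/release execution time
//    + spin count * spin time) / sampling time
double sampledOverhead(long acquireReleases, double acquireReleaseTime,
                       long spins, double spinTime, double samplingTime)
{
    double locking = acquireReleases * acquireReleaseTime;  // lock cost
    double waiting = spins * spinTime;                      // wait estimate
    return (locking + waiting) / samplingTime;
}
```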

19 Interval Selection and Expiration
- Fixed Interval Values: Sampling Interval of 10 Milliseconds, Production Interval of 10 Seconds
- Good Results for a Wide Range of Interval Values
- Polling Code for Expiration Detection
  - Location: Back Edges of Parallel Loops
  - Advantage: Low Overhead
  - Disadvantage: Potential Interaction with the Iteration Size
[Diagram: atomic operations and polling points within a parallel loop]
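The polling scheme can be sketched as a cheap timer check on the back edge of a parallel loop; the loop body and parameters here are illustrative:

```cpp
#include <chrono>

// Runs loop iterations until either the work is done or the current
// interval expires; the only per-iteration cost is a timer read on
// the loop back edge.
bool runUntilExpired(long iterations, std::chrono::milliseconds interval) {
    auto deadline = std::chrono::steady_clock::now() + interval;
    for (long i = 0; i < iterations; ++i) {
        // ... loop body: atomic operations on shared objects ...

        // Polling point on the back edge of the parallel loop.
        if (std::chrono::steady_clock::now() >= deadline)
            return true;   // interval expired: time to switch policy
    }
    return false;          // loop finished before the interval expired
}
```

The interaction with iteration size is visible here: a very long loop body delays expiration detection until the next back edge.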

20 Policy Switch
- Synchronous: Processors Poll a Timer to Detect Interval Expiration; Barrier at the End of Each Interval
- Advantages: Consistent Transitions, Clean Overhead Measurements
- Disadvantages: Need to Synchronize All Processors, Potential Idle Time at the Barrier

21 Experimental Results
- Parallelizing Compiler Based on Commutativity Analysis [PLDI '96]
- Set of Complete Scientific Applications:
  - Barnes-Hut N-Body Solver (1500 Lines of C++)
  - Liquid Water Simulation Code (1850 Lines of C++)
  - Seismic Modeling String Code (2050 Lines of C++)
- Different Lock Coarsening Policies
- Dynamic Feedback
- Performance on the Stanford DASH Multiprocessor

22 Code Sizes
[Bar charts: text segment sizes (Kbytes) of the Serial, Original, and Dynamic Feedback versions of Barnes-Hut, Water, and String]

23 Lock Overhead
Percentage of Time that the Single-Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks
[Bar charts: lock overhead percentages under the Original, Bounded, and Aggressive policies for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model)]

24 Contention Overhead
Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors
[Plots: contention percentage versus number of processors (0 to 16) under the Original, Bounded, and Aggressive policies for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model)]

25 Performance Results: Barnes-Hut
[Plot: speedup versus number of processors for Barnes-Hut on DASH (16K Particles), comparing Ideal, Aggressive, Dynamic Feedback, Bounded, and Original]

26 Performance Results: Water
[Plot: speedup versus number of processors for Water on DASH (512 Molecules), comparing Ideal, Bounded, Original, Aggressive, and Dynamic Feedback]

27 Performance Results: String
[Plot: speedup versus number of processors for String on DASH (Big Well Model), comparing Ideal, Original, Aggressive, and Dynamic Feedback]

28 Summary
- Code Size Is Not an Issue
- Lock Coarsening Has a Significant Performance Impact
- The Best Lock Coarsening Policy Varies with the Application
- Dynamic Feedback Delivers Code with Performance Comparable to the Best Static Lock Coarsening Policy

29 Related Work
- Adaptive Execution Techniques (Saavedra and Park, PACT '96)
- Dynamic Dispatch Optimizations (Hölzle and Ungar, PLDI '94)
- Dynamic Code Generation (Engler, PLDI '96)
- Profiling (Brewer, PPoPP '95)
- Synchronization Optimizations (Plevyak et al., POPL '95)

30 Conclusions
- Dynamic Feedback: the Generated Code Adapts to Different Execution Environments
- Integration with a Parallelizing Compiler: Irregular Object-Based Programs, Pointer-Based Linked Data Structures, Commutativity Analysis
- Evaluation with Three Complete Applications: Performance Comparable to the Best Hand-Tuned Optimization

31 BACKUP SLIDES

32 Performance Results: Barnes-Hut
[Plot: speedup versus number of processors for Barnes-Hut (16K Particles), comparing Ideal, Aggressive, Bounded, and Original]

33 Performance Results: Water
[Plot: speedup versus number of processors for Water (512 Molecules), comparing Ideal, Aggressive, Bounded, and Original]

34 Performance Results: String
[Plot: speedup versus number of processors for String (Big Well Model), comparing Ideal, Original, and Aggressive]

35 Policy Switch
[Diagram: transitions between Policy 1 and Policy 2 each time the timer expires]

36 Motivation
- Challenges: Match the Best Implementation to the Environment; Heterogeneous and Mobile Systems
- Goal: Develop Mechanisms to Support Code that Adapts to Environment Characteristics
- Technique: Dynamic Feedback

37 Overhead for Barnes-Hut
[Plot: sampled overhead versus execution time for the Original, Aggressive, and Bounded policies; Barnes-Hut on DASH (8 Processors), FORCES Loop, 16K-Particle Data Set]

38 Overhead for Water
[Plot: sampled overhead versus execution time for the Original and Bounded policies; Water on DASH (8 Processors), INTERF Loop, 512-Molecule Data Set]

39 Overhead for Water
[Plot: sampled overhead versus execution time for the Aggressive and Original policies; Water on DASH (8 Processors), POTENG Loop, 512-Molecule Data Set]

40 Overhead for String
[Plot: sampled overhead versus execution time for the Aggressive and Original policies; String on DASH (8 Processors), PROJFWD Loop, Big Well Data Set]

41 Dynamic Feedback
[Diagram: overhead over time for the Aggressive, Original, and Bounded code versions across a sampling phase, a production phase running the Aggressive version, and the next sampling phase]

