Published by Megan Evans. Modified over 9 years ago.
1
Dynamic Feedback: An Effective Technique for Adaptive Computing
Pedro Diniz and Martin Rinard
Department of Computer Science, University of California, Santa Barbara
http://www.cs.ucsb.edu/~{pedro,martin}
2
Basic Issue: Efficient Implementation of Atomic Operations in Object-Based Languages
Approach: Reduce Lock Overhead by Coarsening Lock Granularity
Problem: Coarsening Lock Granularity May Reduce Available Concurrency
3
Solution: Dynamic Feedback
Multiple Lock Coarsening Policies
Dynamic Feedback: Generate Multiple Versions of Code; Measure Dynamic Overhead of Each Policy; Dynamically Select Best Version
Context: Parallelizing Compiler; Irregular Object-Based Programs; Pointer-Based Data Structures; Commutativity Analysis
4
Talk Outline
Lock Coarsening; Dynamic Feedback; Experimental Results; Related Work; Conclusions
5
Model of Computation
Parallel Programs: Serial Phases; Parallel Phases
Atomic Operations on Shared Objects
Mutual Exclusion Locks: Acquire Constructs; Release Constructs
[Figure: Serial Phase, Parallel Phase, Serial Phase; Atomic Operations in the Parallel Phase; a Mutual Exclusion Region bracketed by L.acquire() and L.release().]
6
Problem: Lock Overhead
[Figure: code with two L.acquire() / L.release() pairs.]
7
Solution: Lock Coarsening
[Figure: Original (two L.acquire() / L.release() pairs) versus After Lock Coarsening (one L.acquire() / L.release() pair).]
Reference: Diniz and Rinard, “Synchronization Transformations for Parallel Computing”, POPL97
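The transformation on this slide can be illustrated with a minimal Python sketch. It is not the compiler's generated code: `CountingLock`, `Accumulator`, and the two `add_all_*` functions are hypothetical names introduced here, and Python's `threading.Lock` stands in for the slide's lock `L`.

```python
import threading


class CountingLock:
    """Mutual exclusion lock that counts how often it is acquired (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquires = 0

    def acquire(self):
        self._lock.acquire()
        self.acquires += 1  # updated while the lock is held

    def release(self):
        self._lock.release()


class Accumulator:
    """Shared object protected by its own lock L."""

    def __init__(self):
        self.L = CountingLock()
        self.total = 0


def add_all_original(obj, values):
    # Original: every atomic update has its own acquire/release pair.
    for v in values:
        obj.L.acquire()
        obj.total += v
        obj.L.release()


def add_all_coarsened(obj, values):
    # After lock coarsening: one acquire/release pair covers the whole loop.
    obj.L.acquire()
    for v in values:
        obj.total += v
    obj.L.release()


original, coarsened = Accumulator(), Accumulator()
add_all_original(original, range(1000))
add_all_coarsened(coarsened, range(1000))
print(original.L.acquires, coarsened.L.acquires)
```

Both versions compute the same total, but the coarsened one executes a single acquire where the original executes one per update. The price is that the lock is held for the entire loop.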
8
Lock Coarsening Trade-Off
Advantage: Reduces Number of Executed Acquires and Releases; Reduces Acquire and Release Overhead
Disadvantage: May Introduce False Exclusion
False Exclusion: Multiple Processors Attempt to Acquire Same Lock; Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion Region
9
False Exclusion
[Figure: Original versus After Lock Coarsening, shown as sequences of L.acquire() / L.release() pairs, with the False Exclusion marked.]
10
Lock Coarsening Policy
Goal: Limit Potential Severity of False Exclusion
Mechanism: Multiple Lock Coarsening Policies
Original: Never Coarsen Granularity
Bounded: Coarsen Granularity Only Within Cycle-Free Subgraphs of the Interprocedural Control Flow Graph (ICFG)
Aggressive: Always Coarsen Granularity
11
Choosing Best Policy
Best Lock Coarsening Policy May Depend On: Topology of Data Structures; Dynamic Schedule of Computation
Information Required to Choose Best Policy Unavailable at Compile Time
Complications: Different Phases May Have Different Best Policy; In Same Phase, Best Policy May Change Over Time
12
Solution: Dynamic Feedback
Generated Code Executes
Sampling Phases: Measure Performance of Different Policies
Production Phases: Use Best Policy From Sampling Phase
Periodically Resample to Discover Best Policy Changes
[Figure: Overhead versus Time for code versions Aggressive, Original, and Bounded across a Sampling Phase, a Production Phase, and a second Sampling Phase.]
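The alternation of sampling and production phases can be sketched as a small driver loop. This is a simulation under stated assumptions rather than the generated code: `dynamic_feedback`, `sample_overhead`, and `run_production` are hypothetical names, and the overhead numbers are invented for illustration, not measurements from the talk.

```python
def dynamic_feedback(policies, sample_overhead, run_production, rounds):
    """Alternate sampling and production phases; return the policy chosen each round."""
    chosen = []
    for r in range(rounds):
        # Sampling phase: run every code version briefly and record its overhead.
        sampled = {p: sample_overhead(p, r) for p in policies}
        # Production phase: run the version with the lowest sampled overhead.
        best = min(sampled, key=sampled.get)
        run_production(best, r)
        chosen.append(best)
    return chosen


# Illustrative overheads per round (assumed values): the best policy drifts over time.
OVERHEADS = {
    "original":   [0.30, 0.30, 0.30],
    "bounded":    [0.10, 0.20, 0.40],
    "aggressive": [0.05, 0.50, 0.60],
}

production_log = []
chosen = dynamic_feedback(
    policies=list(OVERHEADS),
    sample_overhead=lambda p, r: OVERHEADS[p][r],
    run_production=lambda p, r: production_log.append((r, p)),
    rounds=3,
)
print(chosen)
```

Because each round resamples, the driver follows the best policy as the overheads change instead of committing to the first winner.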
13
Guaranteed Performance Bounds
Assumptions: Overhead Changes Bounded by Exponential Decay Functions
Worst Case Scenario: No Useful Work During Sampling Phase; Sampled Overheads Are Same For All Versions; Overhead of Selected Version Increases at Maximum Rate; Overhead of Other Versions Decreases at Maximum Rate
[Figure: Overhead versus Time, with intervals labeled S and P and curves labeled V0.]
14
Guaranteed Performance Bound
Definition 1. Policy $p_i$ is at most $\epsilon$ worse than policy $p_j$ over a time interval $T$ if $\mathrm{Work}_j^T - \mathrm{Work}_i^T \le \epsilon T$, where $\mathrm{Work}_i^T = \int_0^T (1 - o_i(t))\,dt$.
Definition 2. Dynamic feedback is at most $\epsilon$ worse than the optimal if $\mathrm{Work}_{opt}^{P+SN} - \mathrm{Work}_0^P \le \epsilon (P + SN)$, where $\mathrm{Work}_{opt}^{P+SN} = \int_0^{P+SN} (1 - o_1(t))\,dt$.
Result 1. To guarantee this bound: $(1 - \epsilon) P + (1/\lambda) e^{-\lambda P} \le (\epsilon - 1) S N + 1/\lambda$.
15
Guaranteed Performance Bounds
$(1 - \epsilon) P + (1/\lambda) e^{-\lambda P} \le (\epsilon - 1) S N + 1/\lambda$
[Figure: Constraint Values versus Production Interval P, with the Feasible Region marked.]
Production Interval Too Long: May Execute Suboptimal Policy for Long Time
Production Interval Too Short: Unable to Amortize Sampling Overhead
Basic Constraint: Decay Rate ($\lambda$) Must be Small Enough
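The feasible region can be explored numerically. The sketch below evaluates the constraint from this slide, writing ε for the tolerated slowdown, λ for the decay rate, S for the sampling interval, N for the number of sampled versions, and P for the production interval. The function name `bound_holds` and every numeric value are assumptions chosen for illustration.

```python
import math


def bound_holds(eps, lam, P, S, N):
    """Check (1 - eps) P + (1/lam) e^(-lam P) <= (eps - 1) S N + 1/lam."""
    lhs = (1 - eps) * P + math.exp(-lam * P) / lam
    rhs = (eps - 1) * S * N + 1 / lam
    return lhs <= rhs


# Assumed values: 10 ms sampling interval, 3 versions, 10% tolerance.
eps, S, N = 0.1, 0.01, 3

for P in (0.1, 10.0, 100.0):
    print(P, bound_holds(eps, lam=0.01, P=P, S=S, N=N))

# With a fast decay rate, no production interval on this grid satisfies the bound.
feasible_fast_decay = any(
    bound_holds(eps, lam=1.0, P=p / 100, S=S, N=N) for p in range(1, 2001)
)
print(feasible_fast_decay)
```

With these assumed values the middle interval satisfies the constraint while the shorter and longer ones do not, matching the two failure modes on the slide. The last check illustrates why the decay rate must be small enough.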
16
Dynamic Feedback: Implementation
Code Generation; Measuring Policy Overhead; Interval Selection; Interval Expiration; Policy Switch
17
Code Generation
Statically Generate Different Code Versions for Each Policy
Alternative: Dynamic Code Generation
Advantages of Static Code Generation: Simplicity of Implementation; Fast Policy Switching
Potential Drawback of Static Code Generation: Code Size (In Practice Not a Problem)
18
Measuring Policy Overhead
Sources of Overhead: Locking Overhead; Waiting Overhead
Compute Locking Overhead: Count Number of Executed Acquire/Release Constructs
Estimate Waiting Overhead: Count Number of Spins on Locks Waiting to be Released
$\text{Sampled Overhead} = \dfrac{(\text{Number of Acquire/Release} \times \text{Acquire/Release Execution Time}) + (\text{Number of Spins} \times \text{Spin Time})}{\text{Sampling Time}}$
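The overhead formula translates directly into code. In this sketch the function and parameter names are hypothetical and the counts and per-operation times are invented for illustration. A real implementation would read them from counters the generated code maintains.

```python
def sampled_overhead(num_acquire_release, acquire_release_time,
                     num_spins, spin_time, sampling_time):
    """Fraction of the sampling interval spent on locking plus waiting."""
    locking = num_acquire_release * acquire_release_time  # locking overhead
    waiting = num_spins * spin_time                       # estimated waiting overhead
    return (locking + waiting) / sampling_time


# Assumed numbers: 2000 acquire/release pairs at 1 microsecond each and
# 5000 spins at 0.2 microseconds each, over a 10 ms sampling interval.
overhead = sampled_overhead(
    num_acquire_release=2000, acquire_release_time=1e-6,
    num_spins=5000, spin_time=2e-7,
    sampling_time=0.010,
)
print(round(overhead, 3))
```

Counting events and multiplying by a fixed per-event cost keeps the measurement cheap: the sampled code only increments counters.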
19
Interval Selection and Expiration
Fixed Interval Values
Sampling Interval: 10 milliseconds
Production Interval: 10 seconds
Good Results for Wide Range of Interval Values
Polling Code for Expiration Detection
Location: Back Edges of Parallel Loop
Advantage: Low Overhead
Disadvantage: Potential Interaction with Iteration Size
[Figure: Atomic Operations and Polling Points.]
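Polling at loop back edges can be sketched as follows. The names `run_interval` and `FakeClock` are hypothetical, and the fake clock is an assumption used only to make the example deterministic. With a real timer one would pass `time.monotonic` instead.

```python
import time


def run_interval(work_items, do_item, interval_seconds, clock=time.monotonic):
    """Process items until they run out or the interval expires; return the count done."""
    deadline = clock() + interval_seconds
    done = 0
    for item in work_items:
        do_item(item)
        done += 1
        # Polling point: checked once per iteration, at the loop back edge.
        if clock() >= deadline:
            break
    return done


class FakeClock:
    """Deterministic clock for the example (an assumption, not part of the technique)."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()


def do_item(item):
    clock.now += 0.004  # pretend each iteration takes 4 ms


done = run_interval(range(100), do_item, interval_seconds=0.010, clock=clock)
print(done, clock.now)
```

With 4 ms iterations and a 10 ms interval, expiration is detected only after the third iteration, so the interval overruns by about 2 ms. This is the interaction with iteration size that the slide lists as a disadvantage.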
20
Policy Switch
Synchronous: Processors Poll Timer to Detect Interval Expiration; Barrier At End of Each Interval
Advantages: Consistent Transitions; Clean Overhead Measurements
Disadvantages: Need to Synchronize All Processors; Potential Idle Time At Barrier
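A synchronous switch can be sketched with Python's `threading.Barrier`. The function `run_intervals` and the event log are hypothetical and exist only to make the ordering observable. Threads stand in for processors, and the work done in each interval is elided.

```python
import threading


def run_intervals(num_workers, policies):
    """Run each policy for one interval on all workers, switching at a barrier."""
    barrier = threading.Barrier(num_workers)
    events = []
    events_lock = threading.Lock()

    def record(kind, interval, rank):
        with events_lock:
            events.append((kind, interval, rank))

    def worker(rank):
        for interval, policy in enumerate(policies):
            record("start", interval, rank)
            # ... execute the code version for `policy` until the interval expires ...
            record("end", interval, rank)
            # Barrier at the end of each interval: nobody starts the next
            # policy until every worker has finished the current one.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_intervals(4, ["original", "bounded", "aggressive"])
print(len(events))
```

The barrier gives the consistent transitions noted on the slide, since no two workers ever run different policies at the same time. It also shows where idle time arises, because fast workers wait at the barrier for the slowest one.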
21
Experimental Results
Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]
Set of Complete Scientific Applications: Barnes-Hut N-Body Solver (1500 lines of C++); Liquid Water Simulation Code (1850 lines of C++); Seismic Modeling String Code (2050 lines of C++)
Different Lock Coarsening Policies
Dynamic Feedback
Performance on Stanford DASH Multiprocessor
22
Code Sizes
[Figure: three bar charts of Text Segment Size (Kbytes, 0 to 60) for Barnes-Hut, Water, and String, each comparing the Serial, Original, and Dynamic versions.]
23
Lock Overhead
Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks
[Figure: three bar charts of Percentage Lock Overhead (0 to 60): Barnes-Hut (16K Particles) with Original, Bounded, Aggressive; Water (512 Molecules) with Original, Bounded, Aggressive; String (Big Well Model) with Original, Aggressive.]
24
Contention Overhead
Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors
[Figure: three plots of Contention Percentage (0 to 100) versus Processors (0 to 16) for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model); curves: Original, Bounded, Aggressive.]
25
Performance Results: Barnes-Hut
[Figure: Speedup (0 to 16) versus Number of Processors (0 to 16) for Barnes-Hut on DASH (16K Particles); curves: Ideal, Aggressive, Dynamic Feedback, Bounded, Original.]
26
Performance Results: Water
[Figure: Speedup (0 to 16) versus Number of Processors (0 to 16) for Water on DASH (512 Molecules); curves: Ideal, Bounded, Original, Aggressive, Dynamic Feedback.]
27
Performance Results: String
[Figure: Speedup (0 to 16) versus Number of Processors (0 to 16) for String on DASH (Big Well Model); curves: Ideal, Original, Aggressive, Dynamic Feedback.]
28
Summary
Code Size Is Not An Issue
Lock Coarsening Has Significant Performance Impact
Best Lock Coarsening Policy Varies With Application
Dynamic Feedback Delivers Code With Performance Comparable to The Best Static Lock Coarsening Policy
29
Related Work
Adaptive Execution Techniques (Saavedra Park:PACT96)
Dynamic Dispatch Optimizations (Hölzle Ungar:PLDI94)
Dynamic Code Generation (Engler:PLDI96)
Profiling (Brewer:PPoPP95)
Synchronization Optimizations (Plevyak et al:POPL95)
30
Conclusions
Dynamic Feedback: Generated Code Adapts to Different Execution Environments
Integration with Parallelizing Compiler: Irregular Object-Based Programs; Pointer-Based Linked Data Structures; Commutativity Analysis
Evaluation with Three Complete Applications: Performance Comparable to Best Hand-Tuned Optimization
31
BACKUP SLIDES
32
Performance Results: Barnes-Hut
[Figure: Speedup (0 to 16) versus Number of Processors (0 to 16) for Barnes-Hut (16K Particles); curves: Ideal, Aggressive, Bounded, Original.]
33
Performance Results: Water
[Figure: Speedup (0 to 16) versus Number of Processors (0 to 16) for Water (512 Molecules); curves: Ideal, Aggressive, Bounded, Original.]
34
Performance Results: String
[Figure: Speedup (0 to 16) versus Number of Processors (0 to 16) for String (Big Well Model); curves: Ideal, Original, Aggressive.]
35
Policy Switch
[Figure: timeline showing Policy 1 and Policy 2, with the points where the Timer Expires marked.]
36
Motivation
Challenges: Match Best Implementation to Environment; Heterogeneous and Mobile Systems
Goal: Develop Mechanisms to Support Code that Adapts to Environment Characteristics
Technique: Dynamic Feedback
37
Overhead for Barnes-Hut
[Figure: Sampled Overhead (0 to 0.5) versus Execution Time (0 to 25 seconds) for Barnes-Hut on DASH (8 Processors), FORCES loop, data set of 16K Particles; curves: Original, Aggressive, Bounded.]
38
Overhead for Water
[Figure: Sampled Overhead (0 to 0.5) versus Execution Time (0 to 60 seconds) for Water on DASH (8 Processors), INTERF loop, data set of 512 Molecules; curves: Original, Bounded.]
39
Overhead for Water
[Figure: Sampled Overhead (0 to 1) versus Execution Time (0 to 60 seconds) for Water on DASH (8 Processors), POTENG loop, data set of 512 Molecules; curves: Aggressive, Original.]
40
Overhead for String
[Figure: Sampled Overhead (0 to 1) versus Execution Time (0 to 500 seconds) for String on DASH (8 Processors), PROJFWD loop, Big Well data set; curves: Aggressive, Original.]
41
Dynamic Feedback
[Figure: Overhead versus Time for code versions Aggressive, Original, and Bounded across a Sampling Phase, a Production Phase, and a second Sampling Phase.]