Dynamic Feedback: An Effective Technique for Adaptive Computing
Pedro Diniz and Martin Rinard
Department of Computer Science
University of California, Santa Barbara
http://www.cs.ucsb.edu/~{pedro,martin}

Basic Issue: Efficient Implementation of Atomic Operations in Object-Based Languages
Approach: Reduce Lock Overhead by Coarsening Lock Granularity
Problem: Coarsening Lock Granularity May Reduce Available Concurrency

Solution: Dynamic Feedback
  Multiple Lock Coarsening Policies
  Dynamic Feedback
    Generate Multiple Versions of Code
    Measure Dynamic Overhead of Each Policy
    Dynamically Select Best Version
Context
  Parallelizing Compiler
    Irregular Object-Based Programs
    Pointer-Based Data Structures
    Commutativity Analysis

Talk Outline
  Lock Coarsening
  Dynamic Feedback
  Experimental Results
  Related Work
  Conclusions

Model of Computation
  Parallel Programs
    Serial Phases
    Parallel Phases
  Atomic Operations on Shared Objects
    Mutual Exclusion Locks
    Acquire Constructs
    Release Constructs
[Figure: a Serial Phase, a Parallel Phase, and a Serial Phase; Atomic Operations in the Parallel Phase execute in a Mutual Exclusion Region between L.acquire() and L.release()]

Problem: Lock Overhead
[Figure: a computation that executes successive L.acquire() / L.release() pairs]

Solution: Lock Coarsening
[Figure: code before (Original) and After Lock Coarsening, showing the L.acquire() / L.release() pairs in each version]
Reference: Diniz and Rinard “Synchronization Transformations for Parallel Computing”, POPL97
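A minimal sketch of the transformation, written in Python for brevity (the applications in this talk are C++). `CountingLock`, `Body`, and the two `add_force_*` functions are hypothetical names introduced only for illustration; they are not taken from the compiler's output.

```python
import threading


class CountingLock:
    """Mutual exclusion lock that counts how often it is acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquires = 0

    def acquire(self):
        self._lock.acquire()
        self.acquires += 1  # updated while the lock is held

    def release(self):
        self._lock.release()


class Body:
    """Hypothetical shared object protected by its own lock L."""

    def __init__(self):
        self.L = CountingLock()
        self.fx = 0.0
        self.fy = 0.0


def add_force_original(body, dx, dy):
    # Original: each atomic operation has its own acquire/release pair.
    body.L.acquire()
    body.fx += dx
    body.L.release()
    body.L.acquire()
    body.fy += dy
    body.L.release()


def add_force_coarsened(body, dx, dy):
    # After lock coarsening: one acquire/release pair covers both updates.
    body.L.acquire()
    body.fx += dx
    body.fy += dy
    body.L.release()
```

Both versions compute the same result; the coarsened version executes half as many acquire and release constructs on this object.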

Lock Coarsening Trade-Off
  Advantage:
    Reduces Number of Executed Acquires and Releases
    Reduces Acquire and Release Overhead
  Disadvantage: May Introduce False Exclusion
    Multiple Processors Attempt to Acquire Same Lock
    Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion Region

False Exclusion
[Figure: Original vs. After Lock Coarsening; in the coarsened version a processor waits at L.acquire() while the lock holder executes code that was originally outside any mutual exclusion region (False Exclusion)]
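One way to quantify false exclusion is to compare how long the lock is held before and after coarsening. The sketch below is a toy model under stated assumptions: `held_time` is a hypothetical helper, the durations are made-up time units, and coarsening is modeled as a single acquire before the first locked operation and a single release after the last.

```python
def held_time(ops, coarsen):
    """Return the total time the lock is held.

    ops is a list of (duration, needs_lock) pairs executed in order.
    """
    if not coarsen:
        # Original: the lock is held only during operations that need it.
        return sum(duration for duration, needs_lock in ops if needs_lock)
    locked = [i for i, (_, needs_lock) in enumerate(ops) if needs_lock]
    if not locked:
        return 0
    # Coarsened: the lock is held from the first locked operation to the last,
    # including any unlocked code that sits between them.
    return sum(duration for duration, _ in ops[locked[0]:locked[-1] + 1])


# Two short atomic operations separated by a long unsynchronized computation.
ops = [(1, True), (5, False), (1, True)]
original_held = held_time(ops, coarsen=False)
coarsened_held = held_time(ops, coarsen=True)
false_exclusion_time = coarsened_held - original_held
```

In this toy schedule the extra held time is exactly the duration of the unsynchronized code that coarsening pulled into the mutual exclusion region.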

Lock Coarsening Policy
  Goal: Limit Potential Severity of False Exclusion
  Mechanism: Multiple Lock Coarsening Policies
    Original: Never Coarsen Granularity
    Bounded: Coarsen Granularity Only Within Cycle-Free Subgraphs of the ICFG (Interprocedural Control Flow Graph)
    Aggressive: Always Coarsen Granularity

Choosing Best Policy
  Best Lock Coarsening Policy May Depend On
    Topology of Data Structures
    Dynamic Schedule Of Computation
  Information Required to Choose Best Policy Unavailable at Compile Time
  Complications
    Different Phases May Have Different Best Policy
    In Same Phase, Best Policy May Change Over Time

Solution: Dynamic Feedback
  Generated Code Executes
    Sampling Phases: Measure Performance of Different Policies
    Production Phases: Use Best Policy From Sampling Phase
  Periodically Resample to Discover Best Policy Changes
[Figure: Overhead vs. Time for the Aggressive, Original, and Bounded code versions across a Sampling Phase, a Production Phase, and the next Sampling Phase]
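The alternation of sampling and production phases can be sketched as a small control loop. This is an illustrative sketch, not the generated code itself: `dynamic_feedback`, `toy_sample`, and the overhead numbers in `TOY_OVERHEAD` are assumptions chosen so that the best policy changes over time.

```python
def dynamic_feedback(policies, sample_overhead, run_production, num_rounds):
    """Alternate sampling and production phases; return the policy used per round."""
    history = []
    for round_no in range(num_rounds):
        # Sampling phase: run every code version briefly and record its overhead.
        sampled = {p: sample_overhead(p, round_no) for p in policies}
        # Select the version with the lowest sampled overhead.
        best = min(policies, key=lambda p: sampled[p])
        # Production phase: run the selected version until the next resampling.
        run_production(best, round_no)
        history.append(best)
    return history


# Hypothetical sampled overheads per round, for illustration only.
TOY_OVERHEAD = {
    "original": [0.40, 0.40, 0.40],
    "bounded": [0.20, 0.20, 0.50],
    "aggressive": [0.10, 0.60, 0.60],
}


def toy_sample(policy, round_no):
    return TOY_OVERHEAD[policy][round_no]


produced = []
history = dynamic_feedback(
    list(TOY_OVERHEAD), toy_sample, lambda p, r: produced.append((r, p)), 3
)
```

Because the loop resamples every round, it follows the best policy as the toy overheads shift, which a single compile-time choice could not do.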

Guaranteed Performance Bounds
  Assumptions:
    Overhead Changes Bounded by Exponential Decay Functions
  Worst Case Scenario:
    No Useful Work During Sampling Phase
    Sampled Overheads Are Same For All Versions
    Overhead of Selected Version Increases at Maximum Rate
    Overhead of Other Versions Decreases at Maximum Rate
[Figure: Overhead vs. Time over sampling (S) and production (P) intervals, with labels S, P, and v0]

Guaranteed Performance Bound
  Definition 1. Policy $p_i$ is at Most $\epsilon$ Worse Than Policy $p_j$ over a Time Interval $T$ if
    $\mathrm{Work}_j^T - \mathrm{Work}_i^T \le \epsilon T$, where $\mathrm{Work}_i^T = \int_0^T (1 - o_i(t))\,dt$
  Definition 2. Dynamic Feedback is at Most $\epsilon$ Worse Than the Optimal if
    $\mathrm{Work}_{opt}^{P+SN} - \mathrm{Work}_0^P \le \epsilon (P + SN)$, where $\mathrm{Work}_{opt}^{P+SN} = \int_0^{P+SN} (1 - o_1(t))\,dt$
  Result 1. To Guarantee this Bound:
    $(1 - \epsilon) P + \frac{1}{\lambda} e^{-\lambda P} \le (\epsilon - 1) SN + \frac{1}{\lambda}$

Guaranteed Performance Bounds
[Figure: the two sides of the constraint, $(1 - \epsilon) P + \frac{1}{\lambda} e^{-\lambda P}$ and $(\epsilon - 1) SN + \frac{1}{\lambda}$, plotted as Constraint Values against the Production Interval P, with the Feasible Region marked]
  Production Interval Too Long: May Execute Suboptimal Policy for Long Time
  Production Interval Too Short: Unable to Amortize Sampling Overhead
  Basic Constraint: Decay Rate ($\lambda$) Must be Small Enough
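The feasible region can be explored numerically by checking the constraint (1 - eps) * P + exp(-lam * P) / lam <= (eps - 1) * S * N + 1 / lam for candidate production intervals. In the sketch below, `production_interval_feasible` is a hypothetical helper; S = 10 ms, P = 10 s, and N = 3 policies come from this talk, while eps = 0.1 and lam = 0.01 per second are assumed values chosen only for illustration.

```python
import math


def production_interval_feasible(P, S, N, eps, lam):
    """Check whether production interval P satisfies the performance bound.

    P, S are in seconds; N is the number of policies sampled;
    eps is the allowed fraction of lost work; lam is the decay rate (1/seconds).
    """
    lhs = (1 - eps) * P + math.exp(-lam * P) / lam
    rhs = (eps - 1) * S * N + 1 / lam
    return lhs <= rhs


S, N = 0.01, 3         # sampling interval and number of policies
eps, lam = 0.1, 0.01   # assumed bound and decay rate

too_short = production_interval_feasible(0.01, S, N, eps, lam)
in_region = production_interval_feasible(10.0, S, N, eps, lam)
too_long = production_interval_feasible(1000.0, S, N, eps, lam)
```

With these assumed values the 10 second interval is feasible, while 10 milliseconds is too short to amortize sampling and 1000 seconds is too long to bound the cost of running a stale choice.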

Dynamic Feedback: Implementation
  Code Generation
  Measuring Policy Overhead
  Interval Selection
  Interval Expiration
  Policy Switch

Code Generation
  Statically Generate Different Code Versions for Each Policy
  Alternative: Dynamic Code Generation
  Advantages of Static Code Generation:
    Simplicity of Implementation
    Fast Policy Switching
  Potential Drawback of Static Code Generation:
    Code Size (In Practice Not a Problem)

Measuring Policy Overhead
  Sources of Overhead
    Locking Overhead
    Waiting Overhead
  Compute Locking Overhead
    Count Number of Executed Acquire/Release Constructs
  Estimate Waiting Overhead
    Count Number of Spins on Locks Waiting to be Released
  Sampled Overhead = ((Number of Acquire/Release x Acquire/Release Execution Time) + (Number of Spins x Spin Time)) / Sampling Time
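The sampled overhead is the fraction of the sampling interval spent on locking and waiting. A minimal sketch follows; the function name, parameter names, and the example counts and per-operation times are assumptions, not measurements from the talk.

```python
def sampled_overhead(num_acquire_release, acquire_release_time,
                     num_spins, spin_time, sampling_time):
    """Return the fraction of the sampling interval spent on lock overhead.

    All times are in seconds.
    """
    locking = num_acquire_release * acquire_release_time  # locking overhead
    waiting = num_spins * spin_time                       # waiting overhead
    return (locking + waiting) / sampling_time


# Hypothetical counters from one 10 millisecond sampling interval.
overhead = sampled_overhead(
    num_acquire_release=1000, acquire_release_time=2e-6,
    num_spins=500, spin_time=1e-6,
    sampling_time=0.01,
)
```

With these made-up counters, 2 ms of locking plus 0.5 ms of waiting in a 10 ms interval gives a sampled overhead of one quarter.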

Interval Selection and Expiration
  Fixed Interval Values
    Sampling Interval: 10 milliseconds
    Production Interval: 10 seconds
    Good Results for Wide Range of Interval Values
  Polling Code for Expiration Detection
    Location: Back Edges of Parallel Loop
    Advantage: Low Overhead
    Disadvantage: Potential Interaction with Iteration Size
[Figure: Atomic Operations and Polling Points]

Policy Switch
  Synchronous
    Processors Poll Timer to Detect Interval Expiration
    Barrier At End of Each Interval
  Advantages:
    Consistent Transitions
    Clean Overhead Measurements
  Disadvantages:
    Need to Synchronize All Processors
    Potential Idle Time At Barrier
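A sketch of a synchronous policy switch using Python threads in place of processors. Each worker polls a timer on the back edge of its loop and waits at a barrier when the interval expires; exactly one thread then switches the policy before any worker resumes. `run_intervals` and its parameters are hypothetical, and the round-robin switch is a simplification standing in for selection by measured overhead.

```python
import threading
import time


def run_intervals(num_workers=4, num_intervals=3, interval_len=0.005):
    """Run workers through timed intervals with a barrier at each interval end."""
    policies = ["original", "bounded", "aggressive"]
    state = {"policy": 0, "deadline": time.monotonic() + interval_len}
    switches = []

    def switch_policy():
        # Executed by exactly one thread while all others wait at the barrier,
        # so every worker sees the same policy when it resumes.
        state["policy"] = (state["policy"] + 1) % len(policies)
        switches.append(policies[state["policy"]])
        state["deadline"] = time.monotonic() + interval_len

    barrier = threading.Barrier(num_workers, action=switch_policy)
    iterations = [0] * num_workers

    def worker(wid):
        for _ in range(num_intervals):
            while True:
                iterations[wid] += 1  # one loop iteration of "work"
                # Polling point on the loop back edge.
                if time.monotonic() >= state["deadline"]:
                    break
            barrier.wait()  # workers may idle here until the slowest arrives

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return switches, iterations
```

The barrier gives the consistent transitions listed above, at the cost of idle time for workers that detect expiration early.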

Experimental Results
  Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]
  Set of Complete Scientific Applications
    Barnes-Hut N-Body Solver (1500 lines of C++)
    Liquid Water Simulation Code (1850 lines of C++)
    Seismic Modeling String Code (2050 lines of C++)
  Different Lock Coarsening Policies
  Dynamic Feedback
  Performance on Stanford DASH Multiprocessor

Code Sizes
[Figure: Size of Text Segment (Kbytes, axis 0 to 60) for the Serial, Original, and Dynamic versions of Barnes-Hut, Water, and String]

Lock Overhead
  Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks
[Figure: Percentage Lock Overhead (axis 0 to 60) for Barnes-Hut (16K Particles) and Water (512 Molecules) under Original, Bounded, and Aggressive, and for String (Big Well Model) under Original and Aggressive]

Contention Overhead
  Contention Percentage: Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors
[Figure: Contention Percentage (axis 0 to 100) vs. Processors (axis 0 to 16) under Original, Bounded, and Aggressive for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model)]

Performance Results: Barnes-Hut
[Figure: Barnes-Hut on DASH (16K Particles); Speedup (axis 0 to 16) vs. Number of Processors (axis 0 to 16) for Ideal, Aggressive, Dynamic Feedback, Bounded, and Original]

Performance Results: Water
[Figure: Water on DASH (512 Molecules); Speedup (axis 0 to 16) vs. Number of Processors (axis 0 to 16) for Ideal, Bounded, Original, Aggressive, and Dynamic Feedback]

Performance Results: String
[Figure: String on DASH (Big Well Model); Speedup (axis 0 to 16) vs. Number of Processors (axis 0 to 16) for Ideal, Original, Aggressive, and Dynamic Feedback]

Summary
  Code Size Is Not An Issue
  Lock Coarsening Has Significant Performance Impact
  Best Lock Coarsening Policy Varies With Application
  Dynamic Feedback Delivers Code With Performance Comparable to The Best Static Lock Coarsening Policy

Related Work
  Adaptive Execution Techniques (Saavedra Park:PACT96)
  Dynamic Dispatch Optimizations (Hölzle Ungar:PLDI94)
  Dynamic Code Generation (Engler:PLDI96)
  Profiling (Brewer:PPoPP95)
  Synchronization Optimizations (Plevyak et al:POPL95)

Conclusions
  Dynamic Feedback
    Generated Code Adapts to Different Execution Environments
  Integration with Parallelizing Compiler
    Irregular Object-Based Programs
    Pointer-Based Linked Data Structures
    Commutativity Analysis
  Evaluation with Three Complete Applications
    Performance Comparable to Best Hand-Tuned Optimization

BACKUP SLIDES

Performance Results: Barnes-Hut
[Figure: Barnes-Hut (16K Particles); Speedup (axis 0 to 16) vs. Number of Processors (axis 0 to 16) for Ideal, Aggressive, Bounded, and Original]

Performance Results: Water
[Figure: Water (512 Molecules); Speedup (axis 0 to 16) vs. Number of Processors (axis 0 to 16) for Ideal, Aggressive, Bounded, and Original]

Performance Results: String
[Figure: String (Big Well Model); Speedup (axis 0 to 16) vs. Number of Processors (axis 0 to 16) for Ideal, Original, and Aggressive]

Policy Switch
[Figure: timeline of Policy 1 and Policy 2 intervals delimited by Timer Expires events]

Motivation
  Challenges:
    Match Best Implementation to Environment
    Heterogeneous and Mobile Systems
  Goal: Develop Mechanisms to Support Code that Adapts to Environment Characteristics
  Technique: Dynamic Feedback

Overhead for Barnes-Hut
[Figure: Barnes-Hut on DASH (8 Processors), FORCES Loop, Data Set: 16K Particles; Sampled Overhead (axis 0 to 0.5) vs. Execution Time (axis 0 to 25 Seconds) for Original, Aggressive, and Bounded]

Overhead for Water
[Figure: Water on DASH (8 Processors), INTERF Loop, Data Set: 512 Molecules; Sampled Overhead (axis 0 to 0.5) vs. Execution Time (axis 0 to 60 Seconds) for Original and Bounded]

Overhead for Water
[Figure: Water on DASH (8 Processors), POTENG Loop, Data Set: 512 Molecules; Sampled Overhead (axis 0 to 1) vs. Execution Time (axis 0 to 60 Seconds) for Aggressive and Original]

Overhead for String
[Figure: String on DASH (8 Processors), PROJFWD Loop, Data Set: Big Well; Sampled Overhead (axis 0 to 1) vs. Execution Time (axis 0 to 500 Seconds) for Aggressive and Original]

Dynamic Feedback
[Figure: Overhead vs. Time for the Aggressive, Original, and Bounded code versions across a Sampling Phase, a Production Phase, and the next Sampling Phase]
