1
Effective Fine-Grain Synchronization For Automatically Parallelized Programs Using Optimistic Synchronization Primitives
Martin Rinard
University of California, Santa Barbara
2
Problem: Efficiently Implementing Atomic Operations On Objects
Key Issue: Mutual Exclusion Locks Versus Optimistic Synchronization Primitives
Context: Parallelizing Compiler For Irregular Object-Based Programs
  Linked Data Structures
  Commutativity Analysis
3
Talk Outline
  Histogram Example
  Advantages and Limitations of Optimistic Synchronization
  Synchronization Selection Algorithm
  Experimental Results
4
Histogram Example

class histogram {
private:
  int counts[N];
public:
  void update(int i) { counts[i]++; }
};

parallel for (i = 0; i < iterations; i++) {
  int c = f(i);
  h->update(c);
}

[Figure: histogram with bar counts 3 7 4 1 2 0 5 8]
5
Cloud Of Parallel Histogram Updates

[Figure: iterations 0 through 8 all updating the shared histogram concurrently]

Updates Must Execute Atomically
6
One Lock Per Object

class histogram {
private:
  int counts[N];
  lock mutex;
public:
  void update(int i) {
    mutex.acquire();
    counts[i]++;
    mutex.release();
  }
};

Problem: False Exclusion
7
One Lock Per Item

class histogram {
private:
  int counts[N];
  lock mutex[N];
public:
  void update(int i) {
    mutex[i].acquire();
    counts[i]++;
    mutex[i].release();
  }
};

Problem: Memory Consumption
8
Optimistic Synchronization
  Load Old Value
  Compute New Value Into Local Storage
  Commit Point:
    No Write Between Load and Commit - Commit Succeeds, Write New Value
    Write Between Load and Commit - Commit Fails, Retry Update

[Figure: histogram with one update attempting to commit its new value]
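A minimal sketch of this load/compute/commit-or-retry pattern, written with C++11 std::atomic compare-and-swap rather than the LL/SC primitives used in the talk; the function name optimistic_increment is illustrative only, not generated compiler code.

#include <atomic>

// Load / compute / commit-or-retry sketch using compare-and-swap,
// a portable stand-in for LL/SC.
void optimistic_increment(std::atomic<int>& count) {
  int old_value = count.load();       // Load Old Value
  int new_value;
  do {
    new_value = old_value + 1;        // Compute New Value Into Local Storage
    // Commit Point: succeeds only if no write landed between the load
    // and the commit; on failure, old_value is refreshed and we retry.
  } while (!count.compare_exchange_weak(old_value, new_value));
}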
9
Parallel Updates With Optimistic Synchronization

[Figure: two concurrent updates each load the old value and compute a new value into local storage; one commit succeeds and writes the new value, the other commit fails and retries the update]
10
Optimistic Synchronization In Modern Processors
  Load Linked (LL) - Used To Load Old Value
  Store Conditional (SC) - Used To Commit New Value

Atomic Increment Using Optimistic Synchronization Primitives

retry:  LL    $2, 0($4)      # Load Old Value
        addiu $3, $2, 1      # Compute New Value Into Local Storage
        SC    $3, 0($4)      # Attempt To Store New Value
        beq   $3, 0, retry   # Retry If Failure
11
Optimistically Synchronized Histogram

class histogram {
private:
  int counts[N];
public:
  void update(int i) {
    int new_count;
    do {
      new_count = LL(counts[i]);
      new_count++;
    } while (!SC(new_count, counts[i]));
  }
};
12
Aspects of Optimistic Synchronization
  Advantages
    Slightly More Efficient Than Locked Updates
    No Memory Overhead
    No Data Cache Overhead
    Potentially Fewer Memory Consistency Requirements
  Advantages In Other Contexts
    No Deadlock, No Priority Inversions, No Lock Convoys
  Limitations
    Existing Primitives Support Only Single Word Updates
    Each Update Must Be Synchronized Individually
    Lack of Fairness
13
Synchronization In Automatically Parallelized Programs

Serial Program -> (Commutativity Analysis) -> Unsynchronized Parallel Program -> (Synchronization Selection) -> Synchronized Parallel Program

Assumption: Operations Execute Atomically
Requirement: Correctly Synchronize Atomic Operations
Goal: Choose An Efficient Synchronization Mechanism For Each Operation
14
Atomicity Issues In Generated Code

Serial Program -> (Commutativity Analysis) -> Unsynchronized Parallel Program -> (Synchronization Selection) -> Synchronized Parallel Program

Assumption: Operations Execute Atomically
Requirement: Correctly Synchronize Atomic Operations
Goal: Choose An Efficient Synchronization Mechanism For Each Operation
15
Use Optimistic Synchronization Whenever Possible
16
Model Of Computation

Objects With Instance Variables

class histogram {
private:
  int counts[N];
};

Operations Update Objects By Modifying Instance Variables

void histogram::update(int i) { counts[i]++; }

[Figure: h->update(1) changes the object's counts from 4 2 5 to 4 3 5]
17
Commutativity Analysis
  Compiler Computes Extent Of Computation
    Representation of All Operations in Computation
    In Example: { histogram::update }
  Do All Pairs Of Operations Commute?
    No - Generate Serial Code
    Yes - Automatically Generate Parallel Code
    In Example: h->update(i) and h->update(j) commute for all i, j (see the sketch below)
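The commutativity property itself can be illustrated with a small dynamic check; the compiler proves it statically, and the simplified struct and test harness below (with N fixed at 8) are hypothetical, not part of the talk.

#include <array>
#include <cassert>

const int N = 8;

// Simplified histogram with public state so the test can compare it.
struct histogram {
  std::array<int, N> counts{};
  void update(int i) { counts[i]++; }
};

// Two updates commute if applying them in either order yields the same state.
bool updates_commute(int i, int j) {
  histogram a, b;
  a.update(i); a.update(j);   // order 1
  b.update(j); b.update(i);   // order 2
  return a.counts == b.counts;
}

int main() {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      assert(updates_commute(i, j));   // holds for all i, j
  return 0;
}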
18
Synchronization Requirements
  Traditional Parallelizing Compilers
    Parallelize Loops With Independent Iterations
    Barrier Synchronization
  Commutativity Analysis
    Parallel Operations May Update Same Object
    For Generated Code To Execute Correctly, Operations Must Execute Atomically
    Code Generation Algorithm Must Insert Synchronization
19
Default Synchronization Algorithm

class histogram {
private:
  int counts[N];
  lock mutex;            // One Lock Per Object
public:
  void update(int i) {
    mutex.acquire();     // Operations Acquire and Release Lock
    counts[i]++;
    mutex.release();
  }
};
20
Synchronization Constraints

Operation:
  counts[i] = counts[i]+1;
Synchronization Constraint:
  Can Use Optimistic Synchronization - Read/Compute/Write Update To A Single Instance Variable

Operation:
  temp = counts[i];
  counts[i] = counts[j];
  counts[j] = temp;
Synchronization Constraint:
  Must Use Lock Synchronization - Updates Involve Multiple Interdependent Instance Variables
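A hypothetical class sketching these two update shapes; pair_histogram, increment, and swap are illustrative names, and std::mutex stands in for the deck's lock type.

#include <mutex>

class pair_histogram {
private:
  static const int N = 8;
  int counts[N] = {};
  std::mutex mutex;               // needed only because of swap
public:
  // Single-word read/compute/write of one instance variable:
  // eligible for optimistic synchronization (LL/SC retry loop).
  void increment(int i) { counts[i]++; }

  // Update of multiple interdependent instance variables:
  // must be lock synchronized.
  void swap(int i, int j) {
    std::lock_guard<std::mutex> guard(mutex);
    int temp = counts[i];
    counts[i] = counts[j];
    counts[j] = temp;
  }
};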
21
Synchronization Selection Constraints
  Can Use Optimistic Synchronization Only For Single Word Updates That
    Read An Instance Variable
    Compute A New Value That Depends On No Other Updated Instance Variable
    Write New Value Back Into Instance Variable
  All Updates To Same Instance Variable Must Use Same Synchronization Mechanism
22
Synchronization Selection Algorithm
  Operates At Granularity Of Instance Variables
  Compiler Scans All Updates To Each Instance Variable
    If All Updates Can Use Optimistic Synchronization, Instance Variable Is Marked Optimistically Synchronized
    If At Least One Update Must Use Lock Synchronization, Instance Variable Is Marked Lock Synchronized
  If A Class Has A Lock Synchronized Variable, Class Is Marked Lock Synchronized
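A rough sketch of this per-instance-variable selection pass; the types UpdateSite, InstanceVariable, and ClassInfo are hypothetical stand-ins for the compiler's internal representation, not the actual implementation.

#include <string>
#include <vector>

enum class Sync { Optimistic, Lock };

// One update site; true if it is a single-word read/compute/write.
struct UpdateSite { bool single_word_read_modify_write; };

struct InstanceVariable {
  std::string name;
  std::vector<UpdateSite> updates;
  Sync sync = Sync::Optimistic;
};

struct ClassInfo {
  std::vector<InstanceVariable> variables;
  bool lock_synchronized = false;   // class is augmented with a lock if set
};

void select_synchronization(ClassInfo& cls) {
  for (InstanceVariable& var : cls.variables) {
    Sync chosen = Sync::Optimistic;
    for (const UpdateSite& u : var.updates)
      if (!u.single_word_read_modify_write)
        chosen = Sync::Lock;          // one lock-requiring update forces lock synchronization
    var.sync = chosen;
    if (chosen == Sync::Lock)
      cls.lock_synchronized = true;   // class with a lock synchronized variable
  }
}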
23
Synchronization Selection In Example

class histogram {
private:
  int counts[N];    // Optimistically Synchronized Instance Variable
public:
  void update(int i) { counts[i]++; }
};

histogram NOT Marked As Lock Synchronized Class
24
Code Generation Algorithm
  All Lock Synchronized Classes Augmented With Locks
  Operations That Update Lock Synchronized Variables Acquire and Release the Lock in the Object
  Operations That Update Optimistically Synchronized Variables Use Optimistic Synchronization Primitives
25
Optimistically Synchronized Histogram

class histogram {
private:
  int counts[N];
public:
  void update(int i) {
    int new_count;
    do {
      new_count = LL(counts[i]);
      new_count++;
    } while (!SC(new_count, counts[i]));
  }
};
26
Experimental Results
27
Methodology
  Implemented Parallelizing Compiler
  Implemented Synchronization Selection Algorithm
  Parallelized Three Complete Scientific Applications
    Barnes-Hut, String, Water
  Produced Four Versions
    Optimistic (All Updates Optimistically Synchronized)
    Item Lock (Produced By Hand)
    Object Lock
    Coarse Lock
  Used Inline Intrinsic Locks With Exponential Backoff
  Measured Performance On SGI Challenge XL
28
Time For One Update

[Charts: update time in microseconds on the Challenge XL for Locked, Optimistic, and Unsynchronized updates; one chart for a cached update (0-0.4 microsecond scale) and one for an uncached update (0-8 microsecond scale); data and lock on different cache lines]
29
Synchronization Frequency

[Chart: microseconds per synchronization for Barnes-Hut, String, and Water under the Coarse Lock, Object Lock, and Optimistic/Item Lock versions]
30
Memory Consumption For Barnes-Hut

[Chart: total memory used to store objects (MBytes, 0-50 scale) for the Optimistic, Item Lock, Object Lock, and Coarse Lock versions]
31
Memory Consumption For String

[Chart: total memory used to store objects (MBytes, 0-5 scale) for the Optimistic, Item Lock, and Object Lock versions]
32
Memory Consumption For Water

[Chart: total memory used to store objects (MBytes, 0-1.5 scale) for the Optimistic, Item Lock, Object Lock, and Coarse Lock versions]
33
Speedups For Barnes-Hut

[Charts: speedup versus number of processors (up to 24) for the Optimistic, Item Lock, Object Lock, and Coarse Lock versions]
34
Speedups For String

[Charts: speedup versus number of processors (up to 24) for the Optimistic, Item Lock, and Object Lock versions]
35
Speedups For Water

[Charts: speedup versus number of processors (up to 24) for the Optimistic, Item Lock, Object Lock, and Coarse Lock versions]
36
Acknowledgements
  Pedro Diniz - Parallelizing Compiler
  Silicon Graphics - Challenge XL Multiprocessor
  Rohit Chandra, T.K. Lakshman, Robert Kennedy, Alex Poulos - Technical Assistance With SGI Hardware and Software
37
Bottom Line
  Optimistic Synchronization Offers
    No Memory Overhead
    No Data Cache Overhead
    Reasonably Small Execution Time Overhead
    Good Performance On All Applications
  Good Choice For Parallelizing Compiler
    Minimal Impact On Parallel Program
    Simple, Robust, Works Well In Range Of Situations
  Major Drawback
    Current Primitives Support Only Single Word Updates
  Use Optimistic Synchronization Whenever Applicable
38
Future
  The Efficient Implementation Of Atomic Operations On Objects Will Become A Crucial Issue For Mainstream Software
    Small-Scale Shared-Memory Multiprocessors
    Multithreaded Applications and Libraries
    Popularity of Object-Oriented Programming
    Specific Example: Java Standard Library
  Optimistic Synchronization Primitives Will Play An Important Role