Automatic Detection of Extended Data-Race-Free Regions

1 Automatic Detection of Extended Data-Race-Free Regions
Present self + team. Previous decades: time is money → performance. Current decade: power is money → energy. This work improves energy efficiency without damaging performance. Alexandra Jimborean, Jonatan Waern, Per Ekemark, Stefanos Kaxiras (Sweden); Alberto Ros (Murcia, Spain).

2 Motivation and goal
Multi- and many-core processors and heterogeneous architectures pose challenges for scalable cache coherence. One response is to simplify or even abandon hardware cache coherence. Goal: software-hardware co-designs with compiler-assisted cache coherence.

3 Traditional cache coherence
Background: cache coherence ensures correct propagation of changes to data held in multiple private caches. Problem: not all updates to variables have to be propagated, and unnecessary propagation wastes resources. Private data does not require coherence! Solution: design a dual-mode cache coherence protocol and enable coherence on demand.

4 Private-shared data
Motivation: runtime classification versus static classification. The compiler identifies most accesses that target temporarily private data! Identify the largest period during which data remains private: extended data-race-free (xDRF) regions.

5 Outline
xDRF regions of private accesses. Examples. Compile-time xDRF analysis. xDRF regions for optimizing coherence protocols. Results: performance and energy on many-cores.

6 Data race free code as input
Data-race-free code: no data races given the synchronization (LOCK, UNLOCK).

7 Synchronization points
Data-race-free code: no data races given the synchronization. Synchronization points (LOCK, UNLOCK) divide the code into two categories.

8 DRF and nDRF regions
Data-race-free code: no data races given the synchronization. Synchronization points (LOCK, UNLOCK) divide the code into two categories: DRF regions, outside synchronized code, and nDRF regions, i.e. synchronized code. Disclaimer: DRF and nDRF are our notations. Parallel DRF regions do not share data.
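The split into DRF and nDRF regions can be sketched on a straight-line trace of operations. This is only an illustration, not the analysis from the talk: the string encoding of operations, the `split_regions` name, and the assumption that locks are not nested are all hypothetical.

```python
from typing import List, Tuple

def split_regions(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Split a straight-line operation trace into DRF and nDRF regions.

    Hypothetical encoding: "LOCK" opens synchronized code, "UNLOCK" closes it,
    and any other string is ordinary code. Nested locks are not handled.
    """
    regions: List[Tuple[str, List[str]]] = []
    current: List[str] = []
    kind = "DRF"
    for op in ops:
        if op == "LOCK":
            if current:                      # close the DRF region before the lock
                regions.append((kind, current))
            current, kind = [op], "nDRF"
        elif op == "UNLOCK":
            current.append(op)               # the unlock still belongs to the nDRF
            regions.append((kind, current))
            current, kind = [], "DRF"
        else:
            current.append(op)
    if current:                              # trailing code after the last unlock
        regions.append((kind, current))
    return regions

trace = ["a = 1", "LOCK", "counter += 1", "UNLOCK", "b = a + 1"]
regions = split_regions(trace)
```

For this trace the result is a DRF region, an nDRF region holding the locked increment, and a second DRF region.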

9 Happens-before syncs are xDRF boundaries
Data flow across a synchronization point → the sync enforces a happens-before relation. Happens-before syncs are xDRF boundaries.

10 Lock-set sync
No data flow across a synchronization point → the sync ensures atomicity and the nDRF is enclave. xDRF regions extend across enclave nDRFs.

11 Lock-set sync
No data flow across a synchronization point → the sync ensures atomicity and the nDRF is enclave. Extend the synchronization-free semantics across enclave synchronization points.
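A small illustration of a lock-set sync, written as an assumption about what such code typically looks like rather than as an example taken from the talk: each thread works on its own data, and the lock only makes the shared counter update atomic. The names `worker`, `hits`, and `results` are hypothetical.

```python
import threading

lock = threading.Lock()
hits = 0  # shared counter, only ever touched inside the lock

def worker(data, out, idx):
    global hits
    partial = sum(data)      # DRF code before the sync: private data only
    with lock:               # nDRF: lock-set sync, provides atomicity
        hits += 1
    out[idx] = partial * 2   # DRF code after the sync: reads nothing that
                             # another thread wrote before its own sync

inputs = [[1, 2, 3], [4, 5, 6]]
results = [0] * len(inputs)  # each thread writes only its own slot
threads = [threading.Thread(target=worker, args=(d, results, i))
           for i, d in enumerate(inputs)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because no data written in one thread's DRF code is consumed by another thread's DRF code across the lock, the locked block is the kind of nDRF the slides call enclave, and the code before and after it could be treated as one xDRF region.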

12 Data-flow across sync
xDRF extends across sync. Conflict: the sync is an xDRF boundary → different xDRF regions.

13 Data-flow across sync
Signal-wait with locks: data flows across the sync, which breaks the xDRF region.
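A signal-wait built with locks can be sketched with a condition variable. This is an illustrative assumption, not code from the talk; `box`, `ready`, `producer`, and `consumer` are hypothetical names. The value written before the signal is read after the wait, so data flows across the sync.

```python
import threading

cond = threading.Condition()
ready = False
box = {}

def producer():
    global ready
    box["value"] = 42            # DRF code before the sync writes shared data
    with cond:                   # nDRF: signal
        ready = True
        cond.notify()

def consumer(result):
    with cond:                   # nDRF: wait
        while not ready:
            cond.wait()
    result.append(box["value"])  # DRF code after the sync reads that data

result = []
c = threading.Thread(target=consumer, args=(result,))
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
```

Here the sync enforces a happens-before relation between the write and the read, so the regions on either side cannot be merged into one xDRF region.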

14 No data-flow across sync
Signal-wait with locks but no data flow across the sync: the nDRF is enclave in the xDRF region.

15 Data-flow across transitive sync
Transitive synchronization. Diagram labels: xDRF, Conflict.

16 xDRF static analysis
Distinguish between nDRF regions that represent xDRF boundaries and nDRF regions that are enclave in xDRF regions.

17 Identify nDRF regions
Step 1, nDRF regions: identify synchronization points (nDRFs). Check the sync variable of each nDRF.

18 Matching nDRF regions
Step 1, nDRF regions: identify nDRFs that sync with one another. Matching nDRFs sync on the same synchronization variable.
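Matching nDRFs by synchronization variable can be sketched as a simple grouping step. This is a minimal sketch under strong assumptions: the `NDRF` record and `match_ndrfs` function are hypothetical, and sync variables are compared by name, whereas a real compiler pass would presumably need alias information to decide whether two nDRFs use the same variable.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class NDRF(NamedTuple):
    ident: int      # hypothetical identifier of the nDRF region
    sync_var: str   # hypothetical name of the lock or sync variable it uses

def match_ndrfs(ndrfs: List[NDRF]) -> Dict[str, List[int]]:
    """Group nDRF identifiers by the synchronization variable they use."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for n in ndrfs:
        groups[n.sync_var].append(n.ident)
    return dict(groups)

matches = match_ndrfs([NDRF(1, "lockA"), NDRF(2, "lockB"), NDRF(3, "lockA")])
```

In this example nDRFs 1 and 3 match each other because both sync on `lockA`, while nDRF 2 has no match.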

19 DRF paths
Step 2, DRF regions: around the currently analyzed nDRF, the surrounding nDRFs are those not yet processed or previously identified as xDRF boundaries. DRF region before = union of DRF paths before. DRF region after = union of DRF paths after. Example: DRF-before = {1,2}, DRF-after = {3}.
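One way to picture "union of DRF paths" is a graph walk from the analyzed nDRF that stops at the surrounding nDRFs. The sketch below is an assumption about how this could be computed, not the algorithm from the paper; the graph encoding, the node names "A", "N", "B", and the helpers `invert` and `reach` are hypothetical.

```python
from typing import Dict, List, Set

def invert(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn successor edges into predecessor edges."""
    pred: Dict[str, List[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            pred.setdefault(dst, []).append(src)
    return pred

def reach(start: str, edges: Dict[str, List[str]], stops: Set[str]) -> Set[str]:
    """Collect all nodes reachable from start without passing through a stop node."""
    seen: Set[str] = set()
    work = list(edges.get(start, []))
    while work:
        node = work.pop()
        if node in seen or node in stops or node == start:
            continue
        seen.add(node)
        work.extend(edges.get(node, []))
    return seen

# Hypothetical control flow: nDRF "A" leads to DRF blocks 1 and 2, both reach
# the analyzed nDRF "N", which leads to DRF block 3 and then nDRF "B".
succ = {"A": ["1", "2"], "1": ["N"], "2": ["N"], "N": ["3"], "3": ["B"]}
stops = {"A", "B"}

drf_before = reach("N", invert(succ), stops)   # walk backwards from N
drf_after = reach("N", succ, stops)            # walk forwards from N
```

With this graph the walk yields the same sets as the slide example: blocks 1 and 2 before the nDRF, block 3 after it.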

20 DRF regions
Step 2, DRF regions: example with several surrounding nDRFs around the currently analyzed nDRF. DRF-before = {1,2,3}, DRF-after = {4,5}.

21 DRF regions
Step 2, DRF regions: example for the currently analyzed nDRF. DRF-before = {1,2,3}, DRF-after = {1,3}. More details in the paper.

22 Merging DRF regions
Step 2, DRF regions: with DRF-before = {1,2} and DRF-after = {3}, check two conditions. (1) No conflict between DRF-before and DRF-after. (2) No conflict between the DRFs and the current nDRF.

23 Merging DRF regions
Step 3, xDRF regions: with DRF-before = {1,2} and DRF-after = {3}, if (1) there is no conflict between DRF-before and DRF-after and (2) there is no conflict between the DRFs and the current nDRF, then the current nDRF is enclave. Merge DRF-before and DRF-after into one xDRF region.
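The two conditions can be sketched with read and write sets. This is a simplification under stated assumptions: a conflict is taken in the usual sense (two accesses to the same location, at least one of them a write), locations are plain names, and the `Accesses`, `conflict`, and `is_enclave` names are hypothetical. The actual analysis reasons about accesses across threads, which this sketch does not model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Accesses:
    reads: frozenset = frozenset()    # locations read by a region
    writes: frozenset = frozenset()   # locations written by a region

def conflict(a: Accesses, b: Accesses) -> bool:
    """True if a and b touch a common location and at least one access is a write."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)

def is_enclave(before: Accesses, ndrf: Accesses, after: Accesses) -> bool:
    """Condition 1: no conflict between DRF-before and DRF-after.
    Condition 2: no conflict between either DRF region and the current nDRF."""
    cond1 = not conflict(before, after)
    cond2 = not (conflict(before, ndrf) or conflict(after, ndrf))
    return cond1 and cond2

before = Accesses(writes=frozenset({"x"}))
ndrf = Accesses(reads=frozenset({"counter"}), writes=frozenset({"counter"}))
after_ok = Accesses(reads=frozenset({"y"}))
after_bad = Accesses(reads=frozenset({"x"}))   # reads what DRF-before wrote

enclave = is_enclave(before, ndrf, after_ok)
boundary = not is_enclave(before, ndrf, after_bad)
```

In the first case the nDRF only touches its own counter, so the regions around it can be merged; in the second case the read of `x` conflicts with the earlier write, so the nDRF stays a boundary.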

24 xDRF regions
Step 3, xDRF regions: surrounding nDRFs are not yet processed or were previously identified as non-enclave. Example for the currently analyzed nDRF: xDRF-before = {1,2}, xDRF-after = {3}. If there is no conflict, merge xDRF-before and xDRF-after into one xDRF region.

25 Transitive xDRF regions
Step 3, xDRF regions: with matching enclave nDRFs, xDRF-before = {1,2,3,4} and xDRF-after = {5}. More details in the paper.
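Merging regions transitively across enclave nDRFs behaves like computing connected components. The sketch below uses a union-find structure for this; it is a swapped-in, generic technique chosen for illustration, not the procedure described in the paper, and the `merge_regions` name and the pair encoding of enclave nDRFs are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

def merge_regions(regions: Iterable[int],
                  enclave_links: Iterable[Tuple[int, int]]) -> List[Set[int]]:
    """Merge DRF regions that are connected through enclave nDRFs.

    Each pair (a, b) means: an enclave nDRF separates regions a and b,
    so both belong to the same xDRF region.
    """
    region_list = list(regions)
    parent: Dict[int, int] = {r: r for r in region_list}

    def find(r: int) -> int:
        while parent[r] != r:
            parent[r] = parent[parent[r]]   # path halving
            r = parent[r]
        return r

    for a, b in enclave_links:
        parent[find(a)] = find(b)

    groups: Dict[int, Set[int]] = {}
    for r in region_list:
        groups.setdefault(find(r), set()).add(r)
    return sorted(groups.values(), key=min)

# Regions 1 to 4 are chained by enclave nDRFs; the nDRF between 4 and 5 is a boundary.
xdrfs = merge_regions([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4)])
```

The chain of enclave nDRFs collapses regions 1 through 4 into one xDRF region, while region 5 stays separate, mirroring the sets on the slide.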

26 xDRF in practice
Evaluation. Number of static nDRFs: 70% non-enclave (inherent to the applications), 20% enclave (automatically detected), 10% potentially enclave (oracle). Number of executed xDRF regions: enclave nDRFs are on the hot path, and the compiler approaches the oracle.

27 How are xDRF regions useful?
Compiler-assisted cache coherence protocol: deactivate coherence for xDRF accesses (temporarily private data).

28 Traditional cache coherence
Traditional c.c.: 3×N coherence actions.

29 Optimized cache coherence
Traditional c.c.: 3×N coherence actions. Optimized c.c.: N+1 coherence actions.

30 xDRF cache coherence
Traditional c.c.: 3×N coherence actions. Optimized c.c.: N+1. xDRF c.c.

31 Performance – coherence protocol
Evaluation, normalized to a traditional protocol: DRF only -4%, compiler 7%, oracle 8%. The compiler is competitive with the oracle.

32 Energy savings – coherence protocol
Evaluation: DRF only -10%, compiler 12%, oracle 16%. The compiler is competitive with the oracle.

33 Conclusions
Compiler techniques to detect xDRF regions. xDRF regions enable cache coherence optimizations. Performance: 7%; energy savings: 12%.

34 Automatic Detection of Extended Data-Race-Free Regions
Thank you! Travelling to this conference was financed by the Swedish Wenner-Gren foundation.

35 Future work
Enable compiler optimizations over xDRF regions. Typical compiler optimizations for multi-threaded code operate within synchronization-free regions. xDRF extends the scope of these optimizations, making them more effective.

36 Future work
xDRF limits: barriers, signal-waits, joins. Instead of splitting xDRF regions upon a conflict, place nDRF markings around the conflicting accesses. Any shared (conflicting) accesses are handled as nDRF → larger xDRF regions.

