University of Michigan Electrical Engineering and Computer Science: Dynamic Acceleration of Multithreaded Program Critical Paths in Near-Threshold Systems

1 Dynamic Acceleration of Multithreaded Program Critical Paths in Near-Threshold Systems
Hyoun Kyu Cho and Scott Mahlke
University of Michigan, Ann Arbor (Electrical Engineering and Computer Science)
December 2, 2012

2 Critical Path
- Longest path between source and sink in a DAG (directed acyclic graph)
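The definition above can be illustrated with a minimal C sketch that computes the longest source-to-sink path length in a weighted DAG. The `struct edge` layout, the `MAX_NODES` bound, the function name `critical_path_length`, and the requirement that nodes be numbered in topological order are all assumptions made for this illustration; they are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64  /* assumed upper bound on graph size for this sketch */

/* Hypothetical edge type: a directed edge from -> to with a non-negative weight. */
struct edge {
    int from;
    int to;
    long weight;
};

/*
 * Returns the length of the longest path from source to sink, or -1 if the
 * sink is unreachable. Assumes nodes are numbered in topological order,
 * i.e. every edge goes from a lower index to a higher index.
 */
long critical_path_length(int num_nodes, const struct edge *edges,
                          size_t num_edges, int source, int sink)
{
    long dist[MAX_NODES];

    assert(num_nodes > 0 && num_nodes <= MAX_NODES);
    assert(source >= 0 && source < num_nodes);
    assert(sink >= 0 && sink < num_nodes);

    for (int v = 0; v < num_nodes; v++)
        dist[v] = -1;               /* -1 marks "not reachable from source" */
    dist[source] = 0;

    /* Visit nodes in topological order and relax every outgoing edge. */
    for (int u = 0; u < num_nodes; u++) {
        if (dist[u] < 0)
            continue;
        for (size_t e = 0; e < num_edges; e++) {
            if (edges[e].from != u)
                continue;
            assert(edges[e].to > u && edges[e].to < num_nodes);
            assert(edges[e].weight >= 0);
            if (dist[u] + edges[e].weight > dist[edges[e].to])
                dist[edges[e].to] = dist[u] + edges[e].weight;
        }
    }
    return dist[sink];
}
```

Because the nodes are already in topological order, a single pass suffices; a general implementation would first run a topological sort.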

3 Critical Path [Saidi`08]
[Figure: example weighted DAG]

4 Critical Path for Multithreaded Programs [Hollingsworth`98]
[Figure: (a) Mutex Lock, threads T1 and T2, events Call, StartLock, EndLock, Unlock; (b) Barrier, threads T1, T2, T3, events Call, ArBarrier, LvBarrier]

5 Scalability of Multithreaded Programs
- Some benchmarks do not scale very well!

6 CPU Time Wasted on Synchronizations
- Synchronization is a major bottleneck!

7 Arrival Time Variation

8 Accelerating Critical Path
- ACS [Suleman et al. ASPLOS `09]
  - Critical sections
- Voltage Boosting [Dreslinski `11]
  - Transactional bottlenecks
- Booster [Miller et al. HPCA `12]
  - Alleviate performance variation
  - Reactive acceleration for barriers

9 Challenges and Opportunities of NTC (near-threshold computing)
- Poor single-thread performance
- Very sensitive to process variation
  - Running every core at the speed of the slowest one leads to severe performance loss
  - Cores are likely to have heterogeneous performance
- Potential for bigger frequency boosting

10 Objectives
- Systematic way of identifying critical paths
- Dealing with performance variation
- Flexible control of core boosting

11 System Architecture
[Figure: system architecture split into offline and online parts. Labels: Target Program, Compilation, Intermediate Representation, Parallelism Analysis, Monitor instrumentation, Instrumented Executable, Monitoring Logic, Observe, Adjust Priority, Schedule, Weighted Probabilistic Priority Scheduler]

12 Lottery Scheduling [Waldspurger`94]
- Each thread holds a number of tickets
- The scheduler selects the fast-mode thread by picking a ticket at random
- Efficient implementation of proportional-share resource management
- Responsive, flexible control over relative execution rate
- Example: threads hold 10, 2, 5, 1, 2 tickets (total = 20); random draw in [0..19] = 15
  - ∑ = 10, ∑ > 15? no
  - ∑ = 12, ∑ > 15? no
  - ∑ = 17, ∑ > 15? yes, so the third thread wins
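The ticket-picking walk in the lottery example can be sketched in C as follows. The function names `lottery_pick` and `lottery_draw` and the flat array of ticket counts are assumptions for illustration, and `rand()` stands in for whatever random source a real scheduler would use (the modulo introduces a slight bias that a production scheduler would avoid).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Returns the index of the thread that holds ticket number `draw`.
 * Walks the threads in order, accumulating ticket counts, and stops at the
 * first thread whose running sum exceeds the drawn number.
 * `draw` must lie in [0, total tickets).
 */
size_t lottery_pick(const unsigned *tickets, size_t num_threads, unsigned draw)
{
    unsigned running_sum = 0;

    for (size_t i = 0; i < num_threads; i++) {
        running_sum += tickets[i];
        if (running_sum > draw)
            return i;
    }
    assert(0 && "draw must be smaller than the total number of tickets");
    return num_threads;  /* only reached if assertions are disabled */
}

/* Draws a random ticket and returns the index of the winning thread. */
size_t lottery_draw(const unsigned *tickets, size_t num_threads)
{
    unsigned total = 0;

    for (size_t i = 0; i < num_threads; i++)
        total += tickets[i];
    assert(total > 0);

    return lottery_pick(tickets, num_threads, (unsigned)rand() % total);
}
```

With ticket counts 10, 2, 5, 1, 2 and a draw of 15, `lottery_pick` stops at the third thread (index 2), matching the running sums 10, 12, 17 on the slide. A thread's chance of winning is proportional to its share of the tickets, which is what makes ticket adjustments an effective priority control.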

13 Progress Monitoring
- Targets data-parallel threads
- Slower threads are more likely to be on the critical path
- Divide each task into multiple smaller chunks and instrument it with monitoring code
- The monitoring code reduces the thread's number of tickets

14 Example of Progress Monitoring

    …
    pthread_barrier_wait(barrier);
    long PROGRESS_GRANULE = (k2 - k1) / NUM_STEPS;
    for ( i = k1 ; i < k2 ; i++ ) {
        /* loop body */
        float x_cost = dist(points->p[i], points->p[x], points->dim)
                       * points->p[i].weight;
        float current_cost = points->p[i].cost;
        if ( x_cost < current_cost ) {
            switch_membership[i] = 1;
            cost_of_opening_x += x_cost - current_cost;
        } else {
            int assign = points->p[i].assign;
            lower[center_table[assign]] += current_cost - x_cost;
        }
        /* inserted monitoring code: runs once per PROGRESS_GRANULE iterations */
        if ( (i - k1) % PROGRESS_GRANULE == 0 )
            halve_priority_tickets();
    }
    pthread_barrier_wait(barrier);
    …
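The slides do not show how `halve_priority_tickets()` is implemented. The following is a minimal sketch of one possible ticket-halving scheme, assuming a per-thread ticket counter that is reset at each barrier phase. The struct, the names `reset_tickets` and `halve_tickets`, the explicit pointer argument, and the constants `INITIAL_TICKETS` and `MIN_TICKETS` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define INITIAL_TICKETS 1024u  /* assumed allotment at the start of a phase */
#define MIN_TICKETS 1u         /* assumed floor so a thread can still be picked */

/* Hypothetical per-thread priority state. */
struct thread_priority {
    unsigned tickets;
};

/* Called at the start of a barrier phase: every thread starts out equal. */
void reset_tickets(struct thread_priority *p)
{
    assert(p != NULL);
    p->tickets = INITIAL_TICKETS;
}

/*
 * Called each time the thread finishes one progress granule. Threads that are
 * further along hold fewer tickets, so lagging threads are more likely to win
 * the lottery for the fast-mode core.
 */
void halve_tickets(struct thread_priority *p)
{
    assert(p != NULL);
    if (p->tickets > MIN_TICKETS)
        p->tickets /= 2;
}
```

Under these assumptions, a thread that has completed four granules holds 64 tickets while a thread that has completed only one holds 512, so the slower thread is eight times more likely to be selected for boosting.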

15 Priority Delegation
- A thread holding a mutex is likely to be on the critical path
  - Increase its tickets when it acquires the mutex
- It is even more likely to be on the critical path if other threads are waiting
  - Temporarily transfer the waiting threads' tickets to the thread holding the mutex
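The two delegation rules above can be sketched as ticket bookkeeping in C. This is an illustration under stated assumptions, not the authors' implementation: the struct layout, the function names, and the `MUTEX_HOLDER_BONUS` constant are hypothetical, and the sketch omits the locking that a real runtime would need around these updates.

```c
#include <assert.h>
#include <stddef.h>

#define MUTEX_HOLDER_BONUS 16u  /* assumed ticket boost for holding a mutex */

/* Hypothetical per-thread ticket account. */
struct thread_tickets {
    unsigned own;       /* tickets the thread currently holds itself */
    unsigned borrowed;  /* tickets delegated to it by waiting threads */
    unsigned lent;      /* tickets it has delegated to a mutex holder */
};

/* Tickets that count in the scheduler's lottery for this thread. */
unsigned effective_tickets(const struct thread_tickets *t)
{
    return t->own + t->borrowed;
}

/* Rule 1: boost a thread when it acquires a mutex, undo on release. */
void on_mutex_acquired(struct thread_tickets *holder)
{
    holder->own += MUTEX_HOLDER_BONUS;
}

void on_mutex_released(struct thread_tickets *holder)
{
    assert(holder->own >= MUTEX_HOLDER_BONUS);
    holder->own -= MUTEX_HOLDER_BONUS;
}

/* Rule 2: a blocked waiter temporarily hands its tickets to the holder. */
void delegate_tickets(struct thread_tickets *holder, struct thread_tickets *waiter)
{
    assert(waiter->lent == 0);  /* a waiter delegates to one holder at a time */
    waiter->lent = waiter->own;
    holder->borrowed += waiter->lent;
    waiter->own = 0;
}

/* When the waiter stops waiting, it takes its tickets back. */
void reclaim_tickets(struct thread_tickets *holder, struct thread_tickets *waiter)
{
    assert(holder->borrowed >= waiter->lent);
    holder->borrowed -= waiter->lent;
    waiter->own += waiter->lent;
    waiter->lent = 0;
}
```

Keeping borrowed tickets separate from a thread's own tickets makes the transfer easy to reverse, which matches the "temporarily transfer" wording on the slide.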

16 Performance Evaluation
- Post-processing of traces
  - Generated on a 32-core machine
    - Four 8-core Intel Xeon X7560
    - 24 MB L3 cache per chip
    - 32 GB total memory
  - Augmented progress time indication
- 1.5x, 2x, 5x, 10x acceleration for 1 fast-mode core
- Scheduling quantum varied from 1 us to 1 ms

17 Speedup for Streamcluster
[Figure: speedup chart; labels: H/W, OS, User mode]

18 Current Status
[Figure: system architecture diagram. Labels: Target Program, Compilation, Intermediate Representation, Parallelism Analysis, Monitor instrumentation, Instrumented Executable, Monitoring Logic, Observe, Adjust Priority, Schedule, Weighted Probabilistic Priority Scheduler; cores labeled Normal, Turbo, Normal, Turbo]

19 Conclusion & Future Work
- Introduces a S/W framework that improves multithreaded programs’ performance using core boosting
- Combines static analysis, dynamic monitoring, and probabilistic priority scheduling to predict critical paths
- Shows 5% to 27% performance improvement for streamcluster
- Future work: better modeling of the tradeoff between performance and energy
- Future work: predicting critical paths for other types of parallelism

20 Thank you!

