1. Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors
Guihai Yan (1), Xiaoyao Liang (2), Yinhe Han (1), and Xiaowei Li (1)
(1) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)
(2) NVIDIA Corporation
Jun. 23, 2010

2. Outline
- Introduction to PVT variations
- Analyzing the "complementary effect"
  - Time domain
  - Frequency domain
- Implementation challenges and solutions
- Experimental results

3. Introduction to variations
- Variation sources
  - Process variation
    - Random dopant fluctuation
    - Sub-wavelength lithography
  - Voltage variation
    - Parasitic power delivery networks
    - Application variability
    - Inductive noise, IR-drop
  - Temperature variation
    - Imbalanced activity
    - Hotspots
- We focus on the primary manifestation: performance variation

4. Process variation
- Sub-wavelength lithography: "What you get is not what you want" (systematic)
- Random dopant fluctuations: Vth variation (random)
- Maximum frequency differs by 20%! [Teodorescu, ISCA'08]
- P variation is time-independent: the "DC component"
Figure: sub-wavelength lithography [Borkar, DAC'09] [Aitken, ATS'07]

5. Temperature variation
- Application-specific
- Slow-varying: milliseconds
  - Typical thermal constant: 2 ms [Donald, ISCA'06]
- T variation is slow-varying: "low-frequency components"
Figure: measured Pentium M processor temperatures

6. Voltage variation
- Fast-changing
  - Inductive noise, a.k.a. the L(di/dt) problem
  - IR-drop
- Hierarchical PDN
- Why is it hard to keep a constant voltage level?
  - Example: power budget 100 W, working voltage 1 V, current 100 A
  - To keep the voltage fluctuation within ±5%, R_PDN < 0.5 mOhm
- V variation is fast-changing: "high-frequency components"
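
The arithmetic behind the R_PDN bound can be checked with a short sketch. It assumes only the resistive (IR) drop matters and ignores the inductive L(di/dt) term; the function name `max_pdn_resistance` is illustrative and not from the slides.

```python
def max_pdn_resistance(power_w, voltage_v, tolerance):
    """Largest PDN resistance that keeps the IR-drop within tolerance * voltage.

    Assumption: purely resistive drop; inductive noise is not modeled.
    """
    current_a = power_w / voltage_v          # I = P / V  -> 100 A
    allowed_drop_v = tolerance * voltage_v   # 5% of 1 V  -> 0.05 V
    return allowed_drop_v / current_a        # R = dV / I -> 0.0005 Ohm


r_pdn = max_pdn_resistance(power_w=100.0, voltage_v=1.0, tolerance=0.05)
print(f"R_PDN must stay below {r_pdn * 1e3:.2f} mOhm")
```

With the slide's numbers this gives 0.05 V / 100 A = 0.5 mOhm, matching the stated bound.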

7. Resultant impact of PVT variations
- Timing (delay) variation
Figure labels: fast cores, slow cores; violent apps, mild apps; high temp., low temp.

8. Prior solutions
- Strive to compensate for P, V, and T variation individually
  - Mitigate P variation: ReCycle [ISCA'06], Body Bias [Micro'07], ReVIVal [ISCA'08], and others
  - Stabilize V variation: Pipeline damping [ISCA'03], DeCoR [HPCA'08], and others
  - Balance T variation: Hotspot [ISCA'03], DVFS + Activity Migration [ISCA'03, HPCA'01, TODAES'07], and others
- Other timing-oriented solutions: Razor [JSSC'06], EVAL [Micro'08], Tribeca [Micro'09], and others

9. Our perspective
- Focus on the essential timing issue: delay variation
- Process, voltage, and temperature variations do not necessarily aggregate; in some cases they can cancel each other out. Hence, "complementary".
- Design goal: minimize delay variation
Figure: process, voltage, and temp. variations combining into delay variation

10. Some terms
- Timing emergency (TE)
- Emergency level (EL): the "density" of TEs
  - Define: EL = number of TEs per 100 million cycles
- Violent vs. mild
  - Voltage: large fluctuation = violent; small fluctuation = mild
  - Temperature: "hot" = violent; "cool" = mild
  - Process: slow corner = violent; fast corner = mild
Figure: delay vs. time with the timing emergency threshold; mild and violent voltage traces
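
The EL definition can be illustrated with a minimal sketch. It assumes, beyond what the slide states, that a timing emergency is any sample whose delay exceeds the threshold and that the trace holds one delay sample per cycle by default; `emergency_level` and its parameters are hypothetical names.

```python
def emergency_level(delays, threshold, cycles_per_sample=1):
    """Timing emergencies per 100 million cycles for one delay trace.

    Assumption: every sample above `threshold` counts as one emergency.
    """
    if not delays:
        raise ValueError("delay trace is empty")
    emergencies = sum(1 for d in delays if d > threshold)
    total_cycles = len(delays) * cycles_per_sample
    # Normalize the raw count to the slide's unit of 100 million cycles.
    return emergencies * 100_000_000 / total_cycles


# Delays are normalized so that 1.0 is the timing emergency threshold.
trace = [0.9, 1.1, 0.8, 1.2]
print(emergency_level(trace, threshold=1.0))
```

Normalizing to a fixed cycle count makes EL comparable between cores that were observed over windows of different lengths.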

11. How do PVT variations complement each other?
- Observation in the time domain
  - Core1 (T mild, V mild): large margin, low EL
  - Core2 (T violent, V violent): little margin, high EL
- What if we exchange the threads on Core1 and Core2?
Figure: delay vs. time against the threshold. Before the exchange: "T violent, V violent" (emergency) and "T mild, V mild" (excessive headroom). After the exchange: "T mild, V violent" and "T violent, V mild" (mild + violent).

12. Frequency domain analysis
- Y(f) = FFT(D(t))
- Sample interval: 5 ns
- Span of analysis: 1 ms
- DC component: "P"
- Low-frequency component: "T"
- High-frequency component: "V"

13. The strength of each component of PVT variations
- Migrating threads = "grafting" the V component
Figure labels: V component, PT, PT

14. Frequency domain analysis (cont.)
- Relative frequency spectrum deviations on a 2 GHz quad-core processor
  - P: 0-100 Hz, T: 100 Hz-1 MHz, V: 1 MHz-250 MHz
- Potential: Core3 and Core4 are mild
- Strategy: exchange threads on Core1 and Core4, and on Core2 and Core3
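
The band split Y(f) = FFT(D(t)) can be sketched in plain Python. This is only an illustration under stated assumptions: the trace is synthetic, the radix-2 FFT helper and the name `pvt_band_strengths` are not from the slides, and "strength" is taken here to be the summed one-sided amplitude in each band. The band edges (100 Hz and 1 MHz) and the 5 ns sample interval come from the slides; note that a 5 ns interval resolves frequencies only up to 100 MHz.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def pvt_band_strengths(delay, dt, t_lo=100.0, v_lo=1e6):
    """Sum the one-sided amplitude spectrum of a delay trace per band.

    P: f < t_lo (in practice the DC bin), T: t_lo <= f < v_lo, V: f >= v_lo.
    """
    n = len(delay)
    if n == 0 or n & (n - 1):
        raise ValueError("trace length must be a power of two")
    spectrum = fft(delay)
    strengths = {"P": 0.0, "T": 0.0, "V": 0.0}
    for k in range(n // 2 + 1):
        freq = k / (n * dt)
        amp = abs(spectrum[k]) / n
        if 0 < k < n / 2:
            amp *= 2  # fold in the matching negative-frequency bin
        band = "P" if freq < t_lo else "T" if freq < v_lo else "V"
        strengths[band] += amp
    return strengths


# Synthetic trace: constant offset (P), slow ripple (T), fast ripple (V).
n, dt = 4096, 5e-9
trace = [
    1.0
    + 0.10 * math.sin(2 * math.pi * 4 * i / n)     # bin 4, about 195 kHz
    + 0.05 * math.sin(2 * math.pi * 512 * i / n)   # bin 512, 25 MHz
    for i in range(n)
]
print({band: round(s, 3) for band, s in pvt_band_strengths(trace, dt).items()})
```

The sketch uses a short 4096-sample window to stay fast; a full 1 ms span at 5 ns would hold 200,000 samples and give 1 kHz resolution, so the 0-100 Hz "P" band reduces to the DC bin either way.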

15. How to exploit such a "complementary effect"?
- Straightforward approach: obtain the P, V, and T components separately
  - Sources: product test, voltage sensor, temp. sensor, aging sensor, "xyz" sensor
  - Pros: conceptually simple
  - Cons: slow (V and T sensors are slow); not comprehensive (e.g., what about aging?)
- Our approach: a delay sensor-based scheme
  - The delay sensor yields the V component and the (P+T) component
  - Pros: fast; comprehensive (timing)
  - Cons: needs a little trick

16. Implementation (cont.)
- What we know: delay variation (from delay sensors)
- What we need to know: the strength of the PT and V components
- How to bridge the gap? Three challenges:
  - Inferring the PVT components from delay values
  - On-the-fly thread migration decision-making
  - On-the-fly variation prediction

17. Top view of architecture
- Timing Emergency Aware + Thread Migration (TEA-TM)

18. Inferring the PVT components from delay values
- Use the mean delay to infer the PT component (< 1 MHz)
- This simplification greatly facilitates a cost-efficient implementation of TEA-TM.
- Then, what about the "V component"?
Figure: mean delay vs. PT component
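
A minimal sketch of the mean-delay idea follows. Only the use of the mean as the PT estimate comes from the slide; treating the peak deviation from that mean as a rough stand-in for the V component is an assumption made here for illustration, and `split_pt_and_v` is a hypothetical name.

```python
def split_pt_and_v(delays):
    """Return (pt_estimate, v_proxy) for one window of delay-sensor samples.

    pt_estimate: mean delay, which averages out the fast fluctuations.
    v_proxy: peak deviation from the mean (assumed proxy, not from the slides).
    """
    if not delays:
        raise ValueError("delay window is empty")
    pt_estimate = sum(delays) / len(delays)
    v_proxy = max(abs(d - pt_estimate) for d in delays)
    return pt_estimate, v_proxy


# One window of normalized delay samples from a single core.
window = [1.0, 1.2, 0.8, 1.0]
pt, v = split_pt_and_v(window)
print(f"PT estimate: {pt:.2f}, V proxy: {v:.2f}")
```

For the mean to suppress content above 1 MHz, the window would need to span at least about 1 µs, which is roughly 200 samples at a 5 ns sample interval.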

19. On-the-fly TEA-TM decision making
- Urgent First Policy (UFP)
  - Does NOT rely directly on an accurate V component
  - EL = PT "+" V
- Basic idea: migrate the threads running on the highest-EL core to the core with the smallest PT component.
  - Always right, but may not be optimal!
- Refer to our paper for the more sophisticated "DUFP" heuristic.
Figure: emergency level and PT component of Core1 and Core2, with the thread migration (TM) between them
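
The basic idea of UFP can be sketched as follows. This is not the authors' implementation: it assumes one thread per core and realizes the migration as a swap between the two selected cores, and all names and numbers are hypothetical.

```python
def urgent_first_migration(emergency_level, pt_component, thread_on_core):
    """Move the thread on the highest-EL core to the smallest-PT core.

    Assumption: one thread per core, so the migration is done as a swap.
    Returns a new core-to-thread mapping; the input is left unchanged.
    """
    src = max(emergency_level, key=emergency_level.get)  # most urgent core
    dst = min(pt_component, key=pt_component.get)        # most PT headroom
    new_mapping = dict(thread_on_core)
    if src != dst:
        new_mapping[src], new_mapping[dst] = thread_on_core[dst], thread_on_core[src]
    return new_mapping


# Hypothetical per-core measurements for a quad-core processor.
el = {"core1": 900, "core2": 100, "core3": 50, "core4": 10}
pt = {"core1": 1.05, "core2": 1.00, "core3": 0.97, "core4": 0.92}
threads = {"core1": "A", "core2": "B", "core3": "C", "core4": "D"}
print(urgent_first_migration(el, pt, threads))
```

The decision uses only EL and the PT estimate, so it never needs an explicit V measurement, which is the point of the policy as stated on the slide.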

20. On-the-fly variation prediction
- Objective: reduce the emergency level in the future
Figure: the emergency level and the PT component feed a linear prediction mechanism, which produces the EL prediction result
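
The slide does not specify the form of the linear predictor. One possible reading, offered purely as an assumption, is a least-squares line fitted to the recent EL history and extrapolated one interval ahead; `predict_next_el` is a hypothetical name.

```python
def predict_next_el(history):
    """Extrapolate the next EL from a least-squares line over `history`.

    Assumption: samples are equally spaced, one per migration interval.
    The result is clamped at zero because an EL cannot be negative.
    """
    n = len(history)
    if n == 0:
        raise ValueError("history is empty")
    if n == 1:
        return float(history[0])
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(history))
    slope = sxy / sxx
    # Evaluate the fitted line at the next sample index, x = n.
    return max(0.0, mean_y + slope * (n - mean_x))


print(predict_next_el([10, 20, 30, 40]))
```

A fit over several intervals is less sensitive to a single noisy sample than extrapolating from the last two points, at the cost of reacting more slowly to a change in program phase.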

21. Experiments
- Methodology: trace-based evaluation
- Modeled processor: quad-core, superscalar, 2 GHz
- PDN: similar to the Intel Xeon 5500 quad-core microprocessor, 130 W (peak 150 W)
- Workload

22. Metrics
- Relative throughput loss
- Relative fairness
[Equations and their "where" definitions not captured in the transcript]

23. Impact of TM interval on average EL reduction
- No migration overhead accounted for
- With a 1 ms interval at 2 GHz, the migration overhead is negligible
- With a 0.3 ms interval at 2 GHz, the migration overhead is < 15%
Figure (when the migration penalty is taken into account): perf. overhead and EL reduction, overall throughput, minimal TM interval; regions labeled "large migration penalty" and "large emergency rate"

24. Reduction in relative throughput loss
- TM interval: 0.2 ms, accuracy: 90%
- Developing more sophisticated heuristics

25. Fairness improvement
- 80% fairness improvement

26. Conclusion
- Analyzed the complementary effect in both the time and frequency domains
- Presented a delay sensor-based scheme (TEA-TM) to exploit the complementary effect
  - Simple, cost-efficient
- The experimental results show
  - Improved throughput
  - Improved fairness
