1 Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors
Guihai Yan (1), Xiaoyao Liang (2), Yinhe Han (1), and Xiaowei Li (1)
(1) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)
(2) NVIDIA Corporation
Jun. 23, 2010
2 Outline
Introduction to PVT variations
Analyzing the “complementary effect”
– Time domain
– Frequency domain
Implementation challenges and solutions
Experimental results
3 Introduction to variations
Variation sources
Process variation
– Random dopant fluctuation
– Sub-wavelength lithography
Voltage variation
– Parasitic power delivery networks
– Application variability
– Inductive noise, IR-drop
Temperature variation
– Imbalanced activity
– Hotspots
We focus on the primary manifestation: performance variation
4 Process variation
Sub-wavelength lithography (systematic): “What you get is not what you want”
Random dopant fluctuations (random): Vth variation
Figures: random dopant fluctuations and sub-wavelength lithography [Borkar, DAC’09] [Aitken, ATS’07]
Maximum frequency can differ by 20% [Teodorescu, ISCA’08]
P variation is time-independent: the “DC component”
5 Temperature variation
Application-specific
Slow-varying, on the order of milliseconds
Typical thermal constant: 2 ms [Donald, ISCA’06]
Figure: measured Pentium M processor temperatures
T variation is slow-varying: the “low-frequency component”
6 Voltage variation
Fast-changing
– Inductive noise, a.k.a. the L(di/dt) problem
– IR-drop
Hierarchical PDN
Why is it harder to keep a constant voltage level?
V variation is fast-changing: the “high-frequency component”
Example
– Power budget: 100 W
– Working voltage: 1 V
– Current: 100 A
– To keep the voltage fluctuation within ±5%, R_PDN < 0.5 mOhm
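The PDN example above can be checked with a few lines of arithmetic. This is a minimal sketch that considers only the resistive (IR) drop and ignores the inductive L(di/dt) term; the function name `max_pdn_resistance` and its parameters are illustrative, not from the talk.

```python
def max_pdn_resistance(power_w: float, vdd_v: float, tolerance: float) -> float:
    """Largest PDN resistance (ohms) that keeps the IR-drop within tolerance * Vdd."""
    current_a = power_w / vdd_v          # I = P / V  -> 100 W / 1 V = 100 A
    allowed_drop_v = tolerance * vdd_v   # 5% of 1 V = 0.05 V
    return allowed_drop_v / current_a    # R = dV / I

# Numbers from the slide: 100 W budget, 1 V supply, +/-5% fluctuation.
r_max = max_pdn_resistance(power_w=100.0, vdd_v=1.0, tolerance=0.05)
print(f"{r_max * 1e3:.2f} mOhm")  # prints "0.50 mOhm"
```

The sub-milliohm bound is what makes a constant voltage level hard to hold: every part of the hierarchical PDN has to fit inside that budget.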
7 Resultant impact of PVT variations
Figure: fast and slow cores, violent and mild applications, and high and low temperatures combine into timing (delay) variation
8 Prior solutions
Strive to compensate for P, V, and T variations individually
Mitigate P variation
– ReCycle [ISCA’06], Body Bias [Micro’07], ReVIVal [ISCA’08], et al.
Stabilize V variation
– Pipeline damping [ISCA’03], DeCoR [HPCA’08], et al.
Balance T variation
– Hotspot [ISCA’03], DVFS + Activity Migration [ISCA’03, HPCA’01, TODAES’07], et al.
Other timing-oriented solutions
– Razor [JSSC’06], EVAL [Micro’08], Tribeca [Micro’09], et al.
9 Our perspective
Focus on the essential timing issue: delay variation
Process, voltage, and temperature variations do not necessarily aggregate; in some cases they can cancel each other out. Hence, “complementary”
Design goal: minimize delay variation
Figure: process, voltage, and temperature variations feeding into delay
10 Some terms
Timing emergency (TE)
Emergency level (EL): the “density” of TEs
– Define: EL = number of TEs per 100 million cycles
Violent vs. mild
– Voltage: large fluctuation = violent; small fluctuation = mild
– Temperature: “hot” = violent; “cool” = mild
– Process: slow corner = violent; fast corner = mild
Figure: delay vs. time for mild and violent voltage traces, with the timing emergency threshold marked
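The EL definition above can be sketched in code. This is a minimal sketch under two assumptions that the slide does not state: the delay trace holds one sample per cycle, and every sample above the threshold counts as one timing emergency. The name `emergency_level` is hypothetical.

```python
def emergency_level(delays, threshold, window_cycles=100_000_000):
    """Timing emergencies per `window_cycles` cycles.

    Assumes one delay sample per cycle and counts each sample above
    `threshold` as one emergency (an assumption, not the paper's definition).
    """
    if not delays:
        raise ValueError("empty delay trace")
    emergencies = sum(1 for d in delays if d > threshold)
    # Normalize the count to the 100-million-cycle window used on the slide.
    return emergencies * window_cycles / len(delays)

# Toy trace: 1000 cycles, two of which exceed the threshold of 1.0.
trace = [0.90] * 998 + [1.05, 1.10]
el = emergency_level(trace, threshold=1.0)
print(el)  # 200000.0
```

Normalizing to a fixed window lets cores observed for different numbers of cycles be compared on the same scale.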
11 How do PVT variations complement each other?
Observation in the time domain
– Core1 (T mild, V mild): large margin, low EL, excessive headroom
– Core2 (T violent, V violent): little margin, high EL, emergencies
What if we exchange the threads on Core1 and Core2?
– Mild + violent on each core: (T mild, V violent) and (T violent, V mild)
Figure: delay vs. time against the threshold for each combination
12 Frequency domain analysis
Y(f) = FFT(D(t))
– Sample interval: 5 ns
– Span of analysis: 1 ms
DC component: “P”
Low-frequency component: “T”
High-frequency component: “V”
13 The strength of each component of PVT variations
Migrating threads = “grafting” the V component
Figure: the V component moved between the PT components of two cores
14 Frequency domain analysis (cont.)
Figure: relative frequency spectrum deviations on a 2 GHz quad-core processor. P: 0-100 Hz, T: 100 Hz-1 MHz, V: 1 MHz-250 MHz.
Potential: Core3 and Core4 are mild
Strategy: exchange threads on Core1 and Core4, and on Core2 and Core3
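The band split above (P near DC, T at low frequencies, V at high frequencies) can be illustrated on a toy delay trace. This is a minimal sketch, not the authors' analysis: it uses a naive DFT on 200 synthetic samples, and the band edges are scaled-down placeholders chosen so that a short trace can resolve them. The names `dft`, `band_strengths`, and `toy_bands` are hypothetical.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) discrete Fourier transform; fine for a short illustrative trace."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]

def band_strengths(delays, sample_interval_s, bands):
    """Sum the single-sided spectral amplitude of a delay trace inside each band.

    `bands` maps a name to (f_low_hz, f_high_hz); f_low is inclusive, f_high exclusive.
    """
    n = len(delays)
    spectrum = dft(delays)
    strengths = {name: 0.0 for name in bands}
    for k in range(n // 2 + 1):
        freq_hz = k / (n * sample_interval_s)
        # DC and Nyquist bins appear once; every other bin has a mirrored twin.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        amplitude = scale * abs(spectrum[k]) / n
        for name, (low, high) in bands.items():
            if low <= freq_hz < high:
                strengths[name] += amplitude
    return strengths

# Synthetic trace sampled every 5 ns: constant offset ("P"), a slow 2 MHz
# ripple (stand-in for "T"), and a fast 40 MHz ripple (stand-in for "V").
dt = 5e-9
delays = [
    2.0
    + 1.0 * math.sin(2 * math.pi * 2e6 * i * dt)
    + 0.5 * math.sin(2 * math.pi * 40e6 * i * dt)
    for i in range(200)
]
# Toy band edges, NOT the slide's 100 Hz / 1 MHz boundaries.
toy_bands = {"P": (0.0, 0.5e6), "T": (0.5e6, 10e6), "V": (10e6, 101e6)}
strengths = band_strengths(delays, dt, toy_bands)
print({name: round(value, 3) for name, value in strengths.items()})
```

Resolving the slide's real band edges would need the full 1 ms span (200,000 samples at 5 ns), for which an FFT library is the practical choice over this naive DFT.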
15 How to exploit such a “complementary effect”?
Straightforward approach: one sensor per component
– P component: product test
– V component: voltage sensor
– T component: temperature sensor
– Others: aging sensor, xyz sensor
Pros: conceptually simple
Cons
– Slow: V and T sensors are slow
– Not comprehensive: e.g., what about aging?
Our approach: a delay sensor-based scheme
– Delay sensor: V component and (P+T) component
Pros: fast; comprehensive (timing)
Cons: needs a little trick
16 Implementation (cont.)
What we know
– Delay variation, from delay sensors
What we need to know
– The strength of the PT and V components
How to bridge the gap? Three challenges:
– Inferring PVT components from delay values
– On-the-fly thread migration decision-making
– On-the-fly variation prediction
17 Top view of architecture
TEA-TM: Timing Emergency Aware + Thread Migration
18 Inferring PVT components from delay values
Use the mean delay to infer the PT component (< 1 MHz)
This simplification greatly facilitates a cost-efficient implementation of TEA-TM.
Then, how about the “V component”?
Figure: mean delay vs. PT component
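The mean-delay simplification above can be sketched as follows. The windowed mean stands in for the slow PT component, as the slide describes. The spread around the mean is shown only as one possible proxy for the V component; that proxy is an assumption added here, not something the talk specifies. The name `split_delay` is hypothetical.

```python
from statistics import mean, pstdev

def split_delay(delays):
    """Return (pt_estimate, v_spread) for one core's recent delay samples."""
    pt_estimate = mean(delays)   # slow (P+T) part: the window average
    v_spread = pstdev(delays)    # hypothetical V proxy: spread around the mean
    return pt_estimate, v_spread

# Toy delay samples (arbitrary units) from one core's delay sensor.
samples = [0.90, 0.94, 0.92, 0.96, 0.88]
pt, v = split_delay(samples)
print(round(pt, 4), round(v, 4))  # 0.92 0.0283
```

For the mean to track only the PT component, the averaging window has to be long enough to smooth out fluctuations above 1 MHz, i.e. at least on the order of a microsecond.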
19 On-the-fly TEA-TM decision making
Urgent First Policy (UFP)
– Does NOT rely directly on an accurate V component
– Basic idea: migrate the threads running on the highest-EL core to the core with the smallest PT component
– Always right, but may not be optimal
– EL = PT “+” V
Figure: emergency level and PT component of Core1 and Core2, with thread migration (TM) between them
Refer to our paper for the more sophisticated “DUFP” heuristic.
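The basic UFP idea above can be sketched as a single selection step. This is a simplified illustration, not the paper's implementation: it assumes one thread per core, treats migration as a swap between the two chosen cores, and breaks ties by the lowest core index. The names `urgent_first_swap` and `apply_swap` are hypothetical.

```python
def urgent_first_swap(emergency_levels, pt_components):
    """Pick (source_core, target_core) under a simplified Urgent First Policy.

    source = core with the highest emergency level,
    target = core with the smallest PT component.
    Returns None when both are the same core (nothing to migrate).
    """
    cores = range(len(emergency_levels))
    source = max(cores, key=lambda c: emergency_levels[c])
    target = min(cores, key=lambda c: pt_components[c])
    return None if source == target else (source, target)

def apply_swap(threads, swap):
    """Return a new thread-to-core assignment with the two cores exchanged."""
    if swap is None:
        return list(threads)
    source, target = swap
    result = list(threads)
    result[source], result[target] = result[target], result[source]
    return result

# Toy quad-core snapshot: per-core EL and mean-delay (PT) estimates.
el = [120, 900, 40, 300]
pt = [0.95, 0.97, 0.91, 0.90]
swap = urgent_first_swap(el, pt)
print(swap)                                    # (1, 3)
print(apply_swap(["A", "B", "C", "D"], swap))  # ['A', 'D', 'C', 'B']
```

The decision needs only EL counts and mean delays, which is why it does not depend on an accurate V-component estimate.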
20 On-the-fly variation prediction
Objective: reduce the emergency level in the future
Linear prediction mechanism
Figure: emergency level and PT component, with the EL prediction result
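The slide names a linear prediction mechanism without giving its form. One common reading is a least-squares line through the most recent observations, extrapolated one interval ahead; the sketch below shows that reading as an assumption, and the function name `predict_next` is hypothetical.

```python
def predict_next(history):
    """Extrapolate the next value with a least-squares line through `history`.

    `history` holds equally spaced past observations (e.g. EL per TM interval).
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    # Evaluate the fitted line one step past the last observation (x = n).
    return y_mean + slope * (n - x_mean)

# Toy EL history over four migration intervals.
predicted = predict_next([100, 120, 140, 160])
print(predicted)  # 180.0
```

The same routine could be applied to the PT component; a real predictor would probably also clamp the result at zero, since an emergency level cannot be negative.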
21 Experiments
Methodology: trace-based evaluation
Modeled processor
– Quad-core, superscalar, 2 GHz
PDN
– Similar to the Intel Xeon 5500 quad-core microprocessor
– 130 W (peak 150 W)
Workload
22 Metrics
Relative throughput loss
Relative fairness
[Equations not captured]
23 Impact of TM interval on average EL reduction
Figure: average EL reduction vs. TM interval, with no migration overhead accounted for
– With a 1 ms interval at 2 GHz, the migration overhead is negligible
– With a 0.3 ms interval at 2 GHz, the migration overhead is < 15%
When the migration penalty is taken into account
Figure: performance overhead and EL reduction, and overall throughput, vs. TM interval. Labels: minimal TM interval, large migration penalty, large emergency rate.
24 Reduction in relative throughput loss
TM interval: 0.2 ms; accuracy: 90%
Developing more sophisticated heuristics
25 Fairness improvement
80% fairness improvement
26 Conclusion
Analyzed the complementary effect in both the time and frequency domains
Presented a delay sensor-based scheme (TEA-TM) to exploit the complementary effect
– Simple, cost-efficient
The experimental results show
– Improved throughput
– Improved fairness