Published by Reginald Maxwell. Modified over 9 years ago.
1
Process Variation in Near-threshold Wide SIMD Architectures
Sangwon Seo (1), Ronald G. Dreslinski (1), Mark Woh (1), Yongjun Park (1), Chaitali Chakrabarti (2), Scott Mahlke (1), David Blaauw (1), Trevor Mudge (1)
(1) University of Michigan, (2) Arizona State University
2
Near-threshold Computing
- Super-threshold: high performance, high energy consumption
- Near-threshold: 10x energy reduction, 10x performance degradation
- Sub-threshold: exponentially decreasing performance; leakage becomes increasingly dominant
3
Near-threshold Computing
- Advantage: high energy efficiency
- Disadvantages:
  - Low performance throughput, compensated for with a very wide SIMD architecture
  - Sensitivity to variations in threshold voltage
- These issues are more critical in wide SIMD architectures:
  - Increased probability of timing errors
  - Expensive error-recovery mechanisms
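Why near-threshold operation is so sensitive to threshold-voltage variation can be illustrated with the alpha-power-law delay model, t_d ~ Vdd / (Vdd - Vth)^alpha. This is a sketch, not the authors' model: the law is a super-threshold approximation (near threshold the current is closer to exponential in the gate overdrive, so it understates the effect), and the parameters alpha = 1.3, Vth = 0.35 V and a 30 mV Vth shift are illustrative assumptions.

```python
"""Sketch: fractional gate slowdown caused by a threshold-voltage shift.

ASSUMPTIONS: alpha-power-law delay model with illustrative parameters
(alpha = 1.3, Vth = 0.35 V, dVth = 30 mV); none of these come from the slides.
"""


def gate_delay(vdd: float, vth: float, alpha: float = 1.3) -> float:
    """Unnormalized alpha-power-law gate delay, t_d ~ Vdd / (Vdd - Vth)**alpha."""
    if vdd <= vth:
        raise ValueError("alpha-power law is undefined for Vdd <= Vth")
    return vdd / (vdd - vth) ** alpha


def delay_penalty(vdd: float, vth: float = 0.35, dvth: float = 0.03) -> float:
    """Fractional slowdown of a gate whose Vth is raised by dvth, at supply vdd."""
    return gate_delay(vdd, vth + dvth) / gate_delay(vdd, vth) - 1.0


if __name__ == "__main__":
    for vdd in (1.0, 0.8, 0.6, 0.5):
        print(f"Vdd = {vdd:.2f} V: +30 mV Vth shift -> {delay_penalty(vdd):+.1%} delay")
```

The same Vth shift costs far more delay as Vdd approaches Vth, because the overdrive (Vdd - Vth) that the shift eats into is smaller.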
4
Near-threshold Computing (continued)
- How bad is the delay variation in wide SIMD architectures running at near-threshold voltages?
- How can variation-induced timing errors be mitigated?
5
Delay Variations in 90nm
[Figure: delay distributions at 90nm, annotated ~2.3x and ~1.6x]
- Uncorrelated variations are averaged out over the chain.
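The averaging effect can be reproduced with a small Monte Carlo sketch. It assumes every gate delay is an independent Gaussian with mean 1 and a relative sigma of 20% (an illustrative value, not one from the slides), and measures sigma/mu of the total chain delay for several chain lengths N.

```python
import random
import statistics

SIGMA_GATE = 0.2  # ASSUMED relative sigma of one gate delay (illustrative)


def chain_delay(n_gates: int, rng: random.Random) -> float:
    """Total delay of a chain of n_gates gates with independent Gaussian delays (mean 1)."""
    return sum(rng.gauss(1.0, SIGMA_GATE) for _ in range(n_gates))


def relative_spread(n_gates: int, trials: int = 5000, seed: int = 1) -> float:
    """Sample sigma/mu of the chain delay over `trials` Monte Carlo samples."""
    rng = random.Random(seed)
    samples = [chain_delay(n_gates, rng) for _ in range(trials)]
    return statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(f"N = {n:2d}: sigma/mu = {relative_spread(n):.3f} "
              f"(1/sqrt(N) scaling predicts {SIGMA_GATE / n ** 0.5:.3f})")
```

Because the per-gate variations are independent, the chain's relative spread falls roughly as 1/sqrt(N).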
6
Delay Variations as a Function of Chain Length N (Vdd = 0.55 V)
- A longer chain helps, but the benefit diminishes as N increases.
- Variations are exacerbated with technology scaling.
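A simple variance model, assumed here rather than taken from the slides, accounts for the diminishing benefit. Split each gate delay $d_i$ into a nominal value $\mu$, a zero-mean component $X_c$ shared by all gates in the chain (correlated variation, variance $\sigma_c^2$), and independent zero-mean components $X_i$ (random variation, variance $\sigma_r^2$), with $X_c$ independent of every $X_i$:

```latex
\begin{aligned}
d_i &= \mu + X_c + X_i, \qquad i = 1,\dots,N\\
D_N &= \sum_{i=1}^{N} d_i = N\mu + N X_c + \sum_{i=1}^{N} X_i\\
\mathbb{E}[D_N] &= N\mu, \qquad \operatorname{Var}[D_N] = N^2\sigma_c^2 + N\sigma_r^2\\
\frac{\sigma_{D_N}}{\mathbb{E}[D_N]} &= \sqrt{\frac{\sigma_c^2}{\mu^2} + \frac{\sigma_r^2}{N\mu^2}}
\;\xrightarrow{\;N\to\infty\;}\; \frac{\sigma_c}{\mu}
\end{aligned}
```

Under this model the random term shrinks as $1/\sqrt{N}$ (going from $N = 10$ to $N = 50$ cuts it by $\sqrt{5} \approx 2.24$), while the correlated term sets a floor that a longer chain cannot average away.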
7
Delay Variations as a Function of Vdd (N = 50)
- Line-edge roughness (LER) causes high variation in advanced technology nodes.
- Countermeasures: strict design rules; metal gates with high-k dielectrics, or SOI; advanced lithography.
8
Delay Distribution (90nm GP)
- 1 critical-path delay = delay of a chain of 50 FO4 inverters
- 1-wide system delay = max(delays of 100 critical paths)
- 128-wide system delay = max(delays of 128 1-wide systems)
[Figure: delay distributions of the three cases, annotated "Performance Drop"]
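The max-of-max construction above can be sketched as a Monte Carlo simulation. The chain length (50 FO4), paths per lane (100) and width (128) come from the slide; the assumption that each FO4 delay is an independent Gaussian with 20% relative sigma is illustrative only. Since a sum of independent Gaussians is Gaussian, each critical path is sampled in one draw.

```python
import math
import random
import statistics

N_INV = 50        # FO4 inverters per critical path (from the slide)
N_PATHS = 100     # critical paths per 1-wide lane (from the slide)
WIDTH = 128       # SIMD width (from the slide)
SIGMA_INV = 0.2   # ASSUMED relative sigma of one FO4 delay (illustrative)


def path_delay(rng: random.Random) -> float:
    """One critical path: the sum of N_INV i.i.d. Gaussian FO4 delays (mean 1 each).

    A sum of independent Gaussians is Gaussian, so it is sampled directly.
    """
    return rng.gauss(N_INV, SIGMA_INV * math.sqrt(N_INV))


def lane_delay(rng: random.Random) -> float:
    """1-wide system: the slowest of N_PATHS critical paths."""
    return max(path_delay(rng) for _ in range(N_PATHS))


def simd_delay(rng: random.Random) -> float:
    """WIDTH-wide system: the slowest of WIDTH independent 1-wide lanes."""
    return max(lane_delay(rng) for _ in range(WIDTH))


def simulate(seed: int = 7, trials: int = 50) -> dict:
    """Mean delay of each level, normalized to the nominal path delay."""
    rng = random.Random(seed)
    levels = {"path": path_delay, "1-wide": lane_delay, "128-wide": simd_delay}
    return {
        name: statistics.mean(f(rng) for _ in range(trials)) / N_INV
        for name, f in levels.items()
    }


if __name__ == "__main__":
    for name, mean in simulate().items():
        print(f"{name:>8}: mean delay = {mean:.3f} x nominal")
```

Each extra level of max() pushes the distribution further into the slow tail, which is the performance drop seen when going from one lane to 128 lanes.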
9
Variation Effects on 128-wide SIMD Architecture
Mitigation options:
- Structural duplication
- Voltage margining
- Frequency margining
10
Near-threshold Wide SIMD Architecture: Diet SODA [Seo et al., ISLPED 2010]
11
Structural Duplication
[Figure: 8-wide + 2-spare system; SIMD function units #0-#9 connected through a crossbar to datapaths #0-#7]
- Increase the number of processing resources.
12
Structural Duplication
[Figure: 8-wide + 2-spare system; SIMD function units #0-#9 connected through a crossbar to the datapaths]
- Use the spares if required.
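A minimal sketch of the sparing idea, assuming the crossbar can map the W logical datapaths onto any W of the W + k physical SIMD units. The chip then runs at the W-th fastest unit's delay. The function names and the lane delays below are hypothetical.

```python
from typing import Optional


def chip_delay_with_spares(lane_delays: list[float], width: int) -> float:
    """Chip delay when `width` lanes are needed out of len(lane_delays) physical
    lanes: the crossbar routes around the slowest extra lanes, so the chip runs
    at the width-th fastest lane delay."""
    if len(lane_delays) < width:
        raise ValueError("fewer physical lanes than the required SIMD width")
    return sorted(lane_delays)[width - 1]


def min_spares(lane_delays: list[float], width: int, target: float) -> Optional[int]:
    """Fewest spares k such that the first width + k lanes meet `target`,
    or None if even all available lanes cannot meet it."""
    for k in range(len(lane_delays) - width + 1):
        if chip_delay_with_spares(lane_delays[: width + k], width) <= target:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical normalized delays for an 8-wide + 2-spare system.
    delays = [1.00, 1.02, 0.98, 1.05, 1.01, 0.99, 1.30, 1.03, 1.25, 1.00]
    print("no spares:", max(delays[:8]))
    print("2 spares :", chip_delay_with_spares(delays, width=8))
    print("spares needed for target 1.05:", min_spares(delays, width=8, target=1.05))
```

In this made-up example one slow lane (1.30) sets the delay without spares; the two spares let the crossbar drop both slow units.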
13
Structural Duplication (90nm GP)
- Six spares are required to match the chip delay of the baseline architecture.
14
Voltage Margining
- Increase the supply voltage.
[Figure: delay distributions, simulated with the 45nm PTM (Predictive Technology Model)]
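How much supply boost a slow gate needs can be sketched by solving for the Vdd at which it matches a nominal gate's delay. The slide's results use the 45nm PTM models; this sketch instead assumes the alpha-power-law delay model with illustrative parameters (alpha = 1.3, Vth = 0.35 V), so its numbers are only indicative.

```python
ALPHA = 1.3   # ASSUMED velocity-saturation index (illustrative)
VTH = 0.35    # ASSUMED nominal threshold voltage in volts (illustrative)


def gate_delay(vdd: float, vth: float, alpha: float = ALPHA) -> float:
    """Unnormalized alpha-power-law gate delay, t_d ~ Vdd / (Vdd - Vth)**alpha."""
    if vdd <= vth:
        raise ValueError("model undefined for Vdd <= Vth")
    return vdd / (vdd - vth) ** alpha


def required_boost(vdd0: float, dvth: float, vth: float = VTH,
                   alpha: float = ALPHA, tol: float = 1e-6) -> float:
    """Smallest supply boost (V) at which a gate whose Vth is raised by dvth
    is no slower than a nominal gate at vdd0.

    Bisection is valid because, for alpha >= 1, the delay decreases
    monotonically as Vdd increases.
    """
    if vdd0 <= vth + dvth:
        raise ValueError("slow gate has no overdrive at vdd0")
    target = gate_delay(vdd0, vth, alpha)
    lo, hi = 0.0, 0.5
    if gate_delay(vdd0 + hi, vth + dvth, alpha) > target:
        raise ValueError("boost search range too small")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(vdd0 + mid, vth + dvth, alpha) <= target:
            hi = mid  # boost is sufficient; try a smaller one
        else:
            lo = mid  # still too slow; need more boost
    return hi


if __name__ == "__main__":
    print(f"boost for +10 mV Vth at 0.55 V: {required_boost(0.55, 0.01) * 1e3:.1f} mV")
```

Under this model the required boost is somewhat larger than the Vth shift itself, because raising Vdd also increases the Vdd term in the numerator of the delay expression.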
15
Frequency Margining
- Increase the clock period.
- Applicable to applications with relaxed timing constraints.
- Impractical for advanced technology nodes.
- Caveat: consider the impact on the whole system:
  - SIMD subsystem clock period (Tclk@NTV)
  - Memory subsystem clock period (Tclk@FV)
16
Structural Duplication vs. Voltage Margining
17
Combination of Two Schemes (45nm GP)
Configurations shown for a 128-wide system at 0.6 V:
- 26 spares
- 17 mV boost
- 5 mV boost + 8 spares
- 10 mV boost + 2 spares
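The slide does not state the power model used to compare these configurations. A minimal sketch of one possible comparison, under an entirely hypothetical model: dynamic power scales as Vdd squared at a fixed frequency, and each idle spare lane adds leakage equal to a fraction SPARE_LEAK_FRAC of an active lane's power. The four configurations come from the slide; `Config`, `power_overhead` and `SPARE_LEAK_FRAC` are illustrative names, and the resulting overheads are not the authors' numbers.

```python
from dataclasses import dataclass

WIDTH = 128             # SIMD width (from the slide)
VDD = 0.6               # nominal supply in volts (from the slide)
SPARE_LEAK_FRAC = 0.3   # HYPOTHETICAL idle-spare power relative to an active lane


@dataclass(frozen=True)
class Config:
    spares: int      # number of spare lanes
    boost_v: float   # supply boost in volts


def power_overhead(cfg: Config, width: int = WIDTH, vdd: float = VDD,
                   spare_leak_frac: float = SPARE_LEAK_FRAC) -> float:
    """Fractional power overhead vs. an unmargined system (hypothetical model)."""
    lane_factor = 1.0 + cfg.spares * spare_leak_frac / width  # idle-spare leakage
    voltage_factor = ((vdd + cfg.boost_v) / vdd) ** 2         # V^2 at fixed frequency
    return lane_factor * voltage_factor - 1.0


# The four configurations listed on the slide.
CONFIGS = [Config(26, 0.0), Config(0, 0.017), Config(8, 0.005), Config(2, 0.010)]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.spares:2d} spares + {cfg.boost_v * 1e3:4.1f} mV: "
              f"{power_overhead(cfg):.2%} overhead")
    print("cheapest:", min(CONFIGS, key=power_overhead))
```

With SPARE_LEAK_FRAC = 0.3 the 8-spare + 5 mV mix comes out cheapest, because spares cost power linearly while a voltage boost costs it quadratically; a different leakage fraction can change the ranking.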
18
Variation-Aware Diet SODA
19
Conclusions
- Near-threshold operation of wide SIMD systems can suffer timing problems due to process variation.
- Variation effects on a 128-wide SIMD architecture are marginal at the 90nm technology node, but could be non-negligible at current and future nodes.
- A combination of structural duplication and voltage margining provides a minimal-power-overhead solution for mitigating variation-induced timing problems in wide SIMD architectures.
20
Questions? Thank you!
21
Backup Slides
22
Local Spares vs. Global Spares
- Local sparing, 1 spare per 4 lanes (2 spares total). Pro: small overhead. Con: vulnerable to burst errors.
- Global sparing (2 spares). Pro: tolerates burst errors. Con: large overhead.
23
Local Spares vs. Global Spares
- Global sparing is better than local sparing.
- The XRAM crossbar supports global sparing.
[Figure: 128 + 8 global spares vs. 128 + 32 local spares (1 out of 4)]
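The burst-error trade-off between the two schemes can be sketched as a repairability check. This assumes every faulty lane must be replaced by a spare and that the spares themselves are fault-free; the helper names are hypothetical. The example uses the 8-wide + 2-spare case from the previous slide: local sparing gives each group of 4 lanes its own spare, while global sparing lets either spare replace any lane.

```python
from collections import Counter


def repairable_global(faulty_lanes: set[int], n_spares: int) -> bool:
    """Global sparing: any spare can replace any faulty lane."""
    return len(faulty_lanes) <= n_spares


def repairable_local(faulty_lanes: set[int], group_size: int,
                     spares_per_group: int = 1) -> bool:
    """Local sparing: each group of group_size lanes has its own spares,
    so a burst of faults inside one group can exhaust them."""
    per_group = Counter(lane // group_size for lane in faulty_lanes)
    return all(count <= spares_per_group for count in per_group.values())


if __name__ == "__main__":
    scattered = {1, 6}  # one fault in each group of 4 lanes
    burst = {2, 3}      # two adjacent faults in the same group of 4 lanes
    for name, faults in (("scattered", scattered), ("burst", burst)):
        print(f"{name:9s} local: {repairable_local(faults, group_size=4)}  "
              f"global: {repairable_global(faults, n_spares=2)}")
```

With the same number of spares, only global sparing survives the burst; the local scheme needs more spares in total (32 vs. 8 in the 128-wide case on the slide), which is why it carries less routing overhead but more spare hardware.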
24
Variation-Aware Diet SODA
- Delay variations can be addressed with little area and power overhead.