Published by Dora Hubbard. Modified over 9 years ago.
1
Circuits and Architectures to Deliver Low Power and High Speed Systems
By: Jabulani Nyathi
Washington State University, School of EECS
April 30, 2009
2
Outline
- CMOS Scaling
  - Its benefits
  - The challenges it brings about
- Various Techniques for Limiting Leakage Currents
  - Their shortfalls
- Bridging the Speed-Power Gap
  - The Tunable Body Biasing Scheme
- Emerging Devices and Technologies
- Concluding Remarks
3
CMOS Scaling and its Benefits
Aggressive CMOS scaling has been a very positive development, allowing:
- Fast switching devices, and thus high-speed computing
- Massive integration due to miniaturization
  - We no longer need multiple chips to implement a microprocessor and its peripherals
  - In fact, we can now place multiple computing elements on a single die, resulting in a system on a chip
4
CMOS Scaling and its Challenges
CMOS scaling results in:
- Increased leakage currents (5X per node) and increased dynamic power dissipation
- Interconnect that does not scale as fast as the transistor, thus:
  - Highly integrated designs require elaborate clock distribution schemes
  - IPs within a system on a chip would be difficult to synchronize with a single clock source
5
Scaling Implications
[Figure: Module 1 and Module 2 with local interconnects and global interconnects, scaled]
6
Dynamic Vs Leakage Power
7
Research Motivation
- Desire to bridge the speed-power gap by exploring the feasibility of optimizing devices to operate effectively at both sub-threshold and above-threshold voltages
- Emerging ultra-low-power technologies can benefit from increased speed
  - Wearable computers, sensor networks, implantable medical technology
- Emphasis on design for energy efficiency
8
Existing Low Power Design Approaches
Solve the energy dissipation problem from a region-of-operation standpoint.
- Sub-threshold design
  - DTMOS: shows a 5.5 times increase in current
    - Dynamic threshold provides energy efficiency
  - SBB: 4.4 times frequency increase
- Above-threshold (super-threshold) design
  - MTCMOS: high and low threshold devices
  - VT Scheme: reduces power by 50% using ABB and “sleep”/“active” modes
  - Architectural gating techniques: 45% of total power
9
DTMOS/SBB Output Voltage Clamping
[Figure labels: Traditional, SBB, DTMOS, TBB; voltage marks at 600 mV and 1.8 V]
10
Proposed Approach
Change approach to include all possible operating regions: Tunable Body Biasing (TBB)
- Sub-threshold and super-threshold operation bridged
  - Ultra-low energy and low speed, or high energy and high speed
- Utilize body biasing to improve performance of sub-threshold operation
  - Target increased performance at sub-threshold and slightly above threshold
- Save energy by eliminating idle time and processing continuously with variable power supplies (just-in-time task completion)
Target applications
- Mobile, battery-operated (power-constrained), variable-processing devices
  - Cell phones, PDAs, notebooks, wireless sensors, embedded systems, ASICs, medical technology, etc.
11
TBB Implementation
Goals
- Attain ON-state current gain while minimizing the OFF-state leakage current increase
- Highlight advantages of sub-threshold operation while allowing super-threshold operation if needed
- Control the bulk terminal to tunable potentials depending on V_DD and the desired region of operation
MOS Bulk Control Circuits
- Multiplexer-based approach
- Two transistors per bulk control circuit
- Utilizes V_thn0
12
TBB Bulk Control Circuits
- Relies on the good/poor logic “1” and logic “0” passing properties of pass-transistors
- Requires external control signals SubVt and SubVt_b

TBB MOS Bulk Control Signal
V_DD                      pMOS Bulk        nMOS Bulk
V_SS < V_DD ≤ V_thn0      V_SS             V_DD
V_DD > V_thn0             V_DD - V_thn0    V_thn0
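The bulk-voltage selection in the table can be sketched as a small function. This is only an illustration of the selection rule, not the multiplexer circuit itself; the function name `tbb_bulk_bias`, the default `vss=0.0`, and the example threshold value are assumptions introduced here.

```python
def tbb_bulk_bias(vdd, vthn0, vss=0.0):
    """Return (p_bulk, n_bulk) in volts for a given supply voltage.

    Follows the two rows of the slide's table:
      V_SS < V_DD <= V_thn0 : pMOS bulk = V_SS,           nMOS bulk = V_DD
      V_DD > V_thn0         : pMOS bulk = V_DD - V_thn0,  nMOS bulk = V_thn0
    """
    if vdd <= vss:
        raise ValueError("vdd must be greater than vss")
    if vdd <= vthn0:
        # Sub-threshold region: both bulks are fully forward-biased.
        return vss, vdd
    # Super-threshold region: forward bias is limited to V_thn0.
    return vdd - vthn0, vthn0


if __name__ == "__main__":
    VTHN0 = 0.5  # hypothetical zero-bias nMOS threshold, in volts
    for vdd in (0.25, 1.75):
        p_bulk, n_bulk = tbb_bulk_bias(vdd, VTHN0)
        print(f"V_DD={vdd} V -> pBulk={p_bulk} V, nBulk={n_bulk} V")
```

In the sub-threshold branch the pMOS bulk sits at V_SS, which agrees with the simulated pBulk = 0 V reported for that region.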
13
TBB Bulk Control Circuit Simulation
- Sub-threshold: pBulk = 0 V
- Super-threshold: pBulk = V_DD - V_thn0
14
Device Optimization
- TBB encourages varying supply voltages
- How will devices be sized for optimal operation at any supply voltage?
  - Maintain symmetric switching
  - Examine inverter at varying supply voltages
15
Device Optimization (Switching Point)

V_DD        Ideal Inverter Threshold   Simulated Inverter Threshold   Percent Variation
1.8 V       900 mV                                                    0.0%
1.0 V       500 mV                     498 mV                         0.4%
376.2 mV    188.1 mV                   198.7 mV                       5.6%
188.1 mV    94.05 mV                   108.6 mV                       13.4%
16
Sub-threshold Noise Margins
- Noise margins are significant for proper logic levels
- TBB and traditional static CMOS inverters have comparable noise margins
  - TBB V_IH is 12.5% worse
  - TBB V_IL is 14.3% better
17
Propagation Delay

Gate    Traditional Delay   TBB Delay   % Decrease
TG      98 ns               14 ns       86
Inv     125 ns              20 ns       84
NAND    133 ns              18 ns       86
NOR     163 ns              25 ns       85
XOR     289 ns              40 ns       89
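The relationship between the delay columns can be sketched as follows. The delay values are taken from the table; the names `DELAYS_NS`, `percent_decrease`, and `speedup` are introduced here for illustration. Computed percentages are unrounded, so they can differ from the slide's rounded column (the XOR row in particular).

```python
# Propagation delays in ns, as (traditional, TBB) pairs from the slide.
DELAYS_NS = {
    "TG": (98, 14),
    "Inv": (125, 20),
    "NAND": (133, 18),
    "NOR": (163, 25),
    "XOR": (289, 40),
}


def percent_decrease(traditional, tbb):
    """Delay reduction of TBB relative to the traditional gate, in percent."""
    return 100.0 * (traditional - tbb) / traditional


def speedup(traditional, tbb):
    """How many times faster the TBB gate is."""
    return traditional / tbb


if __name__ == "__main__":
    for gate, (trad, tbb) in DELAYS_NS.items():
        print(f"{gate:5s} decrease={percent_decrease(trad, tbb):5.1f}%  "
              f"speedup={speedup(trad, tbb):.2f}x")
```

The speedups computed this way cluster around 6x to 7x, in line with the "up to 7x" figure quoted for simple gates.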
18
Review of SubVth Circuits Benefits
So far, the presentation has shown:
- TBB requires control of MOS bulks to span the operating regions of interest
  - Implementation is successful
- Study of simple logic gates showed:
  - TBB gives a dramatic speed increase (up to 7x)
  - Static CMOS design style is suitable for sub-threshold and super-threshold operation
  - Sizing of efficient devices for the TBB approach is possible
However, how will a complex system perform?
- Design with previous knowledge (logic style, sizing)
- Analyze post-layout simulations
19
Complex System-on-Chip Design Using TBB
Work addresses the challenges of:
- Global interconnect delays
- Clock distribution
- Synchronization of unrelated clocks
- Power dissipation
20
Conclusion
- The TBB scheme has been devised to span all regions of operation, from ultra-low power to high speed
- New kind of body biasing
  - Forward-biasing causes exponential sub-threshold current gain
  - Leads to a 7 times frequency increase in simple logic gates
  - Focus on sub-threshold and slightly above threshold to utilize leakage
- Bulk control circuits are effective
  - 4% area and 8.9% power dissipation increase
- Static CMOS is the ideal overall design style
- Device sizing at either sub-threshold or super-threshold allows efficient operation with variable supply voltages
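The "exponential sub-threshold current gain" can be illustrated with the standard textbook sub-threshold model, in which drain current scales as exp((V_GS - V_th) / (n·v_T)). This is a sketch under that model only; the slides do not give device parameters, so the slope factor `n=1.5` and the temperature of 300 K are assumptions, and the function names are introduced here.

```python
import math

BOLTZMANN = 1.380649e-23         # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage kT/q in volts (about 25.9 mV at 300 K)."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def subthreshold_current_gain(delta_vth, n=1.5, temp_k=300.0):
    """Current gain when forward body bias lowers V_th by delta_vth volts.

    Assumes the textbook model I ~ exp((V_GS - V_th) / (n * v_T));
    n is an assumed sub-threshold slope factor, not a value from the slides.
    """
    return math.exp(delta_vth / (n * thermal_voltage(temp_k)))


def vth_shift_for_gain(gain, n=1.5, temp_k=300.0):
    """Threshold reduction (volts) needed for a given current gain."""
    return n * thermal_voltage(temp_k) * math.log(gain)


if __name__ == "__main__":
    shift = vth_shift_for_gain(7.0)
    print(f"V_th reduction for a 7x gain: {shift * 1e3:.1f} mV (assumed n=1.5)")
```

Under these assumptions a threshold shift of a few tens of millivolts is enough for a several-fold current gain, which is why a modest forward bias can pay off so strongly in sub-threshold.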
21
Concluding Remarks
- Tunable operation lets the designer choose the operating point (kHz, MHz, GHz); energy dissipation is affected accordingly
  - Other schemes do not offer this flexibility
- TBB can lead to significant energy savings
- LFSR results show TBB gives:
  - A maximum 5.7 times speed increase (sub-threshold)
  - Comparable energy at super-threshold and favorable energy at sub-threshold
  - Favorable EDP at all operating regions
    - Operate at the same speed with less energy dissipation
- Idle-state leakage current can be minimized by collapsing the supply voltage
22
Integrating Research Into Instruction
[Figure labels: Router Chip, Data Path Circuits, Memory Design, Sub-System]
23
Incorporating Research into Instruction
A long-term objective is to place some of the integrated chips on development boards such as those Digilent Inc. produces. The integrated chips become part of a system and can be used in some of our low-level courses. Most important is the use of these programmable boards to showcase the research outcomes, particularly to visiting prospective students.
A sample development board:
24
Questions and Comments Welcome!
25
Multiple Clock Domain Synchronization
26
Reducing Interconnect Delays
- Improved latency and bandwidth
- Global interconnects are pipelined at or near the rate of computation
27
Sources of Power Consumption
- The most straightforward method to reduce power consumption from any source is to reduce V_DD
- Controlling frequency directly manipulates dynamic power
- Controlling device threshold manipulates leakage current, affecting leakage and short-circuit power
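The dependence of each power component on V_DD and frequency can be sketched with the usual first-order textbook expressions: switching power α·C·V_DD²·f and leakage power I_leak·V_DD. These formulas are not stated on the slide; they are the standard model, and all numeric values below are hypothetical.

```python
def dynamic_power(alpha, c_switched, vdd, freq):
    """First-order switching power in watts: alpha * C * V_DD^2 * f."""
    return alpha * c_switched * vdd ** 2 * freq


def leakage_power(i_leak, vdd):
    """Static power in watts drawn by a leakage current i_leak (amperes)."""
    return i_leak * vdd


if __name__ == "__main__":
    # Hypothetical block: activity 0.1, 1 nF switched capacitance, 1 GHz clock.
    nominal = dynamic_power(0.1, 1e-9, 1.0, 1e9)
    scaled = dynamic_power(0.1, 1e-9, 0.7, 1e9)
    print(f"Dynamic power at 0.7 V relative to 1.0 V: {scaled / nominal:.2f}")
    print(f"Leakage at 0.7 V with 5 uA: {leakage_power(5e-6, 0.7) * 1e6:.2f} uW")
```

The quadratic V_DD term is what makes supply reduction the most direct lever, while frequency only scales the dynamic term linearly.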
28
Distributed FIFO Control Circuitry
29
Traditional vs. Tunable Body Biasing
The synchronizer/buffer shows an increase in performance at sub-threshold voltages when using tunable body biasing.

Current columns are the LocalClock2 current.

          Traditional Body Biasing                   Tunable Body Biasing                       Tunable BB % diff
Vdd (V)   delay (ps)   freq (GHz)   current (uA)     delay (ps)   freq (GHz)   current (uA)     freq     current
1         111.2        9            3100             103.1        9.7          2988             7.8      -3.6
0.7       172.5        5.8          1240             177.7        5.6          1042             -3.4     -16
0.35      1354.5       0.7383       71               1438         0.6954       72.9             -5.8     -2.7
0.2       96700        0.0103       2.81             16640        0.0601       5.051            483      79.8
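The columns of this comparison are related by two simple formulas: frequency is the reciprocal of the delay, and the percentage difference is taken relative to the traditional value. A minimal sketch, using the 1 V and 0.2 V rows as read from the slide; the function names are introduced here.

```python
def freq_ghz(delay_ps):
    """Clock frequency in GHz for a cycle delay given in picoseconds."""
    return 1000.0 / delay_ps


def percent_diff(tunable, traditional):
    """Change of the tunable-body-biasing value relative to traditional, in %."""
    return 100.0 * (tunable - traditional) / traditional


if __name__ == "__main__":
    # 1 V row: delays of 111.2 ps and 103.1 ps, currents of 3100 uA and 2988 uA.
    print(f"1 V:   f_trad={freq_ghz(111.2):.2f} GHz, f_tbb={freq_ghz(103.1):.2f} GHz, "
          f"current diff={percent_diff(2988, 3100):.1f}%")
    # 0.2 V row: delays of 96700 ps and 16640 ps, currents of 2.81 uA and 5.051 uA.
    print(f"0.2 V: f_trad={freq_ghz(96700):.4f} GHz, f_tbb={freq_ghz(16640):.4f} GHz, "
          f"current diff={percent_diff(5.051, 2.81):.1f}%")
```

At 0.2 V the frequency rises roughly sixfold for a current increase of about 80%, which is the sub-threshold trade-off the slide highlights.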
30
Tunable Body Biasing

                            Current (uA)                  Power (uW)
Vdd (V)   Max Freq (GHz)    Peak     Avg      Idle        Peak      Avg       Idle
Traditional Body Biasing
1         4                 5597     2382     8.696       5597      2382      8.696
0.7       2                 2222     803.4    4.873       1555.4    562.38    3.411
0.35      0.125             131.1    35.58    1.468       45.885    12.453    0.514
0.2       0.01              7.452    2.895    1.349       1.49      0.579     0.27
Tunable Body Biasing
1         4                 5140     2460     9.54        5140      2460      9.54
0.7       2                 2050     833      4.423       1435      583.1     3.096
0.35      0.167             132      39.8     1.589       46.2      13.93     0.556
0.2       0.015             9.468    4.03     1.239       1.894     0.806     0.248
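The power columns in this table follow from the current columns as P = Vdd × I (uA times V gives uW). A minimal sketch that checks this for the peak values of the traditional body biasing rows, as read from the slide; the names `TRADITIONAL_PEAK` and `power_uw` are introduced here.

```python
# (Vdd in V, peak current in uA, peak power in uW) for traditional body biasing.
TRADITIONAL_PEAK = [
    (1.0, 5597.0, 5597.0),
    (0.7, 2222.0, 1555.4),
    (0.35, 131.1, 45.885),
    (0.2, 7.452, 1.49),
]


def power_uw(vdd, current_ua):
    """Power in microwatts from supply voltage (V) and current (uA)."""
    return vdd * current_ua


if __name__ == "__main__":
    for vdd, current, listed in TRADITIONAL_PEAK:
        computed = power_uw(vdd, current)
        print(f"Vdd={vdd} V: computed {computed:.3f} uW, listed {listed} uW")
```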
31
Pursuit of Low Power Operation
- It is likely that not all IP blocks in an SoC need to operate at high speed
- Power dissipation for those IP blocks could be reduced by operating at a lower voltage
- TBB offers the possibility to dynamically operate at either sub-threshold or super-threshold voltages
32
Variable Voltage SoC
- Consider an SoC with 50 IP blocks, each requiring communication at a rate of 10 MHz
  - Each IP could operate at sub-threshold levels
  - The channel could operate at super-threshold voltages while the IP blocks are in sub-threshold
[Figure labels: Vdd1, Vdd2, Vdd3, Vdd4, Vdd5]
33
Idle vs Operating Power

          Idle                           Operating
Vdd (V)   Current (uA)   Power (uW)      Current (uA)   Power (uW)
1         16.9                           2988
0.7       5.3            3.71            1042           729.4
0.35      1.5            0.525           72.9           25.52
0.2       0.925          0.185           5.051          1.01

During idle periods, it is advantageous to reduce leakage current by:
- Reducing the power supply voltage, or
- Increasing the threshold voltage (e.g. bulk voltage manipulation)
34
Speed at Varying V_DD
- TBB is 5.7x faster at 376.2 mV
- TBB is 20% faster at 1.8 V
35
Energy-delay Product
The EDP of TBB outperforms Traditional in ALL operating regions, significantly so in super-threshold.
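For reference, the energy-delay product is simply the energy per operation multiplied by the time the operation takes, so a lower value is better. A minimal sketch of how two designs would be compared; the two example designs and their numbers are hypothetical and are not taken from the slides.

```python
def energy_delay_product(energy_j, delay_s):
    """EDP in joule-seconds: energy per operation times delay per operation."""
    return energy_j * delay_s


def lower_edp(designs):
    """Return the name of the design with the smallest EDP.

    designs maps a name to an (energy_j, delay_s) pair.
    """
    return min(designs, key=lambda name: energy_delay_product(*designs[name]))


if __name__ == "__main__":
    # Hypothetical designs: (energy in joules, delay in seconds).
    designs = {
        "design_a": (100e-15, 5e-9),
        "design_b": (80e-15, 7e-9),
    }
    for name, (energy, delay) in designs.items():
        print(f"{name}: EDP = {energy_delay_product(energy, delay):.2e} J*s")
    print("Lower EDP:", lower_edp(designs))
```

Because EDP weights energy and delay equally, a design that uses slightly more energy can still win if it is sufficiently faster, as in the hypothetical pair above.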
36
Regions of Operation
- 3.9 MHz with 0.6 fJ/cycle
- 222.2 MHz with 103 fJ/cycle
- 1.1 GHz with 3.85 nJ/cycle
37
Contributions of this work
- Proposed scheme alleviates the communication bottleneck and offers a way to synchronize multiple SoC clocks
  - Performs data transfers at up to 10 GHz
- Proposed scheme maintains high performance under the influence of any clock skew
  - 6.5 GHz for any process corner and any skew
- Low-power FIFO scheme with a small impact on area when used in SoCs with many modules
38
Contributions of this work
- Process corners have a minor impact on performance, resulting in a 10% reduction of speed
- The optimal voltage for minimum energy consumption per transaction is at 2V_th
- Introduction of TBB to address leakage and dynamic power dissipation
  - 500% increase in performance at sub-threshold voltages with a modest 80% increase in power
  - 5-10% less power dissipation than traditional body biasing
39
Summary of Proposed FIFO Scheme
- Linear FIFO scheme that addresses
  - Signal propagation across the communication channel
  - Sustained throughput over long distances
- Successful synchronization
  - Synchronizes equal, rational, and arbitrary clocks
  - 6.5 GHz sustained performance after process corner analysis using 3 stages
- Compared to the CN scheme
  - Fewer devices per stage, fewer stages needed
  - 25% higher performance, 12% lower power
- Operates at both super- and sub-threshold voltages
  - Lower instantaneous power demands from local clocks (less di/dt)
  - Optimal energy per transaction at 0.7 V in a 65 nm process
  - Sub-threshold reduces power by 3 orders of magnitude
  - Tunable Body Biasing provides 50% increased performance in sub-threshold while maintaining super-threshold operation
40
TBB Scalability

                                     180 nm                                       90 nm
Body Biasing and Operating Region    Total Average       Static Power             Total Average       Static Power
                                     Power Dissipation   Contribution [%]         Power Dissipation   Contribution [%]
Traditional in Sub-threshold         193 pW              0.1%                     13.1 nW             1.8%
Traditional in Super-threshold       39.6 μW             Negligible               22.1 μW             Negligible
TBB in Sub-threshold                 1430 pW             25.2%                    20.4 nW             6.1%
TBB in Super-threshold               39.4 μW             0.000034%                22.1 μW             0.0025%

- At 180 nm, the TBB sub-threshold static power % is large, and total TBB sub-threshold power is large
- At 90 nm, the % difference is much less, and total TBB sub-threshold power isn't so large
41
LFSR Energy vs. Frequency
42
TBB Implementation Cont.
44
Logic Gate Analysis (Power)
45
Inverter Power Dissipation

V_DD      Power Dissipation [fW]   Average Power [nW]   Maximum Frequency [MHz]   Period [ns]
0.3262    8.27                     3.5                  0.416                     2400.0
0.4262    11.41                    30.0                 2.6                       380.0
0.5643    15.64                    651.6                41.7                      24.0
1.8       82.30                    68.60                833.3                     1.2

V_DD      Power Dissipation [fW]   Average Power [nW]   Maximum Frequency [MHz]   Period [ns]
0.3262    8.52                     22.4                 2.6                       380.0
0.4262    13.00                    259.8                20.                       50.0
0.5643    15.13                    2102.0               138.9                     7.2
1.8       81.47                    81.5                 1000.                     1.0
46
Logic Gate Analysis (Energy)
47
Logic Gate Analysis (EDP)
48
Logic Gate Analysis (Fan-in)
49
Logic Gate Analysis (Logic Styles)
50
LFSR Power Dissipation
51
Device Optimization (Optimal Region)
52
Regions of Operation

                    Super-threshold (1.8 V)      Sub-threshold (250 mV)       Optimal (750 mV)
Design              Delay (ns)   Energy (fJ)     Delay (ns)   Energy (fJ)     Delay (ns)   Energy (fJ)
Traditional LFSR    0.7          437.6           20000        105             7            74.1
TBB LFSR            0.6          437             4500         22.8            4.5          73.6
Frequency scale     GHz                          kHz                          MHz
53
Logic Gate Results
Results highlights:
- TBB, SBB, and DTMOS increase speed up to 7 times in sub-threshold
- Static CMOS has the best overall logic style performance
- Pseudo-nMOS, Domino, and pass-transistor logic are still valuable in niche situations
- TBB and Traditional noise margins are comparable