By: Jabulani Nyathi Washington State University School of EECS April 30, 2009 Circuits and Architectures to Deliver Low Power and High Speed Systems.


1 By: Jabulani Nyathi
Washington State University, School of EECS
April 30, 2009
Circuits and Architectures to Deliver Low Power and High Speed Systems

2 Outline
CMOS Scaling
- Its benefits and
- The challenges it brings about
Various Techniques for Limiting Leakage Currents
- Their shortfalls
Bridging the Speed-Power Gap
- The Tunable Body Biasing Scheme
Emerging Devices and Technologies
Concluding Remarks

3 CMOS Scaling and its Benefits
Aggressive CMOS scaling has been a very positive development, allowing:
- Fast switching devices, and thus high-speed computing.
- Massive integration due to miniaturization.
  - We no longer need multiple chips to implement a microprocessor and its peripherals.
  - In fact, we can now place multiple computing elements on a single die, resulting in a system on a chip.

4 CMOS Scaling and its Challenges
CMOS scaling results in:
- Increased leakage currents (5X/node) and
- Increased dynamic power dissipation.
The interconnect does not scale as fast as the transistor, thus:
- Highly integrated designs require elaborate clock distribution schemes.
- IPs within a System on a Chip would be difficult to synchronize with a single clock source.
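As a rough illustration of how the 5X-per-node leakage figure compounds, the sketch below projects leakage over successive technology nodes. It assumes the factor stays constant from node to node; the function name and the normalized starting value of 1.0 are hypothetical and do not come from the slides.

```python
def leakage_after_nodes(start_leakage, nodes, factor_per_node=5.0):
    """Project leakage after `nodes` scaling steps, assuming a constant factor per node."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return start_leakage * factor_per_node ** nodes


# Normalized leakage: 1.0 at the starting node (a hypothetical baseline).
for n in range(4):
    print(f"{n} node(s) later: {leakage_after_nodes(1.0, n):g}x the starting leakage")
```

Under this assumption, three nodes of scaling already multiply leakage by 125, which is why leakage control receives so much attention in the rest of the talk.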

5 Scaling Implications
[Figure labels: Module 1, Module 2, Local Interconnects, Global Interconnects, Scaled]

6 Dynamic Vs Leakage Power

7 Research Motivation
Desire to bridge the speed-power gap by exploring the feasibility of optimizing devices to operate effectively at both sub-threshold and above-threshold voltages.
Emerging ultra-low-power technologies can benefit from increased speed.
- Wearable computers, sensor networks, implantable medical technology
- Emphasis on design for energy efficiency

8 Existing Low Power Design Approaches
Solve the energy dissipation problem from a region-of-operation standpoint
- Sub-threshold design
  - DTMOS: shows a 5.5 times increase in current
- Dynamic threshold provides energy efficiency
  - SBB: 4.4 times frequency increase
- Above-threshold (super-threshold) design
  - MTCMOS: high and low threshold devices
  - VT Scheme: reduces power by 50% using ABB and “sleep”/“active” modes
- Architectural gating techniques: 45% of total power

9 DTMOS/SBB Output Voltage Clamping
[Figure labels: Traditional SBB, DTMOS, TBB; 600 mV; 1.8 V]

10 Proposed Approach
Change approach to include all possible operating regions: Tunable Body Biasing (TBB)
- Sub-threshold and super-threshold operation bridged
- Ultra-low energy and low speed, or high energy and high speed
Utilize body biasing to improve performance of sub-threshold operation
- Target increased performance at sub-threshold and slightly above threshold.
- Save energy by eliminating idle time and processing continuously with variable power supplies (just-in-time task completion)
Target applications
- Mobile, battery-operated (power-constrained), variable processing devices
  - Cell phones, PDAs, notebooks, wireless sensors, embedded systems, ASICs, medical technology, etc.
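To make the idea of using body bias to speed up sub-threshold operation concrete, here is a minimal sketch based on the standard textbook body-effect and sub-threshold current models. It is not the author's device model: the parameter values (gamma, 2*phi_F, the slope factor n, and the 0.45 V zero-bias threshold) are illustrative assumptions, and the function names are hypothetical.

```python
import math


def threshold_voltage(v_th0, v_sb, gamma=0.4, two_phi_f=0.8):
    """Body-effect model: V_th = V_th0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F)).

    v_sb is the source-to-bulk voltage of an nMOS device; a negative value
    is a forward body bias. gamma and two_phi_f are assumed values.
    """
    if two_phi_f + v_sb < 0:
        raise ValueError("forward bias exceeds the range of this simple model")
    return v_th0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


def subthreshold_current_gain(delta_v_th, n=1.5, v_thermal=0.026):
    """Gain in sub-threshold current when V_th drops by delta_v_th volts.

    Uses I_sub proportional to exp((V_GS - V_th) / (n * v_T)).
    """
    return math.exp(delta_v_th / (n * v_thermal))


v_th0 = 0.45  # hypothetical zero-bias threshold voltage in volts
for v_sb in (0.0, -0.1, -0.2, -0.3):
    v_th = threshold_voltage(v_th0, v_sb)
    gain = subthreshold_current_gain(v_th0 - v_th)
    print(f"V_SB = {v_sb:+.1f} V: V_th = {v_th:.3f} V, current gain = {gain:.2f}x")
```

Because the current depends exponentially on the threshold voltage, even a modest forward bias yields a multiplicative gain in ON current, which is the effect TBB aims to exploit.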

11 TBB Implementation
Goals
- Attain ON-state current gain while minimizing the OFF-state leakage current increase
- Highlight advantages of sub-threshold operation while allowing super-threshold operation if needed
- Control the bulk terminal to tunable potentials depending on V_DD and the desired region of operation
MOS Bulk Control Circuits
- Multiplexer-based approach
  - Two transistors per bulk control circuit
  - Utilizes V_thn0

12 TBB Bulk Control Circuits
Relies on the passing of good/poor logic “1” and logic “0” properties of pass-transistors
Requires external control signals
- SubVt and SubVt_b
TBB MOS Bulk Control Signal
V_DD                     pMOS Bulk        nMOS Bulk
V_SS < V_DD ≤ V_thn0     V_SS             V_DD
V_DD > V_thn0            V_DD - V_thn0    V_thn0

13 TBB Bulk Control Circuit Simulation
Sub-threshold: pBulk = 0 V
Super-threshold: pBulk = V_DD - V_thn0
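The bulk control rule from slides 12 and 13 can be sketched as a small selection function. This is only a behavioral illustration of the table, not the multiplexer circuit itself; the function name, the dictionary keys, and the example value of 0.45 V for V_thn0 are assumptions.

```python
def tbb_bulk_voltages(v_dd, v_thn0, v_ss=0.0):
    """Return the pMOS and nMOS bulk potentials for a given supply voltage.

    Sub-threshold   (V_SS < V_DD <= V_thn0): pMOS bulk = V_SS, nMOS bulk = V_DD
    Super-threshold (V_DD > V_thn0):         pMOS bulk = V_DD - V_thn0, nMOS bulk = V_thn0
    """
    if v_dd <= v_ss:
        raise ValueError("V_DD must be above V_SS")
    if v_dd <= v_thn0:
        return {"region": "sub-threshold", "pmos_bulk": v_ss, "nmos_bulk": v_dd}
    return {"region": "super-threshold", "pmos_bulk": v_dd - v_thn0, "nmos_bulk": v_thn0}


V_THN0 = 0.45  # hypothetical zero-bias nMOS threshold voltage in volts
for v_dd in (0.3, 1.8):
    print(v_dd, tbb_bulk_voltages(v_dd, V_THN0))
```

In both regions the source-to-bulk junctions end up forward biased by at most V_thn0, which is how the scheme limits the leakage increase while still lowering the effective threshold.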

14 Device Optimization
TBB encourages varying supply voltages
- How will devices be sized for optimal operation at any supply voltage?
- Maintain symmetric switching
- Examine inverter at varying supply voltages

15 Device Optimization (Switching Point)
V_DD        Ideal Inverter Threshold    Simulated Inverter Threshold    Percent Variation
1.8 V       900 mV                      -                               0.0%
1.0 V       500 mV                      498 mV                          0.4%
376.2 mV    188.1 mV                    198.7 mV                        5.6%
188.1 mV    94.05 mV                    108.6 mV                        13.4%

16 Sub-threshold Noise Margins
Noise margins are significant for proper logic levels
TBB and traditional static CMOS inverters have comparable noise margins
- TBB V_IH is 12.5% worse
- TBB V_IL is 14.3% better

17 Propagation Delay
Gate    Traditional Delay    TBB Delay    % Decrease
TG      98 ns                14 ns        86
Inv     125 ns               20 ns        84
NAND    133 ns               18 ns        86
NOR     163 ns               25 ns        85
XOR     289 ns               40 ns        89
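The relationship between the delay columns, the percent decrease, and the "up to 7x" speed figure quoted elsewhere in the talk can be checked with a few lines of arithmetic. The sketch below uses the delays listed on this slide; the helper names are hypothetical. Values recomputed from the delays need not match the slide's percentage column exactly (for example, the listed XOR delays give roughly 86%).

```python
# (traditional delay, TBB delay) in nanoseconds, as listed on the slide
delays_ns = {
    "TG": (98, 14),
    "Inv": (125, 20),
    "NAND": (133, 18),
    "NOR": (163, 25),
    "XOR": (289, 40),
}


def speedup(traditional, tbb):
    """How many times faster the TBB gate is than the traditional gate."""
    return traditional / tbb


def percent_decrease(traditional, tbb):
    """Reduction in delay relative to the traditional gate, in percent."""
    return 100.0 * (traditional - tbb) / traditional


for gate, (trad, tbb) in delays_ns.items():
    print(f"{gate}: {speedup(trad, tbb):.1f}x faster, "
          f"{percent_decrease(trad, tbb):.1f}% lower delay")
```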

18 Review of SubVth Circuits Benefits
So far, the presentation has shown:
- TBB requires control of MOS bulks to span the operating regions of interest.
  - Implementation is successful.
- Study of simple logic gates showed:
  - TBB gives a dramatic speed increase (up to 7x)
  - Static CMOS design style is suitable for sub-threshold and super-threshold operation
- Sizing of efficient devices for the TBB approach is possible
However, how will a complex system perform?
- Design with previous knowledge (logic style, sizing)
- Analyze post-layout simulations

19 Complex System-on-Chip Design Using TBB
Work addresses the challenges of
- Global interconnect delays
- Clock distribution
- Synchronization of unrelated clocks and
- Power dissipation

20 Conclusion
The TBB scheme has been devised to span all regions of operation, from ultra-low power to high speed.
New kind of body biasing
- Forward-biasing causes exponential sub-threshold current gain
  - Leads to a 7 times frequency increase in simple logic gates
- Focus on sub-threshold and slightly above threshold to utilize leakage
Bulk control circuits are effective
- 4% area and 8.9% power dissipation increase
Static CMOS is the ideal overall design style
- Device sizing at either sub-threshold or super-threshold allows efficient operation with variable supply voltages

21 Concluding Remarks
Tunable operation lets the designer choose the operating point (kHz, MHz, GHz); energy dissipation is affected accordingly.
- Other schemes do not offer this flexibility
- TBB can lead to significant energy savings
LFSR results show TBB gives:
- Maximal 5.7 times speed increase (sub-threshold)
- Comparable energy at super-threshold and favorable energy at sub-threshold
- Favorable EDP at all operating regions
- Operation at the same speed with less energy dissipation
Idle state leakage current can be minimized by collapsing the supply voltage

22 Integrating Research Into Instruction
[Figure labels: Router Chip, Data Path Circuits, Memory Design, Sub-System]

23 Incorporating Research into Instruction
A long-term objective is to place some of the integrated chips on development boards such as those produced by Digilent Inc. The integrated chips become part of a system and can be used in some of our low-level courses. Most important is the use of these programmable boards to showcase the research outcomes, particularly to visiting prospective students.
A sample development board:

24 Questions and Comments Welcome!

25 Multiple Clock Domain Synchronization

26 Reducing Interconnect Delays
Improved latency and bandwidth
Global interconnects are pipelined at or near the rate of computation

27 Sources of Power Consumption
The most straightforward method to reduce power consumption from any source is to reduce V_DD.
Controlling frequency directly manipulates dynamic power.
Controlling device threshold manipulates leakage current, affecting leakage and short-circuit power.
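For reference, the three power components named on this slide are commonly written as follows. These are the standard textbook expressions, not formulas taken from the slides; the symbols (activity factor alpha, load capacitance C_L, clock frequency f, short-circuit current I_sc, leakage current I_leak, sub-threshold slope factor n, thermal voltage v_T) are assumptions introduced here.

```latex
\begin{align}
P_{\text{total}} &= P_{\text{dyn}} + P_{\text{sc}} + P_{\text{leak}} \\
P_{\text{dyn}}   &= \alpha \, C_L \, V_{DD}^{2} \, f \\
P_{\text{sc}}    &= I_{\text{sc}} \, V_{DD} \\
P_{\text{leak}}  &= I_{\text{leak}} \, V_{DD}, \qquad
I_{\text{leak}} \propto \exp\!\left(\frac{-V_{th}}{n \, v_T}\right)
\end{align}
```

Every term contains V_DD, which is why lowering the supply helps across the board, while f appears only in the dynamic term and V_th enters mainly through the leakage current.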

28 Distributed FIFO Control Circuitry

29 Traditional vs. Tunable Body Biasing
         Traditional Body Biasing                Tunable Body Biasing                    Tunable BB % diff
         LocalClock2               current       LocalClock2               current
Vdd (V)  delay (ps)   freq (GHz)   (uA)          delay (ps)   freq (GHz)   (uA)          freq     current
1        111.2        9            3100          103.1        9.7          2988          7.8      -3.6
0.7      172.55       5.8          1240          177.7        5.6          1042          -3.4     -16
0.35     1354.5       0.7383       71            1438         0.6954       72.9          -5.8     -2.7
0.2      96700        0.0103       2.81          16640        0.0601       5.051         483      79.8
The synchronizer/buffer shows an increase in performance at sub-threshold voltages when using tunable body biasing

30 Tunable Body Biasing
                         Current (uA)               Power (uW)
Vdd (V)  Max Freq (GHz)  Peak     Avg      Idle     Peak      Avg       Idle
Traditional Body Biasing
1        4               5597     2382     8.696    5597      2382      8.696
0.7      2               2222     803.4    4.873    1555.4    562.38    3.411
0.35     0.125           131.1    35.58    1.468    45.885    12.453    0.514
0.2      0.01            7.452    2.895    1.349    1.49      0.579     0.27
Tunable Body Biasing
1        4               5140     2460     9.54     5140      2460      9.54
0.7      2               2050     833      4.423    1435      583.1     3.096
0.35     0.167           132      39.8     1.589    46.2      13.93     0.556
0.2      0.015           9.468    4.03     1.239    1.894     0.806     0.248
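One way to compare the two schemes across supply voltages is energy per cycle, obtained by dividing average power by the maximum frequency. The sketch below does this with the average-power and frequency values read from this slide; it assumes the circuit runs at its maximum frequency while drawing the listed average power, which the slide does not state explicitly. The helper name is hypothetical.

```python
# (Vdd in V, max frequency in GHz, average power in uW), as read from the slide
traditional = [(1.0, 4.0, 2382.0), (0.7, 2.0, 562.38), (0.35, 0.125, 12.453), (0.2, 0.01, 0.579)]
tunable = [(1.0, 4.0, 2460.0), (0.7, 2.0, 583.1), (0.35, 0.167, 13.93), (0.2, 0.015, 0.806)]


def energy_per_cycle_fj(avg_power_uw, freq_ghz):
    """Energy per cycle in femtojoules: uW / GHz = 1e-6 W / 1e9 Hz = 1e-15 J."""
    return avg_power_uw / freq_ghz


for (vdd, f_trad, p_trad), (_, f_tbb, p_tbb) in zip(traditional, tunable):
    e_trad = energy_per_cycle_fj(p_trad, f_trad)
    e_tbb = energy_per_cycle_fj(p_tbb, f_tbb)
    print(f"Vdd = {vdd} V: traditional {e_trad:.1f} fJ/cycle, tunable {e_tbb:.1f} fJ/cycle")
```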

31 Pursuit of Low Power Operation
It is likely that not all IP blocks in a SoC need to operate at high speed
Power dissipation for those IP blocks could be reduced by operating at a lower voltage
TBB offers the possibility to dynamically operate at either sub-threshold or super-threshold voltages

32 Variable Voltage SoC
Consider a SoC with 50 IP blocks, each requiring communication at a rate of 10 MHz
Each IP could operate at sub-threshold levels
The channel could operate at super-threshold voltages while the IP blocks are in sub-threshold
[Figure labels: Vdd1, Vdd2, Vdd3, Vdd4, Vdd5]
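The reason the channel may need super-threshold operation while the IP blocks stay in sub-threshold follows from simple arithmetic. The sketch below assumes a single shared channel that carries one transfer per cycle and serves all blocks in turn; that sharing model and the function name are assumptions, not details given on the slide.

```python
def required_channel_rate_mhz(num_blocks, per_block_rate_mhz):
    """Aggregate transfer rate a shared channel must sustain, in MHz."""
    return num_blocks * per_block_rate_mhz


# 50 IP blocks at 10 MHz each, as in the slide's example.
print(required_channel_rate_mhz(50, 10), "MHz aggregate channel rate")
```

Each block only needs to keep up with its own 10 MHz rate, while the shared channel under this assumption must run 50 times faster.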

33 Idle vs Operating Power
         Idle                         Operating
Vdd (V)  Current (uA)  Power (uW)     Current (uA)  Power (uW)
1        16.9          16.9           2988          2988
0.7      5.3           3.71           1042          729.4
0.35     1.5           0.525          72.9          25.52
0.2      0.925         0.185          5.051         1.01
During idle periods, it is advantageous to reduce leakage current by
- Reducing the power supply voltage or
- Increasing the threshold voltage (e.g. bulk voltage manipulation)
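The power columns on this slide follow from P = V × I, and the gap between idle and operating power at each supply can be expressed as a ratio. The sketch below recomputes both from the currents read from the slide; the helper names are hypothetical.

```python
# (Vdd in V, idle current in uA, operating current in uA), as read from the slide
rows = [(1.0, 16.9, 2988.0), (0.7, 5.3, 1042.0), (0.35, 1.5, 72.9), (0.2, 0.925, 5.051)]


def power_uw(vdd, current_ua):
    """Power in microwatts from supply voltage (V) and current (uA)."""
    return vdd * current_ua


def operating_to_idle_ratio(idle_ua, operating_ua):
    """How many times larger the operating current is than the idle current."""
    return operating_ua / idle_ua


for vdd, idle_ua, op_ua in rows:
    print(f"Vdd = {vdd} V: idle {power_uw(vdd, idle_ua):.3f} uW, "
          f"operating {power_uw(vdd, op_ua):.2f} uW, "
          f"ratio {operating_to_idle_ratio(idle_ua, op_ua):.1f}x")
```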

34 Speed at Varying V_DD
TBB 5.7x faster at 376.2 mV
TBB 20% faster at 1.8 V

35 Energy-delay Product EDP of TBB outperforms Traditional at ALL operating regions, significantly in super-threshold
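For readers unfamiliar with the metric, the energy-delay product is conventionally defined as the energy per operation multiplied by the time that operation takes. The symbols E and t_d below are assumed names; the slides do not spell out the formula.

```latex
\mathrm{EDP} = E \cdot t_d
```

A lower EDP means a design either finishes sooner for the same energy or uses less energy for the same delay, so the metric penalizes designs that save energy only by running slowly.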

36 Regions of Operation
3.9 MHz with 0.6 fJ/cycle
222.2 MHz with 103 fJ/cycle
1.1 GHz with 3.85 nJ/cycle

37 Contributions of this work
- Proposed scheme alleviates the communication bottleneck and offers a way to synchronize multiple SoC clocks
  - Performs data transfers up to 10 GHz
- Proposed scheme maintains high performance under the influence of any clock skew
  - 6.5 GHz for any process corner and any skew
- Low power FIFO scheme with a small impact on area when used in SoCs with many modules

38 Contributions of this work
Process corners have a minor impact on performance, resulting in a 10% reduction of speed
The optimal voltage for minimum energy consumption per transaction is at 2V_th
Introduction of TBB to address leakage and dynamic power dissipation
- 500% increase in performance at sub-threshold voltages with a modest 80% increase in power
- 5-10% less power dissipation than traditional body biasing

39 Summary of Proposed FIFO Scheme
Linear FIFO scheme that addresses
- Signal propagation across the communication channel
  - Sustained throughput over long distances
- Successful synchronization
  - Synchronizes equal, rational, and arbitrary clocks
  - 6.5 GHz sustained performance after process corner analysis using 3 stages
- Compared to CN scheme
  - Fewer devices per stage, fewer stages needed
  - 25% higher performance, 12% lower power
- Operates at both super- and sub-threshold voltages
  - Lower instantaneous power demands from local clocks (less di/dt)
  - Optimal energy per transaction at 0.7 V in a 65 nm process
  - Sub-threshold reduces power by 3 orders of magnitude
  - Tunable Body Biasing provides 50% increased performance in sub-threshold while maintaining super-threshold operation

40 TBB Scalability
Technology                        180 nm                                    90 nm
Body Biasing and Operating Region Total Average Power   Static Power        Total Average Power   Static Power
                                  Dissipation           Contribution [%]    Dissipation           Contribution [%]
Traditional in Sub-threshold      193 pW                0.1%                13.1 nW               1.8%
Traditional in Super-threshold    39.6 μW               Negligible          22.1 μW               Negligible
TBB in Sub-threshold              1430 pW               25.2%               20.4 nW               6.1%
TBB in Super-threshold            39.4 μW               0.000034%           22.1 μW               0.0025%
At 180 nm, TBB sub-threshold static power % is large
At 90 nm, the % difference is much less
Total TBB sub-threshold power is large
Total TBB sub-threshold power isn’t so large

41 LFSR Energy vs. Frequency

42 TBB Implementation Cont.

43

44 Logic Gate Analysis (Power)

45 Inverter Power Dissipation
V_DD      Power Dissipation [fW], Average Power [nW]    Maximum Frequency [MHz]    Period [ns]
0.3262    8.273.5                                       0.416                      2400.0
0.4262    11.4130.0                                     2.6                        380.0
0.5643    15.64651.6                                    41.7                       24.0
1.8       82.3068.60                                    833.3                      1.2
V_DD      Power Dissipation [fW], Average Power [nW]    Maximum Frequency [MHz]    Period [ns]
0.3262    8.5222.4                                      2.6                        380.0
0.4262    13.00259.8                                    20.                        50.0
0.5643    15.132102.0                                   138.9                      7.2
1.8       81.4781.5                                     1000.                      1.0

46 Logic Gate Analysis (Energy)

47 Logic Gate Analysis (EDP)

48 Logic Gate Analysis (Fan-in)

49 Logic Gate Analysis (Logic Styles)

50 LFSR Power Dissipation

51 Device Optimization (Optimal Region)

52 Regions of Operation
                  Super-threshold (1.8 V)    Sub-threshold (250 mV)     Optimal (750 mV)
Design            Delay (ns)  Energy (fJ)    Delay (ns)  Energy (fJ)    Delay (ns)  Energy (fJ)
Traditional LFSR  0.74        37.6           20000       105            7           74.1
TBB LFSR          0.64        37             4500        22.8           4.5         73.6
                  GHz                        kHz                        MHz

53 Logic Gate Results
Results Highlights
- TBB, SBB, and DTMOS increase speed up to 7 times in sub-threshold
- Static CMOS has the best overall logic style performance
  - Pseudo-nMOS, Domino, and pass-transistor are still valuable in niche situations
- TBB and Traditional noise margins are comparable

