Published by Patrick Willis. Modified over 9 years ago.
1
Word-Size Optimization for Low Energy, Variable Workload Sub-threshold Systems
Sudhanshu Khanna, Anurag Nigam
ECE 632 – Fall 2008, University of Virginia
@virginia.edu
2
Introduction
–Energy-constrained sub-Vt systems: medical devices, environmental sensors
–Need to lower E in order to enable “lifelong” operation
–Small form factor => area reduction
–Total E = Active E + Sleep E
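The energy split on this slide can be sketched as a small calculation. This is a minimal illustration, assuming sleep energy is simply sleep power times sleep time; the function name and all numbers are hypothetical and not taken from the presentation.

```python
def total_energy(active_energy_per_op, ops, sleep_power, sleep_time):
    """Total E = Active E + Sleep E, in joules.

    Assumes (illustratively) a constant energy per operation while active
    and a constant leakage power while asleep.
    """
    active = active_energy_per_op * ops
    sleep = sleep_power * sleep_time
    return active + sleep


# Hypothetical duty cycle: 1000 operations at 6 pJ each, then 10 s asleep at 1 nW.
e_total = total_energy(6e-12, 1000, 1e-9, 10.0)
sleep_share = (1e-9 * 10.0) / e_total
print(f"total = {e_total:.2e} J, sleep share = {sleep_share:.1%}")
```

With numbers like these, sleep energy is the larger share of the total, which is the situation the deck targets.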
3
Top Level Problems Addressed
–Energy reduction: active and sleep mode
–Area reduction
–Adaptation of super-threshold designs to sub-threshold
4
Current Approaches
Voltage regulated from this off-chip (expensive) DC-DC converter
Ref: K. Craig, R. Matthews, EE632 Fall 2008
5
Our approach
Make the “starting point” design more E-efficient, specifically for sleep mode operation
6
Sure way of lowering CV^2: lower V => sub-threshold
1.2V -> 0.2V Logic System
Can we optimize the logic system for sub-Vt operation, or should it stay the same?
7
Sure way of lowering CV^2: lower V => sub-threshold
1.2V Logic System -> 0.2V Smaller Logic System
Make the system as small as feasible. Use it over and over until the required operation is done. Then go to sleep and leak less!
How do we make the system smaller? USE A SMALLER WORD-SIZE
Will using the small system over and over increase the active energy?
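As a rough worked figure for the voltage step on this slide: assuming first-order dynamic energy only, with the same switched capacitance at both voltages and leakage ignored (the symbols \alpha for activity factor and C for switched capacitance are illustrative and not defined in the slides), the scaling is:

```latex
E_{dyn} = \alpha C V_{DD}^{2}
\qquad\Rightarrow\qquad
\frac{E_{dyn}(0.2\,\mathrm{V})}{E_{dyn}(1.2\,\mathrm{V})}
  = \left(\frac{0.2}{1.2}\right)^{2} = \frac{1}{36} \approx 0.028
```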
8
Smaller Word-Size: Problems Addressed
For sure, a small word-size means:
–Lower area
–Lower sleep energy
–Higher delay
We need to find:
–How large is the area/sleep E benefit?
–What is the impact of multi-cycle operation on active E?
–Can we make small-word systems faster without losing the sleep E and area advantage?
9
Smaller Word-Size: Our Contribution
For sure, a small word-size means:
–Lower area
–Lower sleep energy
–Higher delay
What we found:
–How large is the area/sleep E benefit? > 20x area benefit, > 33x sleep energy benefit
–What is the impact of multi-cycle operation on active E? Multi-cycle operation increases active E, but the final active E is about the same as or lower than that of a 32-bit system.
–Can we make small-word systems faster without losing the sleep E and area advantage? Yes, delay degradation can be overcome while still being more energy efficient.
10
Systems Compared
Addition of two 32-bit numbers using:
–Large word-size (32-bit): Kogge-Stone Adder, Ripple Carry Adder, Full-Adder
–Small word-size (1-bit)
1-bit is taken for simplicity; the trends would be valid for other word-sizes, e.g. 16-bit, 8-bit, etc.
Addition is taken as a sample digital function. However, the trends found can be generalized to other digital functions as well.
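For readers unfamiliar with the two parallel adders named above, the following is a behavioral sketch only: it models the logic on Python integers, not the authors' circuits, and says nothing about energy. The function names are hypothetical. The Kogge-Stone version resolves all carries in log2(32) = 5 prefix stages, while the ripple-carry version walks the carry through all 32 bit positions.

```python
def kogge_stone_add(a, b, width=32):
    """Parallel-prefix (Kogge-Stone) addition, modulo 2**width, carry-in 0."""
    mask = (1 << width) - 1
    g = a & b            # per-bit generate
    p = a ^ b            # per-bit propagate
    p0 = p               # keep the bitwise propagate for the final sum
    dist = 1
    while dist < width:  # 5 stages for width = 32
        g = (g | (p & (g << dist))) & mask   # uses the old p
        p = (p & (p << dist)) & mask
        dist <<= 1
    carries = (g << 1) & mask  # carry into bit i = group generate of bits 0..i-1
    return (p0 ^ carries) & mask


def ripple_carry_add(a, b, width=32):
    """Ripple-carry addition: one full adder per bit, carry passed upward."""
    carry, result = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


print(kogge_stone_add(123456789, 987654321), ripple_carry_add(123456789, 987654321))
```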
11
32-bit Kogge-Stone Adder (KSA), 32-bit Ripple Carry Adder (RCA)
[Block diagram labels: 32 Bit KSA or RCA, 32 Bit Register, PA, PB, Reset, CLK, 32 Bit OUT]
PA = Parallel input A
PB = Parallel input B
OUT = Parallel output from Sum Register
12
Small-Word Size system
Let the smaller word-size be n. In general, an n-bit word system will have n-bit operands, so the system will look just like a 32-bit system, only smaller (n < 32).
[Block diagram labels: n-bit Full Adder, n-Bit Register, CLK]
In case n = 1, the system takes 32 clock cycles to add two 32-bit numbers, hence the higher delay.
[Block diagram labels, n = 1: 1-bit Full Adder, 1-Bit Register, CLK]
1-bit Serial Adder (SA)
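The cycle count on this slide can be illustrated with a behavioral sketch that adds two 32-bit numbers n bits per clock cycle, carrying between cycles. It assumes n divides the operand width evenly; the function name is hypothetical and the model ignores all timing and energy effects.

```python
def word_serial_add(a, b, n, width=32):
    """Add two width-bit numbers using an n-bit adder, one n-bit slice per cycle.

    Returns (sum modulo 2**width, number of clock cycles used).
    """
    assert width % n == 0, "sketch assumes n divides the operand width"
    mask = (1 << n) - 1
    carry, result, cycles = 0, 0, 0
    for shift in range(0, width, n):
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift   # store this n-bit slice of the sum
        carry = s >> n                  # carry held over to the next cycle
        cycles += 1
    return result, cycles


for n in (1, 8, 32):
    print(n, word_serial_add(1000, 2345, n))
```

For n = 1 this takes 32 cycles, matching the slide; for n = 8 it takes 4, and for n = 32 it takes 1.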
13
A conceptual fully-serial 1-bit system
[Block diagram labels: Analog Input, Serial ADC, Serial Multiplier, 1-bit Full Adder, 1-Bit Register (x2), CLK (x2), 1-bit input from other part of chip, Serial DAC, Analog Output, Simulated 1-bit SA]
14
32-bit Serial Adder (SA) using Full-Adder
[Block diagram labels: 32 Bit Shift Register (x3), 1 Bit Full Adder, Carry Flip Flop, PA, PB, CLK, Cin, Cout, 1 Bit OUT]
Regular 32-bit word system, but with the parallel adder replaced by a 1-bit full adder => LOWER SLEEP ENERGY
Takes 32 cycles but is amenable to use in an unmodified 32-bit word system
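The structure on this slide (three shift registers, one full adder, one carry flip-flop) can be mimicked cycle by cycle. This is a behavioral sketch under the assumption that operands shift out LSB first and the sum shifts in from the MSB end; the slides do not state the shift direction, and the function name is hypothetical.

```python
def serial_adder_32(pa, pb, width=32):
    """Cycle-by-cycle model of a bit-serial adder.

    pa, pb are loaded in parallel into two shift registers; a third shift
    register collects the sum. Returns (sum register, final carry flip-flop).
    """
    reg_a, reg_b, reg_sum = pa, pb, 0
    carry_ff = 0                       # carry flip-flop, reset to 0
    for _ in range(width):             # one clock cycle per bit
        a_bit, b_bit = reg_a & 1, reg_b & 1
        s = a_bit ^ b_bit ^ carry_ff
        carry_ff = (a_bit & b_bit) | (carry_ff & (a_bit ^ b_bit))
        reg_a >>= 1                    # shift operands toward the adder
        reg_b >>= 1
        reg_sum = (reg_sum >> 1) | (s << (width - 1))  # shift sum bit in at the top
    return reg_sum, carry_ff


print(serial_adder_32(5, 7))
```

After 32 cycles the first sum bit has travelled down to bit 0 of the sum register, so the result can be read out in parallel like in the 32-bit KSA/RCA system.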
15
Energy drawn for addition of two 32-bit numbers is measured for all 4 systems:
–32-bit KSA (large word-size system)
–32-bit RCA (large word-size system)
–32-bit SA (large word-size system)
–1-bit SA (small word-size system)
Clock and register power taken into account
Important metric: energy per operation
16
Active Energy @ VDD = 300mV
[Plot annotations: HIGH Edyn ~ Etot ~ 6pJ, but leakage current is 1.7x lower]
Shows that active energy of 1-bit system < 32-bit systems
–40% active energy benefit @ 22nm
–33x reduction in leakage current (note that the plot shows only active energy)
17
Conclusions @ 300mV
–1-bit SA has 40% lower active E than the best 32-bit system
–1-bit SA has 33x lower leakage current than the best 32-bit system
–32-bit SA has 1.7x lower leakage current than 32-bit KSA
Thus multi-cycle operation doesn’t increase active energy too much
Hence once sleep time is added, benefits of small-word systems will increase
=> if word-size is limited to 32, serial addition will save energy if the application has a lot of sleep time, e.g. in sensor nodes
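To see why the benefit grows with sleep time, the two ratios quoted above (40% lower active energy, 33x lower leakage) can be plugged into a simple per-operation energy model. This is a sketch only: the baseline values are hypothetical placeholders, and it assumes sleep power scales directly with leakage current at the same VDD, which the slides do not state explicitly.

```python
def energy_per_op(active_e, leak_power, sleep_time):
    """Energy per operation including the sleep interval that follows it."""
    return active_e + leak_power * sleep_time


# Hypothetical 32-bit baseline (NOT from the slides): 10 pJ active, 1 nW asleep.
BASE_ACTIVE = 10e-12
BASE_LEAK = 1e-9

# Ratios taken from the slide: 40% lower active E, 33x lower leakage.
SMALL_ACTIVE = 0.6 * BASE_ACTIVE
SMALL_LEAK = BASE_LEAK / 33


def benefit(sleep_time):
    """Ratio of 32-bit to 1-bit energy per operation for a given sleep time."""
    big = energy_per_op(BASE_ACTIVE, BASE_LEAK, sleep_time)
    small = energy_per_op(SMALL_ACTIVE, SMALL_LEAK, sleep_time)
    return big / small


for t in (0.0, 0.001, 0.1, 10.0):
    print(f"sleep {t:>6} s -> benefit {benefit(t):.2f}x")
```

Under this model the benefit starts at 1/0.6 with no sleep time and approaches 33x as sleep time dominates, regardless of the placeholder baseline chosen.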
18
VDD increases => delay decreases
Can be used to make small-word size systems faster!
But what is the impact of the VDD increase on energy?
[Diagram labels: Logic System at 1.2V and 0.2V; Logic System small word at 0.2V (already compared) and 0.4V]
19
Energy @ constant delay
[Diagram: 0.2V Logic System vs 0.4V Logic System small word; delay is equal]
Now we compare energy at constant delay
–Small word-size is more energy efficient even after the VDD increase
–But the margins of energy benefit do go down
–The same is not true in super-Vt! Why? Difference in the on-current equation in super-Vt and sub-Vt
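The slide does not write out the on-current equations. For orientation, the standard first-order textbook models are sketched below; these are an assumption about what the authors had in mind, and the symbols I_0, n, v_T, and \alpha are not defined in the presentation.

```latex
\begin{aligned}
\text{sub-}V_t:\quad & I_{on} \approx I_0 \exp\!\left(\frac{V_{DD}-V_t}{n\,v_T}\right), \qquad v_T = \frac{kT}{q} \\
\text{super-}V_t:\quad & I_{on} \propto (V_{DD}-V_t)^{\alpha}, \qquad 1 \le \alpha \le 2 \\
\text{delay:}\quad & t_d \approx \frac{C\,V_{DD}}{I_{on}}
\end{aligned}
```

Under these models, a small VDD increase in sub-Vt cuts delay exponentially while CV^2 grows only quadratically, whereas in super-Vt the delay improves only polynomially for the same energy penalty.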
20
[Plots: Sub-Vt and Super-Vt panels, each annotated with SMALL SLOPE and LARGE SLOPE]
VDD change => no impact on E !!
21
Pareto-Optimal E-D Curve
Super-Vt -> 32-bit system is Pareto-optimal
Sub-Vt -> 1-bit system is Pareto-optimal
[Plot annotations: Super-Vt region, Sub-Vt region, cross-over where the 1-bit system becomes optimal]
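For reference, a design point is Pareto-optimal on an energy-delay plot when no other point is at least as good in both energy and delay and strictly better in one. A minimal sketch of that filter follows; the function name and the sample points are hypothetical and are not measurements from the presentation.

```python
def pareto_front(points):
    """Return names of points not dominated in (energy, delay); lower is better.

    points: list of (name, energy, delay) tuples.
    """
    front = []
    for name, e, d in points:
        dominated = any(
            e2 <= e and d2 <= d and (e2 < e or d2 < d)
            for _, e2, d2 in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (energy, delay) points in arbitrary units.
designs = [("A", 1, 5), ("B", 2, 2), ("C", 3, 3), ("D", 5, 1)]
print(pareto_front(designs))
```

Here design C is dropped because B beats it on both axes; the remaining points trade energy against delay.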
22
Generality of Trends
–1-bit system is used as an example. Energy and area benefits will be achieved in any small word-size system.
–Shift in the Pareto-optimal curve happens because of the difference in the I_on equation. Hence this behavior can be observed in other parts of a digital system as well, not just addition.
–Opens energy saving opportunities in more areas of digital design
23
Conclusions @ constant delay
When going into sub-Vt operation, re-examine the word-size of the system being used.
Optimal word-size goes down: small word size gives lower E and area and matches delay
[Diagram: 0.2V Logic System vs 0.4V Logic System small word]
Energy: less; Leakage: less; Area ($$$): less; Delay: same
24
Different Word-Size Systems
–1-bit (Digital Audio System, Sharp)
–4-bit (Marc4 microcontroller, Intel 4040)
–8-bit (microcontrollers, Intel 8080 processor)
–16-bit (Intel 8086 processor)
–64-bit (Athlon 64, Opteron processor)
25
FIR Filter
Used in many real-time DSP systems (audio, video processing)
4-Tap FIR Filter
K(i): Filter Coefficients
Serial Implementation of a Parallel FIR filter
26
[Block diagram labels: Delay, Multiplier, 4-input Parallel Adder; inputs X(n), X(n-1), X(n-2), X(n-3); coefficients K0, K1, K2, K3; output Y(n)]
K0, K1, K2, K3: Filter coefficients stored in memory
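The parallel structure on this slide computes Y(n) = K0·X(n) + K1·X(n-1) + K2·X(n-2) + K3·X(n-3). A minimal behavioral sketch follows, assuming the delay line starts at zero; the function name and sample values are hypothetical.

```python
def fir_4tap(samples, k):
    """Direct-form 4-tap FIR filter: one output per input sample."""
    assert len(k) == 4
    delay = [0, 0, 0, 0]            # holds X(n), X(n-1), X(n-2), X(n-3)
    out = []
    for x in samples:
        delay = [x] + delay[:3]     # shift the delay line by one sample
        # four multipliers feeding a 4-input adder
        out.append(sum(ki * xi for ki, xi in zip(k, delay)))
    return out


# Feeding a unit impulse returns the coefficients themselves.
print(fir_4tap([1, 0, 0, 0, 0], [1, 2, 3, 4]))
```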
27
[Block diagram labels: Serial Parallel Multiplier, 1-bit Serial Adder, Register, Y(n) serial output]
Filter coefficients (K3, K2, K1, K0): from memory
X(n): serial input data
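The serial datapath can be approximated behaviorally: the input sample arrives one bit per cycle while the coefficient is available in parallel, and partial products are accumulated. This sketch assumes unsigned operands and LSB-first input, neither of which the slide specifies; it also uses Python integer addition in place of the 1-bit serial adder, so it shows the arithmetic but not the bit-level timing. Function names are hypothetical.

```python
def serial_parallel_multiply(x, k, width=8):
    """Multiply by feeding x one bit per cycle (LSB first) against parallel k."""
    acc = 0
    for cycle in range(width):
        x_bit = (x >> cycle) & 1
        if x_bit:
            acc += k << cycle       # add the shifted coefficient as a partial product
    return acc


def serial_fir_output(window, coeffs, width=8):
    """One FIR output: products are formed one tap at a time and accumulated."""
    y = 0                           # plays the role of the accumulating register
    for x, k in zip(window, coeffs):
        y += serial_parallel_multiply(x, k, width)
    return y


print(serial_parallel_multiply(13, 11), serial_fir_output([1, 2, 3, 4], [4, 3, 2, 1]))
```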
28
QUESTIONS