1 System-level Power Optimization

2 Contents
- Low-Power System Implementation Techniques
  - Circuit level
    - Clock gating
    - MTCMOS
    - Multiple voltage supply
  - Architecture level
    - Memory optimization
    - Bus optimization
- Dynamic Power Management at the System Level
  - Introduction to DPM
  - Structure of DPM
  - Component-level DPM scheme
  - DPM policy
  - Dynamic voltage scaling

3 Circuit-Level Low-Power System Implementation Techniques
- Clock gating
  - The most popular method for reducing the power consumed by clock signals
  - Needs a circuit to generate the enable signal
    - Increases the complexity of the control logic
    - Enable timing is critical to avoid clock glitches at the AND gate output
    - The additional gate delay on the clock signal introduces clock skew
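The glitch hazard at the AND gate can be illustrated with a small discrete-time simulation. This is a minimal sketch and not part of the slides: it compares a plain AND gate with a latch-based gating cell (a common remedy, introduced here as an assumption), and the waveforms and the helper names `and_gated` and `latch_gated` are hypothetical.

```python
def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)

def and_gated(clk, en):
    """Plain AND gating: the enable reaches the clock path directly."""
    return [c & e for c, e in zip(clk, en)]

def latch_gated(clk, en):
    """Latch-based gating (assumed remedy): the enable is sampled only while
    clk is low, so it cannot disturb the gated clock during the high phase."""
    held, out = 0, []
    for c, e in zip(clk, en):
        if c == 0:              # latch is transparent while the clock is low
            held = e
        out.append(c & held)
    return out

# Clock sampled 4 times per phase: low, high, low, high, low, high.
clk = [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4
# Badly timed enable: it dips and finally falls in the middle of high phases.
en = [1] * 6 + [0] * 1 + [1] * 7 + [0] * 10

glitchy_edges = rising_edges(and_gated(clk, en))    # includes a spurious edge
clean_edges = rising_edges(latch_gated(clk, en))    # one edge per enabled cycle
```

With these waveforms the plain AND gate produces an extra rising edge inside the first high phase and a shortened second pulse, while the latch-based version passes two full clock pulses and then stops.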

4 Circuit-Level Low-Power System Implementation Techniques
- Power gating: disconnecting the power source
  - Applicable to each voltage island
  - Long transient due to large capacitance
  - Can generate noise due to the large inductive component
  - Needs a good power switch

5 Circuit-Level Low-Power System Implementation Techniques
- MTCMOS: a form of power gating
  - Low-V_TH devices in the logic maintain performance when active.
  - A high-V_TH current switch (header or footer) cuts off the leakage path during sleep.
  - The scheduling algorithm that controls the sleep signal is important.
[Figure: logic block with input and output, placed between virtual V_DD and virtual GND; a header switch to V_DD and a footer switch to GND are both driven by the sleep signal]

6 Circuit-Level Low-Power System Implementation Techniques
- Multiple supply voltages
  - Slow down non-critical paths with a lower supply voltage
  - Two or more power grids
  - Needs high-efficiency voltage converters for dynamic voltage scaling; down-conversion is cheaper than up-conversion.
  - The dynamic power scheduling algorithm is important.
[Figure: datapath of arithmetic operators (*, +, -); the critical path needs high-speed logic on the high-voltage supply, and the other operators run on the low-voltage supply]

7 Architecture-Level Low-Power System Implementation Techniques
- Memory optimization
  - Code density optimization
    - Goal: minimize program memory occupation to reduce the bandwidth of processor-memory communication
    - Approaches
      - Employ custom instruction sets
      - Object code compression

8 Memory Optimization
- Custom instruction set
  - Instructions are shorter than those of the regular instruction set
  - Example: ARM Thumb code (16-bit instructions)
  - Needs specific architectural support for 16-bit instructions
[Figure: five 32-bit instructions occupy five 32-bit words; encoded as 16-bit instructions they pack into three words. In this case, instruction fetch traffic falls to 3/5 of the original.]
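The 3/5 figure follows from simple word counting. The sketch below assumes that the 16-bit encoding needs the same number of instructions as the 32-bit one, which the slide's example implies but real code does not guarantee; `words_fetched` is a hypothetical helper.

```python
import math

def words_fetched(num_instructions, instr_bits, word_bits=32):
    """Memory words needed to fetch a straight-line instruction sequence."""
    return math.ceil(num_instructions * instr_bits / word_bits)

full = words_fetched(5, 32)    # five 32-bit instructions -> 5 words
short = words_fetched(5, 16)   # five 16-bit instructions -> 3 words (last word half used)
ratio = short / full           # fraction of the original fetch traffic
```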

9 Memory Optimization
- Object code compression
  - All instructions keep the same size, but some or all of them are stored in instruction memory in encoded form.
  - A solution available for embedded processors
    - The same architecture can be used for different subsets of instructions
    - Exploits the small subset of instructions used by firmware code
  - Approaches
    - Full code compression
    - Selective code compression

10 Memory Optimization
- Full code compression
  - Replace all instructions with binary patterns of minimum width, ⌈log2 N⌉, where N is the number of instructions in the instruction set
  - Advantage
    - Memory bandwidth for instructions is decreased.
    - Advantageous when k > log2 N, where k is the original instruction width in bits
  - Disadvantages
    - The IDT may be very large because N is not small.
    - log2 N may not be a multiple of 8.
IDT: Instruction Decompression Table
[Figure: without compression, memory delivers k-bit instructions to the core; with compression, memory delivers ⌈log2 N⌉-bit codes that address the IDT, which outputs the k-bit instruction to the core]
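Full code compression can be sketched as a table lookup. The example below is illustrative only: it takes N to be the number of distinct instruction words that occur in a given program (an assumption, since the slide defines N over the instruction set), and `build_idt` and `decompress` are hypothetical names.

```python
def build_idt(program):
    """Give every distinct instruction word a fixed-width index.

    Returns (idt, codes, width): idt[i] is the original instruction word,
    codes is the compressed program, width is ceil(log2 N) bits per code.
    """
    idt = sorted(set(program))
    width = max(1, (len(idt) - 1).bit_length())   # ceil(log2 N) without floats
    index = {instr: i for i, instr in enumerate(idt)}
    codes = [index[instr] for instr in program]
    return idt, codes, width

def decompress(idt, codes):
    """Model of the IDT lookup placed between memory and core."""
    return [idt[c] for c in codes]

# Hypothetical program of k = 32-bit instruction words.
program = [0xE3A00001, 0xE2800001, 0xE3A00001, 0xE1A0F00E, 0xE2800001]
idt, codes, width = build_idt(program)
compressed_bits = width * len(codes)    # instruction memory traffic after compression
original_bits = 32 * len(program)       # traffic without compression
```

Here three distinct words need only 2-bit codes, which also shows the second disadvantage: the code width is not a multiple of 8.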

11 Memory Optimization
- Selective code compression
  - Most program traces are covered by a small subset of instructions.
  - Compress only that subset: the instructions that maximize program coverage.
  - The program is a mix of compressed and uncompressed instructions.
[Figure: memory delivers 8-bit words to a controller with a buffer; 8-bit codes address the IDT, which outputs k-bit instructions to the core]

12 Memory Optimization
- Selective code compression (continued)
  - Advantages
    - The size of the IDT is fixed and limited.
    - Instruction fetching/decompression logic has reduced complexity.
  - Disadvantage
    - Requires a controller to handle instruction fetching
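A selective scheme can be sketched with a fixed-size table and an escape marker. Everything specific here is an assumption made for illustration: the one-byte codes, the `ESCAPE` value, the 32-bit instruction width, and the function names. The slides do not specify how uncompressed instructions are marked.

```python
from collections import Counter

ESCAPE = 0xFF  # hypothetical marker: the next 4 bytes hold a raw 32-bit instruction

def build_selective_idt(program, table_size=255):
    """Keep only the most frequent instruction words; the IDT size is fixed."""
    assert table_size <= 255, "codes must stay below the escape value"
    return [instr for instr, _ in Counter(program).most_common(table_size)]

def compress(program, idt):
    index = {instr: i for i, instr in enumerate(idt)}
    out = bytearray()
    for instr in program:
        if instr in index:
            out.append(index[instr])             # one byte for a covered instruction
        else:
            out.append(ESCAPE)                   # uncompressed instruction follows
            out += instr.to_bytes(4, "big")
    return bytes(out)

def decompress(stream, idt):
    """Model of the fetch controller: expand codes, pass raw words through."""
    program, i = [], 0
    while i < len(stream):
        if stream[i] == ESCAPE:
            program.append(int.from_bytes(stream[i + 1:i + 5], "big"))
            i += 5
        else:
            program.append(idt[stream[i]])
            i += 1
    return program

program = [0xE3A00001] * 6 + [0xE2800001] * 3 + [0xE1A0F00E]
idt = build_selective_idt(program, table_size=2)
stream = compress(program, idt)
```

In this toy trace the two most frequent words cover nine of ten instructions, so the stream shrinks from 40 bytes to 14 even though one instruction costs an extra escape byte.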

13 Memory Optimization
- Data density optimization
  - Same principle as code density optimization
  - Aims to reduce memory traffic and the dynamic size of the data set
  - More complex than code compression, because both compression and decompression are required
  - Needs a hardware compression/decompression unit
  - Design trade-off between speed and power

14 Architecture-Level Low-Power System Implementation Techniques
- Bus power optimization
  - A large amount of power is dissipated in data communication over heavily loaded on-chip or off-chip buses.
    $P_{\mathrm{Bus}} = n \times C \times V_{DD}^{2} \times f \times \mathrm{activity}$, for an n-bit bus
  - Reduce switching activity on buses through signal encoding to save power
  - Approaches
    - Bus-invert coding
    - Gray code addressing
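The bus power expression can be evaluated directly. The sketch below uses made-up parameter values (10 pF per line, 3.3 V, 100 MHz) and treats activity as the average fraction of lines that toggle per transfer, which is one reasonable reading of the slide rather than its stated definition.

```python
def bus_power(n_bits, cap_per_line, vdd, freq, activity):
    """P_bus = n * C * Vdd^2 * f * activity, in watts for SI inputs."""
    return n_bits * cap_per_line * vdd ** 2 * freq * activity

def toggles(prev, curr):
    """Number of bus lines that switch between two consecutive words."""
    return bin(prev ^ curr).count("1")

def average_activity(words, n_bits):
    """Average fraction of lines toggling per transfer for a word sequence."""
    total = sum(toggles(a, b) for a, b in zip(words, words[1:]))
    return total / (n_bits * (len(words) - 1))

# Hypothetical 32-bit bus.
p = bus_power(n_bits=32, cap_per_line=10e-12, vdd=3.3, freq=100e6, activity=0.5)
# Activity measured on a short 8-bit trace: 8 toggles, then 4 toggles.
act = average_activity([0x00, 0xFF, 0x0F], n_bits=8)
```

Because power is linear in activity, any encoding that halves the toggle count halves the bus power in this model.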

15 Bus Optimization
- Bus-invert coding
  - Add a redundant line, INV, to the bus
    - When INV = 0, the data equals the values on the remaining bus lines
    - When INV = 1, the data is the complement of the remaining bus lines
  - Each cycle, decide whether sending the true or the complemented value leads to fewer toggles
[Figure: source data passes through polarity decision logic onto the data bus and the INV signal; the receiver recovers the data from both]
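The encoding rule can be sketched in a few lines. This is a simplified illustration: the decision compares toggles on the data lines only and ignores the INV line itself, the bus is assumed to start at all zeros, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two words."""
    return bin(a ^ b).count("1")

def bus_invert_encode(words, n_bits):
    """Return (bus_value, inv) pairs for an n-bit data bus."""
    mask = (1 << n_bits) - 1
    prev_bus, encoded = 0, []            # assumption: bus starts at all zeros
    for w in words:
        if hamming(prev_bus, w) > n_bits / 2:
            bus, inv = (~w) & mask, 1    # the complement toggles fewer lines
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded

def bus_invert_decode(encoded, n_bits):
    """Receiver side: undo the inversion when INV = 1."""
    mask = (1 << n_bits) - 1
    return [(~bus) & mask if inv else bus for bus, inv in encoded]

def line_toggles(values):
    """Total toggles when the values are driven in order, starting from 0."""
    return sum(hamming(a, b) for a, b in zip([0] + values, values))

words = [0x00, 0xFF, 0x01, 0xFE]
encoded = bus_invert_encode(words, 8)
raw = line_toggles(words)
coded = (line_toggles([bus for bus, _ in encoded])
         + line_toggles([inv for _, inv in encoded]))
```

For this worst-case pattern the raw bus toggles 23 lines in total, while the encoded bus toggles 4, counting the INV line.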

16 Bus Optimization
- Gray code addressing
  - Most instruction addresses are consecutive
  - Use Gray code for addresses
  - Word-oriented machines
    - Addresses increment by 4 (32-bit) or by 8 (64-bit)
    - Modify the Gray code so that one bit switches per increment
  - A Gray code adder is needed for jumps

Dec  Gray (i=1)  Gray (i=4)  Gray (i=8)
0    0000        0000        0000
1    0001        0001        0001
2    0011        0011        0011
3    0010        0010        0010
4    0110        0100        0110
5    0111        0101        0111
6    0101        0111        0101
7    0100        0110        0100
8    1100        1100        1000
i: increment
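One construction that reproduces the three code columns on this slide is to Gray-code the word index and the offset within a word separately. This is a sketch inferred from the table values, not a method the slides spell out, and it assumes the increment is a power of two; `gray_address` is a hypothetical name.

```python
def gray(x):
    """Standard reflected binary Gray code."""
    return x ^ (x >> 1)

def gray_address(addr, increment=1):
    """Encode an address so that stepping by `increment` flips exactly one bit.

    The upper part (addr // increment) and the lower part (addr % increment)
    are Gray-coded independently; `increment` must be a power of two.
    """
    shift = increment.bit_length() - 1
    low_mask = increment - 1
    return (gray(addr >> shift) << shift) | gray(addr & low_mask)

def column(increment):
    """Codes for addresses 0..8 as 4-bit strings, as in the slide's table."""
    return [format(gray_address(a, increment), "04b") for a in range(9)]

# Sequential fetches on a 32-bit machine: addresses 0, 4, 8, 12, ...
flips = [bin(gray_address(a, 4) ^ gray_address(a + 4, 4)).count("1")
         for a in range(0, 64, 4)]
```

Every entry of `flips` is 1, so consecutive word fetches toggle a single address line.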

17 Introduction to DPM
- Dynamic Power Management (DPM)
  - DPM controls the power consumption of each component based on its usage.
  - Prediction of component usage is essential.
- Methods
  - Shutdown (clock gating, power gating)
  - Slowdown (frequency scaling, voltage scaling, V_TH scaling)
[Figure: f and V_DD versus time; labels: T/2, T, idle, 0.6 V_DD, V_DD]

18 Structure of DPM
- Levels at which DPM is embodied
  - Component level
    - Circuit, block
    - Power mode
  - System level
    - Policy: the procedure that controls the power level of each module in a system
[Figure: a policy sends power mode requests to blocks 1 to n of the system; each block contains circuits]

19 Component-Level DPM Scheme
- Circuit level
  - Clock off by clock gating
  - Power off by the footer/header of MTCMOS
  - Multiple voltage supply
- Block level
  - Power off by shutting down the power supply to IPs
  - When the power-off patterns of two blocks are similar, shut them down together.
[Figure: IP #1 and IP #2 share a virtual V_DD and a virtual GND, switched from the V_DD source and the GND source]

20 Component-Level DPM Scheme
- Power mode
  - Each state is a combination of enabled DPM techniques.
  - Example: a system that uses clock gating and block shutdown

Power mode  Clock gating  Block shutdown
Run         disabled      disabled
Idle        enabled       disabled
Sleep       enabled       enabled

  - Transitions between modes of operation have a cost.
[Figure: power state machine for the StrongARM processor. States: Run (P = 400 mW), Idle (P = 50 mW), Sleep (P = 0.16 mW). Transition times shown: 10 μs, 90 μs, 160 ms, 90 μs. Edge labels: "wait for interrupt", "wait for wake-up event".]
SA-1100 Microprocessor Technical Reference Manual, Intel, 1998
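The cost of transitions can be made concrete with a break-even calculation on the StrongARM power state machine. The power values come from the slide; the assignment of transition times to edges and the assumption that transitions draw run-level power are not stated there and are used only for illustration.

```python
POWER_W = {"run": 0.400, "idle": 0.050, "sleep": 0.00016}

# Transition times in seconds. Which edge carries which time is an assumption.
TRANSITION_S = {
    ("run", "idle"): 10e-6,
    ("idle", "run"): 10e-6,
    ("run", "sleep"): 90e-6,
    ("idle", "sleep"): 90e-6,
    ("sleep", "run"): 160e-3,
}

def sleep_overhead():
    """Time spent entering sleep from idle and waking back up to run."""
    return TRANSITION_S[("idle", "sleep")] + TRANSITION_S[("sleep", "run")]

def energy_stay_idle(t):
    """Energy for an inactive interval of t seconds spent in Idle."""
    return POWER_W["idle"] * t

def energy_via_sleep(t):
    """Energy for the same interval spent as idle -> sleep -> run.
    Assumes transitions draw run-level power (not stated on the slide)."""
    t_tr = sleep_overhead()
    return POWER_W["run"] * t_tr + POWER_W["sleep"] * max(0.0, t - t_tr)

def break_even_time():
    """Interval length at which sleeping starts to save energy."""
    t_tr = sleep_overhead()
    return ((POWER_W["run"] - POWER_W["sleep"]) * t_tr
            / (POWER_W["idle"] - POWER_W["sleep"]))

t_be = break_even_time()   # roughly 1.28 s under these assumptions
```

Under these assumptions, entering Sleep pays off only when the inactive interval is longer than about 1.28 s, which is why the policy needs a prediction of the upcoming idle time.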

21 DPM Policy
- Predictive technique
  - Uses a regression equation based on the previous "on" and "off" times of the component to estimate the next "turn on" time.
  - Limitation: it cannot handle components with more than two power modes.
[Figure: predictive power management scheme with two states, Running (R) and Sleep (S), linked by go-to-sleep and wake-up transitions. Timelines compare a scheme that incurs a wake-up delay with the pre-wakeup scheme. I: idle state, E: entering state, W: waking-up state]
M. Srivastava et al., "Predictive system shutdown and other architectural techniques for energy efficient programmable computation", IEEE TVLSI, Vol. 4, No. 1, 1996
C.H. Hwang et al., "A predictive system shutdown method for energy saving of event-driven computation", Proc. Int. Conf. on Computer Aided Design, pages 28-32, Nov. 1997
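A predictive shutdown policy can be sketched as a predictor plus a threshold test. The slide does not give the regression equation, so a simple exponential-average predictor stands in for it here; the smoothing factor, the sample history, and the names `IdlePredictor` and `should_sleep` are all illustrative assumptions.

```python
class IdlePredictor:
    """Predict the next idle period from past observations.

    Stand-in for the regression equation: an exponential average,
    prediction = alpha * last_observed + (1 - alpha) * previous_prediction.
    """

    def __init__(self, alpha=0.5, initial=0.0):
        self.alpha = alpha
        self.prediction = initial

    def update(self, observed_idle):
        self.prediction = (self.alpha * observed_idle
                           + (1 - self.alpha) * self.prediction)
        return self.prediction

def should_sleep(predicted_idle, break_even):
    """Go to sleep only if the predicted idle time exceeds the break-even time."""
    return predicted_idle > break_even

predictor = IdlePredictor(alpha=0.5)
history = [2.0, 2.0, 0.1]                       # observed idle periods in seconds
predictions = [predictor.update(t) for t in history]
decision = should_sleep(predictions[-1], break_even=1.0)
```

The decision is binary (sleep or stay running), which mirrors the stated limitation: a predictor of this kind does not choose among more than two power modes.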

22 DPM Policy
- Markov process
  - A Markov process chooses its next state from its current state and pre-characterized transition probabilities.
  - Power management optimization has been studied within the framework of Markov processes.
  - When the system is modeled as Markov chains
    - The model captures the uncertainty in system power consumption and response times.
    - It can represent complex systems with many power states, buffers, and queues.
    - Power management policies that are globally optimal can be computed.
G.A. Paleologo et al., "Policy optimization for dynamic power management", Proc. DAC, 1998
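As a small illustration of the Markov view, the sketch below models only a workload source as a two-state chain and computes its long-run behavior. The transition probabilities are invented, and this is not the policy optimization of the cited paper; it only shows what "pre-characterized probabilities" look like in practice.

```python
def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, iterations=1000):
    """Approximate the stationary distribution by repeated stepping."""
    dist = [1.0] + [0.0] * (len(P) - 1)
    for _ in range(iterations):
        dist = step(dist, P)
    return dist

# Hypothetical service requestor: state 0 = no request, state 1 = request issued.
# P[i][j] is the probability of moving from state i to state j in one time slice.
P = [[0.9, 0.1],
     [0.3, 0.7]]

pi = stationary(P)      # long-run fraction of time slices spent in each state
request_rate = pi[1]    # fraction of slices in which a request is issued
```

For a two-state chain the result can be checked by hand: the long-run probability of state 1 is 0.1 / (0.1 + 0.3) = 0.25.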

23 DPM Policy
- Structure of stochastic DPM
- FSM of each module
[Figure: the service requestor places requests in a queue served by the service provider; the power manager takes observations and issues commands]

24 Dynamic Voltage Scaling
- DVS
  - Reducing V_DD is the single most effective way to reduce power consumption.
  - The reduction of V_DD is limited by the worst-case condition.
  - The performance requirement varies with time.
- Solution
  - Slowdown: perform the job with just-in-time performance
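The benefit of just-in-time slowdown follows from the quadratic dependence of switching energy on V_DD. The sketch below compares finishing a job early at full voltage with finishing exactly at the deadline at a reduced voltage. The capacitance, the cycle count, and the assumption that half speed is reachable at 0.6 V_DD are hypothetical values chosen for illustration.

```python
def dynamic_energy(cycles, vdd, c_eff=1e-9):
    """Switching energy of a job: E = C_eff * Vdd^2 * cycles.
    c_eff is a hypothetical effective capacitance per cycle."""
    return c_eff * vdd ** 2 * cycles

def just_in_time_frequency(cycles, deadline_s):
    """Lowest clock frequency that still meets the deadline."""
    return cycles / deadline_s

cycles = 40e6            # work in the job (hypothetical)
deadline_s = 1.0
f_max = 80e6             # full-speed clock

f_jit = just_in_time_frequency(cycles, deadline_s)   # 40 MHz, half of f_max
busy_fraction_at_full_speed = cycles / f_max / deadline_s

vdd_full = 3.3
vdd_slow = 0.6 * vdd_full   # assumption: half speed is reachable at 0.6 Vdd

e_full = dynamic_energy(cycles, vdd_full)   # finish early, then idle
e_slow = dynamic_energy(cycles, vdd_slow)   # finish exactly at the deadline
ratio = e_slow / e_full                     # 0.6^2 = 0.36
```

The job executes the same number of cycles either way, so the saving comes entirely from the lower voltage; leakage and idle power are ignored in this model.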

25 DVS-Applied Processor
- Transition overhead
  - At most 70 μs for a 5~80 MHz transition
  - At most 4 μJ for a 5~80 MHz transition
[Figure: CPU chip (ARM core, 16KB cache, system co-processor, bus interface, write buffer, VCO) supplied with V_DD by a regulator fed from V_Bat and driven by F_desired; the system bus connects 64KB SRAM ... 0.5MB and an I/O chip]
T.D. Burd et al., "A dynamic voltage scaled microprocessor system", IEEE JSSC, Nov. 2000

26 DPM Using DVS on an SoC
- Divide the SoC into 4 power domains
  - Persistent 3.3 V: I/O drivers and receivers
  - Persistent 1.0 V: PLL
  - Persistent 1.8 V: RTC, sleep management
  - DVS: 1.0 V ~ 1.8 V (10 mV/μs), so a full 0.8 V swing takes 80 μs
K.J. Nowka et al., "A 32-bit PowerPC System-on-a-Chip with support for dynamic voltage scaling and dynamic frequency scaling", IEEE JSSC, Nov. 2002
