Low Power Design: System and Algorithm Levels
Class Report, 林常仁
Why Low Power?
-- Battery life in portable systems
-- Packaging and cooling cost
-- Digital noise immunity
-- Power supply rail design
-- Environmental concerns
Goal: reduce power dissipation while maintaining adequate throughput.
Low Power Design Approaches
-- System: hardware-software partitioning, power distribution
-- Algorithm: complexity, concurrency, locality, regularity, data representation
-- Architecture: parallelism, pipelining, signal correlations
-- Circuit/Logic: transistor sizing, logic design, logic style
-- Technology: scaling, threshold voltage reduction, advanced packaging
To reduce power:
-- Run at the minimum allowable supply voltage
-- Reduce the effective switched capacitance per sample
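The two levers for reducing power follow from the standard first-order model of CMOS switching power, $P = \alpha C_L V_{DD}^2 f$, where $\alpha$ is the switching activity factor. The Python sketch below uses purely hypothetical parameter values to show that halving the supply voltage cuts switching power by a factor of 4, while halving the effective switched capacitance only halves it.

```python
def switching_power(alpha, c_load, vdd, f_clk):
    """First-order CMOS dynamic power: P = alpha * C_L * Vdd^2 * f (watts)."""
    return alpha * c_load * vdd ** 2 * f_clk


# Hypothetical operating point, chosen only for illustration.
ALPHA = 0.15     # switching activity factor
C_LOAD = 2e-9    # effective switched capacitance per cycle, farads
F_CLK = 50e6     # clock frequency, hertz

baseline = switching_power(ALPHA, C_LOAD, 3.3, F_CLK)
half_vdd = switching_power(ALPHA, C_LOAD, 1.65, F_CLK)
half_cap = switching_power(ALPHA, C_LOAD / 2, 3.3, F_CLK)

print(f"baseline:           {baseline * 1e3:.2f} mW")
print(f"Vdd halved:         {half_vdd * 1e3:.2f} mW")
print(f"capacitance halved: {half_cap * 1e3:.2f} mW")
```

This model ignores the fact that a lower supply voltage also slows the gates, so in practice voltage scaling has to be paid for with parallelism, pipelining, or a faster algorithm to keep the throughput.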
Level of Power Reduction
Level of Abstraction    Expected Saving
Algorithm               10-99%
Architecture            10-90%
Logic Level             20-40%
Layout Level            10-30%
Device Level            10-30%
Leverage increases toward the higher levels of abstraction; general-purpose applicability increases toward the lower levels.
System Level Optimization
System partitioning is critical for the low-power implementation of applications such as a time-slicing OFDM receiver or a system-on-chip (SoC).
Energy consumption determines battery life. Functions are implemented in different operating modes:
-- Active modes with different clock rates (and supply voltages)
-- Standby mode with a slow clock
-- Sleep or suspend mode (slowest clock, or clock shut down)
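Because battery life is set by energy, one simple way to compare partitioning and mode choices is to weight each mode's power by the fraction of time spent in it. The Python sketch below assumes hypothetical mode powers, duty cycle, and battery capacity; none of these numbers come from the report, and the `Mode` class is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    power_mw: float  # average power in this mode (hypothetical value)
    fraction: float  # fraction of total time spent in this mode


def average_power_mw(modes):
    """Time-weighted average power over all operating modes."""
    total = sum(m.fraction for m in modes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"time fractions must sum to 1, got {total}")
    return sum(m.power_mw * m.fraction for m in modes)


def battery_life_hours(capacity_mwh, modes):
    """Battery life = stored energy / average power."""
    return capacity_mwh / average_power_mw(modes)


# Hypothetical duty cycle for a time-sliced receiver: short active bursts,
# some standby, and mostly sleep.
modes = [
    Mode("active", power_mw=200.0, fraction=0.10),
    Mode("standby", power_mw=20.0, fraction=0.30),
    Mode("sleep", power_mw=0.5, fraction=0.60),
]

avg = average_power_mw(modes)
print(f"average power: {avg:.1f} mW")
print(f"battery life:  {battery_life_hours(3700.0, modes):.1f} h")
```

With these assumed numbers the active mode contributes 20 of the 26.3 mW average even though it runs only 10% of the time, so shortening the active bursts matters more than further lowering sleep power.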
Power Reduction by Clock Gating
[Figure: a global clock feeds Module Units 1 to N, each gated by its own enable signal, Enable 1 to Enable N.]
-- A circuit that stays in standby or active mode is needed to generate the enable signals.
-- Modules are partitioned by application function and by implementation speed.
-- In SoC applications, the global clock may activate the local clock generators.
-- Power consumption can also be reduced with a globally asynchronous, locally synchronous (GALS) design style.
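The enable-per-module scheme can be modeled at half-cycle resolution. The Python sketch below assumes a latch-based clock gate, a common glitch-free implementation that is not necessarily the circuit in the figure: the enable is captured while the clock is low, and the gated clock is the AND of the clock and the latched enable. Counting rising edges of each gated clock gives a rough proxy for the clock switching a module still incurs. Module names and enable patterns are hypothetical.

```python
def clock_gate(clk, en):
    """Latch-based clock gate sampled at half-cycle resolution.

    The latch is transparent while clk is low, so an enable that changes
    while clk is high is ignored until the next low phase; this keeps the
    gated clock free of truncated pulses.
    """
    latched = 0
    gclk = []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e
        gclk.append(c & latched)
    return gclk


def rising_edges(wave):
    """Number of 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


CYCLES = 8
clk = [0, 1] * CYCLES  # one low and one high sample per clock cycle

# Hypothetical per-cycle enables for three module units.
enables = {
    "unit1": [1, 1, 1, 1, 1, 1, 1, 1],  # always active
    "unit2": [0, 0, 1, 1, 1, 0, 0, 0],  # active for a short burst
    "unit3": [0, 0, 0, 0, 0, 0, 0, 0],  # idle: clock fully stopped
}

for name, per_cycle in enables.items():
    en = [e for e in per_cycle for _ in range(2)]  # hold enable for both half-cycles
    gclk = clock_gate(clk, en)
    print(f"{name}: {rising_edges(gclk)} of {rising_edges(clk)} clock edges delivered")
```

An idle unit receives no clock edges at all, which is the effect exploited when the clock of an unused block is stopped.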
Stopping the Clock of an Unused Block
[Figure: Function A and Function B, selected by a 0/1 control signal; the clock of the block that is not in use is stopped.]
Algorithm Level Optimization
-- Apply fast algorithms to reduce the average switched capacitance $C_L$ per sample.
-- Multiplications are traded off against additions.
-- Can be combined with other low-area/low-power techniques through voltage scaling.
-- Select an algorithm that meets the requirements while reducing the amount of computation.
-- Algorithm transforms: parallel/pipelined processing, look-ahead, retiming, folding, unfolding, strength reduction.
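Strength reduction replaces costly operations with cheaper ones. One familiar instance, shown here as a Python sketch, is multiplying by a fixed coefficient using only shifts and additions/subtractions, with the coefficient recoded in canonical signed-digit (CSD) form so that the number of nonzero digits, and hence adders, stays small. The function names are illustrative, and this is only one form of strength reduction, not the full set of transforms listed.

```python
def csd_digits(k):
    """Canonical signed-digit recoding of a non-negative integer.

    Returns digits in {-1, 0, +1}, least significant first, such that
    sum(d * 2**i) == k and no two adjacent digits are nonzero.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    digits = []
    while k:
        if k & 1:
            d = 2 - (k & 3)  # k % 4 == 1 -> +1, k % 4 == 3 -> -1
            k -= d           # k is now a multiple of 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def multiply_by_constant(x, k):
    """Return (x * k, number of adders/subtractors) using shifts and adds only."""
    acc = 0
    nonzero = 0
    for shift, d in enumerate(csd_digits(k)):
        if d == 1:
            acc += x << shift
            nonzero += 1
        elif d == -1:
            acc -= x << shift
            nonzero += 1
    # The leading CSD digit of k > 0 is +1, so the nonzero terms can be
    # combined with (nonzero - 1) adders/subtractors.
    return acc, max(nonzero - 1, 0)


for k in (7, 15, 23, 255):
    product, adders = multiply_by_constant(5, k)
    plain = bin(k).count("1") - 1
    print(f"5*{k} = {product}: {adders} adder(s) with CSD vs {plain} with plain binary")
```

For example, multiplying by 255 needs seven adders with plain binary shift-and-add but only one subtractor in CSD form, since 255x = (x << 8) - x.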
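Algorithm Optimization - Example
Two outputs of a 2-tap FIR filter with coefficients $h_0, h_1$ and inputs $x_0, x_1, x_2$.
Direct form (4 multipliers, 2 adds):
$y_0 = h_0 x_0 + h_1 x_1$
$y_1 = h_0 x_1 + h_1 x_2$
Winograd form (3 multipliers, 5 adds):
$y_0 = h_0 (x_0 + x_1) + (h_1 - h_0) x_1$
$y_1 = h_1 (x_1 + x_2) - (h_1 - h_0) x_1$
Winograd's algorithm reduces the number of multiplications at the cost of more additions.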
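A quick numerical check of the direct and Winograd forms, written as a Python sketch with illustrative function names. Integer inputs keep the comparison exact. In hardware the saving comes from replacing one multiplier with extra adders; this code only confirms that the rearrangement is algebraically exact.

```python
import random


def fir2_direct(h0, h1, x0, x1, x2):
    """Two outputs of a 2-tap FIR filter: 4 multiplications, 2 additions."""
    y0 = h0 * x0 + h1 * x1
    y1 = h0 * x1 + h1 * x2
    return y0, y1


def fir2_winograd(h0, h1, x0, x1, x2):
    """Same two outputs with 3 multiplications and 5 additions/subtractions."""
    d = h1 - h0              # depends only on the taps; can be precomputed
    m0 = h0 * (x0 + x1)
    m1 = d * x1
    m2 = h1 * (x1 + x2)
    return m0 + m1, m2 - m1


random.seed(1)
for _ in range(1000):
    h0, h1, x0, x1, x2 = (random.randint(-128, 127) for _ in range(5))
    assert fir2_direct(h0, h1, x0, x1, x2) == fir2_winograd(h0, h1, x0, x1, x2)
print("direct and Winograd forms agree on 1000 random integer inputs")
```

For a filter with fixed coefficients, $h_1 - h_0$ is computed once, so only four additions remain in the per-sample datapath.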
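Precomputation-Based Optimization
[Figure: n-bit comparator computing A > B, with inputs A(n-1), B(n-1), A(n-2), B(n-2) down to A(0), B(0), and a load-disable control on the lower-bit registers.]
-- Loading of the lower-bit registers is disabled when A(n-1) ≠ B(n-1), since the MSBs alone then determine the result.
-- Achieves up to 75% power reduction with 3% area overhead.
-- In the worst case, this adds 1 to 5 gate delays.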
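A behavioral Python sketch of the precomputation comparator, assuming only the single MSB pair is used for precomputation; the function names are hypothetical. It checks exhaustively that the result stays correct when the lower-bit registers are not loaded, and counts how often they must be loaded. For uniformly distributed inputs they are loaded for exactly half of the input pairs; the actual power saving also depends on the circuit and on the real input statistics.

```python
from itertools import product


def comparator_step(a, b, n, low_regs):
    """One evaluation of an n-bit A > B comparator with MSB precomputation.

    low_regs holds the last loaded (A, B) lower bits. Returns
    (a_greater, low_regs, loaded), where loaded says whether the
    lower-bit registers were clocked with new data this cycle.
    """
    a_msb = (a >> (n - 1)) & 1
    b_msb = (b >> (n - 1)) & 1
    if a_msb != b_msb:
        # MSBs differ: A > B exactly when A(n-1) = 1, so the lower-bit
        # registers keep their old contents (load disabled).
        return a_msb == 1, low_regs, False
    mask = (1 << (n - 1)) - 1
    low_regs = (a & mask, b & mask)  # load enabled
    return low_regs[0] > low_regs[1], low_regs, True


def load_fraction(n):
    """Exhaustively check correctness and count lower-bit register loads."""
    low_regs = (0, 0)
    loads = 0
    for a, b in product(range(2 ** n), repeat=2):
        result, low_regs, loaded = comparator_step(a, b, n, low_regs)
        assert result == (a > b)
        loads += loaded
    return loads / 4 ** n


print(f"lower-bit registers loaded in {load_fraction(8):.0%} of input pairs")
```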
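Don't Care Optimization
[Figure: two versions of a circuit with inputs x1, x2, x3, ..., xn, input register R1, logic blocks h and f, and output register R2; the second version adds a flip-flop (FF) driving a load enable (LE).]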
Comparison of 8x8 DCT Algorithms
Algorithm           Multiplications   Additions
Brute Force         4096
Row-Column          1024
Chen [CSF77]
Ligtenberg [LV86]
Arai [AAN88]        80                464
Feig [FW92]         54                462
Lee [CL92]          112               472
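The first two rows follow from simple counting. A brute-force 8x8 DCT forms each of the 64 outputs from all 64 inputs (64 x 64 = 4096 multiplications), while the row-column method applies 8-point 1-D DCTs to the 8 rows and then the 8 columns, each costing 8 x 8 = 64 multiplications (16 x 64 = 1024). The Python sketch below counts multiplications by data values in both methods and checks that they agree. It assumes orthonormal DCT-II scaling, treats the cosine weights as precomputed constants, and does not implement any of the fast algorithms in the table.

```python
import math
import random

N = 8

# 1-D DCT-II coefficient matrix (orthonormal scaling), precomputed once.
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

# Separable 2-D basis weights C[u][x] * C[v][y], also precomputed.
B = [[[[C[u][x] * C[v][y] for y in range(N)] for x in range(N)]
      for v in range(N)] for u in range(N)]


def dct2_brute_force(block):
    """Direct 2-D DCT: each of the 64 outputs uses all 64 inputs."""
    mults = 0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += B[u][v][x][y] * block[x][y]
                    mults += 1
            out[u][v] = acc
    return out, mults


def dct1d(vec):
    """Direct 8-point 1-D DCT: 8 outputs x 8 inputs = 64 multiplications."""
    out = [sum(C[u][x] * vec[x] for x in range(N)) for u in range(N)]
    return out, N * N


def dct2_row_column(block):
    """Row-column 2-D DCT: 1-D DCTs on the 8 rows, then on the 8 columns."""
    mults = 0
    rows = []
    for r in block:
        t, m = dct1d(r)
        rows.append(t)
        mults += m
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        col, m = dct1d([rows[x][v] for x in range(N)])
        mults += m
        for u in range(N):
            out[u][v] = col[u]
    return out, mults


random.seed(0)
block = [[random.randint(0, 255) for _ in range(N)] for _ in range(N)]
ref, m_bf = dct2_brute_force(block)
rc, m_rc = dct2_row_column(block)
err = max(abs(ref[u][v] - rc[u][v]) for u in range(N) for v in range(N))
print(f"brute force: {m_bf} mults, row-column: {m_rc} mults, max diff {err:.2e}")
```

The fast algorithms in the table go further by factoring each 1-D DCT (or the 2-D transform directly) so that most of these multiplications become shared additions.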