Power-Aware Compilation
CS 671 – Spring, April 22, 2008

Why Worry about Power Dissipation?
– Environment
– Thermal issues: affect cooling, packaging, reliability, timing
– Battery life

Power Dissipation Trends
[Figure: power dissipation across processor generations (Pentium, Pentium Pro, Pentium 2, Pentium 3, Pentium 4, Pentium 4 (Prescott)), with "Hot Plate" and "Nuclear Reactor" as reference levels]

Cooking-Aware Computing

Intel vs. Duracell
– No Moore’s Law in batteries: 2-3%/year growth
[Figure: improvement compared to year 0 (1x and up) versus time in years, for processor (MIPS), hard disk (capacity), memory (capacity), and battery (energy stored)]

Environment
– Environmental Protection Agency (EPA): computers account for 10% of commercial electricity consumption
  – Includes peripherals, possibly also manufacturing
– Data center growth was cited as a contributor to the 2000/2001 California energy crisis
– Cooling: equivalent power (with only 30% efficiency) is needed for air conditioning
– CFCs used for refrigeration
– Lap burn
– Fan noise

Where Does the Juice Go in Laptops?

Now We Know Why Power is Important
What can we do about it?
– Two components to the problem:
  – #1: Understand where and why power is dissipated
  – #2: Think about ways to reduce it at all levels of the computing hierarchy
– In the past, #1 was difficult to accomplish except at the circuit level
– Consequently, most low-power efforts were circuit related

Power: The Basics
– Dynamic “switching” power vs. static “leakage” power
  – Dynamic power dominates, but static power is increasing in importance
  – Trends in each
– Static power: steady, per-cycle energy cost
– Dynamic power: capacitive and short-circuit
  – Capacitive power: charging/discharging at transitions 0→1 and 1→0
  – Short-circuit power: power due to brief short-circuit current during transitions
– Most research focuses on capacitive power, but recent work addresses the others

Power Issues in Microprocessors
[Figure: panels for capacitive (dynamic) power, static (leakage) power, temperature, minimum voltage, and di/dt (Vdd/Gnd bounce) over 20 cycles; axes: voltage (V), current (A); inverter schematic with Vin, Vout, C_L, Vdd]

Capacitive Power Dissipation
– $P \approx \frac{1}{2} C V^2 A f$
– C, capacitance: function of wire length, transistor size
– V, supply voltage: has been dropping with successive fab generations
– f, clock frequency: increasing…
– A, activity factor: how often, on average, do wires switch?
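
As a rough illustration of the capacitive power formula on this slide, the sketch below evaluates it for one made-up operating point. The function name and all numeric values are assumptions chosen for illustration, not measurements from any processor.

```python
def dynamic_power(capacitance_f, vdd_v, activity, freq_hz):
    """Capacitive switching power in watts: P ~ 1/2 * C * V^2 * A * f."""
    return 0.5 * capacitance_f * vdd_v ** 2 * activity * freq_hz


# Hypothetical operating point (illustrative values only):
# 1 nF switched capacitance, 1.2 V supply, 20% activity, 2 GHz clock.
baseline = dynamic_power(capacitance_f=1e-9, vdd_v=1.2, activity=0.2, freq_hz=2e9)

# Halving the activity factor (e.g. by gating idle units) halves the power.
half_activity = dynamic_power(capacitance_f=1e-9, vdd_v=1.2, activity=0.1, freq_hz=2e9)

print(f"baseline:      {baseline:.3f} W")
print(f"half activity: {half_activity:.3f} W")
```

Power is linear in C, A, and f but quadratic in V, which is why supply voltage gets so much attention.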

Lowering Dynamic Power
– Reducing Vdd has a quadratic effect
  – However, it has a negative (~linear) effect on performance
– Lowering C_L
  – May improve performance as well
  – Keep transistors small (keeps intrinsic gate and diffusion capacitance small)
– Reduce switching activity
  – A function of signal transition statistics and clock rate
  – Clock gating of idle units
  – Impacted by logic and architecture decisions
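
A short worked example of the quadratic effect, assuming (as a simplification of the "~linear" remark above) that clock frequency is scaled by the same factor s as the supply voltage. The factor s = 0.8 is an arbitrary illustrative choice.

```latex
\begin{align*}
P &\approx \tfrac{1}{2}\, C V^2 A f \\
\frac{P'}{P} &= \frac{\tfrac{1}{2}\, C (sV)^2 A (sf)}{\tfrac{1}{2}\, C V^2 A f} = s^3 \\
s = 0.8 &\;\Rightarrow\; \frac{P'}{P} = 0.8^3 = 0.512 \\
\frac{E'_{\text{op}}}{E_{\text{op}}} &= \frac{(sV)^2}{V^2} = s^2 = 0.64
\end{align*}
```

Under this assumption a 20% slowdown roughly halves power, while energy per operation falls by about a third, because the work takes longer at the lower power level.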

Power vs. Energy
– Power consumption in watts
  – Rate at which energy is consumed over time
  – Determines battery life in hours
  – Sets packaging limits
– Energy efficiency in joules
  – Energy = power * delay (joules = watts * seconds)
  – A lower energy number means less power to perform a computation at the same frequency

Power vs. Energy Metrics
– Power-delay product (PDP) = P_avg * t
  – PDP is the average energy consumed per switching event
– Energy-delay product (EDP) = PDP * t
  – Takes into account that one can trade increased delay for lower energy per operation
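
The two metrics can rank the same designs differently. The sketch below compares two hypothetical design points; the power and delay numbers are invented for illustration.

```python
def pdp(avg_power_w, delay_s):
    """Power-delay product: energy in joules."""
    return avg_power_w * delay_s


def edp(avg_power_w, delay_s):
    """Energy-delay product: joule-seconds; penalizes slow designs."""
    return pdp(avg_power_w, delay_s) * delay_s


# Two hypothetical design points (illustrative values only).
fast = {"power_w": 2.0, "delay_s": 1.0}
slow = {"power_w": 0.8, "delay_s": 2.0}

fast_pdp, fast_edp = pdp(**{"avg_power_w": fast["power_w"], "delay_s": fast["delay_s"]}), \
    edp(fast["power_w"], fast["delay_s"])
slow_pdp, slow_edp = pdp(slow["power_w"], slow["delay_s"]), \
    edp(slow["power_w"], slow["delay_s"])

print(f"fast: PDP = {fast_pdp:.1f} J, EDP = {fast_edp:.1f} J*s")
print(f"slow: PDP = {slow_pdp:.1f} J, EDP = {slow_edp:.1f} J*s")
```

Here the slow design uses less energy (lower PDP), but the fast design has the lower EDP, since EDP charges the slow design a second time for its longer delay.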

Low-Power Software Strategies
– Code running on CPU: code optimizations for low power
– Code accessing memory objects: SW optimizations for memory
– Data flowing on the buses: I/O coding for low power
– Compiler-controlled power management
[Figure: CPU, cache, memory]

Code Optimizations for Low Power
– High-level operations (e.g. a C statement) can be compiled into different instruction sequences
  – Different instructions and orderings have different power
– Instruction selection
  – Select a minimum-power instruction mix for executing a piece of high-level code
– Instruction packing and dual memory loads
  – Two on-chip memory banks
  – Dual load vs. two single loads
  – Almost 50% energy savings

Code Optimizations for Low Power
– Reorder instructions to reduce switching at functional units and I/O buses
  – Cold scheduling minimizes instruction bus transitions
– Operand swapping
  – Swap the operands at the input of the multiplier
  – Result is unaltered, but power changes significantly!
– Other standard compiler optimizations
  – Intermediate level: software pipelining, dead code elimination, redundancy elimination
  – Low level: register allocation and other machine-specific optimizations
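
To make the idea of reordering for fewer bus transitions concrete, here is a minimal sketch. It is not the cold scheduling algorithm itself: it assumes all instructions are mutually independent (a real scheduler must respect data dependences) and uses a simple greedy nearest-neighbor order. The 8-bit "instruction encodings" are made up.

```python
def hamming(a, b):
    """Number of bit positions that differ, i.e. bus lines that toggle."""
    return bin(a ^ b).count("1")


def bus_transitions(words):
    """Total bit toggles when the words are driven on a bus in sequence."""
    return sum(hamming(a, b) for a, b in zip(words, words[1:]))


def greedy_order(words):
    """Keep the first word, then repeatedly pick the remaining word that
    toggles the fewest bits. Assumes no dependences between instructions."""
    remaining = list(words)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda w: hamming(order[-1], w))
        remaining.remove(nxt)
        order.append(nxt)
    return order


# Hypothetical 8-bit instruction encodings (illustrative only).
original = [0b00001111, 0b11110000, 0b00001110, 0b11110001]
reordered = greedy_order(original)

print("original transitions: ", bus_transitions(original))
print("reordered transitions:", bus_transitions(reordered))
```

The same Hamming-distance cost applies to operand swapping: the result of a commutative operation is unchanged, but the bits toggling at the functional unit's inputs differ.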

Code Optimizations for Low Power
– Use processor-specific instruction styles
  – On ARM, the default int type is ~20% more efficient than char or short, as the latter result in sign or zero extension
  – On ARM, conditional instructions can be used instead of branches

ARM vs. THUMB
– ARM: 32-bit, requires fewer instructions
– THUMB: 16-bit, more instructions
– Switching between ARM/THUMB takes time

Minimizing Memory Access Costs
– Reduce memory accesses, make better use of registers
  – Register access consumes less power than memory access
  – Easy way: minimize the number of read/write operations
– Cache optimizations
  – Reorder memory accesses to improve cache hit rates
– Can use existing techniques for high-performance code generation

Minimizing Memory Access Costs
– Loop optimizations such as loop unrolling and loop fusion also reduce memory power consumption
– More effective: explicitly target minimizing switching activity on I/O buses and exploiting the memory hierarchy
– Data allocation to minimize I/O bus transitions
  – Map large arrays with known access patterns to main memory to minimize address bus transitions
  – Works in conjunction with coding of address buses
– Exploiting the memory hierarchy
  – Organize video and DSP data to maximize use of the higher (lower-power) levels of the memory hierarchy
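
One well-known address bus coding is Gray code, where consecutive values differ in exactly one bit. The slide does not name a specific encoding, so treating Gray code as the example is an assumption. The sketch counts bus toggles for a sequential array walk over 16 addresses.

```python
def to_gray(n):
    """Standard binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def transitions(seq):
    """Total number of bus lines that toggle across the sequence."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))


# Sequential access pattern, e.g. walking an array element by element.
addresses = list(range(16))

binary_cost = transitions(addresses)
gray_cost = transitions([to_gray(a) for a in addresses])

print("binary-coded address bus toggles:", binary_cost)
print("Gray-coded address bus toggles:  ", gray_cost)
```

The benefit depends on the access pattern being sequential and known, which is why the slide pairs bus coding with mapping arrays that have predictable access patterns.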

Observation: Execution-time Variation
– Significant variation in execution time of real-time tasks
– But the variation is not random, due to correlation in the underlying signal (speech, sensor, etc.)

Observation: Applications Tolerant to Deadline Misses
– E.g. sensor networks
  – Computation deadline misses lead to data loss
  – Packet loss is common in wireless links
  – Significant probability of error in sensor signals: noisy sensor channels
  – Applications are designed to tolerate noisy/bad data by exploiting spatio-temporal redundancy: high transient losses are acceptable if localized in time or space
– If the communication is noisy, and applications are loss tolerant, is it worthwhile to strive for perfect noise-free computing?

Exploiting Execution-time Variation and Tolerance to Deadlines
– Idea: predict the execution time of each task instance and dynamically scale voltage so as to minimize shutdown
– Execution time prediction
  – Learn the distribution of execution times (pdf)
  – Provide hints
    – MPEG decode can tell whether a frame is I, P, or B
– But some deadlines are missed!
  – Adaptive control loop to keep missed deadlines < limit
– Provides an adaptive power-fidelity trade-off
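
The control loop described above might be sketched as follows. This is a simplified stand-in, not the method from the slide: it predicts each instance from the previous one (instead of a learned pdf or hints), tracks the cumulative miss rate, and assumes voltage scales linearly with frequency so that energy per cycle goes as (f / f_max)^2. The function name, parameters, and workload trace are all hypothetical.

```python
def run_adaptive_dvs(cycles_per_instance, max_cycles_per_deadline, miss_limit=0.1):
    """Simulate predictive voltage/frequency scaling with an adaptive margin.

    Returns (missed deadlines, energy relative to always running at f_max).
    """
    margin = 1.0                          # safety factor on the prediction
    predicted = cycles_per_instance[0]    # assume the first instance is known
    misses = 0
    energy = 0.0
    energy_at_fmax = 0.0

    for i, actual in enumerate(cycles_per_instance, start=1):
        # Provision just enough cycles for the padded prediction, capped at f_max.
        budget = min(max_cycles_per_deadline, margin * predicted)
        freq_ratio = budget / max_cycles_per_deadline   # f / f_max

        if actual > budget:
            misses += 1                   # deadline missed at this frequency

        # Assumption: V scales with f, so energy per cycle ~ (f / f_max)^2.
        energy += actual * freq_ratio ** 2
        energy_at_fmax += actual

        # Control step: pad more when the cumulative miss rate exceeds the
        # limit, otherwise relax slowly toward no padding.
        if misses / i > miss_limit:
            margin *= 1.1
        else:
            margin = max(1.0, margin * 0.98)

        predicted = actual                # last-value predictor (correlated signal)

    return misses, energy / energy_at_fmax


# Hypothetical trace of cycles needed per task instance (arbitrary units).
trace = [100, 110, 105, 120, 115, 90, 95, 100, 130, 125]
misses, energy_ratio = run_adaptive_dvs(trace, max_cycles_per_deadline=200)

print(f"missed deadlines: {misses} of {len(trace)}")
print(f"energy relative to always running at f_max: {energy_ratio:.2f}")
```

Raising `miss_limit` lets the loop run at lower frequencies at the cost of more missed deadlines, which is the power-fidelity trade-off the slide refers to.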

Compiler-Controlled DVFS
– MICRO’05, Princeton
– Use the compiler to find (predict) large regions where low frequency won’t hurt performance

Sensor Network Compilation
– PLDI 2007, University of Pittsburgh
– 1 bit over the wire == 1000 executed instructions
– Rework binary “patches” to minimize difference from the original binary
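
A back-of-the-envelope sketch of why small patches matter, using the 1000-instructions-per-bit ratio quoted above. The patch format (a 2-byte offset plus 1 data byte per changed byte) and the two images are hypothetical, and this naive byte diff is not the technique from the PLDI 2007 work.

```python
INSTRUCTIONS_PER_BIT = 1000   # ratio quoted on the slide
PATCH_ENTRY_BYTES = 3         # hypothetical: 2-byte offset + 1 data byte


def byte_patch(old, new):
    """Naive patch: (offset, new_byte) for every position that differs.
    Assumes both images have the same length."""
    if len(old) != len(new):
        raise ValueError("this sketch only handles equal-length images")
    return [(i, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]


def transmit_cost_in_instructions(num_bytes):
    """Cost of sending num_bytes, expressed in executed-instruction equivalents."""
    return num_bytes * 8 * INSTRUCTIONS_PER_BIT


# Hypothetical 64-byte program image with two changed bytes.
old_image = bytes(range(64))
new_image = bytearray(old_image)
new_image[5] ^= 0xFF
new_image[40] ^= 0x01
new_image = bytes(new_image)

patch = byte_patch(old_image, new_image)
patch_cost = transmit_cost_in_instructions(len(patch) * PATCH_ENTRY_BYTES)
full_cost = transmit_cost_in_instructions(len(new_image))

print("changed bytes:", len(patch))
print("patch cost (instruction equivalents):     ", patch_cost)
print("full image cost (instruction equivalents):", full_cost)
```

At this ratio, a compiler can afford to spend many extra instructions on the node if doing so keeps the updated binary close to the original and the patch small.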

Power-Aware Compilation
– Not all optimizations target performance
– Power-aware optimizations are
  – Most important on embedded systems
  – Most effective on VLIW architectures
  – Still present primarily in the research community
– It’s important to rethink many of our notions of “optimization”