Software-Controlled Processor Speed Setting for Low-Power Streaming Multimedia Andrea Acquaviva, Luca Benini, Bruno Riccò D.E.I.S. - Università di Bologna
Motivation and Basic Idea: Energy consumption in wearable devices affects battery size, weight, and battery life, as well as system cost and reliability. In microprocessor-based architectures the CPU is the largest contributor to energy consumption. Such architectures allow software-driven energy optimization. Application-driven CPU speed setting improves energy efficiency through just-in-time computation, even with fixed voltage.
Outline: Background; Contribution of the work; Speed-Setting & Energy Optimization; Clock frequency & Performance; Experimental Results; Conclusions.
Background: System-level power optimization. Application side (workload-adaptive algorithms): workload information, fast adaptation. Operating-system level (task scheduling): system information, slow adaptation. Both problems have been investigated in various works listed in the bibliography of the paper.
Background (cont’d): Traditional power optimization techniques. Variable-voltage based: workload-dependent voltage scheduling; external hardware; discrete frequency range, Vdd. Shutdown based: binary version, worse adaptation; time and energy during transitions. In contrast with previous work, we state that energy can be saved even with fixed voltage: memory latency; I/O synchronization.
Contribution of the work: Effectiveness of clock speed setting in multimedia streaming algorithms. Automatic run-time processor frequency setting for energy minimization of MP3 decoding. Streaming-multimedia workload characterization for speed-setting policies.
Variable Frequency Energy as a function of frequency Energy consumption: T is given by: Hence the energy equation can be written as:
Variable Frequency (cont’d): Real-time constraint: Nuseful and TMAX fixed, Nidle = Nidle(f); f > fmax. The relation between f and the frame rate is not linear.
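The dependence of the idle cycles on frequency can be illustrated with a minimal sketch. It assumes an idealized model that is not stated on the slide: the CPU runs at a constant frequency f for the whole frame period TMAX, so the cycles available per frame are f * TMAX and whatever is not needed for useful work is idle. The function names and the numbers in the usage note are hypothetical.

```python
def idle_cycles(f_hz, n_useful, t_max_s):
    """Idle cycles per frame at clock frequency f_hz (idealized model).

    Assumption: the cycles available in one frame period are f_hz * t_max_s,
    and n_useful of them are needed to decode the frame.
    """
    available = f_hz * t_max_s
    if available < n_useful:
        raise ValueError("frequency too low to meet the real-time constraint")
    return available - n_useful


def lowest_feasible_frequency(n_useful, t_max_s):
    """Frequency at which the idle cycles drop to zero (just-in-time computation)."""
    return n_useful / t_max_s
```

With hypothetical values n_useful = 2,000,000 cycles and t_max_s = 0.03125 s, running at 100 MHz leaves 1,125,000 idle cycles per frame, while 64 MHz leaves none. In this simplified model the useful cycle count does not change with f; the slide's remark that frame rate is not linear in f indicates that a real system deviates from it.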
Variable frequency (cont’d): Reduces costs of memory latency. Reduces costs of I/O synchronization. Discrete frequency range. Adaptation mismatch.
Multimedia Systems: Hardware: wireless network or wired link from a host; wearable system with a general-purpose μP (e.g. SoC), I/O HW units (DMA, IC, buffers…), and some external chips (e.g. audio CODEC). Software: data processing algorithm (MPEG decoding of an audio stream). [Block diagram: μP, I/O, EXT.]
The MP3 decoder: An MPEG stream is composed of frames. The decoder produces audio samples by processing blocks of frames. SW and HW buffering allows synchronization among input rate, output rate, and processing time. Each block must be processed in a fixed time; during this time the CPU does not access input or output buffers. Output data are sent to the audio CODEC by the DMA.
System characterization: The effect of speed setting on performance depends on hardware characteristics and workload. System characterization: FRAME RATE vs FREQUENCY.
Decision Algorithm: off-line phase: Determination of the characteristics FRA, FRB, FRW. Determination of the overall normalized characteristics FRAo, FRBo, FRWo. [Figure: normalized frame rate NFR versus frequency f (100 to 300), with curves FRB, FRA, and FRW; FRmax(br, sr) as a function of bit rate and sample rate.]
Decision Algorithm: on-line phase: [Flow diagram: audio stream; br, sr; look-up; FR, FRmax; f.]
Decision Algorithm: on-line phase (cont’d): [Figure: FRMAX, sr, FRREQ, and the resulting frequencies fMIN, fAVG, fMAX.] fAVG: worst case (large jitter), AVG energy. fMAX: always guaranteed, MAX energy. fMIN: best case, MIN energy.
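The on-line look-up can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it reads FRB, FRA, and FRW as best-case, average, and worst-case frame-rate characteristics, assumes the required frame rate is normalized by FRmax(br, sr) before the curves are consulted, and uses invented table values. All names (CURVES, FR_MAX, select_frequencies) are hypothetical, and frequencies are in the same unspecified units as the plot axis.

```python
# All numbers below are invented placeholders, not measurements from the talk.
# Normalized frame rate NFR(f) per curve: (frequency, NFR), sorted by frequency.
CURVES = {
    "best": [(100, 0.50), (150, 0.70), (200, 0.85), (250, 0.95), (300, 1.00)],
    "average": [(100, 0.45), (150, 0.60), (200, 0.80), (250, 0.90), (300, 0.95)],
    "worst": [(100, 0.40), (150, 0.55), (200, 0.62), (250, 0.80), (300, 0.90)],
}

# Hypothetical FRmax(br, sr) table: (bit rate in kbit/s, sample rate in Hz) -> frames/s.
FR_MAX = {(128, 44100): 60.0, (64, 22050): 110.0}


def lowest_frequency(curve, target_nfr):
    """Return the lowest tabulated frequency whose NFR meets the target."""
    for freq, nfr in curve:
        if nfr >= target_nfr:
            return freq
    raise ValueError("required frame rate is not reachable on this curve")


def select_frequencies(bit_rate, sample_rate, fr_required):
    """Look up FRmax for the stream, normalize the requirement, pick f per curve."""
    target = fr_required / FR_MAX[(bit_rate, sample_rate)]
    return {
        "f_min": lowest_frequency(CURVES["best"], target),
        "f_avg": lowest_frequency(CURVES["average"], target),
        "f_max": lowest_frequency(CURVES["worst"], target),
    }
```

With these placeholder tables, a 128 kbit/s, 44.1 kHz stream that needs 38.28 frames/s gives f_min = 150, f_avg = 200, and f_max = 250. The worst-case curve yields the highest frequency, matching the slide's note that fMAX is always guaranteed at the cost of maximum energy.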
Energy Penalty: The memory system and interfaces do not speed up like the processor as the clock frequency increases. Increasing f therefore increases the energy penalty.
Experimental Results: [Figures: energy penalty E (mJ) versus f, and energy per frame E (mJ) versus f; the axes show E from 8 to 11 mJ and f from 100 to 300, with fMIN marked.]
Conclusions and future work: An approach to automatic run-time setting of the optimum processor frequency for energy minimization of streaming MP3. System characterization for speed-setting policies. Future: other embedded applications (e.g. MPEG video); closed-loop policies.