1
Optimization of Power Consumption for an ARM7-Based Multimedia Handheld Device
Hoseok Chang, Wonchul Lee, Wonyong Sung
Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), Vol. 5, 25-28 May 2003, pp. V-105 to V-108
Presenter: Chin-Chi Hu
2
Chin-Chi Hu 2/20 2015/6/27
Abstract
We have developed a multimedia handheld educational device and optimized its current consumption, not only by employing several software optimization techniques but also by using a dynamic clock frequency scaling (DFS) scheme. Although the ARM7 CPU employed does not support operating voltage scaling, controlling the operating frequency helps reduce the current consumption during idle time and results in up to 25% power reduction at the system level. The CPU operating frequency is determined by profiling the multimedia program components, which include LZW (Lempel-Ziv-Welch) image decompression, MP3 audio decoding, CELP-based speech decoding, speech recognition, and ADPCM. In particular, it is shown that the time for LZW decompression is proportional to the image size rather than to the size of the compressed file. After DFS is applied, the CPU load becomes almost full, between 80 and 95%.
3
What's the problem?
- Multitasking operating system and dynamic frequency scaling: analysis of the current consumption of the system
- Software optimization techniques: improving the software to reduce the number of instructions and clock cycles
- CPU load estimation: the CPU load for executing each software component
- Results and optimization
4
Introduction
- A low-power multimedia handheld device that runs on only two AA-size batteries
- DSP programs needed to be optimized:
  - MP3 decoding
  - LZW (Lempel-Ziv-Welch) decompression
  - Speech recognition
- Aspects:
  - Optimization of software components using ARM7-specific features
  - Lowering the CPU clock frequency minimizes the idle time
5
System architecture
Speaking Partner
- ARM7TDMI CPU at 60 MHz
- 8 KB cache
- Graphic LCD controller
- Synchronous DRAM controller
- IIS interface
- 8-channel 10-bit ADC
- 128 KB NOR flash for system ROM
- NAND flash and SMC (SmartMedia card) for program ROM
- SSFDC (solid-state floppy disk card) and USB for read/write
6
System architecture
Speaking Partner
7
Current consumption
- The CPU drains some power even when its load is very small, that is, when it is mostly in the idle state.
- It is therefore advantageous for power reduction to use the lowest possible clock frequency.
- This requires estimating the minimum clock frequency that still allows a real-time implementation.
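The minimum-frequency estimate can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's procedure: each real-time task is described by a hypothetical profiled cycle count per period, periods are given in microseconds (so cycles per microsecond equals MHz), the CPU may be loaded up to 100% (an optional `headroom` factor adds margin), and the clock generator offers a hypothetical discrete set of frequencies. The function names are illustrative.

```python
# Hypothetical sketch (not from the paper): choose the lowest available CPU
# clock that still meets every real-time deadline.

def required_mhz(tasks):
    """Total demand in MHz; tasks is a list of (cycles_per_period, period_us).

    Cycles divided by microseconds gives millions of cycles per second (MHz).
    """
    return sum(cycles / period_us for cycles, period_us in tasks)


def min_clock_mhz(tasks, available_mhz, headroom=1.0):
    """Lowest available frequency >= demand * headroom, or None if none fits."""
    need = required_mhz(tasks) * headroom
    for f in sorted(available_mhz):
        if f >= need:
            return f
    return None  # even the fastest available clock is too slow


# Assumed example: one task needing 250,000 cycles every 20 ms (12.5 MHz).
tasks = [(250_000, 20_000)]
print(min_clock_mhz(tasks, [15, 20, 30, 40, 60]))  # prints 15
```

Summing per-task demand assumes the scheduler can use the whole CPU; on a real system the `headroom` factor would absorb interrupt and operating-system overhead.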
8
Current consumption
- The figure shows that the dynamic frequency scaling scheme is more efficient than constant-frequency operation with an idle state when the CPU load is low.
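Why frequency scaling wins at low load can be illustrated with a simple linear current model. The model and its coefficients are assumptions for illustration, not the paper's measurements: the CPU draws a fixed current plus a frequency-proportional current that is larger while executing than while idling. Under this model the saving from DFS is the idle term times the idle fraction, so it shrinks as the load approaches 100%.

```python
# Hypothetical linear current model (illustrative coefficients, not measured):
#   executing at f MHz:  I = BASE_MA + RUN_MA_PER_MHZ  * f
#   idling    at f MHz:  I = BASE_MA + IDLE_MA_PER_MHZ * f
BASE_MA = 20.0
RUN_MA_PER_MHZ = 1.0
IDLE_MA_PER_MHZ = 0.4


def constant_freq_ma(load, f_max):
    """Average current at a fixed f_max: busy for `load`, idle the rest."""
    run = BASE_MA + RUN_MA_PER_MHZ * f_max
    idle = BASE_MA + IDLE_MA_PER_MHZ * f_max
    return load * run + (1.0 - load) * idle


def dfs_ma(load, f_max):
    """Average current with the clock scaled to load * f_max (no idle time)."""
    return BASE_MA + RUN_MA_PER_MHZ * load * f_max


for load in (0.1, 0.5, 0.9):
    print(load, constant_freq_ma(load, 60), dfs_ma(load, 60))
```

Real current curves are not exactly linear, but the qualitative conclusion (largest benefit at low load, none at full load) follows from the idle current being nonzero.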
9
Current consumption
Figure: Current consumption of each hardware block (CPU load of 10%)
10
Software optimization
- The ARM7TDMI processor has features suited to implementing DSP algorithms:
  - A large number of registers
  - Conditional execution of most instructions
  - A 32-bit barrel shifter
  - Block load and store instructions
- However, the ARM7TDMI has a relatively simple data path: the hardware multiplier is only 32×8 bits wide.
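The cost of the 32×8-bit multiplier can be sketched with the early-termination rule documented in the ARM7TDMI Technical Reference Manual: a MUL takes 1S + mI cycles, where m (1 to 4) is the number of 8-bit multiplier-array passes, set by how many upper bytes of the multiplier operand Rs are all zeros or all ones. The helper names below are illustrative; the sketch models only that rule and suggests why keeping multiplier operands within 16 bits saves cycles.

```python
# Sketch of the ARM7TDMI MUL early-termination rule (helper names are ours).

def mul_internal_cycles(rs: int) -> int:
    """Number of 8-bit multiplier passes m for a 32-bit multiplier operand rs."""
    rs &= 0xFFFFFFFF  # interpret as a 32-bit register value
    for m, shift in ((1, 8), (2, 16), (3, 24)):
        top = rs >> shift                  # bits [31:shift] of the operand
        all_ones = (1 << (32 - shift)) - 1
        if top == 0 or top == all_ones:    # upper bits are pure sign extension
            return m
    return 4


def mul_cycles(rs: int) -> tuple[int, int]:
    """(S, I) cycle counts for MUL Rd, Rm, Rs: 1S + mI."""
    return 1, mul_internal_cycles(rs)


print(mul_cycles(0x7FFF), mul_cycles(0x7FFFFFFF))  # prints (1, 2) (1, 4)
```

MLA adds one more internal cycle (1S + (m+1)I); it is omitted here to keep the sketch short.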
11
Software optimization
- MP3 decoding algorithm
  - High-level optimization in C
  - Low-level optimization in assembly language, exploiting the conditional execution of the ARM7TDMI processor
12
Software optimization
- Block data transfer instructions load (LDM) or store (STM) any subset of the currently visible registers from/to sequential memory
- Storing fifteen 32-bit registers to sequential memory:
  - With block data transfer (STM): 14S + 2N + 1I cycles
  - Without block data transfer, using single store instructions (STR): (1S + 1N + 1I) × 15 cycles
- S: sequential cycles, N: non-sequential cycles, I: internal cycles
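The two cycle counts on this slide can be compared numerically. Only the formulas come from the slide; the number of clocks per S, N and I cycle is an assumption passed in as parameters, because on a real system a non-sequential access to SDRAM usually costs more clocks than a sequential one.

```python
# Cycle formulas for storing 15 registers, as quoted on the slide.
# s, n, i = clocks per sequential, non-sequential and internal cycle
# (assumed values; they depend on the memory system).

def stm_15_clocks(s, n, i):
    """One STM of 15 registers: 14S + 2N + 1I."""
    return 14 * s + 2 * n + 1 * i


def str_15_clocks(s, n, i):
    """Fifteen separate STR instructions: (1S + 1N + 1I) * 15."""
    return (1 * s + 1 * n + 1 * i) * 15


for s, n, i in ((1, 1, 1), (1, 3, 1)):  # zero-wait-state vs. slower N accesses
    print((s, n, i), stm_15_clocks(s, n, i), str_15_clocks(s, n, i))
```

With one clock per cycle type this gives 17 versus 45 clocks; the gap widens as non-sequential accesses get more expensive, since STM needs only two of them.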
13
Software optimization
14
Software optimization
- Optimization for speech recognition
  - Using 16-bit multiplications instead of 32-bit multiplications: 8% reduction in cycles
  - Several other software optimization techniques are employed:
    - Loop fusion
    - Loop unrolling
    - Post-increment/decrement conversion
  - Total execution time is reduced to about 30 to 45%
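Loop fusion and loop unrolling are source-level transformations; the sketch below applies them to a hypothetical pair of loops (scaling a frame of samples, then summing the squares), which is our own example rather than the paper's speech-recognition code. In Python it only demonstrates that the transformed loops compute the same result; the cycle savings appear when the same transformation is applied to compiled C or assembly on the ARM7TDMI, where one pass over memory and fewer loop-control branches matter.

```python
# Hypothetical example loops (not the paper's code) showing loop fusion and
# loop unrolling; all three functions return identical results.

def separate_loops(x, a):
    """Two passes: scale every sample, then sum the squares."""
    y = [0] * len(x)
    for k in range(len(x)):
        y[k] = a * x[k]
    energy = 0
    for k in range(len(x)):
        energy += y[k] * y[k]
    return y, energy


def fused_loop(x, a):
    """Loop fusion: one pass does both jobs, so each sample is loaded once."""
    y = [0] * len(x)
    energy = 0
    for k in range(len(x)):
        v = a * x[k]
        y[k] = v
        energy += v * v
    return y, energy


def fused_unrolled(x, a):
    """Fused loop unrolled by 4 (fewer loop-control branches), plus a tail loop."""
    n = len(x)
    y = [0] * n
    energy = 0
    k = 0
    while k + 4 <= n:
        v0, v1, v2, v3 = a * x[k], a * x[k + 1], a * x[k + 2], a * x[k + 3]
        y[k], y[k + 1], y[k + 2], y[k + 3] = v0, v1, v2, v3
        energy += v0 * v0 + v1 * v1 + v2 * v2 + v3 * v3
        k += 4
    while k < n:  # remaining 0 to 3 samples
        v = a * x[k]
        y[k] = v
        energy += v * v
        k += 1
    return y, energy
```

Post-increment/decrement conversion has no Python equivalent: in C it means writing pointer walks such as `*p++` so the compiler can use ARM's post-indexed addressing instead of separate address arithmetic.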
15
CPU load estimation
- The load for MP3 decoding depends on the bit rate and the sampling frequency. CPU load at 60 MHz:
  - 56 kbps, 22.05 kHz: 10%
  - 32 kbps, 22.05 kHz: 9.6%
  - 32 kbps, 16 kHz: 7%
- The load for CELP decoding is almost constant, at 18% of the 60 MHz CPU
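The profiled loads translate directly into the clock frequency each component needs, assuming the cycle count per frame does not change with frequency (memory wait states can make this only approximately true). The dictionary restates the slide's numbers; the conversion function `mhz_from_load` is our illustration.

```python
CPU_MHZ = 60.0

# CPU load (%) at 60 MHz, as reported on the slide.
MEASURED_LOAD_PCT = {
    "MP3 56 kbps, 22.05 kHz": 10.0,
    "MP3 32 kbps, 22.05 kHz": 9.6,
    "MP3 32 kbps, 16 kHz": 7.0,
    "CELP decoding": 18.0,
}


def mhz_from_load(load_pct, cpu_mhz=CPU_MHZ):
    """Clock needed to execute the same cycles with no idle time."""
    return cpu_mhz * load_pct / 100.0


for name, load in MEASURED_LOAD_PCT.items():
    print(f"{name}: {mhz_from_load(load):.2f} MHz")
```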
16
CPU load estimation
Figure: Processing time of LZW versus the number of pixels
Figure: Processing time of LZW versus the compressed data size
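One plausible reason LZW decompression time tracks the number of pixels rather than the compressed size is that the decoder must write every output symbol: each code expands to a dictionary string, so the copying work grows with the decoded length. The minimal encoder/decoder below uses simplifications of our own (byte alphabet, codes as a Python list, no bit packing, no dictionary-size limit) and returns the number of symbols written alongside the output.

```python
# Minimal textbook LZW (simplified; not the device's GIF-style decoder).

def lzw_encode(data: bytes) -> list[int]:
    """LZW encoder over a 256-symbol alphabet with an unbounded dictionary."""
    table = {bytes([b]): b for b in range(256)}
    w = b""
    out = []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)
            w = bytes([b])
    if w:
        out.append(table[w])
    return out


def lzw_decode(codes: list[int]) -> tuple[bytes, int]:
    """Decode LZW codes; also return how many symbols were written (the work)."""
    if not codes:
        return b"", 0
    table = {b: bytes([b]) for b in range(256)}
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):       # code defined by this very step (cScSc case)
            entry = prev + prev[:1]
        else:
            raise ValueError(f"bad LZW code {code}")
        out += entry                   # one write per output symbol (pixel)
        table[len(table)] = prev + entry[:1]
        prev = entry
    return bytes(out), len(out)


flat = bytes(10_000)                   # uniform 10,000-pixel image, 1 byte/pixel
codes = lzw_encode(flat)
pixels, writes = lzw_decode(codes)
print(len(codes), writes)              # few codes, but 10,000 symbol writes
```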
17
CPU load estimation
Figure: Execution time prediction for each software component
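If execution time is roughly proportional to a workload parameter such as the pixel count, the time for a new input can be predicted from a straight-line fit to a few profiling runs. The sketch fits time = a·pixels + b by ordinary least squares; the sample measurements are made-up placeholders, not data from the paper, and the function names are illustrative.

```python
# Least-squares execution-time predictor (placeholder data, not the paper's).

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def predict_ms(pixel_count, a, b):
    """Predicted execution time in ms for an image with pixel_count pixels."""
    return a * pixel_count + b


# Placeholder profiling runs (pixels, ms): made-up numbers for illustration.
pixels = [10_000, 20_000, 40_000, 76_800]
millis = [5.1, 10.0, 19.9, 38.5]
a, b = fit_line(pixels, millis)
print(f"slope {a * 1000:.3f} ms per 1000 pixels, offset {b:.2f} ms")
print(f"predicted time for 160x120 pixels: {predict_ms(160 * 120, a, b):.1f} ms")
```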
18
Experimental result
- Current ratio: 478 mA (optimized) / 542 mA (original) = 88.2%
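The ratio on this slide works out as follows; only the two currents come from the slide, and the helper name is illustrative.

```python
def current_ratio(optimized_ma, original_ma):
    """Fraction of the original current drawn by the optimized system."""
    return optimized_ma / original_ma


ratio = current_ratio(478.0, 542.0)   # currents from the slide, in mA
print(f"ratio  = {ratio:.1%}")        # prints "ratio  = 88.2%"
print(f"saving = {1 - ratio:.1%}")    # prints "saving = 11.8%"
```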
19
Experimental result
- The CPU clock frequency is not changed during execution. Changing it would be a more aggressive power optimization approach, at the cost of the delay for PLL relocking.
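The trade-off mentioned here can be sketched as a feasibility check: lowering the clock inside a task only works if the PLL relock time plus the execution time at the new frequency still fits within the deadline. All names and numbers are hypothetical; the sketch assumes the CPU is stalled while the PLL relocks and ignores the energy spent during relocking.

```python
# Hypothetical feasibility check for changing the CPU clock inside a task.
# Assumes the CPU is stalled during the PLL relock; ignores relock energy.

def can_switch(cycles, deadline_us, f_new_mhz, relock_us):
    """True if relocking, then running `cycles` at f_new_mhz, meets the deadline.

    cycles / MHz gives microseconds.
    """
    return relock_us + cycles / f_new_mhz <= deadline_us


def lowest_feasible_mhz(cycles, deadline_us, freqs_mhz, relock_us):
    """Lowest frequency that still meets the deadline after a relock, or None."""
    for f in sorted(freqs_mhz):
        if can_switch(cycles, deadline_us, f, relock_us):
            return f
    return None


freqs = [12.5, 15, 20, 30, 60]
print(lowest_feasible_mhz(250_000, 20_000, freqs, relock_us=0))    # prints 12.5
print(lowest_feasible_mhz(250_000, 20_000, freqs, relock_us=500))  # prints 15
```

In this illustration a 500 µs relock forces the next-higher frequency, which is why switching frequently can erase part of the gain.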
20
Conclusion
- A dynamic frequency scaling scheme is employed to reduce the CPU power consumption; the results show that a 20% saving in system power can be achieved.
- The power analysis shows that the current consumed by the DRAM is almost equal to that of the CPU core, which means that reducing cache misses is the most important factor in lowering power consumption.
- The current can be reduced further, without any significant change to the power reduction algorithm, by employing a CPU that supports dynamic voltage scaling (e.g., Intel's XScale).