Optimization Of Power Consumption For An ARM7-Based Multimedia Handheld Device Hoseok Chang; Wonchul Lee; Wonyong Sung Circuits and Systems, 2003. ISCAS.


Optimization Of Power Consumption For An ARM7-Based Multimedia Handheld Device Hoseok Chang; Wonchul Lee; Wonyong Sung Circuits and Systems, ISCAS '03. Proceedings of the 2003 International Symposium on, Volume: 5, May 2003 Pages:V V-108 vol.5 Presenter: Chin-Chi Hu

Chin-Chi Hu 2/ /6/27
Abstract
We have developed a multimedia handheld educational device and optimized its current consumption, not only by employing several software optimization techniques but also by using a dynamic clock frequency scaling (DFS) scheme. Although the ARM7 CPU employed does not support operating voltage scaling, controlling the operating frequency helps reduce the current consumption in the idle time and yields up to a 25% power reduction at the system level. The CPU operating frequency is determined by profiling the multimedia program components, which include LZW (Lempel-Ziv-Welch) image decompression, MP3 audio decoding, CELP-based speech decoding, speech recognition and ADPCM. In particular, it is shown that the time for LZW decompression is proportional to the image size rather than to the size of the compressed file. After applying DFS, the CPU load becomes almost full, between 80% and 95%.

What's the problem?
- Multitasking operating system and dynamic frequency scaling: analyze the current consumption of the system
- Software optimization techniques: improve the software to reduce the number of instructions and clock cycles
- CPU load estimation: estimate the CPU load for executing each software component
- Results and optimization

Introduction
- A low-power multimedia handheld device
  - powered by only two AA-size batteries
- The DSP programs needed to be optimized
  - MP3 decoding
  - LZW (Lempel-Ziv-Welch) decompression
  - speech recognition
- Aspects
  - ARM7-specific features
  - optimization of software components
  - lowering the CPU clock frequency
  - minimizing the idle time

System architecture
- Speaking Partner
  - ARM7TDMI 60 MHz CPU
  - 8 KB cache
  - graphic LCD controller
  - synchronous DRAM controller
  - IIS interface
  - 8 channels of 10-bit ADC
  - 128 KB NOR flash for system ROM
  - NAND flash and SMC (smart media card) for program ROM
  - SSFDC (solid state floppy disk card) and USB for read/write

System architecture
[Figure: Speaking Partner]

Current consumption
- The CPU draws some power even when its load is very small and it is mostly in the idle state.
- It is advantageous for power reduction to use the lowest possible clock frequency.
- An estimate of the minimum clock frequency for a real-time implementation is therefore needed.

Current consumption
[Figure] The dynamic frequency scaling scheme is more efficient than constant-frequency operation with an idle state when the load is low.
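The comparison above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' measurement model: the supply voltage is fixed, so active current is taken to scale linearly with clock frequency, and the idle state is taken to draw a fixed fraction of the active current. `MA_PER_MHZ`, `IDLE_RATIO` and `target_utilization` are hypothetical values chosen for illustration; only the 60 MHz maximum clock comes from the slides.

```python
# Toy model of average CPU current at a fixed supply voltage.
# MA_PER_MHZ and IDLE_RATIO are hypothetical, not values from the slides.

F_MAX_MHZ = 60.0   # maximum CPU clock of the device
MA_PER_MHZ = 1.0   # hypothetical active current per MHz
IDLE_RATIO = 0.3   # hypothetical idle current as a fraction of active current


def constant_frequency_current(load):
    """Run at F_MAX_MHZ: busy for `load` of the time, idle for the rest."""
    if not 0.0 < load <= 1.0:
        raise ValueError("load must be in (0, 1]")
    active = MA_PER_MHZ * F_MAX_MHZ
    return active * (load + (1.0 - load) * IDLE_RATIO)


def dfs_current(load, target_utilization=0.9):
    """Lower the clock until the same work keeps the CPU busy
    `target_utilization` of the time (capped at F_MAX_MHZ)."""
    if not 0.0 < load <= 1.0:
        raise ValueError("load must be in (0, 1]")
    freq = min(F_MAX_MHZ, load * F_MAX_MHZ / target_utilization)
    busy = load * F_MAX_MHZ / freq  # busy fraction at the lowered clock
    active = MA_PER_MHZ * freq
    return active * (busy + (1.0 - busy) * IDLE_RATIO)


if __name__ == "__main__":
    for load in (0.1, 0.5, 1.0):
        print(load, constant_frequency_current(load), dfs_current(load))
```

In this model the advantage of frequency scaling is largest at low load and disappears as the load approaches 100%, which matches the qualitative statement on the slide.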

Current consumption
[Figure: current consumption of each hardware block at a CPU load of 10%]

Software optimization
- The ARM7TDMI processor has features suited to implementing DSP algorithms
  - a large number of registers
  - most instructions can be executed conditionally
  - 32-bit barrel shifter
  - block load and store instructions are supported
- The ARM7TDMI processor has a relatively simple data path, and its hardware multiplier has a precision of only 32x8 bits

Software optimization
- MP3 decoding algorithm
  - C-language-based high-level optimization
  - assembly-language-based low-level optimization
  - optimized using the conditional execution of the ARM7TDMI processor

Software optimization
- Block data transfer
  - is used to load (LDM) or store (STM) any subset of the currently visible registers from/to sequential memory
- Cycle counts for transferring 15 registers to sequential memory
  - with a block data transfer (STM): 14S+2N+1I cycles
  - with individual store instructions (STR): (1S+1N+1I)*15 cycles
- S: sequential cycles
- N: non-sequential cycles
- I: internal cycles
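The two cycle formulas can be compared numerically. The sketch below assumes, purely for illustration, that every S, N and I cycle costs one clock (zero-wait-state memory); on real hardware non-sequential cycles are often slower, which would change the totals. The function name `clocks` and the cost parameters are hypothetical.

```python
# Compare the cycle formulas from the slide for storing 15 registers.
# Assumption: each S, N and I cycle costs one clock unless overridden.

def clocks(s, n, i, cost_s=1, cost_n=1, cost_i=1):
    """Total clocks for s sequential, n non-sequential and i internal cycles."""
    return s * cost_s + n * cost_n + i * cost_i


REGISTERS = 15

# One block transfer: 14S + 2N + 1I
block_transfer = clocks(14, 2, 1)

# Fifteen single stores: (1S + 1N + 1I) * 15
single_stores = REGISTERS * clocks(1, 1, 1)

if __name__ == "__main__":
    print("block transfer:", block_transfer)
    print("single stores:", single_stores)
```

Under the one-clock-per-cycle assumption the block transfer needs 17 clocks against 45 for the individual stores.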


Software optimization
- Optimization for speech recognition
  - 16-bit multiplications instead of 32-bit multiplications
    - 8% reduction in cycle time
  - several software optimization techniques were employed
    - loop fusion
    - loop unrolling
    - post increment/decrement conversion
  - total execution time is reduced to about 30~45%
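Loop fusion and loop unrolling can be sketched in a language-neutral way. The example below is not taken from the speech recognizer; the scale-then-accumulate computation and all function names are hypothetical, and on the ARM7 target these transformations would be applied in C or assembly, where they remove loop overhead and intermediate memory traffic. Python is used here only to show that the transformed loops compute the same result.

```python
# Illustrative loop fusion and unrolling on a hypothetical computation:
# scale each sample by a gain, then sum the squares.

def energy_separate(samples, gain):
    """Baseline: two loops and an intermediate buffer."""
    scaled = []
    for x in samples:
        scaled.append(x * gain)
    total = 0
    for y in scaled:
        total += y * y
    return total


def energy_fused(samples, gain):
    """Loop fusion: one pass, no intermediate buffer."""
    total = 0
    for x in samples:
        y = x * gain
        total += y * y
    return total


def energy_fused_unrolled(samples, gain):
    """Fusion plus unrolling by four, with a cleanup loop for the remainder."""
    total = 0
    n = len(samples)
    i = 0
    while i + 4 <= n:
        y0 = samples[i] * gain
        y1 = samples[i + 1] * gain
        y2 = samples[i + 2] * gain
        y3 = samples[i + 3] * gain
        total += y0 * y0 + y1 * y1 + y2 * y2 + y3 * y3
        i += 4
    while i < n:
        y = samples[i] * gain
        total += y * y
        i += 1
    return total
```

With integer inputs all three versions return identical results, so the transformations can be checked against the baseline before measuring any speedup.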

CPU load estimation
- The load for MP3 decoding depends on the bit rate and the sampling frequency
- CPU load at 60 MHz
  - 56 kbps, 22.05 kHz: 10%
  - 32 kbps, 22.05 kHz: 9.6%
  - 32 kbps, 16 kHz: 7%
- The load for CELP decoding is almost constant
  - 18% of the 60 MHz CPU load
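These load figures translate directly into a cycle demand, which is the basis for choosing a lower clock. The sketch below is an illustration under assumptions, not the authors' procedure: it assumes the cycle count of a component does not change with the clock frequency (memory latency effects are ignored), that the loads of concurrently running components simply add, and that the clock can be set to any value. The dictionary keys, function names and the default `max_utilization` are hypothetical; the load values are the ones listed above.

```python
# Estimate the clock frequency needed for a set of components from their
# measured load at 60 MHz. Assumes loads add linearly and cycle counts
# are independent of the clock frequency.

CPU_MAX_MHZ = 60.0

LOAD_AT_MAX = {
    "mp3_56kbps_22.05kHz": 0.10,
    "mp3_32kbps_22.05kHz": 0.096,
    "mp3_32kbps_16kHz": 0.07,
    "celp": 0.18,
}


def required_mhz(component):
    """Cycle demand of one component, in millions of cycles per second."""
    return LOAD_AT_MAX[component] * CPU_MAX_MHZ


def min_clock_mhz(components, max_utilization=0.95):
    """Lowest clock that keeps the total load at or below max_utilization."""
    demand = sum(required_mhz(c) for c in components)
    return min(CPU_MAX_MHZ, demand / max_utilization)


if __name__ == "__main__":
    for name in LOAD_AT_MAX:
        print(name, required_mhz(name), min_clock_mhz([name]))
```

For example, CELP decoding at 18% of 60 MHz corresponds to 10.8 million cycles per second, so a clock slightly above that keeps the CPU almost fully loaded.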

CPU load estimation
[Figure] Processing time of LZW according to the number of pixels
[Figure] Processing time of LZW according to the compressed data size
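A minimal LZW decoder helps show why decoding time tracks the number of output pixels rather than the compressed size: every output byte is written exactly once, while one input code may expand to many bytes. The sketch below is a textbook decoder over a list of integer codes with a 256-entry initial table. The variant used on the device (code width, table reset, bit packing) is not described in the slides, so this is an assumption for illustration only.

```python
# Textbook LZW decoder over integer codes (initial table: 256 single bytes).
# The device's actual LZW variant is not specified in the slides.

def lzw_decompress(codes):
    """Decode a list of LZW codes into bytes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is about to be created.
            entry = prev + prev[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out += entry  # work done here is proportional to the output size
        table[len(table)] = prev + entry[:1]
        prev = entry
    return bytes(out)


if __name__ == "__main__":
    codes = [84, 79, 66, 69, 79, 82, 78, 79, 84,
             256, 258, 260, 265, 259, 261, 263]
    print(lzw_decompress(codes))
```

In the usage example 16 codes expand to 24 bytes, and the copy into `out` dominates the work, which is consistent with decoding time growing with the image size.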

CPU load estimation
[Figure] Execution time prediction of each software component

Experimental result
478 mA (optimized) / 542 mA (original) = 88.2%, i.e. an 11.8% reduction in current

Experimental result
- The clock frequency of the CPU was not changed; changing it would be a more aggressive power optimization approach, at the cost of the delay for PLL relocking

Conclusion
- A dynamic frequency scaling scheme is employed to reduce the CPU power consumption; it shows that a 20% system power saving can be achieved
- The power analysis shows that the current consumed by the DRAM is almost equal to that of the CPU core, which means that reducing cache misses is most important for lowering power consumption
- The current can be reduced further, without any significant change to the power reduction algorithm
  - by employing a CPU that supports dynamic voltage scaling (Intel's XScale)