L27: Lower Power Algorithm for Multimedia Systems, 1999. 8, Sungkyunkwan University, Jun-Dong Cho



Contents
Algorithmic Effects on Low Power
Low Power Management
Low Power Applications
–Low Power Video Processor
–Single Chip Video Camera
–Vector Quantization
–Data Encoding
–CDMA Searcher
–Viterbi Decoder

Low Power Algorithm

Algorithm Selection Example: 8x8 matrix DCT

Strength Reduction: DIGLOG multiplier
                     1st Iter   2nd Iter   3rd Iter
Worst-case error     -25%       -6%        -1.6%
Prob. of Error<1%    10%        70%        99.8%
With an 8 by 8 multiplier, the exact result can be obtained in at most seven iteration steps (worst case).
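The iteration behind the DIGLOG table can be sketched as follows. This is a minimal sketch, assuming the usual leading-one decomposition A = 2^j + AR and B = 2^k + BR, so that A*B = 2^(j+k) + 2^j*BR + 2^k*AR + AR*BR and each iteration replaces the multiply with shifts and adds, deferring AR*BR to the next step. The function name diglog_mul and the way iterations are counted are illustrative choices, not taken from the slides.

```python
def diglog_mul(a: int, b: int, iterations: int) -> int:
    """Approximate a*b for non-negative integers using shifts and adds only.

    Each pass strips the leading one of both operands and adds the three
    shift-only terms; the leftover product ar*br is handled by the next pass.
    """
    result = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break                      # remaining error term is exactly zero
        j = a.bit_length() - 1         # position of the leading one in a
        k = b.bit_length() - 1         # position of the leading one in b
        ar = a - (1 << j)              # residue of a
        br = b - (1 << k)              # residue of b
        result += (1 << (j + k)) + (ar << k) + (br << j)
        a, b = ar, br                  # next pass approximates ar*br
    return result


exact = 255 * 255
approx = diglog_mul(255, 255, 1)
print(approx, exact, (approx - exact) / exact)
```

The result never exceeds the true product, because every dropped term AR*BR is non-negative; in this formulation the first-pass error stays within the -25% bound quoted in the table.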

Logarithmic Number System --> Significant Strength Reduction

Switching Activity Reduction
(a) Average activity in a multiplier as a function of the constant value
(b) Parallel and serial implementations of an adder tree

System-Level Solutions
System management, system partitioning, algorithm selection
Precompute the physical capacitance of the interconnect and the switching activity (number of bus accesses)
Regularity: to minimize the power in the control hardware and the interconnection network.
Modularity: to exploit data locality through distributed processing units, memories and control.
–Spatial locality: an algorithm can be partitioned into natural clusters based on connectivity
–Temporal locality: average lifetimes of variables (less temporary storage; higher probability that future accesses refer to data referenced in the recent past)
Few memory references: since references to memories are expensive in terms of power.

System-Level Solutions - cont.
Simulator: Instruction-level Energy Estimation
Software: Energy Efficient Algorithms
OS: Voltage Scheduling Algorithms
OS: Multiprocessing for Energy
Microprocessor: Dynamic Caches

Processor Systems: High Power
Thinkpad (Pentium)        0.3 Hours/AA
InfoPad (ARM)             0.8 Hours/AA
Toshiba Portable (486)    0.9 Hours/AA
Newton (ARM)              2.0 Hours/AA
Operations per Battery Life: Minimize Energy Consumed per Operation
Operations per Second: Maximize Throughput (Operations/second)

DPM vs SPM
DPM (Dynamic Power Management): stops the clock from switching in a specific unit by gating the clock produced by the clock generators.
SPM (Static Power Management): when the system remains idle for a significant period of time, it is shut down.
Identify power-hungry modules and look for opportunities to reduce power.
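The distinction between the two policies can be illustrated with a small decision function. This is only a sketch: the 500 ms shutdown threshold, the state names, and the function plan_power_state are hypothetical and do not come from the slides.

```python
def plan_power_state(unit_in_use: bool, idle_ms: float,
                     shutdown_after_ms: float = 500.0) -> str:
    """Pick a power state for one unit.

    shutdown_after_ms is an assumed threshold for what counts as a
    'significant' idle period; a real system would tune it against the
    cost of waking the unit up again.
    """
    if unit_in_use:
        return "active"
    if idle_ms >= shutdown_after_ms:
        return "shutdown"       # SPM: long idle period, power the unit down
    return "clock_gated"        # DPM: short idle period, only stop its clock


for idle in (0.0, 20.0, 800.0):
    print(idle, plan_power_state(unit_in_use=False, idle_ms=idle))
```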

Vdd vs Delay
Use variable voltage scaling or scheduling for real-time processing.
Use architecture optimization to compensate for slower operation, e.g., parallel processing and pipelining to increase concurrency and shorten the critical path.
Scale down device sizes to compensate for delay (interconnects do not scale proportionately and can become dominant).
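The trade-off between supply voltage and parallelism can be checked with the standard dynamic power relation P = C * Vdd^2 * f. The sketch below uses made-up normalized numbers (a 5 V baseline, a duplicated datapath with twice the capacitance running at half the clock rate and 3 V); they are assumptions for illustration, not measurements from the lecture.

```python
def dynamic_power(capacitance: float, vdd: float, frequency: float) -> float:
    """Dynamic switching power, P = C * Vdd^2 * f (normalized units)."""
    return capacitance * vdd * vdd * frequency


# Baseline: one datapath at full clock rate and full supply voltage.
baseline = dynamic_power(capacitance=1.0, vdd=5.0, frequency=1.0)

# Assumed parallel version: two datapaths (2x capacitance), each clocked at
# half the rate, which leaves enough timing slack to lower Vdd to 3 V.
parallel = dynamic_power(capacitance=2.0, vdd=3.0, frequency=0.5)

print(parallel / baseline)  # same throughput at a fraction of the power
```

With these assumed numbers the throughput is unchanged while power falls to roughly a third, because the quadratic dependence on Vdd outweighs the doubled capacitance.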

PowerPC 603 Strategy
Baseline: use the right supply and the right frequency for each part of the system.
If one has to wait for the occurrence of some input, only a small circuit needs to wait, and it wakes up the main circuit when the input occurs.
The PowerPC 603 is a 2-issue processor (two instructions read at a time) with five parallel execution units.
4 modes:
–Full on mode for full speed
–Doze mode, in which the execution units are not running
–Nap mode, which also stops the bus clocking
–Sleep mode, which stops the clock generator with or without the PLL (20-100 mW)
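The four modes can be written down as a small table and queried for the deepest mode that still keeps a required resource running. This is a sketch under the assumption that the modes are cumulative (each deeper mode stops everything the previous one stops); the resource names and the helper deepest_mode are illustrative and are not taken from PowerPC documentation.

```python
from enum import Enum


class Mode(Enum):
    FULL_ON = "full on"
    DOZE = "doze"
    NAP = "nap"
    SLEEP = "sleep"


# What each mode stops, assuming the modes are cumulative.
STOPPED = {
    Mode.FULL_ON: frozenset(),
    Mode.DOZE: frozenset({"execution units"}),
    Mode.NAP: frozenset({"execution units", "bus clock"}),
    Mode.SLEEP: frozenset({"execution units", "bus clock", "clock generator"}),
}


def deepest_mode(needed: set) -> Mode:
    """Return the lowest-power mode that stops nothing in `needed`."""
    for mode in (Mode.SLEEP, Mode.NAP, Mode.DOZE, Mode.FULL_ON):
        if not (STOPPED[mode] & needed):
            return mode
    return Mode.FULL_ON


print(deepest_mode({"bus clock"}))   # bus must keep running
print(deepest_mode(set()))           # nothing needed
```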

PowerPC 603 Power Management

TI Structures
Two DSPs, the TMS320C541 and TMS320C542, reduce power, chip count, and system cost for wireless communication applications.
C54X DSPs, 2.7V, 5V, Low-Power Enhanced Architecture DSP (LEAD) family: with three different power-down modes, these devices are well suited to wireless communications products such as digital cellular phones, personal digital assistants, and wireless modems, with low power for voice coding and decoding.
The TMS320LC548 features:
–15-ns (66 MIPS) or 20-ns (50 MIPS) instruction cycle times
–3.0- and 3.3-V operation
–32K 16-bit words of RAM and 2K 16-bit words of boot ROM on-chip
–Integrated Viterbi accelerator that performs a Viterbi butterfly update in four instruction cycles for GSM channel decoding
–Powerful single-cycle instructions (dual operand, parallel instructions, conditional instructions)
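The Viterbi butterfly that the accelerator speeds up is an add-compare-select step over a pair of trellis states. The sketch below shows one butterfly in plain Python, assuming path metrics are costs to be minimized and that the two branches out of each state carry metrics +bm and -bm (the symmetric case common for convolutional codes). It illustrates the operation only; it is not the C54x instruction sequence, and the function name is hypothetical.

```python
def acs_butterfly(pm_even: int, pm_odd: int, bm: int):
    """One add-compare-select butterfly.

    pm_even, pm_odd: path metrics of the two predecessor states.
    bm: branch metric; the opposite branch is assumed to carry -bm.
    Returns ((new_metric, decision), (new_metric, decision)) for the two
    successor states, where decision is 0 if the even predecessor survived
    and 1 if the odd predecessor survived.
    """
    # Successor 0: even predecessor via +bm, odd predecessor via -bm.
    cand_a, cand_b = pm_even + bm, pm_odd - bm
    upper = (cand_a, 0) if cand_a <= cand_b else (cand_b, 1)

    # Successor 1: even predecessor via -bm, odd predecessor via +bm.
    cand_c, cand_d = pm_even - bm, pm_odd + bm
    lower = (cand_c, 0) if cand_c <= cand_d else (cand_d, 1)

    return upper, lower


print(acs_butterfly(pm_even=3, pm_odd=5, bm=2))
```

Each butterfly costs four additions, two comparisons, and two selections, which is why dedicated hardware support for it pays off in channel decoding.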

InfoPad Architecture, UC-Berkeley
Example: hand-held, speech-enabled web browser
Maintain state in the network, not on the Pad
Transmit audio and raw bitmaps across the wireless link
Perform all computation in the network to minimize client energy dissipation
[Diagram labels: InfoPad, Wireless Basestation, “PadServer”, Speech Recognizer, Web Browser, Internet]

InfoPad Hardware Flexibility
Only the header is sent to the microprocessor (10 MIPS μProcessor: control, statistics, reliability, debugging)
The entire packet is routed to dedicated hardware (RX packet, packet header, frame-buffer update)
Embedded software is responsible for high-level functions
Main data flow is handled by custom low-power ASICs
Use hardware/software integration to provide energy-efficient high-level functionality
[Diagram labels: Radio, Frame Buffer]
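The split between the software and hardware paths can be sketched as a routing step on each received packet. The 4-byte header length and the names below are hypothetical placeholders; the slides do not specify the InfoPad packet format.

```python
HEADER_BYTES = 4  # assumed header size, for illustration only


def route_rx_packet(packet: bytes) -> dict:
    """Send only the header to software; the full packet goes to hardware."""
    return {
        # Low-rate path: the microprocessor sees just enough for control,
        # statistics, reliability, and debugging.
        "microprocessor": packet[:HEADER_BYTES],
        # High-rate path: dedicated hardware consumes the whole packet,
        # e.g. to update the frame buffer.
        "dedicated_hardware": packet,
    }


routes = route_rx_packet(b"\x01\x02\x03\x04" + b"pixel-data")
print(len(routes["microprocessor"]), len(routes["dedicated_hardware"]))
```

Keeping the bulk data out of the processor is what lets a slow, low-power CPU supervise a high-bandwidth stream.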

Multimedia I/O Terminal.

Multimedia I/O terminal

InfoPad Evolution
Total Power: ~7 W
Where did the power go?
–No local computation?
–Commercial radios
–Commercial DC/DC
–Inefficient implementation
[Diagram labels: Intercom, Energy-Efficient Processors, InfoPad]
High-level system design optimizes the complete solution and drives new research

Power-Down Techniques

Low Power Memory