Motivation Mobile embedded systems are present in: –Cell phones –PDAs –MP3 players –GPS units.

Presentation transcript:

Motivation Mobile embedded systems are present in: –Cell phones –PDAs –MP3 players –GPS units

Mobile Computing Design Considerations Low power Real-time data processing Small size Low cost Quick time to market

Metric Introduction Processor specialization Instruction set Interconnect Memory specialization Functional & Data path units Power Specialization

Metric: Processor Specialization Central controlling point of embedded system Examples: –VLIW to perform multiple instructions in parallel. –RISC architecture

Metric: Instruction Set Specialization Introduction of new instructions to extract optimal performance from the processor Examples: –Multiply-accumulate –Vector operations
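As a rough illustration of why multiply-accumulate is worth a dedicated instruction, the sketch below models one MAC step and uses it in the inner loop of a dot product, the core of most filters. The function names `mac` and `fir_output` are hypothetical and chosen only for this example; a real processor would perform each `mac` call as a single instruction.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b (a single instruction on a DSP)."""
    return acc + a * b


def fir_output(coeffs, samples):
    """Compute one filter output as a chain of MAC steps over paired values."""
    acc = 0
    for c, s in zip(coeffs, samples):
        acc = mac(acc, c, s)
    return acc
```

Without a MAC instruction, every loop iteration costs a separate multiply and a separate add; a vector operation goes one step further and processes several coefficient/sample pairs per instruction.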

Metric: Interconnect Provides means for different modules to communicate Optimizations can lead to reduced complexity, cost, and power consumption

Metric: Memory Specialization Specialization is achieved through optimization of number and size of memory banks, number and size of access ports Optimizations can improve performance, power consumption, and chip area

Metric: Functional & Data Path Units Functional units are often specialized hardware units implementing a frequently used software algorithm Examples: –DSP co-processors, interrupt priority co-processors, memory access modules, and timer modules

Metric: Power Specialization Major concern in mobile systems Kept under control by: –Using low voltage –Slow clock speed –Custom circuit solutions
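The reason low voltage and a slow clock both help follows from the standard first-order model of CMOS dynamic power, P = a * C * V^2 * f, where a is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. The sketch below applies that model; the function names and the example voltages are illustrative assumptions, not figures from the papers summarized here.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS dynamic power: activity * C * V^2 * f (watts)."""
    return activity * c_eff * vdd ** 2 * freq


def relative_power(v_new, f_new, v_old, f_old):
    """Ratio of dynamic power after scaling voltage and frequency."""
    return (v_new / v_old) ** 2 * (f_new / f_old)
```

Because voltage enters squared, dropping the supply from 3.3 V to 1.8 V at the same clock gives `relative_power(1.8, 1.0, 3.3, 1.0)`, roughly 0.30, while halving the clock alone only halves the power.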

Architectures to be discussed M*CORE D30V/MPEG SuperENC 1.3-GOPS Parallel DSP IA-32 w/ Enhanced Data Streaming

M*CORE Low power embedded applications Wireless mobile devices Cellular phones

M*CORE Processor Specialization Simple RISC architecture 4 stage pipeline 16-bit instruction word length Compiler designed in parallel with architecture Barrel shifter built into ALU

M*CORE Instruction Set Specialization Multimedia instructions –Multiple data transfers from memory to register and register to memory. –Fast register saves FF1 – Find First 1 –Finding highest priority interrupt in hardware
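To make the FF1 use case concrete, here is a small software model of a find-first-one operation and of how it could pick the highest-priority pending interrupt. The conventions are assumptions for this sketch: the scan starts at bit 31, the result is the offset from bit 31 (32 if no bit is set), and bit 31 is treated as the highest priority. The names `ff1`, `ff1_loop`, and `highest_priority_pending` are hypothetical.

```python
def ff1(word):
    """Offset of the first set bit scanning down from bit 31; 32 if word is 0."""
    word &= 0xFFFFFFFF
    if word == 0:
        return 32
    return 32 - word.bit_length()


def ff1_loop(word):
    """The bit-by-bit software loop that a hardware FF1 replaces."""
    word &= 0xFFFFFFFF
    for offset in range(32):
        if word & (0x80000000 >> offset):
            return offset
    return 32


def highest_priority_pending(pending):
    """Bit number of the highest-priority pending interrupt, or None if idle."""
    offset = ff1(pending)
    return None if offset == 32 else 31 - offset
```

The loop version can take up to 32 iterations per lookup; doing the scan in hardware turns interrupt prioritization into a single instruction.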

M*CORE Interconnect Specialization 16-bit data bus to match 16-bit word length –Reduces memory bandwidth, complexity, chip area layout, and power consumption MDI – MCU-to-DSP Interface –Dual access memory messaging unit General I/O bus for peripherals

M*CORE Memory Specialization Alternate register bank –Fast register saves for context switches

M*CORE Functional & Data Path Units 32 channel programmable interrupt controller Protocol timer DSP core

M*CORE Power Specialization 1.8 Volts Uses 0.5 Watts Power aware pipeline Programmable power states –Stop –Wait –Doze –Normal

M*CORE Summary Low power and programmable power states make it ideal for mobile devices Interface to built in DSP core makes it ideal for cell phone applications

650 MHz IA-32 Microprocessor designed to accelerate data-streaming applications Three-dimensional graphics Video encode/decode

650 MHz IA-32 Processor Specialization IA-32 architecture 70 new instructions SIMD floating point data type Improvements in regard to circuit implementation

650 MHz IA-32 Instruction Set Specialization 70 new instructions –SIMD FP operations –Control for new 8-entry register file –Multimedia extension 12 new integer instructions

650 MHz IA-32 Interconnect Specialization Front Side Bus of 66, 100, 133 MHz Back Side Bus –Half the clock frequency for mobile and desktop applications –Full clock frequency for server/workstation applications

650 MHz IA-32 Memory Specialization 3 new non-temporal store instructions with write combining buffers –Burst write protocol –Write data throughput of Gbytes/sec on a 133 MHz bus 4 new data pre-fetch instructions –Overlap memory access with computation, reducing cache miss penalties

650 MHz IA-32 Functional Specialization 8-entry register file –Reduces register starvation for SIMD unit –128 bits wide, with four independent single-precision elements packed in parallel Dedicated table-based lookup unit for reciprocal operations –Completes reciprocal operations in one clock cycle –Error of 1.5 * 2^-12
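A table-based reciprocal can be sketched in a few lines: index a precomputed table with the leading bits of the operand's mantissa and fix up the exponent afterwards. The version below is only a software model under stated assumptions: the 11-bit index width (`TABLE_BITS`) and the midpoint-based table entries are choices made for this example, not details of the actual hardware unit. With these choices the worst-case relative error stays below 2^-12, which is inside the 1.5 * 2^-12 bound quoted on the slide.

```python
import math

TABLE_BITS = 11  # assumed index width, chosen for illustration only

# Entry i holds the reciprocal of the midpoint of the i-th slice of [1, 2).
TABLE = [1.0 / (1.0 + (i + 0.5) / (1 << TABLE_BITS))
         for i in range(1 << TABLE_BITS)]


def approx_reciprocal(x):
    """Approximate 1/x for x > 0 with a single table lookup."""
    if x <= 0.0:
        raise ValueError("this sketch handles positive inputs only")
    mantissa, exponent = math.frexp(x)   # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    m = mantissa * 2.0                   # normalize to 1 <= m < 2
    e = exponent - 1                     # so that x = m * 2**e
    index = int((m - 1.0) * (1 << TABLE_BITS))
    return math.ldexp(TABLE[index], -e)  # (1/m) * 2**(-e)
```

A quick check such as `abs(approx_reciprocal(3.0) * 3.0 - 1.0)` gives the relative error for a given input; software can refine the result further with a Newton-Raphson step when more precision is needed.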

650 MHz IA-32 Low Power Usage 1.4 V ~ 2.2 V at 650 MHz close to room temperature

650 MHz IA-32 Performance 1.5X to 2.0X performance boost for 3-D transform and lighting kernels Real-time MPEG-2 video/audio encoding at 30 frames per second –Achieved through improvement to SIMD unit, at a cost of only 2% increase of unit area size

D30V/MPEG Multimedia applications –Decoding MPEG-2

D30V/MPEG Processor Specialization 2 way VLIW Dual issue RISC pipeline 2 way assigned SIMD module Pipeline has ability to re-route data through execution path

D30V/MPEG Instruction Set Specialization Saturate and Add DSP instructions built in –Modular addressing –Block repeat –Multiply accumulate Half word instructions –Effectively double number of useable registers
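The value of a saturating add is easiest to see next to an ordinary wrapping add. The sketch below models both for signed 16-bit operands; the 16-bit width and the function names are assumptions for illustration and are not taken from the D30V/MPEG documentation.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat_add16(a, b):
    """Add two signed 16-bit values, clamping the result to the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def wrap_add16(a, b):
    """Add two signed 16-bit values with two's-complement wraparound."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s
```

For example, `sat_add16(30000, 10000)` clamps to 32767, whereas `wrap_add16(30000, 10000)` wraps to -25536. In pixel or audio arithmetic the clamped value is a mild distortion, while the wrapped value flips sign and is a visible or audible glitch, which is why media processors provide saturation in hardware.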

D30V/MPEG Interconnect Specialization Chip layout specialized for decoding streaming mpeg data

D30V/MPEG Memory Specialization 32 Kbyte data RAM 64 Kbyte instruction RAM 4 Kbyte RAM for Variable Length Encoder/Decoder (VLC/VLD) tables Special Registers –MOD_S & MOD_E for modulo addressing –RPT_S, RPT_E, and RPT_C for looping
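Modulo addressing with a start and an end register can be modeled as an address update that wraps inside a fixed window, which is what makes circular buffers free of explicit bounds checks. In the sketch below the parameters `mod_s` and `mod_e` stand in for the MOD_S and MOD_E registers; treating the end address as inclusive and allowing negative steps are assumptions of this example, not documented behavior.

```python
def next_address(addr, step, mod_s, mod_e):
    """Advance addr by step, wrapping within the window [mod_s, mod_e] inclusive."""
    size = mod_e - mod_s + 1
    return mod_s + (addr - mod_s + step) % size


def walk(start, step, count, mod_s, mod_e):
    """Return the sequence of addresses touched by count successive accesses."""
    addrs = []
    addr = start
    for _ in range(count):
        addrs.append(addr)
        addr = next_address(addr, step, mod_s, mod_e)
    return addrs
```

Calling `walk(0x102, 1, 4, 0x100, 0x103)` visits 0x102, 0x103, 0x100, 0x101: the pointer wraps from the end of the buffer back to the start without any compare-and-branch in the loop body.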

D30V/MPEG Functional Specialization VLC/VLD Variable Length Encoding/Decoding units

D30V/MPEG Low Power Usage 2.5 Volts at 243 MHz Uses 2.0 Watts

D30V/MPEG Performance 12% speedup from inter-pipe bypasses Special VLC/VLD functional blocks speed up MPEG decoding

1.3 GOPS Parallel DSP Achieve real-time image processing capability Employ data parallelism to achieve goal –High-level algorithms, non-parallelizable: arithmetic encoding –Medium-level algorithms, moderately parallelizable: contour tracking of binary images –Low-level algorithms, highly parallelizable: filters and transforms Data-independent control and data flow 80% of MPEG-2, 60% of MPEG-4

1.3 GOPS Parallel DSP Processor Specialization Central control unit –RISC based –Controls multiple SIMD units

1.3 GOPS Parallel DSP Instruction Set Specialization VLIW instructions –3 instructions per issue 1 load/store 16 bit data 2 arithmetic operations on 16/32 bit data

1.3 GOPS Parallel DSP Interconnect Specialization DMA/MCU (Direct Memory Access/Memory Control Unit) –Handles cache misses –Performs prefetch operations from matrix memory –Interfaces with external 64 bit data bus and 32 bit address bus for SRAM and DRAM modules

1.3 GOPS Parallel DSP Memory Specialization Memory tailored to image processing needs –Provides parallel high bandwidth access to shared data with matrix shaped access patterns Individual Cache Memory –Services irregular memory requests

1.3 GOPS Parallel DSP Functional Specialization Multiple SIMD units –Currently 4 units for prototype –16 units planned for future versions –SIMD approach has been extended with ASIMD, autonomous instruction selection capability Improves handling of conditional branches

1.3 GOPS Parallel DSP Low Power Usage 3.3 Volts Using 650 milliwatts

1.3 GOPS Summary Sustained performance 380 MIPS –Around 90% utilization

SuperENC MPEG-2 video encoder

SuperENC Processor Specialization Software implemented RISC architecture –5 stage pipeline –81 MHz, 32 bit wide data/instruction path Software implemented SIMD/SDIF (SDRAM Interface) modules

SuperENC Instruction Set Specialization There is no instruction set specialization mentioned in the paper.

SuperENC Interconnect Specialization SDIF –All memory access goes through SDIF –Relay data without going to external memory Reduces memory bandwidth and power consumption

SuperENC Memory Specialization Uses external RAM –Can access two 16 Mbit SDRAMs or one 64 Mbit SDRAM

SuperENC Functional Specialization MPEG algorithm is broken up into hardware functional blocks –Examples: DCT, Discrete Cosine Transform; IDCT, Inverse Discrete Cosine Transform; ME, Motion Estimation; MC, Motion Compensation
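To show what the DCT and IDCT blocks compute, here is a plain software version of the one-dimensional orthonormal DCT-II and its inverse. This is a textbook formulation written for clarity, not the algorithm used in the SuperENC hardware; MPEG-2 applies the two-dimensional form to 8x8 blocks, which is this transform run over rows and then columns.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for coefficient k of an n-point transform."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x):
    """1-D orthonormal DCT-II of a sequence of samples."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                           for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct(): reconstructs the samples from the coefficients."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n))
        for i in range(n)
    ]
```

Each output needs n multiply-accumulates, so an 8x8 block costs on the order of a thousand of them in this direct form, and an encoder must run both the forward and inverse transforms on every block. That regular, arithmetic-heavy workload is what makes a fixed-function block attractive.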

SuperENC Low Power Usage 2.5 Volts internal 3.3 Volts I/O 1.5 Watts

SuperENC Summary SuperENC makes use of many hardware functional blocks to implement the MPEG-2 encoding algorithm

Metric Results D30V/MPEG highest rated