Survey of Digital Signal Processors Michael Warner ECD: VLSI Communication Systems.



Agenda
- Industry Trends
- DSP Architecture
- DSP Micro-Architecture
- DSP Systems


Moore's Law Drives Processor Development
[Figure: transistors per die versus year, 1960 to 2010. Series: Data (Moore), Microprocessor. Labeled processors include 486, Pentium, Pentium II, Pentium III, Pentium 4, and Itanium. Source: Intel internal]
- Repeatedly doubling the number of transistors at the same price point drives significant product opportunities, especially if you have little regard for power
- But what if energy-delay had to be reduced by an order of magnitude every generation?

Gene's Law Drives DSP Development
[Figure: DSP power (mW/MIPS) versus year, with the Gene's Law trend line]
- Gene's Law will have its challenges to hold the line!

What's Driving Gene's Law?
- Digital Audio
  - MP3
  - Real Audio
- Streaming Video
  - MPEG 4
  - H.263
- Connectivity
  - Internet
  - Bluetooth
- Modem Standards
  - UMTS
  - GSM

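DSP Design Constraints
[Table: device capabilities across three device generations. Columns: Technology (µm), Transistors, MIPS, RAM (bytes), Power (mW/MIPS), Price/MIPS. Surviving values: first generation 3 µm and 50K transistors; second generation 40 MIPS, 2K bytes RAM, 12.5 mW/MIPS; third generation 5,000 MIPS, 3M bytes RAM, 0.1 mW/MIPS]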


What Makes a DSP a DSP?
- Single-Cycle MAC
- Multiple Execution Units
- High Bandwidth (Flat) Memory Sub-Systems
- Efficient Zero-Overhead Looping
- Short Pipeline
- High Bandwidth I/O
- Specialized Instruction Sets
- Sophisticated DMA
- Little to No Speculation

Single Cycle MAC
- MACs (Multiply-Accumulate Units) Typically Determine DSP Performance and Pipeline Length (EX)
- Most DSPs Have 2-8 MAC Units
- MACs Typically Operate in Both a Scalar and Vector Mode
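To make the MAC bullets concrete, here is a minimal sketch of why the multiply-accumulate count tends to set DSP performance: a direct-form FIR filter is one MAC per tap per output sample. The function names and the idealized cycle model (one MAC per unit per cycle, taps split evenly across units, no memory stalls) are assumptions for illustration and do not come from the slides.

```python
def mac(acc, x, c):
    """One multiply-accumulate step: acc + x * c."""
    return acc + x * c


def fir(samples, coeffs):
    """Direct-form FIR filter: one MAC per tap per output sample.

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc = mac(acc, samples[n - k], c)
        out.append(acc)
    return out


def mac_cycles(num_samples, num_taps, mac_units=1):
    """Idealized cycle estimate (assumption: each unit retires one MAC per cycle)."""
    macs_per_sample = -(-num_taps // mac_units)  # ceiling division
    return num_samples * macs_per_sample
```

Under these assumptions, going from 1 to 4 MAC units cuts the estimate for a 16-tap filter by a factor of four, which is the motivation for the 2-8 MAC units mentioned above.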

Multiple Instruction Units
- VLIW Architectures Driving ILP
- Typical Instruction Units
  - M-Unit: MAC
  - S-Unit: Shift
  - L-Unit: ALU
  - D-Unit: Load/Store
- Industry Has Converged on an ILP of ~8
[Figure: two data paths, each with L, S, M, and D units (L1, S1, M1, D1 and L2, S2, M2, D2), register files A0 - A15 and B0 - B15, cross paths 1X and 2X, and load-data inputs DDATA_I1 and DDATA_I2]
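The unit types above can be sketched as a toy packing problem: group instructions into execute packets so that no packet needs more units of one type than exist. This is only an illustration, assuming two data paths with one M, S, L, and D unit each (eight slots in total) and ignoring data dependencies entirely; the instruction names are hypothetical and this is not how any real assembler or compiler schedules code.

```python
UNITS_PER_TYPE = 2  # assumption: one unit of each type on each of two data paths


def pack(instructions):
    """Greedily group (name, unit_type) pairs into execute packets.

    Instructions stay in program order. A new packet starts whenever the
    unit type an instruction needs is already fully occupied.
    Data dependencies are ignored in this sketch.
    """
    packets = []
    current, used = [], {}
    for name, unit in instructions:
        if used.get(unit, 0) == UNITS_PER_TYPE:
            packets.append(current)
            current, used = [], {}
        current.append(name)
        used[unit] = used.get(unit, 0) + 1
    if current:
        packets.append(current)
    return packets
```

With two instructions for each of the four unit types, the first packet holds eight instructions, matching the ILP of about 8 cited on the slide.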

High Bandwidth Memory Sub-Systems
- Multiple Load-Store Units Required to Feed Data Path
- Tightly Coupled Memory is Typically Dual Ported
- Harvard Architecture is Heavily Banked
[Figure: central arithmetic logic unit block diagram with external memory, internal memory, and multiplexers. Block labels: P, ALU, SHIFTER, B, MAC, A, PC, CNTL, E, C, D, ARs]
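Banking can be illustrated with a small model of bank conflicts. The sketch below assumes low-order interleaving, four banks, 16-bit words, and single-ported banks; all of these parameters are hypothetical and chosen only to show the idea that parallel accesses are free when they land in different banks and serialize when they collide.

```python
NUM_BANKS = 4   # hypothetical bank count
WORD_BYTES = 2  # hypothetical 16-bit word size


def bank_of(byte_address):
    """Map a byte address to a bank using low-order word interleaving."""
    return (byte_address // WORD_BYTES) % NUM_BANKS


def access_cycles(addresses):
    """Cycles needed to serve a set of parallel accesses.

    Assumes each bank serves one access per cycle, so the busiest
    bank determines the total.
    """
    if not addresses:
        return 0
    counts = {}
    for address in addresses:
        bank = bank_of(address)
        counts[bank] = counts.get(bank, 0) + 1
    return max(counts.values())
```

Two operands at adjacent word addresses fall in different banks and are served together, while two addresses that differ by a multiple of the bank stride collide. A dual-ported memory, as mentioned on the slide, would relax the one-access-per-bank assumption.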

Specialized Instruction Sets
- Base RISC ISA Plus CISC ISA Driven by End Application
  - MAC
  - SAD
  - LMS
  - FIRS
  - Viterbi
- Support For Both Scalar and Vector Instructions
- Support For 8, 16 and 32-Bit Instructions
- Instructions are Highly Orthogonal
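As one example of an application-driven instruction, SAD (sum of absolute differences) is the inner operation of block matching in video encoding. The sketch below shows what such an instruction computes in software terms; the function names and the flat-list block representation are assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_match(reference, candidates):
    """Index of the candidate block with the smallest SAD against reference."""
    scores = [sad(reference, candidate) for candidate in candidates]
    return scores.index(min(scores))
```

A dedicated SAD instruction collapses the subtract, absolute-value, and accumulate steps for several pixels into a single operation, which is why it appears alongside MAC in the list above.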

Scalar (55x) vs VLIW (64x)
- Scalar DSPs Tend to be More CISC-Like
  - Hurts Compiler Performance
  - Improves Energy-Delay
  - Improves Code Density
  - Limits Top-End Performance
- VLIW DSPs Tend to be More RISC-Like
  - RISC + GP Regs + Orthogonality Makes For a Good C Compiler
  - Assembler Code Is Challenging
  - RISC ISA Allows for Higher Frequencies
  - Load-Store Hurts Energy-Delay

TMS320C54x

TMS320C54x Protected Pipeline

CYCLES →
P1 F1 D1 A1 R1 X1
   P2 F2 D2 A2 R2 X2
      P3 F3 D3 A3 R3 X3
         P4 F4 D4 A4 R4 X4
            P5 F5 D5 A5 R5 X5
               P6 F6 D6 A6 R6 X6

Fully loaded pipeline: the sixth cycle, where P6, F5, D4, A3, R2, and X1 are all active.

- Prefetch (P): Calculate address of instruction
- Fetch (F): Collect instruction
- Decode (D): Interpret instruction
- Access (A): Collect address of operand
- Read (R): Collect operand
- Execute (X): Perform operation

Note: Protected Pipeline Limits Micro-Architectural Flexibility and Performance
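The six-stage diagram can be reproduced with a short model. This is a sketch that assumes one instruction enters the pipeline per cycle with no stalls; the function names are hypothetical, and the stage names are taken from the slide.

```python
STAGES = ["Prefetch", "Fetch", "Decode", "Access", "Read", "Execute"]


def stage_of(instruction, cycle):
    """Stage occupied by an instruction (1-based) at a cycle (1-based).

    Returns None if the instruction has not entered or has already left
    the pipeline. Assumes one instruction issues per cycle, no stalls.
    """
    index = cycle - instruction
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def occupancy(cycle, num_instructions):
    """Map each occupied stage to the instruction number it holds."""
    table = {}
    for instruction in range(1, num_instructions + 1):
        stage = stage_of(instruction, cycle)
        if stage is not None:
            table[stage] = instruction
    return table
```

Calling `occupancy(6, 6)` gives the fully loaded case from the diagram: instruction 1 is in Execute while instruction 6 is in Prefetch.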

TMS320C6xx
[Figure: 'C6xx CPU core block diagram. Program Fetch, Instruction Dispatch, Instruction Decode; Data Path 1 with A Register File and units D1, M1, S1, L1; Data Path 2 with B Register File and units L2, S2, M2, D2; Control Registers, Control Logic, Interrupts, Emulation, Test. Legend: Arithmetic Logic Unit, Auxiliary Logic Unit, Multiplier Unit]

TMS320C6xx Exposed Pipeline
- Fetch
  - PG: Program Address Generate
  - PS: Program Address Send
  - PW: Program Access Ready Wait
  - PR: Program Fetch Packet Receive
- Decode
  - DP: Instruction Dispatch
  - DC: Instruction Decode
- Execute
  - E1 - E5: Execute 1 through Execute 5
[Figure: Execute Packets 1 through 7, each passing through PG PS PW PR DP DC E1 E2 E3 E4 E5, overlapped in the pipeline]

Note: Exposed Pipeline Adds Risk to Programming Model
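The risk noted above can be sketched with a toy register file in which results land a fixed number of cycles after issue and the hardware does not stall readers. The class, its method names, and the delay-slot counts used in the usage note are all hypothetical; the point is only that code reading a register too early silently sees the old value.

```python
class ExposedRegisterFile:
    """Registers whose writes become visible a fixed number of cycles after issue."""

    def __init__(self, initial=None):
        self.values = dict(initial or {})
        self.pending = []  # entries of (cycle_when_visible, register, value)
        self.cycle = 0

    def issue_write(self, register, value, delay_slots):
        """Issue an instruction whose result is readable delay_slots + 1 cycles later."""
        self.pending.append((self.cycle + delay_slots + 1, register, value))

    def step(self):
        """Advance one cycle and commit every result that is now due."""
        self.cycle += 1
        still_pending = []
        for ready, register, value in self.pending:
            if ready <= self.cycle:
                self.values[register] = value
            else:
                still_pending.append((ready, register, value))
        self.pending = still_pending

    def read(self, register):
        """Read the currently visible value; there is no interlock or stall."""
        return self.values.get(register, 0)
```

For example, starting with `A1 = 7` and issuing a write of 42 with one delay slot, a read after one `step()` still returns 7 and only the read after a second `step()` returns 42. In a protected pipeline the hardware would stall the reader instead, so the programmer or compiler carries that responsibility here.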


Micro-Architectural Challenges
- Accessing (Flat) On-Chip Memory At Speed Within 2-3 Cycles
- Feeding Multiple Functional Units From a Single Register File
- Running 600 MHz+ with a 7-9 Stage Pipeline
- Linking Multiple Functional Units with Result Forwarding
- Implementing CISC Data-path to Meet Area and Performance Goals
- Achieving ARM-Like Code Density

What Does and Doesn't Work?
- Do
  - Banked Memory
  - Dual Access Memory
  - Full Custom Register Files
  - Split/Multiple Register Files
  - Custom/Semi-Custom Data-paths
  - Variable Length Instructions
  - CISC ISA
  - Co-Processors
  - Multi-Core
- Don't
  - Multi-Level Caches
  - Super-Scalar
  - VLIW Packet Descriptors
  - Speculative Branching
  - Full Synthesis
  - Dynamic Logic
- Consider
  - Multi-Threading
  - uP with Co-Processors


DSP Systems
- Wireless Infrastructure: TMS320C[...], [...] MHz, Viterbi and Turbo hardware accelerators
- Wired Infrastructure: TMS320C5561, 6 DSP, 300 MHz, 3 MB (24 Mb) integrated memory, 180M transistors
- Wireless Client: OMAP5910, DSP+GPP, low power consumption, voice, data, video
- Digital Still Camera: TMS320DM310, DSP+GPP, imaging accelerators
- Performance Audio: TMS320DA610, 225 MHz, floating point

VoIP Platform
- TNETV3010 Features
  - 6 C55x, 300 MHz
  - Shared Instruction Memory
  - Broadcast DMA
  - 24M Bits of On-Chip SRAM

DaVinci Platform

OMAP Platform
- OMAP2420 Features
  - ARM 330 MHz, VFP (Vector Floating Point), 32K/32K I/D cache
  - 220 MHz
  - 2D/3D graphics accelerator
  - IVA supports still images to >4 Mpixels, 30 fps VGA video decode
  - Output to TV for gaming and video playback
  - Encryption hardware for DRM and security
[Figure: OMAP2420 block diagram. ARM11 + VFP, TMS320C55x DSP, 2D/3D Graphics Accelerator, Imaging & Video Accelerator (IVA), Camera I/F, LCD I/F, Video Out, Memory Controller, Internal SRAM, Security, Peripherals, L3 Interconnect, L4 Interconnect]