Presentation transcript:

2000/03/05
Slide 1 / 34: Processor Requirements needed to optimize DSP performance
M. R. Smith, Electrical and Computer Engineering, University of Calgary, Alberta, Canada ucalgary.ca

2000/03/05 ENCM Characteristics needed in DSP processors Copyright
Slide 2 / 34: To be tackled today
- Characteristics of DSP algorithms
- Specialized handling of
  - Multiplication
  - Division (21K has no division instruction)
- ENCM515 Reference Material
  - How RISCy Is DSP, IEEE Micro (Jan-10)
  - Simply Signal Processing (Jan-40)
  - Fast Scaling, CCI (Apr-10)
  - Saturation Arithmetic (Apr-20)

Slide 3 / 34: DSP Algorithms
- DSP algorithms require specialized features on processors
- Processors are a compromise between speed, cost and silicon
- When have you, as a designer, found a compromise that meets your requirements?
- As a consultant, you may have to add DSP characteristics to an existing system, or add a DSP coprocessor to an existing system

Slide 4 / 34: FIR
- Multiply/addition intensive
- Sum operation with high precision -- overflow considerations
- Long simple loop
- Online operation -- “infinite” amount of data
- Store coefficients on-chip for fast access
- Complex domain arithmetic
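The FIR characteristics listed above (one multiply and one add per tap, a long simple loop, a sum that needs extra precision) can be sketched in C. This is only an illustration under assumptions that are not in the slides: 16-bit samples and coefficients, a 64-bit accumulator standing in for a DSP's wide accumulator, and a hypothetical function name `fir_sample`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes one FIR output sample.
 * coeffs[k] is the k-th filter coefficient and history[k] is the input
 * sample that arrived k steps ago (history[0] is the newest). */
int64_t fir_sample(const int16_t *coeffs, const int16_t *history, size_t ntaps)
{
    assert(coeffs != NULL && history != NULL && ntaps > 0);

    int64_t acc = 0;  /* accumulator much wider than the 16-bit data */
    for (size_t k = 0; k < ntaps; k++) {
        /* One multiply-accumulate (MAC) per tap; each product fits in 32 bits. */
        acc += (int32_t)coeffs[k] * (int32_t)history[k];
    }
    return acc;
}
```

With four taps that are all 32767, the sum already exceeds the 32-bit range, which is why the accumulator is wider than the products it adds up.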

Slide 5 / 34: IIR-1
- Interrelated and order-dependent multiplications and additions
- Small number of delays -- via register moves?
- Short loop -- low number of instructions in the loop, which makes it difficult to optimize
- Precision -- very important because of feedback
- Multiple stages -- i.e. IIR follows IIR, etc.
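A rough C sketch of one second-order IIR stage may help show why the loop is short and order dependent: each output feeds back into the next one, and the delays are just a few variable moves. The direct form I layout, the `biquad_t` type and the function names are assumptions chosen for illustration; the slides do not specify a filter structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical second-order section (direct form I). */
typedef struct {
    float b0, b1, b2;   /* feed-forward coefficients */
    float a1, a2;       /* feedback coefficients (a0 assumed to be 1) */
    float x1, x2;       /* previous two inputs  (the "delays") */
    float y1, y2;       /* previous two outputs (the feedback state) */
} biquad_t;

/* Process one sample: y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2. */
float biquad_step(biquad_t *s, float x)
{
    assert(s != NULL);
    float y = s->b0 * x + s->b1 * s->x1 + s->b2 * s->x2
            - s->a1 * s->y1 - s->a2 * s->y2;
    /* Delay-line update: a handful of register-style moves. */
    s->x2 = s->x1;  s->x1 = x;
    s->y2 = s->y1;  s->y1 = y;
    return y;
}

/* Multiple stages: the output of one IIR stage feeds the next. */
float cascade_step(biquad_t *stages, size_t nstages, float x)
{
    assert(stages != NULL || nstages == 0);
    for (size_t i = 0; i < nstages; i++) {
        x = biquad_step(&stages[i], x);
    }
    return x;
}
```

Because every result depends on the two previous results, rounding errors are recirculated, which is the reason precision matters more here than in the FIR case.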

Slide 6 / 34: IIR-2 LDI
- Short complicated loop
- Many intermediate values
- Pipeline issues because of interdependence

Slide 7 / 34: FFT
- Complex variables (A and B) and fixed coefficients (W)
- Address calculations complex
- Memory accesses numerous
- Multiplication and additions
- Need for fast access to many registers, address pointers, constants, variables

Slide 8 / 34: Fast instruction cycle -- needed
- DSP chips -- two-cycle instructions (on top of FETCH/DECODE) during which the processor performs many parallel operations
- More recent technology -- 1 clock cycle
- Many processors take 6 to 32 cycles to handle MULT, FMULT, FDIV or even FADD
- Make processor highly pipelined -- pipeline must be started and then kept full
  - FIR (easy to pipeline)
  - IIR (hard to pipeline)
  - FFT (challenging to pipeline)

Slide 9 / 34: Loop overhead -- must be minimized
- Use specialized hardware
  - specialized decrement and branch instructions occurring in a single cycle
  - instruction cached with counter
  - superscalar operations
  - delayed branches
  - hardware loop control
- Use specialized software techniques
  - loop unrolling
  - down counting loops
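The two software techniques named above, loop unrolling and down-counting loops, can be combined in a small C sketch. The function name `dot_unrolled` and the unroll factor of 4 are assumptions for illustration; a real compiler or hand-tuned assembly routine may choose differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dot product, unrolled by 4 with down-counting loops. */
int64_t dot_unrolled(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));

    int64_t acc = 0;
    size_t blocks = n / 4;   /* number of full groups of four */
    size_t rest = n % 4;     /* leftover elements */

    /* Down-counting loop: the test against zero is cheap on most CPUs,
     * and the loop overhead is paid once per four MACs instead of once per MAC. */
    while (blocks-- > 0) {
        acc += (int32_t)a[0] * b[0];
        acc += (int32_t)a[1] * b[1];
        acc += (int32_t)a[2] * b[2];
        acc += (int32_t)a[3] * b[3];
        a += 4;
        b += 4;
    }

    /* Clean-up loop for the remaining 0 to 3 elements. */
    while (rest-- > 0) {
        acc += (int32_t)(*a++) * (*b++);
    }
    return acc;
}
```

The trade-off is code size: the unrolled body is four times longer, which matters when the loop must fit in a small instruction cache.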

Slide 10 / 34: Memory operations -- many of them
- Data/instruction and data/data conflicts
- Data caches
- Will also have external data memory banks
- Harvard architecture
- Branch target caches
- Multi-ported memory
- Register pre-forwarding -- avoid stalls while trying to write back the result of an ALU operation only to re-access the same register
- Large register banks -- avoid memory ops associated with just-calculated values

Slide 11 / 34: Precision -- high but without speed loss
- FIR -- accumulated value can grow big
- IIR -- recursive use of a value
- External memory bus width
- Internal memory bus width
- Data width of registers and ALU
- Saturation arithmetic

Slide 12 / 34: Saturation Arithmetic
- For full discussion see the 21K SHARC user manual and also “Being Assertive with your processor” (APR-20)
- Internal register 80 bits but external busses only 32 wide
- 0xFFFF F stored as F
- xFFFF stored as (normal math)
- stored as (saturation)
- Can be a good solution (FIR) or a bad solution (IIR) to the problem of overflow
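The difference between normal (wrap-around) storage and saturated storage of a wide result can be sketched in C. The example assumes a 64-bit integer standing in for the wide internal register and a 32-bit store; the function names `saturate` and `wrap32` are hypothetical and do not correspond to any 21K instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp acc to the range of a signed two's complement value of `bits` bits. */
int64_t saturate(int64_t acc, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t max = (INT64_C(1) << (bits - 1)) - 1;  /* e.g. 0x7FFFFFFF for 32 bits */
    int64_t min = -max - 1;                        /* e.g. -0x80000000 for 32 bits */
    if (acc > max) return max;
    if (acc < min) return min;
    return acc;
}

/* "Normal math": keep only the low 32 bits, so large values wrap around. */
int32_t wrap32(int64_t acc)
{
    uint32_t low = (uint32_t)acc;            /* well-defined reduction modulo 2^32 */
    if (low <= (uint32_t)INT32_MAX) {
        return (int32_t)low;
    }
    /* Map 0x80000000..0xFFFFFFFF onto the negative range without relying on
     * implementation-defined conversions. */
    return (int32_t)(low - 0x80000000u) + INT32_MIN;
}
```

A result one past the largest positive 32-bit value saturates to the maximum but wraps to the most negative value, so a small overflow becomes either a small error or a sign flip depending on the mode.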

Slide 13 / 34: Complex arithmetic -- frequency domain operations
- Need to fetch real and imaginary parts at different times during the algorithm
- Need fast access to adjacent memory locations -- burst memory
- Need for many internal registers to temporarily store real/imaginary components (FFT butterfly and last year's exams)
- Duplication of resources -- was custom, but consider now TigerSHARC

Slide 14 / 34: Alternate Core Architecture
[Block diagram: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 32), cache memory (32 x 48), program sequencer, PMA/PMD and DMA/DMD buses, bus connect, JTAG test & emulation, flags, timer, and two duplicated compute units, each with a floating & fixed-point multiplier, fixed-point accumulator, register file, barrel shifter, and floating-point & fixed-point ALU]

Slide 16 / 34: Address calculations -- frequent
- Complex addressing modes -- take many clock cycles
- Use pointers and autoincrement rather than calculating pointer + offset
- Need many address-related registers
- Address calculations compete with ALU calculations
- Group instructions within program, e.g. read and store often use the same or similar addresses, so don’t recalculate the addresses

Slide 17 / 34: Specialized addressing modes
- standard memory access
- premodify
- postmodify
- circular buffers (modulo arithmetic on the address registers)
- bit-reverse addressing
- structure handling
- auto-increment with size accounted for
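Two of the addressing modes listed above, circular buffers and bit-reverse addressing, are easy to show in software, which also makes clear what the dedicated hardware saves. The helper names `circ_advance` and `bit_reverse` are hypothetical; on a DSP these operations are performed by the address generators rather than by ALU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Circular buffer: post-modify an index with modulo arithmetic so that it
 * wraps back to the start of a buffer of `length` elements. */
size_t circ_advance(size_t index, size_t step, size_t length)
{
    assert(length > 0 && index < length);
    return (index + step % length) % length;
}

/* Bit-reverse addressing: reverse the lowest `bits` bits of `value`.
 * For an N-point radix-2 FFT, bits = log2(N). */
uint32_t bit_reverse(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (value & 1u);  /* move the LSB across */
        value >>= 1;
    }
    return reversed;
}
```

For an 8-point FFT (3 address bits), index 1 maps to 4 and index 3 maps to 6. Done in software this costs a loop per access; done in the address hardware it costs nothing extra.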

Slide 18 / 34: Key issue -- ease of development
- Microcontrollers -- onboard peripherals
- Host communication
- Multiprocessor communications
- Simulators
- Multi-processor operations
- Application notes
- Good working environment
- Compatibility with previous processor versions -- legacy code (an advantage and a disadvantage)

Slide 19 / 34: Multiplication Extensive algorithms
- Off-chip multipliers have big bottlenecks
  - Get and then give instruction to multiplier
  - Get and then give first, second data to multiplier
  - Wait till cooked, and then get value
- Newer chips have on-board multiplication or intelligent co-processors (F-LINE exceptions)
- Many chips do multiplication using specialized techniques introduced by an optimizing compiler

Slide 20 / 34: Smart Multiplication through optimizing compiler techniques
- 29K RISC FMULT execution takes 6 cycles + fetch
- 16-bit x 16-bit INTEGER multiplication on 68K CISC takes 70 cycles regardless of operations
- Use adds and shifts instead, since these take less time -- easy with integers, but floats?
- What are the equivalent operations on the 21K? Discussed in an early lecture on Quirks and SHARCs

Slide 21 / 34: Smart Integer 68k Multiplication
- Multiplication by 2, 4, 8, 16: achieved by shifting 1, 2, 3 or 4 times (done in 6 + 2n operations on 68K)
- D2 = D0 * 19
    MOVE.W D0, D2
    ASL.W  #4, D2      ; D2 = D0 * 16
    ADD.W  D0, D2      ; D2 = D0 * 17
    ASL.W  #1, D0      ; D0 = D0 * 2
    ADD.W  D0, D2      ; D2 = original D0 * 19
  (29 cycles compared to 70)
- Watch out for overflow; may need conversion to 32 bits (SSI, SSF on some processors -- not only 21k)
- Waste of time if you have single-cycle multipliers (21k?). Careful, because multiplication results may end up in a special register.
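The same shift-and-add idea can be written in C. This is a sketch under assumptions: unsigned 16-bit operands widened to 32 bits first (one way to handle the overflow warning above), and hypothetical function names `mul19_u16` and `shift_add_mul`.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by the constant 19 as 16x + x + 2x, mirroring the 68k sequence. */
uint32_t mul19_u16(uint16_t x)
{
    uint32_t w = x;                          /* widen so nothing overflows */
    uint32_t r = (w << 4) + w + (w << 1);    /* 16x + x + 2x = 19x */
    assert(r == w * 19u);                    /* debug cross-check only */
    return r;
}

/* General case: add a shifted copy of x for every set bit of k. */
uint32_t shift_add_mul(uint16_t x, uint16_t k)
{
    uint32_t shifted = x;
    uint32_t product = 0;
    while (k != 0) {
        if (k & 1u) {
            product += shifted;   /* this bit of k contributes x * 2^i */
        }
        shifted <<= 1;
        k >>= 1;
    }
    return product;
}
```

The fixed-constant version costs two shifts and two adds; the general loop costs up to sixteen iterations, which is roughly what a slow microcoded multiply does internally.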

Slide 22 / 34: Multiplication Extensive algorithms
- Highly pipelined, therefore complex instruction interdependence
    R0 = R1 * R2        BUT        R0 = R1 * R2
    R3 = R4 * R5                   R3 = R0 * R5   <- delay dependency
- Need automated tools to schedule instructions
- Need multiple destinations (registers) for multiplier result
- Multiply and Accumulate (MAC) instruction
- Super-scalar operations even on a simpler processor
  - Cause problems in short loops
- Many types of MACs needed
- Not all processors have the single-cycle multiplication operation
- See “In the AM29050 a FIR-bearing animal” (FEB-80 in class notes)

Slide 23 / 34: Typically need “Normalization” of result
- N point DFT: Result = DFT (Input) ; 0 <= n < N
- N point inverse DFT: Result = IDFT (Input) / N ; 0 <= n < N
- Division is typically done by the equivalent of repeated subtraction cycles on 68K
    result = 0;
    do {
        Numerator = Numerator - Denom;
        result++;
    } while (Numerator >= 0);   /* >= so that exact multiples are counted */
    result--;                   /* undo the final subtraction that went negative */
- Special shift-subtract tricks speed operations
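One common form of the shift-subtract idea is restoring binary long division, which needs one shift and at most one subtraction per quotient bit instead of one subtraction per unit of the quotient. The C sketch below is an illustration only; the name `div_shift_subtract` and the 32-bit unsigned operands are assumptions, and the slides do not say which trick was presented in class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Restoring division: returns num / den and optionally stores num % den. */
uint32_t div_shift_subtract(uint32_t num, uint32_t den, uint32_t *rem)
{
    assert(den != 0);

    uint32_t quotient = 0;
    uint64_t partial = 0;   /* 64 bits so the shift below cannot overflow */

    for (int bit = 31; bit >= 0; bit--) {
        /* Shift the next numerator bit into the partial remainder. */
        partial = (partial << 1) | ((num >> bit) & 1u);
        /* Subtract only when it fits, and record a 1 in the quotient. */
        if (partial >= den) {
            partial -= den;
            quotient |= (uint32_t)1 << bit;
        }
    }

    if (rem != NULL) {
        *rem = (uint32_t)partial;
    }
    return quotient;
}
```

The loop always runs 32 times regardless of the operands, whereas repeated subtraction takes as many passes as the size of the quotient.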

Slide 24 / 34: Smart Integer Division
- Division by 2, 4, 8, 16
    unsigned            signed
    LSR #1, D0          ASR #1, D0
- Need to propagate (or not propagate) the sign bit
  - Unsigned: original = 0x80 (128), final = 0x40 (64)
  - Signed: original = 0x80 (-128), final = 0xC0 (-64)
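The logical versus arithmetic right shift distinction can be sketched in C. Right-shifting a negative signed value is implementation-defined in older C standards, so this sketch builds the arithmetic shift from operations that are always well defined; the names `lsr` and `asr` are hypothetical, chosen to echo the 68k mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift right: zeros enter from the left (unsigned division by 2^n). */
uint32_t lsr(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x >> n;
}

/* Arithmetic shift right: copies of the sign bit enter from the left.
 * For negative x, ~x is non-negative, so the inner shift is well defined. */
int32_t asr(int32_t x, unsigned n)
{
    assert(n < 32);
    return x < 0 ? ~(~x >> n) : x >> n;
}
```

One caution: an arithmetic shift rounds toward negative infinity, while C's `/` operator truncates toward zero, so `asr(-3, 1)` gives -2 but `-3 / 2` gives -1. Code that replaces a signed division with a shift has to account for this.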

Slide 25 / 34: Floating Point Division
- There is no FDIV on the 21K -- use recursion!!
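A typical recursive scheme for division without a divide instruction is Newton-Raphson iteration on the reciprocal, x <- x * (2 - d * x), followed by one multiply. The C sketch below assumes that scheme; the slide does not spell out the algorithm, and the seed formula, the iteration count and the names `recip_newton` and `fdiv_newton` are illustrative choices rather than the 21K's actual method.

```c
#include <assert.h>
#include <float.h>

/* Approximate 1/d using only multiplies, adds and compares. */
float recip_newton(float d, int iterations)
{
    float a = d < 0.0f ? -d : d;
    assert(a > 0.0f && a <= FLT_MAX);   /* rejects zero, NaN and infinity */
    assert(iterations >= 0);

    /* Scale |d| by a power of two so that m = a * scale lies in [0.5, 1). */
    float m = a;
    float scale = 1.0f;
    while (m >= 1.0f) { m *= 0.5f; scale *= 0.5f; }
    while (m < 0.5f)  { m *= 2.0f; scale *= 2.0f; }

    /* Linear first guess for 1/m on [0.5, 1); worst-case error is 1/17. */
    float x = 48.0f / 17.0f - (32.0f / 17.0f) * m;

    /* Each pass roughly squares the relative error. */
    for (int i = 0; i < iterations; i++) {
        x = x * (2.0f - m * x);
    }

    x *= scale;                  /* 1/a = scale * (1/m) */
    return d < 0.0f ? -x : x;
}

/* Division as multiplication by the reciprocal. */
float fdiv_newton(float n, float d)
{
    return n * recip_newton(d, 3);   /* 3 passes reach single precision here */
}
```

Because the error squares on every pass, a better seed saves iterations, which is why hardware that supplies a reciprocal estimate makes this approach practical.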

Slide 26 / 34: Processors compared
- IEEE Micro Magazine Special Feature 1992
- DSP
  - TMS320C25, 030
  - DSP56000/1, DSP96002 (Motorola)
- RISC
  - i860 (Intel)
  - MC88100 (Motorola)
  - SPARC (Sparc Consortium, NOT Sun)
  - Am29050
- Ideal -- SMITH CRISP

Slide 27 / 34: CRISP -- triple pun as well
- Comprehensive RISC -- predicted 1992
- Harvard architecture
- MAC (rather than super-scalar instructions)
- Ability to do X = R+S, Y = R-S operations
- Many registers for address/values
- FP as well as integer capability
- Bit-reverse addressing
- Peripherals with DMA
- Low power standby
- High precision -- double precision
- Efficient pipeline with parallel completion of many operations (dual-ported memory and register banks)

Slide 28 / 34: Comparisons -- 1

Slide 29 / 34: FIR/IIR

Slide 30 / 34: FFT -- Radix 2 and Radix 4

Slide 31 / 34: Requirements for “perfect” DSP
- Fast instruction cycle -- different from high clock speed
- Cycle time adjustable according to instruction type
- Fast hardware multiplier
- Floating point for easier algorithm design
- High precision, implying wide data buses for memory, internal processor transfers, registers and on-board processing units

Slide 32 / 34: Requirements for “perfect” DSP
- Several data buses available to reduce bus conflict transfer overhead
- Harvard architecture and/or instruction cache to avoid instruction and data-fetch clashes
- Duplicate resources for parallel computation of real and imaginary components of complex numbers
- Dedicated hardware required for address calculations to avoid APU clash with main algorithm

Slide 33 / 34: Requirements for “perfect” DSP
- Extensive temporary registers to reduce unwanted fetches of continually used data
- Or single cycle, highly parallel, memory operations
- Fast and reliable, easily programmed, developed and upgraded
- Inexpensive and easy to develop peripherals
- High level of customer support
- Inexpensive to purchase
- Lower power consumption with a standby mode

Slide 34 / 34: Tackled today
- Characteristics of DSP algorithms
- Specialized handling of
  - Multiplication
  - Division (21K has no division instruction)
- ENCM515 Reference Material
  - How RISCy Is DSP, IEEE Micro (Jan-10)
  - Simply Signal Processing (Jan-40)
  - Saturation Arithmetic (Apr-20)