6/3/2015 ENCM515 Comparison of Integer and Floating Point DSP Processors. M. Smith, Electrical and Computer Engineering, University of Calgary, Canada.

Presentation transcript:

ENCM515 Comparison of Integer and Floating Point DSP Processors
M. Smith, Electrical and Computer Engineering, University of Calgary, Canada

6/3/2015 ENCM Comparing Floating Point and Integer Processors Copyright 2 / 58
Requirements for “perfect” DSP architecture
- Fast instruction cycle -- not clock speed
- Fast hardware multiplier
- Floating point for easier design -- avoids scaling and overflow
- High precision wide busses for register, memory, processing units
- Fast loop operation

“Perfect” DSP architecture -- II
- Several data buses available to reduce memory bus conflict/transfer overhead
- Harvard architecture and/or instruction caches to avoid instruction and data-fetch clashes
- Duplicate resources for parallel computation
- Dedicated address calculation hardware
- Extensive temporary registers to avoid unnecessary fetches of continually used data
- Architecture allows easy parallel operation in multiprocessor systems -- NEW
- Cycle time adjustable by instruction -- UNCOMMON
- Duplicate resources for parallel computation of real and imaginary components -- UNCOMMON -- SIMD?

Integer DSP processors remain popular
- Around a long time, so much code already developed
- Many designs available
- Some complications
  - Overflow with addition
  - Multiplication operations: 16 bit x 16 bit means 32 bit result where only certain portions are useful
  - Overcome with Fractional Format
  - Overcome with special architecture features

Consider 12 bit A/D
- Double-sided: -15V to nearly +15V
- 0x800 -- negative full scale
- 0xA00 -- three quarter negative scale
- 0xC00 -- half negative full scale
- 0xE00 -- quarter negative full scale
- 0x000 -- zero
- 0x200 -- quarter positive full scale
- 0x400 -- half positive full scale
- Connect so that negative sign (bit 11) on A/D matches negative sign (bit 31) on 21061

Consider 12 bit A/D connected to 32 bit 21061
- Double-sided: -15V to nearly +15V
- 0x80000000 -- negative full scale
- 0xA0000000 -- three quarters negative
- 0xC0000000 -- half negative full scale
- 0xE0000000 -- quarter negative
- 0x00000000 -- zero
- 0x20000000 -- quarter positive
- 0x40000000 -- half positive full scale
- Connected so that negative sign (bit 11) on A/D matches negative sign (bit 31) on 21061

Examples of integer problems
- SIMPLE SMOOTHING
  - Let’s sum up a couple of values around -7.5V and calculate an average
  - 0xA0000000 + 0xA0000000 …… Overflow
- VERY SIMPLE FIR FILTER
  - Result = V1 * H1 + V2 * H2
  - Let V1 = 0xA0000000 (32 bits)
  - Let H1 = 0x8 (3 bits)
  - Need 35 bits to keep result
  - What do the 35 bits mean? -- need to scale

Fractional values -- automatic handling of multiplication shifts
- Normally 0xf… * 0xf… would result in 64 bit values, which would then need scaling
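A minimal Python sketch of why a fractional format removes the manual rescaling step after a multiply. It uses a 16 bit (1.15) format for brevity rather than the 32 bit registers discussed in the slide, and the names `to_q15` and `q15_mul` are illustrative assumptions, not taken from the slides or from any processor toolchain.

```python
Q15_ONE = 1 << 15  # scale factor: 1.0 corresponds to 2**15

def to_q15(x):
    """Convert a real value in [-1.0, 1.0) to a signed 1.15 fractional integer."""
    return int(round(x * Q15_ONE))

def q15_mul(a, b):
    """Multiply two 1.15 values and return a 1.15 result."""
    product = a * b        # full product is in 2.30 format (32 bits)
    return product >> 15   # drop the low 15 bits to get back to 1.15

half = to_q15(0.5)
quarter = q15_mul(half, half)
print(hex(half), hex(quarter), quarter / Q15_ONE)
```

Read as plain integers, 0x4000 * 0x4000 is a 29 bit number with no obvious meaning. Read as fractions, the same operation is 0.5 * 0.5 = 0.25, and the result stays in the same format as the operands.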

Fractional values -- Not all problems removed -- Overflow
- Understand “fractional” as “fractional full scale”
- Okay when multiplying (R7), but look at R6 =

This is the standard overflow
- -1 = 0x8000 (16 bits)
- -1 + -1 = 0x8000 + 0x8000 = 0x10000 (17 bits)
- Can expect to overflow in the middle of integer FIR filter, although final result should be in range -1.0 to +1.0 if filter gain is less than 1
- Must handle intermediate results overflowing

MR registers 80 bits wide
- MR2 | MR1 | MR0
- MR1 -- acts just the same as an R register in fractional mode
- MR2 -- OVERFLOW -- looks after the problems of -1 * -1
- MR0 -- UNDERFLOW -- looks after problems of -1 / …
- Works till have to get values out of MR -- okay in FIR (important stuff in MR1)
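The effect of extra accumulator bits above the data word can be sketched in Python as follows. This is a simplified model with assumptions stated plainly: it uses a 16 bit fractional data word and a hypothetical accumulator with 8 extra bits above it, not the actual MR2/MR1/MR0 layout, and `wrap` is an illustrative helper.

```python
def wrap(value, bits):
    """Interpret value as a two's complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

MINUS_ONE_Q15 = -0x8000  # -1.0 in 1.15 fractional format (bit pattern 0x8000)

# -1.0 + -1.0 in a 16 bit register: the true sum needs 17 bits, so it wraps.
narrow = wrap(MINUS_ONE_Q15 + MINUS_ONE_Q15, 16)

# The same sum in an accumulator with 8 extra bits above the data word.
wide = wrap(MINUS_ONE_Q15 + MINUS_ONE_Q15, 24)

print(narrow / 2**15, wide / 2**15)
```

The narrow register ends up at 0.0, while the wider accumulator holds -2.0. A later term that brings the running sum back into the range -1.0 to +1.0 can then still produce a correct final result.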

Set the MR register to 0

Now subtract -( -1 * -1 ) from MR
- MR2 -- extra sign bits?
- MR1 -- looks like R0

Subtract another -1 (get -2 as 80 bits fractional)
- MR2 -- extra sign bits?
- MR1 -- looks like R6

Need to look at a variety of processors
- TI32010 -- very early integer DSP
- TI32C… -- later integer DSP
- Motorola … -- popular integer DSP
- AMD 29K series (RISC with some DSP)
- Analog SHARC 2106X
- Motorola C… -- VLIW
- Analog TigerSHARC -- VLIW

TI32010 Block Diagram

TI Details

More advanced TMS320C4X

TI Block Diagram

TI C2XXX Block Diagram 1

TI C2XXX -- Block Diagram 2

Motorola Core

Motorola Integer Processor

Problems with Integer Implementations
- Use 8-bit examples for simplicity -- 16 bit will have same problem
- 8 bit A/D for real time operations
- 8 bit processor
- Average 4 values
  - 1, 2, 3, 4 -- answer will be correct = 5
  - 127, 2, 3, 4 -- answer incorrect = -60

Solution -- Scaling
- Must prescale all incoming numbers by a value that guarantees that no overflow occurs. Do process, then rescale
  - Add 2 numbers -- ASR 1 -- scale by 2
  - Add 4 numbers -- ASR 2 -- scale by 4
- Average 4 values
  - 1, 2, 3, 4 -- scaled by 4 -- 0, 0, 0, 1 -- average = 0 -- accurate answer to 2 bits
  - 127, 2, 3, 4 -- scaled -- 32, 0, 0, 1 -- answer = … -- accurate answer to 2 bits
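A small Python sketch of the overflow and the prescaling fix on a simulated 8 bit processor. The function names are illustrative assumptions. The prescale here uses a truncating arithmetic shift, so 127 becomes 31 rather than the 32 listed above.

```python
def wrap8(value):
    """Wrap an integer to a signed 8 bit two's complement value."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value

def sum8(samples):
    """Accumulate in an 8 bit register with no protection against overflow."""
    total = 0
    for s in samples:
        total = wrap8(total + s)
    return total

def average_prescaled(samples):
    """Prescale each of 4 inputs by ASR 2, process, then rescale."""
    total = 0
    for s in samples:
        total = wrap8(total + (s >> 2))  # prescaled inputs cannot overflow the sum
    return (total >> 2) << 2             # divide by 4, then undo the prescale

print(sum8([1, 2, 3, 4]), sum8([127, 2, 3, 4]))
print(average_prescaled([1, 2, 3, 4]), average_prescaled([127, 2, 3, 4]))
```

With the unprotected sum, 127 + 2 + 3 + 4 wraps to -120. With prescaling, the two averages come out as 0 and 32 against true values of 2.5 and 34. The overflow is gone, but the bottom two bits of precision are lost.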

Guard Bits -- above and below
- Need to do 8 bit algorithm in 16 bit processor
- Use 4 guard bits below and 4 above
- Still need to prescale, but not by as much
- Example
  - Adding 4 numbers -- no prescale
  - Adding 16 numbers -- no prescale?
  - Adding 32 numbers -- prescale

Example of Guard Bits
- Store with guard bits
  - 0x7F -- stored as 0x07F0
  - 0x02 -- stored as 0x0020
  - 0x03 -- stored as 0x0030
  - 0x04 -- stored as 0x0040
- Sum stored as 0x0880
- Average stored as 0x0220 = 34
- FIR type sum may involve 128 terms
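The guard-bit arithmetic above can be checked with a short Python sketch. `store_with_guard` is an illustrative helper name, and the sketch only handles non-negative samples.

```python
GUARD_BELOW = 4  # guard bits below the 8 bit sample, leaving 4 above in a 16 bit word

def store_with_guard(sample):
    """Place a non-negative 8 bit sample in a 16 bit word, shifted up past the low guard bits."""
    return (sample << GUARD_BELOW) & 0xFFFF

samples = [0x7F, 0x02, 0x03, 0x04]
stored = [store_with_guard(s) for s in samples]
total = sum(stored) & 0xFFFF   # 16 bit sum: the 4 guard bits above absorb the growth
average = total >> 2           # divide by 4: the guard bits below keep the fraction
print([hex(w) for w in stored], hex(total), hex(average), average / 2**GUARD_BELOW)
```

The sum 0x0880 fits because of the guard bits above, and the average 0x0220 reads as exactly 34 once the 4 guard bits below are treated as a binary fraction.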

Reference Source
- Following diagrams from Digital Signal Processing: Principles, Algorithms and Applications, 2nd edition, Proakis and Manolakis, Macmillan Publishing, 1992

Quantization Error
- Suppose you want to develop band-pass or low-pass IIR filter

Two pole IIR filter

Allowable pole-positions
- OK band-pass
- BAD low-pass

Coupled form IIR filter

Allowed pole positions
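The diagrams for these slides did not survive in the transcript, so here is a hedged Python sketch of the underlying idea. It assumes the two pole filter is the textbook direct form y(n) = a1*y(n-1) - a2*y(n-2) + x(n), with a1 = 2r cos(theta) and a2 = r^2, and that the coupled form quantizes r cos(theta) and r sin(theta) directly. The function names and the 4 bit coefficient width are illustrative choices.

```python
import math

def direct_form_poles(bits):
    """First-quadrant poles reachable when a1 and a2 are quantized to `bits` fraction bits."""
    step = 2.0 ** -bits
    poles = []
    for i in range(1, 2 ** bits):          # a2 = r^2 in (0, 1)
        a2 = i * step
        for k in range(2 ** (bits + 1)):   # a1 = 2 r cos(theta) in [0, 2)
            x = k * step / 2.0             # real part of the pole is a1 / 2
            if x * x < a2:                 # keep complex-conjugate pairs only
                poles.append((x, math.sqrt(a2 - x * x)))
    return poles

def coupled_form_poles(bits):
    """First-quadrant poles when the real and imaginary parts are quantized directly."""
    step = 2.0 ** -bits
    return [(i * step, k * step)
            for i in range(2 ** bits) for k in range(1, 2 ** bits)
            if (i * step) ** 2 + (k * step) ** 2 < 1.0]

def count_near(poles, centre, radius):
    """Count the poles inside a small disc around a point of the z-plane."""
    return sum(1 for (x, y) in poles
               if math.hypot(x - centre[0], y - centre[1]) < radius)

direct = direct_form_poles(4)
coupled = coupled_form_poles(4)
low_pass_region = (0.95, 0.1)    # close to z = 1
band_pass_region = (0.1, 0.95)   # close to z = j
for name, poles in (("direct", direct), ("coupled", coupled)):
    print(name, count_near(poles, low_pass_region, 0.1),
          count_near(poles, band_pass_region, 0.1))
```

Under these assumptions the direct form has no reachable poles close to z = 1, where a narrow low-pass filter needs them, but plenty near z = j. The coupled form gives a uniform grid, so both regions are covered equally.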

Floating Point Chips
- Only scale as necessary
- Scale automatically
- Many other advantages
- Many formats of floats
  - Some are high precision and slow
  - Some are low precision and fast
  - Some are as high precision as possible given the speed
  - Round up, round down etc

Floating point formats on 21K
- Three kinds available
- IEEE Single Precision -- normal operations
  - 32 bit format
  - Also extended 40-bit format
- Short Word Floating Point -- special 21K feature
  - 16 bit format
  - Used to create IIR delay lines as they use less memory
  - Special memory location for storage
  - Special instructions -- Fpack and Funpack

What are the allowed numbers?
- 32 bit integer
  - Minimum value is -2^31
  - Maximum value is +2^31 - 1
  - Smallest value is 1
  - Granularity of 1
- 32 bit floating point
  - Maximum value is +2^+127
  - Minimum value is -2^+127
  - Smallest value is 2^-127
  - Granularity -- changes -- fine for small numbers, coarse for large
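The changing granularity of a 32 bit float can be measured with a few lines of Python. The sketch assumes IEEE single precision and positive finite inputs, and `float32_spacing` is an illustrative name.

```python
import struct

def float32_spacing(x):
    """Gap between x (as an IEEE single) and the next larger single, for positive finite x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    here = struct.unpack(">f", struct.pack(">I", bits))[0]
    above = struct.unpack(">f", struct.pack(">I", bits + 1))[0]
    return above - here

for value in (1.0, 2.0 ** 24, 2.0 ** 100):
    print(value, float32_spacing(value))
```

Near 1.0 the step is 2^-23, far finer than the integer granularity of 1. Near 2^24 the step is already 2, coarser than the integer format.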

Normal 21k FP example
- Ordinary decimal: …
- Best integer approximation: 178
- Scientific decimal: … * 10^2
- Scientific binary: … * 2^111
- 1 bit -- sign
- 8 bits -- for unsigned magnitude biased exponent
- 24 bits -- for fractional part
- Total 33 bits of storage

Normal 21k FP Example continued
- Scientific binary: … * 2^111
- 1 bit -- sign
- 8 bits -- for unsigned magnitude biased exponent (+127)
- 24 bits -- for fractional part -- the 1. is “James Bonded” -- “remembered not stored” -- need 23 bits
- Total 33 bits of storage -- reduces to 32 once the 1. is not stored
- Biased exponent: (1)… * 2^…
  - sign = 0
  - biased exponent = …
  - fractional part = …
  - hidden 1. for normalized numbers
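The field layout can be inspected in Python with the standard `struct` module. Since the decimal value of the worked example is missing from the transcript, this sketch uses 178.0 (the "best integer approximation" quoted earlier) as a stand-in. `decode_single` and `rebuild` are illustrative names.

```python
import struct

def decode_single(x):
    """Split an IEEE single precision value into sign, biased exponent and fraction fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF          # 23 stored bits; the leading 1. is not stored
    return sign, biased_exponent, fraction

def rebuild(sign, biased_exponent, fraction):
    """Rebuild a normalized value: restore the hidden 1. and remove the +127 bias."""
    significand = 1.0 + fraction / 2 ** 23
    return (-1.0) ** sign * significand * 2.0 ** (biased_exponent - 127)

sign, exponent, fraction = decode_single(178.0)
print(sign, exponent, format(fraction, "023b"))
print(rebuild(sign, exponent, fraction))
```

For 178.0 = 1.0110010 * 2^7 (binary exponent 111), the stored exponent is 7 + 127 = 134 and the fraction field holds only the bits after the hidden 1.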

Packed 21k Float
- See Appendix C
- Short Float type supports gradual underflow
- Sacrifices precision for dynamic range
- Largest number 2 ^ 135
- Smallest “accurate” number 2 ^ 120
- Smallest “non-zero” number 2 ^ 110
- Must scale numbers appropriately
- 1 bit -- sign
- 4 bit -- (binary exponent - 120)
- 11 bit -- rounded upper 11 bits of source, OR
- 11 bit -- represents non-normalized form of the source when exponent stored as 0

Addition in Floating point
- Stored as (1).frac * 2^N
- (1).010 * 2^3 + (1).011 * 2^3 = 10.101 * 2^3 = (1).0101 * 2^4 -- must renormalize
- (1).010 * 2^3 + (1).010 * 2^4 = 0.1010 * 2^4 + (1).010 * 2^4 -- denormalize = (1).1110 * 2^4
- Remember that (1) is “magic” or remembered and is not stored
- Can all be done using integer instructions -- around 280 instructions per FOP
- Problems with co-processor -- data moves
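The denormalize, add, renormalize sequence can be sketched with integer operations only. This is a toy model under stated assumptions: positive operands, a 4 bit fraction, truncation instead of rounding, and an illustrative `fp_add` function that is not an implementation of any real processor's FADD.

```python
FRAC_BITS = 4  # fraction bits kept after the binary point in this toy format

def fp_add(a, b):
    """Add two positive toy floats given as (significand, exponent) pairs.

    The significand is an integer holding 1.frac scaled by 2**FRAC_BITS,
    so (0b10100, 3) means 1.0100 * 2^3.
    """
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Denormalize: shift the operand with the smaller exponent to the right.
    if exp_a < exp_b:
        sig_a >>= exp_b - exp_a
        exp = exp_b
    else:
        sig_b >>= exp_a - exp_b
        exp = exp_a
    total = sig_a + sig_b
    # Renormalize: bring the sum back into the 1.frac range.
    while total >= (2 << FRAC_BITS):
        total >>= 1
        exp += 1
    return total, exp

print(fp_add((0b10100, 3), (0b10110, 3)))  # 1.010 * 2^3 + 1.011 * 2^3
print(fp_add((0b10100, 3), (0b10100, 4)))  # 1.010 * 2^3 + 1.010 * 2^4
```

The first call needs a renormalize step and yields 1.0101 * 2^4. The second needs a denormalize step first and yields 1.1110 * 2^4. A full software implementation also has to handle signs, rounding, zero and exceptional values, which is where the large instruction count comes from.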

AMD29050 FP pipeline

AMD29050 FP pipeline latency

FP pipeline latency -- software solution

FP pipeline latency -- Hardware Solution

21K -- Computational Unit

29K and 21K Comparison
- 29K is “general” not DSP
- 29K and 21K are both super-scalar structurally
- 21K is super-scalar instructionally
- 29K has two important “superscalar features” in terms of instructions
  - FMAC, which is 2 instructions on 21K (1 in integer)
  - 192 registers on 29K -- no need to do dm( ) and pm( ) access since already in registers!
- FMAC gives 29K tremendous speed advantage

29K and 21K Comparison
- Both 29K and 21K can complete new FADD every cycle -- BUT
- 29K FADD 7-stage pipeline at 50 MHz is
  - FETCH
  - DECODE
  - Denormalize, Add, Perhaps Renormalize, Round
  - WRITEBACK
- 21K FADD 3-stage pipeline at 40 MHz
  - FETCH
  - DECODE
  - EXECUTE/WRITEBACK

FP versus Int processors
- Trade algorithm stability and speed/ease of development against cost
- Cost is rapidly changing
- FP has “less baggage” in terms of legacy code
- Now VLIW (true) on DSP and VLIW (effective) on standard Intel and AMD stuff

New Trends in DSP VLIW

More comments on TIC67XX VLIW

Tiger SHARC -- Comparison

Tiger SHARC -- Block

Tiger SHARC comments

Tiger SHARC comments -- 2

Looked at a variety of processors
- TI32010 -- very early integer DSP
- TI32C… -- later integer DSP
- Motorola … -- popular integer DSP
- AMD 29K series (RISC with some DSP)
- Analog SHARC 2106X
- Motorola C… -- VLIW
- Analog TigerSHARC -- VLIW