Chapter 13 Numerical Issues

Slides:

Advertisements

Similar presentations

Numbers Treasure Hunt Following each question, click on the answer. If correct, the next page will load with a graphic first – these can be used to check.

Advertisements

Chapter 19 Fast Fourier Transform (FFT) (Theory and Implementation)

© 2005 Microchip Technology Incorporated. All Rights Reserved. Slide 1 NUM 1019 Numerical Methods on the PIC Micro.

1 C Programming. 2 Operators 3 Operators in C An operator is a symbol that tells the computer to perform certain mathematical or logical manipulation.

AP STUDY SESSION 2.

Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.

Principles & Applications

Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 6 Author: Julia Richards and R. Scott Hawley.

Author: Julia Richards and R. Scott Hawley

Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.

Objectives: Generate and describe sequences. Vocabulary:

Fast Algorithms for Finding Nearest Common Ancestors Dov Harel and Robert Endre Tarjan Fast Algorithms for Finding Nearest Common Ancestors SIAM J. COMPUT.

We need a common denominator to add these fractions.

1 RA I Sub-Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Casablanca, Morocco, 20 – 22 December 2005 Status of observing programmes in RA I.

Conversion Problems 3.3.

Properties of Real Numbers CommutativeAssociativeDistributive Identity + × Inverse + ×

FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.

Binary Numbers.

1 Floating Point Representation and Arithmetic (see Patterson Chapter 4)

A Simple ALU Binary Logic.

B261 Systems Architecture

Break Time Remaining 10:00.

Monika Gope Lecturer IICT, KUET

Turing Machines.

PP Test Review Sections 6-1 to 6-6

Briana B. Morrison Adapted from William Collins

Discrete Mathematical Structures: Theory and Applications

Digital Systems Introduction Binary Quantities and Variables

Copyright © 2013, 2009, 2005 Pearson Education, Inc.

Exarte Bezoek aan de Mediacampus Bachelor in de grafische en digitale media April 2014.

Binary Values and Number Systems

Number Systems  binary, octal, and hexadecimal numbers  why used  conversions, including to/from decimal  negative binary numbers  floating point.

Fixed-point and floating-point numbers CS370 Fall 2003.

Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.

1 RA III - Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Buenos Aires, Argentina, 25 – 27 October 2006 Status of observing programmes in RA.

Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)

Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.

CONTROL VISION Set-up. Step 1 Step 2 Step 3 Step 5 Step 4.

© 2012 National Heart Foundation of Australia. Slide 2.

Adding Up In Chunks.

Binary Lesson 3 Hexadecimal. Counting to 15 Base Base Base 16 Base Base Base 16 Two Ten (Hex) Two Ten (Hex)

Binary Lesson 3 Hexadecimal. Counting to 15 Base Base Base 16 Base Base Base 16 Two Ten (Hex) Two Ten (Hex)

25 seconds left…...

Digital Logic & Design Lecture No. 3. Number System Conversion Conversion between binary and octal can be carried out by inspection.  Each octal digit.

Subtraction: Adding UP

Analyzing Genes and Genomes

Datorteknik IntegerAddSub bild 1 Integer arithmetic Depends what you mean by "integer" Assume at 3-bit string. –Then we define zero = 000 one = 001 Use.

©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.

Essential Cell Biology

Converting a Fraction to %

Clock will move after 1 minute

Intracellular Compartments and Transport

PSSA Preparation.

Essential Cell Biology

Immunobiology: The Immune System in Health & Disease Sixth Edition

1 Chapter 13 Nuclear Magnetic Resonance Spectroscopy.

14-1 Bard, Gerstlauer, Valvano, Yerraballi EE 319K Introduction to Embedded Systems Lecture 14: Gaming Engines, Coding Style, Floating Point.

Energy Generation in Mitochondria and Chlorplasts

Select a time to count down from the clock above

Murach’s OS/390 and z/OS JCLChapter 16, Slide 1 © 2002, Mike Murach & Associates, Inc.

and M-ary Quadrature Amplitude Modulation (M-QAM)

1/15/2015 Slide # 1 Binary, Octal and Hex Numbers Copyright Thaddeus Konar Introduction to Binary, Octal and Hexadecimal Numbers Thaddeus Konar.

The Logic of Compound Statements

Lecture 09a Numerical Issues. Lecture 09a, Slide 2 Learning Objectives  Numerical issues and data formats.  Fixed point.  Fractional number.  Floating.

Number Representation for

Presentation transcript:

Chapter 13 Numerical Issues

Learning Objectives Numerical issues and data formats. Fixed point. Fractional number. Floating point. Comparison of formats and dynamic ranges.

Numerical Issues and Data Formats C6000 Numerical Representation Fixed point arithmetic: 16-bit (integer or fractional). Signed or unsigned. Floating point arithmetic: 32-bit single precision. 64-bit double precision.

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number 23 22 21 20 Unsigned integer numbers

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number 23 22 21 20 1 Unsigned integer numbers 1 1

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number 23 22 21 20 1 Unsigned integer numbers 1 1 1 2

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number 23 22 21 20 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Unsigned integer numbers

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number -23 22 21 20 Signed integer numbers

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number -23 22 21 20 1 Signed integer numbers 1 1

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number -23 22 21 20 1 Signed integer numbers 1 1 1 2

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number -23 22 21 20 1 Signed integer numbers 1 1 1 2 1 3 1 4 1 5 1 6 1 7

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number -23 22 21 20 1 Signed integer numbers 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 -8

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number -23 22 21 20 1 Signed integer numbers 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 -8 1 -7

Fixed Point Arithmetic - Definition For simplicity a 4-bit representation is used: Decimal Equivalent Binary Number -23 22 21 20 1 Signed integer numbers 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 -8 1 -7 1 -6 1 -5 1 -4 1 -3 1 -2 1 -1

Fixed Point Arithmetic - Problems The following equation is the basis of many DSP algorithms (See Chapter 1): Two problems arise when using signed and unsigned integers: Multiplication overflow. Addition overflow.

Multiplication Overflow 16-bit x 16-bit = 32-bit Example: using 4-bit representation 24 cannot be represented with 4-bits. 1 3 1 x x 8 24 1 1

Addition Overflow 32-bit + 32-bit = 33-bit Example: using 4-bit representation 16 cannot be represented with 4-bits. 1 8 1 + + 8 16 1

Fixed Point Arithmetic - Solution The solutions for reducing the overflow problem are: Saturate the result. Use double precision result. Use fractional arithmetic. Use floating point arithmetic.

Solution - Saturate the result Unsigned numbers: If A x B  15  result = A x B If A x B > 15  result = 15 1 3 1 8 x 1 1 24 Saturated 1 15

Solution - Saturate the result Signed numbers: If -8  A x B  7  result = A x B If A x B > 7  result = 7 If A x B < -8  result = -8 1 3 1 -8 x 1 1 -24 Saturated 1 -8

Solution - Double precision result For a 4-bit x 4-bit multiplication hold the result in an 8-bit location. Problems: Uses more memory for storing data. If the result is used in another multiplication the data needs to be represented into single precision format (e.g. prod = prod x sum). Results need to be scaled down if it is to be sent to an A/D converter.

Solution - Fractional arithmetic If A and B are fractional then: A x B < min(A, B) i.e. The result is less than the operands hence it will never overflow. Examples: 0.6 x 0.2 = 0.12 (0.12 < 0.6 and 0.12 < 0.2) 0.9 x 0.9 = 0.81 (0.81 < 0.9) 0.1 x 0.1 = 0.01 (0.01 < 0.1)

Fractional numbers Definition: What is the largest number? -20 2-1 2-2 2-(N-1) -1 0.5 0.25 1 1 What is the largest number? Largest Number: -20 2-1 2-2 2-(N-1) + 1 = MAX 1 = 2-(N-1) 1 = MAX+2-(N-1) = 1 MAX = 1-2-(N-1)

Fractional numbers Definition: What is the smallest number? 1 -20 2-1 2-2 2-(N-1) What is the smallest number? Smallest Number: 1 -20 2-1 2-2 2-(N-1) = MIN = -1 For 16-bit representation: MAX = 1 - 2-15 = 0.999969 MIN = -1 -1 x < 1

Fractional numbers - Sign Extension 1 a= = 0.5 + 0.25 = 0.75 b= = -1 + 0.5 + 0.25 = -0.25 x 1 . 1 . 1 . 1 Sign extension To keep the same resolution as the operands we need to select these 4-bits: 1

Fractional numbers - Sign Extension 1 1 = 0.5 + 0.25 = 0.75 x b= 1 1 1 = -1 + 0.5 + 0.25 = -0.25 1 1 . 1 1 . . Sign extension bits 1 1 1 . . . 1 Sign extension The way to do it is to shift left by one bit and store upper 4-bits or right shift by three and store the lower 4-bits: 1 1 1

15-bit * 15-bit Multiplication CPU MPY A3,A4,A6 NOP Q15 s. x y x Q15 s z Q30 Store to Data Memory SHR A6,15,A6 STH A6,*A7 s. z Q15

‘C6000 C Data Types Type Size Representation char, signed char 8 bits ASCII unsigned char 8 bits ASCII short 16 bits 2’s complement unsigned short 16 bits binary int, signed int 32 bits 2s complement unsigned int 32 bits binary long, signed long 40 bits 2’s complement unsigned long 40 bits binary enum 32 bits 2’s complement float 32 bits IEEE 32-bit double 64 bits IEEE 64-bit long double 64 bits IEEE 64-bit pointers 32 bits binary

Fractional numbers - Sign Extension Pseudo assembly language: Pseudo ‘C’ language: A0 = 0x80000000 ; initial value A1 = 0.5 ; initial value A2 = 0.5 ; initial value A3 = 0 ; initial value MPY A1, A2, A3 ; A3 = 0x10000000 SHL A3,1,A3 ; A3 = 0x20000000 STH A3, *A0 ; 0x2000 -> 0x80000000 or SHR A3,15,A3 ; A3 = 0x00002000 short a, b, result; int prod; prod = a * b; prod = prod >> 15; result = (short) prod;

Fractional numbers - Problems There are some problems that need to be resolved when using fractional numbers. These are: Result of -1 x -1 = 1 Accumulative overflow.

Problem of -1 x -1 We have seen that: Solution: -1 x < 1 -1 x -1 = 1 which cannot be represented. Solution: There are two instructions that saturate the result if you have -1 x -1: SMPY SMPYH

Problem of -1 x -1 In one cycle these instructions do the following: Multiply. Shift left by 1-bit. Saturate if the sign bits are 01. It can be shown that: Positive Result Negative Result -1 x -1 Result Result of MPY(H) 00.xxx-xb 11.xxx-xb 01.xxx-xb Result of SMPY(H) 0.xxx-xb 1.xxx-xb

Problem of Accumulative Overflow In this case the overflow is due to the summation. Examples of overflow: 0x0000 0xffff 0x7fff 0x8001 0x7ffe 0x7fff + 0x0002 = 0x8001 (positive number + positive number = negative number!) 0xffff + 0x0002 = 0x0001 (negative number + positive number = negative number!)

Problem of Accumulative Overflow Solutions: (1) Saturate the intermediate results by using these add instructions: If saturation occurs the SAT bit in the CSR is set to 1. You must clear it. (2) Use guard bits: e.g. ADD A1, A2, A1:A0 SADD SSUB

Problem of Accumulative Overflow Solutions: (3) Do nothing if the system is Non-Gain: With a non-gain system the final result is always less than unity. Example system: This will be non-gain if:

Floating Point Arithmetic The C67xx support both single and double precision floating point formats. The single precision format is as follows: s 31 e 30 22 21 m ... 1-bit 8-bits 23-bits s = sign bit e = exponent (8-bit biased : -127) m = mantissa (23-bit normalised fraction) value = (-1)sign * (1.mantissa) * 2(exponent-127)

Floating Point Arithmetic Example Example: Conversion between integer and floating point. Convert ‘dd’ to the IEEE floating point format: int dd = 0x6000 0000; flot1 = (float) dd;

Floating Point Arithmetic Example To view the value of “flot1” use: View: Memory:Address= &flot1 flot1 = 0x4EC0 0000 We find:

Floating Point Arithmetic Example Let us check to see if we have the same number: s = 0 e = 10011101b = 128+16+8+4+1 = 157 m = 0.100b = 0.5 float1 = (-1)0 * (1.5) * 2(157-127) = 1.5 * 230 = 1610612736 decimal = 0x6000 0000

Floating Point Arithmetic Example The previous example can be seen in: numerical.pjt Numerical_.ws Use the mixed mode display to see the assembly code.

Floating Point IEEE Standard Special values: s 1 e 0<e<255 255 m 0 Number -0 (-1)s * 0.m * 2-126 (-1)s * 1.m * 2e-127 + - NaN (not a number)

Floating Point IEEE Standard value = (-1)sign * (1.mantissa) * 2(exponent-127) Dynamic range: Largest positive number: e(max) = 255, m(max) = 1-2-(23-1) max = [1 + (1 -2-24)] * 2255-127 = 3.4 * 1038 Smallest positive number: e(min) = 0, m(min) = 0.5 (normalised 0.100…0b) min = 1.5 * 2-127 = 8.816 * 10-39

Floating Point IEEE Standard value = (-1)sign * (1.mantissa) * 2(exponent-127) Dynamic range: Largest negative number: e(max) = 255, m(max) = 1-2-24 max = [-1 + (1 -2-24)] * 2255-127 = -3.4 * 1038 Smallest negative number: e(min) = 0, m(min) = 0.5 (normalised 1.100…0b) min = -1.5 * 2-127 = -8.816 * 10-39

Floating/Fixed Point Summary Floating point single precision: Floating point double precision: s 31 e 30 23 22 m ... 1-bit 8-bits 23-bits value = (-1)s * 1.m * 2e-127 63 62 52 51 s e e ... e e m m ... m m 1-bit 11-bits 52-bits odd:even registers value = (-1)s * 1.m * 2e-1023

Floating/Fixed Point Summary (Short: N = 16; Int: N = 32) Unsigned integer: Signed integer: Signed fractional: x 2N-1 20 ... 21 x -2N-1 20 ... 21 x -20 2-(N-1) ... 2-1 2-2

Floating/Fixed Point Dynamic Range Smallest Number (positive) Largest Number (positive) Smallest Number (negative) Floating Point Single Precision 3.4 x 1038 8.8 x 10-39 -3.4 x 1038 216 - 1 1 -216 16-bit 232 - 1 -232 32-bit 1-2-15 2-15 -1 1-2-31 2-31 Integer Fixed Point Fractional

Numerical Issues - Useful Tips Multiply by 2: Use shift left Divide by 2: Use shift right Log2N: Use shift Sine, Cosine, Log: Use look up tables To convert a fractional number to hex: Num x 215 Then convert to hex e.g: convert 0.5 to hex 0.5 x 215 = 16384 (16384)dec = (0x4000)hex

Numerical Issues - 32-bit Multiplication It is possible to perform 32-bit multiplication using 16-bit multipliers. Example: c = a x b (with 32-bit values). a = ah al bh bl b = 32-bits a * b = (ah << 16 + al)* (bh << 16 + bl) = [(ah * bh) << 32] + [(al * bh) << 16] + [(ah * bl) << 16] + [al * bl ]

Links Further reading: Understanding TMS320C62xx DSP Single-precision Floating-Point Functions: \Links\spra515.pdf TMS320C6000 Integer Division: \Links\spra707.pdf

Chapter 13 Numerical Issues - End -