Lecture 09a Numerical Issues

Lecture 09a, Slide 2 Learning Objectives
• Numerical issues and data formats.
• Fixed point.
• Fractional number.
• Floating point.
• Comparison of formats and dynamic ranges.

Lecture 09a, Slide 3 Numerical Issues and Data Formats
C6000 Numerical Representation
Fixed point arithmetic:
• 16-bit (integer or fractional).
• Signed or unsigned.
Floating point arithmetic:
• 32-bit single precision.
• 64-bit double precision.

Lecture 09a, Slides 4-7 Fixed Point Arithmetic - Definition
• For simplicity a 4-bit representation is used.
• Unsigned integer numbers: each 4-bit binary number is listed against its decimal equivalent, from 0000 = 0 to 1111 = 15.

Lecture 09a, Slides 8-14 Fixed Point Arithmetic - Definition
• For simplicity a 4-bit representation is used.
• Signed integer numbers: each 4-bit binary number is listed against its decimal equivalent, from 1000 = -8 to 0111 = 7.
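
The two 4-bit tables can be regenerated with a few lines of C. This is a minimal sketch, assuming the signed table uses two's complement (consistent with the -8 to 7 range used later for saturation); the function names u4_value, s4_value and print_4bit_table are illustrative and not part of the lecture.

```c
#include <assert.h>
#include <stdio.h>

/* Value of a 4-bit pattern read as an unsigned integer (0..15). */
int u4_value(unsigned bits) {
    assert(bits < 16u);
    return (int)bits;
}

/* Value of a 4-bit pattern read as a two's complement integer (-8..7):
   bit 3 carries weight -8, bits 2..0 carry +4, +2, +1. */
int s4_value(unsigned bits) {
    assert(bits < 16u);
    return (bits & 0x8u) ? (int)bits - 16 : (int)bits;
}

/* Print every 4-bit pattern with its unsigned and signed value. */
void print_4bit_table(void) {
    for (unsigned b = 0; b < 16u; b++) {
        printf("%u%u%u%u  %2d  %2d\n",
               (b >> 3) & 1u, (b >> 2) & 1u, (b >> 1) & 1u, b & 1u,
               u4_value(b), s4_value(b));
    }
}
```

Calling print_4bit_table() lists all sixteen patterns; the same pattern 1111 reads as 15 unsigned and as -1 signed.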

Lecture 09a, Slide 15 Fixed Point Arithmetic - Problems
• The following equation is the basis of many DSP algorithms (see Lecture 01):
• Two problems arise when using signed and unsigned integers:
  • Multiplication overflow.
  • Addition overflow.

Lecture 09a, Slide 16 Multiplication Overflow
• 16-bit x 16-bit = 32-bit.
• Example using a 4-bit representation: the product is 24, which cannot be represented with 4 bits.

Lecture 09a, Slide 17 Addition Overflow
• 32-bit + 32-bit = 33-bit.
• Example using a 4-bit representation: the sum is 16, which cannot be represented with 4 bits.
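
Both overflows can be reproduced by forcing results back into 4 bits. The sketch below is illustrative only: the operand pairs 4 x 6 = 24 and 8 + 8 = 16 are assumptions chosen to match the results quoted on the slides (the original operands are not in the transcript), and wrap_u4, mul_u4 and add_u4 are hypothetical helper names.

```c
#include <assert.h>

/* Keep only the low 4 bits, as a 4-bit unsigned register would. */
unsigned wrap_u4(unsigned x) { return x & 0xFu; }

/* 4-bit x 4-bit product stored back into 4 bits: upper bits are lost. */
unsigned mul_u4(unsigned a, unsigned b) {
    assert(a < 16u && b < 16u);
    return wrap_u4(a * b);
}

/* 4-bit + 4-bit sum stored back into 4 bits: the carry is lost. */
unsigned add_u4(unsigned a, unsigned b) {
    assert(a < 16u && b < 16u);
    return wrap_u4(a + b);
}
```

With these helpers mul_u4(4, 6) returns 8 instead of 24 and add_u4(8, 8) returns 0 instead of 16, which is the silent wrap-around the following slides try to avoid.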

Lecture 09a, Slide 18 Fixed Point Arithmetic - Solution
• The solutions for reducing the overflow problem are:
  • Saturate the result.
  • Use double precision result.
  • Use fractional arithmetic.
  • Use floating point arithmetic.

Lecture 09a, Slide 19 Solution - Saturate the result
• Unsigned numbers (4-bit):
  • If A x B <= 15, result = A x B
  • If A x B > 15, result = 15 (saturated)

Lecture 09a, Slide 20 Solution - Saturate the result
• Signed numbers (4-bit):
  • If -8 <= A x B <= 7, result = A x B
  • If A x B > 7, result = 7 (saturated)
  • If A x B < -8, result = -8 (saturated)
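
The saturation rules above amount to clamping the full-width product to the representable range. A minimal sketch, with hypothetical function names (clamp, sat_mul_u4, sat_mul_s4, sat_mul_s16), and a 16-bit variant added as an assumption about how the same rule scales up:

```c
#include <assert.h>

/* Clamp x to the closed interval [lo, hi]. */
long clamp(long x, long lo, long hi) {
    assert(lo <= hi);
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Unsigned 4-bit saturating multiply: results above 15 become 15. */
int sat_mul_u4(int a, int b) { return (int)clamp((long)a * b, 0L, 15L); }

/* Signed 4-bit saturating multiply: results are clamped to -8..7. */
int sat_mul_s4(int a, int b) { return (int)clamp((long)a * b, -8L, 7L); }

/* Same rule for 16-bit signed operands: clamp to -32768..32767. */
short sat_mul_s16(short a, short b) {
    return (short)clamp((long)a * b, -32768L, 32767L);
}
```

Saturation keeps the sign and the largest possible magnitude, so the error is bounded, whereas a wrapped result can have the wrong sign.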

Lecture 09a, Slide 21 Solution - Double precision result
• For a 4-bit x 4-bit multiplication, hold the result in an 8-bit location.
• Problems:
  • Uses more memory for storing data.
  • If the result is used in another multiplication, it must first be converted back to the single precision format (e.g. prod = prod x sum).
  • Results need to be scaled down if they are to be sent to a D/A converter.

Lecture 09a, Slide 22 Solution - Fractional arithmetic
• If A and B are fractional (magnitude less than 1) then:
  • |A x B| <= min(|A|, |B|)
  • i.e. the magnitude of the result is never larger than that of either operand, hence the multiplication will never overflow.
• Examples:
  • 0.6 x 0.2 = 0.12 (0.12 < 0.6 and 0.12 < 0.2)
  • 0.9 x 0.9 = 0.81 (0.81 < 0.9)
  • 0.1 x 0.1 = 0.01 (0.01 < 0.1)

Lecture 09a, Slide 23 Fractional numbers
• Definition: the bit weights of an N-bit fractional number are -2^0 . 2^-1 2^-2 ... 2^-(N-1)
• Largest number: what is the largest number?
  0111 = MAX
  0001 = 2^-(N-1)
  0111 + 0001 = 1000, so MAX + 2^-(N-1) = 1
  MAX = 1 - 2^-(N-1)

Lecture 09a, Slide 24 Fractional numbers
• Definition: the bit weights of an N-bit fractional number are -2^0 . 2^-1 2^-2 ... 2^-(N-1)
• Smallest number: what is the smallest number?
  1000 = -2^0 = MIN = -1
• For 16-bit representation:
  • MAX = 1 - 2^-15 ≈ 0.99997
  • MIN = -1
  • -1 <= x < 1
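
The MAX and MIN values follow directly from dividing the stored integer by 2^(N-1). A small sketch under that reading of the definition; frac_value, frac_max and frac_min are illustrative names, not TI library functions.

```c
#include <assert.h>

/* Value of an N-bit signed fractional number stored as the integer `raw`
   (two's complement, -2^(N-1) .. 2^(N-1)-1): value = raw / 2^(N-1). */
double frac_value(long raw, int n) {
    assert(n >= 2 && n <= 32);
    return (double)raw / (double)(1ULL << (n - 1));
}

/* Largest representable value: 1 - 2^-(N-1). */
double frac_max(int n) {
    assert(n >= 2 && n <= 32);
    return 1.0 - 1.0 / (double)(1ULL << (n - 1));
}

/* Smallest (most negative) representable value: always -1. */
double frac_min(int n) {
    assert(n >= 2 && n <= 32);
    return -1.0;
}
```

For N = 4 the pattern 0111 (raw = 7) gives 0.875, and for N = 16 the pattern 0x7FFF gives 1 - 2^-15.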

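Lecture 09a, Slide 25 Fractional numbers - Sign Extension
• Worked 4-bit example: a = 0110 = 0.5 + 0.25 = 0.75 is multiplied by b = 1110 = -1 + 0.5 + 0.25 = -0.25, with sign extension of the partial products, giving an 8-bit product.
• To keep the same resolution as the operands we need to select 4 bits of this product.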

Lecture 09a, Slide 26 Q-Format IQ-Math

Lecture 09a, Slide 27 Fractional numbers - Sign Extension
• Worked 4-bit example: a = 0110 = 0.75 multiplied by b = 1110 = -0.25; the leading bits of the 8-bit product are sign extension bits.
• The way to do it is to shift left by one bit and store the upper 4 bits, or shift right by three bits and store the lower 4 bits.
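
The two bit-selection options can be checked on the 4-bit example. The sketch below assumes the operands are 0110 and 1110 in the 4-bit fractional format (integers 6 and -2 scaled by 2^-3); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 4-bit signed fractional numbers, given as integers in
   -8..7, and return the 8-bit two's complement pattern of the product.
   The product has 6 fractional bits and a duplicated sign bit. */
uint8_t q3_product_bits(int a, int b) {
    assert(a >= -8 && a <= 7 && b >= -8 && b <= 7);
    return (uint8_t)(a * b);
}

/* Option 1: shift left by one bit and keep the upper 4 bits. */
unsigned q3_keep_upper(uint8_t p) {
    return ((unsigned)(uint8_t)(p << 1) >> 4) & 0xFu;
}

/* Option 2: shift right by three bits and keep the lower 4 bits. */
unsigned q3_keep_lower(uint8_t p) {
    return ((unsigned)p >> 3) & 0xFu;
}
```

For 6 x -2 the product pattern is 11110100 and both options return 1110, i.e. -0.25: the exact product -0.1875 has been truncated to the 4-bit resolution.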

Lecture 09a, Slide 28 15-bit * 15-bit Multiplication
CPU:
    MPY A3,A4,A6
    NOP
      Q15   s.s.xxxxxxxxxxxxxxx
    x Q15   s.s.yyyyyyyyyyyyyyy
    = Q30   s.s.szzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
Store to Data Memory:
    SHR A6,15,A6
    STH A6,*A7
      Q15   s.s.zzzzzzzzzzzzzzz

Lecture 09a, Slide 29 ‘C6000 C Data Types
Type                 Size      Representation
char, signed char    8 bits    ASCII
unsigned char        8 bits    ASCII
short                16 bits   2’s complement
unsigned short       16 bits   binary
int, signed int      32 bits   2’s complement
unsigned int         32 bits   binary
long, signed long    40 bits   2’s complement
unsigned long        40 bits   binary
enum                 32 bits   2’s complement
float                32 bits   IEEE 32-bit
double               64 bits   IEEE 64-bit
long double          64 bits   IEEE 64-bit
pointers             32 bits   binary

Lecture 09a, Slide 30 Fractional numbers - Sign Extension
• Pseudo assembly language:
    A0 = 0x            ; initial value
    A1 = 0.5           ; initial value
    A2 = 0.5           ; initial value
    A3 = 0             ; initial value
    MPY A1, A2, A3     ; A3 = 0x
    SHL A3,1,A3        ; A3 = 0x
    STH A3, *A0        ; 0x2000 -> 0x
  or
    MPY A1, A2, A3     ; A3 = 0x
    SHR A3,15,A3       ; A3 = 0x
    STH A3, *A0        ; 0x2000 -> 0x
• Pseudo ‘C’ language:
    short a, b, result;
    int prod;
    prod = a * b;
    prod = prod >> 15;
    result = (short) prod;
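
The pseudo C above can be turned into a small self-contained function. This is a sketch rather than TI's reference code: q15_mul is a hypothetical name, the fixed-width types from stdint.h stand in for short and int, and the right shift of a negative value is assumed to be arithmetic (true of the common compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Q15 x Q15 multiply. The 32-bit product is in Q30 format; shifting right
   by 15 returns it to Q15. The result is truncated, not rounded.
   Precondition: not both operands equal to -1 (0x8000), because +1 is not
   representable in Q15. */
int16_t q15_mul(int16_t a, int16_t b) {
    int32_t prod = (int32_t)a * (int32_t)b;   /* Q30 product */
    assert(prod != 0x40000000);               /* reject -1 x -1 */
    prod = prod >> 15;                        /* assumes arithmetic shift */
    return (int16_t)prod;
}
```

With 0.5 stored as 0x4000, q15_mul(0x4000, 0x4000) returns 0x2000, i.e. 0.25, which matches the 0x2000 stored by STH in the assembly version.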

Lecture 09a, Slide 31 Fractional numbers - Problems
• There are some problems that need to be resolved when using fractional numbers.
• These are:
  • Result of -1 x -1 = 1
  • Accumulative overflow.

Lecture 09a, Slide 32 Problem of -1 x -1
• We have seen that:
  • -1 <= x < 1
  • -1 x -1 = 1, which cannot be represented.
• Solution:
  • There are two instructions that saturate the result if you have -1 x -1: SMPY and SMPYH.

Lecture 09a, Slide 33 Problem of -1 x -1
• In one cycle these instructions do the following:
  • Multiply.
  • Shift left by 1-bit.
  • Saturate if the sign bits are 01.
• It can be shown that:
                       Positive Result   Negative Result   -1 x -1 Result
  Result of MPY(H)     00.xxx-xb         11.xxx-xb         01.xxx-xb
  Result of SMPY(H)    0.xxx-xb          1.xxx-xb          0.xxx-xb
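
The three steps listed for SMPY can be modelled in C. This is a behavioural sketch of the description above, not a cycle-accurate model of the instruction; the function name smpy_model and the sat flag argument are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of multiply, shift left by one, saturate.
   The 32-bit product of two Q15 values starts with sign bits 00 or 11,
   except for -1 x -1 = 0x40000000, which starts with 01. In that case the
   result saturates to the largest positive value and *sat is set to 1. */
int32_t smpy_model(int16_t a, int16_t b, int *sat) {
    assert(sat != 0);
    int32_t prod = (int32_t)a * (int32_t)b;
    if (prod == 0x40000000) {      /* sign bits 01: only -1 x -1 */
        *sat = 1;
        return INT32_MAX;          /* 0x7FFFFFFF, just below +1 */
    }
    return prod * 2;               /* shift left by 1; cannot overflow here */
}
```

For every other operand pair the doubled product fits in 32 bits, so the single special case is enough.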

Lecture 09a, Slide 34 Problem of Accumulative Overflow
• In this case the overflow is due to the summation.
• Examples on the 16-bit number circle (0x0000, ..., 0x7ffe, 0x7fff, 0x8001, ..., 0xffff):
  • 0x7fff + 0x0002 = 0x8001 (positive number + positive number = negative number!)
  • 0xffff + 0x0002 = 0x0001 (negative number + positive number = positive number: -1 + 2 = 1 is correct for signed values, but the same addition wraps around if the operands are read as unsigned)

Lecture 09a, Slide 35 Problem of Accumulative Overflow
• Solutions:
  (1) Saturate the intermediate results by using these add instructions: SADD and SSUB. If saturation occurs the SAT bit in the CSR is set to 1. You must clear it.
  (2) Use guard bits: e.g. ADD A1, A2, A1:A0
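
Both solutions can be sketched in portable C. The code below is an illustration under stated assumptions: sadd32_model mimics a 32-bit saturating add with a sticky flag standing in for the SAT bit, and accumulate_guarded uses a 64-bit accumulator in place of the 40-bit A1:A0 register pair; neither name comes from the lecture.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit saturating add. *sat is set to 1 when saturation occurs and is
   never cleared here, so the caller must clear it. */
int32_t sadd32_model(int32_t a, int32_t b, int *sat) {
    assert(sat != 0);
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) { *sat = 1; return INT32_MAX; }
    if (sum < INT32_MIN) { *sat = 1; return INT32_MIN; }
    return (int32_t)sum;
}

/* Guard bits: sum n 32-bit terms in a wider accumulator so that
   intermediate results cannot wrap around. */
int64_t accumulate_guarded(const int32_t *x, int n) {
    assert(x != 0 && n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}
```

Saturation bounds the error of each addition, while guard bits postpone the problem until the final result has to be stored back in the narrower format.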

Lecture 09a, Slide 36 Problem of Accumulative Overflow
• Solutions:
  (3) Do nothing if the system is non-gain: with a non-gain system the final result is always less than unity.
  Example system:
  This will be non-gain if:

Lecture 09a, Slide 37 Floating Point Arithmetic
• The C67xx supports both single and double precision floating point formats.
• The single precision format is as follows:
  bit 31: s (1 bit) | bits 30-23: e (8 bits) | bits 22-0: m (23 bits)
  value = (-1)^sign * (1.mantissa) * 2^(exponent-127)
  s = sign bit
  e = exponent (8-bit, biased by 127)
  m = mantissa (23-bit normalised fraction)
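
The three fields can be extracted from a float on any host whose float is IEEE 754 single precision. A minimal sketch; the struct and function names (f32_fields, f32_split) are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;       /* bit 31 */
    unsigned exponent;   /* bits 30-23, biased by 127 */
    uint32_t mantissa;   /* bits 22-0, fraction after the implied 1 */
} f32_fields;

/* Split a single precision value into its sign, exponent and mantissa. */
f32_fields f32_split(float f) {
    uint32_t bits;
    f32_fields r;
    assert(sizeof bits == sizeof f);   /* float must be 32 bits wide */
    memcpy(&bits, &f, sizeof bits);    /* copy the raw bit pattern */
    r.sign = (unsigned)(bits >> 31);
    r.exponent = (unsigned)((bits >> 23) & 0xFFu);
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

For example 1.5 splits into s = 0, e = 127 and m = 0x400000, i.e. (-1)^0 * 1.5 * 2^(127-127).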

Lecture 09a, Slide 38 Floating Point Arithmetic Example
• Example: Conversion between integer and floating point.
• Convert ‘dd’ to the IEEE floating point format:
    int dd = 0x60000000;
    flot1 = (float) dd;

Lecture 09a, Slide 39 Floating Point Arithmetic Example
• To view the value of “flot1” use: View: Memory: Address = &flot1
• We find: flot1 = 0x4EC00000

Lecture 09a, Slide 40 Floating Point Arithmetic Example
• Let us check to see if we have the same number:
  s = 0
  e = 10011101b = 128 + 16 + 8 + 4 + 1 = 157
  m = 0.100b = 0.5
  flot1 = (-1)^0 * (1.5) * 2^(157-127) = 1.5 * 2^30 = 1610612736 decimal = 0x60000000
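
The same check can be run on a host compiler instead of inspecting memory in the debugger. This sketch assumes the integer being converted is 1.5 * 2^30 = 0x60000000, the value derived on the slide, and that the host float is IEEE 754 single precision; int_to_f32_bits is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a 32-bit integer to float and return the raw IEEE 754 bits,
   which is what a memory view at &flot1 would show. */
uint32_t int_to_f32_bits(int32_t dd) {
    float flot1 = (float)dd;
    uint32_t bits;
    assert(sizeof bits == sizeof flot1);
    memcpy(&bits, &flot1, sizeof bits);
    return bits;
}
```

Calling int_to_f32_bits(0x60000000) returns 0x4EC00000: sign 0, exponent field 157 and mantissa field 0x400000 (fraction 0.5).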

Lecture 09a, Slide 41 Floating Point IEEE Standard
• Special values:
  s    e            m        Number
  0    0            0        0
  1    0            0        -0
  s    0            m != 0   (-1)^s * 0.m * 2^-126
  s    0 < e < 255  m        (-1)^s * 1.m * 2^(e-127)
  0    255          0        +infinity
  1    255          0        -infinity
  s    255          m != 0   NaN (not a number)

Lecture 09a, Slide 42 Floating Point IEEE Standard
• Dynamic range, using value = (-1)^sign * (1.mantissa) * 2^(exponent-127):
• Largest positive number:
  • e(max) = 254 (e = 255 is reserved for infinity and NaN),
  • m(max) = 1 - 2^-23
  • max = [1 + (1 - 2^-23)] * 2^(254-127) ≈ 3.4 * 10^38
• Smallest positive normalised number:
  • e(min) = 1 (e = 0 is reserved for zero and denormalised numbers),
  • m(min) = 0
  • min = 1.0 * 2^(1-127) = 2^-126 ≈ 1.18 * 10^-38

Lecture 09a, Slide 43 Floating Point IEEE Standard
• Dynamic range, using value = (-1)^sign * (1.mantissa) * 2^(exponent-127):
• Largest negative number (largest magnitude):
  • e(max) = 254,
  • m(max) = 1 - 2^-23
  • max = -[1 + (1 - 2^-23)] * 2^(254-127) ≈ -3.4 * 10^38
• Smallest negative normalised number (smallest magnitude):
  • e(min) = 1,
  • m(min) = 0
  • min = -1.0 * 2^(1-127) = -2^-126 ≈ -1.18 * 10^-38
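
The limits of the single precision range can be cross-checked against the constants in float.h. In IEEE 754 the largest finite value uses e = 254 with all mantissa bits set, and the smallest normalised value uses e = 1 with a zero mantissa, because e = 255 and e = 0 are reserved for the special values. The helper f32_from_fields below is a hypothetical name, and the sketch assumes an IEEE 754 host.

```c
#include <assert.h>
#include <float.h>    /* FLT_MAX and FLT_MIN, for comparison */
#include <stdint.h>
#include <string.h>

/* Assemble a single precision value from its three fields. */
float f32_from_fields(unsigned s, unsigned e, uint32_t m) {
    uint32_t bits;
    float f;
    assert(s <= 1u && e <= 255u && m <= 0x7FFFFFu);
    assert(sizeof bits == sizeof f);
    bits = ((uint32_t)s << 31) | ((uint32_t)e << 23) | m;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Largest finite value: (2 - 2^-23) * 2^127, about 3.4e38. */
float f32_largest(void) { return f32_from_fields(0u, 254u, 0x7FFFFFu); }

/* Smallest positive normalised value: 2^-126, about 1.18e-38. */
float f32_smallest_normal(void) { return f32_from_fields(0u, 1u, 0u); }
```

On such a host f32_largest() equals FLT_MAX and f32_smallest_normal() equals FLT_MIN; setting the sign field to 1 gives the corresponding negative limits.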

Lecture 09a, Slide 44 Floating/Fixed Point Summary
• Floating point single precision:
  bit 31: s (1 bit) | bits 30-23: e (8 bits) | bits 22-0: m (23 bits)
  value = (-1)^s * 1.m * 2^(e-127)
• Floating point double precision (odd:even registers):
  bit 63: s (1 bit) | bits 62-52: e (11 bits) | bits 51-0: m (52 bits)
  value = (-1)^s * 1.m * 2^(e-1023)

Lecture 09a, Slide 45 Floating/Fixed Point Summary (Short: N = 16; Int: N = 32)
• Unsigned integer, bit weights: 2^(N-1) 2^(N-2) ... 2^1 2^0
• Signed integer, bit weights: -2^(N-1) 2^(N-2) ... 2^1 2^0
• Signed fractional, bit weights: -2^0 . 2^-1 2^-2 ... 2^-(N-1)

Lecture 09a, Slide 46 Floating/Fixed Point Dynamic Range
Table of the largest positive, smallest positive and smallest negative numbers for each format. Rows: floating point single precision (largest positive number 3.4 x 10^38), fixed point integer and fixed point fractional.

Lecture 09a, Slide 47 Numerical Issues - Useful Tips
• Multiply by 2: use shift left
• Divide by 2: use shift right
• Log2 N: use shift
• Sine, Cosine, Log: use look up tables
• To convert a fractional number to hex:
  • Num x 2^15
  • Then convert to hex
  e.g. convert 0.5 to hex:
  • 0.5 x 2^15 = 16384
  • (16384) dec = (0x4000) hex
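
A few of these tips in C, as a sketch with hypothetical helper names: to_q15 scales a fraction by 2^15 (truncating toward zero), times2 and half use shifts, and ilog2 finds floor(log2(n)) by counting right shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a fraction in [-1, 1) to its 16-bit fractional representation. */
int16_t to_q15(double x) {
    assert(x >= -1.0 && x < 1.0);
    return (int16_t)(x * 32768.0);   /* Num x 2^15 */
}

/* Multiply and divide by 2 with shifts (unsigned operands). */
unsigned times2(unsigned x) { return x << 1; }
unsigned half(unsigned x)   { return x >> 1; }

/* floor(log2(n)) for n > 0, obtained by counting right shifts. */
int ilog2(unsigned n) {
    int k = 0;
    assert(n > 0u);
    while (n >>= 1) {
        k++;
    }
    return k;
}
```

Printing to_q15(0.5) with the %x format gives 4000, matching the worked example, and to_q15(-0.5) viewed as an unsigned 16-bit value is 0xC000.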

Lecture 09a, Slide 48 Numerical Issues - 32-bit Multiplication
• It is possible to perform 32-bit multiplication using 16-bit multipliers.
• Example: c = a x b (with 32-bit values).
  a = [a_h | a_l] and b = [b_h | b_l], each 32 bits wide with 16-bit halves
  a * b = ((a_h << 16) + a_l) * ((b_h << 16) + b_l)
        = [(a_h * b_h) << 32] + [(a_l * b_h) << 16] + [(a_h * b_l) << 16] + [a_l * b_l]
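
The decomposition can be written out for unsigned operands as follows. This is a sketch with a hypothetical function name; it keeps the full 64-bit result and, for illustration only, cross-checks it against the host's native 64-bit multiply.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit x 32-bit unsigned multiply built from four 16-bit x 16-bit
   partial products, following the expansion of (a_h:a_l) * (b_h:b_l). */
uint64_t mul32_by_parts(uint32_t a, uint32_t b) {
    uint32_t ah = a >> 16, al = a & 0xFFFFu;
    uint32_t bh = b >> 16, bl = b & 0xFFFFu;

    uint64_t hh = (uint64_t)ah * bh;   /* weight 2^32 */
    uint64_t lh = (uint64_t)al * bh;   /* weight 2^16 */
    uint64_t hl = (uint64_t)ah * bl;   /* weight 2^16 */
    uint64_t ll = (uint64_t)al * bl;   /* weight 2^0  */

    uint64_t result = (hh << 32) + (lh << 16) + (hl << 16) + ll;
    assert(result == (uint64_t)a * b); /* cross-check, illustration only */
    return result;
}
```

Each partial product needs only a 16-bit multiplier; the shifts and additions then place it at the right weight in the wider result.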

Lecture 09a, Slide 49 Links
• Further reading:
  • Understanding TMS320C62xx DSP Single-precision Floating-Point Functions: spra515.pdf
  • TMS320C6000 Integer Division: spra707.pdf

Lecture 09a Numerical Issues - End -