Finite word length effects in fixed-point processing

Presentation transcript:

Finite word length effects in fixed-point processing. Digital signal processors have a data bus of finite width. If the word length after a mathematical operation exceeds the bus width, the excess bits have to be discarded. This is a source of serious errors. We now discuss the attributes that cause such errors.

Causes of word-length errors. The causes are: (1) run-time error, i.e. register overflow; (2) arithmetic and coefficient truncation; (3) data scaling in an attempt to reduce overflow; (4) zero-input limit cycling.

Fixed-point design procedure (flowchart). Ideal floating-point design → floating point to fixed point → realizable filter → test and evaluate. Pass: luck. Fail: the causes are 1. register overflow, 2. coefficient errors, 3. arithmetic errors. The options are 1. adjust the binary point, 2. change the architecture, 3. scale the parameters.

Run-time / overflow error: definition. The input data begins with the sign bit, followed by the bits from MSB to LSB, extending into the mantissa. When the data word is longer than the bus width, the sign bit and the most significant bits overflow the register on the sign-bit side. Because the most significant part is lost, what remains in the register is useless.

Solution for run-time / overflow error. In fixed-point format, subtraction should preferably be done in 2's complement, where the effect of overflow is least significant. Once the sign bit reaches the overflow point of the accumulator and an overflow is detected, a flag is activated that clamps the input to the data bus by stopping less significant bits from entering. The data lost after clamping introduces inaccuracy. Clamping is saturation.
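The difference between letting a 2's complement register wrap around and clamping (saturating) it can be sketched as follows. This is a minimal illustration, not the behavior of any particular DSP: the helper names `wrap` and `saturate` and the 16-bit register width are assumptions made for the example.

```python
def wrap(value, bits=16):
    """Two's complement wraparound: keep only the low `bits` bits of value."""
    span = 1 << bits          # number of representable codes
    half = 1 << (bits - 1)    # magnitude of the most negative value
    return (value + half) % span - half


def saturate(value, bits=16):
    """Clamp value to the representable range instead of wrapping."""
    highest = (1 << (bits - 1)) - 1
    lowest = -(1 << (bits - 1))
    return max(lowest, min(highest, value))


# Adding two positive 16-bit numbers whose sum does not fit in 16 bits.
total = 30000 + 10000
wrapped = wrap(total)      # sign flips: the result becomes negative
clamped = saturate(total)  # result sticks at the largest positive value
print(wrapped, clamped)
```

With wraparound the sum changes sign, which is the "useless remainder" case described above; with saturation the error is bounded by the distance to the largest representable value.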

Scaling to reduce overflow. Here we scale the input to reduce its dynamic range, at the cost of system performance. The scaling methods are: RMS scaling (preferred) and absolute scaling.
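The slide names the two scaling methods without defining them. One common reading, assumed here, is that absolute scaling divides the input by the sum of the absolute values of the filter's impulse response (which rules out overflow for any bounded input), while RMS scaling divides by the square root of the sum of squares (less conservative, so less signal is sacrificed). The function names and the impulse response `h` below are hypothetical.

```python
import math


def absolute_scale(h):
    """Assumed 'absolute' scale factor: sum of |h[k]| (L1 norm)."""
    return sum(abs(c) for c in h)


def rms_scale(h):
    """Assumed 'RMS' scale factor: sqrt of sum of h[k]^2 (L2 norm)."""
    return math.sqrt(sum(c * c for c in h))


# Hypothetical impulse response, chosen so both factors are exact.
h = [0.5, 0.5, -0.5, 0.5]
s_abs = absolute_scale(h)
s_rms = rms_scale(h)

# The input would be divided by the chosen factor before filtering.
print(s_abs, s_rms)
```

The absolute factor is never smaller than the RMS factor, so absolute scaling attenuates the input more; that extra attenuation is the performance cost the slide refers to.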

Arithmetic truncation errors. When two inputs of length N bits are multiplied, the word length of the product can grow to 2N bits. The 2N-bit product is rounded off to S (< 2N) bits. These S bits are added to another S-bit word, and the result is, say, T bits. The sum is again rounded off to S bits and fed to the accumulator. The output is truncated to M bits, the bus width. These repeated round-offs cause additive errors. If M < T < 2N, an arithmetic error may result. Typically, for N = 16, a 16 × 16-bit multiplication yields at most 32 bits, while the accumulator is 40 bits wide.
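The accumulation of round-off errors can be seen in a small integer model of a multiply-accumulate loop. This is a sketch under stated assumptions: the operands are treated as 16-bit fractions with 15 fractional bits, the helper `round_shift` and the data values are made up for the illustration, and no specific processor is modeled.

```python
def round_shift(value, shift):
    """Round to nearest by adding half an LSB, then drop `shift` bits."""
    return (value + (1 << (shift - 1))) >> shift


# The largest-magnitude 16-bit product needs 32 bits including the sign.
max_product = (-(1 << 15)) * (-(1 << 15))
product_bits = max_product.bit_length() + 1

# 4096 small products; each one is below half an LSB of the 15-bit result.
x = [3] * 4096
h = [3] * 4096

# Wide accumulator: keep every product at full width, round once at the end.
acc_wide = sum(a * b for a, b in zip(x, h))
y_single_round = round_shift(acc_wide, 15)

# Narrow path: round every product before it is accumulated.
y_round_each = sum(round_shift(a * b, 15) for a, b in zip(x, h))

print(product_bits, y_single_round, y_round_each)
```

Rounding each product discards the whole signal in this case, while a single rounding at the end of a wide accumulation keeps it. This is the motivation for an accumulator that is wider than the product.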

Figure: implementation of a first-order recursive filter, y_k = A y_(k-1) + x_k. Blocks: input x_k, adder, N-bit coefficient storage register, coefficient multiplier (product A y_(k-1)), N-bit output storage register (y_(k-1)), output y_k.

Figure: multiply-accumulate (MAC) unit. N-bit inputs → multiplier (2N bits) → round-off to S bits → adder (T bits) → round-off from T to S bits → S-bit accumulator → round-off from S to M bits → M-bit bus. (r/o = round-off.)

Coefficient quantization. A transfer function is represented by a ratio of numerator and denominator polynomials. The binary representations of the coefficients generally do not fit the data bus width, so they must be quantized or truncated. Stability can be sensitive to the denominator coefficients. Selecting a suitable filter structure can reduce this sensitivity.

Limit cycling. In IIR filters, the output may show undesirable variations due to past signals even though the present input is zero. This happens when the response of the unforced system does not decay to zero in the stipulated time. The wider the data bus, the smaller the effect of limit cycling.
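A zero-input limit cycle can be reproduced with a first-order recursion y_k = A y_(k-1) whose state is rounded to an integer number of LSBs after every multiplication. The sketch below is an illustration under assumptions: the coefficient A = 0.9, the starting value 10 and the function name are hypothetical, and the rounding is round-half-up done in integer arithmetic.

```python
def zero_input_response(y0, steps, frac_bits=0):
    """Run y_k = round(0.9 * y_(k-1)) with zero input.

    The state is held as an integer count of LSBs, where one LSB
    equals 2**-frac_bits. Returns the final state as a real value.
    """
    y = y0 << frac_bits
    for _ in range(steps):
        # (9*y + 5) // 10 equals floor(0.9*y + 0.5): round half up.
        y = (9 * y + 5) // 10
    return y / (1 << frac_bits)


# Without rounding the output would decay to zero. With rounding it
# sticks at a nonzero value (a constant-output limit cycle).
stuck_coarse = zero_input_response(10, 200, frac_bits=0)
stuck_fine = zero_input_response(10, 200, frac_bits=4)
print(stuck_coarse, stuck_fine)
```

The state stops at 5 LSBs in both runs, but with four extra fractional bits an LSB is 16 times smaller, so the residual output is 16 times smaller as well. This matches the remark that a wider word reduces the effect of limit cycling.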

Methods to control the errors: (1) scale the input and/or the coefficients; (2) increase the word length; (3) select an alternative DSP architecture.

Example 1. After arithmetic calculations, the binary form of the filter coefficients may not fit the length of the data bus. For a data bus of width n, the actual word length of the data may be larger, so it has to be truncated. Example: H(z) = 1/[1 + a z^-1 + b z^-2], where a = … and b = …

Truncation, contd. Let the bus width be n = 8. One bit is dedicated to the sign, leaving 7 bits. If the coefficient magnitude is greater than 1, one more bit has to be spared for the integer part, leaving (n - 2) = 6 bits for the fraction. Round-off quantization of the coefficient is to be used.

Manipulation of the scaled value before the integer part is taken: 1. Add 0.5 when round-off is desired. 2. For bottom (floor), add zero. 3. For top (ceiling), add 1.0.

For fixed-point data processing, Case 1: the coefficient has an integer part before the decimal point. a = …. This a is multiplied by 2^(n-2) = 64 to yield …. Add 0.5 for round-off: …. We obtain its binary equivalent, truncated to 8 bits including the sign bit. This amounts to -125.

Case 1, contd. In practice, we need not convert to binary: we take the integer part and drop the fraction. We now convert -125 back to coefficient level by dividing by 2^6 = 64. The resulting quantized coefficient is a~ = -125/64 = -1.953125.

For fixed-point data processing, Case 2: the coefficient is a pure fraction. b = …. This b is multiplied by (2^(n-1) - 1) = 127; the result is …. For round-off add 0.5: …. Converted to binary and truncated, it becomes …, which is 126 in decimal. The same 126 can be obtained directly by dropping the fraction. To return to coefficient scale, 126 is divided by 2^7 = 128. The truncated coefficient is b~ = 126/128 = 0.984375.
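The two cases and the three rounding offsets can be collected into one routine. This is a sketch of the procedure as the slides describe it, not a general-purpose quantizer: the function name, its parameters and the test value -1.95 are hypothetical, while 1.19 and 0.198 with 6 magnitude bits plus sign (n = 7) are the coefficients used in Example 3 later in the deck.

```python
import math


def quantize_coefficient(c, n_bits=8, mode="round"):
    """Quantize c for an n_bits word (sign bit included).

    Returns (integer_code, quantized_value).
    """
    # Offsets from the slides: 0.5 for round-off, 0 for floor, 1.0 for ceiling.
    offset = {"round": 0.5, "floor": 0.0, "ceil": 1.0}[mode]
    magnitude = abs(c)
    if magnitude >= 1.0:
        # Case 1: one bit goes to the integer part, n-2 bits to the fraction.
        scale_up = scale_down = 1 << (n_bits - 2)
    else:
        # Case 2: multiply by 2^(n-1) - 1, divide back by 2^(n-1).
        scale_up = (1 << (n_bits - 1)) - 1
        scale_down = 1 << (n_bits - 1)
    code = math.floor(magnitude * scale_up + offset)
    sign = -1 if c < 0 else 1
    return sign * code, sign * code / scale_down


code_a, a_q = quantize_coefficient(-1.95, n_bits=8)   # hypothetical input
code_1, q_1 = quantize_coefficient(1.19, n_bits=7)    # Example 3, coefficient (A)
code_2, q_2 = quantize_coefficient(0.198, n_bits=7)   # Example 3, coefficient (B)
print(code_a, a_q, code_1, q_1, code_2, q_2)
```

The sign is handled separately from the magnitude, which mirrors the slides' "keeping apart the sign" step. Note that the ceiling offset of 1.0 follows the slides literally and is not a true ceiling when the scaled value is already an integer.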

Final solution and comments. The quantized transfer function is H'(z) = 1/[1 + a' z^-1 + b' z^-2], where a' = -125/64 = -1.953125 and b' = 126/128 = 0.984375. Check it for stability. Try the Matlab functions beq = a2dR(d,n) and beq = a2dT(d,n). These functions should generate the decimal equivalent of the binary representation, in sign-magnitude form, of a decimal number d with n bits for the fractional part, obtained by rounding off (a2dR) or, as the name suggests, by truncation (a2dT).

Example 2. A digital signal processor uses a word length of 7 bits plus 1 sign bit. A set of coefficients has been calculated and tabulated as follows. Quantize the coefficients and find the values the DSP will accept in place of the calculated coefficients.

Solution 2. There are 7 bits excluding the sign bit. For a coefficient with an integer part and a fraction, the effective number of bits is 6. For a pure fraction with no integer part, the effective number of bits is 7. Because the ADC is of the successive-approximation type, the flooring scheme applies; hence 0.5 is not added.

Solution 2, contd.
… × (2^7 - 1) = … → 92; 92/128 = 0.71875
… × 2^6 = … → 71; 71/64 = 1.109375
… × (2^7 - 1) = … → 75; 75/128 = 0.5859375
… × 2^6 = … → 204; 204/64 = 3.1875
… × 2^6 = … → 355; 355/64 = 5.546875

Results in Tabular Form

Example 3. A filter function is represented by H(z) = 1/[1 - 1.19 z^-1 + 0.198 z^-2]. The poles are at 0.99 and 0.2, so the system is stable. The coefficients are quantized by a 6-bit unipolar round-off quantizer. Does the system remain stable?

Solution 3. (A) 1.19: there are 6 unipolar bits, excluding the sign bit. One bit goes to the integer part, leaving 6 - 1 = 5 bits. 1.19 × 2^5 = 38.08 → 38; 38/2^5 = 1.1875. (B) 0.198: there are 6 unipolar bits, excluding the sign bit. 0.198 × (2^6 - 1) + 0.5 = 12.974 → 12; 12/2^6 = 0.1875. The TF becomes H~(z) = 1/[1 - 1.1875 z^-1 + 0.1875 z^-2]. The TF is marginally stable, as its poles are at [1, 0.1875].
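The stability conclusion can be checked by computing the poles before and after quantization. The sketch below assumes the denominator has the form 1 + a1 z^-1 + a2 z^-2 with a1 = -1.19 and a2 = 0.198 before quantization (signs inferred from the stated pole at 0.99 and the coefficient magnitudes 1.19 and 0.198 used in the solution). The function names and the tolerance are hypothetical.

```python
import cmath


def poles(a1, a2):
    """Poles of 1/(1 + a1 z^-1 + a2 z^-2): roots of z^2 + a1 z + a2."""
    root = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + root) / 2, (-a1 - root) / 2


def classify(a1, a2, tol=1e-12):
    """Label the filter by the largest pole magnitude."""
    radius = max(abs(p) for p in poles(a1, a2))
    if radius < 1 - tol:
        return "stable"
    if radius <= 1 + tol:
        return "marginally stable"
    return "unstable"


before = classify(-1.19, 0.198)      # assumed unquantized coefficients
after = classify(-1.1875, 0.1875)    # quantized coefficients 38/32 and 12/64
print(poles(-1.1875, 0.1875), before, after)
```

Quantization moves the pole at 0.99 onto the unit circle, which is why the quantized filter is only marginally stable.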

Effect of quantization on performance. Problem 4: A band-pass digital filter is to be used for digital clock recovery at 4.8 kbaud with a sampling frequency of … kHz. The filter is characterized by H(z) = 1/[1 - … z^-1 + … z^-2]. Assess the effect of 8-bit quantization on (a) the pole locations of the filter and (b) its center frequency.

Solution 4. The TF is a normalized second-order band-pass filter, so the poles are complex conjugates. For simplicity we write the TF as 1/[1 + a z^-1 + b z^-2]. Here a = … and b = …. The given sampling frequency is fs = … kHz. The radial distance p of the poles from the origin (the pole magnitude) is p = √b, calculated to be p = …. The pole angle θ with respect to the positive x-axis is θ = cos^-1{(|a|/2)/p} = cos^-1{(…/2)/(…)} = …. Thus the poles are located at p∠±θ = …

Solution 4, contd. The least count per degree of the angle is fs/360 = …/360 = … kHz. With this value of θ, the unquantized center frequency is … × … = … kHz. When quantizing to 8 bits, 1 bit is the sign bit, so there are only 7 effective bits. The rule is: 6 effective bits for a coefficient with an integer part and a fraction, and 7 effective bits for a pure fraction.

Solution 4, contd. Case 1: quantization of a = …. Setting the sign aside, it contains an integer part. Therefore … × 2^6 = … → 125. Its quantized equivalent is 125/2^6 = 1.953125. Case 2: quantization of b = …. Its equivalent integer is … × (2^7 - 1) = … → 126. The resulting value is 126/2^7 = 0.984375.

Solution 4, contd. The new value of p = √… = …. The new angle is θ = cos^-1{…/(2 × …)} = …. The least count per degree of the angle is … kHz. The new center frequency is … × … = … kHz. The quantized poles are located at …
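The pole radius and angle of the quantized filter follow from the quantized coefficients 125/2^6 (taken with a negative sign for a) and 126/2^7 found above. The sketch below applies p = √b and θ = cos^-1(-a/(2p)), which hold for complex-conjugate poles of 1/[1 + a z^-1 + b z^-2]. The function names are hypothetical, and the sampling frequency is left as a parameter because its value is not legible in the transcript.

```python
import math


def pole_polar(a, b):
    """Radius and angle (degrees) of the complex poles of 1/(1 + a z^-1 + b z^-2)."""
    p = math.sqrt(b)                                  # b = p^2
    theta = math.degrees(math.acos(-a / (2 * p)))     # a = -2 p cos(theta)
    return p, theta


def center_frequency(theta_deg, fs):
    """Center frequency: (fs / 360) per degree times the pole angle."""
    return theta_deg / 360.0 * fs


a_q = -125 / 64     # quantized a from Case 1
b_q = 126 / 128     # quantized b from Case 2
p_q, theta_q = pole_polar(a_q, b_q)
print(round(p_q, 5), round(theta_q, 2))

# Usage with a sampling frequency of one's own, in the same unit as the result:
# f0 = center_frequency(theta_q, fs)
```

Running the same two functions on the unquantized a and b gives the original radius, angle and center frequency, so the shift caused by 8-bit quantization can be read off by comparison.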