Abdullah Aldahami (11074595) March 12, 2010 1. 1. Introduction 2. Background 3. Proposed Multiplier Design a.System Overview b.Fixed Point Multiplier.

Slides:

Advertisements

Similar presentations

Zhongkai Chen. Gonzalez-Navarro, S. ; Tsen, C. ; Schulte, M. ; Univ. of Malaga, Malaga This paper appears in: Signals, Systems and Computers, ACSSC.

Advertisements

Computer Engineering FloatingPoint page 1 Floating Point Number system corresponding to the decimal notation 1,837 * 10 significand exponent A great number.

CENG536 Computer Engineering department Çankaya University.

1 CONSTRUCTING AN ARITHMETIC LOGIC UNIT CHAPTER 4: PART II.

Topics covered: Floating point arithmetic CSE243: Introduction to Computer Architecture and Hardware/Software Interface.

Decimal Floating-point Multiplication via Carry-Save Addition Mark Erle Systems & Technology Group International Business Machines Brian Hickmann & Mike.

Datorteknik FloatingPoint bild 1 Floating point Number system corresponding to the decimal notation 1,837 * 10 significand exponent a great number of corresponding.

EE 382 Processor DesignWinter 98/99Michael Flynn 1 AT Arithmetic Most concern has gone into creating fast implementation of (especially) FP Arith. Under.

Chapter # 5: Arithmetic Circuits Contemporary Logic Design Randy H

Energy and Delay Improvement via Decimal Floating Point Hossam A.H.Fahmy, Electronics and Communications Department, CairoUniversity Egypt and.

CSE 378 Floating-point1 How to represent real numbers In decimal scientific notation –sign –fraction –base (i.e., 10) to some power Most of the time, usual.

1 ECE369 Chapter 3. 2 ECE369 Multiplication More complicated than addition –Accomplished via shifting and addition More time and more area.

CPSC 321 Computer Architecture ALU Design – Integer Addition, Multiplication & Division Copyright 2002 David H. Albonesi and the University of Rochester.

ECEN 248 Integer Multiplication, Number Format Adopted from Copyright 2002 David H. Albonesi and the University of Rochester.

1/8/ L24 IEEE Floating Point Basics Copyright Joanne DeGroat, ECE, OSU1 IEEE Floating Point The IEEE Floating Point Standard and execution.

3-1 Chapter 3 - Arithmetic Computer Architecture and Organization by M. Murdocca and V. Heuring © 2007 M. Murdocca and V. Heuring Computer Architecture.

Information Representation (Level ISA3) Floating point numbers.

Computer Organization and Architecture Computer Arithmetic Chapter 9.

Computer Arithmetic Nizamettin AYDIN

Chapter 6-2 Multiplier Multiplier Next Lecture Divider

3-1 Chapter 3 - Arithmetic Principles of Computer Architecture by M. Murdocca and V. Heuring © 1999 M. Murdocca and V. Heuring Principles of Computer Architecture.

AICCSA’06 Sharja 1 A CAD Tool for Scalable Floating Point Adder Design and Generation Using C++/VHDL By Asim J. Al-Khalili.

Chapter 6: Computer Arithmetic and the ALU

Number Systems So far we have studied the following integer number systems in computer Unsigned numbers Sign/magnitude numbers Two’s complement numbers.

Computing Systems Basic arithmetic for computers.

Arithmetic Chapter 4.

Fall 2011SYSC 5704: Elements of Computer Systems 1 Data Representation Also called Encoding Murdocca Chapter 2.

ECE232: Hardware Organization and Design

1/8/ L24 IEEE Floating Point Basics Copyright Joanne DeGroat, ECE, OSU1 IEEE Floating Point The IEEE Floating Point Standard and execution.

Floating Point. Agenda  History  Basic Terms  General representation of floating point  Constructing a simple floating point representation  Floating.

Data Representation in Computer Systems

S. Rawat I.I.T. Kanpur. Floating-point representation IEEE numbers are stored using a kind of scientific notation. ± mantissa * 2 exponent We can represent.

Chapter 8 Problems Prof. Sin-Min Lee Department of Mathematics and Computer Science.

Chapter # 5: Arithmetic Circuits

5-1 Programmable and Steering Logic Chapter # 5: Arithmetic Circuits.

CH09 Computer Arithmetic  CPU combines of ALU and Control Unit, this chapter discusses ALU The Arithmetic and Logic Unit (ALU) Number Systems Integer.

Computer Arithmetic II Instructor: Mozafar Bag-Mohammadi Spring 2006 University of Ilam.

HW/SW PARTITIONING OF FLOATING POINT SOFTWARE APPLICATIONS TO FIXED - POINTED COPROCESSOR CIRCUITS - Nalini Kumar Gaurav Chitroda Komal Kasat.

1 A Combined Decimal and Binary Floating-point Multiplier Charles Tsen, Sonia González-Navarro, Michael Schulte, Brian Hickmann, Katherine Compton 2009.

Fixed and Floating Point Numbers Lesson 3 Ioan Despi.

AMIN FARMAHININ-FARAHANI CHARLES TSEN KATHERINE COMPTON FPGA Implementation of a 64-bit BID-Based Decimal Floating Point Adder/Subtractor.

Computer Arithmetic II Instructor: Mozafar Bag-Mohammadi Ilam University.

CSC 221 Computer Organization and Assembly Language

1/8/ L24 IEEE Floating Point Basics Copyright Joanne DeGroat, ECE, OSU1 IEEE Floating Point The IEEE Floating Point Standard and execution.

Computer Arithmetic Floating Point. We need a way to represent –numbers with fractions, e.g., –very small numbers, e.g., –very large.

Computer Engineering FloatingPoint page 1 Floating Point Number system corresponding to the decimal notation 1,837 * 10 significand exponent A great number.

Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.

Floating Point Numbers Representation, Operations, and Accuracy CS223 Digital Design.

CS 232: Computer Architecture II Prof. Laxmikant (Sanjay) Kale Floating point arithmetic.

ECE DIGITAL LOGIC LECTURE 15: COMBINATIONAL CIRCUITS Assistant Prof. Fareena Saqib Florida Institute of Technology Fall 2015, 10/20/2015.

Chapter 8 Computer Arithmetic. 8.1 Unsigned Notation Non-negative notation  It treats every number as either zero or a positive value  Range: 0 to 2.

CH.3 Floating Point Hardware and Algorithms 3/10/

The Power Architecture and Power.org word marks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.

1/8/ L25 Floating Point Adder Copyright Joanne DeGroat, ECE, OSU1 IEEE Floating Point Adder Using the IEEE Floating Point Standard for an.

S 2/e C D A Computer Systems Design and Architecture Second Edition© 2004 Prentice Hall Chapter 6 Overview Number Systems and Radix Conversion Fixed point.

By Liang-Kai Wang and Michael J. Schulte Joseph Schneider March 12, 2010.

Cosc 2150: Computer Organization Chapter 9, Part 3 Floating point numbers.

Floating Point Representations

Floating Point Number system corresponding to the decimal notation

CS 232: Computer Architecture II

Outline Introduction Floating Point Arithmetic Adder Multiplier.

CSCE 350 Computer Architecture

Arithmetic Logical Unit

How to represent real numbers

Presentation transcript:

Abdullah Aldahami ( ) March 12,

1. Introduction 2. Background 3. Proposed Multiplier Design a.System Overview b.Fixed Point Multiplier c.Master Control Unit d.Rounding and output formulation 4. Verification and Synthesis Results 5. Conclusion 2

 The paper presents a fully parallel Decimal64 floating point (FP) multiplier compliant to IEEE Std for floating point arithmetic.  The proposed multiplier possesses novel methods to target low latency. The proposed design is based on a previously published fixed point multiplier that uses a novel BCD–4221 recoding for decimal digits to improve the area and latency of the partial product generation and the partial product reduction tree. 3

 Decimal arithmetic received an increased attention in the last decade because of its growing need in many commercial applications and database systems where the binary arithmetic is not sufficient. The arithmetic operations need to be executed in decimal format. This is because the inexact mapping between some decimal and binary numbers, such as 0.1, cannot be represented accurately using binary format in a limited precision.  Decimal floating point (DFP) units are expected to be embedded in many processors’ cores to perform the decimal operation faster than the software packages and with higher accuracy.  The paper introduces a decimal floating point multiplier based on radix-10 fixed point multiplier that introduced an efficient implementation by the parallel generation of partial products followed by a novel carry save addition (CSA) tree to end the reduction of the partial products in Carry Save (CS) format. 4

5  Decimal multiplication performs the computation: (P=A×B) Where A is the multiplicand, B is the multiplier, and P is the product. It is assumed that A and B are each n digits hence P is maximally 2n digits that must be rounded in order to fit in a limited precision of n digits.  Several approaches to decimal multiplication are proposed, the simple and straight forward one is to iterate over the digits of the multiplier B and based on the value of the current digit, add the corresponding multiple of the multiplicand A to an intermediate product. In this approach the multiplier multiples 2A through 9A must be generated which consumes large area and delay. The following equation represents this approach to decimal multiplication: X i+1 = (X i + A.B i ) Where X is the partial product, X 0 = 0 and 0 ≤ i ≤ n-1.

6  Another approach is to generate secondary multiples which are a reduced set of multiples and generate any other missing multiple by adding two multiples from this secondary set based on the value of the current digit of the multiplier B.  This approach reduces the complexity of generating eight multiples using an addition operation. The following equation represents this approach to decimal multiplication: X i+1 = (X i + A.B’ l + A.B” l ) Where A.B’ l and A.B” l are the secondary multiples which together equal the proper primary multiple.  The fixed multiplication operation consists of three main stages: generation of partial products, reduction of partial products to two operands and a final carry propagate addition.

7  An IEEE Std DFP number contains a sign bit, an integer significand with a precision of n digits, and a biased exponent. The value of a finite DFP number is: D = – 1 sign × Significand × 10 E-Bias Where, E is the biased non-negative integer exponent.  Biased exponents in this paper represented by E relate to IEEE Std ’s exponents by: E = e + Bias Where e is the unbiased exponent defined in the IEEE Std The significand can be encoded either in binary or in Densely Packed Decimal (DPD), which in the IEEE Std is referred to as the decimal encoding. The exponent must be in the range [E min, E max ], after biasing.

8  The IEEE Std defines the significand of the decimal floating point number as a non-normalized significand. Thus, the decimal floating-point number may have redundant representations.  For example, the value of 320 × may be represented as 320 × 10 24, 32×10 25, or 3200 × This set of representations for a certain decimal floating-point number is called a cohort.  Because of this possibility of multiple representations, IEEE Std defines a preferred exponent for each arithmetic operation, which for multiplication is: PE = E a + E b – Bias Where, E a and E b are the biased exponents of the multiplicand and multiplier operands, respectively.

9 a. System Overview  The multiplier reads the two operands and extracts the sign, exponent and significand. The significand is decoded from a densely-packed decimal (DPD) encoding into Binary Coded Decimal (BCD).  The proposed multiplier contains two main paths: Significand path to generate the output significand of the product, and the exponent path to generate the output exponent of the product and the corresponding flags. The proposed design block diagram is shown in Fig. 1. The fixed point multiplier (FPM) operates once the decoded significands become available.

10 b. Fixed Point Multiplier  Fixed point multiplier consists of three main blocks. The partial product generation generates multiples of multiplicand based on the multiplier digits. The CSA reduction tree reduces the generated partial product to two vectors (CS format) to be added. The decimal carry propagation adder adds the output from the CSA tree, generating an intermediate product to be aligned based on the shift amount generated from the MC and then the shifted product is rounded to fit in the required precision of n digits.  Fig. 2 illustrates a block diagram of the multiplier encoding and the multiple sets selection units.

11  The output two vectors from the CSA tree are introduced to a novel decimal carry propagation adder to generate the intermediate product. The proposed decimal carry propagation adder is illustrated in Fig. 3.  The carry propagation is based on a Kogge-stone tree that aims to propagate the carry faster.

12 c. Master Control Unit  The exponent of the intermediate product, shift amount and the sticky counter are calculated in the master control unit in parallel with the fixed-point multiplication. The decimal point should be in the middle of the intermediate product which is (2n) length. Thus (n) digits of the intermediate product are to the right of the decimal point.  This increases the exponent of the intermediate product (EIP) by (n). The EIP is calculated using this equation: EIP = E a + E b – Bias + n  A sticky counter generated from the MC contains the number of digits that must be ORed to generate the sticky bit. To improve the latency of the multiplier, a novel sticky bit generation unit is developed to generate the sticky bit in parallel with the shifter, the sticky counter is used for generating a bit vector of (2n) length; the vector has 1’s in bits corresponding to the digits that will be ORed to generate the sticky bit, and 0’s in the other bits.

13  Sticky counter (SC) is calculated early as it depends also on the leading zeros in the multiplier and multiplicand. SC = Max (0, n – (LZ a + LZ b ) )  The sticky counter is decremented twice. This insures that the round and guard digits are not included in the sticky bit generation.

14 d. Rounding and output formulation  Rounder takes (n+1) digits from the shifted intermediate product (SIP) and the sticky bit. The proposed multiplier supports the five rounding directions listed in the IEEE Std (Round to Nearest Ties to Even (RNE), Round to Nearest Ties Away from Zero (RNA), Round to Positive Infinity (RPI), Round to negative Infinity (RNI), Round Toward Zero (RTZ)). It supports two other rounding directions. (Round Away from Zero (RAZ), Round to nearest, Ties Toward Zero (RNZ)).  Table I illustrates the rounding scheme used in our DFP multiplier design.

15  The multiplier is modeled using RTL VHDL and then it is functionally verified using FPGEN test cases supplied by IBM.  With a large number of random test cases, the results are generated using the DecNumber library that implements the general decimal arithmetic specification in ANSI C.  The multiplier is synthesized using TSMC 0.18 μm technology. The design is synthesized for a large number of pipeline stages to explore latency, area, and delay tradeoffs. This synthesis was performed using the retiming feature. The synthesis results are given in Table II. The proposed DFP multiplier has a low latency and a small area. Fig. 4 illustrates the relation between the area × delay product of our design and area × delay product of the design versus the number of pipelined stages.

16

17

18  The combinational design shows about 17% improvement in area × delay product over the parallel decimal floating point multiplier.  For Decimal128, our decimal floating point multiplier has delay of 10 ns with mm 2 area. The proposed floating point multiplier for decimal64 is hardware verified by integrating into NIOS II processor on Altera Cyclone II FPGA development kit.

19  Here, a decimal fully parallel and pipelined floating point multiplier is presented. Several enhancements are used to improve the latency such as the use of a parallel fixed point multiplier, the generation of the sticky bit in parallel and the use of a fast decimal carry propagation adder.  The multiplier is synthesized in 0.18 μm technology and pipelined for different numbers of stages. The multiplier shows very good performance with respect to delay and area. The multiplier is hardware verified through Altera Cyclone II FPGA testing.

20