IEEE Floating Point Adder

1 IEEE Floating Point Adder
Using the IEEE Floating Point Standard for an add/subtract execution unit 1/8/ L25 Floating Point Adder Copyright Joanne DeGroat, ECE, OSU

2 Copyright 2006 - Joanne DeGroat, ECE, OSU
Lecture overview
The Interface
Part by part
A floating point adder design

3 Adder is double precision
The value v of the bits in the word representation, with sign s, 11-bit exponent field e and 52-bit fraction f, is:
If e = 2047 and f /= 0, then v is NaN regardless of s
If e = 2047 and f = 0, then v = (-1)^s ∞
If 0 < e < 2047, then v = (-1)^s 2^(e-1023) (1.f), a normalized number
If e = 0 and f /= 0, then v = (-1)^s 2^(-1022) (0.f), a denormalized number; these allow for graceful underflow
If e = 0 and f = 0, then v = (-1)^s 0 (zero)
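
The field rules on this slide can be checked in software. What follows is a minimal Python sketch, not part of the original design; the function names `decode_double` and `classify` are hypothetical, and a Python `float` is assumed to be an IEEE 754 double.

```python
import struct


def decode_double(x: float):
    """Split an IEEE 754 double into its sign, exponent and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63                 # 1 sign bit
    e = (bits >> 52) & 0x7FF       # 11-bit exponent field
    f = bits & ((1 << 52) - 1)     # 52-bit fraction field
    return s, e, f


def classify(s: int, e: int, f: int) -> str:
    """Apply the slide's case analysis to the decoded fields."""
    if e == 2047:
        return "NaN" if f != 0 else "infinity"
    if e == 0:
        return "denormalized" if f != 0 else "zero"
    return "normalized"
```

For example, `classify(*decode_double(1.0))` returns `"normalized"`, since 1.0 is stored with e = 1023 and f = 0.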

4 Specification of a FPA
Floating Point Add/Subtract Unit Specification
Inputs are in IEEE 754 double precision
Must perform both addition and subtraction
Must handle the full floating point standard:
  Normalized numbers
  Not a Numbers (NaNs)
  +/- Infinity
  Denormalized numbers

5 Specifications continued
Result will be an IEEE 754 double precision representation
Unit will correctly handle the invalid operation of adding +∞ and -∞, which produces NaN per the standard
Unit latches its inputs into registers from parallel 64-bit data busses
A separate signal line indicates the operation, add or subtract

6 Specifications continued
Outputs
The correctly represented result
Flags that are output:
  Zero result
  Overflow to infinity from normalized numbers as inputs
  NaN result
  Overshift (result is the larger of the two operands)
  Denormalized result
  Inexact (result was rounded)
  Invalid operation for addition

7 High level block diagram
Basic architecture interface
Data: 64-bit A, B, and C busses
Control signals: Latch, Add/Sub, Asel, Drive
Condition flags output: 7 flag signals
Clocks: Phi1 and Phi2 (a 2-phase clocked architecture)

8 Start the VHDL
The entity interface

9 Basic design
Can be divided into functional sub-blocks
First latch and drive

10 What goes in the other blocks
Adjusting the inputs to prepare to add
The add
Then renormalize
Finally round the result

11 VHDL coding for the latched
A first cut
The input latches
Note 2 phase

12 And on the output
Drivers
Note use of guarded blocks

13 And what goes in between?
In the final design, lots goes in between, but you first want to make sure that the latches are working properly.
So just pass one input to the output and check.
Once this works properly, you can move on with the design.

14 The first section
Prepare to add
Identify the type of each input and adjust the operands appropriately

15 The exponent unit portion
Must get the larger exponent
And the difference between the exponents, which is the shift distance
Also several control signals:
  Exponent all 0s and all 1s
  Exponent A > B, A < B, A = B
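
The outputs listed on this slide can be modeled behaviorally. This is only a sketch under stated assumptions: the function name `exponent_unit` and the dictionary keys are hypothetical, and the inputs are taken to be the raw 11-bit exponent fields.

```python
EXP_ALL_ONES = 0x7FF  # 11-bit exponent field with every bit set (2047)


def exponent_unit(exp_a: int, exp_b: int) -> dict:
    """Behavioral model of the exponent unit outputs named on the slide."""
    return {
        "larger": max(exp_a, exp_b),     # exponent of the result before renormalization
        "shift": abs(exp_a - exp_b),     # alignment shift distance
        "a_all0": exp_a == 0,
        "a_all1": exp_a == EXP_ALL_ONES,
        "b_all0": exp_b == 0,
        "b_all1": exp_b == EXP_ALL_ONES,
        "gt": exp_a > exp_b,             # Exponent A > B
        "lt": exp_a < exp_b,             # Exponent A < B
        "eq": exp_a == exp_b,            # Exponent A = B
    }
```

In hardware the two differences ExpA-ExpB and ExpB-ExpA would be computed by subtractors; `abs` and `max` stand in for that selection here.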

16 Mantissa Processing Logic
Need to examine the two fractional parts and generate several control signals that are required to prepare the operands
Need relational signals M>, M=, M<
  Needed to know which operand to shift
Need to know if the stored fractional part is all 0s or not
  Needed for NaN, 0, and ∞ determination

17 After generating control signals
Step 1 is to select between a normalized mantissa and a denormalized mantissa
For normalized: prepend NOT(Ex0) to the fractional part
  If Ex0 is a 1, then the exponent is all 0s and you have a denormalized number or 0
  When Ex0 is a 0, you have a NaN, infinity, or a normalized number
The other selection, for denormalized numbers, is the fractional part shifted left by 1 and postpended by a 0
  The shift moves the first fraction bit into the units position, so the value can now be treated like a normalized number

18 Now select between these two
Select the denormalized mantissa WHEN Ex0 * (NOT Mx0)
  When Ex0 is a 1, you have a denormalized number or 0
  When Mx0 is a 0, there is at least 1 bit of the fractional part that is a 1, and thus you have a denormalized number
Select the NaN, infinity, 0, or normalized-number mantissa otherwise
  Select this case when Ex0 is a 0 or Mx0 is a 1
  When Mx0 is a 1, you have infinity, 0, or a normalized number
  When Ex0 is a 0, you have a normalized number, infinity, or NaN
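
The selection described on the last two slides can be sketched as follows. This is an illustrative Python model, not the author's VHDL; `select_mantissa` and `magnitude` are hypothetical names. The exact-value check relies on reading the selected mantissa as x.xxx…x with 52 fraction bits and on leaving the exponent field of a denormalized number at 0.

```python
from fractions import Fraction

FRAC_BITS = 52  # width of the stored fraction field


def select_mantissa(e: int, f: int) -> int:
    """Return the 53-bit mantissa chosen by the Ex0 / Mx0 selection."""
    ex0 = e == 0   # Ex0: exponent field is all 0s
    mx0 = f == 0   # Mx0: fraction field is all 0s
    if ex0 and not mx0:
        # Denormalized: fraction shifted left by 1 with a 0 appended.
        return f << 1
    # NaN, infinity, 0 or normalized: prepend NOT(Ex0) to the fraction.
    hidden = 0 if ex0 else 1
    return (hidden << FRAC_BITS) | f


def magnitude(e: int, f: int) -> Fraction:
    """Exact magnitude of a finite operand built from the selected mantissa."""
    return select_mantissa(e, f) * Fraction(2) ** (e - 1023 - FRAC_BITS)
```

Because the denormalized fraction is pre-shifted, the same scale factor 2^(e-1023) applies to both normalized and denormalized operands, which is why the exponent difference can be used directly as the shift distance.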

19 Shown in table form ??skip
Selection table to also point out this relationship
Note that for a 0, NOT(Ex0) is prepended to the fractional part, giving an all-0s mantissa (…000)

20 Selections are input to a crossbar
The crossbar switch places the larger value on the right path and the smaller onto the left path
The smaller is the operand to shift if any shifting is needed to align the binary point
The equation for exchange on the crossbar is E> + (E= * M>): route the A input to the right side if the exponent of A is the larger, OR the exponents are equal and the fractional part of A is larger

21 The next multiplexers
Now have the smaller on the left path and the larger on the right path.
On the left path, if either exponent is all 1s, then that operand is NaN or infinity and has been crossbarred to, or is equal to, the right path operand. In this case we want to simply pass it through to the output by adding 0 to it, so a 0 is one choice of the left path mux.
On the right path, select the right path value or mux in a hardwired NaN for an illegal operation.

22 Linear shifting
Next step is to linear shift the left operand
The exponent unit generates the exponent > signals by subtracting the exponents, ExpA-ExpB and ExpB-ExpA
Then, with the help of the control signals, the exponent difference is known and this value is sent to the shifter
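
A rough behavioral sketch of the alignment shift follows. The slides do not give the shifter's width, so `MAN_BITS` and `GUARD_BITS` are assumptions, as is the rule that an overshift is flagged once every bit of the smaller operand has been shifted out; `align` is a hypothetical name.

```python
MAN_BITS = 53    # assumption: 1 prepended bit plus the 52-bit fraction
GUARD_BITS = 3   # assumption: extra low-order bits kept for rounding


def align(man_small: int, shift: int):
    """Right-shift the smaller mantissa by the exponent difference.

    Returns the shifted value (with GUARD_BITS extra low-order bits) and an
    overshift flag that is set when nothing of the operand remains.
    """
    extended = man_small << GUARD_BITS
    overshift = shift >= MAN_BITS + GUARD_BITS
    shifted = 0 if overshift else extended >> shift
    return shifted, overshift
```

When the overshift flag is set, the left path contributes 0 and the result is simply the larger operand, matching the Overshift flag in the specification.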

23 One last multiplexer
The right path operand, the larger, is simply input to the ADDER.
On the left path, the output of the linear shifter is sent to the ADDER for a + operation, OR the one's complement of the value is sent to the ADDER for a - operation. In this case the input carry is handled appropriately.

24 Code for this section - behavioral
Most of the code is generation of various signals and movement of data in muxes

25 Xbar code highlight
Code
swap <= expgt OR (expeq AND mangt);
xbar_r <= lxbarin when (swap = '1') else rxbarin;
xbar_l <= rxbarin when (swap = '1') else lxbarin;
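
For quick experimentation outside a VHDL simulator, the crossbar equation can be mirrored in Python. This is a sketch only: the signal names follow the VHDL on this slide, while the function `crossbar` and the assumption that operand A arrives on the left crossbar input are mine.

```python
def crossbar(exp_a: int, man_a: int, exp_b: int, man_b: int):
    """Route the larger operand to the right path and the smaller to the left."""
    expgt = exp_a > exp_b
    expeq = exp_a == exp_b
    mangt = man_a > man_b
    swap = expgt or (expeq and mangt)   # E> + (E= * M>)
    lxbarin, rxbarin = man_a, man_b     # assumption: A enters on the left input
    xbar_r = lxbarin if swap else rxbarin
    xbar_l = rxbarin if swap else lxbarin
    return xbar_l, xbar_r, swap
```

When the operands are identical, `swap` is false and both paths carry the same value, so the choice does not matter.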

26 Hard code NaN
VHDL code
-- Control equation for mux
in_mux_r_man <= expa1 AND mana0 AND expb1 AND manb0 AND (signa XOR signb);
in_mux_r <= nan_man WHEN (in_mux_r_man = '1') ELSE xbar_r;
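
The control equation detects the one invalid case, infinities of opposite effective sign. A Python rendering is sketched below; `is_invalid_add` is a hypothetical name, and it is assumed that `signb` in the VHDL is the sign of B after it has been XORed with the add/subtract control.

```python
EXP_ALL_ONES = 0x7FF  # exponent field of all 1s


def is_invalid_add(sign_a, exp_a, frac_a, sign_b, exp_b, frac_b, subtract=False):
    """True when the operation is (+inf) + (-inf) or an equivalent subtract."""
    eff_sign_b = sign_b ^ int(subtract)   # assumption: sign of B after the op XOR
    expa1 = exp_a == EXP_ALL_ONES         # exponent of A is all 1s
    mana0 = frac_a == 0                   # fraction of A is all 0s
    expb1 = exp_b == EXP_ALL_ONES
    manb0 = frac_b == 0
    return expa1 and mana0 and expb1 and manb0 and (sign_a ^ eff_sign_b) == 1
```

Host floating point arithmetic gives the same answer for this case: in Python, `float("inf") + float("-inf")` evaluates to NaN.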

27 Now add the mantissas
Simply add the two mantissas.
As the sign of the B input was XORed with the operation, i.e., inverted if it was a subtract operation, the carry in is the XOR of the two signs.
If the signs are different, then a subtract is being performed and a '1' is being input to the carry in of the adder.
The adder does two's complement addition.
Inputs are of the form x.xxxxx…xx or 54 bits. The output is of the form xx.xxxx…xxx or 58 bits.
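
The carry-in trick can be illustrated with a small model: sending the one's complement of the left operand with a carry in of 1 is two's complement subtraction. This is a sketch, not the author's adder; `WIDTH` and the name `mantissa_add` are assumptions, and the right operand is taken to be the larger magnitude, as the crossbar guarantees.

```python
WIDTH = 56  # hypothetical datapath width in bits


def mantissa_add(right: int, left: int, sign_a: int, sign_b: int, op_subtract: int) -> int:
    """Add or subtract magnitudes; `right` must be >= `left`."""
    mask = (1 << WIDTH) - 1
    eff_sign_b = sign_b ^ op_subtract   # sign of B inverted for a subtract operation
    carry_in = sign_a ^ eff_sign_b      # 1 when the effective signs differ
    operand = (~left & mask) if carry_in else left
    total = right + operand + carry_in
    if carry_in:
        total &= mask                   # drop the carry out of the two's complement subtract
    return total                        # a true add may carry into bit WIDTH (the xx. form)
```

For instance, `mantissa_add(10, 3, 0, 0, 1)` models A - B with both operands positive and yields 7, while `mantissa_add(10, 3, 0, 1, 1)` models A - (-B) and yields 13.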

28 On to the next challenge
This is perhaps the hardest part: renormalization of the result
Have a result exponent (the exponent of the larger) and a mantissa in the form xx.xxxxxx…xxxx
The following slide shows the processing needed

29 Renormalization Unit
Have exponent and mantissa to deal with.

30 Many choices to deal with
May need to shift the mantissa 1 position to the right on a fixed binary point.
May be OK as is.
May have to shift left; then need to know the position of the leading 1.
In a behavioral model, can simply shift left once, increment a counter, and then check.
In hardware, need a leading 1 detector that gives the position of the leading 1 so that the mantissa can be shifted left.
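
Both approaches on this slide, the behavioral shift-and-count loop and the leading 1 detector, can be sketched in Python. The names `renormalize` and `leading_one_shift` and the value of `FRAC_BITS` are assumptions; overflow to infinity is not modeled, and left shifting is assumed to stop once the exponent reaches 0.

```python
FRAC_BITS = 55  # hypothetical number of bits to the right of the binary point


def renormalize(exp: int, man: int):
    """Behavioral renormalization of a mantissa in xx.xxx...x form."""
    if man == 0:
        return 0, 0                      # zero exponent, hardwired 0 mantissa
    if man >> (FRAC_BITS + 1):
        # Form 1x.xxx: shift right by 1 and increment the exponent.
        # The bit shifted out is dropped here; real hardware would keep it for rounding.
        return exp + 1, man >> 1
    shifts = 0
    # Shift left once, count, and check again until the leading 1 is in place.
    while (man >> FRAC_BITS) == 0 and exp - shifts > 0:
        man <<= 1
        shifts += 1
    return exp - shifts, man


def leading_one_shift(man: int) -> int:
    """Leading 1 detector: left shift needed (negative means shift right). man must be nonzero."""
    return FRAC_BITS - (man.bit_length() - 1)
```

The loop and the detector agree whenever the exponent is large enough to absorb the full shift; the detector gives the distance in one step, which is what the hardware needs.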

31 Interactions
All shifts of the mantissa result in an exponent adjustment.
There are 4 choices on the exponent:
  As is
  Incremented by 1
  Adjusted down by some amount depending on the shift
  Zero

32 Interactions
There are 5 choices on the mantissa:
  As is
  Right shifted by 1 (increment exp by 1)
  Left shifted for leading 1
  Left shifted and then right shifted by 1
  Hardwired 0
This part is the same for both addition and multiplication.
Easy to do algorithmically.

33 Rounding Unit
Once done with renormalization, will look at the guard bits to determine rounding.
Standard specifies several rounding modes.
Can also just truncate.

34 Rounding
Can result in changes to both the mantissa and the exponent.
After rounding, the final result is output in normalized form.
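
To see how rounding can change both the mantissa and the exponent, consider this sketch. The slides do not say which rounding mode the design implements; the code assumes round to nearest, ties to even, which is the IEEE 754 default. `GUARD_BITS`, `MAN_BITS` and `round_nearest_even` are hypothetical names and widths.

```python
MAN_BITS = 53    # assumption: 1.f mantissa width after rounding
GUARD_BITS = 3   # assumption: number of guard bits below the mantissa


def round_nearest_even(exp: int, man: int):
    """Round a normalized mantissa carrying GUARD_BITS extra low-order bits."""
    kept = man >> GUARD_BITS
    guard = man & ((1 << GUARD_BITS) - 1)
    half = 1 << (GUARD_BITS - 1)
    inexact = guard != 0                      # drives the Inexact flag
    if guard > half or (guard == half and kept & 1):
        kept += 1                             # round up; a tie goes to the even mantissa
    if kept >> MAN_BITS:
        # Rounding carried out of the mantissa (1.11...1 became 10.00...0).
        kept >>= 1
        exp += 1
    return exp, kept, inexact
```

The last branch is the case the slide warns about: a mantissa of all 1s rounds up to the next power of two, so the exponent must be incremented as well.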

35 And don’t forget the flags
Any arithmetic unit outputs flags on the status and validity of the result.
The flags to be generated are output from various control signals or combinations of various control signals.

36 To test (verify) the design
Must test for normal operation and boundary conditions
Will check A by B:
  A               B
  NaN             NaN
  +/- infinity    +/- infinity
  +/- 0           +/- 0
  Denorm          Denorm
  Norm            Norm
For both direct and all crossed pairings
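
One way to produce the crossed pairings is to generate them programmatically and take the expected sums from the host machine's IEEE 754 arithmetic. This is a sketch of such a vector generator, not the author's testbench; the representative values and the names `OPERAND_CLASSES` and `crossed_pairings` are assumptions.

```python
import itertools
import math

# One or two representative values per operand class from the slide.
OPERAND_CLASSES = {
    "NaN": [math.nan],
    "+/- infinity": [math.inf, -math.inf],
    "+/- 0": [0.0, -0.0],
    "Denorm": [5e-324, -5e-324],   # smallest-magnitude denormalized doubles
    "Norm": [1.5, -2.25],
}


def crossed_pairings():
    """Yield (a, b, expected a + b) for every ordered pair of representatives."""
    values = [v for group in OPERAND_CLASSES.values() for v in group]
    for a, b in itertools.product(values, repeat=2):
        yield a, b, a + b
```

With nine representative values this yields 81 vectors, covering both the direct pairings (same class on A and B) and every crossed pairing.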

37 Boundary conditions
Wish to check several boundary conditions:
  Denorm + Denorm = Max Denorm
  Denorm + Denorm = Min Norm
  Norm - Norm = Max Denorm
  Rounding using first guard bit
  Rounding using 1st and 2nd guard bits
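
Concrete operands for the first three boundary cases can be written as bit patterns and checked against host arithmetic. The specific operands below are my own choices, not taken from the slides, and the check assumes the host does not flush denormalized numbers to zero; `from_bits`, `to_bits`, `BOUNDARY_CASES` and `check` are hypothetical names.

```python
import struct

MIN_DENORM = 0x0000000000000001   # e = 0, f = 0...01
MAX_DENORM = 0x000FFFFFFFFFFFFF   # e = 0, f = all 1s
MIN_NORM = 0x0010000000000000     # e = 1, f = 0


def from_bits(bits: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def to_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]


# (description, A bits, B bits, operation, expected result bits)
BOUNDARY_CASES = [
    ("Denorm + Denorm = Max Denorm", MAX_DENORM - 1, MIN_DENORM, "+", MAX_DENORM),
    ("Denorm + Denorm = Min Norm", MAX_DENORM, MIN_DENORM, "+", MIN_NORM),
    ("Norm - Norm = Max Denorm", 0x001FFFFFFFFFFFFF, MIN_NORM, "-", MAX_DENORM),
]


def check(case) -> bool:
    """Compare the host's result with the expected bit pattern."""
    _, a_bits, b_bits, op, expected = case
    a, b = from_bits(a_bits), from_bits(b_bits)
    result = a + b if op == "+" else a - b
    return to_bits(result) == expected
```

All three sums are exact, so no rounding is involved; the rounding cases on the slide need operands whose exponents differ enough to push bits into the guard positions.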

38 Testing
Testing of the design code is not necessarily the same as the testing that would be done on the chip.
The "testing" of the design is called verification and must ensure that all possible input combinations produce the specified output.

39 Scan of entire architecture

40 Scan of the chip

