Joseph Schneider, February 23, 2010: Fused Multiply-Add (FMA) is a unit designed to perform (A × B) + C as a single instruction. Faster, more precise.

Slide 1: Joseph Schneider, February 23, 2010

Slide 2
- Fused Multiply-Add (FMA) is a unit designed to perform (A × B) + C as a single instruction
- Faster and more precise than executing two consecutive instructions on a standard multiplier and adder
- Can also perform standard addition and multiplication by supplying appropriate constants (B = 1 for addition, C = 0 for multiplication)
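The "more precise" claim comes from rounding only once. The Python sketch below uses exact `fractions.Fraction` arithmetic as a reference model of a fused operation (it is not how the hardware computes) and compares it with two separately rounded instructions. It also shows the constant tricks for plain addition and multiplication. The helper names `fma_reference` and `unfused` are hypothetical, and inputs must be finite.

```python
from fractions import Fraction

def fma_reference(a: float, b: float, c: float) -> float:
    """(a * b) + c evaluated exactly, then rounded once to the nearest double.

    Fraction arithmetic is exact for finite floats, and float(Fraction) performs
    a single round-to-nearest-even, so this behaves like one fused operation.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def unfused(a: float, b: float, c: float) -> float:
    """Two separate instructions: the product is rounded, then the sum is rounded."""
    return a * b + c

a, b = 1 + 2**-30, 1 - 2**-30        # exact product is 1 - 2**-60
print(fma_reference(a, b, -1.0))     # value equals -2**-60: the tiny difference survives
print(unfused(a, b, -1.0))           # 0.0: a*b was already rounded to 1.0

# The same unit covers plain addition (b = 1) and multiplication (c = 0).
print(fma_reference(3.0, 1.0, 4.0))  # 7.0
print(fma_reference(3.0, 4.0, 0.0))  # 12.0
```

The unfused result loses the answer entirely because the product 1 - 2^-60 is closer to 1.0 than to the next double below it.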

Slide 3
- Performing a standard addition or multiplication on an FMA unit incurs greater latency than on a dedicated adder or multiplier
- If a single FMA unit replaces the separate units, an addition and a multiplication can no longer be performed in parallel
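To make the parallelism point concrete, here is a toy cycle count. The latencies `FADD_LATENCY`, `FMUL_LATENCY` and `FMA_LATENCY` are invented for illustration (the slides give no cycle counts), and every unit is assumed to be non-pipelined, so the numbers only show the shape of the trade-off, not the timing of the presented design.

```python
# Hypothetical latencies in cycles, chosen only for illustration (not from the slides).
# Units are modeled as non-pipelined: each handles one operation at a time.
FADD_LATENCY = 3
FMUL_LATENCY = 4
FMA_LATENCY = 5   # assumed longer than either dedicated unit, as the slide states

def cycles_with_separate_units(n_add: int, n_mul: int) -> int:
    """Independent adds and multiplies run concurrently on FADD and FMUL."""
    return max(n_add * FADD_LATENCY, n_mul * FMUL_LATENCY)

def cycles_with_fma_only(n_add: int, n_mul: int) -> int:
    """Every add (B = 1) and multiply (C = 0) is serialized through one FMA unit."""
    return (n_add + n_mul) * FMA_LATENCY

print(cycles_with_separate_units(4, 4), cycles_with_fma_only(4, 4))  # 16 40
```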

Slide 4
- Goal: design a bridge architecture between the FADD and FMUL units
- Reuse existing components to minimize area and power consumption
- Support both the standard operations and FMA functionality

Slide 5
- All floating-point units assume the IEEE-754 double-precision (64-bit) format
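For reference, the 64-bit format consists of a 1-bit sign, an 11-bit biased exponent (bias 1023) and a 52-bit fraction with an implicit leading 1 for normal numbers. This small Python sketch extracts those fields with the standard `struct` module. The helper names are illustrative, and `value_from_fields` handles normal numbers only (no zeros, subnormals, infinities or NaNs).

```python
import struct

def fields(x: float) -> tuple[int, int, int]:
    """Split a double into (sign, biased exponent, fraction) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))   # raw 64-bit pattern
    sign = bits >> 63                                      # 1 bit
    exponent = (bits >> 52) & 0x7FF                        # 11 bits, bias 1023
    fraction = bits & ((1 << 52) - 1)                      # 52 stored bits
    return sign, exponent, fraction

def value_from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normal double: (-1)^sign * 1.fraction * 2^(exponent - 1023)."""
    return (-1) ** sign * (1 + fraction / 2**52) * 2.0 ** (exponent - 1023)

print(fields(1.0))    # (0, 1023, 0)
print(fields(-2.0))   # (1, 1024, 0)
print(fields(1.5))    # (0, 1023, 2251799813685248), i.e. only the top fraction bit set
```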

Slide 6
- Compare four designs: standalone adder, standalone multiplier, standalone FMA, and the FMA bridge
- Designs are compared on latency, area, and power

Slide 7
- (A × B) + C
- A and B are multiplied while C is aligned based on the exponent difference
- A carry-save adder combines the product and the aligned addend
- The result is rounded only once, as opposed to the two roundings needed when the computation is performed as two separate operations
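The datapath on this slide can be imitated in software. The Python sketch below is a behavioral model under simplifying assumptions, not the presented Verilog: each operand is split into an integer significand and an exponent, the product is represented by just two partial-product rows (a real multiplier array produces many rows and reduces them with a carry-save tree), the addend is aligned using the exponent difference, a single 3:2 carry-save step combines the three rows, and the result is rounded exactly once at the end. Real hardware aligns C within a fixed-width window; here unbounded Python integers keep everything exact. All function names are hypothetical, and inputs must be finite.

```python
import math
from fractions import Fraction

def decompose(x: float) -> tuple[int, int]:
    """Return (m, e) with x == m * 2**e exactly and m an integer (finite x only)."""
    frac, exp = math.frexp(x)                  # x == frac * 2**exp, 0.5 <= |frac| < 1
    return int(frac * (1 << 53)), exp - 53     # a double's significand fits in 53 bits

def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save step: reduce three operands to (sum, carry) without carry propagation."""
    s = x ^ y ^ z                              # per-bit sum, carries ignored
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, moved up one place
    return s, c

def fma_datapath(a: float, b: float, c: float) -> float:
    ma, ea = decompose(a)
    mb, eb = decompose(b)
    mc, ec = decompose(c)
    ep = ea + eb                               # exponent of the exact product ma * mb
    # Two partial products stand in for the multiplier array's output rows.
    mb_hi, mb_lo = mb >> 27, mb & ((1 << 27) - 1)
    pp_hi, pp_lo = ma * (mb_hi << 27), ma * mb_lo
    # Align product and addend to a common exponent using their exponent difference.
    e = min(ep, ec)
    pp_hi <<= ep - e
    pp_lo <<= ep - e
    addend = mc << (ec - e)
    s, cy = carry_save_add(pp_hi, pp_lo, addend)
    exact = s + cy                             # final carry-propagate addition
    return float(Fraction(exact) * Fraction(2) ** e)   # the only rounding step
```

Because nothing is rounded until the last line, `fma_datapath(1 + 2**-30, 1 - 2**-30, -1.0)` keeps the exact answer -2^-60, whereas evaluating `a * b + c` with ordinary floats returns 0.0.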

Slide 8 (no text)

Slide 9
- Follows the same architecture as the FMA, but reuses parts of FADD and FMUL where appropriate: the multiplier array from FMUL and the rounding unit from FADD
- With this approach, FADD and FMUL can be used individually or in parallel, while the FMA path is used only when needed
- Clock-gating ensures the bridge is powered only when needed
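The sharing described above can be summarized in a toy behavioral model. In this Python sketch (class and method names are hypothetical), the FMA path calls the same multiplier-array helper as `fmul` and the same rounding helper as `fadd`, and a counter stands in for the clock-gated bridge being enabled only during FMA operations. Exact `Fraction` arithmetic replaces the real datapath, so this illustrates the structure, not timing or power.

```python
from fractions import Fraction

class BridgedFPU:
    """Toy model of FADD + FMUL + bridge: FMA reuses FMUL's multiplier array
    and FADD's rounding unit, and the bridge is enabled only during an FMA."""

    def __init__(self) -> None:
        self.bridge_ops = 0  # operations during which the clock-gated bridge was enabled

    @staticmethod
    def _multiplier_array(a: float, b: float) -> Fraction:
        # Shared FMUL component: exact, unrounded product.
        return Fraction(a) * Fraction(b)

    @staticmethod
    def _round(x: Fraction) -> float:
        # Single round-to-nearest-even to double; stands in for a rounding unit.
        return float(x)

    def fmul(self, a: float, b: float) -> float:
        return self._round(self._multiplier_array(a, b))   # FMUL's own rounding

    def fadd(self, a: float, b: float) -> float:
        return self._round(Fraction(a) + Fraction(b))      # FADD's rounding unit

    def fma(self, a: float, b: float, c: float) -> float:
        self.bridge_ops += 1                                 # bridge ungated only for FMA
        exact_sum = self._multiplier_array(a, b) + Fraction(c)  # bridge adds unrounded product and C
        return self._round(exact_sum)                        # routed through FADD's rounding
```

Plain `fadd` and `fmul` calls never touch `bridge_ops`, mirroring the idea that the extra hardware draws power only for FMA instructions.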

Slide 10 (no text)

Slide 11
- Same as a standard FMUL unit, but with additional outputs from the multiplier array feeding the FMA bridge
- The FMUL rounding element is shut down via clock-gating during an FMA operation

Slide 12 (no text)

Slide 13
- Uses the Farmwald dual-path FADD design, in which one of two paths is chosen based on the exponent difference of the inputs
- The multiplexer that selects between the two paths for the rounding unit now includes an option for the FMA input
- In this way, the FMA uses the FADD's rounding unit
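The slides do not state the exact path-selection rule, so the Python sketch below assumes the common textbook criterion for dual-path adders: the near path handles effective subtraction when the exponents differ by at most one (where massive cancellation is possible), and the far path handles everything else. The extra `fma_request` input models the new multiplexer option that lets the FMA result reach the FADD rounding unit. The function name and its string return values are hypothetical, and special values (zeros, infinities, NaNs) are ignored.

```python
import math

def rounding_mux_select(a: float, b: float, fma_request: bool = False) -> str:
    """Choose which result feeds the shared FADD rounding unit (toy model)."""
    if fma_request:
        return "fma"             # new mux input added for the bridge
    _, ea = math.frexp(a)        # only the exponents matter for path selection
    _, eb = math.frexp(b)
    effective_subtraction = (a < 0) != (b < 0)
    if effective_subtraction and abs(ea - eb) <= 1:
        return "near"            # possible massive cancellation: needs leading-zero normalization
    return "far"                 # large alignment shift, at most a 1-bit normalization
```

Splitting the cases this way means neither path needs both a long alignment shift and a long normalization shift, which is the usual motivation for the dual-path design.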

Slide 14 (no text)

Slide 15 (no text)

Slide 16
- End result: the bridge FMA hardware is essentially the original FMA hardware without the multiplier array and rounding unit, which are borrowed from FMUL and FADD

Slide 17 (no text)

Slide 18
- FMUL, FADD, FMA, and bridge FMA all implemented in Verilog
- Uses AMD's 65-nm silicon-on-insulator (SOI) design set

Slide 19
- For plain FADD or FMUL instructions, the bridge architecture is 30% to 70% faster than the FMA architecture, with significant power savings
- It also allows an FADD and an FMUL instruction to execute in parallel, further improving speed
- Executing an FMA instruction on the bridge gives a 12% performance gain over performing the same computation as consecutive operations on the individual FMUL and FADD units

Slide 20
- Including the bridge FMA with the FADD and FMUL units takes 40% more area
- An FMA instruction uses 60% more power than consecutive FMUL and FADD instructions under worst-case conditions
- Higher latency and power than a standalone FMA unit

