1
1 MAD MAC 525
Presentation 12: Short Final Presentation, 26th April, 2006
W2: Farhan Mohamed Ali (W2-1), Jigar Vora (W2-2), Sonali Kapoor (W2-3), Avni Jhunjhunwala (W2-4)
Design Manager: Zack Menegakis
Project Objective: Design a crucial part of a GPU, the Multiply Accumulate Unit (MAC), which will revolutionize graphics.
2
2 Agenda
Marketing (Jigar)
Project Description (Farhan)
Algorithmic Description (Farhan)
Design Process (Sonali)
Floorplan Evolution (Sonali)
Layout (Avni)
Design Specifications (Avni)
Conclusion (Jigar)
3
3 MARKETING
Application of product: HDR rendering in gaming graphics
Why HDR? Used in games like Far Cry
Optimization for speed (chosen because of the market)
Competition: possible barriers to entry if we enter the market
4
4 MAD MAC and HDR What is HDR? Show animation explaining concept
5
5 MAD MAC and HDR
MAD MAC accelerates FP16 blending to enable true HDR graphics
What is HDR?
  – HDR = High Dynamic Range
  – Dynamic range is defined as the ratio of the largest value of a signal to the lowest measurable value
  – Dynamic range of luminance in real-world scenes can be 100,000 : 1
With HDR rendering, pixel intensities are allowed to extend beyond the [0..1] range of traditional graphics
  – Nature isn’t clamped to [0..1] and neither should CG be
In lay terms:
  – Bright things can be really bright
  – Dark things can be really dark
  – And the details can be seen in both
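To make the dynamic-range definition above concrete, here is a small sketch that applies it to two pixel formats. The choice of an 8-bit integer channel as the "traditional" format is an assumption (the slides do not name one), and `dynamic_range` is a hypothetical helper. The FP16 limits used are the standard half-precision values: largest finite value 65504, smallest normal value 2^-14.

```python
def dynamic_range(largest, smallest):
    """Ratio of the largest value to the lowest measurable (nonzero) value."""
    return largest / smallest

# Assumed "traditional" format: 8-bit integer channel, codes 1..255 are nonzero.
ldr_8bit = dynamic_range(255, 1)

# FP16 (half precision), normal numbers only: 65504 down to 2**-14.
fp16_normal = dynamic_range(65504.0, 2.0 ** -14)

print(f"8-bit channel : {ldr_8bit:,.0f} : 1")
print(f"FP16 (normal) : {fp16_normal:,.0f} : 1")
```

Under these assumptions the FP16 range comfortably exceeds the 100,000 : 1 figure quoted for real-world luminance, while the 8-bit range does not.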
7
7 PROJECT DESCRIPTION
Multiply Accumulate unit (MAC)
  – Executes the function AB+C on 16-bit floating-point inputs
  – Inputs will be in OpenEXR format
  – Multiply and add proceed in parallel to greatly speed up the operation
  – Rounding is performed only once, giving greater accuracy than separate multiply and add operations
Also known as:
  – Fused Multiply Add (FMA)
  – Multiply Add (MAD/MADD) in graphics shader programs
Many applications benefit from a fast FMA:
  – Graphics: HDR rendering, blending and shader ops
  – DSPs: computing vector dot-products in digital filters
  – Fast division, square root: eliminates extra hardware
Available in many newer CPUs and DSPs because it’s so cool
One ring (circuit) to rule them all!
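The accuracy benefit of rounding only once can be illustrated in software. The following is a minimal sketch, not the team's hardware algorithm: it computes AB+C exactly with rational arithmetic and then rounds to half precision either once (fused) or after each operation (unfused). The helper names `round_to_half`, `fused_mac`, and `unfused_mac` are hypothetical, rounding is assumed to be round-to-nearest-even, and overflow, infinities, and NaNs are not handled.

```python
from fractions import Fraction

def round_to_half(q):
    """Round an exact rational to the nearest FP16 value (ties to even).

    Sketch only: no overflow, infinity, or NaN handling.
    """
    q = Fraction(q)
    if q == 0:
        return 0.0
    sign = -1 if q < 0 else 1
    q = abs(q)
    # Find e with 2**e <= q < 2**(e+1).
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if Fraction(2) ** e > q:
        e -= 1
    e = max(e, -14)                  # subnormals share the minimum exponent
    ulp = Fraction(2) ** (e - 10)    # 10 stored fraction bits
    k = round(q / ulp)               # round() on a Fraction is ties-to-even
    return sign * float(k * ulp)

def fused_mac(a, b, c):
    """A*B+C with a single rounding at the end."""
    return round_to_half(Fraction(a) * Fraction(b) + Fraction(c))

def unfused_mac(a, b, c):
    """A*B rounded to FP16, then +C rounded again."""
    product = round_to_half(Fraction(a) * Fraction(b))
    return round_to_half(Fraction(product) + Fraction(c))

a = b = 1 + 2.0 ** -10          # exactly representable in FP16
c = -(1 + 2.0 ** -9)            # exactly representable in FP16
print(fused_mac(a, b, c))       # keeps the low-order product bit
print(unfused_mac(a, b, c))     # that bit is lost when the product is rounded
```

For these inputs the exact product is 1 + 2^-9 + 2^-20. Rounding it to FP16 before the add discards the 2^-20 term, so the unfused result is 0, while the fused result is 2^-20.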
8
8 ALGORITHMIC DESCRIPTION
Step through the entire process
Multiply and align occur concurrently: C is always aligned to A*B
Outputs go to the adder, normalize, round, overflow checker and output register
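The alignment step can be sketched in a few lines. This is an illustration under stated assumptions rather than the team's circuit: inputs are normal FP16 numbers (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits), and `decode_half` and `alignment_shift` are hypothetical names. Because the shift depends only on the three exponents, it can be computed while the significands are still being multiplied.

```python
import struct

def decode_half(x):
    """Split a float, stored as FP16, into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    return sign, exponent, fraction

def alignment_shift(a, b, c):
    """Binary places by which C sits below (positive) or above (negative) A*B.

    Assumes normal inputs; the product significand lies in [1, 4), so the
    later normalize step may still adjust the result by one position.
    """
    _, ea, _ = decode_half(a)
    _, eb, _ = decode_half(b)
    _, ec, _ = decode_half(c)
    product_exponent = ea + eb - 15   # biased exponent of A*B
    return product_exponent - ec

print(alignment_shift(2.0, 4.0, 1.0))   # A*B = 8, C = 1
```

With A = 2, B = 4, C = 1, the product has exponent 3 and C has exponent 0, so C is shifted three places relative to the product before the add.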
9
9 Block Diagram
Blocks: RegArray A, RegArray B, RegArray C, Multiplier, Exp Calc, Align, Adder/Subtractor, Control Logic & Sign Dtrmin, Normalize, Round, Ovf Checker, Leading 0 Anticipator, Reg Y
Ports: Input, Output
Bus-width labels in the figure: 10, 5, 5, 5, 14, 35, 22, 5, 4, 36, 14, 10, 1, 5, 5, 16, 15, 1, 1, 1
10
10 IMPLEMENTATION
Implementation of each module: how and why we chose a particular method, keeping in mind the goal of speed (multiplier, adder)
11
11 Design Decisions (contd.): Multiplier
Implementation: 11 x 11 Carry-Save Multiplier
Reasons:
  – Fast because it avoids having a ripple carry in every stage
  – Enables compact layout
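The carry-save idea can be sketched at word level. This is a behavioral illustration, not the team's gate-level array: each partial product is folded in with a 3:2 compressor that keeps sum and carry as separate words, so no carry ripples until one final addition. The names `csa` and `carry_save_multiply` are hypothetical; the 11-bit width corresponds to the FP16 significand (10 fraction bits plus the hidden bit).

```python
def csa(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word

def carry_save_multiply(a, b, width=11):
    """Multiply two unsigned `width`-bit integers using carry-save accumulation."""
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    sum_word, carry_word = 0, 0
    for i in range(width):
        # Partial product i: a shifted left by i if bit i of b is set.
        partial = (a << i) if (b >> i) & 1 else 0
        sum_word, carry_word = csa(sum_word, carry_word, partial)
    # Only this last step needs a carry-propagating adder.
    return sum_word + carry_word

print(carry_save_multiply(2047, 2047))
```

In hardware each `csa` call corresponds to a row of independent full adders, which is why the delay per stage does not grow with operand width.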
12
12 Design Process
Verilog -> Schematic -> Layout
  – Behavioral -> Structural Verilog
  – Transistors/gates -> Full Schematic
  – Gate/Component Layout -> Top Level
Transistor count fluctuated from 20,200 to 12,800
Major design decisions
  – Decided against implementing denormal arithmetic because it would increase the complexity of the project beyond the scope of the class
  – Rounding is performed only once, at the end
  – Picked nPass over Tgate in the normalize shifter
  – Adder: changed from variable-length carry select to a Han-Carlson binary tree adder
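For readers unfamiliar with the Han-Carlson structure named above, here is a bit-level behavioral sketch. It follows the textbook arrangement (one pre-stage pairing odd bits with their even neighbors, a Kogge-Stone prefix tree on the odd bits only, and one final stage filling in the even bits) and is not taken from the team's schematic. The function name `han_carlson_add` and the default width of 16 are assumptions.

```python
def han_carlson_add(a, b, width=16):
    """Add two unsigned `width`-bit integers with a Han-Carlson prefix network."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]    # bit propagate
    G, P = g[:], p[:]

    # Pre-stage: each odd bit absorbs its even neighbor below.
    for i in range(1, width, 2):
        G[i] = g[i] | (p[i] & g[i - 1])
        P[i] = p[i] & p[i - 1]

    # Kogge-Stone stages on odd bits only, with distances 2, 4, 8, ...
    d = 2
    while d < width:
        prev_G, prev_P = G[:], P[:]
        for i in range(1, width, 2):
            if i - d >= 0:
                G[i] = prev_G[i] | (prev_P[i] & prev_G[i - d])
                P[i] = prev_P[i] & prev_P[i - d]
        d *= 2

    # Final stage: even bits take the carry from the odd bit below.
    for i in range(2, width, 2):
        G[i] = g[i] | (p[i] & G[i - 1])

    carries = [0] + G[:-1]            # carry into each bit, carry-in = 0
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total | (G[width - 1] << width)   # append carry-out

print(han_carlson_add(12345, 54321))
```

Compared with a full Kogge-Stone tree, this arrangement uses roughly half the prefix cells at the cost of one extra logic stage, which is the usual reason given for choosing it.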
13
13 VERIFICATION OF DESIGN
Verilog Simulations (show outputs)
  – Overview
  – How/Why it works
  – Behavioral/Structural
Explain why we couldn’t get a high-level simulator and how we tested our Verilog design.
14
14 SCHEMATICS Show schematics of major blocks: adder, multiplier, and top-level HOW WE VERIFIED: analog simulation
15
15 Top Level Schematic
16
16 Multiplier Schematic
17
17 Adder Schematic
18
18 FLOORPLAN EVOLUTION
Initial floorplan
How it evolved (with animation): why and how we changed it
19
19 Main Floorplan
Blocks: Multiplier, Align C, Reg A, Reg B, Exp Calc, Reg C, Pipeline Reg, Adder, Ld Zero, Pipeline Reg, Normalize, Round, Reg Y
20
20 Floorplan
21
21 Full Chip Layout
Labeled blocks: Exponent, Align, Zero, Adder, Multiplier, Normalize, Round, Ovf
22
22 Pipelining
Initially planned 5-6 pipeline stages
Reduced to 4 pipeline stages
  – Made possible by implementing fast carry-lookahead adders in the critical-path modules (adder and multiplier)
23
23 Pipelining Stages
Blocks: Pipeline Reg, Multiplier, Align C, Reg A, Reg B, Exp Calc, Reg C, Pipeline Reg, Adder, Ld Zero, Pipeline Reg, Normalize, Round, Reg Y, Pipeline Reg, Overflow checker
24
24 LAYOUT Final Layout Layout of large blocks such as multiplier, adder and normalize
25
25 Layout Decisions
3 standard cell heights
Uniform-width vdd and ground rails
Wider vdd and ground rails in power-hungry modules
Max of 8 flip-flops per clock pulse generator
Metal directionality
26
26 Multiplier Layout with pipelining
27
27 Adder Layout
28
28 Normalize Layout
29
29 FINAL LAYOUT
30
30 Design Specifications
Worst-case delay = 2.25 ns
Long buses are all buffered (not tested yet)
Estimated clocking speed = 400 MHz
Height by width = 193.86 um * 301.545 um
Area = 58,458 um^2
Aspect ratio = 1:1.55
Total transistor density = 0.22 transistors/um^2
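The figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes the clock period is bounded by the worst-case delay alone, with no allowance for setup time or clock skew (the slides do not quantify either). The total transistor count of 12,730 is taken from the per-block table later in the deck, and the variable names are illustrative.

```python
height_um = 193.86
width_um = 301.545
worst_case_delay_ns = 2.25
transistor_count = 12_730        # total from the per-block table in this deck

area_um2 = height_um * width_um
aspect_ratio = width_um / height_um
max_clock_mhz = 1e3 / worst_case_delay_ns     # 1 / period, in MHz
density = transistor_count / area_um2         # transistors per um^2

print(f"area         = {area_um2:,.0f} um^2")
print(f"aspect ratio = 1:{aspect_ratio:.3f}")
print(f"clock bound  = {max_clock_mhz:.0f} MHz")
print(f"density      = {density:.2f} transistors/um^2")
```

Under that assumption the delay alone would allow roughly 444 MHz, so the 400 MHz estimate leaves some margin below the bound.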
31
31 Layout densities
Active: 14.05%
Poly: 9.25%
Metal 1: 33.89%
Metal 2: 18.00%
Metal 3: 14.99%
Metal 4: 6.29%
32
32 Layer Masks - Poly
33
33 Layer Masks – Metal 1
34
34 Layer Masks – Metal 2
35
35 Layer Masks – Metal 3
36
36 Layer Masks – Metal 4
37
37
Block | Schematic Power: mW (350 MHz) | Layout Power: mW | Schematic Delay | Layout Delay
Multiplier | 2.97 | N/A | 3.38n | N/A
Multiplier w/ pipeline | ?? | ?? | 1.9n | 2.25n
Exponents | 1.608 | 2.21 | 1.01n | 1.2n
Align | 0.094 | 0.113 | 480p | 637p
Adder | 8.48 | 9.73 | 1.34n | 1.7n
Leading 0 | 0.232 | 0.857 | 506p | 551p
Normalize | 1.458 | 1.546 | 407p | 437p
Round | 0.631 | 1.21 | 864p | 986p
OvfCheck | 0.13 | 0.19 | 453p | 475p
Registers | ?? | | 179p | 193p
Total | ?? | | -- |
38
38
Block | Area: um^2 | Transistor Count | Transistor Density
Multiplier w/ pipeline | 20,388 | 4,496 | 0.22
Exponents | 5,163 | 738 | 0.14
Align | 3,995 | 500 | 0.13
Adder | 13,202 | 3,174 | 0.24
Leading 0 | 1,253 | 364 | 0.29
Normalize | 3,190 | 942 | 0.3
Round | 1,802 | 494 | 0.28
OvfCheck | 200 | 70 | 0.35
Registers, etc | N/A | 1,948 | N/A
Total | 58,458 | 12,730 | 0.22
39
39 Conclusion More marketing Summarize chip functionality Extending applications of chip
40
40 Comments?