1. 2 Farhan Mohamed Ali Jigar Vora Sonali Kapoor Avni Jhunjhunwala 1 st May, 2006 Final Presentation MAD MAC 525 Design Manager: Zack Menegakis Design.

Presentation transcript:

1

2 Farhan Mohamed Ali, Jigar Vora, Sonali Kapoor, Avni Jhunjhunwala. 1st May, 2006. Final Presentation: MAD MAC 525. Design Manager: Zack Menegakis. Design a crucial part of a GPU called the Multiply Accumulate Unit (MAC), which is revolutionizing graphics.

3 Agenda: Marketing (Jigar); Project and Algorithm Description (Farhan); Implementation Part I (Farhan); Implementation Part II (Sonali); Floorplan (Sonali); Layout (Avni); Verification (Avni); Design Specifications (Avni); Conclusion (Jigar).

4 Marketing (Jigar)

5 Purpose: MAD MAC 525 accelerates FP16 blending to enable true HDR graphics. Huh?? (Section bar: Marketing, Description, Implementing, Floorplan, Layout, Specifications, Verify)

6

7 Beauty of High Dynamic Range. With HDR rendering, pixel intensity can extend beyond the range of traditional graphics. Nature doesn’t have a limited pixel intensity, and neither should computer graphics. In other words: bright things can be really bright, dark things can be really dark, and the details can be seen in both.

8 Applications of HDR

9 Target Market. Target market segment: graphics chip manufacturers; high-speed DSP manufacturers; CPU co-processors. Potential customers.

10 Design Comparison. The top 180nm graphics chip is the NVIDIA NV16. Its highest speed is only 250MHz, with 9-bit integer precision. As games become more advanced, they need fast graphics chips. Conclusion: the market needs a FAST MAD MAC.

11 Description and Implementation I (Farhan)

12 Project Description: Multiply Accumulate unit (MAC). Executes the function AB+C on 16-bit floating point inputs. Format: 1-bit sign, 5-bit exponent and 10-bit significand. Multiply and add proceed in parallel to greatly speed up the operation. Rounding is performed only once, so the result is more accurate than separate multiply and add operations. Also known as Fused Multiply Add (FMA), or Multiply Add (MAD/MADD) in graphics shader programs.
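The single-rounding point on this slide can be illustrated with a small software model. This is only a sketch, not the team's hardware: `round_to_half`, `fused_mac` and `separate_mac` are hypothetical helper names, exact rational arithmetic stands in for the wide internal datapath, and overflow to infinity is not modelled.

```python
from fractions import Fraction

def round_to_half(x):
    """Round an exact value to the nearest FP16 number (ties to even).

    Assumption: overflow to infinity is ignored; subnormals are handled.
    """
    x = Fraction(x)
    if x == 0:
        return 0.0
    mag = abs(x)
    # Find e such that 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    e = max(e, -14)                    # FP16 minimum normal exponent
    quantum = Fraction(2) ** (e - 10)  # spacing given a 10-bit significand field
    n = round(mag / quantum)           # Fraction rounding is ties-to-even
    value = float(n * quantum)
    return value if x > 0 else -value

def fused_mac(a, b, c):
    """A*B+C with one rounding at the end."""
    return round_to_half(Fraction(a) * Fraction(b) + Fraction(c))

def separate_mac(a, b, c):
    """Round after the multiply, then round again after the add."""
    product = round_to_half(Fraction(a) * Fraction(b))
    return round_to_half(Fraction(product) + Fraction(c))

a = b = 1 + 2 ** -10
c = -(1 + 2 ** -9)
print(fused_mac(a, b, c), separate_mac(a, b, c))
```

With these inputs the exact product is 1 + 2^-9 + 2^-20. Rounding the product first discards the 2^-20 term, so the separate version cancels to zero while the fused version keeps it.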

13 Algorithm. FP Multiply (A*B): multiply significands; add exponents; normalize; round. FP Add (A+B): align smaller number to larger number; add significands; normalize; round.

14 Algorithm. FP Multiply-Add (AB+C): align sig C based on exp A+B-C; multiply significands A and B; add the sig A*B result to the aligned sig C; normalize; round.
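The multiply-add steps listed on this slide can be sketched on integer fields. This is an illustrative model under stated assumptions, not the chip's datapath: the function name `fma_fields` is hypothetical, operands are positive normal numbers written as value = sig * 2**exp with an 11-bit `sig` (hidden one included), and Python's unbounded integers replace the fixed-width shifters real hardware would use.

```python
def fma_fields(sig_a, exp_a, sig_b, exp_b, sig_c, exp_c):
    """Return (sig, exp) of A*B + C rounded to 11 significand bits.

    Assumptions: positive normal operands, value = sig * 2**exp,
    1024 <= sig <= 2047, no overflow or underflow handling.
    """
    # Exponent calculation and alignment distance ("exp A + B - C").
    exp_p = exp_a + exp_b
    shift = exp_p - exp_c

    # Multiply significands.
    sig_p = sig_a * sig_b

    # Align the two terms to a common exponent, then add.
    if shift >= 0:
        total, exp_t = (sig_p << shift) + sig_c, exp_c
    else:
        total, exp_t = sig_p + (sig_c << -shift), exp_p

    # Normalize: the product alone has at least 21 bits, so bits must be dropped.
    extra = total.bit_length() - 11
    kept = total >> extra
    rem = total & ((1 << extra) - 1)

    # Round once, to nearest with ties to even.
    half = 1 << (extra - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    if kept == 1 << 11:  # rounding carried out of the top bit
        kept >>= 1
        extra += 1
    return kept, exp_t + extra

# 1.5 * 2.0 + 0.25, with each operand written as sig * 2**exp
sig, exp = fma_fields(1536, -10, 1024, -9, 1024, -12)
print(sig * 2.0 ** exp)
```

In this sketch the alignment of C does not depend on the product bits, which is why the slide's ordering lets the align step run alongside the multiplier.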

15 Block Diagram (diagram labels: inputs A, B, C; Multiplier; Exp Calc; Align; Adder; Normalize; Round; Ovf Checker; Leading 0 Anticipator; Output Y)

16 Implementation. Design target: 300MHz. Speed is the design goal. Ambitious target? How we planned to achieve this: fast logic (parallelize ops as much as possible); pipelining.

17 Implementation: Adder. Carry Select vs Carry Lookahead tree.

18 Implementation: Adder. Han-Carlson based carry lookahead adder. 6 lookahead logic stages for a 32-bit adder. Less logic than a Kogge-Stone adder. Less wiring than a Brent-Kung adder.
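A bit-level software model shows where the six lookahead stages for 32 bits come from. This is a sketch of the textbook Han-Carlson prefix network, assumed here to match the general structure the slide refers to; the function name `han_carlson_add` and the zero carry-in are assumptions, and the team's actual gate-level design is not shown in the transcript.

```python
def han_carlson_add(a, b, width=32):
    """Add two unsigned integers modulo 2**width with a Han-Carlson network.

    Returns (sum, number_of_prefix_stages). Carry-in is assumed to be 0.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate bits
    G, P = g[:], p[:]

    def combine(hi, lo, src_g, src_p):
        # Prefix operator: group 'hi' absorbs the lower group 'lo'.
        return src_g[hi] | (src_p[hi] & src_g[lo]), src_p[hi] & src_p[lo]

    # Stage 1: every odd bit absorbs its even neighbour.
    for i in range(1, width, 2):
        G[i], P[i] = combine(i, i - 1, G, P)
    stages = 1

    # Middle stages: Kogge-Stone style doubling on the odd bits only.
    d = 2
    while d < width:
        old_g, old_p = G[:], P[:]
        for i in range(d + 1, width, 2):
            G[i], P[i] = combine(i, i - d, old_g, old_p)
        stages += 1
        d *= 2

    # Final stage: even bits pick up the finished prefix of the odd bit below.
    for i in range(2, width, 2):
        G[i], P[i] = combine(i, i - 1, G, P)
    stages += 1

    # Sum bit i = propagate XOR carry into bit i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i else 0
        total |= (p[i] ^ carry_in) << i
    return total, stages

print(han_carlson_add(123456789, 987654321))
```

For a width of 32 the model uses one pairing stage, four doubling stages (distances 2, 4, 8, 16) and one final stage, which agrees with the six stages quoted on the slide.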

19 Implementation: Multiplier. Carry-save multiplier. Avoids having a ripple carry in every stage. Enables a regular and compact layout. Easy to pipeline. Final 10-bit add stage uses a carry lookahead adder.
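The carry-save idea can be shown in a few lines: each row compresses three numbers into a sum word and a carry word with no carry rippling along the row, and only the final addition propagates carries. This is a generic sketch, not the team's array; `carry_save_multiply` is a hypothetical name and the 11-bit default width (10-bit significand plus hidden bit) is an assumption.

```python
def carry_save_multiply(a, b, width=11):
    """Multiply two unsigned integers using carry-save accumulation.

    Assumption: b fits in 'width' bits.
    """
    sum_word, carry_word = 0, 0
    for i in range(width):
        # Partial product for bit i of the multiplier.
        partial = (a << i) if (b >> i) & 1 else 0
        # 3:2 compression: bitwise full adders, carries saved for the next row.
        new_sum = sum_word ^ carry_word ^ partial
        new_carry = ((sum_word & carry_word) |
                     (sum_word & partial) |
                     (carry_word & partial)) << 1
        sum_word, carry_word = new_sum, new_carry
    # Single carry-propagate addition at the end.
    return sum_word + carry_word

print(carry_save_multiply(1536, 1024))
```

Each loop iteration preserves the invariant that sum_word + carry_word equals the partial products accumulated so far, so the delay per row is one full adder regardless of operand width.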

20 Implementation: Leading Zero Anticipator. Predicts the number of shifts to do in normalize. Normalize begins with zero delay. Operates in parallel with the adder, so normalize shifts can be predicted to within 1 shift to the left or right.

21 Implementation: Latches. Pulse latches. Practically eliminates setup time. 16 transistors per pulse generator. Simplified version of those used in a certain high speed CPU. (Diagram label: Clock pulse generator)

22 Implementation II and Floorplan (Sonali)

23 Design Decision: Pass Logic. Extensive use of pass logic: reduces transistor count; reduces area. Transistor count reduced from 20,200 to 12,800. Example: Normalize: > 942; Align: > 530. Ensure all pass logic is buffered.

24 Design Decision: Pipelining. Initially planned 6 pipeline stages. Reduced to 4 pipeline stages: Adder, fast carry lookahead architecture; Multiplier, ripple carry to carry lookahead.

25 Pipeline Stages (diagram labels: Multiplier, Align C, Reg A, Exp Calc, Reg C, Adder, Ld Zero, Normalize, Round, Reg B, Output)

26 Schematics: Multiplier (diagram labels: Inputs, Pipeline, Outputs)

27 Schematic: Adder (diagram labels: Inputs, Look Ahead Logic, Sum Logic, Outputs)

28 Floorplan Evolution: Initial Floorplan (diagram labels: Multiplier, Align C, Reg A, Reg B, Exp Calc, Reg C, Pipeline Reg, Adder, Ld Zero, Pipeline Reg, Normalize, Round, Reg Y, Overflow checker)

29 Floorplan Evolution: Final Floorplan (diagram labels: Exponents, Align, Ld zero, Adder, Multiplier, Normalize, Round, Ovf, Reg B, Output, Reg A, Reg C)

30 Layout, Verification & Specification (Avni)

31 Layout Decisions. 3 cell heights: 6.03, 5.04 and 3.55. Uniform-width vdd and ground rails. Wider vdd and ground rails in power-hungry modules. Max of 8 latches per clock pulse generator. Uniform metal directionality within each block.

32 Final Layout

33 Final Layout: Multiplier

34 Multiplier. Height: ; Width: ; Area: 20,388 (diagram labels: In, Pipeline Reg, Output, Bitslice)

35 Final Layout: Multiplier, Adder

36 Adder. Height: 122.9; Width: ; Area: 13,202 (diagram labels: Adder, Incrementer)

37 Final Layout (diagram labels: Exponents, Align, Ld zero, Adder, Multiplier, Normalize, Round, Ovf, Input, Out)

38 Layer Masks. Active: 14.04%

39 Layer Masks. Poly: 9.25%

40 Layer Masks. Metal 1: 34.08%

41 Layer Masks. Metal 2: 18.00%

42 Layer Masks. Metal 3: 14.99%

43 Layer Masks. Metal 4: 6.23%

44 Verification of Design. Behavioral and structural Verilog. Extensive testing: unable to find C or Matlab code. Schematic and layout testing. Analog simulations: compare output with behavioral. Full chip verification.

45 Design Specifications. Critical path delay = 2.25ns. Clock speed = 400MHz. Pipeline stages = 4. Height by width = um * um. Area = 59,214 um^2. Aspect ratio = 1:1.55. Transistor density = 0.22. Total pin count = 67.
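The timing figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the quoted critical path, clock speed and stage count; the function name `timing_summary` is hypothetical, and the latency figure assumes each of the 4 stages takes exactly one clock cycle, which the slide does not state.

```python
def timing_summary(critical_path_ns, clock_mhz, pipeline_stages):
    """Derive clock period, maximum frequency, slack and latency."""
    period_ns = 1000.0 / clock_mhz            # clock period at the quoted speed
    fmax_mhz = 1000.0 / critical_path_ns      # limit set by the critical path
    slack_ns = period_ns - critical_path_ns   # margin left in each cycle
    latency_ns = pipeline_stages * period_ns  # assumes one cycle per stage
    return period_ns, fmax_mhz, slack_ns, latency_ns

period, fmax, slack, latency = timing_summary(2.25, 400.0, 4)
print(f"period {period} ns, fmax {fmax:.1f} MHz, slack {slack} ns, latency {latency} ns")
```

A 2.25ns critical path allows roughly 444MHz, so the quoted 400MHz clock leaves 0.25ns of margin per cycle and exceeds the 300MHz design target mentioned earlier in the talk.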

46 Power table. Columns: Schematic Power, mW (400 MHz); Layout Power, mW (400 MHz); Schematic Power, mW (100 MHz); Layout Power, mW (100 MHz). Rows: Multiplier w/ pipeline; Exponents; Align; Adder; Leading 0; Normalize; Round; OvfCheck; Total.

47 Block summary table. Columns: Area (um^2); Transistor Count; Transistor Density; Schematic Delay (ns); Layout Delay (ns). Rows: Multiplier w/ pipeline: N/A, 2.25; Exponents: 5,; Align: 3,; Adder: 13,; Leading 0: 1,; Normalize: 3,; Round: 1,; OvfCheck; Registers, etc: N/A, 2038, N/A; Total: 59,214, 12,

48 Conclusion (Jigar)

49 Everyone Needs a MAD MAC. Graphics: HDR rendering, blending and shader ops. Fastest 180nm GPU: 250 MHz (9-bit int). MAD MAC 525: 400 MHz (16-bit FP).

50 Everyone Needs a MAD MAC. DSPs: computing vector dot-products in digital filters.

51 Everyone Needs a MAD MAC. Enables fast division and square root. Eliminates extra hardware to handle such computation. Available in many new CPUs such as STI’s Cell.

52 Future Enhancements. 16 to 32 bits. Newer process technology. Possible modifications for low power apps.

53 Everyone Wants A MAD MAC 525