Farhan Mohamed Ali (W2-1)
Jigar Vora (W2-2)
Sonali Kapoor (W2-3)
Avni Jhunjhunwala (W2-4)
Siven Seth (W2-5)
Presentation 1
MAD MAC
th January, 2006
Design Proposal W2
MAD MAC 525
Status:
- Project chosen
- Specifications defined
- Architecture (in progress)
To be done:
- Verilog: gate-level design
- Schematic
- Floor plan
- Layout
- Extraction, LVS, post-layout simulation
MAD MAC and HDR
MAD MAC accelerates FP16 blending to enable true HDR graphics.
What is HDR?
- HDR = High Dynamic Range
- Dynamic range is the ratio of the largest value of a signal to its lowest measurable value
- The dynamic range of luminance in real-world scenes can be 100,000:1
- With HDR rendering, pixel intensities are allowed to extend beyond the [0..1] range of traditional graphics
- Nature isn't clamped to [0..1], and neither should CG be
In lay terms:
- Bright things can be really bright
- Dark things can be really dark
- And the details can be seen in both
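To make the clamping point concrete, here is a minimal sketch of a 50/50 blend of two bright pixels, once with unclamped (HDR) intensities and once after clamping to [0..1]. The intensity values and the names `clamp01` and `blend` are illustrative assumptions, and the standard alpha-blend equation is assumed because the slides do not say which blend equation is targeted.

```python
def clamp01(x: float) -> float:
    """Clamp an intensity to the traditional [0..1] range."""
    return max(0.0, min(1.0, x))


def blend(src: float, dst: float, alpha: float) -> float:
    """Standard alpha blend (assumed here): src*alpha + dst*(1 - alpha)."""
    return src * alpha + dst * (1.0 - alpha)


# Hypothetical scene intensities; 1.0 is the brightest value a clamped pipeline can hold.
sun, lamp = 16.0, 4.0

hdr_blend = blend(sun, lamp, 0.5)                     # 10.0: relative brightness survives
ldr_blend = blend(clamp01(sun), clamp01(lamp), 0.5)   # 1.0: both inputs collapsed to 1.0
```

Each blend term is a multiply followed by an add, which is the operation the MAC unit is meant to accelerate.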
Multiply Accumulate unit (MAC)
- Executes the function AB+C on 16-bit floating-point inputs
- Multiply and add in parallel to greatly speed up operation
- Rounding is performed only once, giving greater accuracy than separate multiply and add operations
Also known as:
- Fused Multiply Add (FMA)
- Multiply Add (MAD/MADD) in graphics shader programs
Many applications benefit from a fast FMA:
- Graphics: HDR rendering, blending and shader ops
- DSPs: computing vector dot products in digital filters
- Fast division and square root: computed with FMA iterations, eliminating dedicated hardware
Available in many newer CPUs and DSPs because it's so cool
One ring (circuit) to rule them all!
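The accuracy benefit of rounding once can be illustrated with a small software model. This is a sketch, not the proposed hardware. It assumes round-to-nearest-even and uses Python's `struct` half-precision format (`"e"`) to round to fp16. The helper names `to_fp16`, `mul_then_add` and `fused_mac` are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest fp16 value (round-half-to-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mul_then_add(a: float, b: float, c: float) -> float:
    """Two roundings: the product is rounded to fp16, then the sum is rounded."""
    return to_fp16(to_fp16(a * b) + c)


def fused_mac(a: float, b: float, c: float) -> float:
    """One rounding: a*b + c is formed in double precision, then rounded once.

    The double-precision intermediate is exact for the operands below;
    this is not a bit-exact model for every possible fp16 input.
    """
    return to_fp16(a * b + c)


a = b = 1.0 + 2.0 ** -10      # exactly representable in fp16
c = -(1.0 + 2.0 ** -9)        # exactly representable in fp16

separate = mul_then_add(a, b, c)   # 0.0: the 2**-20 term is lost when the product is rounded
fused = fused_mac(a, b, c)         # 2**-20: the small term survives the single rounding
```

The exact product is 1 + 2^-9 + 2^-20. Rounding it to fp16 before the add discards the 2^-20 term, so the separate path returns zero while the fused path returns the true result.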
Design Decisions:
- Implementing a 16-bit floating-point (fp16) format
  - 1-bit sign, 10-bit significand and 5-bit exponent
  - Range of 6.0e-8 to 6.5e4
  - Used in industry today in HLSL graphics shaders
  - Compatible with the OpenEXR format used in the latest games
- Adder: high-speed custom hybrid adder
- Multiplier: array multiplier for speed and pipelining
- Speed: target >= 300 MHz on a 180 nm manufacturing process
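The stated bit layout and range can be checked with a short decoder. This sketch assumes the IEEE 754 binary16 / OpenEXR half conventions: exponent bias of 15, an implicit leading 1 for normal numbers, and subnormals at exponent 0. The slides do not say whether the design itself handles subnormals, infinities or NaNs, and `decode_fp16` is a hypothetical helper name.

```python
def decode_fp16(bits: int) -> float:
    """Decode a 16-bit pattern as sign (1) | exponent (5) | fraction (10)."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed scale of 2**-14.
        value = (fraction / 1024) * 2.0 ** -14
    elif exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction 0) or NaN.
        value = float("inf") if fraction == 0 else float("nan")
    else:
        # Normal number: implicit leading 1, exponent bias of 15.
        value = (1 + fraction / 1024) * 2.0 ** (exponent - 15)

    return -value if sign else value


smallest = decode_fp16(0x0001)   # smallest positive subnormal, 2**-24 (about 6.0e-8)
largest = decode_fp16(0x7BFF)    # largest finite value, 65504 (about 6.5e4)
```

The lower bound of 6.0e-8 on the slide corresponds to the smallest subnormal. The smallest normal value is 2^-14, roughly 6.1e-5.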
Block diagram
Estimated Transistor Count
Block                                          Transistors
Registers (input, output, pipelining)
Array multiplier                               6000
Adder                                          2000
Alignment shifter + leading-zero anticipator   2000
Normalize                                      2000
Rounding                                       1500
Special cases and control logic                2000
Total                                          18000
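As a quick arithmetic check of the estimate, the sketch below sums the blocks that have a figure and compares the result with the stated total. The remainder is only what the numbers imply. The slide gives no figure for the registers, so attributing the remainder to them is an assumption.

```python
# Per-block estimates as listed on the slide (registers have no figure).
estimates = {
    "Array multiplier": 6000,
    "Adder": 2000,
    "Alignment shifter + leading-zero anticipator": 2000,
    "Normalize": 2000,
    "Rounding": 1500,
    "Special cases and control logic": 2000,
}

stated_total = 18000
listed = sum(estimates.values())    # 15500
remainder = stated_total - listed   # 2500 left for blocks without a figure
```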
Problems and Questions???
- We would prefer 32-bit for greater accuracy, but the number of pins and the transistor count would be beyond the scope of this class
- Hard to estimate transistor count at this point