Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Area-Efficient FPGA Logic Elements: Architecture and Synthesis Jason Anderson and Qiang Wang 1 IEEE/ACM ASP-DAC Yokohama, Japan January 26-28, 2011 1.

Similar presentations


Presentation on theme: "1 Area-Efficient FPGA Logic Elements: Architecture and Synthesis Jason Anderson and Qiang Wang 1 IEEE/ACM ASP-DAC Yokohama, Japan January 26-28, 2011 1."— Presentation transcript:

1 1 Area-Efficient FPGA Logic Elements: Architecture and Synthesis Jason Anderson and Qiang Wang 1 IEEE/ACM ASP-DAC Yokohama, Japan January 26-28, 2011 1 Q. Wang is now with Huawei Technologies (America).

2 2 Want More for Same Money!

3 3 Logic Density Objective Goal: Implement more gates per unit area of silicon.

4 4 FPGA Logic Blocks Use LUTs to implement logic functions. i3i1i2 S S S S S S S S 3-LUT i1i2 S S S S 2-LUT SRAM cell LUT size doubles with each input added. 4-LUT i1 i2 i3 i4 Can realize any 4-input function

5 5 Today’s FPGAs Modern FPGAs use 6-LUTs: –Fewer levels of logic vs. 4-LUTs  leads to higher speed. –But, many logic functions use < 6 inputs  is hard to efficiently utilize 6-LUTs. 6-LUTs in modern chips “fracturable” to improve logic density.

6 6 6-LUT Implementation 64 SRAM cells. 63 2-to-1 multiplexers in the tree. 5-LUT i6 i1 i2 i3 i4 i5

7 7 6-LUTs in Commercial Chips Dual-output 6-LUT Xilinx Virtex-6 LUT ALM Altera Stratix-IV ALM One 6-input function or any two functions that use up to 5 inputs. One 6-input function, two 4-input functions, one 5-input function+ one 3-input function, others.

8 8 Shannon Decomposition Consider Boolean function of n variables: g = f(x 1, x 2, …, x i, …, x n ) Can decompose w.r.t. one of its variables x i : g = x i  f(x 1, x 2, …, 1, …, x n ) + x i  f(x 1, x 2, …, 0, …, x n )

9 9 Shannon Decomposition Consider Boolean function of n variables: g = f(x 1, x 2, …, x i, …, x n ) Can decompose w.r.t. one of its variables x i : g = x i  f(x 1, x 2, …, 1, …, x n ) + x i  f(x 1, x 2, …, 0, …, x n ) 1-cofactor (g x i ) 0-cofactor (g x i )

10 10 Co-Factor Properties Use | g x i | to represent the # of variables in the co-factor: –Size of the co-factor.

11 11 Trimming Input Concept A k-variable logic function, g = f(x 1, …, x k ) has a trimming input x i if either | g x i | < k -1, or if | g x i | < k -1. Example (4 variables): g = ac + a’b’d + d’a 1-cofactor: g d = ac + a’b’ (3 variables) 0-cofactor: g d’ = ac + a = a (1 variable) d is a trimming input.

12 12 How Common Are Trimming Inputs? Logic Circuit Extremely Common!

13 13 Proposed 5+4-LUT Architecture 6-variable functions with a trimming input can fit into: 5-LUT 4-LUT s i6 i5i4i3i2i1 Contains: 32 + 16 + 1 = 49 SRAM cells (vs. 64 in 6-LUT) 31 + 15 + 2 = 48 2-to-1 multiplexers (vs. 63 in 6-LUT)

14 14 Gating Input Concept A k-variable logic function, g = f(x 1, …, x k ) has a gating input x i if either | g x i | = 0, or if | g x i | = 0. Interpretation: The (gating) state of x i causes g to be logic-0 or logic-1. Example (4-variables): g = (a+b+c)d 1-cofactor: g d = a+b+c (3 variables) 0-cofactor: g d’ = logic-0 (0 variables)

15 15 How Common are Gating Inputs? Logic Circuit Still Significant!

16 16 Proposed Extended 5-LUT Architecture 6-variable functions with a gating input can fit into: 5-LUT s i6 i5i4i3i2 s i1 Contains: 32 +2 = 34 SRAM cells (vs. 64) 31 + 2 = 33 2-to-1 multiplexers (vs. 63) Holds the gating state Gating input

17 17 Additional Architectures Two extended 4-LUTs 4-LUT i4i3i2i1 4-LUT s i5 s s i6 M1 M2 M3 35 SRAM cells Extended 4-LUT + extended 3-LUT 3-LUT i4i3i2i1 4-LUT s i5 s s i6 3-LUT s M1 M2 M3 28 SRAM cells i1 3-LUT i4i3i2 4-LUT s i5 s i6 3-LUT s Extended 4+3-LUT M1 M2 27 SRAM cells

18 18 ABC Logic Synthesis Mainly developed at UC Berkeley [Primary developer: Mishchenko starting 2005/2006]. Uses an AND-INVERTER graph (AIG): Quality LUT mapping solutions. abcd z abcd z complemented edge

19 19 Finding Shannon Co-Factors is Easy in ABC z ab c de f z a b c de f logic-0 x x 0-cofactor w.r.t. b logic-0 Observation: setting an AIG input to logic-0 or logic-1 simply “eliminates”a chunk of the AIG.

20 20 Finding Shannon Co-Factors is Easy in ABC z ab c de f z a b c de f logic-0 x x 0-cofactor w.r.t. b logic-0 z = [(de)’f]’ Observation: setting an AIG input to logic-0 or logic-1 simply “eliminates”a chunk of the AIG.

21 21 FPGA Technology Mapping z x a b c d “3-feasible” cuts for node z e Based on notion of “cuts”. Example: mapping to 3-LUTs z z c z x

22 22 Tech Mapping to K-LUTs 1.Walk circuit DAG from PIs to POs: –Compute K-feasible cuts for each node. –Select a “best” cut for each node: Best: area, depth, power, etc. 2.Walk circuit DAG from POs to PIs: –Construct mapped network by instantiating the selected best cuts.

23 23 Technology Mapping Discard cuts that do not meet requirements of target architecture. –Easy check with AIGs/Shannon decomp. Example: 5-LUT 4-LUT s i6 i5i4i3i2i1 5+4-LUT architecture Consider a cut C for function g: 1)If |Inputs(C)| < 6, C qualifies. 2)If |Inputs(C)| = 6, C qualifies iff  i  Inputs(C) such that |g i | < 5 or |g i | < 5.

24 24 Technology Mapping Example 2: Two extended 4-LUTs 4-LUT i4i3i2i1 4-LUT s i5 s s i6 M1 M2 M3 Which functions can fit? 1)Any function that uses  4 inputs. 2)Any 5-input function. 3)Any 6-input function where: a) both co-factors w.r.t. an input require no more than 4 distinct variables OR b) both co-factors w.r.t. an input have a (different) gating input.

25 25 Methodology Two metrics: –# of logic elements (area). –Mapped circuit depth [# of logic elements on the critical path] (speed).

26 26 Methodology 20 MCNC circuits; 13 VPR 5.0 circuits. –VPR 5.0 circuits synthesized to BLIF using Quartus 9.1 / QUIP. Baseline mapper: priority cuts [ICCAD’07] ( if command in depth mode). resyn2 tech. independent synthesis. Tech. indep. synth. + mapping run 6 times for each cct & best result taken.

27 27 Results MCNC VPR 5.0

28 28 Results 5-LUT 4-LUT s i6 i5i4i3i2i1 VPR 5.0 5+4-LUT achieves nearly identical #LEs/depth to 6-LUT

29 29 Results 5-LUT s i6 i5i4i3i2 s i1

30 30 Results 4-LUT i4i3i2i1 4-LUT s i5 s s i6 M1 M2 M3 Better than extended 5-LUT for almost same silicon area.

31 31 Results 3-LUT i4i3i2i1 4-LUT s i5 s s i6 3-LUT s M1 M2 M3

32 32 Results i1 3-LUT i4i3i2 4-LUT s i5 s i6 3-LUT s M1 M2

33 33 Overall Logic Density Estimate Assume baseline 6-LUT FPGA architecture with core tile silicon area breakdown: –50% routing; 25% LUT; 25% other (FF, carry, etc). Assume LUT portion reduced in proportion to # SRAM cells in LE. ArchitectureSRAM cellsRatio vs. 6-LUTs 6-LUT641.00 5-LUT320.50 5-LUT + 4-LUT490.77 Extended 5-LUT340.53 Two ext. 4-LUTs350.55 Ext. 4-LUT + Ext. 3-LUT280.44 Extended 4-LUT + 3-LUT270.42 Ratio of # SRAM cells (<=1) Si area relative to baseline arch = f X (75% + 25% X s) Ratio of # of LEs needed (>=1)

34 34 Area & Area-Depth Product Average across 33 circuits: Depth 5% 14%

35 35 Summary Logic functions very often have trimming inputs. –Inspire new logic element architectures. New elements can achieve estimated 14% improvement in logic density vs. 6-LUTs; 5% improvement in area-depth product. –May also offer power benefits. Results for 7-input elements in paper. Future work: don’t cares; fracturable LUTs.

36 36 Back-Up Slides

37 37 6-LUTS5+4-LUTExt. 5-LUTTwo ext. 4-LUTsExt. 4 + Ext. 3Ext. 4+3-LUT5-LUTs CIRCUITDEPTHAREADEPTHAREADEPTHAREADEPTHAREADEPTHAREADEPTHAREADEPTHAREA alu457745 67516760673467516866 apex268686 69066 690169067990 apex457525 57665769576257666840 bigkey35793 36913689369136923806 clma102795102767103231102917103044103237123179 des4691472558485819586758455948 diffeq8636864397229723973397309757 dsip368936913 3 369436923 elliptic101797101797111828111832111843111842121873 ex101062452624566250562517625006250572755 ex5p5504546854975518551255175594 frisc131735131740131832131834131835131835141842 misex357235 57455746574157516811 pdc71948719807202572071720357202572495 s29886418 86658648866386669731 s3841772567725517299872917730767302483068 s38584.1622876 6255362431625096255472688 seq5780578658135812580958055908 spla61670616987167161804716896180071906 tseng864786518680867987038 9692 GEOMEAN (STD 20 CIRCUITS):6.081075.166.081076.386.321143.296.271138.856.321144.996.271153.476.821241.43 RATIO VS 6-LUTS: 1.0001.0011.0391.0631.0311.0591.0391.0651.0311.0731.1231.155

38 38 6-LUTS5+4-LUTExt. 5-LUTTwo ext. 4-LUTsExt. 4 + Ext. 3Ext. 4+3-LUT5-LUTs DEPTHAREADEPTHAREADEPTHAREADEPTHAREADEPTHAREADEPTHAREADEPTHAREA cf_cordic_v_18_18_189382293866104203104114104593104585114461 cf_fir_24_16_161810929181045218118431811793181360518136301811668 des_perf33264340194431444081450444488144959 mac1171959172019182166182108182487182481212215 mac2316906317116327396327267328406328415397491 oc54222393222385222600222523222761222780242726 paj_boundtop_hierarchy_no_mem412944 4137141327413714 61376 paj_raygentop_hierarchy_no_mem166314166193176693176518177276177281176677 paj_top_hierarchy_no_mem3332967333161434351963433758343947034394403435016 rs_decoder_2111649121751131919121870142043141989142265 sv_chip0_hierarchy_no_mem512615512684612970612909613728613727612955 sv_chip1_hierarchy_no_mem8254738248319269849264259295649 926882 sv_chip2_hierarchy_no_mem1846440194809919500621950070215533321554431950104 GEOMEAN (VPR 5.0 CIRCUITS):11.886384.0212.016505.7112.937001.2912.856837.2813.107689.5613.107658.7513.987233.55 RATIO VS 6-LUTS: 1.0111.0191.0881.0971.0811.0711.1021.2051.1021.2001.1761.133


Download ppt "1 Area-Efficient FPGA Logic Elements: Architecture and Synthesis Jason Anderson and Qiang Wang 1 IEEE/ACM ASP-DAC Yokohama, Japan January 26-28, 2011 1."

Similar presentations


Ads by Google