Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates
Yu Hu, Satyaki Das, Steve Trimberger, and Lei He


Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates
Yu Hu (1), Satyaki Das (2), Steve Trimberger (2), and Lei He (1)
(1) Electrical Engineering Dept., UCLA
(2) Research Labs, Xilinx Inc.
Presented by Yu Hu
Address comments to

Outline
- Introduction
- Design of the Macro-gates
- Synthesis for the Proposed FPGA Architecture
- Comparison of Heterogeneous FPGA Architectures
- Conclusions and Future Work

Heterogeneity in FPGA Architectures
- Heterogeneity among SLICEs
  - Programmable logic and routing
  - Tiles are not identical
    - soft logic fabric [Kaviani, FPGA’96]
    - hard structures [Jamieson, FPL’05]
  - Dedicated hard structures, e.g., DSP and memory blocks
- Heterogeneity within a SLICE
  - Programmable logic and routing
  - Tiles (SLICEs) are identical
  - Different logic elements exist within a SLICE
    - e.g., LUTs with different sizes [Cong, FPGA’99]
    - e.g., mixed PLAs and LUTs [Cong, TODAES’05]
    - e.g., mixed macro-gates and LUTs
(source:

Heterogeneous FPGA with Macro-Gates
- There is a trade-off between programmability and cost when choosing between LUTs and macro-gates
  - Xilinx V4 benefits from small gates (MUX2, XOR2) built into SLICEs.
- The benefit of wider macro-gates
  - The effectiveness of incorporating wider logic functions (macro-gates) is not clear.
- Our contributions
  - Design a new FPGA architecture with mixed LUTs and macro-gates
  - Propose a new automatic synthesis flow for mapping a circuit to the proposed FPGA architecture
  - Evaluate the architecture and show that it reduces delay and area by 16.5% and 30%, respectively, compared to the LUT-only architecture.

Overview of Macro-Gate Design
- Key problem
  - Select the logic functions for the macro-gate
- Problem formulation
  - Input: a set of training circuits that have been mapped to K-input LUTs
  - Output: N K-input Boolean functions f_1, …, f_N
  - Objective: maximize the number of logic functions (in the training circuit set) that can be implemented by f_1, …, f_N
- The proposed solution
  - Rank the logic functions extracted from a set of training circuits

NPN-Class Diagram: Organization of Logic Functions
- Canonical and efficient representation of all NPN classes
  - NPN-equivalent: functionally equivalent under input negation, input permutation, or output negation
  - E.g., f(a,b,c)=a+bc and g(a,b,c)=b’a+b’c
- The NPN-cofactor relationship is indicated
- DAG: easy to manipulate
- It becomes impractical to compute for functions with more than 6 inputs!
  - Solution: Utilization NPN-Class Diagram
[Figure: diagram levels, from wider to narrower inputs: Level 3 (3-input), Level 2 (2-input), Level 1 (1-input), Level 0 (constant)]
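The NPN-equivalence example on this slide can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes functions are small enough to enumerate as truth tables and finds a canonical representative by brute force over all input permutations, input negations, and output negation. The helper names (truth_table, npn_canonical, npn_equivalent) are made up for illustration.

```python
from itertools import permutations, product

def truth_table(func, n):
    """Truth table of an n-input function as a tuple of 0/1 values."""
    return tuple(int(bool(func(*bits))) for bits in product((0, 1), repeat=n))

def npn_canonical(table, n):
    """Smallest truth table reachable by input permutation, input negation,
    and output negation (brute force, only practical for small n)."""
    rows = list(product((0, 1), repeat=n))
    index = {bits: i for i, bits in enumerate(rows)}
    best = None
    for perm in permutations(range(n)):
        for neg in product((0, 1), repeat=n):
            # Table of the function with permuted and selectively negated inputs.
            variant = tuple(
                table[index[tuple(bits[perm[i]] ^ neg[i] for i in range(n))]]
                for bits in rows
            )
            for out_neg in (0, 1):
                cand = tuple(v ^ out_neg for v in variant)
                if best is None or cand < best:
                    best = cand
    return best

def npn_equivalent(f, g, n):
    return npn_canonical(truth_table(f, n), n) == npn_canonical(truth_table(g, n), n)

f = lambda a, b, c: a or (b and c)                       # f = a + bc
g = lambda a, b, c: ((not b) and a) or ((not b) and c)   # g = b'a + b'c
h = lambda a, b, c: a ^ b ^ c                            # XOR3, a different class

print(npn_equivalent(f, g, 3))  # True
print(npn_equivalent(f, h, 3))  # False
```

The search visits n! * 2^(n+1) transformations per function, which is consistent with the slide's remark that exhaustive treatment becomes impractical beyond about 6 inputs.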

UND: Utilization NPN-Class Diagram
- The UND is a DAG and a sub-graph of the NCD (NPN-Class Diagram)
- It helps in scoring and ranking functions
[Figure: UND whose nodes carry three fields, named in the legend as functionality, implementation capability, and appearance frequency. Node labels: ab’c’+a’bc’ / 1 / xx%, abc / 1 / xx%, ab / 0 / xx%, ab’+a’b / 0 / xx%, a / 0 / xx%, -0- / 0 / xx%]

UND: Utilization NPN-Class Diagram (continued)
[Figure: the same UND, with the second field of ab’+a’b and of a changed from 0 to 1. Node labels: ab’c’+a’bc’ / 1 / xx%, abc / 1 / xx%, ab’+a’b / 1 / xx%, a / 1 / xx%, ab / 0 / xx%, -0- / 0 / xx%]

UND: Utilization NPN-Class Diagram (continued)
- Calculate implementation capability
- The DAG topology of the UND lets us efficiently explore different metrics for function ranking, e.g., utilization rate.
[Figure: fanout cone of ab’c’+a’bc’ highlighted. Node labels: ab’c’+a’bc’ / 1 / 75%, abc / 1 / 50%, ab’+a’b / 1 / 50%, a / 1 / 25%, ab / 0 / 25%, -0- / 0 / xx%]

Recap: Overall Flow for Macro-Gate Design
- Map with LUT-N
- Extract logic functions
- Generate the Utilization NPN Diagram
- Calculate a score for each logic function
- Rank the logic functions
[Figure: example run of the flow. A circuit is mapped to LUTs; the extracted function counts include and2(3), inv(1), nand2(2), ...; the UND nodes are annotated as on the previous slides; the score arithmetic is only partly legible ("1+1*1/2= *1/2= *1/3= *2/3+1*1/3=2"). Best function: ab’c’+a’bc’]
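The ranking step can be illustrated with a small sketch. The exact scoring formula on the slide is not recoverable, so the code below swaps in a simpler metric as a stated assumption: a candidate macro-gate is credited with every training function that is NPN-equivalent to the candidate or to one of its cofactors (inputs tied to constants), weighted by how often that function appears. The and2, inv, and nand2 counts come from the slide; the other counts and all helper names are hypothetical.

```python
from itertools import permutations, product

N = 3
ROWS = list(product((0, 1), repeat=N))
INDEX = {r: i for i, r in enumerate(ROWS)}

def table(func):
    """Truth table of a 3-input function (unused inputs are simply ignored)."""
    return tuple(int(bool(func(*r))) for r in ROWS)

def canon(t):
    """Brute-force NPN canonical form of a 3-input truth table."""
    best = None
    for perm in permutations(range(N)):
        for neg in product((0, 1), repeat=N):
            v = tuple(t[INDEX[tuple(r[perm[i]] ^ neg[i] for i in range(N))]] for r in ROWS)
            for cand in (v, tuple(1 - x for x in v)):
                if best is None or cand < best:
                    best = cand
    return best

def implementable_classes(t):
    """NPN classes of t and of every cofactor obtained by tying inputs to 0 or 1."""
    classes = set()
    for fixed in product((None, 0, 1), repeat=N):
        cof = tuple(
            t[INDEX[tuple(r[i] if fixed[i] is None else fixed[i] for i in range(N))]]
            for r in ROWS
        )
        classes.add(canon(cof))
    return classes

# (function, appearance count) pairs extracted from the training circuits.
TRAINING = [
    (table(lambda a, b, c: a and b), 3),         # and2  (count from the slide)
    (table(lambda a, b, c: not a), 1),           # inv   (count from the slide)
    (table(lambda a, b, c: not (a and b)), 2),   # nand2 (count from the slide)
    (table(lambda a, b, c: b if a else c), 2),   # mux2  (hypothetical count)
    (table(lambda a, b, c: a and b and c), 1),   # and3  (hypothetical count)
    (table(lambda a, b, c: a ^ b), 1),           # xor2  (hypothetical count)
]

CANDIDATES = {
    "and3": table(lambda a, b, c: a and b and c),
    "mux2": table(lambda a, b, c: b if a else c),
    "ab'c'+a'bc'": table(lambda a, b, c: (a ^ b) and not c),
}

def score(candidate):
    """Number of training function instances the candidate can implement."""
    classes = implementable_classes(candidate)
    return sum(count for t, count in TRAINING if canon(t) in classes)

scores = {name: score(t) for name, t in CANDIDATES.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)      # {'and3': 7, 'mux2': 8, "ab'c'+a'bc'": 7}
print(ranking[0])  # mux2
```

Because input negation is treated as free here, the outcome depends on that assumption and on the made-up counts; it is not meant to reproduce the ranking shown in the slide figure.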

Proposed Macro-Gates and FPGA Architecture
- For the IWLS’05 benchmarks, the following four 6-input functions have the highest ranks
  - GI1 = a b c d e f (AND-6)
  - GI2 = a’ b’ c’ + b c f’ + b c’ d’ + b’ c e (MUX-4)
  - GI3 = a b’ c d’ e + b c e f + d e f
  - GI4 = a b’ + a’ c d’ + b’ c’ + e’ + f’
- These functions can implement over 50% of the logic functions in the IWLS’05 benchmarks.
[Figure: architecture of the proposed macro-gate and FPGA SLICE]

Outline
- Design of the Embedded Macro-gates
- Synthesis for the Proposed FPGA Architecture
  - Technology Mapping for Heterogeneous FPGAs
  - SAT-based Packing
  - Placement and Routing
- Comparison of Heterogeneous FPGA Architectures
- Conclusions and Future Work

Functional & Structural Cut Enumeration
- Phase 1: Enumerate and label cuts from PIs to POs
  - Check the feasibility of a cut w.r.t. the macro-gate
- Phase 2: Select the best choice from POs to PIs
- A general yet efficient solution is SAT-based Boolean matching
  - Exploiting Symmetry in SAT-Based Boolean Matching for Heterogeneous FPGA Technology Mapping, Session 5C.1, ICCAD 07
[Figure: example network with inputs w, x, y, z and nodes a=(x+y)’, b=y+wz, d=ab=(x+y)’(y+wz)=x’y’wz. Is x’y’wz in the 4-input macro-gate library? Yes]
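The example on this slide can be reproduced with a short sketch of K-feasible cut enumeration followed by a functional check. This is an illustration under assumptions, not the authors' mapper: the library is a single hypothetical entry, and the match is an exact truth-table comparison over a fixed leaf order rather than the SAT-based Boolean matching the slide refers to.

```python
from itertools import product

PRIMARY_INPUTS = ("w", "x", "y", "z")
# node -> (fanins, local function); insertion order is a topological order.
NETWORK = {
    "a": (("x", "y"), lambda x, y: not (x or y)),            # a = (x+y)'
    "b": (("y", "w", "z"), lambda y, w, z: y or (w and z)),  # b = y + wz
    "d": (("a", "b"), lambda a, b: a and b),                 # d = ab
}

def enumerate_cuts(k):
    """All cuts with at most k leaves for every node, from PIs towards POs."""
    cuts = {pi: {frozenset([pi])} for pi in PRIMARY_INPUTS}
    for node, (fanins, _) in NETWORK.items():
        merged = {frozenset()}
        for fanin in fanins:
            merged = {m | c for m in merged for c in cuts[fanin] if len(m | c) <= k}
        cuts[node] = merged | {frozenset([node])}  # add the trivial cut
    return cuts

def evaluate(node, leaf_values):
    """Value of a node given values for the leaves of a cut."""
    if node in leaf_values:
        return bool(leaf_values[node])
    fanins, func = NETWORK[node]
    return bool(func(*(evaluate(f, leaf_values) for f in fanins)))

def cut_table(root, cut):
    """Truth table of root over the (sorted) leaves of the cut."""
    leaves = sorted(cut)
    return tuple(
        int(evaluate(root, dict(zip(leaves, bits))))
        for bits in product((0, 1), repeat=len(leaves))
    )

# Hypothetical 4-input macro-gate library with one entry, over leaves (w, x, y, z).
LIBRARY = {
    "x'y'wz": tuple(
        int((not x) and (not y) and w and z)
        for w, x, y, z in product((0, 1), repeat=4)
    ),
}

cuts = enumerate_cuts(4)
widest = frozenset("wxyz")
print(len(cuts["d"]))                                    # 5
print(cut_table("d", widest) in LIBRARY.values())        # True
```

A real matcher would also have to consider input permutations and negations of the library cell, which is where NPN canonical forms or Boolean matching come in.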

Key in Technology Mapping: Balance Resource Utilization
- An asymmetric architecture causes problems for resource utilization
- Exclusive use of one logic resource leaves much of the fabric unused
- Simple yet effective solution:
  - Change the LUT-MG ratio by adjusting the area weights of LUTs and macro-gates.
  - Precise calibration is hard to reach with this approach.
[Figure: plot annotated with "Total# too large!", "Hard to obtain precise calibration", "Objective architecture: LUT6:MacroGate6 = 1:1", and "Best LUT-MG ratio = 1:1", where LUT-MG ratio = LUT#/MG#]

Post-Mapping Area Recovery (motivating example)
- Given:
  - Target architecture = LUT6 + MG6
  - LUT-MG ratio in target architecture = 1:1
  - LUT# < MG# in the mapped design
  - Intrinsic delay (LUT6 : MG6) = 5:4
- Objective: balance the numbers of LUTs and MGs without increasing delay
- Timing target violation! Timing slack budgeting is necessary!
[Figure, shown in three builds: a mapped network from PI to PO with LUT6 and MG6 nodes, each annotated with a pair of timing values (e.g., 5 / 5, 4 / 5, 9 / 9, 8 / 9; some values are lost in extraction). The later builds introduce a further LUT6, and in the last build a node annotated 10 / 9 is flagged as a timing target violation]

Post-Mapping Area Recovery by Timing Budgeting
- Formulated as an Integer Linear Programming (ILP) problem
  - Objective (minimize the gap between target and actual LUT-MG ratios): min |m_2 + … + m_7 − 7/2|
  - Arrival time constraints: a_i + d_j + b_j <= a_j
  - Clock period target: a_i <= 17
  - LUT assignment with given timing slack: (5 − 4) * m_j <= b_j, m_j ∈ {0,1}
- Easily generalized to architectures
  - with multiple macro-gates
  - with different input pin counts
[Figure: network from PI to PO with one LUT6 and several MG6 nodes, labeled with arrival times a1 to a7]
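To make the formulation concrete, here is a small sketch that solves an instance by exhaustive search instead of an ILP solver. The network topology is hypothetical (the slide figure is not recoverable); the delays 5 and 4, the clock target 17, and the objective |m_2 + … + m_7 − 7/2| are taken from the slide. Setting m_j = 1 remaps node j from a macro-gate to a LUT, which consumes one unit of slack (b_j = 1).

```python
from itertools import product

LUT_DELAY, MG_DELAY = 5, 4
CLOCK_TARGET = 17
TARGET_LUT_COUNT = 7 / 2

# Hypothetical network: node -> fanin nodes. Node 1 is already a LUT6,
# nodes 2..7 are MG6 and are candidates for remapping to LUT6.
FANINS = {1: [], 2: [], 3: [1], 4: [2], 5: [4], 6: [5], 7: [3]}
FIXED_LUT = {1}
CANDIDATES = [2, 3, 4, 5, 6, 7]

def arrival_times(m):
    """Longest-path arrival time at each node for a given LUT assignment m."""
    arrival = {}
    for node in sorted(FANINS):  # node numbers follow a topological order
        is_lut = node in FIXED_LUT or m.get(node, 0) == 1
        delay = LUT_DELAY if is_lut else MG_DELAY
        arrival[node] = max((arrival[f] for f in FANINS[node]), default=0) + delay
    return arrival

def solve():
    """Enumerate all assignments, keep the feasible one with the smallest gap."""
    best = None
    for bits in product((0, 1), repeat=len(CANDIDATES)):
        m = dict(zip(CANDIDATES, bits))
        if max(arrival_times(m).values()) > CLOCK_TARGET:
            continue  # violates the clock period target
        gap = abs(sum(bits) - TARGET_LUT_COUNT)
        if best is None or gap < best[0]:
            best = (gap, m)
    return best

gap, assignment = solve()
print(gap)                       # 0.5
print(sum(assignment.values()))  # 3
```

In this instance the path 1-3-7 has enough slack for both of its macro-gates to become LUTs, while the path 2-4-5-6 can absorb only one remapping, so three remapped nodes is the best reachable count. Exhaustive search is only viable for a handful of nodes; an ILP solver would replace solve() on real designs.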

Outline
- Design of the Embedded Macro-gates
- Synthesis for the Proposed FPGA Architecture
  - Technology Mapping for Heterogeneous FPGAs
  - SAT-based Packing
- Comparison of Heterogeneous FPGA Architectures
- Conclusions and Future Work

SAT-Based Packing
- Motivation
  - Traditional packing tools, e.g., T-VPack, hard-code the architecture specification of a SLICE, so they must be re-implemented from scratch when the architecture changes
  - Propose a unified implementation of the packers for different architectures: easy to perform architecture exploration!
- The architecture-dependent sub-problem in packing
  - Structural feasibility checking of a sub-circuit against the SLICE
- Solution
  - Treat the validation of SLICE packing as a local place-and-route problem
  - Use a SAT solver to carry out the validation check

Example of SAT-Based SLICE Packing
Examples of constraints, one per class of constraint (the formulas are only partly legible):
- Placement and routing choice variables: U 10
- Exclusivity constraint: ∨
- Presence constraint: ∨
- Input/Output constraint: → U 10
- Routing constraint: G 0 →out ∧ U 10 ) → U 12
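Since the slide's formulas are not recoverable, the following sketch shows one plausible way to encode SLICE packing validation as a SAT problem; it is an assumption-laden illustration, not the authors' encoding. The SLICE (two LUT sites and one macro-gate site), its internal wires, and the block names are hypothetical, and a brute-force enumerator stands in for a real SAT solver.

```python
from itertools import combinations, product

# Hypothetical SLICE: site name -> site type, plus intra-SLICE wires (src, dst).
SITES = {"L0": "LUT", "L1": "LUT", "M0": "MG"}
WIRES = {("L0", "M0"), ("L0", "L1")}

def build_cnf(blocks, nets):
    """CNF over variables U[block, site] meaning 'block is placed on site'."""
    var = {(b, s): i + 1 for i, (b, s) in enumerate(product(blocks, SITES))}
    cnf = []
    for b, kind in blocks.items():
        cnf.append([var[b, s] for s in SITES])                        # presence
        cnf += [[-var[b, s]] for s, k in SITES.items() if k != kind]  # type match
        cnf += [[-var[b, s], -var[b, t]] for s, t in combinations(SITES, 2)]
    for s in SITES:                                                   # exclusivity
        cnf += [[-var[p, s], -var[q, s]] for p, q in combinations(blocks, 2)]
    for src, dst in nets:                                             # routing
        cnf += [[-var[src, s], -var[dst, t]]
                for s, t in product(SITES, repeat=2)
                if s != t and (s, t) not in WIRES]
    return var, cnf

def solve(num_vars, cnf):
    """Brute-force SAT: return the first satisfying assignment, or None."""
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in cnf):
            return bits
    return None

def pack(blocks, nets):
    """Return a block -> site placement if the sub-circuit fits the SLICE."""
    var, cnf = build_cnf(blocks, nets)
    model = solve(len(var), cnf)
    if model is None:
        return None
    return {b: s for (b, s), v in var.items() if model[v - 1]}

fits = pack({"g0": "LUT", "g1": "MG", "g2": "LUT"}, nets=[("g0", "g1")])
too_many_mgs = pack({"g0": "MG", "g1": "MG"}, nets=[])
print(fits)          # {'g0': 'L0', 'g1': 'M0', 'g2': 'L1'}
print(too_many_mgs)  # None
```

Changing the architecture only means editing SITES and WIRES, which is the kind of architecture independence the packing slides argue for.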

Recap: Overall Synthesis Flow
- Area weight setting
- Cut-based mapping
- Area-balance trade-off? (Y/N)
- Post-mapping area recovery
- Packing
[Figure: example circuit mapped to LUT6 and MG6 blocks and then packed]

Outline
- Motivation and Objectives
- Methodology for Logic Function Exploration
- Technology Mapping for Heterogeneous FPGAs
- Evaluation of Heterogeneous FPGA Architectures
- Conclusions and Future Work

Experimental Setting
- Design library parameters [Cong, TODAES’05]
- Benchmark set: IWLS 2005
- Four architectures are compared: LUT4, LUT4 + macro-gate, LUT6, and LUT6 + macro-gate
- The proposed macro-gate is synthesized with SIS 1.2
- Delay and area model
  - Interconnect delay is ignored

Delay Comparisons
- Compared to LUT4, LUT4+MG reduces both logic depth and delay by 9.2%.
- Compared to LUT6, LUT6+MG reduces delay by 30% while increasing logic depth by 36.5%.
  - A LUT6 can implement more logic functions than a macro-gate

Logic Area Comparisons
- Compared to LUT4, LUT4+MG reduces logic area by 12.5%.
- Compared to LUT6, LUT6+MG reduces logic area by 16.9%.

Outline
- Motivation and Objectives
- Methodology for Logic Function Exploration
- Technology Mapping for Heterogeneous FPGAs
- Comparison of Heterogeneous FPGA Architectures
- Conclusions and Future Work

Conclusions
- A novel FPGA architecture with mixed LUTs and macro-gates is proposed
- A synthesis flow for the proposed architecture is implemented
- Preliminary experimental results show the effectiveness of the proposed architecture in reducing area and delay
Future Work
- Perform physical design for the synthesized circuits, compare routing costs, and evaluate the architecture with interconnect delay taken into account
- Study the effectiveness of the proposed architecture for power reduction
- Examine macro-gates with wider inputs