High-Level Synthesis for FPGAs: From Prototyping to Deployment (To appear in IEEE TCAD 2011) Course Presentation By: Murtaza Merchant 1.


1 High-Level Synthesis for FPGAs: From Prototyping to Deployment (To appear in IEEE TCAD 2011)
Course Presentation By: Murtaza Merchant

2 Overview
- Introduction
- Motivation
- Reasons for failure
- Historical background
- Design strategies for HLS tool
- AutoPilot HLS design flow for FPGAs
- Platform modeling for FPGAs
- Advances in synthesis and optimization algorithms
- Integration with domain-specific design platforms
- Results
- Conclusion

3 Introduction
- Automatic synthesis of a high-level description into low-level, cycle-accurate (RTL) specifications
- High-level specification
  - Untimed
  - Partially timed
- Targeted for FPGAs
- State-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains

4 Why do we need HLS tools?
- Embedded processors
- Hardware/software co-design
- SoC design complexity
- Behavioral IP reuse
- System-level verification
- Transaction-level modeling (TLM)
- Time-to-market
- Ease of use
- Code density reduced by 7x-10x relative to RTL

5 Reasons for Failure
- Lack of comprehensive design language support
  - Behavioral HDL was used
  - C and C++ lack the constructs and semantics to represent:
    - Design hierarchy
    - Timing
    - Synchronization
    - Concurrency
- Lack of reusable and portable design specification
  - Functional specification highly tool dependent
- Lack of satisfactory quality of results (QoR)

6 Historical Background
- Academic efforts
  - HAL, developed at Bell-Northern Research [DAC '86]
  - ADAM system, developed at USC [DAC '89]
  - Hercules/Hebe HLS system at Stanford [Euro ASIC '90]
  - Hyper/Hyper-LP system at UC Berkeley [ICCAD '92]
- Industry efforts
  - Catapult C from Mentor Graphics [2004]
  - C-to-Silicon Compiler from Cadence [2008]
  - Synphony C Compiler from Synopsys [2009]
  - AutoPilot from AutoESL (UCLA xPilot project) [DAC 2009]

7 Design strategies for HLS tool
- Restrict the use of dynamic constructs
  - Pointers
  - Recursion
  - Polymorphism
- Use of hardware-oriented language extensions
  - HardwareC, SpecC, Handel-C
  - Libraries (SystemC)
- Efficient parallel architectures
- Allow an optimization-oriented design process
  - Modification
  - Refactoring

8 AutoPilot HLS design flow for FPGAs
- State-of-the-art commercial HLS tool
- Inputs: high-level language
  - ANSI C, C++, SystemC
- Outputs:
  - RTL: Verilog, VHDL
  - Cycle-accurate SystemC
- Automatic co-simulation

9 Platform modeling for FPGAs
- Target-specific synthesis and optimization
  - Mapping groups of operations to platform-specific blocks
- Prefabricated architecture blocks in the FPGA
  - DSP48 blocks
  - BRAMs
- Component pre-characterization
- Modeling process
  - Characterize delay/area/power for each hardware resource
  - Select the best implementation choice using the characterization data
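The modeling process on this slide can be sketched in a few lines. The following is a minimal illustration, not AutoPilot's actual library format: the `Impl` record, the `LIBRARY` table, the cost weights and every delay/area number are hypothetical values chosen only to show how pre-characterized data could drive implementation selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impl:
    """One pre-characterized implementation of an operation (hypothetical fields)."""
    name: str
    delay_ns: float  # combinational delay of the block
    luts: int        # LUTs consumed
    dsp48: int       # DSP48 blocks consumed


# Illustrative characterization data only; not measured on any real device.
LIBRARY = {
    "mul32": [
        Impl("mul32_lut", delay_ns=7.5, luts=1100, dsp48=0),
        Impl("mul32_dsp", delay_ns=4.2, luts=40, dsp48=4),
    ]
}


def select_impl(op, clock_ns, dsp_left, lut_weight=1.0, dsp_weight=200.0):
    """Pick the cheapest implementation that meets timing and fits the DSP budget."""
    candidates = [
        c for c in LIBRARY[op]
        if c.delay_ns <= clock_ns and c.dsp48 <= dsp_left
    ]
    if not candidates:
        return None  # no single-cycle implementation fits these limits
    return min(candidates, key=lambda c: lut_weight * c.luts + dsp_weight * c.dsp48)


if __name__ == "__main__":
    print(select_impl("mul32", clock_ns=5.0, dsp_left=8))
    print(select_impl("mul32", clock_ns=8.0, dsp_left=0))
```

With a 5 ns clock only the DSP48-based multiplier meets timing; with a relaxed 8 ns clock and no DSP48 blocks left, the LUT-based version is chosen instead.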

10 Advances in synthesis and optimization algorithms
- Efficient mathematical programming formulations for scheduling
- Soft constraints and applications for platform-based optimization
- Pattern mining for efficient sharing
- Memory analysis and optimizations

11 Mathematical programming formulations for scheduling
- Heuristic: list scheduling
  - Leads to sub-optimal solutions
- Exact formulation: integer linear programming (ILP)
  - Difficult to scale to large designs
  - O(m×n) binary variables to encode a scheduling solution with n operations and m steps
- System of difference constraints (SDC)
  - Linear-programming formulation
  - Efficient and scalable
  - O(n) variables to encode a scheduling solution with n operations
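For reference, the list-scheduling heuristic named on this slide can be sketched as follows. This is a generic textbook-style version under stated assumptions (unit-latency operations, an acyclic dependency graph, critical-path priority); the function name and data layout are illustrative and do not come from any particular tool.

```python
def list_schedule(ops, deps, units):
    """Resource-constrained list scheduling.

    ops:   dict mapping operation name -> resource type
    deps:  list of (pred, succ) pairs; must form a DAG
    units: dict mapping resource type -> number of available units
    Every operation is assumed to take exactly one control step.
    Returns a dict mapping operation name -> control step.
    """
    for rtype in ops.values():
        if units.get(rtype, 0) < 1:
            raise ValueError(f"no unit available for resource type {rtype!r}")

    preds = {op: set() for op in ops}
    succs = {op: [] for op in ops}
    for p, s in deps:
        preds[s].add(p)
        succs[p].append(s)

    # Priority: length of the longest path from the operation to a sink.
    prio = {}

    def height(op):
        if op not in prio:
            prio[op] = 1 + max((height(s) for s in succs[op]), default=0)
        return prio[op]

    for op in ops:
        height(op)

    schedule = {}
    step = 0
    while len(schedule) < len(ops):
        # Ready: unscheduled, and every predecessor finished in an earlier step.
        ready = [
            op for op in ops
            if op not in schedule
            and all(p in schedule and schedule[p] < step for p in preds[op])
        ]
        ready.sort(key=lambda op: (-prio[op], op))  # critical operations first
        free = dict(units)
        for op in ready:
            if free[ops[op]] > 0:
                schedule[op] = step
                free[ops[op]] -= 1
        step += 1
    return schedule


if __name__ == "__main__":
    ops = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add", "a2": "add"}
    deps = [("m1", "a1"), ("m2", "a1"), ("a1", "a2"), ("m3", "a2")]
    print(list_schedule(ops, deps, {"mul": 1, "add": 1}))
    print(list_schedule(ops, deps, {"mul": 2, "add": 1}))
```

The greedy choice made at each step is what makes the result fast to compute but not guaranteed optimal, which is the limitation the slide contrasts with the ILP and SDC formulations.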

12 SDC-based linear-programming formulation
- Scheduling variable s_i ∈ [0, Lv] for each operation i
  - Represents the time step at which the operation is scheduled
- Constraints are represented in integer-difference form: s_i − s_j ≤ d_ij
- The generated constraint matrix is totally unimodular
  - Every square submatrix has a determinant of 0 or ±1
  - With a totally unimodular constraint matrix and integer right-hand sides, the linear program is guaranteed to have an integral optimal solution
  - No expensive branch-and-bound procedures
J. Cong and Z. Zhang, “An efficient and versatile scheduling algorithm based on SDC formulation,” in Proc. DAC'06, pp. 433-438.
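A small sketch can make the integrality claim concrete. The code below does not call an LP solver and is not the algorithm of the cited paper; it swaps in the classic Bellman-Ford-style relaxation for systems of difference constraints, which returns the component-wise smallest non-negative solution (the as-soon-as-possible schedule). The function name `solve_sdc` and the example latencies are assumptions made for illustration.

```python
def solve_sdc(n, constraints):
    """Solve a system of difference constraints by iterative relaxation.

    n:           number of scheduling variables s[0..n-1]
    constraints: list of (i, j, d), each meaning s[i] - s[j] <= d (d integer)
    Returns the smallest non-negative integer solution, or None if the
    constraints are contradictory.
    """
    s = [0] * n  # every operation may start at step 0 unless pushed later
    for _ in range(n + 1):
        changed = False
        for i, j, d in constraints:
            # s[i] - s[j] <= d  is the same as  s[j] >= s[i] - d
            if s[j] < s[i] - d:
                s[j] = s[i] - d
                changed = True
        if not changed:
            return s
    return None  # still changing after n + 1 passes: infeasible cycle


if __name__ == "__main__":
    # Chain load(0) -> mul(1) -> add(2) -> store(3) with latencies 1, 2, 1.
    chain = [(0, 1, -1), (1, 2, -2), (2, 3, -1)]
    print(solve_sdc(4, chain + [(3, 0, 5)]))  # store at most 5 steps after load
    print(solve_sdc(4, chain + [(3, 0, 3)]))  # a bound of 3 steps cannot be met
```

Because every bound is an integer and each update is an integer addition, the solution is integral without any rounding or branching, which mirrors the property the slide attributes to the totally unimodular constraint matrix.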

13 Representing constraints for SDC
- Expressed in integer-difference form:
  - Data dependencies
  - Control dependencies
  - Relative timing in I/O protocols
  - Latency upper bounds
- What about resource constraints?
  - Use heuristics
  - Generate pair-wise orderings
J. Cong and Z. Zhang, “An efficient and versatile scheduling algorithm based on SDC formulation,” in Proc. DAC'06, pp. 433-438.
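As a worked illustration of the integer-difference form, the constraint kinds listed on this slide can be written as follows. This is a simplified sketch that assumes one start-time variable s_i per operation and a fixed latency l_i; the symbols l_i, c and L are introduced here for illustration, and the cited paper's exact variable definitions may differ.

```latex
\begin{align*}
\text{data dependency } i \to j:\quad
  & s_i + l_i \le s_j \;\Longleftrightarrow\; s_i - s_j \le -l_i \\
\text{exact I/O offset } c:\quad
  & s_j - s_i = c \;\Longleftrightarrow\; s_j - s_i \le c \ \text{ and } \ s_i - s_j \le -c \\
\text{latency upper bound } L:\quad
  & s_j - s_i \le L \\
\text{pair-wise ordering, } i \text{ before } j:\quad
  & s_i - s_j \le -l_i
\end{align*}
```

The last line shows why resource constraints need a heuristic: deciding which of two operations that share a unit goes first is a discrete choice, but once the order is fixed it becomes an ordinary difference constraint.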

14 Soft constraints for multiple design intentions
- Design intentions are expressed as constraints
- Strict (hard) constraints limit the ability to handle multiple conflicting design intentions
  - They rule out solutions that improve some aspects of the design at the cost of reasonable, estimated violations of others
- Alternative: use soft constraints, which allow some constraints to be violated

15 Soft constraints for multiple design intentions
- Consider a scheduling problem with hard and soft constraints
  - G, H: hard and soft constraint matrices
  - H_j: j-th row of H
  - v_j: violation variable
  - φ_j(v_j): penalty term added to the objective function
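The formula itself did not survive in the transcript. One standard way to write a problem with these ingredients is sketched below; the objective f(x), the decision vector x and the right-hand sides p and q are assumptions introduced here, and only G, H, H_j, v_j and φ_j come from the slide.

```latex
\begin{align*}
\min_{x,\,v}\quad & f(x) + \sum_{j} \phi_j(v_j) \\
\text{s.t.}\quad  & G x \le p, \\
                  & H_j x - v_j \le q_j, \quad v_j \ge 0 \quad \text{for every soft constraint } j
\end{align*}
```

In this form a soft constraint H_j x ≤ q_j may be exceeded by the amount v_j, and the penalty φ_j(v_j) prices that violation in the objective, while the hard constraints G x ≤ p must always hold.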

16 Pattern mining for efficient sharing
- Sharing of functional units, storage units or interconnects by multiple operations in a time-multiplexed manner
  - Needs multiplexers
  - Large multiplexers are expensive on FPGA platforms
  - Sharing can add more overhead than benefit
- Extract common patterns in the data-flow graph (DFG)
  - Different instances of the same pattern can share resources
  - Graph edit distance is used as a metric to measure the similarity of two patterns
J. Cong and W. Jiang, “Pattern-based behavior synthesis for FPGA resource reduction,” in Proc. FPGA'08, Feb. 2008, pp. 107-116.

17 Pattern mining example
- Figures (a) and (b): original DFG and resource binding
- Figures (c) and (d): DFG and resource binding after pattern mining
J. Cong and W. Jiang, “Pattern-based behavior synthesis for FPGA resource reduction,” in Proc. FPGA'08, Feb. 2008, pp. 107-116.

18 Memory analysis and optimizations
- FPGA performance is limited by memory bandwidth
- Memory partitioning is critical to meeting the performance target
- Automatic partitioning of array elements across multiple physical memory blocks is necessary for pipelined loops, to:
  - Increase throughput
  - Reduce power
- Capture all possible reference conflicts under partitioning in a conflict graph
- An iterative algorithm performs scheduling and memory partitioning using the conflict graph
J. Cong, W. Jiang, B. Liu, and Y. Zou, “Automatic memory partitioning and scheduling for throughput and power optimization,” in Proc. ICCAD '09, Nov. 2009, pp. 697-704.
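To show why partitioning matters for a pipelined loop, here is a minimal sketch. It is not the conflict-graph algorithm of the cited paper: it assumes a simple cyclic partitioning (element k goes to bank k mod N), a loop that starts one iteration per cycle, and references of the form a[i + offset], and it brute-forces the smallest bank count without port conflicts. All function names are hypothetical.

```python
def cyclic_bank(index, num_banks):
    """Cyclic partitioning: consecutive elements go to consecutive banks."""
    return index % num_banks


def has_conflict(offsets, num_banks, ports_per_bank=1, trip_count=64):
    """True if some iteration needs more accesses to one bank than it has ports.

    offsets: the constants in references a[i + offset] made by one iteration.
    """
    for i in range(trip_count):
        load = {}
        for off in offsets:
            bank = cyclic_bank(i + off, num_banks)
            load[bank] = load.get(bank, 0) + 1
        if any(count > ports_per_bank for count in load.values()):
            return True
    return False


def min_banks(offsets, ports_per_bank=1, max_banks=16):
    """Smallest number of cyclic banks that removes all port conflicts."""
    for n in range(1, max_banks + 1):
        if not has_conflict(offsets, n, ports_per_bank):
            return n
    return None


if __name__ == "__main__":
    # 3-tap window reading a[i], a[i+1], a[i+2] in every iteration.
    print(min_banks([0, 1, 2]))                    # single-port banks
    print(min_banks([0, 1, 2], ports_per_bank=2))  # dual-port banks
```

For the three-tap example, a single-port memory would force three cycles per iteration; splitting the array across three banks (or two dual-port banks) lets all three reads happen in the same cycle.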

19 Memory partitioning and scheduling for throughput optimization
J. Cong, W. Jiang, B. Liu, and Y. Zou, “Automatic memory partitioning and scheduling for throughput and power optimization,” in Proc. ICCAD '09, Nov. 2009, pp. 697-704.

20 Integration with domain-specific design platforms
- Interface cores
  - Tight architecture requirements
  - Implemented in RTL (about 5%)
- Processor subsystem
- HLS-generated design

21 AutoPilot HLS results (sphere decoder)
- Design targeted to a Xilinx Virtex-5 FPGA
- Application exhibits a large amount of parallelism
  - Resource sharing
  - Time-division multiplexing
- AutoPilot achieves better resource utilization and shorter development time

22 Conclusion
- The latest generation of FPGA HLS tools has made significant progress in:
  - Providing wide language coverage
  - Robust compilation technology
  - Platform-based modeling
  - Synthesis and optimization techniques
  - Domain-specific system-level integration
- Provides highly competitive quality of results
  - Comparable to or better than manual RTL designs
- Transition from research and investigation to selected deployment

23 Challenges
- Support of memory hierarchy
  - Many applications need access to external memory
  - Current tools lack support for a memory hierarchy
  - Designers are exposed to the details of bus interfaces and memory controllers
- Higher-level models
  - Current tools extract instruction-level and loop-level parallelism
  - Task-level parallelism is difficult to extract from C/C++
- In-system design validation and debugging
  - RTL-level, timing-accurate simulation is used
  - Debugging at the RTL level is complicated
  - Most verification and debugging needs to happen in the C domain

24 Questions?

25 Advances in simulation and verification
- Develop, debug and functionally verify a design at a higher level
- Reduces verification effort because:
  - It is easier to trace, identify and fix bugs at a higher abstraction level
  - Simulation at the higher level is orders of magnitude faster than RTL simulation
  - More comprehensive tests and greater coverage are possible

26 Automatic co-simulation
- Direct reuse of the original C/C++ test framework to verify the correctness of the synthesized RTL
- C-to-RTL transactors connect high-level interfacing constructs (parameters and global variables) with pin-level signals in the RTL
- Helps designers avoid the time-consuming manual creation of an RTL test bench

27 Equivalence checking
- High-level model to RTL checking
- Combinational equivalence checkers require a one-to-one correspondence between the flip-flops and latches of the two designs
- Significant state differences exist between the high-level model and the RTL model
- Necessary to use sequential logic equivalence checking (SEC)

28 Sequential logic equivalence checking (SEC)
- Model extraction from the system-level model (SLM)
  - Extracting the hardware model from SystemC or C/C++
- Sequential analysis
  - Efficient unrolling of finite-state machines (FSMs) to align the SLM and RTL state machines to synchronizing states
- Bit-level/word-level solvers
  - BDD and SAT solvers to address system-level-to-RTL formal verification
- Mechanism for specifying temporal mappings at I/Os and state points
  - SEC tool automatically infers these mappings
- Commercial tool from Calypto Design
A. Mathur, M. Fujita, E. Clarke, and P. Urard, “Functional equivalence verification tools in high-level synthesis flows,” IEEE Design & Test of Computers, vol. 26(4), pp. 88-95, Dec. 2009.

