High-Level Synthesis for FPGAs: From Prototyping to Deployment (To appear in IEEE TCAD 2011)
Course presentation by: Murtaza Merchant

Overview
- Introduction
- Motivation
- Reasons for failure
- Historical background
- Design strategies for HLS tools
- AutoPilot HLS design flow for FPGAs
- Platform modeling for FPGAs
- Advances in synthesis and optimization algorithms
- Integration with domain-specific design platforms
- Results
- Conclusion

Introduction
- Automatic synthesis of a high-level description into a low-level, cycle-accurate register-transfer-level (RTL) specification
- The high-level specification may be untimed or partially timed
- Targeted at FPGAs: state-of-the-art C-to-FPGA synthesis solutions covering multiple application domains

Why do we need HLS tools?
- Embedded processors and hardware/software co-design
- SoC design complexity
- Behavioral IP reuse
- System-level verification
- Transaction-level modeling (TLM)
- Time-to-market
- Ease of use: code size shrinks by 7x-10x when moving from RTL to a high-level behavioral description

Reasons for failure (of earlier HLS efforts)
- Lack of comprehensive design-language support
  - Behavioral HDLs were used as input
  - C and C++ lack the constructs and semantics to represent design hierarchy, timing, synchronization, and concurrency
- Lack of reusable and portable design specifications: the functional specification was highly tool-dependent
- Unsatisfactory quality of results (QoR)

Historical background
- Academic efforts
  - HAL, developed at Bell-Northern Research [DAC '86]
  - ADAM system, developed at USC [DAC '89]
  - Hercules/Hebe HLS system at Stanford [EuroASIC '90]
  - Hyper/Hyper-LP system at UC Berkeley [ICCAD '92]
- Industry efforts
  - Catapult C from Mentor Graphics [2004]
  - C-to-Silicon Compiler from Cadence [2008]
  - Synphony C Compiler from Synopsys [2009]
  - AutoPilot from AutoESL (from the UCLA xPilot project) [DAC 2009]

Design strategies for HLS tools
- Restrict the use of dynamic constructs: pointers, recursion, polymorphism
- Use hardware-oriented language extensions (HardwareC, SpecC, Handel-C) or libraries (SystemC)
- Target efficient parallel architectures
- Allow an optimization-oriented design process through code modification and refactoring

AutoPilot HLS design flow for FPGAs
- State-of-the-art commercial HLS tool
- Inputs: high-level languages (ANSI C, C++, SystemC)
- Outputs: RTL (Verilog, VHDL) and cycle-accurate SystemC
- Automatic co-simulation

Platform modeling for FPGAs
- Target-specific synthesis and optimization
- Map groups of operations onto platform-specific blocks, i.e. prefabricated architecture blocks in the FPGA such as DSP48 blocks and block RAMs (BRAMs)
- Component pre-characterization
- Modeling process
  - Characterize delay, area, and power for each hardware resource
  - Select the best implementation choice using the characterization data
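
To make the "select the best implementation using characterization data" step concrete, here is a minimal sketch. The `Impl` record, the LUT/DSP cost weighting, and all delay and resource numbers are hypothetical illustrations, not AutoPilot's library or real Virtex-5 figures: among the characterized implementations that fit in one clock period, it picks the one with the lowest weighted resource cost.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Impl:
    """One pre-characterized implementation of an operation (illustrative numbers)."""
    name: str
    delay_ns: float
    luts: int
    dsps: int


def pick_impl(impls: List[Impl], clock_ns: float, dsp_weight: float) -> Optional[Impl]:
    """Return the cheapest implementation whose delay fits in one clock period.

    Cost is a hypothetical weighted sum of LUTs and DSP blocks. None means no
    candidate meets the clock target (a real tool might then multi-cycle or pipeline).
    """
    feasible = [i for i in impls if i.delay_ns <= clock_ns]
    if not feasible:
        return None
    return min(feasible, key=lambda i: (i.luts + dsp_weight * i.dsps, i.delay_ns))


# Hypothetical characterization data for a 16-bit multiplier.
MULT_IMPLS = [
    Impl("lut_mult", delay_ns=6.0, luts=300, dsps=0),
    Impl("dsp48_mult", delay_ns=3.5, luts=10, dsps=1),
]
```

With a 5 ns clock only the DSP-based version fits; with a relaxed 10 ns clock and DSP blocks weighted as scarce (`dsp_weight=500`), the LUT version wins instead.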

Advances in synthesis and optimization algorithms
- Efficient mathematical-programming formulations for scheduling
- Soft constraints and their applications to platform-based optimization
- Pattern mining for efficient resource sharing
- Memory analysis and optimizations

Mathematical-programming formulations for scheduling
- Heuristic: list scheduling
  - Can lead to sub-optimal solutions
- Exact formulation: integer linear programming (ILP)
  - Difficult to scale to large designs: O(m×n) binary variables are needed to encode a scheduling solution with n operations and m control steps
- System of difference constraints (SDC): a linear-programming formulation
  - Efficient and scalable: O(n) variables encode a scheduling solution with n operations
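
For contrast with the exact formulations, a minimal sketch of the list-scheduling heuristic follows. It assumes unit-latency operations, an acyclic DFG given as a predecessor map, at least one unit per operation type, and a critical-path (longest path to a sink) priority; all of these are illustrative choices rather than the exact heuristic the paper refers to. Each step greedily fills the available units with the highest-priority ready operations, which is fast but can miss the optimal schedule.

```python
from typing import Dict, List


def list_schedule(preds: Dict[str, List[str]], op_type: Dict[str, str],
                  units: Dict[str, int]) -> Dict[str, int]:
    """Resource-constrained list scheduling with unit-latency operations.

    preds maps each op to its predecessors (DFG must be acyclic);
    units[t] caps how many ops of type t may start in one control step (>= 1).
    """
    succs: Dict[str, List[str]] = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)

    # Priority: length of the longest path from the op to a sink.
    height: Dict[str, int] = {}

    def prio(op: str) -> int:
        if op not in height:
            height[op] = 1 + max((prio(s) for s in succs[op]), default=0)
        return height[op]

    step: Dict[str, int] = {}
    t = 0
    while len(step) < len(preds):
        # Ready: unscheduled ops whose predecessors all finished in earlier steps.
        ready = [op for op in preds if op not in step
                 and all(p in step and step[p] < t for p in preds[op])]
        used: Dict[str, int] = {}
        for op in sorted(ready, key=lambda o: (-prio(o), o)):
            kind = op_type[op]
            if used.get(kind, 0) < units[kind]:
                step[op] = t
                used[kind] = used.get(kind, 0) + 1
        t += 1
    return step
```

For example, three multiplies feeding two chained adds, with one multiplier and one adder, finish in four steps.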

SDC-based linear-programming formulation
- A scheduling variable $s_i \in [0, L_v]$ for each operation $i$, representing the control step in which the operation is scheduled
- Every constraint is represented in integer-difference form: $s_i - s_j \le d_{ij}$, where $d_{ij}$ is an integer constant
- The generated constraint matrix is totally unimodular: every square submatrix has a determinant of 0 or ±1
- With a totally unimodular constraint matrix and integer right-hand sides, the linear program is guaranteed to have an integral optimal solution, so no expensive branch-and-bound procedure is needed
J. Cong and Z. Zhang, “An efficient and versatile scheduling algorithm based on SDC formulation,” in Proc. DAC'06, pp
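
A system of difference constraints $s_i - s_j \le d_{ij}$ has a classic graph interpretation, which the sketch below uses to find a feasible schedule. This is a swapped-in textbook technique (Bellman-Ford shortest paths), not the LP solver of the cited paper: it only returns the smallest non-negative (ASAP-style) solution rather than optimizing an arbitrary linear objective, and it reports infeasibility as a negative cycle. The tuple encoding `(i, j, d)` is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# (i, j, d) encodes the difference constraint s_i - s_j <= d.
Constraint = Tuple[str, str, int]


def solve_sdc(ops: List[str], constraints: List[Constraint]) -> Optional[Dict[str, int]]:
    """Smallest non-negative integer solution of a system of difference constraints.

    Substituting y = -s turns s_i - s_j <= d into y_j - y_i <= d, i.e. a
    shortest-path edge i -> j of weight d. Bellman-Ford from a virtual source
    (0-weight edges to every op) yields the largest y <= 0, hence the smallest
    s = -y >= 0. Returns None if a negative cycle makes the system infeasible.
    """
    dist = {op: 0 for op in ops}  # distances after relaxing the source edges
    for _ in range(len(ops) + 1):
        changed = False
        for i, j, d in constraints:
            if dist[i] + d < dist[j]:
                dist[j] = dist[i] + d
                changed = True
        if not changed:
            return {op: -dist[op] for op in ops}
    return None  # still relaxing after |V| passes: negative cycle
```

For example, if `a` (latency 1) and `b` (latency 2) both feed `c`, the dependencies become `("a", "c", -1)` and `("b", "c", -2)`, and a latency bound $s_c - s_a \le 3$ becomes `("c", "a", 3)`; the solver places `c` in step 2.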

Representing constraints for SDC
- Expressed in integer-difference form:
  - Data dependencies
  - Control dependencies
  - Relative timing in I/O protocols
  - Latency upper bounds
- What about resource constraints?
  - Use heuristics to generate pairwise orderings between operations that compete for the same resource
J. Cong and Z. Zhang, “An efficient and versatile scheduling algorithm based on SDC formulation,” in Proc. DAC'06, pp
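
One simple way to turn a resource limit into pairwise orderings is sketched below; it is a hypothetical heuristic for illustration, not necessarily the one in the cited paper. Given operations already sorted by some priority and `units` identical single-cycle units, it requires each operation to start at least one step after the operation `units` positions earlier. This splits the operations into `units` strictly ordered chains, so any `units + 1` of them must include two from the same chain, and no control step can hold more operations than there are units.

```python
from typing import List, Tuple

# (i, j, d) encodes the difference constraint s_i - s_j <= d.
Constraint = Tuple[str, str, int]


def resource_orderings(ordered_ops: List[str], units: int) -> List[Constraint]:
    """Pairwise orderings for ops that compete for `units` single-cycle units.

    ordered_ops is assumed to be pre-sorted by a heuristic priority. Each
    constraint says s[p] >= s[p - units] + 1, i.e. s[p - units] - s[p] <= -1.
    """
    if units < 1:
        raise ValueError("need at least one unit")
    return [
        (ordered_ops[p - units], ordered_ops[p], -1)
        for p in range(units, len(ordered_ops))
    ]
```

Because the output is in the same difference form as data dependencies and latency bounds, it can be appended to the constraint list unchanged; the price is that a poor priority order can lengthen the schedule.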

Soft constraints for multiple design intentions
- Design intentions are expressed as constraints
- Strict (hard) constraints limit the ability to handle multiple, conflicting design intentions
- They rule out solutions that improve some aspects of the design in exchange for reasonable, estimated violations of other constraints
- Alternative: use soft constraints, which may be violated at a cost

Soft constraints for multiple design intentions
- Consider a scheduling problem with both hard and soft constraints:
  - $G$, $H$: hard and soft constraint matrices
  - $H_j$: the $j$th row of $H$
  - $v_j$: violation variable for soft constraint $j$
  - $\phi_j(v_j)$: penalty term added to the objective function
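
Putting these symbols together, one way to write the problem is shown below. The linear cost vector $c$ and the right-hand sides $p$ and $q_j$ are names assumed for this sketch (they do not appear on the slide): hard constraints must hold exactly, while each soft constraint may be exceeded by $v_j \ge 0$ at a penalty $\phi_j(v_j)$. If each $\phi_j$ is convex and piecewise-linear, the whole problem can still be written as a linear program.

```latex
\begin{aligned}
\min_{s,\,v}\quad & c^{\top} s + \sum_j \phi_j(v_j) \\
\text{s.t.}\quad  & G\,s \le p            && \text{(hard constraints)} \\
                  & H_j\,s - v_j \le q_j  && \forall j \ \text{(soft constraints)} \\
                  & v_j \ge 0             && \forall j
\end{aligned}
```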

Pattern mining for efficient sharing
- Resource sharing: functional units, storage units, or interconnects are used by multiple operations in a time-multiplexed manner
- Sharing requires multiplexers
  - Large multiplexers are expensive on FPGA platforms
  - Their overhead can outweigh the benefit of sharing
- Extract common patterns in the data-flow graph (DFG)
  - Different instances of the same pattern can share resources
  - Graph edit distance is used as a metric to measure the similarity of two patterns
J. Cong and W. Jiang, “Pattern-based behavior synthesis for FPGA resource reduction,” in Proc. FPGA'08, Feb. 2008, pp
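
As a deliberately simplified illustration of pattern extraction, the sketch below mines only exact two-node patterns: it groups DFG edges by the operation types at their two ends. The cited work handles larger patterns and uses graph edit distance for approximate matches, which this sketch does not attempt. It also ignores overlaps: an operation may appear in instances of several patterns.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]      # producer op -> consumer op in the DFG
Pattern = Tuple[str, str]   # (producer type, consumer type)


def mine_two_node_patterns(op_type: Dict[str, str], edges: List[Edge],
                           min_count: int = 2) -> Dict[Pattern, List[Edge]]:
    """Group DFG edges by the operation types at their two ends.

    Each group with at least `min_count` instances is a candidate pattern whose
    instances could be bound to one shared chained datapath (e.g. add -> mul):
    the internal connection stays a fixed wire, and multiplexers are needed only
    at the pattern's external inputs.
    """
    instances: Dict[Pattern, List[Edge]] = defaultdict(list)
    for u, v in edges:
        instances[(op_type[u], op_type[v])].append((u, v))
    return {p: inst for p, inst in instances.items() if len(inst) >= min_count}
```

For a DFG where two separate adds each feed a multiply, the add-to-multiply pair is reported with both instances, while a one-off multiply-to-subtract edge is filtered out.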

Pattern mining example
- Figures (a) and (b): original DFG and resource binding
- Figures (c) and (d): DFG and resource binding after pattern mining
J. Cong and W. Jiang, “Pattern-based behavior synthesis for FPGA resource reduction,” in Proc. FPGA'08, Feb. 2008, pp

Memory analysis and optimizations
- FPGA performance is limited by memory bandwidth
- Memory partitioning is critical to meeting the performance target
- For pipelined loops, array elements must be automatically partitioned across multiple physical memory blocks to increase throughput and reduce power
- All possible reference conflicts under partitioning are captured in a conflict graph
- An iterative algorithm performs scheduling and memory partitioning using the conflict graph
J. Cong, W. Jiang, B. Liu, and Y. Zou, “Automatic memory partitioning and scheduling for throughput and power optimization,” in Proc. ICCAD '09, Nov. 2009, pp
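
A minimal sketch of the conflict-graph idea, under simplifying assumptions that are ours rather than the paper's: every reference `A[i + offset]` is issued in the same pipeline cycle, offsets are distinct, each bank is single-ported, and the array is cyclically partitioned (bank = index mod number of banks). Two references conflict when they map to the same bank, and the sketch searches for the smallest bank count with no conflicts. The cited algorithm goes further by also adjusting the schedule, which this sketch does not do.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def conflict_graph(offsets: List[int], banks: int) -> List[Tuple[int, int]]:
    """Edges between references A[i + a] and A[i + b] that hit the same bank
    under cyclic partitioning (bank = index mod banks) in the same cycle."""
    return [(a, b) for a, b in combinations(offsets, 2) if (a - b) % banks == 0]


def min_cyclic_banks(offsets: List[int], max_banks: int = 64) -> Optional[int]:
    """Smallest bank count whose conflict graph is empty, i.e. every reference
    issued in the same cycle goes to a different single-port bank."""
    for n in range(1, max_banks + 1):
        if not conflict_graph(offsets, n):
            return n
    return None
```

Note that `A[i]` and `A[i + 2]` need three banks, not two: with two banks both references always land in the same bank.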

Figure: Memory partitioning and scheduling for throughput optimization
J. Cong, W. Jiang, B. Liu, and Y. Zou, “Automatic memory partitioning and scheduling for throughput and power optimization,” in Proc. ICCAD '09, Nov. 2009, pp

Integration with domain-specific design platforms
- Interface cores
  - Tight architecture requirements
  - Implemented in RTL (about 5%)
- Processor subsystem
- HLS-generated design

AutoPilot HLS results (sphere decoder)
- Design targeted to a Xilinx Virtex-5 FPGA
- The application exhibits a large amount of parallelism, exploited through:
  - Resource sharing
  - Time-division multiplexing
- AutoPilot achieves better resource utilization and shorter development time

Conclusion
- The latest generation of FPGA HLS tools has made significant progress in:
  - Providing wide language coverage
  - Robust compilation technology
  - Platform-based modeling
  - Synthesis and optimization techniques
  - Domain-specific system-level integration
- They provide highly competitive quality of results, comparable to or better than manual RTL designs
- HLS is transitioning from research and investigation to selected deployment

Challenges
- Support for memory hierarchy
  - Many applications need access to external memory
  - Current tools lack support for a memory hierarchy
  - Designers are exposed to the details of bus interfaces and memory controllers
- Higher-level models
  - Tools extract instruction-level and loop-level parallelism
  - Task-level parallelism is difficult to extract from C/C++
- In-system design validation and debugging
  - RTL-level, timing-accurate simulation is used
  - Debugging at the RTL level is complicated
  - Most verification and debugging needs to happen in the C domain

Questions?

Advances in simulation and verification
- Develop, debug, and functionally verify a design at a higher level of abstraction
- This reduces verification effort:
  - It is easier to trace, identify, and fix bugs at a higher abstraction level
  - Simulation at the higher level is orders of magnitude faster than RTL simulation, enabling more comprehensive tests and greater coverage

Automatic co-simulation
- Directly reuses the original C/C++ test framework to verify the correctness of the synthesized RTL
- A C-to-RTL transactor connects high-level interfacing constructs (parameters and global variables) to pin-level signals in the RTL
- Spares designers the time-consuming manual creation of an RTL test bench
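
The role of the transactor can be sketched in a few lines. Everything below is hypothetical: `AccumulatorRTL` stands in for a cycle-level model of generated RTL, and its `start`/`in_valid`/`in_data`/`last` pins are invented for illustration, not AutoPilot's actual interface protocol. The transactor turns one high-level function call into a sequence of pin-level clock cycles, so the same C-level test vectors drive both the reference function and the "RTL", and their results are compared.

```python
from typing import Dict, List


def reference_sum(xs: List[int]) -> int:
    """Stand-in for the original untimed C function."""
    return sum(xs)


class AccumulatorRTL:
    """Hypothetical cycle-level model of the synthesized RTL (one word per clock)."""

    def __init__(self) -> None:
        self.acc = 0
        self.out = 0
        self.done = False

    def clock(self, pins: Dict[str, int]) -> None:
        """Sample the pin-level inputs on one rising clock edge."""
        if pins["start"]:
            self.acc, self.done = 0, False
        if pins["in_valid"]:
            self.acc += pins["in_data"]
        if pins["last"]:
            self.out, self.done = self.acc, True


def transact(dut: AccumulatorRTL, xs: List[int]) -> int:
    """C-to-RTL transactor: turn one function call into pin-level activity."""
    if not xs:  # empty call: a single start/last cycle with no valid data
        dut.clock({"start": 1, "in_valid": 0, "in_data": 0, "last": 1})
    for k, x in enumerate(xs):
        dut.clock({"start": int(k == 0), "in_valid": 1, "in_data": x,
                   "last": int(k == len(xs) - 1)})
    if not dut.done:
        raise RuntimeError("DUT never signalled completion")
    return dut.out


def cosimulate(test_vectors: List[List[int]]) -> List[List[int]]:
    """Replay the C-level test vectors through the transactor; return mismatches."""
    dut = AccumulatorRTL()
    return [xs for xs in test_vectors if transact(dut, xs) != reference_sum(xs)]
```

An empty mismatch list means the "RTL" agrees with the C function on every reused test vector, without anyone writing an RTL test bench by hand.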

Equivalence checking
- Checking high-level models against RTL
- Conventional equivalence checking requires a one-to-one correspondence between the flip-flops and latches of the two designs
- Significant state differences exist between the high-level model and the RTL model
- Therefore sequential logic equivalence checking (SEC) is necessary

Sequential logic equivalence checking (SEC)
- Model extraction from the system-level model (SLM): extracting the hardware model from SystemC or C/C++
- Sequential analysis: efficient unrolling of finite-state machines (FSMs) to align the SLM and RTL state machines to synchronizing states
- Bit-level/word-level solvers: BDD- and SAT-based engines to address system-level-to-RTL formal verification
- Mechanism for specifying temporal mappings at I/Os and state points; the SEC tool can automatically infer these mappings
- Commercial tool from Calypto Design Systems
A. Mathur, M. Fujita, E. Clarke, and P. Urard, “Functional equivalence verification tools in high-level synthesis flows,” IEEE Design & Test of Computers, vol. 26(4), pp , Dec
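
To illustrate what a temporal mapping means, here is a toy sketch. It is a brute-force bounded simulation over all short input sequences, not a BDD/SAT-based SEC engine, and both models are hypothetical. The SLM produces one result per input transaction, while the "RTL" is a two-stage pipeline. The two can only be compared after aligning them in time: the RTL output at cycle $t + 2$ must equal the SLM output for the input applied at cycle $t$, and reset outputs are ignored.

```python
from itertools import product
from typing import List


def slm(x: int) -> int:
    """Untimed system-level model: one output per input transaction."""
    return (3 * x) & 0xF


class PipelinedRTL:
    """Hypothetical 2-stage pipelined implementation of the same function."""

    def __init__(self) -> None:
        self.r1 = 0  # stage-1 register: captured input
        self.r2 = 0  # stage-2 register: computed result

    def step(self, x: int) -> int:
        """Advance one clock cycle; returns the registered output before the edge."""
        out = self.r2
        self.r2 = ((self.r1 << 1) + self.r1) & 0xF  # 3 * r1 as shift-and-add
        self.r1 = x
        return out


def bounded_equivalent(depth: int, width: int = 2, latency: int = 2) -> bool:
    """Check every input sequence of length `depth` under a fixed latency mapping."""
    for seq in product(range(1 << width), repeat=depth):
        rtl = PipelinedRTL()
        outs: List[int] = [rtl.step(x) for x in seq]
        for t in range(depth - latency):
            if outs[t + latency] != slm(seq[t]):
                return False
    return True
```

With the correct mapping (`latency=2`) the check passes; with a wrong mapping (`latency=1`) it finds a mismatching sequence. This shows why the temporal mapping must be specified or inferred before the state machines can be compared.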