1 General-Purpose Languages, High-Level Synthesis John Sanguinetti High-Level Modeling
2 Gate-level Design (1982): Gate-level design by schematics. Gate-level verification in netlist simulators. Architecture moves up to Verilog. [Diagram: ARCHITECTURE, VERIFICATION, and IMPLEMENTATION columns at the gate level, showing Verilog, gate simulation, and schematics]
3 Register-transfer Level (1992): Logic synthesis enabled more abstract design. The Verilog architectural language is used for RTL design. Architecture moves up to C++. [Diagram: ARCHITECTURE, VERIFICATION, and IMPLEMENTATION columns at the gate level (1982) and at RTL (1992), with logic synthesis linking RTL to gates]
4 Higher-level (2002): The current architecture language (C++) will emerge as the next design language. Practical high-level synthesis in C++ will trigger the change. [Diagram: ARCHITECTURE, VERIFICATION, and IMPLEMENTATION columns at the gate level, at RTL, and at the high level (2002), with high-level synthesis linking the high level to RTL]
5 The Problem: Lack of Tools
Starting point is a general-purpose language (GPL): C++. Entry point to the backend is HDL/RTL. Refinement is manual. Only GPL users: academics, lunatic fringe.
[Diagram: abstraction levels Algorithm, Architecture, RTL, Gate; refinement steps modular decomposition, structural elaboration, cycle timing, resource allocation, logic synthesis; artifacts C++ high-level model, paper spec, RTL implementation model, HDL IP, gates]
6 GPL candidates: SpecC, Handel-C, Java, C++/Cynlib, C++/SystemC, Extended SystemC. We've made good progress.
7 HLS: The Promise. High-level synthesis: enables higher levels of design abstraction; connects the starting point with the ending point; allows architectural exploration; eases technology process migration; achieves better results with less effort; enables faster simulation and design debugging at the behavioral level.
8 HLS: The Experience. Behavioral synthesis was not successful: QoR marginal; hard to use, non-intuitive; results nearly impossible to verify; poisoned the market. What went wrong? Started with the wrong input; point-tool solution for a design-flow problem.
9 HLS: The Future is Now … High-level synthesis: we have the right starting point; we can use a common test bench; we can keep the interfaces constant; we can produce RTL that meets timing constraints.
10 CynthHL Design Flow: Automatic generation of verifiable RTL from architectural C++. Single verification environment for the entire design flow.
[Diagram: abstraction levels Algorithm, Architecture, RTL, Gate; steps modular decomposition, structural elaboration, cycle timing, resource allocation, logic synthesis; CynthHL (C++ to HDL) takes algorithm, constraints, and protocol as inputs and produces an automatically synthesized RTL implementation; logic synthesis produces gates]
11 Design Exploration. Typical architectural questions: What goes in hardware? In software? How many data path elements? How wide should data paths be? What protocols should be used? How deep should pipelines be? More interesting: What's the lowest distortion for a given die size? What's the minimum area for a target frame rate? How much can I increase the signal-to-noise ratio with a 10% area increase?
12 Design Exploration. These are all speed vs. area tradeoffs. Speed: latency, throughput. Area: how much parallel hardware. Answers aren't available until RTL has been produced. Most answers require multiple implementation data points. => Evaluating an architectural decision is very expensive.
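A question such as "what's the minimum area for a target frame rate?" amounts to a selection over implementation data points once several have been synthesized. The sketch below illustrates that selection only; the `Implementation` struct, the function name, and any area or latency figures fed to it are hypothetical and do not come from the slides.

```cpp
#include <cassert>
#include <vector>

// One synthesized implementation data point (all fields are illustrative).
struct Implementation {
    const char* name;
    double area;        // relative area units
    double latency_ms;  // time to process one frame
};

// Return the smallest-area data point that meets the latency target,
// or nullptr when none of the points is fast enough.
const Implementation* smallest_meeting_target(
        const std::vector<Implementation>& points, double target_ms) {
    assert(target_ms > 0.0);
    const Implementation* best = nullptr;
    for (const Implementation& p : points) {
        if (p.latency_ms > target_ms) continue;          // too slow, skip
        if (best == nullptr || p.area < best->area) best = &p;
    }
    return best;
}
```

Each entry in `points` costs one synthesis run, which is why the slide calls evaluating an architectural decision expensive when RTL has to be produced by hand.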
13 AES Encryption Algorithm
Starting point: 386 lines of C++/ESC. Module, testbench in ESC: input key and block length, then key; input plain-text block; output encrypted block.
Goal: fastest design in minimum area.
Design exploration: unroll loops (enabled constant propagation, increased number of FSM states); decrease latency (increase functional units, decrease number of FSM states).
Result: 3,917 lines of RTL; 32 functional units; 5 ROMs; 100 registers.
Net result: 5x speed-up, 1.2x size increase.
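To make the "unroll loops" step concrete, here is a minimal sketch of a rolled loop and a 4x-unrolled version of the same byte-wise XOR over a 16-byte block. It is not the author's AES source; the function names are hypothetical, and how a given synthesis tool maps either form onto FSM states and functional units is tool-dependent.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Block = std::array<std::uint8_t, 16>;

// Rolled form: one XOR per loop iteration, 16 iterations.
Block xor_key_rolled(const Block& state, const Block& key) {
    Block out{};
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = static_cast<std::uint8_t>(state[i] ^ key[i]);
    }
    return out;
}

// Unrolled by 4: four independent XORs per iteration, 4 iterations.
// The offsets +1, +2, +3 are constants, and the four operations have no
// dependencies on each other, so they can be scheduled in parallel.
Block xor_key_unrolled4(const Block& state, const Block& key) {
    Block out{};
    for (std::size_t i = 0; i < 16; i += 4) {
        out[i]     = static_cast<std::uint8_t>(state[i]     ^ key[i]);
        out[i + 1] = static_cast<std::uint8_t>(state[i + 1] ^ key[i + 1]);
        out[i + 2] = static_cast<std::uint8_t>(state[i + 2] ^ key[i + 2]);
        out[i + 3] = static_cast<std::uint8_t>(state[i + 3] ^ key[i + 3]);
    }
    return out;
}
```

Both functions return the same block; the difference is only in how much work is exposed per iteration, which is the speed vs. area lever the slide describes.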
14 Image Compression Algorithm: Computationally intensive. 753 lines of C++/ESC. Memory-intensive. Hard speed constraint: 15 ms/frame @ 17 ns clock. I/O interface is not defined. Includes testbench and golden results.
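The hard speed constraint translates into a cycle budget per frame. The sketch below works that arithmetic out, assuming the constraint reads as 15 ms per frame with a 17 ns clock period; the function and constant names are hypothetical.

```cpp
#include <cstdint>

// Whole clock cycles that fit inside one frame period (integer division
// discards the final partial cycle).
constexpr std::uint64_t cycles_per_frame(std::uint64_t frame_ns,
                                         std::uint64_t clock_ns) {
    return frame_ns / clock_ns;
}

// 15 ms per frame expressed in nanoseconds, and a 17 ns clock period.
constexpr std::uint64_t kFramePeriodNs = 15ULL * 1000 * 1000;
constexpr std::uint64_t kClockPeriodNs = 17;

// Cycle budget available to process one frame.
constexpr std::uint64_t kCycleBudget =
    cycles_per_frame(kFramePeriodNs, kClockPeriodNs);

static_assert(kClockPeriodNs > 0, "clock period must be non-zero");
```

Under those assumptions the budget is 882,352 cycles per frame, so every memory access and operation for a frame has to fit within that count.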
15 Initial analysis
Throughput requirement is faster than latency allows: suggests some form of pipelining will be needed; loops in the algorithm should be restructured to have only one inner loop.
Critical performance issue: memory usage, not operations (+, *, etc.); pipelining makes memory usage more intense.
[Diagram: pipeline stages Input/rgb2yc, Vertical filter, Horizontal filter, Output]
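One way to read "restructured to have only one inner loop" is loop merging (fusion): two passes over the same data become one, which also removes the intermediate buffer between them. The sketch below uses stand-in arithmetic (scale by 3, add 16), not the actual rgb2yc or filter math, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Two separate loops: the first writes an intermediate buffer that the
// second reads back, so every sample costs an extra store and load.
std::vector<int> two_loops(const std::vector<int>& in) {
    std::vector<int> scaled(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        scaled[i] = in[i] * 3;
    }
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = scaled[i] + 16;
    }
    return out;
}

// Merged loop: same result, one pass, no intermediate buffer.
std::vector<int> merged_loop(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * 3 + 16;
    }
    return out;
}
```

Because the slide identifies memory usage rather than arithmetic as the critical issue, removing the intermediate buffer is the part of this transformation that matters most.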
16 Design Exploration
With each architecture: synthesize one or more RTL implementations; use CynthHL's output to determine critical issues (memory vs. operations); verify with the same testbench.
Net result: 10,682 lines of Verilog/RTL; 108 functional units; 9 RAMs, 1 ROM; 160 registers. Run time: 33.8 s. 15 ms/frame.
Merge loops; modify memory architecture; pipeline; verify each transformation.
17 Synergy: General-purpose programming language. High-level synthesis. Together, high-level design is a reality. Separately, they are just curiosities. SystemC + CynthHL = High-level Design.
19 CynthHL Product Status: Currently in Beta (Beta released January 2002). Available today, everything you need: synthesis to correct gates; design exploration; predictable timing, which helps timing closure. Beta period is being used to improve usability in real-world design and verification flows. Official product announcement: 2H2002.
20 Integrated Design and Verification Environment
System-level model used for verification: spend time at the algorithmic level to get it right; reuse the verification environment at lower levels; same TB used for algorithm and synthesized RTL.
System-level model used for design: once the architecture is verified, automatically create RTL implementation(s); explore trade-offs between design goals by creating multiple implementations.
Automated path to CORRECT gates.
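The idea of one testbench serving both the algorithm and the synthesized RTL can be sketched as a testbench that takes the design under test as a parameter. Everything here is hypothetical: the two models are toy functions standing in for the algorithmic C++ and for a refined implementation, and a real flow would drive generated RTL through a simulator rather than call a C++ function.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// One stimulus with its golden (expected) response.
struct TestVector {
    unsigned input;
    unsigned golden;
};

using Dut = std::function<unsigned(unsigned)>;

// Drive every stimulus through the design under test and count the
// responses that differ from the golden results.
std::size_t run_testbench(const Dut& dut,
                          const std::vector<TestVector>& vectors) {
    std::size_t mismatches = 0;
    for (const TestVector& v : vectors) {
        if (dut(v.input) != v.golden) ++mismatches;
    }
    return mismatches;
}

// Algorithmic model: states what the design should compute.
unsigned algorithm_model(unsigned x) { return 5 * x; }

// Stand-in for a refined implementation: shift-and-add, no multiplier.
unsigned refined_model(unsigned x) { return (x << 2) + x; }
```

Running `run_testbench` with the same vectors against both models, and getting zero mismatches from each, is the small-scale analogue of reusing the verification environment at lower levels.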