SPARK: Accelerating ASIC designs through parallelizing high-level synthesis
Sumit Gupta, Rajesh Gupta



© 2003 Spark Team, Confidential
Outline
– The target
– The problem
– The technology
– The competition
– The market opportunity
– The people
– The status
– The plan

A Chip Is A Wonderful Thing!
A typical chip, circa 2006:
– 50 square millimeters
– 50 million transistors
– 1-10 GHz, MOP/sq mm, MIPS/mW
– 300 mm, 10,000 units/wafer, 20K wafers/month
– $5 per part
Does not matter what you build
– Processor, MEMS, Networking, Wireless, Memory
But it takes $20M to build one today, going to $50+M
– So there is a strong incentive to port your application, system, box to the “chip”

But Design Decisions Matter!

Technical Target
Anyone and everyone with a technology IP to grind (build on-chip)
– E.g., WLAN, cellphone chips: about 50 GOPS in baseband (BB) processing
– and about 72 other application ‘markets’ enhanced by ASIC/FPGA parts
More technically
– Behavioral descriptions with complex and nested conditionals and loops

The Problem
Doing chip design in a system house is an increasingly costly proposition
– Case Study: Conexant in a chip
  – 9 months from PRD to parts
  – 7 months from PRD to synthesizable RTL
The pain is in getting the algorithm right for the chip implementation
Would love a “compiler”
– but “push-buttons” just do not work

Enter High-Level Synthesis
[Figure: design-flow diagram. Labels: Task Analysis; HW/SW Partitioning; Hardware Behavioral Description; Software Behavioral Description; High Level Synthesis; Software Compiler; ASIC; Processor Core; Memory; FPGA; I/O]

Poor QOR, even Poor Controllability
[Figure: control-data flow graph of the fragment below, with an If Node on c and T/F branches, shown next to a target architecture of Memory, ALU, Control and Data path]
x = a + b;
c = a < b;
if (c) then d = e – f;
else g = h + i;
j = d × g;
l = e + x;
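As a rough illustration of what a parallelizing transformation can do with the fragment on this slide, the sketch below writes it as C and then applies speculation, one of the code motions named later in the deck. The type and function names are hypothetical, integer operands are an assumption, and this is hand-written C, not SPARK output.

```c
/* Values that are live after the fragment. d and g are read after the
   conditional, so their incoming values matter. Names are hypothetical. */
typedef struct { int x, c, d, g, j, l; } state_t;

/* Original form: the subtraction and the addition both wait for c. */
state_t fragment_original(int a, int b, int e, int f, int h, int i,
                          int d_in, int g_in)
{
    state_t s = { .d = d_in, .g = g_in };
    s.x = a + b;
    s.c = a < b;
    if (s.c)
        s.d = e - f;
    else
        s.g = h + i;
    s.j = s.d * s.g;
    s.l = e + s.x;
    return s;
}

/* Speculated form: both branch bodies are computed unconditionally into
   temporaries, so they no longer depend on c and can be scheduled
   alongside a + b and a < b. The condition only selects which result
   is committed. */
state_t fragment_speculated(int a, int b, int e, int f, int h, int i,
                            int d_in, int g_in)
{
    state_t s;
    int t_sub = e - f;            /* speculative: used only if c is true  */
    int t_add = h + i;            /* speculative: used only if c is false */
    s.x = a + b;
    s.c = a < b;
    s.d = s.c ? t_sub : d_in;     /* commit or keep the old value */
    s.g = s.c ? g_in  : t_add;
    s.j = s.d * s.g;
    s.l = e + s.x;
    return s;
}
```

Both functions return the same values for the same inputs; the speculated form trades an extra operation per call for a shorter dependence chain through the conditional.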

The Technology: Enter SPARK
[Figure: SPARK flow. Labels: C Input; Source-Level Compiler Transformations; Original CDFG; Scheduling Compiler & Dynamic Transformations; Optimized CDFG; Scheduling & Binding; VHDL Output]
By the time you get to the CDFG, it is already too late
Parallelize (judiciously) and submerge it with HLS

Why SPARK, Why Now?
The chip designer is finally
– letting go of the cycle boundary in design
– being replaced by non-chip types
Education and awareness through
– Synopsys Behavioral Compiler
– But not ready to be the dominator…
SPARK changes the landscape
– Parallelizing compilation as the ‘power tool’

SPARK Core Strengths
Focus on
– Transformations that increase the amount of parallelism available in the source description
– Tight integration with parallelizing compiler transformations
Provide an HLS toolbox for the micro-architect
– Fire the circuit designer

The POC and The Experiments
Intel ILD (Instruction Length Decoder) design
– Produced a design that fundamentally restructures the input description (the way a designer would, and no tool could)
Bunch of other media benchmarks
– 40-70% improvement in delay for the same area
– Based on Synopsys backend
See appendix.

The Market Opportunity
The big picture
– Semi is $140B, Fabless Semi is $15B
– EDA currently is about $4B
Current EDA market
– $1B Synthesis and verification
  – $400M synthesis, $400M verification, $200M E.
– $3B in PDA, IP and Design Services
$400M Synthesis
– 90% is RTL and below
Market movement and ‘structural’ changes

Future ESL and Synthesis Market
Keys to growth
– ASIC focus (including structured ASICs)
– ‘Power tool’ key to commanding high ASPs
Challenge
– The raid of the FPGAs
  – In which case, PHLS will be OEM’d
– ASICs mired in Nano swamp
  – Attention shifts to PDA, stationary semi market

The Competition
The early educator: Synopsys BC
– Classical HLS that just does not work, fundamentally flawed
The improviser: Cadence Get2Chip A2C
– Done a good job at RTL
The others
– Celoxica, Forte, Synfora, BlueSpec
– “Boutiques” primarily targeted for “somebody else”

The Competition
Synopsys, Behav. Compiler
– Approach: Traditional HLS; synthesis from subset of SystemC and Behav VHDL
– Limitation: No parallelizing and beyond basic block (BBB) transformations
Cadence/Get2Chip, A2C
– Approach: Traditional HLS; closely tied to logic synthesis
– Limitation: No parallelizing and BBB trafos
Celoxica, DK Design Suite
– Approach: Uses explicitly parallelized input in Handel-C; traditional HLS
– Limitation: No pure behavioral input such as C or SystemC
Forte DS, Cynthesizer
– Approach: Traditional HLS from SystemC with design space exploration
– Limitation: No parallel and BBB trafos
Synfora, NA
– Approach: Maps applications to a VLIW processor and a pipelined array of processors; uses parallelizing transformations in VLIW compiler
– Limitation: Does not do HLS at all; it’s more of a mapping tool from C to a processor array
BlueSpec, NA
– Approach: Based on term rewriting systems; starts from a description closer to RTL than to behav
– Limitation: Not HLS; input is behav code already scheduled into states

What Do We Want To Do?
Make it accessible to SystemC, SystemVerilog
– Front end architecture to port it across
Implement missing compiler passes
– Really standard stuff but missing piece now
Work out a design flow
– Build a path to existing RTL flow incl. validation
Industry strength characterization
Secure IP rights

Synergistic Activities
SPARK release on the web
– Mailing list
– Build the users group
– Expand to SystemC User Community
Kluwer book in preparation
– Announcement at DATE, Feb 2004
– Availability at DAC, June 2004

Exit Strategy
Not yet worked out, but…
Build a stand-alone EDA company
– As a standalone it would not work unless complemented by verification
Build to be bought
– As an HLS company
License technology
– Companies that have shown interest in licensing it
  – Poseidon Systems, Cadence

SPARK History
A joint project
– Rajesh Gupta, Nikil Dutt, Alex Nicolau
Kicked off in Fall 1999
– First Ph.D., Sumit Gupta, 2003
Supported by
– Semiconductor Research Corporation, SRC
– Intel grant as a match to UC Micro
– National Science Foundation

Copyright Sumit Gupta
Case Study: Intel Instruction Length Decoder
[Figure: a Stream of Instructions enters the Instruction Length Decoder; the Instruction Buffer is shown split into First Insn, Second Insn and Third Instruction]

ILD Synthesis: Resulting Architecture
Speculate Operations, Fully Unroll Loop, Eliminate Loop Index Variable
[Figure: Multi-cycle Sequential Architecture transformed into Single cycle Parallel Architecture]
Our toolbox approach enables us to develop a script to synthesize applications from different domains
Final design looks close to the actual implementation done by Intel
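The three steps named on this slide (speculate operations, fully unroll the loop, eliminate the loop index variable) can be sketched on a toy decoder in C. The deck does not describe the real Intel ILD, so the length rule `insn_length`, the buffer size `N`, and both function names below are hypothetical stand-ins chosen only to show the shape of the restructuring.

```c
#include <stdint.h>

#define N 4  /* hypothetical buffer size, small so the unrolled form stays readable */

/* Toy length rule (an assumption, not Intel's): the low two bits of the
   first byte encode a length of 1 to 4 bytes. */
static int insn_length(uint8_t first_byte)
{
    return 1 + (first_byte & 3);
}

/* Sequential form: one instruction per iteration. Each iteration needs
   the previous length before it knows where the next instruction begins. */
void mark_starts_sequential(const uint8_t buf[N], int start[N])
{
    for (int k = 0; k < N; k++)
        start[k] = 0;
    int pos = 0;                  /* loop index carried across iterations */
    while (pos < N) {
        start[pos] = 1;
        pos += insn_length(buf[pos]);
    }
}

/* Restructured form: lengths are computed speculatively at every byte,
   the loop is fully unrolled, and pos is replaced by per-byte flags. */
void mark_starts_parallel(const uint8_t buf[N], int start[N])
{
    /* Speculation: each length assumes an instruction starts at that byte.
       The three computations are independent of each other. */
    int len0 = insn_length(buf[0]);
    int len1 = insn_length(buf[1]);
    int len2 = insn_length(buf[2]);
    /* The length at byte 3 is not needed: it cannot mark a start inside the buffer. */

    /* Byte j starts an instruction if an earlier start i has length j - i. */
    start[0] = 1;
    start[1] = start[0] && len0 == 1;
    start[2] = (start[0] && len0 == 2) || (start[1] && len1 == 1);
    start[3] = (start[0] && len0 == 3) || (start[1] && len1 == 2)
            || (start[2] && len2 == 1);
}
```

For the buffer `{1, 9, 0, 2}` both functions mark bytes 0, 2 and 3 as instruction starts. The second form has no loop-carried index, which is what lets it map to a single combinational stage rather than a multi-cycle state machine.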

Target Applications
[Table: columns are Design, # of Ifs, # of Loops, # Non-Empty Basic Blocks, # of Operations; rows are MPEG-1 pred, MPEG-1 pred, MPEG-2 dp_frame, GIMP tiler; cell values are missing from the transcript]

Scheduling & Logic Synthesis Results
[Chart: series are Non-speculative CMs: Within BBs & Across Hier Blocks; + Speculative Code Motions; + Pre-Synthesis Transforms; + Dynamic CSE. Data labels: 42%, 10%, 36%, 8%, 39%]
Overall: % improvement in Delay
Almost constant Area

Scheduling & Logic Synthesis Results
[Chart: series are Non-speculative CMs: Within BBs & Across Hier Blocks; + Speculative Code Motions; + Pre-Synthesis Transforms; + Dynamic CSE. Data labels: 14%, 20%, 1%, 33%, 41%, 52%]
Overall: % improvement in Delay
Almost constant Area