February 13, 2008L04-1http://csg.csail.mit.edu/6.375 Bluespec: The need for a new design methodology Arvind Computer Science & Artificial Intelligence.

Slides:



Advertisements
Similar presentations
1 General-Purpose Languages, High-Level Synthesis John Sanguinetti High-Level Modeling.
Advertisements

ECOE 560 Design Methodologies and Tools for Software/Hardware Systems Spring 2004 Serdar Taşıran.
An EHR based methodology for Concurrency management Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
March, 2007http://csg.csail.mit.edu/arvind802.11a-1 Architectural Exploration: Area-Performance tradeoff in a Transmitter Arvind Computer Science.
March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
1 A Case for Teaching Parallel Programming to Freshmen Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology Workshop.
February 28 – March 3, 2011 Stepwise Refinement and Reuse: The Key to ESL Ashok B. Mehta Senior Manager (DTP/SJDMP) TSMC Technology, Inc. Mark Glasser.
May 27, 2008 L1-1 Why formal verification remains on the fringes of commercial development Arvind Computer Science & Artificial.
June 5, Architectural Exploration: a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.
Digital Signal Processing and Field Programmable Gate Arrays By: Peter Holko.
Behavioral Design Outline –Design Specification –Behavioral Design –Behavioral Specification –Hardware Description Languages –Behavioral Simulation –Behavioral.
Chapter 17 Parallel Processing.
November 18, 2004 Embedded System Design Flow Arkadeb Ghosal Alessandro Pinto Daniele Gasperini Alberto Sangiovanni-Vincentelli
6/30/2015HY220: Ιάκωβος Μαυροειδής1 Moore’s Law Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips.
1 RAMP Infrastructure Krste Asanovic UC Berkeley RAMP Tutorial, ISCA/FCRC, San Diego June 10, 2007.
Embedded Systems Design at Mentor. Platform Express Drag and Drop Design in Minutes IP Described In XML Databook s Simple System Diagrams represent complex.
GallagherP188/MAPLD20041 Accelerating DSP Algorithms Using FPGAs Sean Gallagher DSP Specialist Xilinx Inc.
(1) Introduction © Sudhakar Yalamanchili, Georgia Institute of Technology, 2006.
L : Complex Digital Systems Lecturer: Arvind TA: Richard S. Uhler Administration: Sally Lee February 2, 2011.
Constructive Computer Architecture Tutorial 4: SMIPS on FPGA Andy Wright 6.S195 TA October 7, 2013http://csg.csail.mit.edu/6.s195T04-1.
6.375: Complex Digital Systems Lecturer: Arvind TA: Richard S. Uhler Administration: Sally Lee February 6, 2013http://csg.csail.mit.edu/6.375 L01-1.
Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology A Synthesis Algorithm for Modular Design of.
On “Control Adaptivity”: how to sell the idea of Atomic Transactions to the HW design community Rishiyur Nikhil Bluespec, Inc. WG 2.8 meeting, June 16,
December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
Chap. 1 Overview of Digital Design with Verilog. 2 Overview of Digital Design with Verilog HDL Evolution of computer aided digital circuit design Emergence.
CAD Techniques for IP-Based and System-On-Chip Designs Allen C.-H. Wu Department of Computer Science Tsing Hua University Hsinchu, Taiwan, R.O.C {
February 4, 2009http://csg.csail.mit.edu/6.375/L Complex Digital System Spring 2009 Lecturer:Arvind TA: K. Elliott Fleming Assistant: Sally Lee.
SystemC: A Complete Digital System Modeling Language: A Case Study Reni Rambus Inc.
Extreme Makeover for EDA Industry
February 22, 2005http://csg.csail.mit.edu/6.884/L07-1 Bluespec-1: Design Affects Everything Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Bounded Dataflow Networks and Latency Insensitive Circuits Cont… Arvind Computer Science and Artificial Intelligence Laboratory MIT Based on the work of.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
Chonnam national university VLSI Lab 8.4 Block Integration for Hard Macros The process of integrating the subblocks into the macro.
Catapult™ C Synthesis Crossing the Gap between Algorithm and Hardware Architecture Mac Moore North American Product Specialist Advanced Synthesis Solutions.
ESL and High-level Design: Who Cares? Anmol Mathur CTO and co-founder, Calypto Design Systems.
F. Gharsalli, S. Meftali, F. Rousseau, A.A. Jerraya TIMA laboratory 46 avenue Felix Viallet Grenoble Cedex - France Embedded Memory Wrapper Generation.
February 14, 2007L04-1http://csg.csail.mit.edu/6.375/ Bluespec-1: Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial.
September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Folded Combinational Circuits as an example of Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
The Macro Design Process The Issues 1. Overview of IP Design 2. Key Features 3. Planning and Specification 4. Macro Design and Verification 5. Soft Macro.
March, 2007Intro-1http://csg.csail.mit.edu/arvind Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial Intelligence Lab.
1 Extending FPGA Verification Through The PLI Charles Howard Senior Research Engineer Southwest Research Institute San Antonio, Texas (210)
November 29, 2011 Final Presentation. Team Members Troy Huguet Computer Engineer Post-Route Testing Parker Jacobs Computer Engineer Post-Route Testing.
1 Modular Refinement of H.264 Kermin Fleming. 2 What is H.264? Mobile Devices Low bit-rate Video Decoder –Follow on to MPEG-2 and H.26x Operates on pixel.
System-on-Chip Design Hao Zheng Comp Sci & Eng U of South Florida 1.
2/19/2016http://csg.csail.mit.edu/6.375L11-01 FPGAs K. Elliott Fleming Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
ECE 587 Hardware/Software Co- Design Lecture 23 LLVM and xPilot Professor Jia Wang Department of Electrical and Computer Engineering Illinois Institute.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Problem: design complexity advances in a pace that far exceeds the pace in which verification technology advances. More accurately: (verification complexity)
CoDeveloper Overview Updated February 19, Introducing CoDeveloper™  Targeting hardware/software programmable platforms  Target platforms feature.
DAC50, Designer Track, 156-VB543 Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform Kazuya YOKOHARI, Koyo.
Chapter Goals Describe the application development process and the role of methodologies, models, and tools Compare and contrast programming language generations.
Introduction to Bluespec: A new methodology for designing Hardware
Introduction to Bluespec: A new methodology for designing Hardware
System-on-Chip Design
Architectural Exploration:
Architecture & Organization 1
Introduction to Bluespec: A new methodology for designing Hardware
IP – Based Design Methodology
Highly Efficient and Flexible Video Encoder on CPU+FPGA Platform
Bluespec-1: Design Affects Everything
Architecture & Organization 1
Introduction to Bluespec: A new methodology for designing Hardware
HIGH LEVEL SYNTHESIS.
Win with HDL Slide 4 System Level Design
Design methods to facilitate rapid growth of SoCs
Introduction to Bluespec: A new methodology for designing Hardware
Presentation transcript:

February 13, 2008L04-1http://csg.csail.mit.edu/6.375 Bluespec: The need for a new design methodology Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 13, 2008

L04-2http://csg.csail.mit.edu/6.375 Real power saving implies specialized hardware H.264 implementations in software vs hardware the power/energy savings could be 100 to 1000 fold but our mind set is that hardware design is Difficult, risky  Increased time-to-market Inflexible, brittle, error prone,...  How to deal with changing standards, errors New design flows and tools can change this mind set

February 13, 2008 L04-3http://csg.csail.mit.edu/6.375 Economic relevance Cell phones, PDAs, sensors,...  Demand a much greater variety of chips Cost of development, business risks,...  Forces us towards specialization primarily through software New tools can enable a much greater variety of chips

February 13, 2008 L04-4http://csg.csail.mit.edu/6.375 SoC Trajectory: more application specific blocks On-chip memory banks Structured on- chip networks General- purpose processors Application- specific processing units Can we rapidly produce high-quality chips and surrounding systems and software?

February 13, 2008 L04-5http://csg.csail.mit.edu/6.375 Making hardware design easier Extreme IP reuse Multiple instantiations of a block for different performance and application requirements Packaging of IP so that the blocks can be assembled easily to build a large system (black box model) Whole system simulation to enable concurrent hardware-software development Need new methods and tools to accomplish this goal “Intellectual Property”

February 13, 2008 L04-6http://csg.csail.mit.edu/6.375 data_in push_req_n pop_req_n clk rstn data_out full empty IP Reuse sounds wonderful until you try it... Example: Commercially available FIFO IP block These constraints are spread over many pages of the documentation... No machine verification of such informal constraints is feasible Bluespec can change all this

February 13, 2008 L04-7http://csg.csail.mit.edu/6.375 Bluespec promotes composition through guarded interfaces not full not empty n n rdy enab rdy enab rdy enq deq first FIFO theModuleA theModuleB theFifo.enq(value1); theFifo.deq(); value2 = theFifo.first(); theFifo.enq(value3); theFifo.deq(); value4 = theFifo.first(); theFifo Enqueue arbitration control Dequeue arbitration control Self-documenting interfaces; Automatic generation of logic to eliminate conflicts in use.

February 13, 2008 L04-8http://csg.csail.mit.edu/6.375 Bluespec: A new way of expressing behavior using Guarded Atomic Actions Formalizes composition Modules with guarded interfaces Compiler manages connectivity (muxing and associated control) Powerful static elaboration facility Permits parameterization of designs at all levels Transaction level modeling Allows C and Verilog codes to be encapsulated in Bluespec modules  Smaller, simpler, clearer, more correct code  not just simulation, synthesis as well Bluespec

February 13, 2008 L04-9http://csg.csail.mit.edu/6.375 Bluespec Tool flow Bluespec SystemVerilog source Verilog 95 RTL Verilog sim VCD output Debussy Visualization Bluespec Compiler RTL synthesis gates C Bluesim Cycle Accurate FPGA Power estimatio n tool Works in conjunction with exiting tool flows

February 13, 2008 L04-10http://csg.csail.mit.edu/6.375 Recent Applications Multiradio OFDM: From WiFi to WiMax a and from the same source H.264 Decoder Baseline profile, 720p X ~75 frames FPGA implementation working Other examples: Processors, Cache Coherence Protocols, IP Lookup,... Research sponsors have agreed to publish all designs done at MIT under the MIT open source license

February 13, 2008 L04-11http://csg.csail.mit.edu/6.375 Importance of Publishing Bluespec Designs Enables whole community to undertake much more ambitious projects We already see the effects in projects Enables derivative designs, specializations and variety at a fraction of the development cost

February 13, 2008L04-12http://csg.csail.mit.edu/6.375 Multi-radio OFDM workbench [MEMOCODE 2006, MEMOCODE 2007]

February 13, 2008 L04-13http://csg.csail.mit.edu/6.375 IP Reuse via parameterized modules Example OFDM based protocols MAC standard specific potential reuse Scrambler FEC Encoder InterleaverMapper Pilot & Guard Insertion IFFT CP Insertion De- Scrambler FEC Decoder De- Interleaver De- Mapper Channel Estimater FFTSynchronizer TX Controller RX Controller S/P D/A A/D Different algorithms Different throughput requirements Reusable algorithm with different parameter settings WiFi: 0.25MHz WiMAX: 0.03MHz WUSB: 128pt 8MHz 85% reusable code between WiFi and WiMAX From WiFi to WiMAX in 4 weeks  (Alfred) Man Chuek Ng, … WiFi: x 7 +x 4 +1 WiMAX: x 15 +x WUSB: x 15 +x Convolutional Reed-Solomon Turbo

February 13, 2008 L04-14http://csg.csail.mit.edu/ a Architectural Exploration (Only the IFFT block is changing) [MEMOCODE 2006] IFFT DesignArea (mm 2 ) Symbol Latency (CLKs) Throughput Latency (CLKs/sym) Min. Freq Required Average Power (mW) Pipelined MHz4.92 Combinational MHz3.99 Folded (16 Bfly-4s) MHz7.27 Super-Folded (8 Bfly-4s) MHz10.9 SF(4 Bfly-4s) MHz14.4 SF(2 Bfly-4s) MHz21.1 SF (1 Bfly4) MHZ34.6 TSMC.18 micron; numbers reported are before place and route. (DesignCompiler), Power numbers are from Sequence PowerTheater These designs were done in ~ 3 man-days

February 13, 2008L04-15http://csg.csail.mit.edu/6.375 Video Codec: H.264 Chun-Chieh Lin (MIT MS thesis 2006) Kermin Elliott Fleming

February 13, 2008 L04-16http://csg.csail.mit.edu/6.375 H.264 Video Decoder May be implemented in hardware or software depending upon... NAL unwrap Parse + CAVLC Inverse Quant Transformation Deblock Filter Intra Prediction Inter Prediction Ref Frames Compressed Bits Frames Different requirements for different environments - QVGA 320x240p (30 fps) - DVD 720x480p - HD DVD 1280x720p (60-75 fps)

February 13, 2008 L04-17http://csg.csail.mit.edu/6.375 Sequential code from ffmpeg void h264decode(){  int stage = S_NAL; while (!eof()){ createdOutput = 0; stallFromInterPred = 0; case (stage){ S_NAL: try_NAL(); if (createdOutput) stage = S_Parse; break; S_Parse: try_Parse(); stage=(createdOutput) ? S_IQIT: S_NAL; break; S_IQIT: try_IQIT(); stage=(createdOutput) ? S_Parse:S_Inter; break; S_Inter: try_Inter(); stage=(createdOutput) ? S_IQIT:S_Intra; if (stallFromInterPred) stage=S_Deblock; break; S_Intra: try_Intra(); stage=(createdOutput) ? S_Inter:S_Deblock; break; S_Deblock: try_deblock(); stage= S_Intra; break } } } Parse NAL IQ/IT Inter- Predict Intra- Predict Deblock ing 20K Lines of C out of 200K

February 13, 2008 L04-18http://csg.csail.mit.edu/6.375 Parallelizing the C code First step towards hardware generation from C Control structure is totally over specified and unscrambling it is beyond the capability of current compiler techniques Program structure is difficult to understand Packets are kept and modified in a global heap Some of these problems can be avoided by providing the programmer a few parallel constructs

February 13, 2008 L04-19http://csg.csail.mit.edu/6.375 H.264 Learnings Productivity: Base profile Effort: Less than one-man year 8K lines of Bluespec (contrast 20k to 80K lines of C) First draft decoded ~32fps, (Available C codes do not meet this performance) Architectural Exploration: Many improvements made over a period of several months to increase performance and reduce area Process several samples / cycle Adjust FIFO depths Pipeline modules: Interpolator, Deblocking filter After improvements decodes ~95fps (180nm) Modular refinement is both feasible and essential

February 13, 2008 L04-20http://csg.csail.mit.edu/6.375 H.264 Design Exploration Area (mm 2 ) Cycles /pixel Cycle time (ns) FPS 1280x720 First draft samples / FIFO elt samples / cycle Larger FIFOs Interpred in parallel Pipelined interp Tower 180nm library

February 13, 2008L04-21http://csg.csail.mit.edu/6.375 Bluespec for System Modeling and Synthesis

February 13, 2008 L04-22http://csg.csail.mit.edu/6.375 A typical SoC model Processor (ISS) DSP (App) DMA Mem Controller DRAM model Interconnect L2 cache Codec model The model may contain a mixture of SystemC and Bluespec modules Typical SystemC modules: CPU ISS models Existing SystemC IP Behavioral models in C or C++ targeted for synthesis Bluespec modules: Complex control – difficult to model in SystemC Hardware - realistic architectural exploration SystemC Bluespec Legend

February 13, 2008 L04-23http://csg.csail.mit.edu/6.375 Modeling Concurrency interconnect system bus interfaces “Algorithm accelerators” (for behavioral synthesis) P $ M pure behavioral model (representing RTL IP) CPU ISS SystemC Legend Bluespec Programming the interconnect without an accurate timing model is slightly bogus

February 13, 2008 L04-24http://csg.csail.mit.edu/6.375 Modular refinement Is it easy to build Bluespec wrappers for a class of C codes Bluespec modules can be introduced early because they Can be written at a very high level, Can interface to other SystemC TLM modules Can be refined into hardware/RTL System-level testbenchs can be reused at all levels

February 13, 2008 L04-25http://csg.csail.mit.edu/6.375 Other ongoing collaborative projects Performance modeling on FPGAs with Joel Emer at Intel Speeding up the software performance model of IA- 32 from 10Kips to 1-10Mips using FPGAs PowerPC model for FPGAs with K. Ekanadham & Jessica Tsang at IBM Boot Unix on an RTL model of a multi-threaded, multicore PowerPC on FPGAs Turbo decoder with Jamey Hicks & Gopal Raghavan at Nokia Integration of a parameterized Turbo decoder into an existing commercial design flow Accelerated test benches via FPGA With Suhas Pai at Qualcomm You will hear about it later in the course

February 13, 2008 L04-26http://csg.csail.mit.edu/6.375 Hardware synthesis: C-based tools vs Bluespec The goal of C-based tools (e.g., Catapult-C) is to generate good hardware given some area, timing, power or performance constraints The tool explores the design space to come up with the “right” design Language extensions are provided to overcome some of the limitations of C The goal of Bluespec is to enable the designer to generate a good implementation by letting him/her express the design at a high-level and explore alternatives via parameterization or refinement No automatic exploration of the design space Designer knows best – the tool automates some of the tedious and error-prone part of the hardware design process

February 13, 2008 L04-27http://csg.csail.mit.edu/6.375 Current research Make the path to hardware design easier FPGA emulation infrastructure Set up an infrastructure to study power related optimizations Hardware-software interaction: test benches, device drivers, transaction-level modeling Continue to explore new examples Semantic extensions and associated compiling schemes The sequential connective: Control over scheduling, Multi-cycle atomic actions Recursive method calls Exploratory: Compiling Bluespec for multicores

February 13, 2008 L04-28http://csg.csail.mit.edu/6.375 Bluespec promotes good Design methodology Can keep up with changing specs Permits architectural exploration Facilitates verification and debugging Eases changes for timing closure Eases changes for physical design Promotes reuse Design for Correctness