L01-1 6.375: Complex Digital Systems Lecturer: Arvind TA: Richard S. Uhler Administration: Sally Lee February 2, 2011.

Slides:



Advertisements
Similar presentations
FPGA (Field Programmable Gate Array)
Advertisements

TO COMPUTERS WITH BASIC CONCEPTS Lecturer: Mohamed-Nur Hussein Abdullahi Hame WEEK 1 M. Sc in CSE (Daffodil International University)
Device Tradeoffs Greg Stitt ECE Department University of Florida.
EELE 367 – Logic Design Module 2 – Modern Digital Design Flow Agenda 1.History of Digital Design Approach 2.HDLs 3.Design Abstraction 4.Modern Design Steps.
Programmable Logic Devices
Graduate Computer Architecture I Lecture 15: Intro to Reconfigurable Devices.
EECE579: Digital Design Flows
Extensible Processors. 2 ASIP Gain performance by:  Specialized hardware for the whole application (ASIC). −  Almost no flexibility. −High cost.  Use.
The Design Process Outline Goal Reading Design Domain Design Flow
Some Thoughts on Technology and Strategies for Petaflops.
Chapter 1. Introduction This course is all about how computers work But what do we mean by a computer? –Different types: desktop, servers, embedded devices.
Configurable System-on-Chip: Xilinx EDK
EET 4250: Chapter 1 Performance Measurement, Instruction Count & CPI Acknowledgements: Some slides and lecture notes for this course adapted from Prof.
Introduction to FPGA and DSPs Joe College, Chris Doyle, Ann Marie Rynning.
GallagherP188/MAPLD20041 Accelerating DSP Algorithms Using FPGAs Sean Gallagher DSP Specialist Xilinx Inc.
(1) Introduction © Sudhakar Yalamanchili, Georgia Institute of Technology, 2006.
6.375: Complex Digital Systems Lecturer: Arvind TA: Richard S. Uhler Administration: Sally Lee February 6, 2013http://csg.csail.mit.edu/6.375 L01-1.
February 4, 2009http://csg.csail.mit.edu/6.375/L Complex Digital System Spring 2009 Lecturer:Arvind TA: K. Elliott Fleming Assistant: Sally Lee.
CAD for Physical Design of VLSI Circuits
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
CS/ECE 3330 Computer Architecture Kim Hazelwood Fall 2009.
VLSI & ECAD LAB Introduction.
집적회로 Spring 2007 Prof. Sang Sik AHN Signal Processing LAB.
ASIP Architecture for Future Wireless Systems: Flexibility and Customization Joseph Cavallaro and Predrag Radosavljevic Rice University Center for Multimedia.
1 Moore’s Law in Microprocessors Pentium® proc P Year Transistors.
CSE 494: Electronic Design Automation Lecture 2 VLSI Design, Physical Design Automation, Design Styles.
Lecture 2 1 ECE 412: Microcomputer Laboratory Lecture 2: Design Methodologies.
J. Christiansen, CERN - EP/MIC
COE 405 Design and Modeling of Digital Systems
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
Chonnam national university VLSI Lab 8.4 Block Integration for Hard Macros The process of integrating the subblocks into the macro.
Reminder Lab 0 Xilinx ISE tutorial Research Send me an if interested Looking for those interested in RC with skills in compilers/languages/synthesis,
© 2012 xtUML.org Bill Chown – Mentor Graphics Model Driven Engineering.
Introduction to Reconfigurable Computing Greg Stitt ECE Department University of Florida.
Microelectronic Systems Institute Leandro Soares Indrusiak Manfred Glesner Ricardo Reis Lookup-based Remote Laboratory for FPGA Digital Design Prototyping.
Introduction to FPGA Created & Presented By Ali Masoudi For Advanced Digital Communication Lab (ADC-Lab) At Isfahan University Of technology (IUT) Department.
UNIT 1 Introduction. 1-2 OutlineOutline n Course Topics n Microelectronics n Design Styles n Design Domains and Levels of Abstractions n Digital System.
February 7, 2007http://csg.csail.mit.edu/6.375/L Complex Digital System Spring 2007 Lecturers: Arvind & Krste Asanović TAs: Myron King & Ajay.
March, 2007Intro-1http://csg.csail.mit.edu/arvind Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial Intelligence Lab.
EE3A1 Computer Hardware and Digital Design
Chapter 1 Computer Abstractions and Technology. Chapter 1 — Computer Abstractions and Technology — 2 The Computer Revolution Progress in computer technology.
An Overview of Hardware Design Methodology Ian Mitchelle De Vera.
DSP Architectures Additional Slides Professor S. Srinivasan Electrical Engineering Department I.I.T.-Madras, Chennai –
M.Mohajjel. Why? TTM (Time-to-market) Prototyping Reconfigurable and Custom Computing 2Digital System Design.
Transistor Counts 1,000, ,000 10,000 1, i386 i486 Pentium ® Pentium ® Pro K 1 Billion Transistors.
EE141 © Digital Integrated Circuits 2nd Introduction 1 Principle of CMOS VLSI Design Introduction Adapted from Digital Integrated, Copyright 2003 Prentice.
ECE 551: Digital System Design & Synthesis Motivation and Introduction Lectures Set 1 (3 Lectures)
DR. SIMING LIU SPRING 2016 COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF NEVADA, RENO CS 219 Computer Organization.
Introduction to VLSI Design Amit Kumar Mishra ECE Department IIT Guwahati.
February 13, 2008L04-1http://csg.csail.mit.edu/6.375 Bluespec: The need for a new design methodology Arvind Computer Science & Artificial Intelligence.
Introduction to Field Programmable Gate Arrays (FPGAs) EDL Spring 2016 Johns Hopkins University Electrical and Computer Engineering March 2, 2016.
Microprocessor Design Process
FPGA Technology Overview Carl Lebsack * Some slides are from the “Programmable Logic” lecture slides by Dr. Morris Chang.
9/4/2001 ECE 551 Fall ECE Digital System Design & Synthesis Lecture 1 - Introduction  Overview oCourse Introduction oOverview of Contemporary.
Fall 2012 Parallel Computer Architecture Lecture 4: Multi-Core Processors Prof. Onur Mutlu Carnegie Mellon University 9/14/2012.
SUBJECT : DIGITAL ELECTRONICS CLASS : SEM 3(B) TOPIC : INTRODUCTION OF VHDL.
Programmable Logic Devices
System-on-Chip Design
ECE354 Embedded Systems Introduction C Andras Moritz.
Architecture & Organization 1
FPGAs in AWS and First Use Cases, Kees Vissers
ELEN 468 Advanced Logic Design
Architecture & Organization 1
332:437 Lecture 7 Verilog Hardware Description Language Basics
ECNG 1014: Digital Electronics Lecture 1: Course Overview
The performance requirements for DSP applications continue to grow and the traditional solutions do not adequately address this new challenge Paradigm.
332:437 Lecture 7 Verilog Hardware Description Language Basics
Chapter 1 Introduction.
HIGH LEVEL SYNTHESIS.
332:437 Lecture 7 Verilog Hardware Description Language Basics
Presentation transcript:

L : Complex Digital Systems Lecturer: Arvind TA: Richard S. Uhler Administration: Sally Lee February 2, 2011

L Why take Something new and exciting as well as useful Fun: Design systems that you never thought you could design in a course made possible by large FPGAs and Bluespec You will also discover that is possible to design complex digital systems with little knowledge of circuits February 2, 2011

L New, exciting and useful … February 2, 2011

L Wide Variety of Products Rely on ASICs ASIC = Application-Specific Integrated Circuit February 2, 2011

L Source: What’s required? ICs with dramatically higher performance, optimized for applications and at a size and power to deliver mobility cost to address mass consumer markets February 2, 2011

L Current Cellphone Architecture Two chips, each with an ARM general-purpose processor (GPP) and a DSP ( TI OMAP 2420) Comms. Processing Application Processing WLAN RF WCDMA/GSM RF Complex, High Performance but must not dissipate more than 3 watts Many specialized complex blocks February 2, 2011

L Server microprocessors also need specialized blocks compression/decompression encryption/decryption intrusion detection and other security related solutions Dealing with spam Self diagnosing errors and masking them … February 2, 2011

L Real power saving implies specialized hardware H.264 video decoder implementations in software vs. hardware the power/energy savings could be 100 to 1000 fold but our mind set is that hardware design is: Difficult, risky  Increases time-to-market Inflexible, brittle, error prone,...  Difficult to deal with changing standards, … New design flows and tools can change this mind set February 2, 2011

L Will multicores reduce the need for new hardware? Unlikely – because of power and performance 64-core Tilera February 2, 2011

L SoC & Multicore Convergence: more application specific blocks On-chip memory banks Structured on- chip networks General- purpose processors Application- specific processing units February 2, 2011

L To reduce the design cost of SoCs we need … Extreme IP reuse Multiple instantiations of a block for different performance and application requirements Packaging of IP so that the blocks can be assembled easily to build a large system (black box model) Architectural exploration to understand cost, power and performance tradeoffs Full system simulations for validation and verification “Intellectual Property” February 2, 2011

L Hardware design today is like programming was in the fifties, i.e., before the invention of high-level languages February 2, 2011

L Programmers had to know many detail of their computer An IBM 650 Instruction: “Load the contents of location 1234 into the distribution; put it also into the upper accumulator; set lower accumulator to zero; and then go to location 1009 for the next instruction.” Can you program a computer without knowing, for example, how many registers it has? IBM 650 (1954) Fortran changed this mind set (1956) 1950s reaction February 2, 2011

L For designing complex SoCs deep circuits knowledge is secondary Using modern high-level hardware synthesis tools like Bluespec requires computer science training in programming and architecture rather than circuit design February 2, 2011

L Bluespec A new way of expressing behavior A formal method of composing modules with parallel interfaces (ports) Compiler manages muxing of ports and associated control Powerful and zero-cost parameterization of modules Encapsulation of C and Verilog codes using Bluespec wrappers Helps Transaction Level modeling  Smaller, simpler, clearer, more correct code  not just simulation, synthesis as well Bluespec February 2, 2011

L IP Reuse via parameterized modules Example OFDM based protocols MAC standard specific potential reuse Scrambler FEC Encoder InterleaverMapper Pilot & Guard Insertion IFFT CP Insertion De- Scrambler FEC Decoder De- Interleaver De- Mapper Channel Estimater FFTSynchronizer TX Controller RX Controller S/P D/A A/D Different algorithms Different throughput requirements Reusable algorithm with different parameter settings WiFi: 0.25MHz WiMAX: 0.03MHz WUSB: 128pt 8MHz 85% reusable code between WiFi and WiMAX From WiFi to WiMAX in 4 weeks  (Alfred) Man Cheuk Ng, … WiFi: x 7 +x 4 +1 WiMAX: x 15 +x WUSB: x 15 +x Convolutional Reed-Solomon Turbo February 2, 2011

L High-level Synthesis from Bluespec VCD output Debussy Visualization C Bluesim Cycle Accurate Bluespec SystemVerilog source Verilog 95 RTL Verilog sim Bluespec Compiler RTL synthesis gates FPGA Power estimatio n tool First simulate Second run on FPGAs We won’t explore the chip design path February 2, 2011

L FPGAs: a new opportunity February 2, 2011

L Chip Design Styles Custom and Semi-Custom Hand-drawn transistors (+ some standard cells) High volume, best possible performance: used for most advanced microprocessors Standard-Cell-Based ASICs High volume, moderate performance: Graphics chips, network chips, cell-phone chips Field-Programmable Gate Arrays Prototyping Low volume, low-moderate performance applications Different design styles have vastly different costs February 2, 2011

L01-20http://csg.csail.mit.edu/6.375 Exponential growth: Moore’s Law Intel 8080A, Mhz, 6K transistors, 6u Intel 8086, 1978, 33mm 2 10Mhz, 29K transistors, 3u Intel 80286, 1982, 47mm Mhz, 134K transistors, 1.5u Intel 386DX, 1985, 43mm 2 33Mhz, 275K transistors, 1u Intel 486, 1989, 81mm 2 50Mhz, 1.2M transistors,.8u Intel Pentium, 1993/1994/1996, 295/147/90mm 2 66Mhz, 3.1M transistors,.8u/.6u/.35u Intel Pentium II, 1997, 203mm 2 /104mm 2 300/333Mhz, 7.5M transistors,.35u/.25u Shown with approximate relative sizes February 2, 2011

L Intel Penryn (2007) Dual core Quad-issue out-of-order superscalar processors 6MB shared L2 cache 45nm technology Metal gate transistors High-K gate dielectric 410 Million transistors 3+? GHz clock frequency Could fit over processors on same size die. February 2, 2011

L But Design Effort is Growing Nvidia Graphics Processing Units Front-end is designing the logic (RTL) Back-end is fitting all the gates and wires on the chip; meeting timing specifications; wiring up power, ground, and clock Transistors (M) Relative staffing on front-end Relative staffing on back-end 9x growth in back-end staff 5x growth in front-end staff February 2, 2011

L Design Cost Impacts Chip Cost An Altera study Non-Recurring Engineering (NRE) costs for a 90nm ASIC is ~ $30M 59% chip design (architecture, logic & I/O design, product & test engineering) 30% software and applications development 11% prototyping (masks, wafers, boards) If we sell 100,000 units, NRE costs add $30M/100K = $300 per chip! Alternative: Use FPGAs Hand-crafted IBM-Sony-Toshiba Cell microprocessor achieves 4GHz in 90nm, but at the development cost of >$400M February 2, 2011

L Field-Programmable Gate Arrays (FPGAs) Arrays mass-produced but programmed by customer after fabrication Can be programmed by loading SRAM bits, or loading FLASH memory Each cell in array contains a programmable logic function Array has programmable interconnect between logic functions Overhead of programmability makes arrays expensive and slow as compared to ASICs However, much cheaper than an ASIC for small volumes because NRE costs do not include chip development costs (only include programming) February 2, 2011

L FPGA Pros and Cons Advantages Dramatically reduce the cost of errors Little physical design work Remove the reticle costs from each design Disadvantages (as compared to an ASIC) [Kuon & Rose, FPGA2006] Switching power around ~12X worse Performance up 3-4X worse Area 20-40X greater Still requires tremendous design effort at RTL level February 2, 2011

L The new opportunity “Big” FPGAs have become widely available A multicore can be emulated on one FPGA but the programming model is RTL and not too many people design hardware Enable the use of FPGAs via Bluespec February 2, 2011

L Fun: Design systems that you never thought you would design in a course February 2, 2011

L Some Bluespec/FPGA projects at MIT Video decoder – H.264 AirBlue – A new platform to experiment with cross-layer wireless protocols Cycle-accurate performance models Intel’s Hasim IBM’s PowerPC Hardware software co-generation February 2, 2011

L H.264 Video Decoder Chun-Chieh Lin, K Elliott Fleming [MEMOCODE 2008] Used everywhere - cell phones, DVDs, HD-DVDs Initial Design Eight man-months 8K lines of Bluespec  in contrast to 80K lines of C standard Decoded Major architectural explorations over 3 months High performance designs (4.2 mm sq in 180nm)  Low cost designs  (2.2mm sq), (2.4mm sq) Can be refined further to run on FPGAs February 2, 2011

L AirBlue: A platform for Cross-Layer Wireless Protocol development Cross-layer protocols (i.e., jointly optimizing PHY and MAC layers) are the hottest area of research in wireless Several cross-layer experiments (e.g., SoftPhy) have already been conducted on full-speed a/g implementation Fits in Nokia N95 phones Each new protocol required less than 100 lines of code Now building AirBlue2.0 With Prof Hari Balakrishanan February 2, 2011

L IBM: PowerPC Prototype K. Ekanadham, Jessica Tseng (IBM) Asif Khan, M. Vijayaraghavan (MIT) Goal: Implement a multithreaded, multicore, in-order PowerPC on an FPGA platform and boot Linux on it in 12 months Team: 2(IBM) + 2(MIT) + Linux and FPGA help The team accomplished the goal (Nov 2008) - Bluespec PowerPC boots Linux on FPGAs in 10min; - 100M instructions to reach “Hello World”; - 15K lines of Bluespec generated 90K lines of Verilog IBM synthesized the generated Verilog using their tools in 40nm library – ran at 500MHz on the first try! February 2, 2011

L Phase II: IBM/MIT Collaboration March 2009 – Goal: Produce a cycle-accurate and highly parameterized model of multithreaded, multicore PowerPC to run on FPGAs demonstrate 1000X speedup and flexibility by running the models on FPGAs Use cheaper and widely available FPGA boards Xilinx 110 as opposed to 330 Target open source distribution The model is currently able to boot 32-bit Linux on FPGAs and runs at 4.4 MIPS February 2, 2011

L The Course Philosophy Effective abstractions to reduce design effort High-level design language rather than logic gates Control specified with Guarded Atomic Actions rather than with finite state machines Guarded module interfaces automatically ensure correctness of composition of existing modules Design discipline to avoid bad design points Decoupled units rather than tightly coupled state machines Design space exploration to find good designs Architecture choice has largest impact on solution quality We learn by doing actual designs February 2, 2011

L The course has no text book but … Lecture slides (with animation) Make sure you sure you understand the lectures before exploring other materials Small Example suite (from Bluespec Inc) A series of small examples (currently over 70), focusing on one topic at a time. Good entry for learning the language by yourself bluespec/Home/Small-Examples bluespec/Home/Small-Examples bluespec.com  Resources  Wiki  Small Examples Bluespec System Verilog Reference manual It is a reference, not a tutorial bluespec.com  Resources  Wiki  BSV Documentation  Reference Manual Bluespec System Verilog Users guide How to use all the tools for developing BSV programs bluespec.com  Resources  Wiki  BSV Documentation  User Guide February 2, 2011