
Evolution of Chip Design
ECE 111, Spring 2011

A Brief History
- 1958: First integrated circuit
  - Flip-flop using two transistors
  - Built by Jack Kilby at Texas Instruments
- 2010:
  - Intel Core i7 microprocessor: 2.3 billion transistors
  - 64 Gb flash memory: > 16 billion transistors
Courtesy Texas Instruments; [Trinh09] © 2009 IEEE.
Source: David Harris, CMOS VLSI Design Lecture Slides
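The flash figure can be sanity-checked with a little arithmetic. This is a minimal sketch that rests on three assumptions the slide does not state. "Gb" is read as 2^30 bits. Each cell stores 4 bits (multi-level-cell storage). Each cell counts as one transistor.

```python
# Sketch: cells needed for a 64 Gb NAND flash chip.
# Assumptions (not on the slide): "Gb" means 2**30 bits, 4 bits are stored
# per cell, and each cell is one transistor.

GIGABIT = 2**30


def flash_cells(capacity_gbit: int, bits_per_cell: int) -> int:
    """Number of cells needed to store capacity_gbit gigabits."""
    total_bits = capacity_gbit * GIGABIT
    # Ceiling division: a partially used cell still costs a whole transistor.
    return -(-total_bits // bits_per_cell)


cells = flash_cells(64, bits_per_cell=4)
print(f"{cells:,} cells")  # 17,179,869,184 cells
```

Under these assumptions the chip needs about 17.2 billion cells, which is consistent with the slide's "> 16 billion transistors".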

Annual Sales
- > 10^19 transistors manufactured in 2008
  - 1 billion for every human on the planet
Source: David Harris, CMOS VLSI Design Lecture Slides
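A quick check of the per-person figure follows. It assumes a 2008 world population of roughly 6.7 billion, which is an outside approximation and not taken from the slide.

```python
# Sketch: transistors manufactured in 2008 per person on Earth.
# Assumption (not from the slide): world population of roughly 6.7 billion in 2008.

TRANSISTORS_2008 = 1e19         # lower bound quoted on the slide (> 10^19)
WORLD_POPULATION_2008 = 6.7e9   # approximate, assumed

per_person = TRANSISTORS_2008 / WORLD_POPULATION_2008
print(f"{per_person:.2e} transistors per person")  # 1.49e+09 transistors per person
```

The result is about 1.5 billion per person, so "1 billion for every human" reads as a rounded lower bound.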

Feature Size
- Minimum feature size shrinking 30% every 2-3 years
Source: David Harris, CMOS VLSI Design Lecture Slides
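A 30% shrink means each linear dimension is multiplied by 0.7 per generation, so the area per transistor falls by about half. The sketch below illustrates this. Its 180 nm starting point is an assumption chosen for illustration.

```python
# Sketch: minimum feature size if each process generation shrinks it by 30%.
# Assumption: a 180 nm starting node, chosen only for illustration.

SHRINK = 0.7  # 30% smaller linear dimension per generation


def node_sequence(start_nm: float, generations: int) -> list[float]:
    """Feature sizes after each of `generations` successive 30% shrinks."""
    sizes = [start_nm]
    for _ in range(generations):
        sizes.append(sizes[-1] * SHRINK)
    return sizes


print([round(s) for s in node_sequence(180, 5)])  # [180, 126, 88, 62, 43, 30]
# Area per transistor scales by 0.7**2 = 0.49, so density roughly doubles.
print(round(1 / SHRINK**2, 2))  # 2.04
```

Industry node names of 180, 130, 90, 65, 45, and 32 nm follow this roughly 0.7x pattern.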

NRE (Non-Recurring Engineering) Mask Costs
Source: MIT Lincoln Labs, M. Fritze, October 2002

Subwavelength Lithography Challenges
Source: Raul Camposano, 2003

The Designer’s Escalating Problem
Source: Raul Camposano, 2003

Wire Delays and Noise Problems Dramatically Complicate Design
- Unstructured “place and route” standard-cell methodologies will break down
(Figure: reach of 1 clock cycle at 180 nm vs. 45 nm)
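Long wires are hard for a simple reason. In the usual first-order model, the Elmore delay of an unbuffered distributed RC wire, delay grows with the square of the wire length. The sketch below shows this scaling. The per-millimetre resistance and capacitance are hypothetical placeholders and are not process data.

```python
# Sketch: Elmore delay of an unbuffered distributed RC wire, r * c * L**2 / 2,
# where r and c are resistance and capacitance per unit length.
# The per-mm values below are hypothetical placeholders, not process data.

R_PER_MM = 100.0    # ohms per mm (hypothetical)
C_PER_MM = 0.2e-12  # farads per mm (hypothetical)


def elmore_delay_s(length_mm: float) -> float:
    """Delay of an unbuffered distributed RC wire, in seconds."""
    return R_PER_MM * C_PER_MM * length_mm**2 / 2


for length in (1, 2, 4):
    print(f"{length} mm: {elmore_delay_s(length) * 1e12:.0f} ps")
# 1 mm: 10 ps, 2 mm: 40 ps, 4 mm: 160 ps (doubling the length quadruples delay)
```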

ASIC NRE Costs Not Justified for Many Applications
- Forecast: by 2010, a complex ASIC will have an NRE cost of over $40M = $28M (NRE design cost) + $12M (NRE mask cost)
- Many “ASIC” applications will not have the volume to justify a $40M NRE cost
  - e.g. a $30 IC with a 33% margin ($10 profit per IC) would require sales of 4M units just to recoup the $40M NRE cost
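The 4M-unit figure comes from dividing the NRE by the per-unit profit. The sketch below reproduces that calculation. It treats the slide's 33% margin as one third, which gives exactly the slide's $10 profit per IC.

```python
# Sketch of the slide's break-even arithmetic: units needed to recoup NRE.

def breakeven_units(nre_dollars: float, price: float, margin: float) -> float:
    """Units that must be sold for per-unit profit to cover the NRE cost."""
    profit_per_unit = price * margin
    return nre_dollars / profit_per_unit


NRE = 28e6 + 12e6  # design NRE + mask NRE, from the slide
# 33% margin taken as one third, i.e. $10 profit on a $30 IC.
units = breakeven_units(NRE, price=30.0, margin=1 / 3)
print(f"{units / 1e6:.1f} M units")  # 4.0 M units
```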

Case for Programmable Solutions
- Can amortize high NRE costs across many applications
  - e.g. microprocessors, DSPs, FPGAs
- Complex ASICs today require 18+ months vs. ~4 months for the same function on a DSP
  - e.g. a Voice-over-IP chip vs. Voice-over-IP on a DSP
  - The “design time” gap will widen dramatically
- Many applications simply require “programmability”, e.g. cell phones:
  - multiple modes
  - evolving standards
  - evolving features, differentiation …
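Amortization can be made concrete with a per-unit cost model: amortized cost = NRE / volume + recurring unit cost. In the sketch below, only the $40M ASIC NRE comes from the slides. The unit costs and the programmable part's small NRE are hypothetical. They were chosen to show a high-NRE, low-unit-cost ASIC losing at low volume and winning at high volume.

```python
# Sketch: amortized cost per unit for a custom ASIC vs. a programmable part.
# All dollar figures are hypothetical except the $40M ASIC NRE from the slides.

def cost_per_unit(nre: float, unit_cost: float, volume: int) -> float:
    """NRE spread over the production volume, plus the recurring cost per chip."""
    return nre / volume + unit_cost


def crossover_volume(nre_a: float, unit_a: float, nre_b: float, unit_b: float) -> float:
    """Volume at which options a and b have the same amortized cost per unit."""
    return (nre_a - nre_b) / (unit_b - unit_a)


ASIC = (40e6, 20.0)         # (NRE, unit cost); the unit cost is hypothetical
PROGRAMMABLE = (2e6, 60.0)  # hypothetical; its NRE is shared with other applications

for volume in (100_000, 1_000_000, 10_000_000):
    asic = cost_per_unit(*ASIC, volume)
    prog = cost_per_unit(*PROGRAMMABLE, volume)
    print(f"{volume:>10,} units: ASIC ${asic:,.2f}, programmable ${prog:,.2f}")
# ASIC: $420.00, $60.00, $24.00; programmable: $80.00, $62.00, $60.20

print(f"crossover: {crossover_volume(*ASIC, *PROGRAMMABLE):,.0f} units")  # crossover: 950,000 units
```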

But …
- Advanced applications and algorithms (e.g. latest video games, broadband wireless …) require enormous computation power
  - 100s to 1000s of GOPS
- And very high efficiency
  - 100s of MOPS/mW (GOPS/W)
  - 10s of GOPS/$
- Existing microprocessors, DSPs, and FPGAs don’t come close
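MOPS/mW and GOPS/W are the same ratio, because both the numerator and the denominator are scaled by 1000. The sketch below checks this with a hypothetical chip: 1000 GOPS, 10 W, $50.

```python
# Sketch: the slide's efficiency units. MOPS/mW equals GOPS/W because
# both numerator and denominator are scaled by 1000.
# The chip figures below are hypothetical.

def gops_per_watt(gops: float, watts: float) -> float:
    return gops / watts


def mops_per_mw(gops: float, watts: float) -> float:
    mops = gops * 1_000
    milliwatts = watts * 1_000
    return mops / milliwatts


chip_gops, chip_watts, chip_price = 1_000.0, 10.0, 50.0  # hypothetical chip
print(gops_per_watt(chip_gops, chip_watts))  # 100.0
print(mops_per_mw(chip_gops, chip_watts))    # 100.0
print(chip_gops / chip_price, "GOPS/$")      # 20.0 GOPS/$
```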

Why are Conventional Processor Architectures Inefficient?
- e.g. Intel Itanium II
  - 6-way integer unit: < 2% of die area
  - Cache logic: > 50% of die area
- Most of the chip is there to keep these 6 integer units at their “peak” rate
- The main issue is external DRAM latency (50 ns) relative to the internal clock period (0.25 ns): a ratio of 200:1
- Can “in theory” fit > 300 ALUs (tens of thousands in future) in the same die area, but how do we keep them “busy”?
(Figure: die with the INT6 and cache-logic regions labeled)
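The 200:1 ratio can be turned into lost work. The sketch below counts how many integer-unit issue slots pass during one external DRAM access. It assumes the core has no independent work to overlap with the miss, which is a worst-case simplification.

```python
# Sketch of the slide's latency ratio: clock cycles and integer-unit issue
# slots that pass while waiting for one external DRAM access.
# Assumes no independent work can be overlapped with the miss (worst case).

DRAM_LATENCY_NS = 50.0  # from the slide
CLOCK_PERIOD_NS = 0.25  # from the slide (a 4 GHz clock)
INTEGER_UNITS = 6       # Itanium II 6-way integer unit, from the slide

stall_cycles = DRAM_LATENCY_NS / CLOCK_PERIOD_NS
lost_slots = stall_cycles * INTEGER_UNITS

print(f"clock: {1 / CLOCK_PERIOD_NS:.0f} GHz")  # clock: 4 GHz
print(f"stall: {stall_cycles:.0f} cycles")      # stall: 200 cycles
print(f"idle issue slots: {lost_slots:.0f}")    # idle issue slots: 1200
```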

Why are ASICs so Efficient?
- Parallelism (millions of gates operating in parallel)
- Locality (fed by dedicated “local” wires and memories)
Source: Bill Dally, 2003

20 MIPS CPU in 1987
- Few thousand gates
Source: Anant Agarwal, MIT, NOCS 2009 Keynote

The billion-transistor chip of 2007
Source: Anant Agarwal, MIT, NOCS 2009 Keynote

Tilera’s TILEPro64™ Processor
Multicore Performance (90 nm)
- Number of tiles: 64
- Cache-coherent distributed cache: 5 MB
- Operations at 750 MHz (32, 16, 8 bit): … BOPS
- Bisection bandwidth: 2 terabits per second
Power Efficiency
- Power per tile (depending on app): 170 to 300 mW
- Core power for h.264 encode (64 tiles): 12 W
- Clock speed: up to 866 MHz
I/O and Memory Bandwidth
- I/O bandwidth: 40 Gbps
- Main memory bandwidth: 200 Gbps
Programming
- ANSI standard C
- SMP Linux programming
- Stream programming
Product reality
Source: Anant Agarwal, MIT, NOCS 2009 Keynote
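The per-tile power range implies a total core power. The sketch below computes it, assuming all 64 tiles are active and ignoring I/O and memory-controller power. Both are simplifications not taken from the slide.

```python
# Sketch: total core power implied by the slide's per-tile figures.
# Assumes all 64 tiles are active; I/O and memory-controller power are ignored.

TILES = 64
TILE_POWER_MW = (170, 300)  # per-tile range from the slide, depending on application

low_w, high_w = (TILES * mw / 1000 for mw in TILE_POWER_MW)
print(f"{low_w:.2f} W to {high_w:.2f} W")  # 10.88 W to 19.20 W
```

The slide's 12 W for h.264 encode on 64 tiles falls inside this range.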

Tile Processor Block Diagram: A Complete System on a Chip
(Figure: tile array surrounded by four DDR2 memory controllers, two PCIe MAC/PHY ports, two XAUI MAC/PHY ports with SerDes, two GbE ports, flexible I/O, and UART, HPI, JTAG, I2C, SPI; each tile contains a processor (P0, P1, P2, register file), L1I and L1D caches with ITLB and DTLB, an L2 cache, 2D DMA, and a switch for the STN, MDN, TDN, UDN, and IDN networks)
Source: Anant Agarwal, MIT, NOCS 2009 Keynote
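Each tile's switch connects it to its neighbours, so the distance between two tiles depends on where they sit in the array. The sketch below illustrates this. It assumes the 64 tiles form an 8 x 8 grid and that packets use dimension-order (X-then-Y) routing. Both are assumptions for illustration, not details given on the slide.

```python
# Sketch: hop counts on an 8 x 8 mesh of tiles, assuming dimension-order
# (X-then-Y) routing. The 8 x 8 layout and the routing policy are assumptions.

from itertools import product

MESH = 8  # assumed 8 x 8 arrangement of the 64 tiles


def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Return the tiles visited when routing along X first, then along Y."""
    (x, y), (tx, ty) = src, dst
    path = [(x, y)]
    while x != tx:
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append((x, y))
    return path


tiles = list(product(range(MESH), repeat=2))
# Hop count = links traversed = tiles visited minus one.
hops = [len(xy_route(s, d)) - 1 for s in tiles for d in tiles if s != d]

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(f"max {max(hops)} hops, mean {sum(hops) / len(hops):.2f} hops")  # max 14 hops, mean 5.33 hops
```

Under these assumptions the farthest pair of tiles is 14 hops apart and an average pair is about 5.3 hops apart. Placing communicating tasks on nearby tiles therefore still pays off.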

What Does the Future Look Like?
- Corollary of Moore’s law: the number of cores will double every 18 months
- 1K cores by 2014! Are we ready?
(Figure: cores per chip, research vs. industry, ’02 through ’11; cores minimally big enough to run a self-respecting OS!)
Source: Anant Agarwal, MIT, NOCS 2009 Keynote
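The "1K cores by 2014" projection follows from the doubling rule. This sketch assumes a 64-core chip as the 2008 starting point, which the slide does not state. Six years is four 18-month doublings, a factor of 16.

```python
# Sketch of the slide's corollary: core count doubling every 18 months.
# Assumption (not stated on the slide): a 64-core chip as the 2008 starting point.

def projected_cores(start_cores: int, start_year: int, year: int,
                    months_per_doubling: int = 18) -> float:
    """Cores after doubling every `months_per_doubling` months from start_year."""
    doublings = (year - start_year) * 12 / months_per_doubling
    return start_cores * 2 ** doublings


print(projected_cores(64, 2008, 2014))  # 1024.0
```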

Massively Parallel Processing On-a-Chip
- 64 tiles x 8 ALUs = 512 ALUs at … GHz = 1000 GOPS = 1 TOPS
- Parallelism + locality
- Bandwidth hierarchy is key: DDR DRAM interface 2 GB/s, SRAM 32 GB/s, registers 544 GB/s
Source: Bill Dally, 2003
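The slide's figures support two pieces of arithmetic. The first is the clock rate implied by 1000 GOPS from 512 ALUs, assuming each ALU completes one operation per cycle. The second is how much more bandwidth each storage level supplies than DRAM. The mapping of the garbled bandwidth figures to DRAM (2 GB/s), SRAM (32 GB/s), and registers (544 GB/s) is a reading of the slide's diagram.

```python
# Sketch: arithmetic behind the stream-processor slide.
# Assumption: each ALU completes one operation per cycle.

TILES, ALUS_PER_TILE = 64, 8
PEAK_GOPS = 1000

alus = TILES * ALUS_PER_TILE
implied_clock_ghz = PEAK_GOPS / alus
print(alus, f"{implied_clock_ghz:.2f} GHz")  # 512 1.95 GHz

# Bandwidth hierarchy (GB/s), as read from the slide's diagram.
bandwidth = {"DDR DRAM": 2, "SRAM": 32, "Registers": 544}
for level, gbs in bandwidth.items():
    print(f"{level:>9}: {gbs:>3} GB/s ({gbs // bandwidth['DDR DRAM']}x DRAM)")
```

Under this reading, registers supply 272 times the DRAM bandwidth, which is why keeping operands local matters.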

IBM/Sony/Toshiba Cell Processor
- Used in PlayStation
- … GHz 64-bit dual-threaded PowerPC
- 8 SIMD engines x 7 ALUs = … GHz = 256 GFLOPS
- Terabit on-chip ring network
- Terabit external memory and chip-to-chip I/O (0.5 Tb/s memory I/O, 0.5 Tb/s chip I/O)
- 90 nm process, 234 million transistors, 221 mm² die

NVIDIA GeForce
- 8 clusters x 16 ALUs = 128 ALUs
- 32-bit on-chip CPU
- Terabit external memory I/O (0.7 Tb/s)
- 1.35 GHz clock
- 90 nm process, 681 million transistors
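The slide's ALU count and clock rate imply a peak throughput. This is a minimal sketch assuming one operation per ALU per clock. If a multiply-add is counted as two operations, the figure doubles.

```python
# Sketch: peak throughput implied by the GeForce slide.
# Assumption: one operation per ALU per clock; counting a multiply-add as
# two operations doubles the figure.

CLUSTERS, ALUS_PER_CLUSTER, CLOCK_GHZ = 8, 16, 1.35

alus = CLUSTERS * ALUS_PER_CLUSTER
peak_gops = alus * CLOCK_GHZ
print(alus, f"{peak_gops:.1f} GOPS", f"{2 * peak_gops:.1f} GOPS if multiply-add counts as two")
# 128 172.8 GOPS 345.6 GOPS if multiply-add counts as two
```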