The Stanford Smart Memories: A 90nm, 55M transistor, 61mm², 8-core chip multiprocessor

VLSI technology scaling is driving changes:
- Designs are getting complex and expensive
- The balance between gate delay and wire delay is changing
Current architectures are hard to sustain:
- Communication speed is not scaling
- Poor modularity
A variety of programming models are emerging:
- Streams, multithreading, transactional memory
- Per-application optimizations

Design Methodologies
- Use of Tensilica™ cores
- Hierarchical verification: Proc → Tile → Quad → 4-Quads
- Use of relaxed scoreboards for efficient system verification
- Design emulation using Bee2
Design & Physical Implementation
- Single Quad: 8 processors in 4 tiles + 1 protocol controller
Physical Statistics & Floor Plan
- 55M transistors, 2.5M instances
- ST CMOS 90nm, multi-Vt
- Nominal operation: 1.0V core, 1.8V IO
- MHz variable-speed IO clock
- Fully fine-grained clock gating
[Floor plan figure: die edge labeled 7.8mm]
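
The slide only names relaxed scoreboards as a verification technique. To illustrate the general idea, here is a minimal Python sketch under an assumed, simplified memory model: a store may be observed by other processors at any time between its issue and its acknowledgement, and once acknowledged it becomes the committed value. A strict scoreboard predicts one golden value per load. This one keeps the set of values a load may legally return. The class and method names (RelaxedScoreboard, store_issue, store_ack, check_load) are hypothetical and are not taken from the Smart Memories verification environment.

```python
from dataclasses import dataclass, field


@dataclass
class RelaxedScoreboard:
    """Toy relaxed scoreboard for a shared memory (hypothetical sketch).

    Instead of predicting one golden value per address, it tracks the
    set of values a load may legally return.
    Assumed model: a store may be observed any time between issue and
    acknowledgement; once acknowledged it becomes the committed value.
    """

    init_value: int = 0                              # assumed reset value of memory
    committed: dict = field(default_factory=dict)    # addr -> last acknowledged value
    in_flight: dict = field(default_factory=dict)    # addr -> {store_id: value}

    def store_issue(self, store_id: int, addr: int, value: int) -> None:
        """Record a store whose visibility point is not yet known."""
        self.in_flight.setdefault(addr, {})[store_id] = value

    def store_ack(self, store_id: int, addr: int) -> None:
        """The store is acknowledged: its value becomes the committed one."""
        self.committed[addr] = self.in_flight[addr].pop(store_id)

    def allowed(self, addr: int) -> set:
        """All values a load to `addr` may legally observe right now."""
        values = set(self.in_flight.get(addr, {}).values())
        values.add(self.committed.get(addr, self.init_value))
        return values

    def check_load(self, addr: int, observed: int) -> bool:
        """Accept the load if it returned any currently legal value."""
        return observed in self.allowed(addr)


if __name__ == "__main__":
    sb = RelaxedScoreboard()
    sb.store_issue(store_id=1, addr=0x40, value=7)
    # While the store is in flight, both the old and the new value are legal.
    print(sb.check_load(0x40, 0), sb.check_load(0x40, 7), sb.check_load(0x40, 9))
    sb.store_ack(store_id=1, addr=0x40)
    # After acknowledgement only the new value remains legal.
    print(sb.allowed(0x40))
```

The sketch checks each load against a set of candidate values. A real checker would also narrow that set as it observes loads (for example, a processor that has already seen the new value should not later see the old one). That refinement is omitted here for brevity.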

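The hierarchy and physical figures on the slide can be cross-checked with simple arithmetic. The Python sketch below rests on two assumptions: the 7.8mm floor-plan label is the edge of a square die, and the "4-Quads" verification level means four single-quad units. The constant and function names are illustrative only.

```python
"""Back-of-the-envelope checks on the Smart Memories slide figures."""

# Hierarchy from the slide: Proc -> Tile -> Quad -> 4-Quads.
PROCS_PER_TILE = 8 // 4   # "8 processors in 4 tiles"
TILES_PER_QUAD = 4
QUADS_PER_SYSTEM = 4      # assumption: the "4-Quads" level is four quads

HIERARCHY = {
    "proc": 1,
    "tile": PROCS_PER_TILE,
    "quad": PROCS_PER_TILE * TILES_PER_QUAD,
    "4-quads": PROCS_PER_TILE * TILES_PER_QUAD * QUADS_PER_SYSTEM,
}

# Physical figures from the title and the floor plan.
TRANSISTORS = 55e6
DIE_AREA_MM2 = 61.0
DIE_EDGE_MM = 7.8         # assumption: edge length of a square die


def square_die_area(edge_mm: float) -> float:
    """Area in mm^2 of a square die with the given edge length."""
    return edge_mm * edge_mm


def density_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density in transistors per mm^2."""
    return transistors / area_mm2


if __name__ == "__main__":
    for level, n in HIERARCHY.items():
        print(f"{level:>8}: {n} processor(s)")
    print(f"square-die area: {square_die_area(DIE_EDGE_MM):.2f} mm^2")
    density_m = density_per_mm2(TRANSISTORS, DIE_AREA_MM2) / 1e6
    print(f"density: {density_m:.2f} M transistors/mm^2")
```

Under these assumptions a tile holds 2 processors, a quad 8, and the 4-Quads level 32. A 7.8mm square die has an area of about 60.8 mm², consistent with the quoted 61mm². Spreading 55M transistors over 61mm² gives roughly 0.9M transistors per mm².
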
Testing Platform
[Photos: test chip, Bee2 FPGA board, first heartbeat]
Bring-up test platform:
- a) SM test chip
- b) Custom "daughter" board
- c) Control FPGA on Bee2 board
- d) Custom double-ended DIMM cards
State of the testing:
- System configuration
- Processors running programs
- So far so good; testing continues
Related publications: Mai_isca'00, Mai_isscc'04, Labonte_pact'04, Leverich_isca'07, Solomatnikov_dac'07, Shacham_micro'08, Firoozshahian_isca'09