Evaluating the Raw microprocessor Michael Bedford Taylor Raw Architecture Group Computer Science and AI Laboratory Massachusetts Institute of Technology.

Slides:



Advertisements
Similar presentations
The Raw Architecture Signal Processing on a Scalable Composable Computation Fabric David Wentzlaff, Michael Taylor, Jason Kim, Jason Miller, Fae Ghodrat,
Advertisements

Future of Microprocessors
Intel Microprocessor Features Pentium  IIPentium  IIIPentium  4 MPR Issue June 1997April 2000Dec 2001 Clock Speed 266 MHz1GHz2GHz Pipeline Stages 12/14.
THE RAW MICROPROCESSOR: A COMPUTATIONAL FABRIC FOR SOFTWARE CIRCUITS AND GENERAL- PURPOSE PROGRAMS Taylor, M.B.; Kim, J.; Miller, J.; Wentzlaff, D.; Ghodrat,
Khaled A. Al-Utaibi  Computers are Every Where  What is Computer Engineering?  Design Levels  Computer Engineering Fields  What.
1 Architectural Complexity: Opening the Black Box Methods for Exposing Internal Functionality of Complex Single and Multiple Processor Systems EECC-756.
Presenter: Jeremy W. Webb Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Processor Architectures At A Glance: M.I.T.
CGRA QUIZ. Quiz What is the fundamental drawback of fine-grained architecture that led to exploration of coarse grained reconfigurable architectures?
Lecture 9: Coarse Grained FPGA Architecture October 6, 2004 ECE 697F Reconfigurable Computing Lecture 9 Coarse Grained FPGA Architecture.
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
CSC457 Seminar YongKang Zhu December 6 th, 2001 About Network Processor.
Spartan II Features  Plentiful logic and memory resources –15K to 200K system gates (up to 5,292 logic cells) –Up to 57 Kb block RAM storage  Flexible.
The Raw Processor: A Scalable 32 bit Fabric for General Purpose and Embedded Computing Presented at Hotchips 13 On August 21, 2001 by Michael Bedford Taylor.
Single-Chip Multiprocessor Nirmal Andrews. Case for single chip multiprocessors Advances in the field of integrated chip processing. - Gate density (More.
A tiled processor architecture prototype: the Raw microprocessor October 02.
Supporting Systolic and Memory Communication in iWarp (Borkar et al. 1990) presented by Vasily Volkov CS258, Spring 2008, UC Berkeley.
MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS Robert Mullins Computer Architecture Group Computer Laboratory University of Cambridge, UK.
1 Digital Space Anant Agarwal MIT and Tilera Corporation.
Scalar Operand Networks for Tiled Microprocessors Michael Taylor Raw Architecture Project MIT CSAIL (now at UCSD)
Parallel Computer Architectures
Introduction to FPGA and DSPs Joe College, Chris Doyle, Ann Marie Rynning.
Interconnection and Packaging in IBM Blue Gene/L Yi Zhu Feb 12, 2007.
Octavo: An FPGA-Centric Processor Architecture Charles Eric LaForest J. Gregory Steffan ECE, University of Toronto FPGA 2012, February 24.
Gigabit Routing on a Software-exposed Tiled-Microprocessor
February 12, 1998 Aman Sareen DPGA-Coupled Microprocessors Commodity IC’s for the Early 21st Century by Aman Sareen School of Electrical Engineering and.
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
1 Hardware Support for Collective Memory Transfers in Stencil Computations George Michelogiannakis, John Shalf Computer Architecture Laboratory Lawrence.
K.C.RAVINDRAN,GRAPES-3 EXPERIMENT,OOTY 1 Development of fast electronics for the GRAPES-3 experiment at Ooty K.C. RAVINDRAN On Behalf of GRAPES-3 Collaboration.
CSE 494: Electronic Design Automation Lecture 2 VLSI Design, Physical Design Automation, Design Styles.
J. Christiansen, CERN - EP/MIC
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR FPGA Fabric n Elements of an FPGA fabric –Logic element –Placement –Wiring –I/O.
Safe Overclocking Safe Overclocking of Tightly Coupled CGRAs and Processor Arrays using Razor © 2012 Guy Lemieux Alex Brant, Ameer Abdelhadi, Douglas Sim,
Computer Maintenance Unit Subtitle: CPU’s UNT in partnership with TEA, Copyright ©. All rights reserved1.
Jump to first page One-gigabit Router Oskar E. Bruening and Cemal Akcaba Advisor: Prof. Agarwal.
Radix-2 2 Based Low Power Reconfigurable FFT Processor Presented by Cheng-Chien Wu, Master Student of CSIE,CCU 1 Author: Gin-Der Wu and Yi-Ming Liu Department.
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 ECSE-6600: Internet Protocols Informal Quiz #14 Shivkumar Kalyanaraman: GOOGLE: “Shiv RPI”
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
Lecture 13: Logic Emulation October 25, 2004 ECE 697F Reconfigurable Computing Lecture 13 Logic Emulation.
EE3A1 Computer Hardware and Digital Design
Fabric System Architecture Design: two distinct board designs; replicate and connect –modularity allows us to build any configuration of size 2 n Board.
BMOW is a Custom CPU Design Like your PC’s Pentium, but much simpler Closest cousin is the MOS 6502 used in the Apple II, C-64, and Atari VCS =
Computer and Information Sciences College / Computer Science Department CS 206 D Computer Organization and Assembly Language.
Computer performance issues* Pipelines, Parallelism. Process and Threads.
The Raw Architecture A Concrete Perspective Michael Bedford Taylor Raw Architecture Group Laboratory for Computer Science Massachusetts Institute of Technology.
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
3/12/2013Computer Engg, IIT(BHU)1 INTRODUCTION-1.
Raw Status Update Chips & Fabrics James Psota M.I.T. Computer Architecture Workshop 9/19/03.
High-Bandwidth Packet Switching on the Raw General-Purpose Architecture Gleb Chuvpilo Saman Amarasinghe MIT LCS Computer Architecture Group January 9,
Creating a Scalable Microprocessor: A 16-issue Multiple-Program-Counter Microprocessor With Point-to-Point Scalar Operand Network Michael Bedford Taylor.
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California,
Evaluating The Raw Microprocessor: Scalability and Versatility Michael Taylor Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry.
On-chip Parallelism Alvin R. Lebeck CPS 220/ECE 252.
Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.
Microprocessor Design Process
SPRING 2012 Assembly Language. Definition 2 A microprocessor is a silicon chip which forms the core of a microcomputer the concept of what goes into a.
Computer Maintenance Unit Subtitle: CPU’s Trade & Industrial Education
Lynn Choi School of Electrical Engineering
Multicores from the Compiler's Perspective A Blessing or A Curse?
Assembly Language for Intel-Based Computers, 5th Edition
Phnom Penh International University (PPIU)
Microprocessor.
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
How does an SIMD computer work?
Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, Anant Agarwal
Intel® Atom™ Z3000 Processor (Code name Bay Trail) Part number E3845 (Quad core) 22 nm System on Chip (SoC) Process Technology November 2013.
Morgan Kaufmann Publishers Computer Organization and Assembly Language
Computer Evolution and Performance
RAW Scott J Weber Diagrams from and summary of:
Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, Anant Agarwal
Presentation transcript:

Evaluating the Raw microprocessor Michael Bedford Taylor Raw Architecture Group Computer Science and AI Laboratory Massachusetts Institute of Technology

Evaluating the Raw microprocessor Brief Overview of Raw Architecture Avenues of Evaluation Empirical- Comparison with P3 Analytical- Modeling Large scale ILP Experiential- Experimental Systems

The Raw Architecture Divide the silicon into an array of identical, programmable tiles. (A signal can get through a small amount of logic and to the next tile in one cycle.)

Raw Architecture Compute Processor Routers On-chip networks

Raw Architecture Compute Processor Routers On-chip networks

IFRFD ATL M1M2 FP E U TV F4WB r26 r27 r25 r24 Input FIFOs from Static Router r26 r27 r25 r24 Output FIFOs to Static Router Inside the compute processor – networks are integrated directly into the bypass paths

pval5=seed.0*6.0 pval4=pval5+2.0 tmp3.6=pval4/3.0 tmp3=tmp3.6 v3.10=tmp3.6-v2.7 v3=v3.10 v2.4=v2 pval3=seed.o*v2.4 tmp2.5=pval3+2.0 tmp2=tmp2.5 pval6=tmp1.3-tmp2.5 v2.7=pval6*5.0 v2=v2.7 seed.0=seed pval1=seed.0*3.0 pval0=pval1+2.0 tmp0.1=pval0/2.0 tmp0=tmp0.1 v1.2=v1 pval2=seed.0*v1.2 tmp1.3=pval2+2.0 tmp1=tmp1.3 pval7=tmp1.3+tmp2.5 v1.8=pval7*3.0 v1=v1.8 v0.9=tmp0.1-v1.8 v0=v0.9 pval5=seed.0*6.0 pval4=pval5+2.0 tmp3.6=pval4/3.0 tmp3=tmp3.6 v3.10=tmp3.6-v2.7 v3=v3.10 v2.4=v2 pval3=seed.o*v2.4 tmp2.5=pval3+2.0 tmp2=tmp2.5 pval6=tmp1.3-tmp2.5 v2.7=pval6*5.0 v2=v2.7 seed.0=seed pval1=seed.0*3.0 pval0=pval1+2.0 tmp0.1=pval0/2.0 tmp0=tmp0.1 v1.2=v1 pval2=seed.0*v1.2 tmp1.3=pval2+2.0 tmp1=tmp1.3 pval7=tmp1.3+tmp2.5 v1.8=pval7*3.0 v1=v1.8 v0.9=tmp0.1-v1.8 v0=v0.9 Raw’s bypass-integrated on-chip networks serve as a Scalar Operand Network, or SON. Multiple Raw tiles Program graph

Empirical Evaluation Comparison to P3 ParameterRaw (IBM ASIC)P3 (Intel) Litho180 nm ProcessCMOS 7SFP858 Metal LayersCu 6Al 6 FO1 Delay23 ps11 ps Dielectric k Design StyleStandard CellFull custom Initial Freq425 MHz MHz Die Area331 mm mm 2

Analytical Evaluation Scalar Operand Network Research (SONs). (See HPCA 2003 and future.)

Scalar Operand Network The network and the associated algorithms that are responsible for matching operands and operations In space.

SON Performance Metric: 5-tuple conventional distributed multiprocessor Superscalar (not scalable)

Raw: a new point in the region. conventional distributed multiprocessor Raw SON Superscalar SON (not scalable)

Impact of Receive Occupancy, 64 tiles, i.e.,

Experiential Evaluation (i.e., Real Hardware, Real Systems) Systems Online or in Pipeline Workstation Microphone Array Fabric System (Software Radio on Raw) (IP Routing on Raw)

Raw Chip Specifications IBM SA27E Process 180 nm, 6-metal copper ASIC process 16 Tile RAW Processor 18.23mm x 18.23mm 1657 pin CCGA package 1152 HSTL signal pins Clock and Power 420MHz (actual) 10 watts (power save mode) 18 watts typical 35 watts max

.. twenty-eight 32-bit buses connecting Raw Chip to I/O and Memory System Raw Motherboard

2 Microphone Board 2 Microphones 1 A-to-D 1 CPLD 2 Connectors

1020 Element Microphone Array

Fabric System Architecture Design: two distinct board types Board 1: Quad Raw Board Board 2: I/O & Memory Board Replicate and connect

Summary