Presenter: Jeremy W. Webb Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Processor Architectures At A Glance: M.I.T.

Slides:



Advertisements
Similar presentations
POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? ILP: VLIW Architectures Marco D. Santambrogio:
Advertisements

The Raw Architecture Signal Processing on a Scalable Composable Computation Fabric David Wentzlaff, Michael Taylor, Jason Kim, Jason Miller, Fae Ghodrat,
THE RAW MICROPROCESSOR: A COMPUTATIONAL FABRIC FOR SOFTWARE CIRCUITS AND GENERAL- PURPOSE PROGRAMS Taylor, M.B.; Kim, J.; Miller, J.; Wentzlaff, D.; Ghodrat,
A reconfigurable system featuring dynamically extensible embedded microprocessor, FPGA, and customizable I/O Borgatti, M. Lertora, F. Foret, B. Cali, L.
1 Advanced VLSI Course Class Presentation An Asynchronous Array of Simple Processors for DSP Applications Rasoul Yousefi Electrical and Computer Engineering.
CGRA QUIZ. Quiz What is the fundamental drawback of fine-grained architecture that led to exploration of coarse grained reconfigurable architectures?
Lecture 9: Coarse Grained FPGA Architecture October 6, 2004 ECE 697F Reconfigurable Computing Lecture 9 Coarse Grained FPGA Architecture.
CSC457 Seminar YongKang Zhu December 6 th, 2001 About Network Processor.
The Raw Processor: A Scalable 32 bit Fabric for General Purpose and Embedded Computing Presented at Hotchips 13 On August 21, 2001 by Michael Bedford Taylor.
PipeRench: A Coprocessor for Streaming Multimedia Acceleration Seth Goldstein, Herman Schmit et al. Carnegie Mellon University.
Processor Technology and Architecture
Spring 08, Jan 15 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Introduction Vishwani D. Agrawal James J. Danaher.
Spring 07, Jan 16 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Introduction Vishwani D. Agrawal James J. Danaher.
ECE 526 – Network Processing Systems Design
Embedded DRAM for a Reconfigurable Array S.Perissakis, Y.Joo 1, J.Ahn 1, A.DeHon, J.Wawrzynek University of California, Berkeley 1 LG Semicon Co., Ltd.
SSS 4/9/99CMU Reconfigurable Computing1 The CMU Reconfigurable Computing Project April 9, 1999 Mihai Budiu
Chapter 6 Memory and Programmable Logic Devices
Router Architectures An overview of router architectures.
Router Architectures An overview of router architectures.
COM181 Computer Hardware Ian McCrumRoom 5B18,
Evaluating the Raw microprocessor Michael Bedford Taylor Raw Architecture Group Computer Science and AI Laboratory Massachusetts Institute of Technology.
Gigabit Routing on a Software-exposed Tiled-Microprocessor
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical.
Paper Review Building a Robust Software-based Router Using Network Processors.
February 12, 1998 Aman Sareen DPGA-Coupled Microprocessors Commodity IC’s for the Early 21st Century by Aman Sareen School of Electrical Engineering and.
Tiled Processing Systems
Computer Architecture ECE 4801 Berk Sunar Erkay Savas.
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
04/04/20071 Image Understanding Architecture: Exploiting Potential Parallelism in Machine Vision.
SHAPES scalable Software Hardware Architecture Platform for Embedded Systems Hardware Architecture Atmel Roma, INFN Roma, ST Microelectronics Grenoble,
Architectures for mobile and wireless systems Ese 566 Report 1 Hui Zhang Preethi Karthik.
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT Nikolaos Vassiliadis N. Kavvadias, G. Theodoridis, S. Nikolaidis Section.
Introduction to CMOS VLSI Design Lecture 22: Case Study: Intel Processors David Harris Harvey Mudd College Spring 2004.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
Tinoosh Mohsenin and Bevan M. Baas VLSI Computation Lab, ECE Department University of California, Davis Split-Row: A Reduced Complexity, High Throughput.
Frank Casilio Computer Engineering May 15, 1997 Multithreaded Processors.
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation.
Lecture 13: Logic Emulation October 25, 2004 ECE 697F Reconfigurable Computing Lecture 13 Logic Emulation.
Performance and Power Analysis of Globally Asynchronous Locally Synchronous Multiprocessor Systems Zhiyi Yu, Bevan M. Baas VLSI Computation Lab, ECE department,
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures Forces that drive.
AEEC405 – Microprocessor Architecture. Some Information Instructor Details Main Book.
COARSE GRAINED RECONFIGURABLE ARCHITECTURES 04/18/2014 Aditi Sharma Dhiraj Chaudhary Pruthvi Gowda Rachana Raj Sunku DAY
Hardware Benchmark Results for An Ultra-High Performance Architecture for Embedded Defense Signal and Image Processing Applications September 29, 2004.
DSP Architectures Additional Slides Professor S. Srinivasan Electrical Engineering Department I.I.T.-Madras, Chennai –
February 12, 1999 Architecture and Circuits: 1 Interconnect-Oriented Architecture and Circuits William J. Dally Computer Systems Laboratory Stanford University.
A Programmable Single Chip Digital Signal Processing Engine MAPLD 2005 Paul Chiang, MathStar Inc. Pius Ng, Apache Design Solutions.
Reconfigurable architectures ESE 566. Outline Static and Dynamic Configurable Systems –Static SPYDER, RENCO –Dynamic FIREFLY, BIOWATCH PipeRench: Reconfigurable.
Baring It All to Software: Raw Machines E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb,
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
A New Class of High Performance FFTs Dr. J. Greg Nash Centar ( High Performance Embedded Computing (HPEC) Workshop.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture.
Raw Status Update Chips & Fabrics James Psota M.I.T. Computer Architecture Workshop 9/19/03.
High-Bandwidth Packet Switching on the Raw General-Purpose Architecture Gleb Chuvpilo Saman Amarasinghe MIT LCS Computer Architecture Group January 9,
1 Versatile Tiled-Processor Architectures The Raw Approach Rodric M. Rabbah with Ian Bratt, Krste Asanovic, Anant Agarwal.
Creating a Scalable Microprocessor: A 16-issue Multiple-Program-Counter Microprocessor With Point-to-Point Scalar Operand Network Michael Bedford Taylor.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California,
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
Philipp Gysel ECE Department University of California, Davis
VU-Advanced Computer Architecture Lecture 1-Introduction 1 Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 1.
Lynn Choi School of Electrical Engineering
ECE354 Embedded Systems Introduction C Andras Moritz.
Packet Switching on Raw
ESE532: System-on-a-Chip Architecture
Morgan Kaufmann Publishers
Final Project presentation
COMS 361 Computer Organization
Computer Architecture
Presentation transcript:

Presenter: Jeremy W. Webb Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

November 4, 2004Jeremy W. Webb Outline Overview of M.I.T. Raw processor Overview of UC Davis AsAP processor Raw vs. AsAP Conclusion References

November 4, 2004Jeremy W. Webb M.I.T. Raw Processor Raw Project Goals M.I.T. raw Architecture Workstation (Raw) Architecture Raw Processor Tile Array What’s in a Raw Tile? Raw Processor Tile –Inside the Compute Processor –Raw’s Networking Routing Resources –Raw Inter-processor Communication M.I.T. Raw Contributions M.I.T. Raw Novel Features

November 4, 2004Jeremy W. Webb Raw Project Goals 1.Create an architecture that scales to 100’s-1000’s of functional units, by exploiting custom-chip like features while being “general purpose”. [8,9] 2.Support standard general purpose abstractions like context switching, caching, and instruction virtualization.

November 4, 2004Jeremy W. Webb M.I.T. raw Architecture Workstation (Raw) Architecture Composed of a replicated processor tile. [8] 8 stage Pipelined MIPS-like 32-bit processor [7] Static and Dynamic Routers Any tile output can be routed off the edge of the chip to the I/O pins. Chip Bandwidth (16-tile version). –Single channel (32-bit) bandwidth of MHz. –14 channels for a total chip bandwidth of MHz.

November 4, 2004Jeremy W. Webb Raw Processor Tile Array ALU RF I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ I$ PC D$ [8]

November 4, 2004Jeremy W. Webb What’s in a Raw tile? 8 stage Pipelined MIPS-like 32-bit processor [7] Pipelined Floating Point Unit 32KB Data Cache 32KB Instruction Memory Interconnect Routers

November 4, 2004Jeremy W. Webb Raw Processor Tile Compute Processor Routers On-chip networks

November 4, 2004Jeremy W. Webb Inside the Compute Processor IFRFD ATL M1M2 FP E U TV F4WB r26 r27 r25 r24 Input FIFOs from Static Router r26 r27 r25 r24 Output FIFOs to Static Router

November 4, 2004Jeremy W. Webb Raw’s Networking Routing Resources 2 Dynamic Networks [7] –Fire and Forget –Header encodes destination –2 Stage router pipeline 2 Static Networks –Software configurable crossbar –Interlocked and Flow Controlled –5 Stage static router pipeline –3 cycle nearest-neighbor ALU to ALU communication latency –No header overhead, but requires knowledge of communication patterns at compile time

November 4, 2004Jeremy W. Webb Raw Inter-processor Communication [5,6]

November 4, 2004Jeremy W. Webb M.I.T. Raw Contributions Raw’s communication facilitates exploitation of new forms of parallelism in Signal Processing applications [7]

November 4, 2004Jeremy W. Webb M.I.T. Raw Novel Features Dynamic and Static Network Routers. Scalability of Raw chips. Fabricated Raw chips can be placed in an array to further increase the system computing performance. Exposes the complete details of the underlying HW architecture to the SW system.

November 4, 2004Jeremy W. Webb UC Davis AsAP Processor AsAP Project Goals UC Davis Asynchronous Array of simple Processors (AsAP) Architecture Asynchronous Array of simple Processors What’s in an AsAP Tile? AsAP Single Processor Tile AsAP Contributions AsAP Novel Features

November 4, 2004Jeremy W. Webb AsAP Project Goals 1.Well matched with DSP system workloads. 2.High-throughput. 3.Energy-efficient. 4.Address the opportunities and challenges of future VLSI fabrication technologies. AsAP’s proposed architecture targets four key goals: [3]

November 4, 2004Jeremy W. Webb UC Davis Asynchronous Array of simple Processors (AsAP) Architecture Composed of a replicated processor tile. 9-stage pipelined reduced complexity DSP processor [2] Four nearest neighbor inter-processor communication. Individual processor tile can operate at different frequencies than its neighbors. [2] Off chip access to the I/O pins must be reached by routing to boundary processors. Chip Bandwidth –Single channel (16-bit) bandwidth of MHz. The array topology of AsAP is well-suited for applications that are composed of a series of independent tasks. [2] –Each of these tasks can be assigned to one or more processors.

November 4, 2004Jeremy W. Webb Asynchronous Array of simple Processors

November 4, 2004Jeremy W. Webb What’s in an AsAP tile? 16-bit fixed point datapath single issue CPU [1] –Instructions for AsAP processors are 32-bits wide. [2] ALU, MAC Small Instruction/Data Memories –64-entry instruction memory and a 128-word data memory. [2] Hardware address generation –Each processor has 4 address generators that calculate addresses for data memory. [2] Local programmable clock oscillator 2 Input and 1 Output 16-bits wide and 32-words deep dual-clock FIFOs. [2] ~1.1mm 2 /processor in 0.18  m CMOS 800 MHz targeted operation

November 4, 2004Jeremy W. Webb AsAP Single Processor Tile

November 4, 2004Jeremy W. Webb AsAP Inter-processor Communication Each processor output is hard-wired to its four nearest neighbors input multiplexers. At power-up the input multiplexers are configured. As input FIFOs fill up the sourcing neighbor can be halted by asserting corresponding hold signal.

November 4, 2004Jeremy W. Webb AsAP Contributions Provides parallel execution of independent tasks by providing many, parallel, independent processing engines [3] AsAP specifies a homogenous 2-D array of very simple processors –Single-issue pipelined CPUs Independent tasks are mapped across processors and executed in parallel Allows efficient exploitation of Application-level parallelism.

November 4, 2004Jeremy W. Webb AsAP Novel Features AsAP –Many processing elements –High clock rates –Possibly many processors inactive –Activity localized to increase energy efficiency and performance Active Routing Inactive (off)

November 4, 2004Jeremy W. Webb ParameterIBM SA-27E (Raw) [4,5,6] UC Davis AsAP (estimated) [1] Litho180 nm Design StyleStd Cell ASICFull Custom Clk Freq (MHz) BW per I/O Bus13.6 Gb/s12.8 Gb/s # tiles/chip64405 CPU type8-stage MIPS (32-bit floating point) 9-stage reduced complexity DSP (16-bit fixed point) Die Area331 mm 2 ~445 mm 2 Tile Area~5 mm mm 2 Raw vs. AsAP

November 4, 2004Jeremy W. Webb Conclusion The M.I.T. Raw and UC Davis AsAP processors set out to accomplish similar goals, and to some extent have accomplished them. While Raw has a smaller number of processors per chip and more memory, AsAP has a larger number of processors per chip with the ability to distribute the memory hogging tasks over multiple processors. This will certainly allow these processors to compete in many of the same markets.

November 4, 2004Jeremy W. Webb References [1] Michael J. Meeuwsen, Omar Sattari, Bevan M. Baas, “A Full-rate Software Implementation of an IEEE a Compliant Digital Baseband Transmitter,” In Proceedings of the IEEE Workshop on Signal Processing Systems (SIPS '04), October [2] Omar Sattari, "Fast Fourier Transforms on a Distributed Digital Signal Processor," Masters Thesis, Technical Report ECE-CE , Computer Engineering Research Laboratory, ECE Department, University of California, Davis, Davis, CA, [3]Bevan M. Baas, "A Parallel Programmable Energy-Efficient Architecture For Computationally-Intensive DSP Systems," In Signals, Systems and Computers, Conference Record of the Thirty-Seventh Asilomar Conference, November [4] Michael Taylor, “Evaluating the Raw Microprocessor,” presented at the Boston Architecture Research Conference, January 30, 2004 [5] Michael Bedford Taylor, “Design Decisions in the Implementation of a Raw Architecture Workstation,” MS Thesis, Massachusetts Institute of Technology, Cambridge, MA, September, [6] Michael Bedford Taylor, “The Raw Processor Specification (LATEST),” Comprehensive specification for the Raw processor, Cambridge, MA, Continuously Updated [7] David Wentzlaff, Michael Bedford Taylor, et al., “The Raw Architecture: Signal Processing on a Scalable Composable Computation Fabric,” High Performance Embedded Computing Workshop, 2001 [8]Michael Taylor, "Evaluating The Raw Microprocessor: Scalability and Versatility," Presented at the International Symposium on Computer Architecture, June 21, 2004 [9]M.I.T. raw Architecture Workstation website: