Technische Universiteit Eindhoven
‘Nothing is built on stone; all is built on sand, but we must build as if the sand were stone.’ Jorge Luis Borges (Argentine writer)
Department of Electrical Engineering, Electronic Systems
Modeling of Architectures
Platform-based Design 5KK70
Henk Corporaal, Bart Mesman, Hamed Fatemi
2010

Outline
We will look at models for area, delay and energy:
- processor structure;
- register files: the register cell, and detailed area, power and delay models for several register file configurations;
- applying these models to the Imagine architecture: its stream register file and its network.

Processor
- Single processor: instruction memory (IM), controller, processing element (PE) consisting of a register file (RF) and an ALU, and data memory (DM).
- SIMD: multiple PEs.
- VLIW: multiple ALUs.
- Multi-processor: several processors connected by a bus or network.

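Register file (RF): area model
Assume p = number of ports. For a large RF, the row decoder is small compared to the cell area, so the area is set by the register cells.
[Figure: schematic of one register cell]
A 1-bit cell with a single port has area $w \cdot h$ (in wire tracks). Each port adds one word line and one bit line, so with p ports a cell occupies $(w+p)(h+p)$ tracks, and a register file of R registers of b bits has area $A \propto R\,b\,(w+p)(h+p)$.
If p is large, the 1-bit area grows as $p^2$, so $A \propto R\,b\,p^2$.
Note: for N FUs (ALUs), $p \sim 3N$ and $R \sim N$, so $A \sim N^3$.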
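The area model can be checked numerically. This is a minimal sketch, assuming that every port adds one word line and one bit line to a bit cell, so a cell is (w + p) × (h + p) tracks; the function names and the values of r, b, w and h are hypothetical. For a central RF with R = rN registers and p = 3N ports, doubling N should multiply the area by a factor approaching 2³ = 8.

```python
def rf_area(R: int, b: int, p: int, w: float, h: float) -> float:
    """Area (squared wire tracks) of R registers of b bits with p ports.

    Assumption: a single-ported bit cell is w x h tracks and every port
    adds one word line and one bit line, i.e. one track in each direction.
    """
    return R * b * (w + p) * (h + p)


def central_rf_area(N: int, r: int = 8, b: int = 32, w: float = 4, h: float = 4) -> float:
    """Central RF for N ALUs: R = r*N registers, p = 3N ports
    (two read ports and one write port per ALU). Defaults are hypothetical."""
    return rf_area(r * N, b, 3 * N, w, h)


if __name__ == "__main__":
    for N in (1, 2, 4, 8, 16, 32, 64):
        growth = central_rf_area(2 * N) / central_rf_area(N)
        print(f"N={N:3d}  area={central_rf_area(N):14.0f}  area(2N)/area(N)={growth:.2f}")
```

For small N the fixed w and h terms keep the growth factor below 8; it only approaches 8 once p = 3N dominates the cell size, which is the "if p is large" regime of the slide.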

Register file (RF): delay model
The delay (d) has two components: wire propagation delay and fan-in/fan-out delay. With a large number of ports, wire propagation dominates.
Consider a register file of R registers of b bits, assuming a square layout. A bit line then spans $\sqrt{bR}$ cells, each $(h+p)$ tracks tall, so $d \propto \sqrt{bR}\,(h+p)$, which for large p becomes $d \propto p\sqrt{bR}$.
Note: for N FUs (ALUs), $p \sim 3N$ and $R \sim N$, so $d \sim N^{3/2}$.
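A small sketch can confirm the $N^{3/2}$ trend. It assumes, as the slide's conclusion implies, that delay is proportional to wire length, and it uses the bit-line length of a square layout as that length; `rf_delay`, `growth_exponent` and the parameter values are hypothetical. The exponent is estimated as log₂ of the ratio f(2N)/f(N).

```python
import math


def rf_delay(R: int, b: int, p: int, h: float) -> float:
    """Relative wire delay of a square-layout RF with R registers of b bits.

    Assumption: delay is proportional to bit-line length, i.e. sqrt(b*R)
    cells, each (h + p) tracks tall (one extra track per port).
    """
    return math.sqrt(b * R) * (h + p)


def growth_exponent(f, N: int) -> float:
    """Estimate k in f(N) ~ N**k from the ratio f(2N) / f(N)."""
    return math.log2(f(2 * N) / f(N))


def central_delay(N: int, r: int = 8, b: int = 32, h: float = 4) -> float:
    # Central RF: R = r*N registers, p = 3N ports. Defaults are hypothetical.
    return rf_delay(r * N, b, 3 * N, h)


if __name__ == "__main__":
    for N in (4, 64, 1024, 2**20):
        print(N, round(growth_exponent(central_delay, N), 3))
```

The estimated exponent rises towards 1.5 as N grows: $\sqrt{R}$ contributes $N^{1/2}$ and the port count contributes $N$.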

Register file (RF): power model
The power (P) is proportional to the capacitance that must be switched on each access. Each access switches every bit line and one word line, so the bit-line capacitance dominates.
Each port drives $\sqrt{bR}$ bit lines, and each bit line has length $(h+p)\sqrt{bR}$. Summed over the p ports:
$P \propto p \cdot \sqrt{bR} \cdot (h+p)\sqrt{bR} = p\,b\,R\,(h+p)$.
If p is large, power is dominated by wire capacitance and $P \propto b\,R\,p^2$.
Note: for N FUs (ALUs), $p \sim 3N$ and $R \sim N$, so $P \sim N^3$.
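The power model can be sketched the same way. This assumes all p ports are active in a cycle and counts only bit-line wire capacitance, following the slide; `rf_power`, `central_power` and the default parameters are hypothetical. Doubling N should multiply the power by a factor approaching 2³ = 8.

```python
import math


def rf_power(R: int, b: int, p: int, h: float) -> float:
    """Relative switched bit-line capacitance per cycle (all ports active).

    Each of the p ports drives sqrt(b*R) bit lines, and each bit line is
    (h + p) * sqrt(b*R) tracks long, so P ~ p * b * R * (h + p).
    """
    lines_per_port = math.sqrt(b * R)
    line_length = (h + p) * math.sqrt(b * R)
    return p * lines_per_port * line_length


def central_power(N: int, r: int = 8, b: int = 32, h: float = 4) -> float:
    # Central RF for N ALUs: R = r*N, p = 3N. Defaults are hypothetical.
    return rf_power(r * N, b, 3 * N, h)


if __name__ == "__main__":
    for N in (4, 64, 1024, 2**20):
        k = math.log2(central_power(2 * N) / central_power(N))
        print(f"N={N}: P grows as about N^{k:.3f}")
```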

Register file organization
A processor with a one-level register file can use either:
- a central (shared) register file serving ALU 1 to ALU N, or
- a DRF (distributed register file), in which the registers are split into small files local to ALU 1 to ALU N.

Comparing the area models of central and distributed RFs
Central (shared) RF: 2 read ports and 1 write port per ALU, so $p = 3N$, with $R = rN$ registers of b bits, where r is the number of registers per ALU and N is the number of ALUs. Hence $A \propto R\,b\,p^2 \sim N^3$.
DRF: each register file has only 2 ports (one read, one write), so the register files together give $A \sim N$. The switch adds area of its own: the square layout of the DRF includes a $2N \times N$ crossbar, which has $2N^2$ crosspoints and therefore grows as $N^2$.
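The comparison can be sketched in a few lines. Beyond the slide's 2N × N crossbar, several details here are assumptions: the DRF is taken to have 2N files (one per ALU input) of r/2 registers each, so the total register count matches the central RF, and the crossbar is modeled as a plain grid of b-bit-wide wires. All names and parameter values are hypothetical.

```python
def rf_area(R: int, b: int, p: int, w: float, h: float) -> float:
    """Area (squared tracks) of R registers of b bits with p ports,
    assuming each port adds one track in each direction of a bit cell."""
    return R * b * (w + p) * (h + p)


def crossbar_area(n_in: int, n_out: int, b: int) -> float:
    """Hypothetical grid crossbar: n_in*b wires crossing n_out*b wires."""
    return (n_in * b) * (n_out * b)


def central_area(N: int, r: int, b: int, w: float, h: float) -> float:
    # One shared RF: R = r*N registers, 2 read + 1 write port per ALU.
    return rf_area(r * N, b, 3 * N, w, h)


def drf_area(N: int, r: int, b: int, w: float, h: float) -> tuple[float, float]:
    # Assumed DRF: 2N small files (one per ALU input), r//2 registers each,
    # 1 read + 1 write port, plus a crossbar from N ALU outputs to 2N files.
    files = 2 * N * rf_area(r // 2, b, 2, w, h)
    switch = crossbar_area(N, 2 * N, b)
    return files, switch


if __name__ == "__main__":
    r, b, w, h = 8, 32, 4, 4  # hypothetical parameters
    for N in (1, 2, 4, 8, 16, 32, 64):
        files, switch = drf_area(N, r, b, w, h)
        print(f"N={N:2d} central={central_area(N, r, b, w, h):12.0f} "
              f"DRF files={files:9.0f} DRF switch={switch:9.0f}")
```

Doubling N doubles the DRF file area and quadruples the crossbar area, while the central RF area grows by a factor approaching 8.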

Delay and power models of central versus distributed RF
Assume N ALUs.
Central RF: $R = rN$ registers and $p = 3N$ ports, so for large N, $d \sim N^{3/2}$ and $P \sim N^3$.
DRF: a constant number of registers per ALU and $p = 2$ ports (also constant!), so each DRF file has a fixed delay and power.
For large N, wire propagation determines delay and power.
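To make the per-RF contrast concrete, the sketch below evaluates the wire-length delay and power models for the central RF and for one DRF file. It assumes a hypothetical fixed size of `r_file` registers per DRF file and does not model the switch between the files; all names and parameter values are illustrative.

```python
import math


def rf_delay(R: int, b: int, p: int, h: float) -> float:
    """Relative wire delay: bit-line length sqrt(b*R) * (h + p) tracks."""
    return math.sqrt(b * R) * (h + p)


def rf_power(R: int, b: int, p: int, h: float) -> float:
    """Relative power: p ports x sqrt(b*R) bit lines x (h + p)*sqrt(b*R) length."""
    return p * b * R * (h + p)


def per_file_models(N: int, r: int = 8, r_file: int = 4, b: int = 32, h: float = 4):
    """Return ((delay, power) of the central RF, (delay, power) of one DRF file).

    Central RF: R = r*N registers, p = 3N ports.
    DRF file:   r_file registers and p = 2 ports, independent of N.
    All parameter values are hypothetical; the switch is not modeled.
    """
    central = (rf_delay(r * N, b, 3 * N, h), rf_power(r * N, b, 3 * N, h))
    drf_file = (rf_delay(r_file, b, 2, h), rf_power(r_file, b, 2, h))
    return central, drf_file


if __name__ == "__main__":
    for N in (1, 4, 16, 64):
        (cd, cp), (dd, dp) = per_file_models(N)
        print(f"N={N:2d} central d={cd:8.1f} P={cp:12.0f} | DRF file d={dd:5.1f} P={dp:6.0f}")
```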

Register file
Register (memory) storage and communication between ALUs are critical to the area, energy and performance of a media processor. Hence: hierarchical register storage.

Two-level (hierarchical) register files
Central: RF1 (level 1) serves the ALUs, while RF2 (level 2) is used to cover the memory latency. The overall area trend is the same as with a one-level RF.
DRF: ALU 1 to ALU N with a distributed RF1 (level 1), backed by RF2 (level 2).

Register files
Processor with stream register files: replace each port into the memory-staging RF with a stream buffer. All stream buffers share a single port into the memory-staging RF, allowing that single physical port to act as many logical ports.
[Figure: central organization, ALU 1 to ALU N]
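How one physical port can serve many logical ports can be illustrated with a toy cycle-level simulation. This is a hypothetical model, not Imagine's actual arbitration: each cycle the shared port moves a block of `block` words into one stream buffer (round-robin among buffers with room), and each buffer hands at most one word per cycle to its ALU-side reader.

```python
def simulate_shared_port(num_buffers: int, block: int, capacity: int, cycles: int) -> list[int]:
    """Count the words each stream buffer delivers to its ALU-side reader.

    Hypothetical model: each cycle the single physical port into the
    memory-staging RF moves `block` words into one buffer, chosen round-robin
    among the buffers with room; then every buffer hands at most one word to
    its reader (one logical port per buffer).
    """
    fill = [0] * num_buffers        # words currently held in each buffer
    delivered = [0] * num_buffers   # words handed to each reader so far
    next_grant = 0                  # round-robin pointer
    for _ in range(cycles):
        # Arbitration: grant the shared port to the next buffer with room.
        for k in range(num_buffers):
            i = (next_grant + k) % num_buffers
            if capacity - fill[i] >= block:
                fill[i] += block
                next_grant = (i + 1) % num_buffers
                break
        # Each logical port consumes at most one word per cycle.
        for i in range(num_buffers):
            if fill[i] > 0:
                fill[i] -= 1
                delivered[i] += 1
    return delivered


if __name__ == "__main__":
    # Port at least as wide as the number of buffers: after start-up,
    # every logical port sustains one word per cycle.
    print(simulate_shared_port(num_buffers=4, block=4, capacity=8, cycles=1000))
    # Narrower port: total delivery is capped at `block` words per cycle.
    print(sum(simulate_shared_port(num_buffers=4, block=2, capacity=8, cycles=1000)))
```

In this model the single port keeps all logical ports busy only when it moves at least as many words per cycle as there are buffers, which is why block transfers are the key assumption here.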

Register files (DRF)
The payoff of the transformation into a stream architecture is that we can achieve an area proportional to $N^2$, since RF2 (the memory-staging storage) needs only one port. We also have to add the area of the stream buffers, which grows as $N^2$ with a very small constant.
[Figure: DRF organization, ALU 1 to ALU N]
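The effect of reducing the memory-staging RF to one port can be sketched in isolation. This assumes RF2 holds a hypothetical r2 registers per ALU, uses the same (w + p)(h + p) cell model as the area slide, and models the total stream-buffer area as c·N² with a hypothetical small constant c; it leaves out RF1 and the switch.

```python
def rf_area(R: int, b: int, p: int, w: float, h: float) -> float:
    """Area (squared tracks) of R registers of b bits with p ports,
    assuming each port adds one track in each direction of a bit cell."""
    return R * b * (w + p) * (h + p)


def staging_rf_area(N: int, stream: bool, r2: int = 64, b: int = 32,
                    w: float = 4, h: float = 4, c: float = 16.0) -> float:
    """Area of the memory-staging RF (RF2) for N ALUs.

    stream=False: RF2 keeps 3 ports per ALU (p = 3N).
    stream=True:  RF2 has a single port, plus stream buffers whose total
                  area is modeled as c * N**2 (c is a hypothetical constant).
    """
    R = r2 * N
    if not stream:
        return rf_area(R, b, 3 * N, w, h)
    return rf_area(R, b, 1, w, h) + c * N**2


if __name__ == "__main__":
    for N in (1, 4, 16, 64):
        multi = staging_rf_area(N, stream=False)
        single = staging_rf_area(N, stream=True)
        print(f"N={N:2d} multi-port={multi:14.0f} stream={single:10.0f} ratio={multi / single:8.1f}")
```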

Results: area per ALU (normalized to 1 ALU)

Results: local delay

Results: power overhead

Imagine architecture
[Figures: die photo of Imagine; cell placement of Imagine]

Imagine floorplan
22 million transistors, 500 MHz.
Area, energy and delay models cover the clusters, micro-controller, SRF and network interface.

Stream register file

Network
The area of the network grows in the same way as the DRF switch. More details in Khailany et al. [2003].

Exploration: intra-cluster scaling

Exploration: inter-cluster scaling

End
More details:
Scott Rixner, William J. Dally, Brucek Khailany, Peter Mattson, Ujval J. Kapasi, and John D. Owens. Register Organization for Media Processing. In Proceedings of the 6th International Symposium on High-Performance Computer Architecture (HPCA), pages 375–386, Toulouse, France, January 2000. IEEE Computer Society.
Brucek Khailany, William Dally, Scott Rixner, Ujval Kapasi, John Owens, and Brian Towles. Exploring the VLSI Scalability of Stream Processors. In Proceedings of the Ninth Symposium on High Performance Computer Architecture (HPCA), pages 153–164, Anaheim, California, USA, February 2003. IEEE Computer Society.