Supporting Systolic and Memory Communication in iWarp (Borkar et al. 1990) presented by Vasily Volkov CS258, Spring 2008, UC Berkeley.

Presentation transcript:

Fine-grain parallelism: how?
Borrow ideas from systolic arrays!
Systolic arrays: a multiprocessor architecture
– Replication of PEs (processing elements), not unlike SIMD
– Fine-grain communication, pipeline-style
– Requires special algorithms, special-purpose hardware
The idea: direct PE-to-PE communication (inexpensive?!)
[Figure: conventional vs. systolic]
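To make the pipeline-style, PE-to-PE data flow concrete, here is a minimal sketch of a linear systolic array evaluating a polynomial by Horner's rule. It is an illustration only, not taken from the slides or the paper: the function name `systolic_horner`, the one-coefficient-per-PE layout, and the lock-step timing are all assumptions.

```python
def systolic_horner(coeffs, xs):
    """Evaluate a polynomial at each x using a chain of PEs.

    Hypothetical model: PE i holds coeffs[i] (highest degree first) and
    passes (x, partial result) straight to PE i+1 on every step.
    """
    n = len(coeffs)
    latches = [None] * n      # latches[i] = item waiting at the input of PE i
    stream = list(xs)         # values still waiting to enter the array
    results = []
    steps = 0
    while stream or any(item is not None for item in latches):
        # Every PE fires in the same step on whatever its latch holds.
        outputs = [None] * n
        for i, item in enumerate(latches):
            if item is not None:
                x, acc = item
                outputs[i] = (x, acc * x + coeffs[i])
        # The last PE's output leaves the array.
        if outputs[-1] is not None:
            results.append(outputs[-1][1])
        # Shift: PE i+1 receives PE i's output, PE 0 receives the next input.
        head = (stream.pop(0), 0) if stream else None
        latches = [head] + outputs[:-1]
        steps += 1
    return results, steps


# x^2 + 2x + 3 evaluated at x = 2 and x = 3
values, steps = systolic_horner([1, 2, 3], [2, 3])
print(values, steps)  # [11, 18] 5
```

No intermediate value is written to a shared memory: each partial result moves directly from one PE to the next, and once the pipeline is full a new result emerges every step.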

Traditional (memory) communication
– Decoupled computation/communication
– Legacy of networkless stations?

Systolic communication
– Do not get memory involved
– Requires special CPU support

iWarp system
Both systolic and memory communication
– Systolic communication = performance
– Memory communication = general purpose
Analogy with vector processors:
– They usually have both vector and scalar units
– And get the best of both
Will this idea be similarly successful?
– Was manufactured by Intel
– But not anymore

Outline of the base system
8x8 mesh or torus (can be scaled to 32x32)
Distributed memory
Custom network, custom nodes
Communication layer implemented in hardware
– On the same chip as the CPU
[Figure: parallel system, iWarp cell, iWarp component]
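The difference between the mesh and torus configurations can be sketched as a neighbor computation. This is a hedged illustration: the function `neighbors` and the assumption that each cell links only to its four nearest cells are not stated on the slide.

```python
def neighbors(x, y, n=8, torus=True):
    """Return the cells directly linked to (x, y) on an n-by-n grid.

    Assumes four links per cell (left, right, up, down). On a torus the
    edges wrap around; on a plain mesh, edge cells have fewer neighbors.
    """
    result = []
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if torus:
            result.append((nx % n, ny % n))
        elif 0 <= nx < n and 0 <= ny < n:
            result.append((nx, ny))
    return result


print(neighbors(0, 0, torus=True))   # [(7, 0), (1, 0), (0, 7), (0, 1)]
print(neighbors(0, 0, torus=False))  # [(1, 0), (0, 1)]
```

Passing `n=32` models the scaled-up 32x32 configuration under the same assumptions.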

Program access to communication
Network input/output queues are accessible via CPU registers (“gates”)
Reading from a gate pops data from the input queue; writing to a gate inserts data into the output queue
One instruction can involve up to 4 communication operations! (e.g. D=C+A*B)
Reading = polling (vs. interrupts in MDP)
Stall if the input queue is empty or the output queue is full
Option to spill queues to memory
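The gate semantics described above can be sketched in software. This is a toy model under stated assumptions, not the iWarp instruction set: the `Gate` class, its `capacity` parameter, and `multiply_add` are hypothetical names, and a stall is modeled as the instruction simply not executing.

```python
from collections import deque


class Gate:
    """Hypothetical model of a register-mapped network queue."""

    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity

    def can_read(self):
        return len(self.queue) > 0

    def can_write(self):
        return len(self.queue) < self.capacity

    def read(self):
        # Reading a gate pops the oldest word from the queue.
        return self.queue.popleft()

    def write(self, value):
        # Writing a gate appends a word to the queue.
        self.queue.append(value)


def multiply_add(a, b, c, d):
    """Model D = C + A*B with three gate reads and one gate write.

    Returns False (a stall, nothing consumed) if any input queue is
    empty or the output queue is full; returns True once executed.
    """
    if not (a.can_read() and b.can_read() and c.can_read() and d.can_write()):
        return False
    d.write(c.read() + a.read() * b.read())
    return True


a, b, c, d = Gate(4), Gate(4), Gate(4), Gate(1)
a.write(2); b.write(3); c.write(4)
print(multiply_add(a, b, c, d), list(d.queue))  # True [10]
print(multiply_add(a, b, c, d))                 # False (inputs are empty)
```

Checking all four gates before touching any of them keeps the modeled instruction all-or-nothing, so a stalled instruction leaves every queue unchanged.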

Bandwidth reservation
Logical channels (aka virtual channels)
– Multiplexed over physical buses (round-robin)
– Idle and blocked virtual channels don’t participate
Two routing modes
– Route messages individually: logical channels are acquired and released for transporting each message
– Route via an established connection (pathway): acquire a sequence of logical channels first, then use these resources for transport
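The round-robin multiplexing of logical channels over one physical bus can be sketched as follows. This is a simplified illustration: the function `multiplex`, the one-word-per-turn granularity, and the representation of blocked channels as a fixed set are assumptions, not details from the slides.

```python
from collections import deque


def multiplex(channels, blocked=frozenset()):
    """Return the order in which words cross a shared physical bus.

    channels: dict mapping a logical channel name to its pending words.
    blocked:  names of channels that cannot make progress right now.
    Idle (empty) and blocked channels are skipped and use no bandwidth.
    """
    queues = {name: deque(words) for name, words in channels.items()}
    sent = []
    while any(queues[name] for name in queues if name not in blocked):
        for name in queues:           # visit the channels in a fixed cycle
            if name in blocked or not queues[name]:
                continue              # idle or blocked: give up the turn
            sent.append((name, queues[name].popleft()))
    return sent


pending = {"A": [1, 2, 3], "B": [], "C": [10, 20]}
print(multiplex(pending))
# [('A', 1), ('C', 10), ('A', 2), ('C', 20), ('A', 3)]
print(multiplex(pending, blocked={"C"}))
# [('A', 1), ('A', 2), ('A', 3)]
```

Because skipped channels take no turns, the active channels split the bus among themselves: with B idle and C blocked, channel A gets every slot.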