Interprocessor Communication Seen as Load/Store Instruction Generalization
Manolis Katevenis, FORTH and University of Crete, Greece

Summary
- Central importance of interprocessor communication
- Must go from low-speed I/O to high-speed p2p communication
- Data transfer primitives: Remote DMA, Remote Queues
- Cache operations on top of network interface primitives?
- Combined translation/routing tables for data migration

Communication Primitives: Intra-Node, Inter-Node
- Processor-to-memory communication:
  - Load/Store primitives
- Interprocessor communication:
  - Read/Write (send/receive) data transfer primitives

Remote DMA as Generalization of Single-Word Instructions
- Block size is chosen so as to reduce overheads relative to data payload
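
To make the block-size trade-off concrete, here is a small worked calculation. It is only a sketch: the function name `payload_efficiency` and the 16-byte per-transfer overhead are assumptions chosen for illustration, not figures from the slides.

```python
def payload_efficiency(block_bytes: int, overhead_bytes: int) -> float:
    """Fraction of the bytes on the wire that are useful payload."""
    if block_bytes <= 0 or overhead_bytes < 0:
        raise ValueError("block size must be positive, overhead non-negative")
    return block_bytes / (block_bytes + overhead_bytes)


# Assumed fixed cost per transfer (header, addresses, checksum); illustrative only.
ASSUMED_OVERHEAD_BYTES = 16

for size in (8, 64, 256, 4096):
    share = payload_efficiency(size, ASSUMED_OVERHEAD_BYTES)
    print(f"{size:5d}-byte block: {share:.1%} of the traffic is payload")
```

Under this assumption a single-word (8-byte) transfer spends most of its traffic on overhead, while larger blocks amortize the same fixed cost over more data.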

Remote DMA is for One-to-One Communication
- Independent (unsynchronized) transfers have to occur into distinct memory regions
- Expensive when there are many potential senders but few actual ones
  - buffer reservation cost, polling-for-completion overhead

Multi-Party Synchronization: Remote Queues
- Atomic enqueue into shared space, unlike the dedicated space used with RDMA
- Space is reserved only for the number of actual senders, not potential senders
- Speeds up polling / waiting on multiple receive channels
- Appropriate for synchronization (remote enqueue) and job dispatching (remote dequeue): a generalization of atomic operations
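
The contrast with per-sender RDMA buffers can be sketched in software. This is a minimal model, not the hardware mechanism from the talk: the class `RemoteQueue`, its methods, and the helper `rdma_slots_needed` are hypothetical names, and a Python lock stands in for the atomicity that the network interface would provide.

```python
import threading
from collections import deque
from typing import Any, Optional, Tuple


class RemoteQueue:
    """Shared bounded queue: any sender may enqueue, one slot per actual message."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._slots: deque = deque()
        self._lock = threading.Lock()  # stands in for atomic enqueue at the NI

    def remote_enqueue(self, sender_id: int, payload: Any) -> bool:
        """Atomically append; returns False when full (flow control not modeled)."""
        with self._lock:
            if len(self._slots) >= self._capacity:
                return False
            self._slots.append((sender_id, payload))
            return True

    def dequeue(self) -> Optional[Tuple[int, Any]]:
        """The receiver polls a single queue instead of one buffer per sender."""
        with self._lock:
            return self._slots.popleft() if self._slots else None

    def __len__(self) -> int:
        with self._lock:
            return len(self._slots)


def rdma_slots_needed(potential_senders: int) -> int:
    """With plain RDMA, each potential sender needs its own reserved region."""
    return potential_senders
```

With, say, 1000 potential senders of which only a handful are active, the RDMA scheme reserves 1000 regions and polls all of them, whereas the queue needs capacity only for the messages actually in flight.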

Cache Operations Related to RDMA
- Network interface as close to the processor as the (L1) cache
- Messages or RDMA commands composed via store instructions
- Cache line eviction is a case of Remote Write DMA
- Is there potential for the cache controller to use the primitives supplied by the network interface?

Cache Read Misses are like Remote Read DMAs
- Network interface as close to the processor as the (L1) cache
- Should the NI be combined with the cache controller?
- Should portions of cache coherence protocols be left to the software, with NI hardware assistance for the rest?

Network Routing as Generalization of Address Decoding
(a) Physical address decoding in a uniprocessor
(b) Geographical address routing in a multiprocessor
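
One way to read the analogy between (a) and (b) is that both peel high-order bits off an address to pick a destination. The sketch below assumes a purely illustrative layout (a 24-bit global address whose top 8 bits name a node, and a 16-bit local address whose top 2 bits select a device); the field widths and function names are not from the slides.

```python
from typing import Tuple


def split_address(addr: int, select_bits: int, total_bits: int) -> Tuple[int, int]:
    """Split an address into (selector from the high bits, remaining low bits)."""
    if not 0 <= addr < (1 << total_bits):
        raise ValueError("address out of range")
    shift = total_bits - select_bits
    return addr >> shift, addr & ((1 << shift) - 1)


def decode_local(addr: int) -> Tuple[int, int]:
    """(a) Uniprocessor: the top 2 of 16 bits select a device or memory bank."""
    return split_address(addr, select_bits=2, total_bits=16)


def route_global(addr: int) -> Tuple[int, int, int]:
    """(b) Multiprocessor: the top 8 of 24 bits select a node, then decode locally."""
    node, local_addr = split_address(addr, select_bits=8, total_bits=24)
    device, offset = decode_local(local_addr)
    return node, device, offset


node, device, offset = route_global(0x05C123)
print(f"node {node}, device {device}, offset {offset:#x}")
```

Routing is then the same decoding step applied one level up: the network consumes the node field, and the destination node decodes what is left.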

Address Translation for Transparent Data Migration
Two methods to support data migration:
(a) Translation table: first consult an indirection/routing table/directory
(b) Cache style: search multiple places in parallel
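
The two lookup methods can be compared in a toy software model. Everything here is an assumption for illustration: pages are integers, nodes are named sets of resident pages, and the functions `locate_via_table`, `locate_cache_style`, and `migrate` are hypothetical. The sequential scan in (b) stands in for what hardware would probe in parallel.

```python
from typing import Dict, Optional, Set


def locate_via_table(page: int, directory: Dict[int, str]) -> Optional[str]:
    """(a) Translation table: one indirection gives the page's current home."""
    return directory.get(page)


def locate_cache_style(page: int, nodes: Dict[str, Set[int]]) -> Optional[str]:
    """(b) Cache style: probe every candidate location and take the one that hits."""
    for name, resident_pages in nodes.items():
        if page in resident_pages:
            return name
    return None


def migrate(page: int, src: str, dst: str,
            nodes: Dict[str, Set[int]], directory: Dict[int, str]) -> None:
    """Move a page; only method (a) requires a table update afterwards."""
    nodes[src].remove(page)
    nodes[dst].add(page)
    directory[page] = dst


nodes = {"A": {1, 2}, "B": {3}}
directory = {page: name for name, pages in nodes.items() for page in pages}
migrate(2, "A", "B", nodes, directory)
print(locate_via_table(2, directory), locate_cache_style(2, nodes))
```

The trade-off this exposes: the table costs an extra lookup and must be kept up to date on every migration, while the cache-style search needs no update but must probe many places per access.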

Progressive Translation: Localize Migration Updates
- Packets carry global virtual addresses
- Tables provide the physical route (address) for the next few steps
- When page 9 migrates within D, only the tables in that domain need updating
- Variable-size-page translation tables look like internet routing tables (longest-prefix matches, if we want small-page-within-big-region migration)
- Tables that partition the system, for protection against untrusted operating systems, look like internet firewalls
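
The longest-prefix-match idea for variable-size pages can be sketched as follows. The 16-bit address width, the table contents, and the next-hop labels are invented for illustration; a real table would use a trie or TCAM rather than this linear scan.

```python
from typing import Dict, Optional, Tuple

ADDR_BITS = 16  # assumed address width for this example

# Key: (prefix value, prefix length in bits). Value: next hop (hypothetical labels).
Table = Dict[Tuple[int, int], str]


def lookup(table: Table, addr: int) -> Optional[str]:
    """Return the next hop of the longest prefix that matches addr, if any."""
    best_len, best_hop = -1, None
    for (prefix, length), hop in table.items():
        matches = (addr >> (ADDR_BITS - length)) == prefix
        if matches and length > best_len:
            best_len, best_hop = length, hop
    return best_hop


table: Table = {
    (0x9, 4): "domain D",    # big region: all addresses 0x9xxx
    (0x9A, 8): "node D2",    # small page 0x9Axx that migrated inside the region
}

for addr in (0x9A10, 0x9B10, 0x1000):
    print(f"{addr:#06x} -> {lookup(table, addr)}")
```

Adding the single longer entry redirects only the migrated small page; every other address in the big region still follows the shorter prefix, which is how a migration update can stay local.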

Conclusions
- Hardware should provide “primitives”, not “solutions”
  - few, simple, general-purpose, flexibly combinable primitives
- Data transport primitive: Remote DMA
- Synchronization primitive: Remote Enqueue
- Cache operation on top of network interface primitives?
- Translation/routing tables for data migration support