Caltech CS184 Spring2003 -- DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 14: May 7, 2003 Fast Messaging.

Slides:



Advertisements
Similar presentations
System Area Network Abhiram Shandilya 12/06/01. Overview Introduction to System Area Networks SAN Design and Examples SAN Applications.
Advertisements

More on Processes Chapter 3. Process image _the physical representation of a process in the OS _an address space consisting of code, data and stack segments.
AMLAPI: Active Messages over Low-level Application Programming Interface Simon Yau, Tyson Condie,
UNIT-IV Computer Network Network Layer. Network Layer Prepared by - ROHIT KOSHTA In the seven-layer OSI model of computer networking, the network layer.
Lecture Objectives: 1)Explain the limitations of flash memory. 2)Define wear leveling. 3)Define the term IO Transaction 4)Define the terms synchronous.
Multiple Processor Systems
Chapter 7 Protocol Software On A Conventional Processor.
ECE 526 – Network Processing Systems Design Software-based Protocol Processing Chapter 7: D. E. Comer.
VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.
Architectural Support for OS March 29, 2000 Instructor: Gary Kimura Slides courtesy of Hank Levy.
Figure 2.8 Compiler phases Compiling. Figure 2.9 Object module Linking.
I/O Hardware n Incredible variety of I/O devices n Common concepts: – Port – connection point to the computer – Bus (daisy chain or shared direct access)
CS335 Networking & Network Administration Tuesday, May 11, 2010.
Active Messages: a Mechanism for Integrated Communication and Computation von Eicken et. al. Brian Kazian CS258 Spring 2008.
3.5 Interprocess Communication
Exokernel: An Operating System Architecture for Application-Level Resource Management Dawson R. Engler, M. Frans Kaashoek, and James O’Toole Jr. M.I.T.
TCP: Software for Reliable Communication. Spring 2002Computer Networks Applications Internet: a Collection of Disparate Networks Different goals: Speed,
Ethan Kao CS 6410 Oct. 18 th  Active Messages: A Mechanism for Integrated Communication and Control, Thorsten von Eicken, David E. Culler, Seth.
CS533 Concepts of OS Class 16 ExoKernel by Constantia Tryman.
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Gursharan Singh Tatla Transport Layer 16-May
INPUT/OUTPUT ARCHITECTURE By Truc Truong. Input Devices Keyboard Keyboard Mouse Mouse Scanner Scanner CD-Rom CD-Rom Game Controller Game Controller.
System Calls 1.
3/11/2002CSE Input/Output Input/Output Control Datapath Memory Processor Input Output Memory Input Output Network Control Datapath Processor.
Protocol Layering Chapter 10. Looked at: Architectural foundations of internetworking Architectural foundations of internetworking Forwarding of datagrams.
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 16: May 6, 2005 Fast Messaging.
CS533 Concepts of Operating Systems Jonathan Walpole.
LWIP TCP/IP Stack 김백규.
LWIP TCP/IP Stack 김백규.
Networks – Network Architecture Network architecture is specification of design principles (including data formats and procedures) for creating a network.
Three fundamental concepts in computer security: Reference Monitors: An access control concept that refers to an abstract machine that mediates all accesses.
Operating Systems Lecture 2 Processes and Threads Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing Liu School of.
Recall: Three I/O Methods Synchronous: Wait for I/O operation to complete. Asynchronous: Post I/O request and switch to other work. DMA (Direct Memory.
1 The Internet and Networked Multimedia. 2 Layering  Internet protocols are designed to work in layers, with each layer building on the facilities provided.
Sami Al-wakeel 1 Data Transmission and Computer Networks The Switching Networks.
Internetworking Internet: A network among networks, or a network of networks Allows accommodation of multiple network technologies Universal Service Routers.
Caltech CS184b Winter DeHon 1 CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] Day14:
I/O Computer Organization II 1 Interconnecting Components Need interconnections between – CPU, memory, I/O controllers Bus: shared communication channel.
Processes and Process Control 1. Processes and Process Control 2. Definitions of a Process 3. Systems state vs. Process State 4. A 2 State Process Model.
CS533 - Concepts of Operating Systems 1 The Mach System Presented by Catherine Vilhauer.
Processes CSCI 4534 Chapter 4. Introduction Early computer systems allowed one program to be executed at a time –The program had complete control of the.
LRPC Firefly RPC, Lightweight RPC, Winsock Direct and VIA.
CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 1: April 3, 2001 Overview and Message Passing.
Data Communications and Computer Networks Chapter 4 CS 3830 Lecture 19 Omar Meqdadi Department of Computer Science and Software Engineering University.
Introduction Contain two or more CPU share common memory and peripherals. Provide greater system throughput. Multiple processor executing simultaneous.
Operating Systems Unit 2: – Process Context switch Interrupt Interprocess communication – Thread Thread models Operating Systems.
I/O Software CS 537 – Introduction to Operating Systems.
UDP: User Datagram Protocol Chapter 12. Introduction Multiple application programs can execute simultaneously on a given computer and can send and receive.
Major OS Components CS 416: Operating Systems Design, Spring 2001 Department of Computer Science Rutgers University
Embedded Real-Time Systems Processing interrupts Lecturer Department University.
Xen and the Art of Virtualization
CS161 – Design and Architecture of Computer
CS 286 Computer Organization and Architecture
Final Review CS144 Review Session 9 June 4, 2008 Derrick Isaacson
Transport Layer Unit 5.
CS 31006: Computer Networks – The Routers
Chapter 3: Open Systems Interconnection (OSI) Model
CS 258 Reading Assignment 4 Discussion Exploiting Two-Case Delivery for Fast Protected Messages Bill Kramer February 13, 2002 #
Computer System Overview
Architectural Support for OS
Basic Mechanisms How Bits Move.
Translation Buffers (TLB’s)
CSE 451: Operating Systems Autumn 2003 Lecture 2 Architectural Support for Operating Systems Hank Levy 596 Allen Center 1.
CSE 451: Operating Systems Autumn 2001 Lecture 2 Architectural Support for Operating Systems Brian Bershad 310 Sieg Hall 1.
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
CSE 451: Operating Systems Winter 2003 Lecture 2 Architectural Support for Operating Systems Hank Levy 412 Sieg Hall 1.
Architectural Support for OS
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
Presentation transcript:

Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 14: May 7, 2003 Fast Messaging

Caltech CS184 Spring DeHon 2 Today Message Handling Requirements Active Messages Processor/Network Interface Integration

Caltech CS184 Spring DeHon 3 What does message handler have to do? Send –allocate buffer to compose outgoing message –figure out destination address routing? –Format header info –compose data for send –put out on network copy to privileged domain check permissions copy to network device checksums?

Caltech CS184 Spring DeHon 4 What does message handler have to do? Receive –queue result –copy buffer from queue to privileged memory –check message intact (checksum) –message arrived right place? –Reorder messages? –Filter out duplicate messages? –Figure out which process/task gets message –check privileges –allocate space for incoming data –copy data to buffer in task –hand off to task –decode message type –dispatch on message type –handle message

Caltech CS184 Spring DeHon Message Handling nCube/2 160  s (360ns/byte) CM-5 86  s (120ns/byte) (compare processors with 33ns cycle) –86  30  2600 cycles

Caltech CS184 Spring DeHon 6 Functions: Destination Addressing/Routing Prefigure and put in pointer/object –hoist out of inner loop of computation –just a lookup TLB-like translation table –provide hardware support for common case

Caltech CS184 Spring DeHon 7 Functions: Allocation Why? In general, –messages may be arbitrarily long many dynamic sized objects... –may not be expecting message ? Remote procedure invocation Remote memory request Hand off to OS –OS user/consumption asynchronous to user process (lifetime unclear)

Caltech CS184 Spring DeHon 8 Functions: Allocation Pre-allocate outside of messaging –when expecting a message –? Standard sizes for common cases? Avoid copying –use shared memory issues with synchronization –direct from user space

Caltech CS184 Spring DeHon 9 Functions: Ordering Network may reorder messages –multiple paths with different lengths, congestion Not all tasks require ordering –(may be ordered at higher level in computation) –dataflow firing example What requires ordering?

Caltech CS184 Spring DeHon 10 Functions: Idempotence Failed Acknowledgments –may lead to multiple delivery of same message Idempotent operations –bit-set, bit-unset, Non-idempotent operations –increment, exchange How make idempotent? –TCP example

Caltech CS184 Spring DeHon 11 Functions: Protection Don’t want messages from other processes/entities –give away information (get) –destroy state (put) –perform operation (transfer funds)

Caltech CS184 Spring DeHon 12 Functions: Protection How manage? –Treat network as IO OS mediates (can) trust message stamps on network –Give network to user messaging hardware tags with process id filter messages on process tags (can) trust message stamps because of hardware –Cryptographic packet encoding

Caltech CS184 Spring DeHon 13 Functions: Checksum Message could be corrupted in transit –likely with high-bit rate, long interconnect (multiple chips…multiple boxes…) Wrong bits –in address –in message id –in data Typically solve in hardware

Caltech CS184 Spring DeHon 14 What does message handler have to do? Send –allocate buffer to compose outgoing message –figure out destination address routing? –Format header info –compose data for send –put out on network copy to privileged domain check permissions copy to network device checksums Not all messages require Hardware support Avoid (don’t do it)

Caltech CS184 Spring DeHon 15 What does message handler have to do? Receive –queue result –copy buffer from queue to privilege memory –check message intact (checksum) –message arrived right place? –Reorder messages? –Filter out duplicate messages? –Figure out which process/task gets message –check privileges –allocate space for incoming data –copy data to buffer in task –hand off to task –decode message type –dispatch on message type –handle message Not all messages require Hardware support Avoid (don’t do it)

Caltech CS184 Spring DeHon 16 End-to-End Variant of the primitives argument Applications/tasks have different requirements/needs Attempt to provide in the network –mismatch –unnecessary Network should be minimal –let application do just what it needs

Caltech CS184 Spring DeHon 17 Active Messages Message contains PC of code to run –destination –message handler PC –data Receiver pickups PC and runs [similar to J-Machine, conv. CPU]

Caltech CS184 Spring DeHon 18 Active Message Dogma Integrate the data directly into the computation Short Runtime –get back to next message, allows to run directly Non-blocking No allocation Runs to completion...Make fast case common

Caltech CS184 Spring DeHon 19 User Level NI Access Avoids context switch Viable if hardware manages process filtering –Single process owns network? –Process tags used to filter into queues (compare process ID tags in virtually mapped cache…register renaming in SMT)

Caltech CS184 Spring DeHon 20 Hardware Support I Checksums Routing ID and route mapping Process ID stamping/checking Low-level formatting

Caltech CS184 Spring DeHon 21 What does AM handler do? Send –compose message destination receiving PC data –copy/queue to NI Receive –pickup PC –dispatch to PC –handler dequeues data into place in computation –[maybe more depending on application] idempotence ordering synchronization

Caltech CS184 Spring DeHon 22 Example: PUT Handler Message: –remote node id –put handler (PC) 0 –remote addr 1 –data length 2 –(flag addr) 3 –data 4 No allocation Idempotent Reciever: poll –r1 = packet_pres –beq r1 0 poll –r2=packet(0) // handler –branch r2 put_handler –r3=packet(1) // addr –r4=packet(2) // len –r6=packet+4 // first datum –r5=r6+r4 // end mdata –*r3=packet(r6) –r6++ –blt r6,r5 mdata –consume packet –goto poll

Caltech CS184 Spring DeHon 23 Example: GET Handler Message Request 1.remote node 2.get handler 3.local addr 4.data length 5.(flag addr) 6.local node 7.remote addr Message Reply can just be a PUT message –put into specified local address

Caltech CS184 Spring DeHon 24 Example: GET Handler get_handler –out_packet(0)=packet(6) –out_packet(1)=put_handler –out_packet(2)=packet(3) –out_packet(3)=packet(4) –r6=4 –r7=packet(7) –r5=packet(4) –r5=r5+4 mdata –packet(r6)=*r7 –r6++ –r7++ –blt r6,r5 mdata –consume packet –send message –goto poll Message Request 1.remote node 2.get handler 3.local addr 4.data length 5.(flag addr) 6.local node 7.remote addr

Caltech CS184 Spring DeHon 25 Active Message Results CM5 (user-level messaging) –send 1.6  s [50 instructions] –receive/dispatch 1.7  s nCube/2 (OS must intervene) –send 11  s [21 instructions] –receive 15  s [34 instructions] Myrinet (GM) –6.5  s end-to-end GMs –1-2  s host processing time

Caltech CS184 Spring DeHon 26 AM as Primitive Model Value of Active Messages –articulates a model of what primitive messaging needs to be –identify key components –then can optimize against how much hardware to support? What should be in hardware/software? What are common cases? –Should get special treatment?

Caltech CS184 Spring DeHon 27 Big Ideas Primitives/Mechanisms –End-to-end/common case don’t burden everything with features needed by only some tasks Abstract Model Separate essential/useful work from overhead

Caltech CS184 Spring DeHon 28 [MSB-1] Ideas Minimize Overhead –moving data around is not value added operation –minimize copying Overlap Compute and Communication –queue –don’t force send/receive rendevousz Get the OS out of the way of common operations