CS 258 Reading Assignment 4 Discussion Exploiting Two-Case Delivery for Fast Protected Messages Bill Kramer February 13, 2002


The Basic Problem
- How to provide "efficient, direct application access to network hardware ... while gaining ... the benefits of memory based network interfaces"
- Direct access to the network interface provides a "fast path" for data to the application
  - Can introduce protection and deadlock issues
- Solution: transparently switch between two modes of operation (two-case delivery)
  - One with direct application control of the network interface
  - One with buffering of messages
- Use virtual buffering
  - Efficient use of physical memory
  - Affords VM protection
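The mode switch above can be sketched in a few lines. This is an illustrative model, not the actual UDM interface: `extract()` is the application's single entry point, and whether the next message comes from the hardware queue (direct case) or a memory buffer (second case) is hidden behind it.

```python
from collections import deque

class NetInterface:
    """Toy model of two-case delivery (all names are illustrative)."""

    def __init__(self):
        self.hw_queue = deque()   # models the NI's hardware input queue
        self.sw_buffer = deque()  # second-case buffer in virtual memory
        self.buffered = False

    def enter_buffered_mode(self):
        # Divert pending hardware messages into memory, e.g. on an
        # atomicity timeout or scheduler quantum expiration.
        while self.hw_queue:
            self.sw_buffer.append(self.hw_queue.popleft())
        self.buffered = True

    def extract(self):
        # The application always calls extract(); the source of the
        # message (device vs. buffer) is transparent to it.
        if self.buffered and self.sw_buffer:
            msg = self.sw_buffer.popleft()
            if not self.sw_buffer:
                self.buffered = False  # buffer drained: resume fast path
            return msg
        return self.hw_queue.popleft() if self.hw_queue else None
```

The key property the sketch preserves is that buffered messages drain in order before the fast path resumes, so the application observes identical delivery semantics in both cases.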

Why does this work?
- The direct fast path is the common case; the solution optimizes the common case
- Make the less common buffered case efficient
- Access to the network interface must be virtualized and protected
  - And this must be done with reasonable efficiency
- Uses a lightweight message protocol, User Direct Messaging (UDM), in which every incoming message invokes a user data handler

Related Work Discussion Model
- Active Messages
- Remote Queues
- Direct network devices: CM-5, J-Machine, *T interface, Alewife, CNI
- Memory-based interfaces
  - Provide low overhead when network hardware is not expensive and latency is not an issue
- Hybrids

User Direct Messaging Model
- Messages and operations
  - Operations (inject, extract, peek, etc.) are atomic: no partial packets
  - Messages are asynchronous and unacknowledged: once a message is in the network, it can be assumed that it will eventually be delivered
- Atomicity
  - Low-overhead, virtualized interrupt disable
  - Allows user code to use interrupt-driven and/or polling reception
  - Timeout protects against deadlock and abuse
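The revocable atomic section in the model can be sketched as follows. This is a hypothetical illustration, not the UDM implementation: user code sets a cheap user-level flag to defer message notification, and a system timer bounds how long that flag may stay set before direct access is revoked.

```python
import time

class AtomicSection:
    """Illustrative revocable interrupt disable with a timeout bound."""

    def __init__(self, timeout_s=0.001):
        self.timeout_s = timeout_s
        self.disabled_at = None   # when the user entered the atomic section
        self.revoked = False

    def disable_interrupts(self):
        # Cheap user-level operation: no kernel crossing needed.
        self.disabled_at = time.monotonic()

    def enable_interrupts(self):
        self.disabled_at = None

    def check_timeout(self):
        # Called on the system's timer tick: a user that holds atomicity
        # past the bound loses direct access, and second-case (buffered)
        # delivery begins, protecting other users from abuse or deadlock.
        if (self.disabled_at is not None and
                time.monotonic() - self.disabled_at > self.timeout_s):
            self.revoked = True
        return self.revoked
```

The point of the timeout is that atomicity remains cheap for well-behaved code (the common case) while still bounding how long any one application can hold the interface.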

Two-Case Delivery
- Receipt notification: interrupts and traps, or polling
- Direct access path
- Extract either reads from the network device or from the buffer, depending on where messages are
  - Transparently switched
- Physical and virtual atomicity
  - Physical: disable the actual queue
  - Virtual: buffer messages in memory without perturbing the active process

Two-Case Delivery Protection
- Applied at the receiver
- Uses a GID to label messages
  - A GID labels a group of processes
  - Hardware-enforced
- Interrupt disable is revocable: bounds how long a user application controls the device
  - A timer counts how long an application controls the interface
  - Messages that are blocked too long indicate lack of progress
- Reserved second network to avoid deadlock
  - May be a much slower alternative network
  - May be a virtual channel reserved on the primary physical network
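The receiver-side GID check can be sketched as a single dispatch decision. This is an illustrative model, not the hardware logic: a message tagged with the currently scheduled group's GID takes the fast path, while a mismatched message is diverted to the system buffer for later delivery.

```python
def deliver(msg, current_gid, app_queue, sys_buffer):
    """Illustrative GID-based protection check at the receiver.

    msg is a (gid, payload) pair; current_gid identifies the process
    group that is presently scheduled on this node.
    """
    gid, payload = msg
    if gid == current_gid:
        app_queue.append(payload)   # fast path: straight to the application
        return "direct"
    sys_buffer.append(msg)          # mismatch: buffer until the right
    return "buffered"               # group is scheduled again
```

Because the check is applied per message at the receiver, a misbehaving or descheduled sender group can never inject data into another group's fast path.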

Virtual Buffering Objectives
- Transparency: same semantics as the fast case
- Protection and context switching
- Three reasons to switch from fast to buffered:
  - Page faults in a handler
  - Atomicity timeouts
  - Scheduler quantum expirations
- Management of physical memory
  - Using VM allows a virtually unlimited buffer pool
  - Guarantees delivery; avoids deadlock due to limited buffering space
  - Allocates only on demand
- Overhead is due to extra copies
  - A buffered null message takes 2.7 times as long as the fast path
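The on-demand allocation point can be made concrete with a toy sketch (illustrative only; the page size and structure are assumptions, not the paper's implementation): the buffer pool lives in virtual memory, and a physical page is committed only when a page is first touched, so an application that rarely buffers pays almost nothing in physical memory.

```python
PAGE_SIZE = 4  # messages per page; an illustrative constant

class VirtualBuffer:
    """Toy model of a demand-allocated virtual buffer pool."""

    def __init__(self):
        self.pages = {}   # page index -> list of buffered messages
        self.count = 0    # total messages buffered so far

    def append(self, msg):
        page = self.count // PAGE_SIZE
        if page not in self.pages:
            # Models the first-touch "page fault": physical memory is
            # committed only when this page is actually written.
            self.pages[page] = []
        self.pages[page].append(msg)
        self.count += 1

    def physical_pages(self):
        return len(self.pages)
```

This captures why the paper's physical-memory demand stays small: the virtual pool can be effectively unlimited while the committed pages track only the messages actually buffered.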

Performance and Experiments
- Used 5 applications; 3 assumed to be computationally intensive
- Ran on a simulator with 2 nodes
  - Average cycles between messages ranged from 615 to 14,200
  - Average cycles in a handler: 149 to 478
- Hardware measurements indicated times within 20% of the simulation estimates in the paper
- Uses the gang-scheduling feature of FUGU to introduce "skew"
  - Creates message mismatches and forces buffered mode
- Used one custom benchmark to assess the sensitivity of performance to overhead in the buffered case

Performance Conclusions
- Applications tend to avoid buffered mode if there is frequent synchronization of sends and receives
- Physical memory requirements are low
- Virtual buffering improves performance
  - Avoids unnecessary use of physical memory
- Buffered vs. direct trades off scheduling flexibility
  - The buffered case limits throughput for certain program styles
  - Buffered overhead changes the trade-off point
- Application demand for buffering is relatively small and increases gracefully

Discussion
- Are there sufficient mechanisms to assure message protection and deadlock prevention?
- The 5 benchmarks are largely computation-based. Will there be different issues with other workloads?
- Are you convinced that the majority of applications require only a small number of physical pages?
- Is there sufficient evidence that this implementation will scale to large numbers of processors? (Performance results were done on only 2 and 4 processors.)
- How does one assess this mechanism against other mechanisms based on the data provided?