Split-C for the New Millennium

Presentation transcript:

Split-C for the New Millennium Andrew Begel, Phil Buonadonna, David Gay {abegel,philipb,dgay}@cs.berkeley.edu

Introduction
Berkeley’s new Millennium cluster
  16 two-way SMPs with 400 MHz Intel Pentium II processors
  Myrinet NICs
Virtual Interface Architecture (VIA) user-level network
Active Messages
Split-C
Project Goals
  Implement Active Messages over VIA
  Implement and measure Split-C over VIA

VI Architecture
[Figure: a VI consumer in a virtual address space with registered memory (RM) regions; a VI with send and receive queues of descriptors, send and receive doorbells, and status fields; the network interface controller.]

Active Messages
Paradigm for message-based communication
  Concept: overlap communication and computation
Implementation
  Two-phase request/reply pairs
  Endpoints: a process’s connection to a virtual network
  Bundles: collection of a process’s endpoints
Operations
  AM_Map(), AM_Request(), AM_Reply(), AM_Poll()
  Credit-based flow-control scheme
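
The request/reply pairing and the credit scheme on this slide can be sketched in a few lines. This is a toy, single-threaded model, not the Active Messages API: the `Endpoint` class, its methods, and the `add_request`/`add_reply` handlers are hypothetical names, and the rule that each reply returns exactly one credit is an assumption made for illustration.

```python
from collections import deque


class Endpoint:
    """Toy endpoint (hypothetical model, not the real AM interface)."""

    def __init__(self, name, credits):
        self.name = name
        self.credits = credits   # requests we may send before a reply comes back
        self.inbox = deque()     # messages delivered by the simulated network
        self.peer = None
        self.log = []

    def request(self, handler, *args):
        """Send a request if a credit is available; return False when out of credits."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.peer.inbox.append(("request", handler, args))
        return True

    def reply(self, handler, *args):
        """Send a reply; meant to be called from inside a request handler."""
        self.peer.inbox.append(("reply", handler, args))

    def poll(self):
        """Run the handler of every pending message."""
        while self.inbox:
            kind, handler, args = self.inbox.popleft()
            if kind == "reply":
                self.credits += 1    # assumption: one reply returns one credit
            handler(self, *args)


def add_request(ep, x, y):
    # Request handler: compute on the receiver and answer with a reply.
    ep.reply(add_reply, x + y)


def add_reply(ep, total):
    # Reply handler: runs back on the requester.
    ep.log.append(total)


a, b = Endpoint("A", credits=2), Endpoint("B", credits=2)
a.peer, b.peer = b, a
sent = [a.request(add_request, i, i) for i in range(3)]  # third request finds no credit
b.poll()   # B runs the request handlers and sends replies
a.poll()   # A runs the reply handlers and regains its credits
```

Between `request` and `poll` the requester is free to do other work, which is the overlap of communication and computation the slide refers to.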

AM-VIA Components
VI Queue (VIQ)
  Logical channel for an AM message type
  VI with independent send/receive queues
  Independent request credit scheme (counter n, n < k)
[Figure: a VIQ containing one VI with send and receive sides; data buffers labeled (2*k) and (2*k + 1), descriptors (Dxs) labeled (2*k) and (2*k + 1).]

AM-VIA Components
VI Queue (VIQ)
  Logical channel for an AM message type
  VI with independent send/receive queues
  Independent request credit scheme (counter n)
MAP Object
  Container for 3 VIQs: short, medium, long

AM-VIA Components
VI Queue (VIQ)
  Logical channel for an AM message type
  VI with independent send/receive queues
  Independent request credit scheme (counter n)
MAP Object
  Container for 3 VIQs: short, medium, long
  Single registered memory region

AM-VIA Integration
Endpoints: collection of MAP objects
  Virtual network emulated by point-to-point connections
Bundle: pair of VI completion queues (send/receive)
[Figure: Proc A, Proc B, and Proc C joined by point-to-point connections.]

AM-VIA Operations
Map
  Allocates VI and registered memory resources and establishes connections
Send operations
  Copies data into a free send buffer and posts a descriptor
Receive operations
  Short/long messages: copies data and invokes handler
  Medium messages: invokes handler with a pointer to the data buffer
Polling
  Request/reply marshalling
  Empties the completion queue into request/reply FIFO queues
  Processes a single request and/or reply on each iteration
  Recycles send descriptors
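
The polling steps listed on this slide can be sketched as one function. This is a simplified model rather than the AM-VIA source: the event tuples, the `handle` callback, and the choice to serve the reply before the request are assumptions made for illustration.

```python
from collections import deque


def poll(completion_queue, requests, replies, free_descriptors, handle):
    """One polling iteration over a simulated completion queue."""
    # Step 1: empty the completion queue. Received messages go into the
    # request or reply FIFO; descriptors of finished sends are recycled.
    while completion_queue:
        kind, value = completion_queue.popleft()
        if kind == "send_done":
            free_descriptors.append(value)
        elif kind == "request":
            requests.append(value)
        elif kind == "reply":
            replies.append(value)
    # Step 2: process at most one reply and at most one request.
    # (Serving the reply first is an assumption, not stated on the slide.)
    if replies:
        handle("reply", replies.popleft())
    if requests:
        handle("request", requests.popleft())


cq = deque([
    ("request", "get x"),
    ("send_done", 7),        # 7 stands for a send descriptor id
    ("reply", "x = 1"),
    ("request", "get y"),
])
requests, replies, free_descriptors, log = deque(), deque(), [], []
record = lambda kind, value: log.append((kind, value))

poll(cq, requests, replies, free_descriptors, record)
first_pass = list(log)
poll(cq, requests, replies, free_descriptors, record)
```

After the first call one request is still queued, which shows why a busy endpoint needs repeated polling to drain its FIFOs.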

Design Tradeoffs
Logical channels for short/medium/long messages
  Balances resources (VIs, buffering) and reliability
Fine-grained credit scheme
  Requires advance knowledge of reply size
  Requires request/reply marshalling upon receipt
Data copying
  Simplest and most robust approach to buffer management
  Zero copy on medium receives requires k+1 buffering
Completion queue/bundle
  Straightforward implementation of a bundle
  May overflow under high communication volume
  Prevents endpoint migration

Reflections
AMVIA implementation
  Robust: works for a wide variety of AM applications
  Performance suffers due to subtle architectural differences
VI Architecture shortcomings
  Lack of support for mapping a VI to a user context
  VI naming complicates IPC on the same host
Active Messages shortcomings
  Memory ownership semantics prevent true zero-copy for medium messages
Both benefit from some direct hardware support
  VIA: hardware doorbell management
  AM: distinction of request/reply messages

Split-C
C-based shared address space, parallel language
Distributed memory, explicit global pointers
Split-phase global reads/writes:
  l := r (get), r :- l (store), r := l (put)
  sync(), store_sync()
[Figure: a global pointer is a (process, address) pair; Process 0 holds the pointer (1, 0xdeadbeef), which refers to data, drawn as a cow, in Process 1.]

Implementing Split-C
Split-C is implemented as a modified gcc compiler
  Split-phase reads and writes are translated to library calls
  Just need to implement a library
Essential library calls:
  get, put, store (for char, int, ... plus bulk variants)
  sync, store_sync
Four implementations:
  Split-C over AMVIA
  Split-C over reliable VIA
  Split-C over unreliable VIA
  Split-C over shared memory + AMVIA

Split-C over AMVIA
Establish a connection between every pair of processes
Simple requests/replies implement get, put, store, e.g.:
  p0: get(loc, <0x1, 0xbeef>)
      request "get"(1, loc, 0xbeef) to p1
      p0 continues program execution
[Figure: Process 0, Process 1, and Process 2 joined by AM connections; the data being fetched is drawn as a cow.]

Split-C over AMVIA
Establish a connection between every pair of processes
Simple requests/replies implement get, put, store, e.g.:
  p0: get(loc, <0x1, 0xbeef>)
      request "get"(1, loc, 0xbeef) to p1
      p0 continues program execution
  p1: receive request "get"(…)
      reply "getr"(loc, a-cow) to p0
[Figure: Process 0, Process 1, and Process 2 joined by AM connections; the data being fetched is drawn as a cow.]

Split-C over AMVIA
Establish a connection between every pair of processes
Simple requests/replies implement get, put, store, e.g.:
  p0: get(loc, <0x1, 0xbeef>)
      request "get"(1, loc, 0xbeef) to p1
      p0 continues program execution
  p1: receive request "get"(…)
      reply "getr"(loc, a-cow) to p0
  p0: receive reply "getr"(…)
      store cow at loc
[Figure: Process 0, Process 1, and Process 2 joined by AM connections; the data being fetched is drawn as a cow.]
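
The three steps of the get exchange above can be traced with a small single-threaded simulation. It is a sketch under stated assumptions, not the Split-C runtime: the `Process` class, the dictionary used as memory, the `outstanding` counter, and the way `sync` polls every process in turn are all hypothetical stand-ins.

```python
from collections import deque


class Process:
    """Toy process with local memory and a message inbox (hypothetical model)."""

    def __init__(self, pid):
        self.pid = pid
        self.memory = {}       # address -> value
        self.inbox = deque()
        self.outstanding = 0   # gets issued but not yet completed


def get(procs, me, local_addr, global_ptr):
    """Issue a split-phase read: send a "get" request and return immediately."""
    target, remote_addr = global_ptr
    procs[me].outstanding += 1
    procs[target].inbox.append(("get", me, local_addr, remote_addr))


def poll(procs, me):
    """Handle every message waiting at process `me`."""
    p = procs[me]
    while p.inbox:
        msg = p.inbox.popleft()
        if msg[0] == "get":
            # Owner of the data answers with a "getr" reply.
            _, requester, local_addr, remote_addr = msg
            procs[requester].inbox.append(("getr", local_addr, p.memory[remote_addr]))
        elif msg[0] == "getr":
            # Requester stores the value and retires the get.
            _, local_addr, value = msg
            p.memory[local_addr] = value
            p.outstanding -= 1


def sync(procs, me):
    """Wait until every get issued by `me` has completed."""
    while procs[me].outstanding:
        for pid in range(len(procs)):   # stand-in for the other processes polling
            poll(procs, pid)


procs = [Process(0), Process(1)]
procs[1].memory[0xBEEF] = "cow"

get(procs, 0, "loc", (1, 0xBEEF))         # p0: get(loc, <0x1, 0xbeef>)
pending = "loc" not in procs[0].memory    # p0 keeps running before the data arrives
sync(procs, 0)
```

The value only becomes visible at `loc` after `sync`, which is the split-phase behaviour the slide describes.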

Split-C over Reliable VIA
Goal: reduce send and receive overhead for Split-C operations
Method 1: specialise AMVIA for the Split-C library
  Support only short and medium messages
  Remove all dynamic dispatch (AM calls, handler dispatch)
  Reduce message size
Method 2: allow reply-free requests (for stores)
  Reply to every nth store request, rather than every one
  n = 1/4 of maximum credits
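
Method 2 can be illustrated with a toy credit counter. The `StoreChannel` class is hypothetical, and two details are assumptions the slide does not state: each reply is taken to return n credits at once, and the sender is taken to collect replies only when it polls.

```python
class StoreChannel:
    """Toy model of reply-free stores with one reply per n stores."""

    def __init__(self, max_credits):
        self.credits = max_credits
        self.reply_interval = max(1, max_credits // 4)   # n = 1/4 of maximum credits
        self.stores_since_reply = 0   # receiver side
        self.pending_replies = 0      # replies in flight back to the sender
        self.replies_sent = 0

    def send_store(self):
        """Send one store; return False if the sender has no credit left."""
        if self.credits == 0:
            return False
        self.credits -= 1
        # Receiver side: count the store and reply only on every nth one.
        self.stores_since_reply += 1
        if self.stores_since_reply == self.reply_interval:
            self.stores_since_reply = 0
            self.pending_replies += 1
            self.replies_sent += 1
        return True

    def poll(self):
        """Sender side: each reply returns n credits (assumption)."""
        self.credits += self.pending_replies * self.reply_interval
        self.pending_replies = 0


ch = StoreChannel(max_credits=16)
results = [ch.send_store() for _ in range(17)]   # the 17th store finds no credit
replies_before_poll = ch.replies_sent
ch.poll()
```

In this model sixteen stores cost four replies instead of sixteen, which is the overhead reduction the method aims for.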

Split-C over Unreliable VIA
Replaces the request/reply mechanism of Split-C over reliable VIA
Sliding-window + credit-based protocol
  Acknowledge processed requests/replies
  Reply-free requests handled automatically
  Timeouts detected in polling routine (unimplemented)
[Figure: sequence-number timelines showing requests, stores, processing, and acks.]
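
A sliding-window protocol with acknowledgements can be sketched as follows. The slide does not specify the protocol details and notes that timeout handling was unimplemented, so everything here is an assumption for illustration: the `Sender` and `Receiver` classes, cumulative acks, in-order delivery only, and go-back-N style retransmission on timeout.

```python
class Sender:
    """Toy sliding-window sender; the window doubles as the credit limit."""

    def __init__(self, window):
        self.window = window
        self.next_seq = 1
        self.acked = 0        # highest sequence number acknowledged so far
        self.unacked = {}     # seq -> payload, kept for retransmission

    def can_send(self):
        # Messages in flight = next_seq - 1 - acked, which must stay below window.
        return self.next_seq - self.acked <= self.window

    def send(self, payload):
        """Return a (seq, payload) packet, or None if the window is full."""
        if not self.can_send():
            return None
        seq = self.next_seq
        self.unacked[seq] = payload
        self.next_seq += 1
        return (seq, payload)

    def on_ack(self, ack):
        """Cumulative ack: everything up to `ack` has been processed."""
        for seq in range(self.acked + 1, ack + 1):
            self.unacked.pop(seq, None)
        self.acked = max(self.acked, ack)

    def on_timeout(self):
        """Packets to resend, oldest first (go-back-N style, an assumption)."""
        return sorted(self.unacked.items())


class Receiver:
    def __init__(self):
        self.expected = 1
        self.processed = []

    def receive(self, packet):
        """Process only the next expected packet; always return a cumulative ack."""
        seq, payload = packet
        if seq == self.expected:
            self.processed.append(payload)
            self.expected += 1
        return self.expected - 1


s, r = Sender(window=3), Receiver()
packets = [s.send(p) for p in ["a", "b", "c", "d"]]   # "d" does not fit the window

ack = r.receive(packets[0])   # "a" arrives
# packets[1] ("b") is lost on the unreliable network
ack = r.receive(packets[2])   # "c" arrives out of order and is dropped
s.on_ack(ack)

for packet in s.on_timeout():  # resend "b" and "c"
    ack = r.receive(packet)
s.on_ack(ack)
```

Because the ack reports what has been processed, a store needs no dedicated reply, which matches the slide's remark that reply-free requests are handled automatically.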

Split-C over Shared Memory
How can two processes on the same host communicate?
  Loopback through network
  Multi-protocol VIA
  Multi-protocol AM
  Shared-memory Split-C
Each process maps the address space of every other process on the same host into its own.
  Heap is allocated with System V IPC shared memory.
  Data segment is mmapped via the /proc file system.
  Stack is too dynamic to map.
[Figure: address spaces on host mm4.millennium.berkeley.edu. P1’s address space holds Process 1’s local memory and P1’s view of Process 2; P2’s address space holds Process 2’s local memory and P2’s view of Process 1.]

Split-C Microbenchmarks
[Figure: Split-C store performance for short and bulk messages (smaller numbers are better).]

Split-C Application Benchmarks
[Figure: Split-C application performance (bigger is better).]

Reflections The specialization of the communications layer for Split-C reduced send and receive overhead. This overhead reduction appears to correlate with increased application performance and scaling. Sharing a process’s address space should be much easier than it is in Linux.

AM(v2) Architecture
Components
  Endpoints
[Figure: an endpoint attached to the network, with request handlers (request_hndlr_a(), request_hndlr_b(), ...) and reply handlers (reply_hndlr_a(), reply_hndlr_b(), ...).]

AM(v2) Architecture
Components
  Endpoints
  Virtual networks
[Figure: Proc A, Proc B, and Proc C.]

AM(v2) Architecture
Components
  Endpoints
  Virtual networks
  Bundles
[Figure: Proc A, Proc B, and Proc C.]

AM(v2) Architecture
Components
  Endpoints
  Virtual networks
  Bundles
Operations
  Request / Reply (short, medium, long)
  Create, Map, Free
  Poll, Wait
Credit-based flow control
[Figure: Proc A, Proc B, and Proc C.]

Active Messages
Split-phase remote procedure calls
Concept: overlap communication and computation
[Figure: Proc A sends a request to Proc B, where the request handler runs and sends a reply; the reply handler then runs on Proc A.]