Douglas Thain and Miron Livny Computer Sciences Department University of Wisconsin-Madison

Douglas Thain and Miron Livny Computer Sciences Department University of Wisconsin-Madison Bypass: A tool for building distributed systems

Building distributed systems is hard.

Bypass makes building split execution systems easy. Bypass is to split execution systems as yacc is to compilers.

Problem: Unfriendly Machines › Many systems can distribute your jobs to available machines scattered around the world (rsh, Condor, Globus, etc.). › But... the machines you have access to may not be properly equipped to run your job.

Problem: Unfriendly Machines › An unfriendly machine…  allows you to log in under some identity.  allows you to execute your program.  might not have your files or a shared file system!  might not have space for your output!  might be a different architecture or OS! (If you want to use a lot of machines, you can’t be picky!)

[Figure: home machine and foreign machine. Labels: "HELP!", "core dumped".]

Solution: Split Execution › General strategy:  An agent process traps some of the application's standard library calls.  Some of the calls can be executed at the foreign machine.  Some of the calls are sent via RPC back to the home machine.  A shadow process at the home machine executes the RPCs and sends the results back to the agent.
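
A minimal sketch of the routing step in C, not Bypass output: the agent picks a destination for each trapped call, and a function pointer stands in for the RPC to the shadow. The names `agent`, `agent_write`, `stub_local`, and `stub_remote`, and the rule that descriptors below 3 are sent home, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by both destinations of a trapped write(). */
typedef long (*write_handler)(int fd, const void *data, size_t length);

/* Hypothetical agent state: the two places a trapped call may be sent. */
struct agent {
    write_handler local;   /* executes at the foreign machine */
    write_handler remote;  /* stands in for an RPC to the shadow */
};

/* Counters let a caller observe which path was taken. */
unsigned long local_calls = 0;
unsigned long remote_calls = 0;

/* Stub destination: records a local call and reports every byte as written. */
long stub_local(int fd, const void *data, size_t length)
{
    (void)fd;
    (void)data;
    local_calls++;
    return (long)length;
}

/* Stub destination: records a remote call and reports every byte as written. */
long stub_remote(int fd, const void *data, size_t length)
{
    (void)fd;
    (void)data;
    remote_calls++;
    return (long)length;
}

/* Assumed policy: standard streams (fd 0-2) go home, everything else stays local. */
long agent_write(const struct agent *a, int fd, const void *data, size_t length)
{
    assert(a != NULL && a->local != NULL && a->remote != NULL);
    if (fd < 3) {
        return a->remote(fd, data, length);
    }
    return a->local(fd, data, length);
}
```

With `struct agent a = { stub_local, stub_remote };`, the call `agent_write(&a, 1, "hi", 2)` takes the remote path and `agent_write(&a, 7, "x", 1)` takes the local one. A real agent would replace the stubs with the original library call and an RPC stub.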

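Solution: Split Execution [Figure: the home machine runs the shadow above its kernel (local system calls); the foreign machine runs the application and agent above its kernel (trapped system calls, local system calls); remote system calls connect the agent to the shadow.]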

Split Execution is an Open Research Topic › We want to explore many possibilities:  Foreign machine could be partially friendly – has some needed resources, but not all.  Data may be buffered and cached at both the agent and the shadow.  What procedure calls to trap depends on the application and the services needed.  Some procedure calls could be routed to third parties such as file servers.  …

Problem: Split Execution is Hard › One example of many: Trapping stat()  Different data types: struct stat, struct stat64. Depending on the system, integer fields are 2 to 8 bytes wide.  Multiple entry points: stat, _stat, __libc_stat  Surprises: #define stat(a,b) _fxstat(VERSION,a,b)

Solution: Bypass › Bypass takes a specification of a split execution system and produces a matched shadow and agent. › Bypass hides all of the ugly details of trapping, type conversion, and RPCs. › Bypass lets you:  split any dynamically-linked application.  transparently use heterogeneous systems.  trap calls with minimal overhead.  control execution paths with plain C++.

[Figure: home machine and foreign machine. Label: "Just like home!"]

Bypass Language › Declare what procedures to trap in C++ › Annotate pointer types with data flow.  Direction: in, out, or in out  Binary data: give expression yielding the number of bytes to send/receive. › Give two function bodies:  agent_action  shadow_action
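
To make the annotation concrete: `in "length"` says that `length` bytes at `data` travel from the agent to the shadow. The C sketch below shows what such marshaling could look like for write(). The wire layout (two 32-bit big-endian integers followed by the payload) and the names `encode_write_request` and `decode_write_request` are assumptions, not the format Bypass actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in big-endian byte order. */
static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Load a 32-bit value stored in big-endian byte order. */
static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumed layout for write(fd, data, length): [fd:4][length:4][data:length].
   Returns the number of bytes encoded, or 0 if the buffer is too small. */
size_t encode_write_request(unsigned char *buf, size_t cap, int32_t fd,
                            const void *data, uint32_t length)
{
    assert(buf != NULL && (data != NULL || length == 0));
    if (cap < 8 || length > cap - 8) {
        return 0;
    }
    put_u32(buf, (uint32_t)fd);
    put_u32(buf + 4, length);
    if (length > 0) {
        memcpy(buf + 8, data, length);   /* the "in" payload: agent to shadow */
    }
    return 8 + (size_t)length;
}

/* Returns 0 on success, -1 if the message is truncated or inconsistent.
   On success *data points into buf; nothing is copied. */
int decode_write_request(const unsigned char *buf, size_t size, int32_t *fd,
                         const unsigned char **data, uint32_t *length)
{
    assert(buf != NULL && fd != NULL && data != NULL && length != NULL);
    if (size < 8) {
        return -1;
    }
    uint32_t n = get_u32(buf + 4);
    if (n > size - 8) {
        return -1;
    }
    *fd = (int32_t)get_u32(buf);
    *length = n;
    *data = buf + 8;
    return 0;
}
```

An `out` parameter would reverse the roles: the request would carry only the byte count, and the payload would come back in the reply.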

ssize_t write( int fd, in "length" const void *data, size_t length )
agent_action {{
    if( fd==1 ) {
        return bypass_shadow_write(fd,data,length);
    } else {
        return write(fd,data,length);
    }
}}
shadow_action {{
    /* data is not NUL-terminated, so print exactly length bytes */
    printf("remote data: %.*s", (int)length, (const char *)data);
    return length;
}};

Agent Action › Any arbitrary C++ code. › When the program invokes write(), the agent_action is executed at the foreign machine. › Within the agent_action:  write() - Invoke the original write() at the foreign machine.  bypass_shadow_write() - Invoke the shadow_action via RPC.

Shadow Action › Any arbitrary C++ code. › If the agent decides to invoke the RPC to the shadow, the shadow_action is executed at the home machine. › Within the shadow_action:  write() - Invoke write() at the home machine.

Using Bypass › Run "bypass" to read the specification and produce C++ source code: % bypass -agent -shadow simple.bypass › The shadow is compiled into a plain executable. › The agent is compiled into a shared library.

Using Bypass › The dynamic linker is used to force the agent into an executable at run-time: setenv LD_PRELOAD simple_agent.so › Procedure calls are “trapped” merely by putting the agent first in the link list. › This method can be used on any dynamically-linked program: tcsh, netscape, emacs…
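
The interposition itself can be sketched in a few lines of C. This is a hand-written illustration for Linux, not code generated by Bypass: the wrapper reaches the kernel through `syscall()` rather than through the original library entry point, and the counter `trapped_calls` and the file name `trap_write.c` used below are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of calls that passed through the interposed entry point. */
static unsigned long trapped_calls = 0;

/* Same name and signature as the C library's write(). When this object is
   listed in LD_PRELOAD, the dynamic linker finds this definition first. */
ssize_t write(int fd, const void *buf, size_t len)
{
    trapped_calls++;

    /* Forward to the kernel directly (Linux-specific). An agent could
       instead decide here to send the call to a shadow process. */
    long rc = syscall(SYS_write, fd, buf, len);

    /* The kernel never reports more bytes than were requested. */
    assert(rc < 0 || (size_t)rc <= len);
    return (ssize_t)rc;
}
```

Built with `cc -shared -fPIC -o trap_write.so trap_write.c` and loaded by setting `LD_PRELOAD=./trap_write.so`, the wrapper sees the write() calls that a program makes through the dynamic linker. Statically linked programs are not affected, which matches the restriction to dynamically-linked executables.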

Example Application: Complete Remote I/O › Trap all the standard I/O calls, and send them to the home machine unmodified:
int open(in string const char *path, int flags, int mode);
int close(int fd);
int read(int fd, out "length" void *data, int length );
int write(int fd, in "length" void *data, int length );
off_t lseek(int fd, off_t offset, int whence );

Complete Remote I/O [Figure: on the foreign machine, the agent traps the application's open, close, read, write, and lseek calls and sends them to the shadow on the home machine, which issues them to the home kernel; all other calls go to the foreign kernel.]

Example Application: Remote Console › Trap only read and write, and send operations on standard files back to a single shadow process.
int read( int fd, out "length" void *data, int length )
agent_action {{
    if( fd<3 ) {
        return bypass_shadow_read( fd, data, length );
    } else {
        return read( fd, data, length );
    }
}};

Remote Console [Figure: three foreign machines, each running an application and agent above its own kernel; standard I/O reads and writes are trapped and sent to a single shadow on the home machine, which issues read and write to the home kernel; all other calls go to each foreign kernel.]

Example Application: Attach New Filesystem › Trap standard I/O calls and replace them with calls to a user-level filesystem library, such as Globus GASS.
int open( in string const char *path, int flags, int mode )
agent_action {{
    return globus_gass_open( path, flags, mode );
}};
int close( int fd )
agent_action {{
    return globus_gass_close( fd );
}};

Attach New Filesystem [Figure: on the foreign machine, the agent traps open and close and passes them to the Globus library, which makes more system calls and connects to "THE GRID"; all other calls go to the foreign kernel.]

Bypass can be used by Real Users! › Bypass works on unmodified executables.  (Real Users are not willing/able to rewrite/recompile their programs.) › Bypass requires no special privileges.  (Real Users do not have the root password.) › Thus, Bypass allows a Real User to make good use of a remote cluster without begging the administrator to configure it to his/her needs.

Performance › Overhead of trapping a system call is very small: 1-4 µs  The "trapping mechanism" simply interposes a few extra function calls.  Small compared to the expense of a real system call (about 10-70 µs) › Remote procedure calls are, as expected, much slower: about 1 ms under the best conditions.
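
As a rough reading of these figures, and assuming the quoted ranges can be combined freely (the slides do not say which trap cost belongs to which system call), the outer bounds work out as follows. The symbols for the trap, system call, and RPC times are introduced here only for illustration.

```latex
\[
\frac{t_{\text{trap}}}{t_{\text{syscall}}} \in
\left[\frac{1\,\mu\text{s}}{70\,\mu\text{s}},\ \frac{4\,\mu\text{s}}{10\,\mu\text{s}}\right]
\approx [0.014,\ 0.40],
\qquad
\frac{t_{\text{RPC}}}{t_{\text{syscall}}} \in
\left[\frac{1000\,\mu\text{s}}{70\,\mu\text{s}},\ \frac{1000\,\mu\text{s}}{10\,\mu\text{s}}\right]
\approx [14,\ 100].
\]
```

Under that assumption, trapping adds somewhere between about 1.4% and 40% to the cost of a local system call, while a remote call costs roughly 14 to 100 times as much as a local one.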

Related Work › “Classic” RPC and XDR:  Define standard integer sizes, endianness, etc.  Start by defining external protocol, then produce programming interface which is not always convenient: struct read_results * read_1( int fd, int length );

Related Work › Bypass:  We are stuck with existing interfaces, so annotate them to produce a protocol: int read( int fd, out "length" void *data, int length );  Do “best effort” conversion to/from external data format: off_t is 4 bytes on some platforms, 8 bytes on others. A conversion might fail!  Define canonical values for source-level symbols: O_CREAT has different values on Linux and Solaris!
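
Both conversions can be sketched in C. This is an illustration under stated assumptions rather than Bypass's actual external format: the `WIRE_O_*` values, the three flags covered, and the function names are hypothetical, and the offset example assumes a host whose offset field is 32 bits wide.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical canonical values. Any fixed assignment works as long as
   the agent and the shadow agree on it. */
enum {
    WIRE_O_CREAT  = 1 << 0,
    WIRE_O_TRUNC  = 1 << 1,
    WIRE_O_APPEND = 1 << 2
};

struct flag_pair {
    int native;  /* value of the symbol on this platform */
    int wire;    /* canonical value sent between machines */
};

/* Only three flags are covered here; access modes are left out. */
static const struct flag_pair flag_table[] = {
    { O_CREAT,  WIRE_O_CREAT  },
    { O_TRUNC,  WIRE_O_TRUNC  },
    { O_APPEND, WIRE_O_APPEND },
};

#define FLAG_COUNT (sizeof flag_table / sizeof flag_table[0])

/* Translate this platform's flag bits into canonical bits. */
int flags_to_wire(int native)
{
    int wire = 0;
    for (size_t i = 0; i < FLAG_COUNT; i++) {
        if (native & flag_table[i].native) {
            wire |= flag_table[i].wire;
        }
    }
    return wire;
}

/* Translate canonical bits back into this platform's flag bits. */
int flags_from_wire(int wire)
{
    int native = 0;
    for (size_t i = 0; i < FLAG_COUNT; i++) {
        if (wire & flag_table[i].wire) {
            native |= flag_table[i].native;
        }
    }
    return native;
}

/* Best-effort narrowing of a 64-bit wire offset into a 32-bit host field.
   Returns 0 on success, -1 if the value does not fit. */
int offset_from_wire(int64_t wire, int32_t *host)
{
    assert(host != NULL);
    if (wire < INT32_MIN || wire > INT32_MAX) {
        return -1;
    }
    *host = (int32_t)wire;
    return 0;
}
```

Translating through a table means each side only needs its own native values, so two platforms with different numbers for O_CREAT still agree. The narrowing function reports failure instead of silently truncating, which is the case the slide warns about.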

Related Work › Hunt and Brubacher, “Detours”  Trap library calls on NT using binary rewriting – can be applied to any executable.  Make original procedure available through special “trampoline” call.  Bypass leaves the original entry point intact, so subroutines need not be re-written to use the trampoline.

Related Work › Alexandrov, et al., “UFO”  Use a kernel-level facility to trap all of a process’ system calls and translate some of them into WWW operations.  The kernel mechanism is secure and can be applied to any process.  But… it has a high (7x) trapping overhead and cannot be applied to procedures that are not true system calls.

Related Work › Bypass:  Trapping overhead is very small and can be performed on procedures that are not necessarily system calls.  But… can only be applied to dynamically-linked executables, and is not suitable as a security mechanism.

Related/Future Work › A complete remote execution system needs both methods:  The program owner provides a lightweight mechanism for creating a correct split execution environment.  The machine owner provides a heavyweight mechanism to defend itself from a (possibly) malicious program.

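Complete System [Figure: the Complete Remote I/O arrangement (shadow on the home machine handling open, close, read, write, and lseek; agent and application on the foreign machine, with all other calls going to the foreign kernel), plus a sandbox on the foreign machine.]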

Future Work › Multiple agents applied to one application  How to select and invoke the correct agent action? › Signal handling  Flow of control is backwards. › Other implementations  Binary rewriting.  Build specialized linker that understands multiple definitions of symbols.

Further Questions? › Douglas Thain  › Miron Livny  › Bypass Web Page  › Questions now?