6.894: Distributed Operating System Engineering Lecturers: Frans Kaashoek Robert Morris TA: Jinyang Li
Operating System Software that turns silicon into something useful –Provides applications with a programming interface –Manages hardware resources on behalf of applications
Distributed Operating System The holy grail: transparency –Provide applications with a virtual machine consisting of many processors distributed around the network Distributed OS engineering is difficult: –Failures –High degree of concurrency –Long latencies –New classes of security attacks
Client/Server Architecture A modular architecture to structure distributed systems –Clients request services from servers –Client and servers communicate with messages –Servers are typically trusted Other architectures –Peer-to-peer (decentralized) –Single address space
6.894 topics Client-server components –Remote procedure call, threads, address spaces, etc. Storage –File systems, transactions Security –Confidentiality, authentication, etc. Scalable servers
6.894 is an advanced class Perform actual systems research –Perform a research project –Study recent research papers Design systems for real workloads –New abstractions, protocols, data structures, algorithms, etc. Build a real system (lab) –Real enough that you can use it
Internet video-on-demand server A running example to introduce and survey the issues Requirements: –Low- and high-quality video –Many users, spread around the Internet –Last-mile bandwidth may be low –Access control
Client and server structure
Client() {
  fd = connect("server");
  write(fd, "video.mpg");      /* request the video by name */
  while (!eof(fd)) {
    read(fd, buf);
    display(buf);
  }
}
Server() {
  while (1) {
    cfd = accept();            /* one client at a time */
    read(cfd, name);
    fd = open(name);
    while (!eof(fd)) {
      read(fd, block);         /* disk read blocks */
      write(cfd, block);       /* network write blocks */
    }
    close(cfd);
    close(fd);
  }
}
Performance “analysis” Server capacity: –Network (100 Mbit/s) –Disk (20 Mbyte/s) Obtained performance: one client stream Server is limited by software structure If a video is 200 Kbit/s, server should be able to support more than one client.
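The claim that the hardware could carry far more than one stream can be checked with simple arithmetic. The sketch below uses only the three numbers on the slide and assumes decimal units (1 Mbit = 10^6 bits), purely sequential disk transfers, and no software overhead; the function name is illustrative.

```python
# Back-of-the-envelope capacity estimate from the slide's numbers.
NETWORK_BITS_PER_S = 100_000_000       # 100 Mbit/s network
DISK_BITS_PER_S = 20_000_000 * 8       # 20 Mbyte/s disk = 160 Mbit/s
STREAM_BITS_PER_S = 200_000            # one 200 Kbit/s video stream


def max_streams(resource_bits_per_s, stream_bits_per_s):
    """Streams a resource could carry if software added no overhead."""
    return resource_bits_per_s // stream_bits_per_s


network_limit = max_streams(NETWORK_BITS_PER_S, STREAM_BITS_PER_S)
disk_limit = max_streams(DISK_BITS_PER_S, STREAM_BITS_PER_S)
hardware_limit = min(network_limit, disk_limit)   # the tighter resource wins

print(network_limit, disk_limit, hardware_limit)
```

Under these assumptions the network allows about 500 streams and the disk about 800, so the network would be the bottleneck at roughly 500 clients. Real disks lose throughput to seeks between streams, so this is an upper bound rather than a prediction.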
Better single-server performance Goal: run at server’s hardware speed –Disk or network should be bottleneck Method: –Pipeline blocks of each request –Multiplex requests from multiple clients Two implementation approaches: –Multithreaded server –Asynchronous I/O
Multithreaded server
server() {
  while (1) {
    cfd = accept();
    read(cfd, name);
    fd = open(name);
    while (!eof(fd)) {
      read(fd, block);
      write(cfd, block);
    }
    close(cfd);
    close(fd);
  }
}
for (i = 0; i < 10; i++)
  fork(server);                /* ten threads run the same loop */
When waiting for I/O, thread scheduler runs another thread
All shared data must be protected by locks
Release locks when blocking
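A minimal runnable sketch of the same structure, assuming Python's standard `socket` and `threading` modules. The in-memory `FILES` table stands in for the disk, and `STATS`, `read_line`, `serve_one`, `start_server`, and `fetch` are hypothetical names introduced for illustration, not part of the course code.

```python
import socket
import threading

BLOCK_SIZE = 4096
FILES = {b"video.mpg": b"x" * 10000}   # hypothetical stand-in for the disk
STATS = {"served": 0}                  # shared data, so it needs a lock
STATS_LOCK = threading.Lock()


def read_line(conn):
    """Read bytes up to a newline (or EOF) and return them stripped."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = conn.recv(1)
        if not chunk:
            break
        buf += chunk
    return buf.strip()


def serve_one(conn):
    """Handle one client: read the requested name, stream the data back."""
    with conn:
        name = read_line(conn)
        with STATS_LOCK:               # lock held only briefly, never across I/O
            STATS["served"] += 1
        data = FILES.get(name, b"")
        for off in range(0, len(data), BLOCK_SIZE):
            conn.sendall(data[off:off + BLOCK_SIZE])   # may block; others run


def server_thread(listener):
    """One server thread: the accept/serve loop from the slide."""
    while True:
        try:
            conn, _ = listener.accept()   # blocks until a client connects
        except OSError:
            return                        # listener was closed
        serve_one(conn)


def start_server(num_threads=10):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: OS picks a free port
    listener.listen(16)
    for _ in range(num_threads):
        threading.Thread(target=server_thread, args=(listener,),
                         daemon=True).start()
    return listener


def fetch(port, name):
    """Minimal client: send a name, read until the server closes."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(name + b"\n")
        chunks = []
        while True:
            chunk = s.recv(BLOCK_SIZE)
            if not chunk:
                return b"".join(chunks)
            chunks.append(chunk)
```

Calling `start_server()` and then `fetch(listener.getsockname()[1], b"video.mpg")` exercises the path end to end. The lock around `STATS` is released before any blocking send, which is the "release locks when blocking" rule from the slide.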
Asynchronous I/O
struct callback {
  bool (*is_ready)();
  void (*handler)(void *arg);
  void *arg;
};
main() {
  while (1) {
    for (c = each callback) {
      if (c->is_ready())
        c->handler(c->arg);
    }
  }
}
Code is structured as a collection of handlers
Handlers are nonblocking
Create new handlers for blocking operations
When operation completes, call handler
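The callback loop can be sketched in a few lines of Python. This version assumes callbacks are one-shot (a handler that wants to run again registers a new callback), which the slide does not state; `Callback`, `run_once`, and `run_until_idle` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Callback:
    is_ready: Callable[[], bool]      # nonblocking readiness test
    handler: Callable[[Any], None]    # nonblocking handler
    arg: Any


callbacks = []


def run_once():
    """One pass over the callback list: run and retire each ready callback."""
    for c in list(callbacks):         # copy: handlers may register callbacks
        if c.is_ready():
            callbacks.remove(c)       # one-shot (assumption, see lead-in)
            c.handler(c.arg)


def run_until_idle(max_passes=100):
    """Repeat passes until no callbacks remain or the pass limit is hit."""
    for _ in range(max_passes):
        if not callbacks:
            return
        run_once()


# Demo: the second callback becomes ready only after the first handler runs.
log = []
flag = {"set": False}


def first(arg):
    log.append(("first", arg))
    flag["set"] = True                # completes the "operation" below


callbacks.append(Callback(lambda: flag["set"],
                          lambda a: log.append(("second", a)), 2))
callbacks.append(Callback(lambda: True, first, 1))
run_until_idle()
print(log)
```

The demo runs `first` on the first pass and the dependent handler on the second, which is the "when operation completes, call handler" pattern in miniature.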
Asynchronous server
init() { on_accept(accept_cb); }
accept_cb() {
  cfd = accept();
  on_readable(cfd, name_cb);
}
on_readable(fd, fn) {
  c = new callback(test_readable, fn, fd);
  add c to callback list;
}
name_cb(cfd) {
  read(cfd, name);
  fd = open(name);
  on_readable(fd, read_cb);      /* wait for the disk */
}
read_cb(cfd, fd) {
  read(fd, block);
  on_writable(cfd, write_cb);    /* wait for the client socket */
}
write_cb(cfd, fd) {
  write(cfd, block);
  on_readable(fd, read_cb);      /* next block */
}
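A runnable approximation of this handler chain, assuming Python's standard `selectors` module as the readiness mechanism. Two simplifications are assumptions of the sketch, not of the slides: file contents come from an in-memory `FILES` table, so the disk `read_cb` step is folded into `write_cb`, and the request name is assumed to arrive in a single `recv`.

```python
import selectors
import socket

BLOCK_SIZE = 4096
FILES = {b"video.mpg": b"v" * 10000}   # hypothetical stand-in for the disk
sel = selectors.DefaultSelector()


def on_readable(sock, fn, state):
    sel.register(sock, selectors.EVENT_READ, (fn, state))


def on_writable(sock, fn, state):
    sel.register(sock, selectors.EVENT_WRITE, (fn, state))


def accept_cb(listener, state):
    conn, _ = listener.accept()
    conn.setblocking(False)
    on_readable(listener, accept_cb, None)   # keep accepting new clients
    on_readable(conn, name_cb, None)


def name_cb(conn, state):
    name = conn.recv(256).strip()            # assumes the name fits one recv
    data = FILES.get(name, b"")
    on_writable(conn, write_cb, {"data": data, "off": 0})


def write_cb(conn, state):
    off = state["off"]
    block = state["data"][off:off + BLOCK_SIZE]
    if not block:
        conn.close()                         # whole file sent
        return
    state["off"] += conn.send(block)         # send may accept only part
    on_writable(conn, write_cb, state)       # come back for the rest


def run_once(timeout=0.1):
    """Dispatch every ready callback once; callbacks are one-shot."""
    for key, _ in sel.select(timeout):
        fn, state = key.data
        sel.unregister(key.fileobj)
        fn(key.fileobj, state)
```

State that a thread would keep on its stack (the file data and current offset) is passed explicitly from handler to handler in the `state` dictionary. A real server would call `on_readable(listener, accept_cb, None)` once on a nonblocking listening socket and then loop on `run_once()`.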
Multithreaded vs. Async
Multithreaded:
Hard to program
–Locking code
–Need to know what blocks
Coordination explicit
State stored on thread’s stack
–Memory allocation implicit
Context switch may be expensive
Multiprocessors
Async:
Hard to program
–Callback code
–Need to know what blocks
Coordination implicit
State passed around explicitly
–Memory allocation explicit
Lightweight context switch
Uniprocessors
Coordination example Threaded server: –Thread for network interface –Interrupt wakes up network thread –Buffer shared between server threads and network thread, protected by locks and condition variables Asynchronous I/O: –Poll for packets (how often to poll?) –Or, interrupt generates an event Be careful: disable interrupts when manipulating the callback queue.
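The threaded half of this slide is the classic bounded buffer. A minimal sketch, assuming Python's `threading` primitives; the class name `PacketBuffer` and the fixed capacity are illustrative choices, not taken from the course.

```python
import threading
from collections import deque


class PacketBuffer:
    """Buffer shared by the network thread (producer) and server threads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.lock = threading.Lock()
        # Two condition variables share one lock.
        self.not_empty = threading.Condition(self.lock)
        self.not_full = threading.Condition(self.lock)

    def put(self, packet):
        """Called by the network thread; waits while the buffer is full."""
        with self.not_full:
            while len(self.packets) >= self.capacity:
                self.not_full.wait()      # releases the lock while waiting
            self.packets.append(packet)
            self.not_empty.notify()       # wake one waiting server thread

    def get(self):
        """Called by a server thread; waits while the buffer is empty."""
        with self.not_empty:
            while not self.packets:
                self.not_empty.wait()
            packet = self.packets.popleft()
            self.not_full.notify()        # wake the network thread if waiting
            return packet
```

The `while` loops around `wait()` recheck the condition after every wakeup, which is required because another thread may have changed the buffer between the notify and the wakeup.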
Scheduling: polling vs. interrupts Maintain peak performance under heavy load –Interrupt model can lead to livelock (all CPU time goes to handling interrupts, none to processing packets) Solution: –Use interrupts under low load (good latency) –Use polling under heavy load (good throughput) Polling is typically more efficient than interrupts under heavy load –Fits naturally into asynchronous I/O model
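A toy model makes the livelock argument concrete. Every cost below is an invented illustrative number, not a measurement: each time slice has a fixed CPU budget, every arriving packet costs an interrupt in interrupt mode, and polling pays one fixed cost per slice and lets excess packets be dropped without consuming CPU. The model covers throughput only; it does not capture the latency advantage of interrupts at low load.

```python
BUDGET = 1000          # CPU units per time slice (assumed)
INTERRUPT_COST = 10    # units consumed by each receive interrupt (assumed)
PROCESS_COST = 10      # units to fully process one packet (assumed)
POLL_COST = 50         # fixed units per slice spent polling (assumed)

# Highest load at which interrupts plus processing still fit in the budget.
SWITCH_THRESHOLD = BUDGET // (INTERRUPT_COST + PROCESS_COST)


def interrupt_throughput(arrivals):
    """Packets delivered per slice when every arrival raises an interrupt."""
    left = max(0, BUDGET - arrivals * INTERRUPT_COST)   # interrupts run first
    return min(arrivals, left // PROCESS_COST)


def polling_throughput(arrivals):
    """Packets delivered per slice when the server polls once per slice."""
    return min(arrivals, (BUDGET - POLL_COST) // PROCESS_COST)


def hybrid_throughput(arrivals):
    """Interrupts under low load, polling under heavy load."""
    if arrivals <= SWITCH_THRESHOLD:
        return interrupt_throughput(arrivals)
    return polling_throughput(arrivals)


for load in (10, 50, 80, 100, 200):
    print(load, interrupt_throughput(load), polling_throughput(load),
          hybrid_throughput(load))
```

In this model interrupt-driven throughput peaks at 50 packets per slice and falls to zero at 100 arrivals, while polling levels off at 95 regardless of load.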
Other design issues Disk scheduling –Elevator algorithm Memory management –File system buffer cache Address spaces (VM management) –Fault isolate different servers Efficient local communication? Efficient transfers between disk and networks –Avoid copies
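The elevator algorithm named above can be sketched as follows. This is the variant that reverses at the last pending request rather than at the edge of the disk (often called LOOK); the block numbers in the usage note are an arbitrary illustration, and both function names are hypothetical.

```python
def elevator_order(pending, head, moving_up=True):
    """Order pending block numbers: sweep one way, then sweep back."""
    if moving_up:
        first = sorted(b for b in pending if b >= head)
        second = sorted((b for b in pending if b < head), reverse=True)
    else:
        first = sorted((b for b in pending if b <= head), reverse=True)
        second = sorted(b for b in pending if b > head)
    return first + second


def seek_distance(order, head):
    """Total head movement needed to serve requests in the given order."""
    total = 0
    pos = head
    for block in order:
        total += abs(block - pos)
        pos = block
    return total
```

For pending requests `[98, 183, 37, 122, 14, 124, 65, 67]` with the head at block 53 and moving up, the elevator order is `[65, 67, 98, 122, 124, 183, 37, 14]` with a total seek distance of 299, compared with 640 when the same requests are served in arrival order.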
More than one processor Problem: single machine may not scale to enough clients Solutions: –Multiprocessors Helps when CPU is bottleneck –Server clusters Helps when bandwidth between server and backbone is high –Distributed server clusters Helps when bandwidth between client and distant server is low
Clusters Naming transparency –Server cluster transparent to client? Server selection –Metrics: CPU load, presence of data Consistency –Partition data Availability –More processors can decrease reliability –Replicate data (makes consistency more difficult)
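Server selection and data partitioning can be sketched together. The policy below (prefer servers that already hold the data, then pick the least loaded) and the hash-based partition function are illustrative assumptions; the slides name the metrics but do not prescribe a policy.

```python
import hashlib


def partition_for(name, num_servers):
    """Map a file name to a server index with a stable hash."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_servers


def select_server(servers, name):
    """Pick a server using the two metrics: presence of data, CPU load.

    servers is a list of dicts with a 'load' number and a 'files' set.
    """
    holders = [s for s in servers if name in s["files"]]
    candidates = holders or servers       # fall back to any server
    return min(candidates, key=lambda s: s["load"])


servers = [
    {"name": "a", "load": 0.9, "files": {"video.mpg"}},
    {"name": "b", "load": 0.2, "files": set()},
    {"name": "c", "load": 0.5, "files": {"video.mpg"}},
]
print(select_server(servers, "video.mpg")["name"])
print(select_server(servers, "other.mpg")["name"])
```

Partitioning sidesteps consistency problems because each file has exactly one home, but it gives no availability benefit. Replicating a file on several servers, as with servers `a` and `c` here, improves availability at the cost of keeping the copies consistent.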
Distributed clusters Replication policies Data distribution Consistency Network monitoring and modeling Global load balancing Tradeoff between accuracy, latency, and network load
Making it secure: access control Redo design: don’t add security on as an afterthought –Firewalls: insecure and break many things CPU cycles are an issue –A secure HTTP server can do about connections a second Pulls in other global issues –Name-to-key binding –Key management infrastructure
Example summary Pipelining of disk and network requests –Need a lot of sophisticated software infrastructure Replication for reliability and performance –Need sophisticated protocols Difficult: We did it for one application –What if data changes rapidly? –Lack of abstractions!
6.894 lab: real systems Multi-finger (due next week) –Asynchronous I/O HTTP proxy –High-performance proxy –Cache, consistency, etc. Open-ended file system project –Research