Distributed Systems 2006: Retrofitting Reliability*
*With material adapted from Ken Birman.



Plan
Tracking group membership: we’ll base it on 2PC and 3PC
Fault-tolerant multicast: we’ll use membership
Ordered multicast: we’ll base it on fault-tolerant multicast
Tools for solving practical replication and availability problems: we’ll base them on ordered multicast
Robust Web Services: we’ll build them with these tools
2PC and 3PC: our first “tools” (lowest layer)

Distributed Shared Memory
A new goal: software Distributed Shared Memory (DSM)
–Looks like a memory-mapped file (cf. Linux mmap)
Data is automatically replicated, so all distributed processes see identical memory content

So what’s the model?
Application “maps” a region of memory
While running, it sometimes
–Acquires a read or write lock for memory
–Then, for a period of time, reads or writes some part of the DSM (some “pages”)
–Then releases the lock
(This is just our distributed replication model in a new form…)
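The acquire, access, release cycle described above can be sketched on a single machine. This is only an illustration of the programming model, not the DSM itself: `MappedRegion` is a hypothetical name, the region is an anonymous Python `mmap`, and a single `threading.Lock` stands in for the read and write locks.

```python
import mmap
import threading

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the platform

class MappedRegion:
    """Hypothetical stand-in for a DSM region: anonymous memory plus one lock."""

    def __init__(self, num_pages):
        self.mem = mmap.mmap(-1, num_pages * PAGE_SIZE)  # anonymous mapping
        self.lock = threading.Lock()  # assumption: one lock for the whole region

    def write(self, offset, data):
        with self.lock:  # acquire, write some bytes, release
            self.mem[offset:offset + len(data)] = data

    def read(self, offset, length):
        with self.lock:  # acquire, read some bytes, release
            return bytes(self.mem[offset:offset + length])

region = MappedRegion(num_pages=2)
region.write(PAGE_SIZE, b"hello")        # lands at the start of page 1
print(region.read(PAGE_SIZE, 5))
```

In a real DSM the release step is where replication would be triggered; here it only ends the critical section.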

To implement this DSM…
We need a way to
–Implement the mapping
–Detect that a page has become dirty
–Invoke our communication primitives when a lock is requested or released
Idea for Linux
–Use the Linux mapped file primitives and build a DSM “daemon” to send updates
–Intercept Linux semaphore operations for synchronization
So, for us, it reduces to how to handle replication…
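One simple way to detect dirty pages, sketched under assumptions: take a per-page digest when the lock is acquired and compare at release time. The slides do not say how detection is done, and a real daemon might instead rely on page-protection faults. The names `page_digests` and `dirty_pages` and the 4096-byte page size are illustrative.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the sketch

def page_digests(buf):
    """Return one SHA-256 digest per PAGE_SIZE-sized page of buf."""
    return [hashlib.sha256(buf[i:i + PAGE_SIZE]).digest()
            for i in range(0, len(buf), PAGE_SIZE)]

def dirty_pages(before, buf):
    """Indices of pages whose content differs from the earlier snapshot."""
    return [i for i, (old, new) in enumerate(zip(before, page_digests(buf)))
            if old != new]

region = bytearray(3 * PAGE_SIZE)
snapshot = page_digests(region)        # taken when the lock is acquired
region[PAGE_SIZE + 10] = 0xFF          # a write that lands in page 1
print(dirty_pages(snapshot, region))   # pages to send when the lock is released
```

Digest comparison costs a full pass over the region at each release, which is why fault-based detection is usually preferred when the platform allows it.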

DSM with a daemon (DSMD)
mmap creates shared memory regions.
A wrapper intercepts mmap and semaphore operations and redirects those associated with the shared memory region to the DSMD.
We’ll assume that the developer comes up with a sensible convention for associating semaphores either with entire mapped regions or with pages of them.
The DSMD will multicast the contents of a page when the associated semaphore lock is released.
Properties of the multicast and of the locking “protocol” determine the DSM properties seen by the programmer. The programmer doesn’t use multicast directly.
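The release-triggered update can be illustrated with a toy, in-process version. Everything here is an assumption made for the sketch: `Replica` and `ToyDSMD` are hypothetical names, a plain loop over peers stands in for the multicast, and writes are tracked explicitly rather than intercepted by a wrapper.

```python
PAGE_SIZE = 4096  # assumed page size

class Replica:
    """One process's copy of the shared region (in-process stand-in)."""

    def __init__(self, num_pages):
        self.mem = bytearray(num_pages * PAGE_SIZE)

    def deliver(self, page_no, content):
        start = page_no * PAGE_SIZE
        self.mem[start:start + PAGE_SIZE] = content

class ToyDSMD:
    """Hypothetical daemon: sends dirty pages to every peer on release."""

    def __init__(self, local, peers):
        self.local, self.peers = local, peers
        self.dirty = set()  # page numbers written since the last release

    def write(self, offset, data):
        self.local.mem[offset:offset + len(data)] = data
        first = offset // PAGE_SIZE
        last = (offset + len(data) - 1) // PAGE_SIZE
        self.dirty.update(range(first, last + 1))

    def release(self):
        for page_no in sorted(self.dirty):
            start = page_no * PAGE_SIZE
            content = bytes(self.local.mem[start:start + PAGE_SIZE])
            for peer in self.peers:  # stand-in for a reliable multicast
                peer.deliver(page_no, content)
        self.dirty.clear()

local, remote = Replica(2), Replica(2)
daemon = ToyDSMD(local=local, peers=[remote])
daemon.write(PAGE_SIZE - 2, b"abcd")  # spans pages 0 and 1
daemon.release()                      # both dirty pages are pushed to the peer
```

The application code only calls `write` and `release`; the propagation is hidden inside the daemon, which mirrors the point that the programmer does not use multicast directly.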

Design choices?
Must pick a memory coherency model, i.e., the type of consistency for the DSM
Strong consistency
–The DSM behaves like a single non-replicated memory
Weak consistency
–The DSM can be highly inconsistent
–Updates propagate after an unspecified and possibly long delay, and copies of the mapped region may differ
Release consistency
–Locking for mutual exclusion; consistent as long as locking is used
Causal consistency
–If DSM access a happens before access b (a → b), then b will observe the results of a

Best choice?
We should probably pick release consistency or causal consistency
–Strong consistency is very expensive
Can implement using our protocol for replication via asynchronous cbcast
–If we obtain read as well as write locks, we get causal consistency
The updates end up totally ordered along mutual exclusion paths
–The primitive is strong enough to maintain this delivery ordering at all copies
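To make the causal delivery ordering concrete, here is a minimal sketch of the vector-timestamp delivery rule commonly used to describe causal broadcast. The slides do not spell out how cbcast works internally, so treat the class `CausalReceiver` and its message format as illustrative assumptions. A message from sender s with timestamp ts is delivered only when it is the next message from s and every message it depends on has already been delivered.

```python
class CausalReceiver:
    """Delivers messages in causal order using vector timestamps."""

    def __init__(self, n):
        self.vt = [0] * n    # number of messages delivered so far, per sender
        self.pending = []    # received but not yet deliverable
        self.delivered = []  # payloads in delivery order

    def _deliverable(self, sender, ts):
        next_from_sender = ts[sender] == self.vt[sender] + 1
        deps_satisfied = all(ts[k] <= self.vt[k]
                             for k in range(len(ts)) if k != sender)
        return next_from_sender and deps_satisfied

    def receive(self, sender, ts, payload):
        self.pending.append((sender, ts, payload))
        progress = True
        while progress:  # keep scanning until nothing more can be delivered
            progress = False
            for msg in list(self.pending):
                s, t, p = msg
                if self._deliverable(s, t):
                    self.vt[s] += 1
                    self.delivered.append(p)
                    self.pending.remove(msg)
                    progress = True

r = CausalReceiver(3)
r.receive(1, [1, 1, 0], "update-b")  # depends on sender 0's first message
r.receive(0, [1, 0, 0], "update-a")
print(r.delivered)
```

The second update arrives first but is held back until the update it depends on has been delivered, so every copy applies the two in the same causal order.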

False sharing
Suppose multiple independent objects map to the same page but have distinct locks
–One issue the designer must worry about
In a traditional, hardware DSM the page ends up “ping-ponging” between the machines, which leads to thrashing
–In our solution, this just won’t work because of overhead
Our mechanism requires that there be one lock per unit of memory transmitted (cf. partial overwrite)
Let application developers avoid false sharing
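A developer could check a memory layout for false sharing before using it. The sketch below is a hypothetical helper, not part of the DSM described in the slides: it assumes 4096-byte pages and a layout table mapping each object name to an offset, a size, and a lock name.

```python
PAGE_SIZE = 4096  # assumed page size

def pages_of(offset, size):
    """Set of page numbers touched by an object at [offset, offset + size)."""
    return set(range(offset // PAGE_SIZE, (offset + size - 1) // PAGE_SIZE + 1))

def false_sharing(objects):
    """objects: name -> (offset, size, lock_name). Returns conflicting pairs."""
    conflicts = []
    names = sorted(objects)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            ox, sx, lx = objects[x]
            oy, sy, ly = objects[y]
            # Different locks guarding data on a common page is false sharing.
            if lx != ly and pages_of(ox, sx) & pages_of(oy, sy):
                conflicts.append((x, y))
    return conflicts

def align_up(offset):
    """Round offset up to the next page boundary."""
    return -(-offset // PAGE_SIZE) * PAGE_SIZE

layout = {"counter": (0, 8, "lock_a"), "queue": (64, 256, "lock_b")}
before = false_sharing(layout)                   # both objects live in page 0
layout["queue"] = (align_up(64), 256, "lock_b")  # move queue to its own page
after = false_sharing(layout)
print(before, after)
```

Page-aligning each independently locked object trades memory for the guarantee of one lock per transmitted page.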

Summary
We have looked at ways of using reliability technologies in existing systems
–Wrappers and toolkits
–Wrapping RPC servers
–Unbreakable stream connections
–Reliable DSM
This course skips, among other things, security, which is essential
–Other course(s) handle this...
–Read Chapter 22 of [Birman, 2005]