Lecture 7: Data distribution, epidemic protocols. EECE 411: Design of Distributed Software Applications.

Presentation transcript:

Lecture 7 Data distribution Epidemic protocols

Epidemic algorithms: Basic idea
- Update operations are initially performed at one node.
- A node passes its updated state to a limited number of peers, which in turn pass the update to other peers.
- Eventually, each update reaches every node.
- Update propagation is lazy, i.e., not immediate.
- Assumption: there are no write-write conflicts.

Preventing an incident like the Amazon S3 incident
- Verify message and state correctness: all kinds of corruption errors may occur.
  - Add checksums to detect corruption of system state messages.
  - Verify invariants before processing state.
- Engineer protocols to control the number of messages they generate. Add rate limiters.
- Add monitoring and alarming for gossip rates and failures.
- Have an emergency procedure to restore a clean state in the system; it may be the solution of last resort. Make it work quickly.
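The first two recommendations can be illustrated with a minimal sketch. It assumes state messages are JSON dictionaries protected by a CRC32 trailer. The function names `encode_state` and `decode_state`, the message layout, and the `version` invariant are hypothetical and chosen only for illustration; they are not taken from any real system.

```python
import json
import zlib


def encode_state(state: dict) -> bytes:
    """Serialize a state message and append a 4-byte CRC32 checksum."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    checksum = zlib.crc32(payload)
    return payload + checksum.to_bytes(4, "big")


def decode_state(message: bytes) -> dict:
    """Return the decoded state, or raise ValueError if the message is unusable."""
    if len(message) < 4:
        raise ValueError("message too short to carry a checksum")
    payload, received = message[:-4], int.from_bytes(message[-4:], "big")
    # Detect corruption before the state is parsed or gossiped further.
    if zlib.crc32(payload) != received:
        raise ValueError("checksum mismatch: dropping corrupted state message")
    state = json.loads(payload.decode("utf-8"))
    # Hypothetical invariant, checked before the state is processed:
    # every message must carry a non-negative integer version.
    version = state.get("version") if isinstance(state, dict) else None
    if not isinstance(version, int) or version < 0:
        raise ValueError("invariant violated: missing or invalid version")
    return state
```

A receiver that catches `ValueError` can drop the message and count the failure, which also feeds the monitoring and alarming recommended above.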

Epidemic algorithms: Principle
- Basic idea: a node passes its updated state to a limited number of other peers (generally chosen at random); these peers in turn pass the update to other peers.
- Update propagation is lazy, i.e., not immediate.
- Eventually, each update should reach every node.
- Anti-entropy: each node regularly chooses another node at random and exchanges state differences, leaving both with identical state afterwards.
- Gossiping (variation): a replica that has just been updated (i.e., has been contaminated) tells a number of other replicas about its update, contaminating them as well.
- Advantages: reliability, asynchronous communication, autonomous nodes.

Anti-entropy protocols
Each node P selects another node Q from the system at random.
- Push: P only sends its updates to Q.
- Pull: P only retrieves updates from Q.
- Push-pull: P and Q exchange mutual updates (after which they hold the same information).

Anti-entropy: push and pull
[Figure: push and pull exchanges. Legend: susceptible (clean) node, infected node, rumor.]

Anti-entropy protocols
Each node P selects another node Q from the system at random.
- Push: P only sends its updates to Q.
- Pull: P only retrieves updates from Q.
- Push-pull: P and Q exchange mutual updates (after which they hold the same information).
Observation: with push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes, where one round means that every node has taken the initiative to start one exchange.
Main properties:
- Reliability: node failures do not impact the protocol.
- Dissemination time and effort scale well with the number of nodes.
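The O(log(N)) observation can be explored with a small simulation. This is a sketch under simplifying assumptions: rounds are synchronous, there is a single update, no node fails, and each node picks its partner uniformly at random. The function name `anti_entropy_rounds` and its parameters are illustrative.

```python
import random


def anti_entropy_rounds(n: int, mode: str = "push-pull", seed: int = 0,
                        max_rounds: int = 10_000) -> int:
    """Count rounds until one update, starting at node 0, has reached all n nodes."""
    rng = random.Random(seed)
    infected = [False] * n
    infected[0] = True
    rounds = 0
    while not all(infected) and rounds < max_rounds:
        rounds += 1
        snapshot = infected[:]  # state at the start of the round
        for p in range(n):
            # P selects another node Q (Q != P) uniformly at random.
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1
            if mode in ("push", "push-pull") and snapshot[p]:
                infected[q] = True  # P sends its update to Q
            if mode in ("pull", "push-pull") and snapshot[q]:
                infected[p] = True  # P retrieves the update from Q
    return rounds
```

Calling `anti_entropy_rounds(n)` for n = 100, 1000, 10000 and comparing the results with log2(n) gives a feel for how slowly the round count grows with system size.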

Gossiping
- Basic model: a node S that is ‘infected’ (i.e., has an update to report) contacts other randomly chosen nodes and ‘infects’ them. Newly infected nodes proceed similarly.
- Termination decision: if the contacted node already has the update, S stops contacting other nodes with probability 1/k.
- Let P be the share of nodes that have not been reached. Then P satisfies P = e^{-(k+1)(1-P)}.

  k    P
  1    20.0%
  2    6.0%
  4    0.7%

[Figure: plot of ln(P)]
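The values in the table can be reproduced by solving P = e^{-(k+1)(1-P)} numerically. The sketch below uses plain fixed-point iteration from the starting guess 0.5, which converges to the non-trivial solution for k >= 1 (P = 1 is always a trivial solution of the equation). The function name `unreached_share` is illustrative.

```python
import math


def unreached_share(k: float, iterations: int = 200) -> float:
    """Solve P = exp(-(k + 1) * (1 - P)) by fixed-point iteration."""
    p = 0.5  # starting guess below the trivial solution P = 1
    for _ in range(iterations):
        p = math.exp(-(k + 1) * (1 - p))
    return p


for k in (1, 2, 4):
    print(f"k = {k}: P = {unreached_share(k):.1%}")
# prints approximately 20.3%, 6.0% and 0.7%
```

Even with k = 4 a small share of nodes never hears the rumor, which is why gossiping is often combined with a periodic anti-entropy pass.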

Deletion and death certificates
- The absence of an item does not spread; on the contrary, a deleted item can be resurrected by a replica that still holds a copy.
- Use death certificates (DCs): when a node receives a DC, it deletes its old copy of the data.
- How long should a DC be maintained? Simple strategy: hold the DC for a fixed amount of time.
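A minimal sketch of the idea, assuming every write and delete carries a timestamp and the newest timestamp wins on merge. The names `Entry`, `Replica`, and `dc_ttl` are hypothetical. A death certificate is modelled as an entry whose value is `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entry:
    value: Optional[str]  # None marks a death certificate
    timestamp: float      # time of the write or delete


class Replica:
    def __init__(self, dc_ttl: float) -> None:
        self.dc_ttl = dc_ttl  # fixed time to hold a death certificate
        self.store: Dict[str, Entry] = {}

    def put(self, key: str, value: str, now: float) -> None:
        self.store[key] = Entry(value, now)

    def delete(self, key: str, now: float) -> None:
        # Keep a death certificate instead of silently dropping the item.
        self.store[key] = Entry(None, now)

    def get(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return None if entry is None else entry.value

    def merge_from(self, other: "Replica") -> None:
        """Pull exchange: adopt every entry that is newer than the local one."""
        for key, theirs in other.store.items():
            mine = self.store.get(key)
            if mine is None or theirs.timestamp > mine.timestamp:
                self.store[key] = theirs

    def expire_certificates(self, now: float) -> None:
        """Drop death certificates that have been held for dc_ttl or longer."""
        expired = [key for key, entry in self.store.items()
                   if entry.value is None and now - entry.timestamp >= self.dc_ttl]
        for key in expired:
            del self.store[key]
```

The fixed holding time is a tradeoff: if a certificate expires before every stale replica has seen it, a later merge with such a replica brings the deleted item back.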

Example applications (I)
- Data dissemination: in p2p networks, wireless sensor networks, clusters. Lots of scenarios.
- Distributing updates: e.g., disconnected replicated list maintenance. Demers et al., Epidemic algorithms for replicated database maintenance, PODC’87.
- Membership protocols: e.g., the Amazon Dynamo service (DeCandia et al., Dynamo: Amazon’s Highly Available Key-value Store, SOSP’07); various p2p networks (e.g., Tribler).

Example applications (II): Data aggregation
- The problem: compute the average value for a large set of sensors.
- Each sensor (node) i maintains a variable x_i.
- When two nodes i and k gossip, they both reset their variables to the pairwise mean: x_i, x_k ← (x_i + x_k)/2.
- Result: in the end each node will have computed the average avg = (x_1 + ... + x_N)/N.
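The averaging rule can be tried out in a few lines. This sketch assumes synchronous rounds in which every node initiates one exchange with a uniformly chosen partner and no messages are lost. The function name `gossip_average` is illustrative.

```python
import random
from typing import List, Sequence


def gossip_average(values: Sequence[float], rounds: int, seed: int = 0) -> List[float]:
    """Run pairwise-averaging gossip and return each node's final estimate."""
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        for i in range(n):
            # Node i picks a partner k != i uniformly at random.
            k = rng.randrange(n - 1)
            if k >= i:
                k += 1
            # Both nodes reset their variable to the pairwise mean.
            x[i] = x[k] = (x[i] + x[k]) / 2
    return x


readings = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
estimates = gossip_average(readings, rounds=100)
# every estimate ends up very close to the true average, 45.0
```

Each exchange preserves the sum of the two values involved, so the global sum, and hence the average, is preserved (up to floating-point rounding) while the spread between nodes shrinks.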

Quiz-like questions
- Design an epidemic-style protocol to calculate the number of sensors in a sensor network.
- What are the tradeoffs between a multicast overlay and an epidemic protocol?

Advantages of epidemic techniques
- Probabilistic model. Rigorous mathematical underpinnings. A good framework for reasoning about the spread of information through a system over time.
- Asynchronous communication pattern. Nodes operate in a 'fire-and-forget' mode: even if the initial sender fails, surviving nodes will receive the update.
- Autonomous actions. Nodes can act on the data they receive without additional communication to reach agreement with partners; they take decisions autonomously.
- Robust with respect to message loss and node failures. Once a message has been received by at least one peer, it is almost impossible to prevent the information from spreading through the system.

Roadmap
- Recap the differences between processes and threads, and the advantages and drawbacks of using one or the other.
- Reasons why clients and servers in distributed applications may use multithreaded designs.
- Tradeoffs between multithreaded, single-threaded, and finite-state-machine designs for servers.
- Other client and server design issues.

Context switching (I)
Context for ‘context switching’:
- Processor level: the minimal collection of values stored in the registers of a processor, used for the execution of a series of instructions (e.g., stack pointer, addressing registers, program counter).
- Thread level: the minimal collection of values stored in registers and memory, used for the execution of a series of instructions (i.e., processor context, state).
- Process level: the minimal collection of values stored in registers and memory, used for the execution of a thread (i.e., thread context, but now also at least MMU register values).

Threads vs. processes: Context switching (II)
- Observation 1: threads share the same address space. Thread context switching can be done entirely independently of the operating system.
- Observation 2: process switching is generally more expensive, as it involves getting the OS in the loop, i.e., trapping to the kernel.
- Observation 3: creating and destroying threads is much cheaper than doing so for processes.
Threading support can be implemented either by the OS or at the process level. Q: What are the tradeoffs?

Threads and distributed systems: Server-side issues
Multithreaded servers: the main issues are performance and structure.
Improve performance:
- Starting a thread to handle an incoming request is much cheaper than starting a new process.
- Having a single-threaded server prohibits simply scaling the server to a multiprocessor system.
- As with clients: reduce latency by reacting to the next request while the previous one is being processed.
Better structure:
- Most servers have a high I/O load. Using simple, well-understood blocking calls may simplify the overall structure.
- Multithreaded programs tend to be smaller and easier to understand due to simplified flow of control.
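The structural argument can be sketched with a worker pool in which each request handler is plain sequential code around a blocking call. The handler `handle_request`, the `serve` function, and the `time.sleep` call standing in for blocking I/O are assumptions made for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def handle_request(request: str) -> str:
    """Handle one request with straightforward, blocking code."""
    time.sleep(0.01)  # stands in for a blocking disk or network call
    return request.upper()


def serve(requests: Iterable[str], workers: int = 4) -> List[str]:
    """Dispatch each request to a worker thread; replies keep request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))


replies = serve(["read a.txt", "read b.txt", "read c.txt"])
```

While one worker is blocked in its I/O call, the others keep making progress, so the server reacts to the next request while earlier ones are still in flight. In CPython this overlap applies to blocking I/O rather than to CPU-bound work.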

How to handle incoming requests (iteratively vs. concurrently)?
Why can multiple threads be a good idea?
[Figure: multithreaded file server example]

How to handle incoming requests (iteratively vs. concurrently)?
Main choices:
- Iterative vs. concurrent.
- Blocking vs. non-blocking I/O.
- Concurrent server with blocking I/O: processes vs. threads.
- Concurrent server with non-blocking I/O: finite-state-machine-based design, event-driven programming.
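For contrast with the threaded designs, the non-blocking, event-driven option can be sketched as a single-threaded loop built on Python's standard `selectors` module. The function names `make_listener` and `run_event_loop`, the upper-casing "service", and the bounded iteration count are assumptions that keep the example small and terminating.

```python
import selectors
import socket


def make_listener() -> socket.socket:
    """Create a non-blocking TCP listener on a free loopback port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    sock.listen()
    sock.setblocking(False)
    return sock


def run_event_loop(listener: socket.socket, max_iterations: int,
                   timeout: float = 0.1) -> None:
    """One thread multiplexes all connections; no call blocks on a single client."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ, data="accept")
    try:
        for _ in range(max_iterations):
            for key, _mask in sel.select(timeout=timeout):
                if key.data == "accept":
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data="client")
                else:
                    conn = key.fileobj
                    chunk = conn.recv(4096)
                    if chunk:
                        # Simplification: a full server would buffer output and
                        # wait for EVENT_WRITE instead of calling sendall here.
                        conn.sendall(chunk.upper())
                    else:
                        sel.unregister(conn)
                        conn.close()
    finally:
        for key in list(sel.get_map().values()):
            sel.unregister(key.fileobj)
            if key.data == "client":
                key.fileobj.close()
        sel.close()
```

Here the per-connection state is trivial. A server whose requests span several reads and writes would keep an explicit state record per connection, which is the finite-state-machine design named in the list above.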

Summary so far
- Client and server design, with a focus on processes.
- Sequential vs. concurrent.
- Concurrent: processes vs. threads.
- Concurrent: blocking vs. non-blocking I/O.