COP5611 Midterm Review


Outline
- Announcements
- Midterm review
- Distributed file systems – continued, if we have time

Announcements
- Please turn in your homework #3 at the beginning of class
- The midterm will be on March 20, this coming Thursday
- It will be an open-book, open-note exam

Operating System
- An operating system is a layer of software on a bare machine that performs two basic functions:
  - Resource management: managing resources so that they are used in an efficient and fair manner
  - User friendliness

Distributed Systems
- A distributed system is a collection of independent computers that appears to its users as a single coherent system
- Independent means that the computers share neither memory nor a clock
- The computers communicate with each other by exchanging messages over a communication network

Distributed Systems – cont.

Distributed Systems – cont.
Advantages:
- The computing power of a group of cheap workstations can be enormous
- Decisive price/performance advantage over traditional time-sharing systems
- Resource sharing
- Enhanced performance
- Improved reliability and availability
- Modular expandability

Distributed System Architecture – cont.
- Distributed systems are often classified based on the hardware:
  - Multiprocessor systems
  - Homogeneous multicomputer systems
  - Heterogeneous multicomputer systems

Distributed Operating Systems
- Hardware for distributed systems is important, but the software largely determines what a distributed system looks like to a user
- Distributed operating systems are much like traditional operating systems: resource management and user friendliness
- The key concept is transparency

Distributed Operating Systems – cont.
- In a truly distributed operating system, the user views the system as a virtual uniprocessor even though it physically consists of multiple computers
- In other words, the use of multiple computers and the accessing of remote data and resources should be invisible to the user

Overview of Different Kinds of Distributed Systems

System     | Description                                                                          | Main goal
DOS        | Tightly-coupled operating system for multiprocessors and homogeneous multicomputers | Hide and manage hardware resources
NOS        | Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN)      | Offer local services to remote clients
Middleware | Additional layer atop a NOS implementing general-purpose services                    | Provide distribution transparency

Multicomputer Operating Systems
- General structure of a multicomputer operating system

Network Operating System

Middleware and Openness
- In an open middleware-based distributed system, the protocols used by each middleware layer should be the same, as should the interfaces they offer to applications

Comparison Between Systems

Item                    | Distributed OS (Multiproc.) | Distributed OS (Multicomp.) | Network OS | Middleware-based OS
Degree of transparency  | Very high                   | High                        | Low        | High
Same OS on all nodes    | Yes                         | Yes                         | No         | No
Number of copies of OS  | 1                           | N                           | N          | N
Basis for communication | Shared memory               | Messages                    | Files      | Model specific
Resource management     | Global, central             | Global, distributed         | Per node   | Per node
Scalability             | No                          | Moderately                  | Yes        | Varies
Openness                | Closed                      | Closed                      | Open       | Open

Issues in Distributed Operating Systems
Absence of global knowledge
- In a distributed system, due to the unavailability of global memory and a global clock, and due to unpredictable message delays, it is practically impossible for a computer to collect up-to-date information about the global state of the distributed system
- A fundamental problem, therefore, is to develop efficient techniques to implement decentralized, system-wide control
- Another problem is how to order all the events in the system

Issues in Distributed Operating Systems – cont.
Naming
- Naming plays an important role in achieving location transparency
- A name service maps a logical name into a physical address by making use of a table lookup, an algorithm, or a combination of both
- In distributed systems, the tables may be replicated and stored at many places
- Consider naming in a distributed file system

Issues in Distributed Operating Systems – cont.
Scalability
- Systems generally grow with time, especially distributed systems
- Scalability requires that this growth not result in system unavailability or degraded performance
- This puts additional constraints on design approaches

Issues in Distributed Operating Systems – cont.
Compatibility
- Refers to the interoperability among the resources in a system
- Three different levels:
  - Binary level: all processors execute the same binary instruction repertoire
  - Execution level: the same source code can be compiled and executed properly
  - Protocol level: a common set of protocols

Issues in Distributed Operating Systems – cont.
Process synchronization
- The synchronization of processes in distributed systems is difficult because of the unavailability of shared memory
- Processes running on different computers need to be synchronized when they try to concurrently access a shared resource
- This is the mutual exclusion problem, as in classical operating systems

Issues in Distributed Operating Systems – cont.
Resource management
- Resource management needs to make both local and remote resources available to users in an effective manner
- Data migration: distributed file systems, distributed shared memory
- Computation migration: remote procedure call
- Distributed scheduling

Issues in Distributed Operating Systems – cont.
Structuring
- A distributed operating system places some additional constraints on the structure of the underlying operating system
- The collective kernel structure: the operating system is structured as a collection of processes that are largely independent of each other
- Object-oriented operating system: the operating system's services are implemented as objects

Clients and Servers
- General interaction between a client and a server

Layered Protocols
- Layers, interfaces, and protocols in the OSI model

Network Layer
- The primary task of the network layer is routing
- The most widely used network protocol is the connectionless IP (Internet Protocol); each IP packet is routed to its destination independently of all others
- A connection-oriented protocol is gaining popularity, e.g., the virtual channel in ATM networks

Transport Layer
- This layer is the last part of a basic network protocol stack; in other words, it is the layer that application developers use
- An important role of this layer is to provide end-to-end communication
- The Internet transport protocol is called TCP (Transmission Control Protocol)
- The Internet protocol suite also supports a connectionless transport protocol called UDP (User Datagram Protocol)

Sockets
Socket primitives for TCP/IP:

Primitive | Meaning
Socket    | Create a new communication endpoint
Bind      | Attach a local address to a socket
Listen    | Announce willingness to accept connections
Accept    | Block caller until a connection request arrives
Connect   | Actively attempt to establish a connection
Send      | Send some data over the connection
Receive   | Receive some data over the connection
Close     | Release the connection
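As a concrete illustration of these primitives, here is a minimal TCP echo exchange in Python; the socket module maps almost one-to-one onto the table above. This is a sketch only, and the loopback address, port 50007, and the message are arbitrary choices, not part of the course material.

    # Minimal TCP echo pair: Socket/Bind/Listen/Accept/Connect/Send/Receive/Close.
    import socket
    import threading
    import time

    def server():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # Socket
        s.bind(("127.0.0.1", 50007))                             # Bind (arbitrary port)
        s.listen(1)                                              # Listen
        conn, addr = s.accept()                                  # Accept (blocks)
        data = conn.recv(1024)                                   # Receive
        conn.sendall(data)                                       # Send (echo back)
        conn.close()                                             # Close
        s.close()

    def client():
        c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # Socket
        c.connect(("127.0.0.1", 50007))                          # Connect
        c.sendall(b"hello")                                      # Send
        print(c.recv(1024))                                      # Receive -> b'hello'
        c.close()                                                # Close

    t = threading.Thread(target=server)
    t.start()
    time.sleep(0.2)          # give the server time to start listening
    client()
    t.join()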

Sockets – cont.
- Connection-oriented communication pattern using sockets

Socket Programming Review
- IP, TCP, UDP, ports
Server design issues:
- Iterative vs. concurrent server
- Stateless vs. stateful server
- Multithreaded server

A Multithreaded Server
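A minimal sketch of the concurrent (multithreaded) server pattern: a dispatcher thread blocks in accept() and hands each connection to a worker thread. The names handle_client and dispatcher are illustrative, not from the slides.

    # Multithreaded TCP server sketch: one dispatcher, one worker thread per connection.
    import socket
    import threading

    def handle_client(conn):
        # Worker thread: serve one client, then release the connection.
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)          # echo the request back
        conn.close()

    def dispatcher(port=50008):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("0.0.0.0", port))
        srv.listen(5)
        while True:
            conn, _ = srv.accept()      # the dispatcher only accepts connections
            threading.Thread(target=handle_client, args=(conn,), daemon=True).start()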

The Message Passing Model
- The message passing model provides two basic communication primitives: send and receive
- Send has two logical parameters: a message and its destination
- Receive has two logical parameters: the source and a buffer for storing the message

Semantics of Send and Receive Primitives
There are several design issues regarding send and receive primitives:
- Buffered vs. unbuffered
- Blocking vs. non-blocking primitives
  - With blocking primitives, send does not return control until the message has been sent (or received), and receive does not return control until a message has been copied into the buffer
  - With non-blocking primitives, send returns control as soon as the message is copied out, and receive merely signals the intention to receive and provides a buffer for the message

Semantics of Send and Receive Primitives – cont.
- Synchronous vs. asynchronous primitives
  - With synchronous primitives, a SEND is blocked until the corresponding RECEIVE is executed
  - With asynchronous primitives, a SEND does not block even if there is no corresponding execution of a RECEIVE; the messages are buffered
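The difference between blocking and non-blocking receive, and between asynchronous (buffered) send and the receiver's view, can be illustrated with an in-process message buffer. This is a sketch using Python's queue module, not the primitives of any particular message-passing system.

    # Blocking vs. non-blocking receive over a buffered channel.
    import queue

    mailbox = queue.Queue()             # buffered channel: send never blocks here

    def send(msg):
        mailbox.put(msg)                # asynchronous send: returns immediately

    def blocking_receive():
        return mailbox.get(block=True)  # blocks the caller until a message arrives

    def nonblocking_receive():
        try:
            return mailbox.get(block=False)   # returns immediately ...
        except queue.Empty:
            return None                       # ... signalling "no message yet"

    send("m1")
    print(blocking_receive())     # -> m1
    print(nonblocking_receive())  # -> None (buffer empty, but we do not wait)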

Remote Procedure Call
- RPC is designed to hide all the communication details from programmers and to overcome the difficulties of the message-passing model
- It extends conventional local procedure calls to calling procedures on remote computers

Steps of a Remote Procedure Call – cont.

Remote Procedure Call – cont.
Design issues:
- Structure: mostly based on stub procedures
- Binding: through a binding server; the client specifies the machine and the service required
- Parameter and result passing: representation issues; passing by value and by reference
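A small example of the RPC idea using Python's standard xmlrpc library, where the stub machinery (marshalling, binding to a host and port) is generated for us. The procedure name add and port 8000 are arbitrary choices for this sketch, not anything defined in the slides.

    # Server side: register a procedure that clients can call remotely.
    from xmlrpc.server import SimpleXMLRPCServer

    def add(x, y):
        return x + y

    server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
    server.register_function(add, "add")
    # server.serve_forever()   # uncomment to actually run the server

    # Client side: the proxy object plays the role of the client stub.
    import xmlrpc.client
    proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    # print(proxy.add(2, 3))   # looks like a local call, executes on the server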

Remote Object Invocation
- Extends RPC principles to objects
- The key feature of an object is that it encapsulates data (called its state) and the operations on those data (called its methods)
- Methods are made available through an interface
- The separation between interfaces and the objects implementing these interfaces allows us to place an interface on one machine while the object itself resides on another machine

Distributed Objects
- Common organization of a remote object with client-side proxy

Inherent Limitations of a Distributed System
Absence of a global clock
- In a centralized system, time is unambiguous; in a distributed system, there exists no system-wide common clock
- In other words, the notion of global time does not exist
Impact of the absence of global time:
- It is difficult to reason about the temporal order of events
- It is harder to collect up-to-date information on the state of the entire system

Inherent Limitations of a Distributed System
Absence of shared memory
- An up-to-date state of the entire system is not available to any individual process
- This information, however, is necessary to reason about the system's behavior, to debug it, and to recover from failures

Lamport's Logical Clocks
- For a wide class of algorithms, what matters is the internal consistency of the clocks, not whether they are close to real time
- For these algorithms, the clocks are often called logical clocks
- Lamport proposed a scheme to order events in a distributed system using logical clocks

Lamport's Logical Clocks – cont.
Definition: the happened-before relation
- The happened-before relation (→) captures the causal dependencies between events; it is defined as follows:
  - a → b, if a and b are events in the same process and a occurred before b
  - a → b, if a is the event of sending a message m in a process and b is the event of receipt of the same message m by another process
  - If a → b and b → c, then a → c, i.e., "→" is transitive

Lamport's Logical Clocks – cont.
Definitions – continued
- Causally related events: event a causally affects event b if a → b
- Concurrent events: two distinct events a and b are said to be concurrent (denoted a || b) if neither a → b nor b → a
- For any two events, either a → b, b → a, or a || b

Lamport's Logical Clocks – cont.
Implementation rules
- [IR1] Clock Ci is incremented between any two successive events in process Pi: Ci := Ci + d (d > 0)
- [IR2] If event a is the sending of message m by process Pi, then message m is assigned the timestamp tm = Ci(a); on receiving the same message m, process Pj sets Cj := max(Cj, tm + d)
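A minimal sketch of IR1 and IR2 in Python, taking d = 1 (the usual choice); the class and method names are mine, not from the slides.

    # Lamport logical clock: IR1 on local and send events, IR2 on receive.
    class LamportClock:
        def __init__(self):
            self.c = 0

        def local_event(self):          # IR1: tick at every event (d = 1)
            self.c += 1
            return self.c

        def send_event(self):           # sending is an event; its timestamp goes on the message
            return self.local_event()

        def receive_event(self, tm):    # IR2: Cj := max(Cj, tm + d)
            self.c = max(self.c, tm + 1)
            return self.c

    p1, p2 = LamportClock(), LamportClock()
    t = p1.send_event()                 # P1 sends m with timestamp t
    p2.receive_event(t)                 # P2's clock becomes max(C2, t + 1)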

An Example

Total Ordering Using Lamport's Clocks
- If a is any event at process Pi and b is any event at process Pj, then a ⇒ b if and only if either
  Ci(a) < Cj(b), or
  Ci(a) = Cj(b) and Pi ≺ Pj
- where ≺ is any arbitrary relation that totally orders the processes to break ties
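In code, the tie-breaking rule amounts to comparing (timestamp, process id) pairs lexicographically, assuming process ids supply the arbitrary total order ≺. A sketch:

    # Total order on events: compare Lamport timestamps first, process ids second.
    def totally_ordered_before(ts_a, pid_a, ts_b, pid_b):
        return (ts_a, pid_a) < (ts_b, pid_b)   # Python tuple comparison is lexicographic

    print(totally_ordered_before(5, 1, 5, 2))  # True: equal clocks, P1 before P2 breaks the tie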

A Limitation of Lamport's Clocks
- In Lamport's system of logical clocks, if a → b, then C(a) < C(b)
- The reverse is not necessarily true if the events occurred in different processes

A Limitation of Lamport's Clocks

Vector Clocks
Implementation rules
- [IR1] Clock Ci is incremented between any two successive events in process Pi: Ci[i] := Ci[i] + d (d > 0)
- [IR2] If event a is the sending of message m by process Pi, then message m is assigned the timestamp tm = Ci(a); on receiving the same message m, process Pj sets Cj[k] := max(Cj[k], tm[k]) for every k

Vector Clocks – cont.

Vector Clocks – cont.
- Assertion: at any instant, ∀ i, j: Ci[i] ≥ Cj[i]
- Events a and b are causally related if ta < tb or tb < ta; otherwise, these events are concurrent
- In a system of vector clocks, a → b if and only if ta < tb
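A sketch of vector clocks and of the causality test above, again with d = 1; the names are mine, and the receive rule below also ticks the receiver's own component (treating the receipt as an event under IR1), which is one common implementation choice.

    # Vector clock for process i of n processes, plus the causality comparison.
    class VectorClock:
        def __init__(self, i, n):
            self.i = i
            self.v = [0] * n

        def local_event(self):                  # IR1: increment own component
            self.v[self.i] += 1

        def send_event(self):
            self.local_event()
            return list(self.v)                 # timestamp carried by the message

        def receive_event(self, tm):            # IR2: component-wise max, then tick own entry
            self.v = [max(a, b) for a, b in zip(self.v, tm)]
            self.local_event()

    def happened_before(ta, tb):                # ta < tb: every component <=, at least one <
        return all(a <= b for a, b in zip(ta, tb)) and ta != tb

    def concurrent(ta, tb):
        return not happened_before(ta, tb) and not happened_before(tb, ta)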

Causal Ordering of Messages
- The causal ordering of messages tries to maintain, among the corresponding "message receive" events, the same causal relationship that holds among the "message send" events
- In other words, if Send(M1) → Send(M2), then Receive(M1) → Receive(M2)
- This is different from causal ordering of events

Causal Ordering of Messages – cont.

Causal Ordering of Messages – cont.
The basic idea is very simple:
- Deliver a message only when no causality constraints are violated
- Otherwise, the message is not delivered immediately but is buffered until all the preceding messages have been delivered

Birman-Schiper-Stephenson Protocol
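A rough sketch of the Birman-Schiper-Stephenson delivery rule as it is usually stated for broadcast messages: each message carries the sender's vector clock, and a receiver buffers a message from Pj until it is the next one expected from Pj and everything the sender had already delivered has also been delivered locally. This is my reconstruction under those usual assumptions, not the slide's own pseudocode.

    # Birman-Schiper-Stephenson causal broadcast delivery (sketch).
    # V[i] counts, per sender, how many of its broadcasts have been delivered here.
    class BSSProcess:
        def __init__(self, pid, n):
            self.pid = pid
            self.V = [0] * n
            self.buffer = []              # messages that arrived too early

        def broadcast_timestamp(self):
            # The sender increments its own entry and stamps the message with its vector.
            self.V[self.pid] += 1
            return list(self.V)

        def can_deliver(self, sender, ts):
            # Deliverable iff this is the next message from 'sender' and we have already
            # delivered everything the sender had delivered when it sent it.
            return (ts[sender] == self.V[sender] + 1 and
                    all(ts[k] <= self.V[k] for k in range(len(ts)) if k != sender))

        def receive(self, sender, ts, deliver):
            self.buffer.append((sender, ts))
            progress = True
            while progress:               # flush everything that has become deliverable
                progress = False
                for m in list(self.buffer):
                    s, t = m
                    if self.can_deliver(s, t):
                        self.buffer.remove(m)
                        self.V[s] = t[s]  # record the delivery from s
                        deliver(s, t)
                        progress = True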

Schiper-Eggli-Sandoz Protocol

Schiper-Eggli-Sandoz Protocol – cont.

Schiper-Eggli-Sandoz Protocol – cont.

Local State
- For a site Si, its local state at a given time is defined by the local context of the distributed application, denoted by LSi
More notation:
- mij denotes a message sent by Si to Sj
- send(mij) and rec(mij) denote the corresponding sending and receiving events

Definitions – cont.

Definitions – cont.

Global State – cont.

Definitions – cont.
- Strongly consistent global state: a global state is strongly consistent if it is consistent and transitless

Global State – cont.

Chandy-Lamport's Global State Recording Algorithm
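A hedged sketch of the marker rules as the algorithm is usually presented: a process records its local state on the first marker it sees (or when it initiates the snapshot), sends markers on all outgoing channels, and records each incoming channel until that channel's marker arrives. FIFO channels are assumed, and all names here are illustrative, not from the slides.

    # Chandy-Lamport snapshot: marker-sending and marker-receiving rules (sketch).
    class SnapshotProcess:
        def __init__(self, pid, channels_in, send):
            self.pid = pid
            self.recorded = False
            self.local_state = None
            self.channel_state = {c: [] for c in channels_in}   # in-transit messages per channel
            self.recording = {c: False for c in channels_in}
            self.send = send                                     # send(channel, msg)

        def start_snapshot(self, current_state, channels_out):
            self._record(current_state, channels_out)

        def _record(self, state, channels_out):
            self.recorded = True
            self.local_state = state
            for c in channels_out:
                self.send(c, "MARKER")            # marker follows all earlier messages (FIFO)
            for c in self.recording:
                self.recording[c] = True          # start recording every incoming channel

        def on_message(self, channel, msg, current_state, channels_out):
            if msg == "MARKER":
                if not self.recorded:
                    self._record(current_state, channels_out)
                    self.channel_state[channel] = []     # state of this channel is empty
                self.recording[channel] = False          # marker closes recording of this channel
            elif self.recording.get(channel):
                self.channel_state[channel].append(msg)  # message in transit, part of the snapshot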

Cuts of a Distributed Computation
- A cut is a graphical representation of a global state
- A consistent cut is a graphical representation of a consistent global state
- Definition: a cut of a distributed computation is a set C = {c1, c2, ..., cn}, where ci is a cut event at site Si in the history of the distributed computation

Cuts of a Distributed Computation – cont.

Cuts of a Distributed Computation – cont.

Cuts of a Distributed Computation – cont.

Cuts of a Distributed Computation – cont.

Cuts of a Distributed Computation – cont.
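One common way to test a cut for consistency, assuming each cut event ci carries a vector timestamp: the cut is consistent exactly when its cut events are pairwise concurrent, i.e., no cut event happened before another. A sketch under that assumption:

    # Consistency test for a cut, given the vector timestamps of its cut events.
    def happened_before(ta, tb):
        return all(a <= b for a, b in zip(ta, tb)) and ta != tb

    def is_consistent_cut(cut_timestamps):
        # A cut is consistent iff its cut events are pairwise concurrent.
        for i, ti in enumerate(cut_timestamps):
            for j, tj in enumerate(cut_timestamps):
                if i != j and happened_before(ti, tj):
                    return False
        return True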

The Critical Section Problem
- When processes (centralized or distributed) interact through shared resources, the integrity of the resources may be violated if the accesses are not coordinated:
  - The resources may not record all the changes
  - A process may obtain inconsistent values
  - The final state of the shared resource may be inconsistent

Mutual Exclusion
- One solution to the problem is to allow at most one process to access the shared resources at any time; this solution is known as mutual exclusion
- A critical section is a code segment in a process in which shared resources are accessed
- A process can have more than one critical section
- There are problems involving shared resources for which mutual exclusion is not the optimal solution

The Structure of Processes
Structure of process Pi:
  repeat
    entry section
    critical section
    exit section
    remainder section
  until false;

Requirements of Mutual Exclusion Algorithms
- Freedom from deadlocks: two or more sites should not endlessly wait for messages
- Freedom from starvation: no site should wait indefinitely to execute its critical section
- Fairness: requests are executed in the order given by their logical clocks
- Fault tolerance: the algorithm continues to work when some failures occur

Performance Measure for Distributed Mutual Exclusion
- The number of messages per CS invocation
- Synchronization delay: the time required after a site leaves the CS and before the next site enters the CS
- System throughput: 1/(sd + E), where sd is the synchronization delay and E is the average CS execution time
- Response time: the time interval a request waits for its CS execution to be over after its request messages have been sent out

Performance Measure for Distributed Mutual Exclusion

A Centralized Algorithm
- A simple solution: one site, called the control site, is responsible for granting permission for CS execution
- To request the CS, a site sends a REQUEST message to the control site
- When a site is done with its CS execution, it sends a RELEASE message to the control site
- The control site queues up the requests for the CS and grants them permission in turn
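A sketch of the control site's bookkeeping: grant the CS to one site at a time and queue the rest. Message sending is abstracted as a send callback, and the names are mine, not from the slides.

    # Control site for the centralized mutual exclusion algorithm (sketch).
    from collections import deque

    class ControlSite:
        def __init__(self, send):
            self.queue = deque()      # pending REQUESTs
            self.holder = None        # site currently granted the CS
            self.send = send          # send(site, message)

        def on_request(self, site):
            if self.holder is None:
                self.holder = site
                self.send(site, "GRANT")
            else:
                self.queue.append(site)   # wait its turn

        def on_release(self, site):
            self.holder = None
            if self.queue:
                nxt = self.queue.popleft()
                self.holder = nxt
                self.send(nxt, "GRANT")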

Distributed Solutions
- Non-token-based algorithms: use timestamps to order requests and resolve conflicts between simultaneous requests; e.g., Lamport's algorithm and the Ricart-Agrawala algorithm
- Token-based algorithms: a unique token is shared among the sites; a site is allowed to enter the CS only if it possesses the token, and it continues to hold the token until its CS execution is over, then passes the token to the next site

Lamport's Distributed Mutual Exclusion Algorithm
- This algorithm is based on the total ordering given by Lamport's clocks
- Each process keeps a Lamport logical clock and is associated with a unique id that is used to break ties
- Each process keeps a queue, request_queuei, which contains mutual exclusion requests ordered by their timestamps and associated ids
- The request set Ri of each process consists of all the processes
- The communication channels are assumed to be FIFO

Lamport's Distributed Mutual Exclusion Algorithm – cont.

Lamport's Distributed Mutual Exclusion Algorithm – cont.
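A hedged sketch of Lamport's algorithm as it is usually stated: broadcast a timestamped REQUEST, enter the CS when your own request heads your local queue and a later-timestamped message has arrived from every other site, and broadcast RELEASE on exit. The class layout and message handling below are my own framing under those assumptions.

    # Lamport's distributed mutual exclusion (sketch); queue ordered by (timestamp, site id).
    import heapq

    class LamportMutex:
        def __init__(self, pid, peers, send):
            self.pid = pid
            self.peers = peers                 # ids of all other sites
            self.send = send                   # send(dest, msg_type, timestamp, sender)
            self.clock = 0
            self.queue = []                    # heap of (timestamp, site)
            self.last_seen = {p: 0 for p in peers}   # highest timestamp seen from each peer

        def request_cs(self):
            self.clock += 1
            heapq.heappush(self.queue, (self.clock, self.pid))
            for p in self.peers:
                self.send(p, "REQUEST", self.clock, self.pid)

        def on_message(self, msg_type, ts, sender):
            self.clock = max(self.clock, ts) + 1
            self.last_seen[sender] = max(self.last_seen[sender], ts)
            if msg_type == "REQUEST":
                heapq.heappush(self.queue, (ts, sender))
                self.send(sender, "REPLY", self.clock, self.pid)
            elif msg_type == "RELEASE":
                self.queue = [e for e in self.queue if e[1] != sender]
                heapq.heapify(self.queue)

        def can_enter_cs(self):
            # Own request at the head of the queue, and a later-timestamped message from every peer.
            if not self.queue or self.queue[0][1] != self.pid:
                return False
            my_ts = self.queue[0][0]
            return all(self.last_seen[p] > my_ts for p in self.peers)

        def release_cs(self):
            self.queue = [e for e in self.queue if e[1] != self.pid]
            heapq.heapify(self.queue)
            self.clock += 1
            for p in self.peers:
                self.send(p, "RELEASE", self.clock, self.pid)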

Ricart-Agrawala Algorithm
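For reference, a hedged sketch of the Ricart-Agrawala idea as usually described: answer a REQUEST immediately unless you are in the CS or have an outstanding request with higher priority, in which case you defer the REPLY until you leave the CS; enter the CS once replies from all other sites have arrived. Names below are mine.

    # Ricart-Agrawala mutual exclusion (sketch): permission-based, with deferred replies.
    class RicartAgrawala:
        def __init__(self, pid, peers, send):
            self.pid, self.peers, self.send = pid, peers, send
            self.clock = 0
            self.requesting = False
            self.in_cs = False
            self.request_ts = None
            self.replies = set()
            self.deferred = []                 # peers whose replies we postponed

        def request_cs(self):
            self.clock += 1
            self.requesting, self.request_ts, self.replies = True, self.clock, set()
            for p in self.peers:
                self.send(p, "REQUEST", self.request_ts, self.pid)

        def on_request(self, ts, sender):
            self.clock = max(self.clock, ts) + 1
            mine_wins = self.requesting and (self.request_ts, self.pid) < (ts, sender)
            if self.in_cs or mine_wins:
                self.deferred.append(sender)   # the lower-priority request must wait
            else:
                self.send(sender, "REPLY", self.clock, self.pid)

        def on_reply(self, sender):
            self.replies.add(sender)
            if self.replies == set(self.peers):
                self.in_cs = True              # all permissions collected: enter the CS

        def release_cs(self):
            self.in_cs, self.requesting = False, False
            for p in self.deferred:            # now answer everyone we made wait
                self.clock += 1
                self.send(p, "REPLY", self.clock, self.pid)
            self.deferred = []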

A Simple Token Ring Algorithm
- When the ring is initialized, one process is given the token
- The token circulates around the ring: it is passed from process k to process k+1 (modulo the ring size)
- When a process acquires the token from its neighbor, it checks whether it is waiting to enter its critical section
- If so, it enters its CS and passes the token to the next process when it exits
- Otherwise, it passes the token to the next process immediately
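A sketch of the token-ring rule for process k in a ring of n processes; token passing and the CS itself are abstracted as callbacks, so the names are illustrative only.

    # Token ring mutual exclusion (sketch): process k passes the token to (k + 1) % n.
    def on_token_received(k, n, wants_cs, enter_cs, pass_token):
        if wants_cs(k):
            enter_cs(k)                 # use the critical section while holding the token
        pass_token((k + 1) % n)         # then (or immediately) hand the token on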

Suzuki-Kasami's Algorithm
Data structures:
- Each site maintains a vector containing the largest sequence number received so far from every other site
- The token consists of a queue of requesting sites and an array of integers containing, for each site, the sequence number of the request that the site executed most recently

Suzuki-Kasami's Algorithm – cont.
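A hedged sketch of how these data structures are usually used: site i broadcasts REQUEST(i, RN[i]) with a fresh sequence number; a site holding the idle token passes it to a requester whose request is outstanding (RN[j] = LN[j] + 1); after the CS, the holder updates LN, appends the newly outstanding requesters to the token queue, and forwards the token. The code below is my reconstruction under those assumptions, not the slides' version; names are mine.

    # Suzuki-Kasami broadcast algorithm (sketch).
    from collections import deque

    class SuzukiKasami:
        def __init__(self, pid, n, send):
            self.pid, self.n, self.send = pid, n, send
            self.RN = [0] * n                        # highest request number seen per site
            self.in_cs = False
            self.has_token = (pid == 0)              # assume site 0 holds the token initially
            self.LN = [0] * n if self.has_token else None    # token: last executed request per site
            self.Q = deque() if self.has_token else None     # token: queue of requesting sites

        def request_cs(self):
            self.RN[self.pid] += 1
            if self.has_token:
                self.in_cs = True                    # already holding the idle token
            else:
                for j in range(self.n):
                    if j != self.pid:
                        self.send(j, "REQUEST", self.pid, self.RN[self.pid])

        def on_request(self, i, sn):
            self.RN[i] = max(self.RN[i], sn)
            if self.has_token and not self.in_cs and self.RN[i] == self.LN[i] + 1:
                self._pass_token(i)                  # idle token goes to the outstanding requester

        def on_token(self, LN, Q):
            self.has_token, self.LN, self.Q = True, LN, Q
            self.in_cs = True

        def release_cs(self):
            self.in_cs = False
            self.LN[self.pid] = self.RN[self.pid]    # record that our own request was executed
            for j in range(self.n):                  # append newly outstanding requesters
                if j != self.pid and self.RN[j] == self.LN[j] + 1 and j not in self.Q:
                    self.Q.append(j)
            if self.Q:
                self._pass_token(self.Q.popleft())

        def _pass_token(self, dest):
            self.has_token = False
            self.send(dest, "TOKEN", self.LN, self.Q)
            self.LN, self.Q = None, None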

Distributed Deadlock Detection
- In distributed systems, the system state can be represented by a wait-for graph (WFG)
- In a WFG, nodes are processes, and there is a directed edge from node P1 to node P2 if P1 is blocked and waiting for P2 to release some resource
- The system is deadlocked if there is a directed cycle or knot in its WFG
- The problem is how to maintain the WFG and detect cycles/knots in the graph
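For intuition, once a WFG has been collected in one place, detecting a deadlock of this kind reduces to cycle detection; the hard part, which the distributed algorithms address, is never having to build that global graph. A sketch of the graph-level check:

    # Cycle detection in a wait-for graph given as {process: set of processes it waits for}.
    def has_deadlock(wfg):
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {p: WHITE for p in wfg}

        def dfs(p):
            color[p] = GRAY
            for q in wfg.get(p, ()):
                if color.get(q, WHITE) == GRAY:      # back edge: a cycle of waiting processes
                    return True
                if color.get(q, WHITE) == WHITE and dfs(q):
                    return True
            color[p] = BLACK
            return False

        return any(color[p] == WHITE and dfs(p) for p in wfg)

    print(has_deadlock({"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}))  # True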

Distributed Deadlock Detection – cont.
- Centralized detection algorithms
- Distributed deadlock detection algorithms: path-pushing, edge-chasing, diffusion computation, global state detection
- You need to know the basic ideas, but not the details, of these algorithms

Agreement Protocols
- In distributed systems, sites are often required to reach mutual agreement; for example, in distributed database systems, data managers must agree on whether to commit or abort a transaction
- Reaching an agreement requires that the sites have knowledge about the values at other sites
- Agreement when the system is free from failures
- Agreement when the system is prone to failures

Agreement Problems
There are three well-known agreement problems:
- The Byzantine agreement problem
- The consensus problem
- The interactive consistency problem

Lamport-Shostak-Pease Algorithm

Lamport-Shostak-Pease Algorithm – cont.