Nov 8, 2003 SMU School of Engineering Group 2: Hammad, Mann, Nidgundi, & Zhang CSE 8343 Advanced Operating Systems Slide 1 Instructor: Dr. Khalil Distributed.

Presentation transcript:

Slide 1: Distributed Process Management
Nov 8, 2003, SMU School of Engineering
Team Members (Group 2): Mazen Hammad, Chuck Mann, Vrushali Nidgundi, Hong Zhang
Course: CSE 8343 Advanced Operating Systems
Professor: Dr. Mohamed Khalil

Slide 2: Distributed Process Management
A distributed system is a collection of processors that do not share memory or a clock. Distributed process management provides mechanisms for:
• Process synchronization and communication.
• Dealing with deadlock and with the variety of failures that are not encountered in a centralized system.
Overview:
• Process Migration
• Distributed Global States
• Distributed Algorithms

Slide 3: Process Migration
A process is not always executed at the site where it is initiated; the entire process, or parts of it, may be executed at different sites.
Motivation:
• Load balancing: Performance can be improved if the load is balanced across nodes.
• Communications performance: Processes that communicate intensively can be moved to the same node. Likewise, if a process analyzes a file or set of files larger than the process itself, it may be better to move the process to the data rather than the other way around.

Slide 4: Motivation (Continued)
• Availability: Long-running processes may need to move if the machine they are running on is going down.
• Utilizing special capabilities: A process can be moved to a particular node to benefit from a specialized hardware or software capability.

Slide 5: Initiation of Migration
Who initiates a migration depends on the goal of the migration.
• If the goal is load balancing, a module in the operating system that monitors system load initiates the migration. This module preempts or signals the process to be migrated, and it stays in contact with peer modules on other systems to decide where to migrate the process so that the load remains balanced. In this case the entire migration is transparent to the process.
• If the goal is to reach a particular resource, a process may migrate itself. In this case the process has to be aware of the distributed system.

Slide 6: What is Migrated
• The process must be destroyed on the source system and created on the target system.
• The process control block and any links must be moved.

Slide 7: Example of Process Migration (Before/After)

Slide 8: Migration Schemes
• Eager (All): Transfer the entire address space. No trace of the process is left behind on the source. If the address space is large and the process does not need most of it, this approach may be unnecessarily expensive.
• Pre-Copy: The process continues to execute on the source node while the address space is copied. Pages modified on the source during the pre-copy operation have to be copied a second time. This reduces the time that the process is frozen and cannot execute during migration.

Slide 9: Migration Schemes (Continued)
• Eager (Dirty): Transfer only the portion of the address space that is in main memory and has been modified. Any additional blocks of the virtual address space are transferred on demand. The source machine remains involved throughout the life of the process.

Slide 10: Migration Schemes (Continued)
• Copy-on-Reference: Pages are brought over only when they are referenced. This is a variation of eager (dirty) and has the lowest initial cost of process migration.
• Flushing: The pages of the process are cleared from the source's main memory by flushing dirty pages to disk. This relieves the source of holding any pages of the migrated process in main memory.
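The schemes differ mainly in which pages cross the network at migration time. The following is a minimal sketch of that difference, not an implementation from the slides: the `Page` record, the scheme names, and `initial_transfer` are hypothetical, and pre-copy is left out because it copies pages while the process is still running. Treating flushing as "nothing sent up front" is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    number: int
    resident: bool  # currently in main memory on the source
    dirty: bool     # modified since it was loaded


def initial_transfer(pages, scheme):
    """Return the page numbers sent to the target when migration starts."""
    if scheme == "eager_all":
        # Entire address space moves; nothing is left behind.
        return [p.number for p in pages]
    if scheme == "eager_dirty":
        # Only pages that are in main memory and have been modified.
        return [p.number for p in pages if p.resident and p.dirty]
    if scheme in ("copy_on_reference", "flushing"):
        # Nothing is sent up front; pages are fetched later on demand
        # (assumption for flushing: dirty pages go to disk first).
        return []
    raise ValueError(f"unknown scheme: {scheme}")


pages = [
    Page(0, resident=True, dirty=True),
    Page(1, resident=True, dirty=False),
    Page(2, resident=False, dirty=False),
    Page(3, resident=True, dirty=True),
]
for scheme in ("eager_all", "eager_dirty", "copy_on_reference"):
    print(scheme, initial_transfer(pages, scheme))
```

The lower the initial transfer, the longer the source stays involved after migration, which is the trade-off the slides describe for eager (dirty) and copy-on-reference.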

Slide 11: Negotiation of Migration
1. The Starter on the source system (S) decides that a process P should be migrated to a target system (D). It sends a message to D's Starter requesting the transfer.
2. If D's Starter is ready to accept the process, it sends back a positive response.
3. S's Starter communicates this decision to S's kernel.
4. The kernel on S then offers to send process P to D. The offer includes statistics about P (age, processor load, and communication load).

Slide 12: Negotiation of Migration (Continued)
5. If D is short of the resources described in the offer, it may reject the offer. Otherwise, the kernel on D relays the offer to its controlling Starter. The relay includes the same information received from S.
6. The Starter's decision is communicated to D's kernel.
7. D reserves the necessary resources to avoid deadlock and flow-control problems, and finally sends an acceptance to S.
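The message flow of the negotiation can be traced with a small sketch. It assumes that D checks its resources and relays the offer to its Starter before the Starter's decision and the final acceptance; the function name, the boolean parameters, and the message strings are all illustrative rather than part of any real system.

```python
def negotiate(d_starter_ready, d_has_resources, d_starter_accepts):
    """Trace the negotiation; return (migration_agreed, message_log)."""
    log = ["S.starter -> D.starter: request transfer of P"]
    if not d_starter_ready:
        log.append("D.starter -> S.starter: negative response")
        return False, log
    log.append("D.starter -> S.starter: positive response")
    log.append("S.starter -> S.kernel: migrate P to D")
    log.append("S.kernel -> D.kernel: offer P with statistics")
    if not d_has_resources:
        # D may reject when it is short of the resources in the offer.
        log.append("D.kernel -> S.kernel: reject offer")
        return False, log
    log.append("D.kernel -> D.starter: relay offer")
    if not d_starter_accepts:
        log.append("D.starter -> D.kernel: decline")  # hypothetical message
        return False, log
    log.append("D.starter -> D.kernel: accept")
    log.append("D.kernel: reserve resources for P")
    log.append("D.kernel -> S.kernel: acceptance")
    return True, log


agreed, log = negotiate(True, True, True)
for line in log:
    print(line)
```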

Slide 13: Example of Negotiation of Process Migration

Slide 14: Eviction
• A system may evict a process that has been migrated to it.
• Negotiation gives the designated target machine a say in the migration decision. It may also be useful to let a system evict a process that has already been migrated to it, so that the system can provide adequate response time for its own work.
• Sprite has this capability. In Sprite, each process appears to run on a single host throughout its lifetime; this host is known as the home node of the process. A process migrated to any other node becomes a foreign process there, and that node may evict any foreign process, in which case the process is forced back to its home node.

Slide 15: Eviction (Continued)
The elements of the Sprite eviction mechanism are as follows:
• A monitor process at each node tracks the current load to determine when to accept a foreign process. If the monitor detects activity on the node, it initiates an eviction procedure on all foreign processes.
• If a process is evicted, it is sent back to its home node.
• All processes marked for eviction are immediately suspended, which frees processing power on that node.
• The entire address space of an evicted process is transferred to the home node.

Slide 16: Some Terms
• Channel: Exists between two processes if they exchange messages.
• State: The sequence of messages that have been sent and received along the channels incident with the process.
• Snapshot: Records the state of a process.
• Global State: The combined state of all processes.
• Distributed Snapshot: A collection of snapshots, one for each process.

Slide 17: Distributed Global States
The state of a distributed system, called the global state (or global snapshot), is given by the collective state of its processes and channels.
• The operating system cannot know the current state of all processes in the distributed system.
• A process can know the current state only of the processes on its local system, through the process control blocks in memory.
• Concurrency issues such as mutual exclusion, deadlock, and starvation are also present in distributed systems.
• Remote processes know only the state information that they receive in messages. These messages represent the state at some time in the past.

Slide 18: Example
• A bank account is distributed over two branches.
• The total amount in the account is the sum of the amounts at each branch.
• At 3:00 PM the account balance is to be determined.
• Messages are sent to request the information.

Slide 19: Example (Continued)
• Suppose that, at the time of balance determination, the balance from branch A is in transit to branch B. The result is a false reading, because the amount in transit appears at neither branch.
• All messages in transit must be examined at the time of observation.
• The total consists of the balance at both branches plus the amount in the message.

Slide 20: Example (Continued)
• Suppose the clocks at the two branches are not perfectly synchronized.
• An amount is transferred from branch A at 3:01 by A's clock.
• The amount arrives at branch B at 2:59 by B's clock.
• In an observation taken at 3:00, the amount is counted twice: it is still part of A's balance at 3:00 and is already part of B's balance at 3:00.

Slide 21: Distributed Snapshot Algorithm
• The algorithm assumes that messages are delivered in the order in which they are sent.
• It uses a control message called MARKER.
• A process (Q) starts the algorithm by recording its state and sending a MARKER on all outgoing channels before it sends any further messages.

Slide 22: Distributed Snapshot Algorithm (Continued)
Each process (say P), upon first receiving a MARKER (say from Q), performs the following:
1. P records its local state.
2. P records the state of the incoming channel from Q to P as empty.
3. P propagates the MARKER to all of its neighbors along all outgoing channels.
4. The algorithm terminates at a process once a MARKER has been received along all of its incoming channels.
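The marker rules can be tried out on the two-branch bank account from the earlier example. The sketch below is a single-threaded simulation under stated assumptions: the `Network` and `Process` classes, their method names, and the amounts are hypothetical, and channels are FIFO queues as the algorithm requires. It also includes one step the slides leave implicit: after a process has recorded its state, messages that arrive on a channel before that channel's MARKER are recorded as the channel's state.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance, peers):
        self.name, self.balance, self.peers = name, balance, peers
        self.recorded_state = None   # local snapshot, None until recorded
        self.channel_state = {}      # sender -> amounts caught in transit
        self.recording = set()       # incoming channels still being recorded


class Network:
    def __init__(self, balances):
        names = list(balances)
        self.procs = {n: Process(n, b, [m for m in names if m != n])
                      for n, b in balances.items()}
        # One FIFO channel per ordered pair of processes.
        self.channels = {(s, d): deque() for s in names for d in names if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def _record_and_send_markers(self, p):
        p.recorded_state = p.balance
        p.recording = set(p.peers)
        p.channel_state = {q: [] for q in p.peers}
        for q in p.peers:
            self.channels[(p.name, q)].append(MARKER)

    def start_snapshot(self, name):
        self._record_and_send_markers(self.procs[name])

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self._record_and_send_markers(p)   # first marker seen
            p.recording.discard(src)               # channel src->dst is done
        else:
            p.balance += msg
            if src in p.recording:                 # in transit at snapshot time
                p.channel_state[src].append(msg)

    def snapshot_total(self):
        return sum(p.recorded_state + sum(sum(v) for v in p.channel_state.values())
                   for p in self.procs.values())


net = Network({"A": 100, "B": 50})
net.send("A", "B", 30)        # 30 is now in transit from A to B
net.start_snapshot("B")       # B records 50 and sends a MARKER to A
net.deliver("B", "A")         # A records 70 and sends a MARKER to B
net.deliver("A", "B")         # the 30 arrives and is recorded as channel state
net.deliver("A", "B")         # A's MARKER closes the channel
print(net.snapshot_total())
```

Adding only the two recorded balances would give 120; the recorded channel state supplies the 30 that was in transit, so the snapshot total matches the true total of 150.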

Slide 23: Distributed Mutual Exclusion
The problem of mutual exclusion arises in distributed systems whenever several sites access shared resources concurrently.
• Mutual exclusion must be enforced: only one process at a time is allowed in its critical section.
• A process that halts in its non-critical section must do so without interfering with other processes.
• It must not be possible for a process requiring access to a critical section to be delayed indefinitely: no deadlock or starvation.

Slide 24: Distributed Mutual Exclusion (Continued)
• When no process is in a critical section, any process that requests entry to its critical section must be permitted to enter without delay.
• No assumptions are made about relative process speeds or the number of processors.
• A process remains inside its critical section for a finite time only.

Slide 25: Centralized Algorithm for Mutual Exclusion
• One node is designated as the control node.
• This node controls access to all shared objects.
• If the control node fails, mutual exclusion breaks down.
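A control node of this kind can be sketched in a few lines. The slide only says that one node controls access; the request/grant/release interface and the first-come, first-served queue below are assumptions made for illustration, and `ControlNode` is a hypothetical name.

```python
from collections import deque


class ControlNode:
    """Grants one shared object to one process at a time."""

    def __init__(self):
        self.holder = None        # process currently in its critical section
        self.waiting = deque()    # pending requests in arrival order

    def request(self, pid):
        """Return True if access is granted now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Release the object and return the next holder, or None."""
        if self.holder != pid:
            raise ValueError(f"{pid} does not hold the object")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


node = ControlNode()
print(node.request("P1"))   # granted immediately
print(node.request("P2"))   # queued behind P1
print(node.release("P1"))   # access passes to P2
```

All state lives in the single `ControlNode` object, which is why its failure takes mutual exclusion down with it.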

Slide 26: Distributed Algorithm
• On average, all nodes have an equal amount of information.
• Each node has only a partial picture of the entire system and must base its decisions on that.
• All nodes bear equal responsibility for the final decision.
• All nodes expend equal effort, on average, in effecting a decision.
• Failure of a node does not, in general, collapse the whole system.
• There is no system-wide common clock against which the timing of events can be regulated.

Slide 27: Time-Stamping
• Each system on the network maintains a counter that functions as a clock.
• Each site has a numeric identifier.
• When a message is received, the receiving system sets its counter to one more than the maximum of its current value and the incoming time-stamp (counter).
• If two messages have the same time-stamp, they are ordered by the numeric identifiers of their sites.
• For this method to work, each message is sent from one process to all other processes. This ensures that all sites have the same ordering of messages. For mutual exclusion and deadlock handling, all processes must be aware of the situation.
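The counter rule and the tie-break can be sketched as follows. The `Site` class is hypothetical, and incrementing the counter before each send is an assumption taken from the usual formulation of Lamport's scheme rather than from the slide, which only states the receive rule.

```python
class Site:
    def __init__(self, site_id):
        self.site_id = site_id   # numeric identifier of the site
        self.counter = 0         # local counter acting as a logical clock

    def send(self):
        """Stamp an outgoing message with (time-stamp, site identifier)."""
        self.counter += 1        # assumption: tick before every send
        return (self.counter, self.site_id)

    def receive(self, message):
        timestamp, _sender = message
        # One more than the maximum of the local value and the incoming stamp.
        self.counter = 1 + max(self.counter, timestamp)


s1, s2 = Site(1), Site(2)
m1 = s1.send()       # (1, 1)
m2 = s2.send()       # (1, 2): same time-stamp as m1, different site
s2.receive(m1)       # s2's counter becomes 1 + max(1, 1) = 2
m3 = s2.send()       # (3, 2)

# Tuples compare by time-stamp first and by site identifier on ties.
print(sorted([m3, m2, m1]))
```

Because every site sorts by the same (time-stamp, site identifier) key, every site that has seen the same messages arrives at the same order.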

Slide 28: Distributed Deadlock
• Deadlock handling is more complicated in distributed systems.
• No node has accurate knowledge of the current state of the overall system.
• Message transfer between processes involves an unpredictable delay.
Two types of deadlock:
• Resource allocation.
• Communication of messages.

Slide 29: Deadlock in Resource Allocation
Deadlock in resource allocation exists only if all of the following conditions are met:
• Mutual exclusion.
• Hold and wait.
• No preemption.
• Circular wait.

Slide 30: Deadlock Prevention
• The circular-wait condition can be prevented by defining a linear ordering of resource types.
• The hold-and-wait condition can be prevented by requiring that a process request all of its required resources at one time, and by blocking the process until all requests can be granted simultaneously.
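Linear ordering is easy to demonstrate on a single machine with locks. In this sketch, which is illustrative only, each resource carries a hypothetical `rank`, and every process acquires the resources it needs in ascending rank regardless of the order in which it names them, so no cycle of waiting processes can form.

```python
import threading


class OrderedResource:
    def __init__(self, rank):
        self.rank = rank                  # position in the linear ordering
        self.lock = threading.Lock()


def acquire_in_order(resources):
    """Acquire every resource in ascending rank; return them in that order."""
    ordered = sorted(resources, key=lambda r: r.rank)
    for resource in ordered:
        resource.lock.acquire()
    return ordered


def release_all(held):
    """Release in reverse order of acquisition."""
    for resource in reversed(held):
        resource.lock.release()


printer, disk = OrderedResource(1), OrderedResource(2)


def worker(first, second):
    for _ in range(1000):
        held = acquire_in_order([first, second])
        release_all(held)


# The two workers name the resources in opposite orders, which could
# deadlock if each simply locked its arguments left to right.
t1 = threading.Thread(target=worker, args=(printer, disk))
t2 = threading.Thread(target=worker, args=(disk, printer))
t1.start(); t2.start()
t1.join(); t2.join()
print("both workers finished")
```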

Slide 31: Distributed Deadlock Detection
The difficulty is that each site knows only about its own resources, whereas a deadlock may involve distributed resources. The following techniques can be employed:
• Centralized Control: One site is responsible for deadlock detection. Because it has the complete picture, it can detect deadlock.
• Hierarchical Control: The sites are organized in a tree structure. Each node other than a leaf collects information about the resource allocation of all its dependent nodes. A deadlock is detected at the lowest node that is above all the nodes involved in it, so detection can happen at a lower level than the root.
• Distributed Control: All processes cooperate in the deadlock detection function. In this case a considerable amount of information is exchanged, with timestamps, so the overhead is significant.
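Centralized control can be sketched as merging per-site reports and searching the result for a cycle. Representing each report as a wait-for graph (process name mapped to the set of processes it waits on) is an assumption, as are the function names; the slides do not prescribe a data structure.

```python
def merge_site_reports(reports):
    """Combine per-site wait-for edges into one global wait-for graph."""
    merged = {}
    for report in reports:
        for proc, waits_on in report.items():
            merged.setdefault(proc, set()).update(waits_on)
    return merged


def find_deadlock(wait_for):
    """Return a list of processes forming a cycle, or None if there is none."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}

    def visit(node, path):
        state[node] = ACTIVE
        path.append(node)
        for nxt in wait_for.get(node, ()):
            nxt_state = state.get(nxt, UNSEEN)
            if nxt_state == ACTIVE:
                # Reached a process already on the current path: a cycle.
                return path[path.index(nxt):]
            if nxt_state == UNSEEN:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in wait_for:
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


# Each site sees only one edge, so no site can detect the deadlock alone.
site1 = {"P1": {"P2"}}
site2 = {"P2": {"P3"}}
site3 = {"P3": {"P1"}}
print(find_deadlock(site1))
print(find_deadlock(merge_site_reports([site1, site2, site3])))
```

The same `find_deadlock` check could run at an interior node of a hierarchy over the reports of its dependent nodes only, which is the idea behind hierarchical control.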

Slide 32: Deadlock in Message Communication
Mutual Waiting:
• Deadlock occurs in message communication when each of a group of processes is waiting for a message from another member of the group and there are no messages in transit.
Unavailability of Message Buffers:
• This problem is well known in packet-switching data networks: for each node, the queue to the adjacent node in one direction is full of packets destined for the next node beyond. Example: the buffer space at A is filled with packets destined for B, and the reverse is true at B.
• A structured buffer pool is used to prevent this deadlock.

Slide 33: Unavailability of Message Buffers

Slide 34: Structured Buffer Pool

Slide 35: References
1. Mittal, Neeraj (2003), "Notes on Consistent Global States," CS 6378: Advanced Operating Systems, The University of Texas at Dallas, Fall [
2. Williams, Stephen and Kafura, D. (1995), "Global State Recording Algorithm: GSRA," Online Lecture Notes, CS 5204 – Operating Systems, Virginia Tech, Fall [ Summaries/GlobalState/global_state.html]
3. Singhal, M. and Shivaratri, N. (1994), Advanced Concepts in Operating Systems, McGraw-Hill, pp
4. Chandy, K. M. and Lamport, L. (1991), "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Transactions on Computer Systems, vol. 9, no. 3, pp
5. Stallings, William (2001), Operating Systems: Internals and Design Principles, 4th Ed., Prentice-Hall, Upper Saddle River, NJ, Figs. 14.1, 14.2, 14.3, 14.17, and

Slide 36: Questions?