COORDINATION, SYNCHRONIZATION AND LOCKING WITH ISIS2. Ken Birman, Cornell University.



Isis 2 has many options for coordination
- Within a group of processes there are many ways in which you might want coordinated or synchronized behavior
- Isis 2 can support all of them, but because there are many patterns, the topic isn’t trivial!

Examples
- Primary/Backup fault-tolerance
  - Group g receives a request
  - A and B are assigned to handle it
  - If A succeeds, it sends a reply and we’re done
  - If A fails, B takes over

Examples
- Coordinator/Cohort fault-tolerance
  - Group g receives a request
  - A and B are assigned to handle it, but in such a way that every request has its own primary (“coordinator”) and every request has one or more backups (“cohort”)
  - If A succeeds, it sends a reply and we’re done
  - If A fails, B takes over
  - Updates issued to the group state by A prior to failing are visible to B when it takes over

Examples
- Periodic action based on a timer
  - A clock is running, and every X ms, the group members jointly perform some action
  - They could each take some “part” of a shared task
  - Or the action could be performed in primary/backup style with a primary member initiating it but others standing ready to help if the primary fails

Examples
- Locking
  - The group manages some form of data. The items have an associated key
  - Members can obtain a lock on the key
    - Mutex locks: while a member holds the lock on a key, no other member can access that key
    - Read/Write locks: Read locks allow other readers, but Write locks exclude both readers and writers

Barrier synchronization
- Group has some form of task to do
- An initiator sends a request to start the members working on that task, then wishes to wait until they are all finished
- The members function as “workers”. As each finishes it signals that it has finished its part

Summary of Options

| Goal | Purpose | Technique to Use |
|---|---|---|
| Primary/Backup | “There can only be one” (primary, that is) | Group of size 2. Primary is the member with rank=0 |
| Coordinator/Cohort | “Many hands, light work” | Group of any size. Rotate the roles… |
| Periodic Actions | Heart beat | Rank 0 member sends a timing signal |
| Locking: Mutex | “One at a time” | Isis 2 locking tool (basic) |
| Locking: Read/Write Locks | Databases | Isis 2 locking tool (fancy) |
| Barrier Synchronization | “Is everyone finished?” | An Isis 2 Query |

Primary-Backup

… Details
- Primary-backup
  - Easiest: Form a group with 2 members
  - Rule: The member with rank 0 is the primary, the member with rank 1 is a backup
  - To update data, use the Isis 2 Send primitive, but call g.Flush() before talking to an external user of the service
    - If the group updates databases or file storage, you may need to use SafeSend. This topic is covered in a different module.
  - If a failure occurs, a new view will signal that the backup is now the new primary
  - It picks up in the same state that the old primary was in
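The rank rule can be illustrated with a minimal Python sketch. This is not the Isis 2 API: the `View` class and `role_in` function are hypothetical stand-ins, assuming only that a view lists its members in rank order and that a failure produces a new view without the failed member.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class View:
    """Hypothetical stand-in for a group view: members listed in rank order."""
    view_id: int
    members: List[str]

    def rank_of(self, member: str) -> int:
        return self.members.index(member)


def role_in(view: View, me: str) -> str:
    """Apply the slide's rule: rank 0 is the primary, any other rank is a backup."""
    return "primary" if view.rank_of(me) == 0 else "backup"


# Initial view: A has rank 0, B has rank 1.
v0 = View(view_id=0, members=["A", "B"])
# A fails; the next view lists only B, which now has rank 0.
v1 = View(view_id=1, members=["B"])
```

Under these assumptions, B needs no extra election step: re-evaluating `role_in` whenever a new view arrives is enough for it to notice that it has become the primary.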

Issues with primary/backup
- Notice that Isis 2 lacks a way to send a multicast to the group plus one external member
- So suppose an outside person asks the group to do something, like in our old picture:
  - Backup will know what the primary intended to do
  - But did the primary actually send the response?
  - Did it reach the user?
[Figure: timeline of the request “Update the monitoring and alarms criteria for Mrs. Marsh as follows…” and the reply “Confirmed”. Labels: Time; Service instance; Service response delay; Response delay seen by end-user would include Internet latencies.]

This problem can’t be solved!
- In the Isis 2 system, we can’t atomically send a message to the external user in such a way that the backup will be certain it was sent.
- So… the backup must re-send the last message(s) to the user!
- … but how?
[Figure: timeline of the request “Update the monitoring and alarms criteria for Mrs. Marsh as follows…” with the reply “Confirmed” shown twice. Labels: Time; Service instance; Service response delay; Response delay seen by end-user would include Internet latencies.]

UDP, TCP-R…
- With UDP the backup might be able to just send the identical reply.
- With TCP, the user sees the connection break and, if the confirmation wasn’t received, may need to re-issue the request. The backup (the new primary) would sense that this is a repeated request and resend the old reply
- Cornell has a technology called TCP-R. With it, a single TCP connection can be “taken over” by the backup. With TCP-R the backup can finish the send that the primary was in the midst of when it crashed
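The “sense a repeated request and resend the old reply” step can be sketched in Python as follows. The sketch assumes two things the slides do not state: that each client request carries a unique request id, and that the table of past replies is part of the replicated group state, so the backup holds it too. `ReplyCache` is a hypothetical helper, not part of Isis 2.

```python
from typing import Callable, Dict


class ReplyCache:
    """Remembers the reply produced for each request id (hypothetical helper)."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler
        self._replies: Dict[str, str] = {}
        self.executions = 0  # how many times the handler really ran

    def handle(self, request_id: str, body: str) -> str:
        # A repeated request id suggests the client never saw the reply:
        # resend the recorded reply instead of redoing the work.
        if request_id in self._replies:
            return self._replies[request_id]
        self.executions += 1
        reply = self._handler(body)
        self._replies[request_id] = reply
        return reply


cache = ReplyCache(lambda body: "Confirmed: " + body)
first = cache.handle("req-17", "update alarms")
again = cache.handle("req-17", "update alarms")  # client retry after failover
```

The retry gets the same reply as the original request and the update runs only once, which is the property that makes re-issuing the request safe for the user.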

Coordinator-Cohort

… More details
- Coordinator-Cohort
  - Really the identical idea, but now we relay the request into the group, using OrderedSend.
    - If the group updates databases or file storage, you may need to use SafeSend. This topic is covered in a different module.
  - Some simple rule should be used to map requests to the members: not just “rank 0 is the primary” but “rank K will be the primary for request R”
    - For example, the request could contain some sort of identification data, or we could compute a hashcode
    - Any rule that all members can apply will do the trick
  - In other ways, just like primary-backup
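One possible “rule that all members can apply” is a hash of the request identifier, sketched below in Python. The function names and the choice of SHA-256 are assumptions made for illustration; the slides only require that every member compute the same answer. Python's built-in `hash()` is avoided because string hashes are randomized per process by default, so different members could disagree.

```python
import hashlib


def coordinator_rank(request_id: str, group_size: int) -> int:
    """Map a request to a rank in [0, group_size) the same way on every member."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % group_size


def cohort_ranks(request_id: str, group_size: int, backups: int = 1) -> list:
    """Backups are the next ranks after the coordinator, wrapping around."""
    first = coordinator_rank(request_id, group_size)
    count = min(backups, group_size - 1)
    return [(first + i) % group_size for i in range(1, count + 1)]
```

Each member would compare the computed ranks against its own rank in the current view to decide whether it is the coordinator, a cohort member, or uninvolved for that request. Because the result depends on the group size, it has to be recomputed when the view changes.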

Which multicast should we use?
- If external users connect in a load-balanced way, each member can
  - Handle read-only work locally
  - Use OrderedSend to relay any work that updates the group state. If the group updates databases or file storage, you may need to use SafeSend.
  - With OrderedSend, always call Flush if important updates were done and we are about to respond to the external user
- If external users all connect to the rank-0 member then we can just use Send, but we risk overloading that member if everyone connects to the same one

Periodic Actions with Timer

Periodic Actions
- Easiest solution: Have the member with rank 0 launch a thread
- This thread loops
  - Wait for K ms (use Thread.Sleep() or a timed call to WaitOne on a semaphore that is always 0)
  - Then issue a g.Send to “ping everyone”
- Latencies of g.Send in an otherwise idle group will be very low and the jitter even smaller
- Action should be taken by everyone within a millisecond or two
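The timer loop can be sketched in Python as follows. This is an analogue rather than Isis 2 code: `run_heartbeat` and `record_ping` are hypothetical names, and the `send_ping` callback stands in for the multicast. The timed wait on an event that is never set during normal operation plays the role of the timed wait on a semaphore that is always 0, and it also gives a clean way to stop the thread.

```python
import threading
from typing import Callable


def run_heartbeat(period_s: float, send_ping: Callable[[int], None],
                  stop: threading.Event) -> threading.Thread:
    """Start a thread that calls send_ping(tick) every period_s seconds."""

    def loop() -> None:
        tick = 0
        # Event.wait returns False on timeout and True once stop is set,
        # so it doubles as an interruptible sleep.
        while not stop.wait(period_s):
            send_ping(tick)
            tick += 1

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


ticks = []
stop = threading.Event()
enough = threading.Event()


def record_ping(tick: int) -> None:
    # Stand-in for the multicast: just record the tick number.
    ticks.append(tick)
    if len(ticks) == 3:
        enough.set()


worker = run_heartbeat(0.01, record_ping, stop)
enough.wait(timeout=5)
stop.set()
worker.join(timeout=5)
```

In a real group only the rank-0 member would run this loop, and a member that becomes rank 0 after a view change would have to start it.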

Periodic Actions
- Fancier solutions are also possible
  - There is a literature on real-time actions in fault-tolerant systems
  - If you are facing a “mission critical” need you might consider using such a solution
    - For example, every member could run its own timer, and every member could send a “ping”
    - On receiving “ping at time T” from a majority of members of the current view, take the action
- Such a solution will be far more robust, but more costly
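The majority rule can be sketched as a small bookkeeping class in Python. `MajorityTrigger` is a hypothetical name, time T is modeled as an integer tick, and the sketch assumes a fixed view; a real implementation would also have to reset its state when the view changes.

```python
from collections import defaultdict
from typing import Dict, List, Set


class MajorityTrigger:
    """Fires once per time T after pings for T arrive from a majority of the view."""

    def __init__(self, view_members: List[str]) -> None:
        self._members = set(view_members)
        self._seen: Dict[int, Set[str]] = defaultdict(set)
        self._fired: Set[int] = set()

    def on_ping(self, sender: str, t: int) -> bool:
        """Record a ping; return True exactly when the action should run."""
        if sender not in self._members or t in self._fired:
            return False
        self._seen[t].add(sender)  # a set, so duplicate pings count once
        if len(self._seen[t]) > len(self._members) // 2:
            self._fired.add(t)
            del self._seen[t]
            return True
        return False


trigger = MajorityTrigger(["A", "B", "C", "D", "E"])
results = [trigger.on_ping(m, 100) for m in ["A", "A", "B", "C", "D"]]
```

With five members the action fires on the third distinct sender (C) and stays quiet for later pings carrying the same T, so a minority of slow or failed members cannot block or duplicate the action.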

Read and Write Locks

Locking
- Isis 2 supports group-wide locks
  - If you want local locking, don’t use this tool
- Basic API:
  - g.WriteLock(“name” [, timeout]). g.Lock() for short.
  - g.ReadLock(“name” [, timeout])
  - g.Unlock(“name”)
- You can also control the persistency of the lock state of the group and the way that failures are handled

Rules…
- Lock requests are handled one by one in order
- Locks on different “names” don’t conflict
- Locks on the same name:
  - Write locks exclude all other locks
  - Read locks allow further read locks until a write lock is waiting; after that, new read locks wait behind the write lock
- Timeout: Causes a “cancel” request to be sent
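The ordering rules for a single lock name can be illustrated with a small single-process simulation in Python. `NamedLock` is a hypothetical class and not the Isis 2 lock manager; it assumes requests arrive in one agreed order and it leaves out timeouts and cancel requests.

```python
from collections import deque
from typing import Deque, List, Set, Tuple


class NamedLock:
    """Single-name read/write lock following the slide's rules (a simulation)."""

    def __init__(self) -> None:
        self.readers: Set[str] = set()
        self.writer: str = ""
        self.waiting: Deque[Tuple[str, str]] = deque()  # (member, "read"|"write")

    def _can_grant(self, mode: str) -> bool:
        if mode == "write":
            return not self.writer and not self.readers
        return not self.writer

    def _grant(self, member: str, mode: str) -> None:
        if mode == "write":
            self.writer = member
        else:
            self.readers.add(member)

    def request(self, member: str, mode: str) -> bool:
        """Return True if granted now, False if queued."""
        # Anything already queued goes first, so a reader arriving after a
        # waiting writer lines up behind it.
        if not self.waiting and self._can_grant(mode):
            self._grant(member, mode)
            return True
        self.waiting.append((member, mode))
        return False

    def unlock(self, member: str) -> List[str]:
        """Release member's lock; return the members granted as a result."""
        if self.writer == member:
            self.writer = ""
        self.readers.discard(member)
        granted = []
        while self.waiting and self._can_grant(self.waiting[0][1]):
            who, mode = self.waiting.popleft()
            self._grant(who, mode)
            granted.append(who)
        return granted
```

For example, if A and B hold read locks, C then asks for a write lock and D for a read lock, D queues behind C even though the lock is currently held only by readers. This is what keeps a steady stream of readers from starving the writer.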

Handling of failures
- If the lock holder fails, the default action is to “break” the lock (release it). Read locks always act this way.
- For write locks, you can instead specify that a broken lock be passed to the rank-0 group member
  - A lock-transfer upcall event notifies you when this occurs
  - You would need to code whatever handling you desire. It will run in the rank-0 member as necessary.

Lock-State Persistence
- The lock manager state is stored in a data structure that can live in memory or be retained on disk
  - The default configuration keeps the structure in memory
  - If you override this and request persistent locking, we use SafeSend instead of OrderedSend, and the locking state will be saved on disk.
- In the default case, if the group terminates the lock state is discarded. In the persistent case lock state is retained even across group shutdowns

When is lock persistence important?
- If your database will be used across periods when the whole group shuts down, we would say that the database itself is
  - External (not in-memory)
  - Persistent
- In such cases you’ll use SafeSend to update the database, and you may want to make the lock service state persistent too.
[Figure: two databases, DB 1 and DB 2]

Barrier Synchronization

Barrier Synchronization
- This is easily achieved using g.Query/OrderedQuery
- Query can initiate the computation, or could simply specify “which” computation you have in mind, if you have many running in parallel.
- Group members use g.Reply() when they reach the barrier point. Sender waits for all to reply

Barrier Synchronization: Failures
- We recommend that in the Reply you send
  - The size of the group view when computation started
  - Which rank this particular member had in the group
  - … just record these values when the Query arrives
- Then do g.Reply(myRank, N, other data…)
- Caller can thus verify that it received all N replies. If not, it knows that some member crashed
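The caller-side check can be sketched in Python. `missing_ranks` is a hypothetical helper, and each reply is modeled as a `(my_rank, n)` pair, following the order of the first two values in the recommended Reply.

```python
from typing import Iterable, List, Tuple


def missing_ranks(replies: Iterable[Tuple[int, int]]) -> List[int]:
    """Given (my_rank, n) pairs from the replies, list ranks that never replied."""
    replies = list(replies)
    if not replies:
        raise ValueError("no replies received")
    sizes = {n for _, n in replies}
    if len(sizes) != 1:
        raise ValueError("replies disagree on the starting view size")
    n = sizes.pop()
    seen = {rank for rank, _ in replies}
    return [rank for rank in range(n) if rank not in seen]
```

An empty result means every worker reached the barrier. A non-empty result names the ranks whose part of the task may need to be reassigned, which is more useful to the caller than a bare count of replies.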

Summary of Options

| Goal | Purpose | Technique to Use |
|---|---|---|
| Primary/Backup | “There can only be one” (primary, that is) | Group of size 2. Primary is the member with rank=0 |
| Coordinator/Cohort | “Many hands, light work” | Group of any size. Rotate the roles… |
| Periodic Actions | Heart beat | Rank 0 member sends a timing signal |
| Locking: Mutex | “One at a time” | Isis 2 locking tool (basic) |
| Locking: Read/Write Locks | Databases, 2PL. Isis 2 has locks but doesn’t implement transactions. | Isis 2 locking tool (fancy). Caution: make the lock state as persistent as the database! |
| Barrier Synchronization | “Is everyone finished?” | An Isis 2 Query |