Highly Available, Fault-Tolerant Co-scheduling System with a Working Implementation

Acknowledgements
This work is supported through the NSF "Enlightened" Project. Much of the design of this work, and in particular the idea to use the Paxos consensus algorithm in the first place, is entirely due to Mark Mc Keown from Manchester.

Co-scheduling
–The most obvious definition is scheduling a number of resources for the same time.
–However, I think of it as meaning the scheduling of multiple resources, where the scheduling is done in an atomic fashion: all resources are booked, or none.
–The resources might be required for different times...

Examples
–Schedule 8 procs on Helix, 32 on Peyote, plus an optical network, for 12pm to 1pm.
–Schedule workflow tasks onto machines X, Y and Z to start at 1pm (1 hour), 1.30pm (3 hours) and 5pm (1 hour).

To Have Phased Commit...
Can have phased commit:
1. Resources are asked for a reservation (prepare).
2. Resources can say yes/no (prepared/aborted), where yes is a commitment to do the work.
3. Then:
  a) If all resources return prepared, they will be told to finalise the reservations (commit).
  b) Otherwise, all resources are told to forget the reservations (abort).
Message flow on success: Prepare → Prepared → Commit. Message flow on failure: Prepare → Prepared/Aborted → Abort → Aborted.

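The two-phase flow above can be sketched as a simple coordinator loop. This is a minimal illustration only; the `ResourceManager` class and its method names are assumptions for the sketch, not the system's actual API:

```python
class ResourceManager:
    """Toy RM that votes on a reservation request."""
    def __init__(self, name, will_accept=True):
        self.name = name
        self.will_accept = will_accept
        self.state = "idle"

    def prepare(self, request):
        # Answering "prepared" is a commitment to do the work if committed.
        self.state = "prepared" if self.will_accept else "aborted"
        return self.state == "prepared"

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(rms, request):
    """Phase 1: ask every RM to prepare. Phase 2: commit iff all said yes."""
    prepared = [rm for rm in rms if rm.prepare(request)]
    if len(prepared) == len(rms):
        for rm in rms:
            rm.commit()
        return True
    # Only RMs that prepared hold a tentative reservation that must be undone.
    for rm in prepared:
        rm.abort()
    return False
```

Note that the all-or-nothing property lives entirely in the coordinator here, which is exactly why its failure is a problem (see the 2-phase commit slide below).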

...Or To Not Have Phased Commit
Can do without this, using WS-Agreement:
1. You make individual reservations with resources.
2. If a resource says no, cancel the others.

Is Phased Commit Needed?
Long debates on the GRAAP mailing list about this between me and Karl (we agreed to differ!). Without phased commit, you make a lot of assumptions about the RMs:
–May end up with some resources reserved and some not, e.g. if there is some failure in the network.
–You end up making and breaking reservations; the end resources don't know it's co-scheduling.
–You may hit a "reservation quota".
–You may get charged for making a reservation.
–You may get a decreasing rate of response (unreliable client who keeps cancelling jobs!).
–RMs may even think this is some denial-of-service attack.

Problems with 2-phase Commit
–The transaction manager is a single point of failure.
–If it fails or goes away, the user/RMs may not be able to discover the outcome.
–Not good in a distributed environment without reliable message delivery.

Paxos Consensus
–Leslie Lamport's (in)famous algorithm.
–Hard to learn, but ultimately simple; maybe even obvious, inevitable.
–Best formulation: "Paxos Made Simple".
–Applied to transaction commit by Lamport and Jim Gray in "Consensus on Transaction Commit".
–One instance of the consensus algorithm is used for each Prepared/Aborted decision.

Paxos Overview
Too hard to explain here in detail. But, essentially, the TM functionality is replicated in multiple acceptors:
–The algorithm makes progress provided a majority of acceptors are working.
–Messages can be lost, repeated, or arrive in an arbitrary order (but can't be tampered with).
–If you deploy 5 acceptors, you can get an MTTF of about 12 years (assuming an MTTF of 48 hours and an MTTR of 1 hour per acceptor).
–This goes up to 600 years if you use 7!
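The "majority of acceptors" requirement can be made concrete with a back-of-envelope availability estimate. The sketch below computes, under an independence assumption, how often a strict majority of acceptors is simultaneously down; it is a rough illustration of why reliability grows so fast with acceptor count, not a reproduction of the exact 12-year/600-year figures on the slide, which come from a more detailed failure/repair model:

```python
from math import comb

def acceptor_availability(mttf_h: float, mttr_h: float) -> float:
    """Fraction of time a single acceptor is up (steady state)."""
    return mttf_h / (mttf_h + mttr_h)

def majority_up_probability(n: int, a: float) -> float:
    """Probability that a strict majority of n independent acceptors is up."""
    need = n // 2 + 1
    return sum(comb(n, k) * a**k * (1 - a)**(n - k)
               for k in range(need, n + 1))

# Slide's assumptions: per-acceptor MTTF 48 h, MTTR 1 h.
a = acceptor_availability(48.0, 1.0)
for n in (1, 3, 5, 7):
    down = 1 - majority_up_probability(n, a)
    print(f"{n} acceptors: no majority available {down:.2e} of the time")
```

With these numbers the unavailable fraction drops by roughly two orders of magnitude for each pair of acceptors added, which is the intuition behind the slide's 5-vs-7 comparison.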

What Are We Doing?
Co-schedule a set of actions. Actions are:
–Make: create a reservation.
–Modify: change the resources in a reservation.
–Move: change the schedule of a reservation.
–Cancel: remove a reservation.
Can co-schedule any set of these... Not all RMs have to support all of these; an RM can just abort with an appropriate error.
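A co-scheduled request is thus a heterogeneous set of (resource, action) pairs, and an RM that does not implement an action simply votes to abort the whole set. A minimal sketch, with assumed class and method names:

```python
from enum import Enum

class Action(Enum):
    MAKE = "Make"
    MODIFY = "Modify"
    MOVE = "Move"
    CANCEL = "Cancel"

class RM:
    """Toy RM that only supports a subset of the four action types."""
    def __init__(self, name, supported):
        self.name = name
        self.supported = set(supported)

    def vote(self, action):
        # An unsupported action is an abort vote with an appropriate error,
        # which aborts the entire co-scheduled set.
        return action in self.supported

def can_coschedule(assignments):
    """assignments: list of (rm, action) pairs; all RMs must vote yes."""
    return all(rm.vote(action) for rm, action in assignments)
```

This mirrors the example below, where a Move sent to a machine that doesn't support Move aborts the whole request and leaves everything unchanged.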

Example
Co-schedule three Makes on X, Y, Z:
–X starts at 11.00, runs for one hour.
–Y starts at 12.15, runs for one hour.
–Z starts at 13.00, runs for 90 mins.

Example
Need 30 mins longer for Y: co-schedule a Modify on Y with a Move on Z.

Example
But this fails, saying that Z doesn't support Move, so we're back where we started (nothing changed).

Example
Try again... co-schedule a Modify on Y with a Cancel on Z and a Make on W.

Describing the Reservation
It's all XML: you send a set of actions. A Make element contains:
–Resource: where
–Schedule: when
–Work: what
The acceptors look at the "where" part, so they know which RMs to talk to. They don't look at Work or Schedule, which could even be encrypted...

Other Messages
–The response to a successful Make is an Ident element.
–For Cancel, you send a Resource element and the Ident element.
–For Move, you send a Resource, the Ident, and a new Schedule.
–For Modify, you send a Resource, the Ident, and a new Work description.
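The message layouts above can be sketched with Python's standard ElementTree. Element names follow the slides' descriptions, but the exact schema, namespaces and attributes are assumptions, as are the example values:

```python
import xml.etree.ElementTree as ET

def make_request(resource: str, schedule: str, work: str) -> ET.Element:
    """Build a Make action: Resource (where), Schedule (when), Work (what)."""
    make = ET.Element("Make")
    ET.SubElement(make, "Resource").text = resource
    ET.SubElement(make, "Schedule").text = schedule
    ET.SubElement(make, "Work").text = work
    return make

def cancel_request(resource: str, ident: str) -> ET.Element:
    """Build a Cancel action: the Resource plus the Ident returned by Make."""
    cancel = ET.Element("Cancel")
    ET.SubElement(cancel, "Resource").text = resource
    ET.SubElement(cancel, "Ident").text = ident
    return cancel

# The acceptors only inspect Resource; Schedule and Work are opaque to them
# (and could even be encrypted). Example values are invented for illustration.
msg = make_request("helix.example.org", "12:00/1h", "8 procs")
print(ET.tostring(msg, encoding="unicode"))
```

Sending these as plain XML over HTTP (no SOAP) matches the transport described on the next slide.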

Does It Work?
Yes! We have a working implementation:
–XML over HTTP (no SOAP).
–Two RMs: the PBSPro scheduler and a Calient DiamondWave network switch.
–Co-scheduled 10 compute jobs and 2 switches at iGrid.
It's available for download! (but...)

What's Missing?
A couple of things are not in the first release:
–Security (but the model is thought out).
–Writing state to stable storage.

Resources
Everything will happen here! The page includes:
–Software for download.
–Mailing list details.
The page will include:
–Documentation!
–Links to those cool papers.