Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Slides:



Advertisements
Similar presentations
Optimistic Methods for Concurrency Control By : H.T. Kung & John T. Robinson Presenters: Munawer Saeed.
Advertisements

COMP 655: Distributed/Operating Systems Summer 2011 Dr. Chunbo Chu Week 7: Consistency 4/13/20151Distributed Systems - COMP 655.
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services Authored by: Seth Gilbert and Nancy Lynch Presented by:
Consistency and Replication Chapter 7 Part II Replica Management & Consistency Protocols.
Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 PRACTICAL DISTRIBUTED COMMIT IN MODERN ENVIRONMENTS by Jyrki Nummenmaa.
Lecture 11 Recoverability. 2 Serializability identifies schedules that maintain database consistency, assuming no transaction fails. Could also examine.
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
1 CSIS 7102 Spring 2004 Lecture 8: Recovery (overview) Dr. King-Ip Lin.
COS 461 Fall 1997 Transaction Processing u normal systems lose their state when they crash u many applications need better behavior u today’s topic: how.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Chapter 13 (Web): Distributed Databases
CMPT Dr. Alexandra Fedorova Lecture X: Transactions.
Concurrency Control and Recovery In real life: users access the database concurrently, and systems crash. Concurrent access to the database also improves.
Transaction Management and Concurrency Control
Transaction Management and Concurrency Control
1 ICS 214B: Transaction Processing and Distributed Data Management Replication Techniques.
Session - 14 CONCURRENCY CONTROL CONCURRENCY TECHNIQUES Matakuliah: M0184 / Pengolahan Data Distribusi Tahun: 2005 Versi:
Overview Distributed vs. decentralized Why distributed databases
Transaction Management
EEC-681/781 Distributed Computing Systems Lecture 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
1 Transaction Management Database recovery Concurrency control.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.
SM3121 Software Technology Mark Green School of Creative Media.
Transaction. A transaction is an event which occurs on the database. Generally a transaction reads a value from the database or writes a value to the.
Distributed Databases
Transactions and Recovery
Distributed Deadlocks and Transaction Recovery.
Lecture 18 Page 1 CS 111 Online Design Principles for Secure Systems Economy Complete mediation Open design Separation of privileges Least privilege Least.
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Lecture 18 Page 1 CS 111 Online Access Control Security could be easy – If we didn’t want anyone to get access to anything The trick is giving access to.
Replication and Consistency. Reference The Dangers of Replication and a Solution, Jim Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. In Proceedings.
DISTRIBUTED DATABASE SYSTEM.  A distributed database system consists of loosely coupled sites that share no physical component  Database systems that.
Transactions Sylvia Huang CS 157B. Transaction A transaction is a unit of program execution that accesses and possibly updates various data items. A transaction.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Distributed Database Systems Overview
Replicated Databases. Reading Textbook: Ch.13 Textbook: Ch.13 FarkasCSCE Spring
Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.
Distributed Operating Systems Part III Andy Wang COP 5611 Advanced Operating Systems.
1 ACTIVE FAULT TOLERANT SYSTEM for OPEN DISTRIBUTED COMPUTING (Autonomic and Trusted Computing 2006) Giray Kömürcü.
Introduction to Database Systems1. 2 Basic Definitions Mini-world Some part of the real world about which data is stored in a database. Data Known facts.
Databases Illuminated
Data Sharing. Data Sharing in a Sysplex Connecting a large number of systems together brings with it special considerations, such as how the large number.
Distributed OSes Continued Andy Wang COP 5611 Advanced Operating Systems.
Transactions and Concurrency Control. Concurrent Accesses to an Object Multiple threads Atomic operations Thread communication Fairness.
Caching Consistency and Concurrency Control Contact: Dingshan He
Chapter 15: Transactions Loc Hoang CS 157B. Definition n A transaction is a discrete unit of work that must be completely processed or not processed at.
Chap 7: Consistency and Replication
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
Transactions. Transaction: Informal Definition A transaction is a piece of code that accesses a shared database such that each transaction accesses shared.
CSC 411/511: DBMS Design Dr. Nan WangCSC411_L12_JDBC_MySQL 1 Transations.
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
3 Database Systems: Design, Implementation, and Management CHAPTER 9 Transaction Management and Concurrency Control.
SHUJAZ IBRAHIM CHAYLASY GNOPHANXAY FIT, KMUTNB JANUARY 05, 2010 Distributed Database Systems | Dr.Nawaporn Wisitpongphan | KMUTNB Based on article by :
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
Lecture 5 Page 1 CS 111 Summer 2013 Bounded Buffers A higher level abstraction than shared domains or simple messages But not quite as high level as RPC.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
Chapter 19: Distributed Databases
Outline Announcements Fault Tolerance.
Database Security Transactions
Distributed OSes Continued
Distributed Database Management Systems
Database System Architectures
Transactions, Properties of Transactions
Presentation transcript:

Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems

More Introductory Materials Important Issues in distributed OSes Important distributed OS tools and mechanisms

More Important Issues in Distributed Operating Systems Autonomy Consistency and transactions

Autonomy To some degree, users need to control their own resources The more a system encourages interdependence, the less autonomy How to best trade off sharing and interdependence versus autonomy?

Problems with Too Much Interdependence Vulnerability to failures Global control Hard to pinpoint responsibility Hard security problems

Problems with Too Much Autonomy Redundancy of functions Heterogeneity  Especially in software Poor resource sharing

Methods to Improve Autonomy Without causing problems with sharing  Replicate vital services on each machine  Don’t export services that are unnecessary  Provide strong security guarantee

Consistency Maintaining consistency is a major problem in distributed systems If more than one system accesses data, can be hard to ensure consistency But if cooperating processes see inconsistent data, disasters are possible

A Sample Consistency Problem Site A Site B Site CData Item 1

A Sample Consistency Problem Site A Site B Site CData Item 1

A Sample Consistency Problem Site A Site B Site CData Item 1

A Sample Consistency Problem Site A Site B Site CData Item 1

A Sample Consistency Problem Site A Site B Site CData Item 1

A Sample Consistency Problem Site A Site B Site CData Item 1

Causes of Consistency Problems Failures and partitions Caching effects Replication of data

So why do this stuff? Note these problems arise because of what are otherwise desirable features Failures and partitions  Working in the face of failures Caching  Avoiding repetition of expensive operations Replication  Higher availability

Handling Consistency Problems Don’t share data  Generally not feasible Callbacks Invalidations Ignore the problem  Sometimes OK, but not always

Callback Methods Check that your data view is consistent whenever there might be a problem In most general case, on every access More practically, every so often Extremely expensive if remote check required High overheads if there’s usually no problem

Invalidation Methods When situations change, inform those who know about the old situation Requires extensive bookkeeping Practical in some cases when changes infrequent High overheads if there’s usually no problem

Consistency and Atomicity Atomic actions are “all or nothing”  Either the entire set of actions occur  Or none of them do At all times, including while being performed Apparently indivisible and instantaneous Relatively easy to provide in single machine systems

Atomic Actions in Single Processors Lock all associated resources (with semaphores or other synchronization mechanisms) Perform all actions without examining unlocked resources Unlock all resources Real trick is to provide atomicity even if process is switched in the middle

Why are distributed atomic actions hard? Lack of centralized control What if multiple processes on multiple machines want to perform an atomic action? How do you properly lock everything? How do you properly unlock everything? Failure conditions especially hard

Important Distributed OS Tools and Mechanisms Caching and replication Transactions and two-phase commit Hierarchical name space Optimistic methods

Caching and Replication Remotely accessing data in the pits It almost always takes longer It’s less predictable It clogs the network It annoys other nodes Other nodes annoy your It’s less secure

But what else can you do? Data must be shared  And by off-machine processes If the data isn’t local, and you need it, you must get it So, make sure data you need is local  The problem is that everyone else also wants their data local

Making Data Local Store what you need locally Make copies Migrate necessary data in Cache data Replicate data

Store It Locally Each site stores the data it needs on local media But what if two sites need to store the same data? Or if you don’t have enough room for all your data?

Local Storage Example Site A Foo Site B Bar Site C Froz

Make Copies Each site stores its own copy of the data it needs Works well for rarely updated data  Like copies of system utility programs Works poorly for frequently written data Doesn’t solve the problem of lack of local space

Copying Example Site A Foo Site B Copy of Foo Site C Copy of Foo

Migrate the Data In When you need a piece of data, find it and bring it to your site  Taking it away from the old site Works poorly for highly shared data Can cause severe storage problems Can overburden the network Essentially how shared software licenses work

Migration Example Site A Foo Site B Site C I need Foo

Migration Example Site A Site B Site C Foo

Caching When data is accessed remotely, temporarily store a copy of it locally  Perhaps using callback or invalidation for consistency  Or perhaps not Avoids problems of storage Still not quite right for frequently written data

Caching Example Site A Foo Site B Cached Foo Site C Cached Foo

Replication Maintain multiple local replicas of the data Changes made to one replica are automatically propagated to other replicas Logically connects copies of data into a single entity Doesn’t answer question of limited space

Replication Example Site A Foo 1 Site B Foo 2 Site C Foo 3

Replication Advantages Most accesses to data are purely local  So performance is good Fault tolerance  Failure of a single node doesn’t lose data  Partitioned sites can access data Load balancing  Replicas can share the work

Replication and Updates When a data item is replicated, updates to that item must be propagated to all replicas Updates come to one replica  Something must assure they get to the others

Replication Update Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Replication Update Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Update Propagation Methods Instant versus delayed Synchronous versus asynchronous Atomic versus non-atomic

Instant vs. Delayed Propagation “Instant” can’t mean instant in a distributed system  But it can mean “quickly” Instant notification not always possible  What if a site storing a replica is down? So some delayed version of update is also required

Instant Update Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Instant Update Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Instant Update Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Synchronous vs. Asynchronous Propagation Update request sooner or later gets a success signal Does it get it before all propagation completes (asynchronous) or not (synchronous)? Synchronous propagation delays completion Asynchronous propagation allows inconsistencies

Synchronous Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Synchronous Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Synchronous Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Synchronous Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo update complete

Asynchronous Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Asynchronous Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo update complete

Asynchronous Propagation Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo update complete

Atomic vs. Non-Atomic Update Propagation Atomic propagation lets no one see new data until all replicas store it Non-atomic lets users see data at some replicas before all replicas have updated it Atomic update propagation can seriously delay data availability Non-atomic propagation allows users to see potentially inconsistent data

Replication Consistency Problems Unless update propagation is atomic, consistency problems can arise One user sees a different data version than another user at the same time But even atomic propagation isn’t enough to prevent this situation

Concurrent Update What if two users simultaneously ask to update different replicas of the data? “Simultaneously” has a looser definition in distributed systems How do you prevent both from updating it? Update propagation style offers no help

Concurrent Update Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Concurrent Update Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Preventing Concurrent Updates One solution is to lock all copies before making updates That’s expensive And what if one of 20 replicas is unavailable? You must allow updates to data when partitions or failures occur

Locking Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Locking Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo request lock

Locking Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo request lock lock granted

Locking Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Locking Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Locking Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo unlock

Locking Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo unlock unlocked

Locking Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo update complete

Concurrent Update Prevention Schemes Primary site Token approaches Majority voting Weighted voting

Primary Site Methods Only one site can accept updates  Or that site must approve all updates In extraordinary circumstances, appoint new primary site + Simple - Poor reliability, availability - Non-democratic - Poor performance in many cases

Primary Site Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Primary Site Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Second Primary Site Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Second Primary Site Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Token-based Approaches Only the site holding the token can accept updates But the token can move from site to site + Relatively simple + More adaptive than central site + Exploit locality - Poor reliability, availability - Non-demonstratic - Poor performance in some cases

Token Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Token Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Second Token Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Second Token Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Second Token Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Why is this any different than primary site? Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Majority Voting To perform updates, replica must receive approval from majority of all replicas Once a replica grants approval to one update, it cannot grant it to another  Until the first update is completed

Majority Voting Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo

Majority Voting Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo request vote

Majority Voting Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo request vote yes vote

Majority Voting Example Site A Foo 1 Site B Foo 2 Site C Foo 3 update Foo request vote yes vote

Majority Voting, Con’t + Democratic + Easy to understand + More reliable, available - Some sites still can’t write - Voting is a distributed action So, it’s expensive to do it

Weighted Voting Like majority voting, but some replicas get more votes than others Must obtain majority of votes, but not necessarily from majority of sites Fits neatly into transaction models

Weighted Voting Con’t + More flexible than majority + Can provide better performance - Somewhat less democratic - Some sites still can’t write - Still potentially expensive - More complex

Basic Problems with Update Control Methods Either very poor reliability/availability or expensive distributed algorithms for update Always some reliability/availability problems Particularly bad for slow networks, expensive networks, flaky networks, mobile computers

Transactions A special case of atomic actions  Originally from databases Sequence of operations that transforms a current consistent state to a new consistent state  Without ever exposing an inconsistent state

Transaction Example Move $10 from my savings account to my checking account Basically, subtract $10 from savings account, add $10 to checking account But never “lose” my $10  And never give me an extra $10

Running Transactions Multiple transactions must not interfere So you can run them one at a time Or run them simultaneously  But avoiding all interference Serializability avoids interference

Serializability A property of a proposed schedule of transactions A serializable schedule produces the same results as some serial execution of those transactions Even though the actions may be have been performed in a different order

Consistency Example for a Distributed System Site A Site B Site C update variable X update variable Y update variable Z

What Problems Could Arise? Other processes could write the variables Other processes could read the variables Failures could interrupt the process How can we avoid these problems?