Hash History: A Method for Reconciling Mutual Inconsistency in Optimistic Replication
Brent ByungHoon Kang, Robert Wilensky and John Kubiatowicz
CS Division, UC Berkeley

Hash History: A Method for Reconciling Mutual Inconsistency in Optimistic Replication Brent ByungHoon Kang, Robert Wilensky and John Kubiatowicz CS Division, UC Berkeley

Background: Optimistic Replication
- Allows a mutable replica to be temporarily inconsistent, in a controlled way, for high availability and performance
- Examples: tentative update support in OceanStore; Bayou; USENET; peer-to-peer file systems (e.g., Ivy, Pangaea)
- Needs a mechanism for figuring out the ordering among updates and for extracting the deltas to be exchanged during reconciliation

Previous Approaches: Version Vectors
- Widely used for reconciling replicas in most weakly consistent replication systems (Bayou, Ficus, Coda, Ivy, Pangaea, etc.)
- Management complexity grows as replica sites are added or deleted: a unique id must be assigned dynamically to each newly added replica site
- Doesn't scale as the number of replica sites increases: a version vector needs one entry per replica site, so its size grows in proportion to the number of sites
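As background, the version-vector dominance test the slides rely on can be sketched in Python (illustrative, not code from the paper):

```python
def compare_vv(a, b):
    """Compare two version vectors, given as {site_id: counter} dicts.

    Returns 'equal', 'dominates' (a covers b), 'dominated' (b covers a),
    or 'conflict' (concurrent updates: neither covers the other).
    """
    sites = set(a) | set(b)
    a_ge = all(a.get(s, 0) >= b.get(s, 0) for s in sites)
    b_ge = all(b.get(s, 0) >= a.get(s, 0) for s in sites)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "dominates"
    if b_ge:
        return "dominated"
    return "conflict"
```

For instance, compare_vv({"A": 2, "B": 1, "C": 0}, {"A": 2, "B": 1, "C": 2}) reports "dominated". Note that even the comparison needs one entry per site, which is the scalability problem the slide points out.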

(Diagram) Version vector example across sites A, B, C; each vector holds a <site_id, counter> entry per site:
- V0,A starts at (A:0, B:0, C:0)
- update d1 at A yields V1,A = (1,0,0); d2 at B yields V2,B = (0,1,0); d3 at C yields V3,C = (0,0,1)
- merge m4 yields V4,A = (2,1,0); merge m5 yields V5,C = (2,1,2)

(Diagram) Adding a new site D for object O1: every existing vector at sites A, B, and C (e.g., (2,1,0), (0,1,0), (2,1,2)) must be extended with a new D:0 entry.

Our Proposal: Hash History
- Each site keeps a record of the hash of each version
- Dependencies among versions are captured as a directed graph of version hashes (i.e., the hash history)
- Sites exchange their hash histories when reconciling replicas
- If no version dominates, the most recent common ancestral version can be found: a useful hint for subsequent diffing/merging
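A minimal Python sketch of the idea (the graph encoding and function names are ours, not the paper's): a hash history maps each version hash to its parent hashes; v1 dominates v2 exactly when v2 is an ancestor of v1, and otherwise the common ancestors serve as merge hints.

```python
# Hash history from the slides' running example: child hash -> parent hashes.
example_hh = {
    "H0": [],
    "H1": ["H0"], "H2": ["H0"], "H3": ["H0"],
    "H4": ["H1", "H2"],          # m4 merged V1,A and V2,B
    "H5": ["H4", "H3"],          # m5 merged V4,A and V3,C
}

def ancestors(hh, v):
    """All hashes reachable from v via parent links, including v itself."""
    seen, stack = set(), [v]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(hh.get(x, ()))
    return seen

def reconcile(hh, v1, v2):
    """Dominance test; on conflict, return common ancestors as merge hints."""
    a1, a2 = ancestors(hh, v1), ancestors(hh, v2)
    if v2 in a1:
        return ("dominates", v1)
    if v1 in a2:
        return ("dominates", v2)
    return ("conflict", a1 & a2)
```

Here reconcile(example_hh, "H5", "H4") finds that H5 dominates, while reconcile(example_hh, "H4", "H3") reports a conflict with {"H0"} as the common ancestor to diff against.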

(Diagram sequence) Hash history construction, with H_i,site = hash(V_i,site):
- Initially, every site's hash history contains only H0,A.
- Local updates d1, d2, d3 create V1,A, V2,B, V3,C; site A's history becomes H0,A → H1,A, site B's H0,A → H2,B, and site C's H0,A → H3,C.
- Reconciliation merges V1,A and V2,B (m4) into V4,A; site A's history becomes the graph H0,A → {H1,A, H2,B} → H4,A.
- A further merge (m5) of V4,A and V3,C produces V5,C; site C's history joins H4,A and H3,C at H5,C.

(Diagram) Adding a new site D for object O1: D simply receives a copy of an existing hash history (here A's graph H0,A → {H1,A, H2,B} → H4,A); no per-site entries need to be created or updated, in contrast to version vectors.

Hash History Graph and Hashtable Representation
(a) Graph: H0,A → H1,A, H2,B, H3,C; H1,A and H2,B → H4,A; H4,A and H3,C → H5,C.
(b) Hashtable (Latest: H5,C):

  Child   Parents
  H0,A    null
  H1,A    H0,A
  H2,B    H0,A
  H3,C    H0,A
  H4,A    H1,A : H2,B
  H5,C    H4,A : H3,C

Hash History Graph and Hashtable with Deltas
(a) Graph as before, with each edge labeled by the delta (d1, d2, d3) or merge (m4, m5) that produced the child version.
(b) Hashtable (Latest: H5,C):

  Child   Parents        Delta
  H0,A    null           -
  H1,A    H0,A           d1
  H2,B    H0,A           d2
  H3,C    H0,A           d3
  H4,A    H1,A : H2,B    m4
  H5,C    H4,A : H3,C    m5
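The hashtable in (b) can be rendered directly as a Python dict (a sketch; the delta replay below follows only one parent chain for brevity, whereas a real reconciler would pick a path from the common ancestor):

```python
# Table (b): child hash -> its parents and the delta/merge that produced it.
hash_history = {
    "H0": {"parents": [],           "delta": None},
    "H1": {"parents": ["H0"],       "delta": "d1"},
    "H2": {"parents": ["H0"],       "delta": "d2"},
    "H3": {"parents": ["H0"],       "delta": "d3"},
    "H4": {"parents": ["H1", "H2"], "delta": "m4"},
    "H5": {"parents": ["H4", "H3"], "delta": "m5"},
}
latest = "H5"

def deltas_since(hh, tip, ancestor):
    """Deltas along one parent chain from ancestor up to tip."""
    path, node = [], tip
    while node != ancestor:
        entry = hh[node]
        path.append(entry["delta"])
        node = entry["parents"][0]   # first parent only, for brevity
    return list(reversed(path))
```

This is the "useful hint" role of the common ancestor: once it is known, the deltas recorded along the paths above it are candidates to exchange during reconciliation.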

HH Properties
- The size of a hash history is unbounded, but it can be controlled by simple aging or by sharing archived hash histories
- HH can capture the equality case: two different schedules of deltas that produce the same output yield the same hash
- This helps faster convergence
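Simple aging might be sketched as follows; the per-version timestamps and the cutoff rule here are our assumptions, not details from the slides. Pruned entries are exactly what can later cause the false conflicts measured in the experiments.

```python
def prune(hh, timestamps, age_limit, now):
    """Drop versions older than age_limit seconds (hh: hash -> parent list).

    A later comparison that needs a pruned ancestor may then report a
    false conflict, which is the cost of bounding the history's size.
    """
    return {v: parents for v, parents in hh.items()
            if now - timestamps[v] <= age_limit}
```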

Why Less Conflict in HH than VV
- HH can convey equality information to the descendants, while VV cannot
- E.g., two sites independently merge the same versions, producing v3 and v4: v3 and v4 could be the same contents, but VV shows a conflict!
- If v3 and v4 are recognized as equal, then all descendants of v4 will dominate v3
- If v3 and v4 are treated as in conflict, all descendants of v4 will be in conflict with v3
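The equality case can be made concrete with a toy example (illustrative, not from the paper): because HH names a version by the hash of its contents, two independently produced but byte-identical merge results get the same name, while their version vectors would necessarily differ.

```python
from hashlib import sha256

def version_hash(content):
    # HH identifies a version purely by the hash of its contents.
    return sha256(content.encode()).hexdigest()

# Two sites independently merge the same pair of versions and happen to
# produce byte-identical output (the merged text here is a stand-in):
v3 = version_hash("merged contents")
v4 = version_hash("merged contents")
hh_equal = (v3 == v4)   # True: HH records a single version, so later
                        # descendants dominate it; per-site VV counters
                        # would differ, flagging a false conflict
```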

Experiment Goals
- Comparison with version vectors: HH converges faster, with a lower conflict rate, than a version-vector scheme; to what extent is this true in practice?
- Aging policy: how does the aging period for pruning hash history trade off against HH size and against the false-conflict rate that arises when a pruned part of the hash history is required to determine version dominance?

Simulation Setup
- Event-driven simulator; events are collected from CVS logs
- Each user represents a replica site
- After each event, the simulator repeats anti-entropy for 50% (or 25%) of the total number of sites so far; e.g., with 20 sites, anti-entropy runs 10 times after each event under the 50% parameter
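The anti-entropy loop described above can be sketched as follows (sites here just accumulate sets of version hashes; the two-way exchange and the event format are simplifications we assume, not the simulator's actual code):

```python
import random

class Site:
    """A replica site that tracks the set of version hashes it knows."""
    def __init__(self, name):
        self.name = name
        self.history = {"H0"}           # every site starts from V0's hash

    def anti_entropy(self, other):
        # Two-way exchange: both sides end up with the union of histories.
        merged = self.history | other.history
        self.history = other.history = merged

def simulate(events, fraction=0.5):
    """events: (user, version_hash) pairs replayed from, e.g., CVS logs."""
    sites = {}
    for user, vhash in events:
        site = sites.setdefault(user, Site(user))  # each user is a site
        site.history.add(vhash)
        # After each event, repeat anti-entropy for fraction * #sites rounds.
        pool = list(sites.values())
        for _ in range(max(1, int(len(pool) * fraction))):
            if len(pool) < 2:
                break
            a, b = random.sample(pool, 2)
            a.anti_entropy(b)
    return sites
```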

CVS Trace Data (from sourceforge.net)

Aging Period vs. HH Size

Aging Period vs. False Conflict

Conclusion
- Simple to maintain: no complexity in site addition/deletion; no need to dynamically assign unique ids to newly added replica sites
- Scalable to thousands of sites: HH grows in proportion to the number of update instances, not the number of sites
- Faster convergence: HH can capture and propagate equality information
- HH growth can be controlled effectively with an aging policy or by sharing archived hash histories

Future Work
- Security aspects of HH: self-verifiable; can detect a malfunctioning site
More information: "Hash History Approach for Reconciling Mutual Inconsistency in Optimistic Replication," B. Kang, R. Wilensky and J. Kubiatowicz, The 23rd International Conference on Distributed Computing Systems (ICDCS 2003), Providence, Rhode Island, USA.

Site ASite BSite C ABC 000 A0A0 B0B0 C0C0 V 0,A V 1,A V 2,B V 3,C V 4,A V 5,C  site value = sign site (value of site’s local counter) ABC 100 A1A1 B0B0 C0C0 ABC 210 A2A2 B1B1 C0C0 site counte r signat ure ABC 212 A2A2 A1A1 A2A2 ABC 001 A0A0 B0B0 C1C1 ABC 010 A0A0 B1B1 C0C0

(Appendix diagram) Self-verifiable hash history: each record is a <parents, child, authenticator> triple, with H_i = hash(V_i,site) and Σ_k = hash(Σ of k's parents || H_k), so each authenticator chains over its parents' authenticators.

(Appendix diagram) The running example again: V0,A, with updates d1, d2, d3 producing V1,A, V2,B, V3,C at sites A, B, C, and merges m4, m5 producing V4,A and V5,C.

(Appendix diagram) The same hash histories represented as <parents, child> records without authenticators, with H_i = hash(V_i,site); e.g., site A after m4 holds {<nil, H0>, <H0, H1>, <H0, H2>, <H1:H2, H4>}.