Replication and Consistency (2)

Reference
- Replication in the Harp File System, Barbara Liskov, Sanjay Ghemawat, Robert Gruber, Paul Johnson, Liuba Shrira, Michael Williams, MIT

Highly Available, Reliable, Persistent (HARP) FS
- Replicated Unix file system accessible via the Virtual File System (VFS) interface
- VFS is a software abstraction in UNIX-like operating systems. It provides a unified interface to a number of different file systems.
  [Diagram: the VFS layer sits above the low-level file systems NTFS, UFS, EXT2FS, and HARP]
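To make the layering concrete, here is a minimal sketch of the idea behind VFS: the kernel programs against one interface, and each file system supplies its own implementation. The class and method names are illustrative, not the actual Unix vnode/VFS API.

```python
# Minimal sketch of the VFS idea: one interface, many implementations.
# Names are illustrative, not the real vnode/VFS API.
from abc import ABC, abstractmethod

class FileSystem(ABC):
    """Common interface the kernel programs against."""

    @abstractmethod
    def read(self, path: str, offset: int, size: int) -> bytes: ...

    @abstractmethod
    def write(self, path: str, offset: int, data: bytes) -> int: ...

class Ext2FS(FileSystem):
    def read(self, path, offset, size):
        return b"..."          # would read blocks from a local ext2 partition
    def write(self, path, offset, data):
        return len(data)       # would write blocks to a local ext2 partition

class HarpFS(FileSystem):
    def read(self, path, offset, size):
        return b"..."          # would be forwarded to the Harp primary
    def write(self, path, offset, data):
        return len(data)       # would be replicated via the Harp protocol

# Callers never care which implementation they hold:
def copy(src: FileSystem, dst: FileSystem, path: str) -> None:
    dst.write(path, 0, src.read(path, 0, 4096))
```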

HARP
- Provides highly available, reliable storage for files
- Guarantees atomic file operations in spite of concurrency and failure
- Primary-copy replication
  - The master server is authoritative
  - Replicas act as backup servers
  - Updates are sent to "enough" replicas to guarantee fail-safe behavior
- Log-structured updates

Using logs for throughput
- Disks are partitioned into tracks, sectors, and cylinders
- Writing a file might involve writing blocks on different tracks (slow because of seeks)
- Log-structured file systems let the user write sequentially to disk. The log contains the transformations; the file system itself is updated lazily.
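As a rough illustration of why this helps, the sketch below appends every update to a single log file, so each update costs one sequential write instead of scattered block writes. The file name and JSON record format are made up for this example.

```python
# Toy append-only modification log: every update is a sequential
# append, and file blocks are patched lazily later. A sketch of the
# log-structured idea only, not Harp's actual on-disk format.
import json

class AppendLog:
    def __init__(self, path: str):
        self.f = open(path, "ab")          # sequential writes only: no seeks

    def record(self, op: str, **args) -> None:
        entry = json.dumps({"op": op, **args})
        self.f.write(entry.encode() + b"\n")
        self.f.flush()                     # one sequential I/O per update

log = AppendLog("fs.log")
log.record("create", name="/tmp/a")
log.record("write", name="/tmp/a", offset=0, data="hello")
# The actual file blocks are updated later ("lazy update") by
# replaying committed log entries in order.
```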

Outline
- Replication in the Harp File System, Barbara Liskov, Sanjay Ghemawat, Robert Gruber, Paul Johnson, Liuba Shrira, Michael Williams, MIT
  - Barbara Liskov – first woman to receive a Ph.D. in Computer Science in the US (1968, Stanford). Faculty at MIT since 1972.

HARP – Highly Available, Reliable, Persistent
- Provides highly available, reliable storage for files
- Guarantees atomic file operations in spite of concurrency and failure
- Primary-copy replication (eager master)
  - The master server is authoritative
  - Replicas act as backup servers
  - Updates are sent to "enough" replicas to guarantee fail-safe behavior
- Log-structured updates

HARP – Operating Environment
- A set of nodes connected by a network
- Fail-stop: nodes fail by crashing (they don't limp along)
- The network may lose or duplicate messages and deliver them out of order (but does not corrupt messages)
- Loosely synchronized clocks
- Each server is equipped with a small UPS

Replication Method
- Primary-copy method
- One node acts as the primary; clients send requests to this primary
- The primary contacts the backups
- Two-phase modifications
  - Phase 1: the primary informs the backups about the modification; the backups acknowledge receipt; the primary commits and returns
  - Phase 2: the primary informs the backups of the commit in the background
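The sketch below shows the shape of the two phases. The Replica class and its send/ack methods are stand-ins invented for this sketch; real Harp uses server-to-server messages.

```python
# Shape of the two-phase modification protocol described above.
class Replica:
    def __init__(self, name):
        self.name, self.log, self.commit_point = name, [], -1
    def send(self, msg, payload):
        pass                         # pretend: deliver a message over the network
    def ack(self, msg):
        return True                  # pretend: backup acknowledged

def modify(primary, backups, op):
    # Phase 1: primary logs the operation and informs the backups.
    primary.log.append(op)
    for b in backups:
        b.send("prepare", op)
    assert all(b.ack("prepare") for b in backups)   # wait for acknowledgments

    # The operation is now recorded at enough replicas: commit and reply.
    primary.commit_point = len(primary.log) - 1
    print("reply to client: done")

    # Phase 2: the commit notification goes out in the background;
    # there is no need to wait for it before replying.
    for b in backups:
        b.send("commit", primary.commit_point)

modify(Replica("p"), [Replica("b1")], {"op": "write", "data": "x"})
```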

Failover protocol
- Failover protocol – view change
- View change
  - A failed node is removed from service
  - A recovered node is put back into service
- The primary of the new view may be different from that of the old view
- Client requests are sent to the primary of the new view
- 2n+1 servers are needed to tolerate n failures

Failover – Contd.
- Store n+1 copies of the data
- The n witnesses take part in the view change to make sure only one view is selected
- Each replication group manages one or more file systems
  - 1 designated primary
  - n designated backups
  - n designated witnesses
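Putting the slide's numbers in one place, a hypothetical helper that computes the composition of a group tolerating n simultaneous failures:

```python
# Group composition for tolerating n failures, per the slide:
# 2n+1 servers total, but only n+1 copies of the file system.
def harp_group(n: int) -> dict:
    return {
        "servers": 2 * n + 1,   # enough voters to agree on a unique new view
        "copies": n + 1,        # primary + n backups hold the file system
        "primary": 1,
        "backups": n,
        "witnesses": n,         # vote in view changes, store no file data
    }

print(harp_group(1))
# {'servers': 3, 'copies': 2, 'primary': 1, 'backups': 1, 'witnesses': 1}
```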

Normal case processing
- The primary maintains a log in volatile memory in which it records modification operations
- Some records are in phase 1, while others are for operations that have already committed
- The primary maintains a commit point (CP): the index of the latest committed operation
- An apply process applies committed operations to the file system asynchronously; the apply point (AP) marks its progress
- The lower bound (LB) pointer records which operations have been written to disk
- The global lower bound (GLB) is used to clean up the logs

Volatile state of the modification logs
[Diagram: the log on the primary and on a backup, over time. Each log runs from the newest phase-1 records back through the commit point (CP), the apply point (AP, applied to the file system), and the lower bound (LB, written to disk); the global lower bound (GLB) trails the LBs of all servers.]
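A toy version of the log state just described, with the three pointers as integer indexes; the invariant maintained is LB ≤ AP ≤ CP. The method names and truncation scheme are assumptions of this sketch, not the paper's data structures.

```python
# Toy modification log with the three pointers from the slides.
class ModLog:
    def __init__(self):
        self.records = []   # the index in this list identifies the operation
        self.cp = -1        # commit point: latest committed operation
        self.ap = -1        # apply point: latest operation applied to the fs
        self.lb = -1        # lower bound: latest operation safely on disk

    def append(self, op) -> int:
        """Record a phase-1 (not yet committed) operation."""
        self.records.append(op)
        return len(self.records) - 1

    def commit(self, index: int) -> None:
        self.cp = max(self.cp, index)

    def apply_next(self) -> None:
        """Background apply process: only committed records are applied."""
        if self.ap < self.cp:
            self.ap += 1
            # ... apply self.records[self.ap] to the file system ...

    def truncate(self, glb: int) -> None:
        """Drop records at or below the global lower bound: every
        replica already has their effects on disk."""
        for i in range(glb + 1):
            self.records[i] = None      # keep indexes stable
```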

Normal case operations
- Operations that do not modify file system state (e.g., getting file attributes via ls) are handled entirely at the primary, and are performed only against committed writes
- Such non-modifying operations can return stale results during a network partition. Hence Harp uses loosely synchronized clocks and bars a view change until an interval δ has passed since the last message.
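One way to picture the clock trick: the primary answers a read only if it has heard from its backups recently, since a new view could only have formed after δ of silence. DELTA, the class, and its fields are invented for this sketch and are only one reading of the slide.

```python
# Why loosely synchronized clocks help reads: refuse to answer if we
# may have been partitioned away long enough for a new view to form.
import time

DELTA = 0.5  # seconds; an illustrative value, not the paper's

class Primary:
    def __init__(self):
        self.last_backup_msg = time.monotonic()
        self.state = {"/tmp/a": {"mode": 0o644}}   # committed state only

    def get_attributes(self, path):
        # A new view cannot have formed less than DELTA after our
        # last message exchange with the backups.
        if time.monotonic() - self.last_backup_msg > DELTA:
            raise RuntimeError("view may have changed; retry at new primary")
        return self.state[path]

print(Primary().get_attributes("/tmp/a"))
```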

View change
- Requirements
  - Correctness: operations must appear to be executed in commit order (e.g., create file, list file, delete file expects a certain ordering)
  - Reliability: the effects of committed operations must survive single failures and simultaneous failures
  - Availability
  - Timeliness: failover must be done in a timely manner
- Event records are used to avoid problems with applying the same operation multiple times
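A minimal sketch of the idempotence idea behind event records: each record carries a unique id, so applying the same record twice (for example, when it is replayed during recovery) has no effect. The record format is made up for this example.

```python
# Idempotent application of event records.
applied = set()

def apply_record(record, state):
    if record["id"] in applied:        # already applied; replay is a no-op
        return
    applied.add(record["id"])
    state[record["path"]] = record["data"]

state = {}
r = {"id": 7, "path": "/tmp/a", "data": "hello"}
apply_record(r, state)
apply_record(r, state)                 # harmless duplicate
print(state)                           # {'/tmp/a': 'hello'}
```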

Discussion