Presentation transcript:

Murtadha Al Hubail Project Team:

Motivation & Goals
[Figure: a typical AsterixDB cluster with a Cluster Controller and node controllers NC1, NC2, NC3]
A typical AsterixDB cluster consists of a master node (the Cluster Controller) and multiple node controllers (NCs). Data is hash-partitioned across the NCs based on primary key, so each record is stored on only one node. If an NC fails, all data stored on that node is permanently lost.
Goal: implement an efficient data replication protocol to prevent permanent data loss.
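As a concrete illustration of the partitioning described above, here is a minimal sketch (not AsterixDB's actual code; the HashPartitioner class and nodeFor method are hypothetical) of how hashing a record's primary key assigns it to exactly one NC:

```java
// Minimal illustrative sketch (not AsterixDB's actual code): mapping a record's
// primary key to exactly one node controller via hash partitioning.
import java.util.List;

public class HashPartitioner {
    private final List<String> nodeControllers; // e.g. ["NC1", "NC2", "NC3"]

    public HashPartitioner(List<String> nodeControllers) {
        this.nodeControllers = nodeControllers;
    }

    /** Returns the single NC that stores the record with this primary key. */
    public String nodeFor(Object primaryKey) {
        int hash = primaryKey.hashCode() & Integer.MAX_VALUE; // force non-negative
        return nodeControllers.get(hash % nodeControllers.size());
    }

    public static void main(String[] args) {
        HashPartitioner partitioner = new HashPartitioner(List.of("NC1", "NC2", "NC3"));
        System.out.println(partitioner.nodeFor(42));        // exactly one NC owns this key
        System.out.println(partitioner.nodeFor("user-17")); // another key may land on a different NC
    }
}
```

Because each key maps to a single NC, losing that NC loses every record it owned, which is what the replication protocol below is designed to prevent.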

Related Work
1. A Performance Study of Three High Availability Data Replication Strategies. Hui-I Hsiao and David J. DeWitt.
2. Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore. Jun Rao, Eugene J. Shekita, and Sandeep Tata.
3. Evaluation of Remote Backup Algorithms for Transaction-Processing Systems. Christos A. Polyzois and Héctor García-Molina.
4. Storage Management in AsterixDB. Sattam Alsubaiee et al.
5. AsterixDB: A Scalable, Open Source BDMS. Sattam Alsubaiee et al.

System Architecture
Each NC has a number of other NCs designated as its backup nodes, determined by the replication factor (the number of copies of the data).
[Figure: primary and backup NCs with replication factor = 3]
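The exact backup-assignment policy is not spelled out on the slide; the sketch below assumes a simple ring scheme in which each NC's backups are the next replicationFactor - 1 NCs, purely for illustration (backupsFor is a hypothetical helper, not an AsterixDB API):

```java
// Hypothetical placement sketch: pick backup NCs for a primary NC given a
// replication factor k (total number of copies). Backups here are simply the
// next k-1 NCs in a ring; the real AsterixDB policy may differ.
import java.util.ArrayList;
import java.util.List;

public class ReplicaPlacement {
    public static List<String> backupsFor(List<String> ncs, String primary, int replicationFactor) {
        int idx = ncs.indexOf(primary);
        List<String> backups = new ArrayList<>();
        for (int i = 1; i < replicationFactor; i++) {
            backups.add(ncs.get((idx + i) % ncs.size())); // wrap around the ring of NCs
        }
        return backups;
    }

    public static void main(String[] args) {
        List<String> ncs = List.of("NC1", "NC2", "NC3");
        // With replication factor = 3, NC1's data is also kept on NC2 and NC3.
        System.out.println(backupsFor(ncs, "NC1", 3)); // prints [NC2, NC3]
    }
}
```

With three NCs and a replication factor of 3, every NC ends up holding a copy of every other NC's data, matching the figure above.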

System Architecture
Data replication is implemented by sending each transaction's REDO logs to the backup NCs over the network via a Replication Channel, and by synchronizing the primary and backup nodes using the Two-Phase Commit protocol.
[Figure: transaction logs flowing from NC1 to its backups on NC2 and NC3]
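The following compilable sketch shows the described flow under assumed interfaces (ReplicationChannel, PrimaryNode, and the method names are illustrative, not AsterixDB's real classes): the primary NC streams each transaction's REDO log records to its backup NCs, then runs two-phase commit so the transaction commits only after every backup has durably received the logs.

```java
// Simplified protocol sketch under assumed interfaces (ReplicationChannel and
// PrimaryNode are illustrative names, not AsterixDB classes): the primary NC
// ships REDO log records to each backup over a replication channel, then uses
// two-phase commit so a transaction commits only once every backup has the logs.
import java.util.List;

interface ReplicationChannel {              // one channel per backup NC (assumed abstraction)
    void sendRedoLog(long txnId, byte[] redoRecord);
    boolean prepare(long txnId);            // phase 1: vote yes once the REDO logs are durable
    void commit(long txnId);                // phase 2: make the replicated logs permanent
    void abort(long txnId);                 // phase 2 (failure path): discard the transaction
}

class PrimaryNode {
    private final List<ReplicationChannel> backups;

    PrimaryNode(List<ReplicationChannel> backups) {
        this.backups = backups;
    }

    /** Replicates a transaction's REDO logs to all backups, then commits via 2PC. */
    boolean replicateAndCommit(long txnId, List<byte[]> redoRecords) {
        // Ship every REDO log record to every backup NC over its replication channel.
        for (byte[] record : redoRecords) {
            for (ReplicationChannel backup : backups) {
                backup.sendRedoLog(txnId, record);
            }
        }
        // Phase 1: all backups must acknowledge that the logs are durably received.
        boolean allPrepared = backups.stream().allMatch(b -> b.prepare(txnId));

        // Phase 2: commit everywhere, or abort everywhere.
        if (allPrepared) {
            backups.forEach(b -> b.commit(txnId));
        } else {
            backups.forEach(b -> b.abort(txnId));
        }
        return allPrepared;
    }
}
```

The two-phase exchange is what lets a committed transaction's REDO logs survive the loss of the primary: the backups already hold them and can replay them during recovery.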

Testing & Evaluation
1. Deploy AsterixDB on a cluster of two or more nodes and enable data replication.
2. Evaluate the performance impact of data replication compared to running without replication.
3. Disconnect a node from the cluster and erase all of its data to simulate a hardware failure.
4. Reconnect the failed node.
5. Verify that all of its data has been restored.