1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,

Slides:



Advertisements
Similar presentations
Remus: High Availability via Asynchronous Virtual Machine Replication
Advertisements

Live migration of Virtual Machines Nour Stefan, SCPD.
Database Architectures and the Web
High Performance Cluster Computing Architectures and Systems Hai Jin Internet and Cluster Computing Center.
Live Migration of Virtual Machines Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield.
Replication and Consistency (2). Reference r Replication in the Harp File System, Barbara Liskov, Sanjay Ghemawat, Robert Gruber, Paul Johnson, Liuba.
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Study of Hurricane and Tornado Operating Systems By Shubhanan Bakre.
Transaction.
Memory-efficient Virtual Machine High Availability Karen Kai-Yuan Hou Prof. Kang G. Shin University of Michigan Mustafa Uysal (VMware) Arif Merchant (HP.
Acknowledgments University of Waterloo University of British Columbia
June 23rd, 2009Inflectra Proprietary InformationPage: 1 SpiraTest/Plan/Team Deployment Considerations How to deploy for high-availability and strategies.
Computer Science Lecture 18, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
Database Systems on Virtual Machines: How Much Do We Lose? Kristin Travis March 2, 2011.
1 © Copyright 2010 EMC Corporation. All rights reserved. EMC RecoverPoint/Cluster Enabler for Microsoft Failover Cluster.
Towards High-Availability for IP Telephony using Virtual Machines Devdutt Patnaik, Ashish Bijlani and Vishal K Singh.
Database Replication techniques: a Three Parameter Classification Authors : Database Replication techniques: a Three Parameter Classification Authors :
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Remus: High Availability via Asynchronous Virtual Machine Replication.
Module – 12 Remote Replication
1© Copyright 2011 EMC Corporation. All rights reserved. EMC RECOVERPOINT/ CLUSTER ENABLER FOR MICROSOFT FAILOVER CLUSTER.
Implementing Failover Clustering with Hyper-V
National Manager Database Services
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
Disaster Recovery as a Cloud Service Chao Liu SUNY Buffalo Computer Science.
Highly Available ACID Memory Vijayshankar Raman. Introduction §Why ACID memory? l non-database apps: want updates to critical data to be atomic and persistent.
Computer Measurement Group, India Reliable and Scalable Data Streaming in Multi-Hop Architecture Sudhir Sangra, BMC Software Lalit.
Evaluation of Delta Compression Techniques for Efficient Live Migration of Large Virtual Machines Petter Svärd, Benoit Hudzia, Johan Tordsson and Erik.
1 The Google File System Reporter: You-Wei Zhang.
Distributed Systems Tutorial 11 – Yahoo! PNUTS written by Alex Libov Based on OSCON 2011 presentation winter semester,
Remus: VM Replication Jeff Chase Duke University.
CH2 System models.
Virtualization for Adaptability Project Presentation CS848 Fall 2006 Umar Farooq Minhas 29 Nov 2006 David R. Cheriton School of Computer Science University.
1 Moshe Shadmon ScaleDB Scaling MySQL in the Cloud.
1 Wenguang WangRichard B. Bunt Department of Computer Science University of Saskatchewan November 14, 2000 Simulating DB2 Buffer Pool Management.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Module 13 Implementing Business Continuity. Module Overview Protecting and Recovering Content Working with Backup and Restore for Disaster Recovery Implementing.
Practical Byzantine Fault Tolerance
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
1 ACTIVE FAULT TOLERANT SYSTEM for OPEN DISTRIBUTED COMPUTING (Autonomic and Trusted Computing 2006) Giray Kömürcü.
SPECULATIVE EXECUTION IN A DISTRIBUTED FILE SYSTEM E. B. Nightingale P. M. Chen J. Flint University of Michigan.
Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.
Disco : Running commodity operating system on scalable multiprocessor Edouard et al. Presented by Vidhya Sivasankaran.
Distributed Databases DBMS Textbook, Chapter 22, Part II.
Kjell Orsborn UU - DIS - UDBL DATABASE SYSTEMS - 10p Course No. 2AD235 Spring 2002 A second course on development of database systems Kjell.
Eduardo Gutarra Velez. Outline Distributed Filesystems Motivation Google Filesystem Architecture The Metadata Consistency Model File Mutation.
GFS. Google r Servers are a mix of commodity machines and machines specifically designed for Google m Not necessarily the fastest m Purchases are based.
High Availability in DB2 Nishant Sinha
COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Li Zhijian Fujitsu Limited.
Storage Systems CSE 598d, Spring 2007 Rethink the Sync April 3, 2007 Mark Johnson.
Chap 7: Consistency and Replication
Efficient Live Checkpointing Mechanisms for computation and memory-intensive VMs in a data center Kasidit Chanchio Vasabilab Dept of Computer Science,
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Recovery.
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
Data Disaster Recovery Planning Greg Fibiger 1/7/2016.
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Presented by: Pierre LaBorde, Jordan Deveroux, Imran Ali, Yazen Ghannam, Tzu-Wei.
Complete VM Mobility Across the Datacenter Server Virtualization Hyper-V 2012 Live Migrate VM and Storage to Clusters Live Migrate VM and Storage Between.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 7 – Buffer Management.
Virtual Machine Movement and Hyper-V Replica
Log Shipping, Mirroring, Replication and Clustering Which should I use? That depends on a few questions we must ask the user. We will go over these questions.
MongoDB Distributed Write and Read
A Technical Overview of Microsoft® SQL Server™ 2005 High Availability Beta 2 Matthew Stephen IT Pro Evangelist (SQL Server)
EECS 498 Introduction to Distributed Systems Fall 2017
CS 258 Reading Assignment 4 Discussion Exploiting Two-Case Delivery for Fast Protected Messages Bill Kramer February 13, 2002 #
SpiraTest/Plan/Team Deployment Considerations
7.1. CONSISTENCY AND REPLICATION INTRODUCTION
Distributed Availability Groups
The SMART Way to Migrate Replicated Stateful Services
Distributed Databases
Presentation transcript:

1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1, Shriram Rajagopalan 2, Brendan Cully 2, Ashraf Aboulnaga 1, Kenneth Salem 1, Andrew Warfield 2

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 2 The Need for High Availability A database system is highly available (HA) if it remains accessible to its users in the face of hardware failures Users expect 24x7 availability even for simple database applications –HA requirement is no longer limited to mission critical applications Key challenges in providing HA –maintaining database consistency in the face of failures –minimizing the impact of HA on performance Existing HA solutions are complex and expensive Goal: Provide simple and cheap HA for database systems

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 3 DBMS HA: Active/Standby Replication A copy of the database is stored at two servers, a primary and a backup Primary server accepts user requests and performs database updates Changes to database propagated to backup server by propagating the transaction log Backup server takes over as primary upon failure DB DBMS Primary Server DB DBMS Backup Server Database Changes Primary Server

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 4 High Availability As a Service Active/standby replication is complex to implement in the DBMS, and complex to administer –propagating the transaction log –atomic handover from primary to backup on failure –redirecting client requests to backup after failure –minimizing effect on performance Our approach: provide HA as a service from the underlying virtualization infrastructure –implement active/standby replication at the virtual machine layer –push the complexity out of the DBMS –any DBMS can be made HA with little or no modification –low performance overhead

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 5 Changes to VM State RemusDB: Transparent HA for DBMS RemusDB is a reliable, cost-effective, active/standby HA solution implemented at the virtualization layer –propagates all changes in VM state from primary to backup –HA with no code changes to the DBMS –completely transparent failover from primary to backup –failover to a warmed up backup server Backup Server DB DBMS Primary Server VM DB DBMS VM Primary Server

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 6 Outline Introduction VM Based HA (Remus) RemusDB Experimental Evaluation Conclusion

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 7 HA Through Virtual Machine Checkpointing RemusDB is based on Remus, which is part of the Xen hypervisor –maintains replica of a running VM on a separate physical machine –extends live migration to do efficient VM replication –provides transparent failover with only seconds of downtime Remus uses an epoch based checkpointing system –divides time into epochs (~50ms) –performs a checkpoint at the end of each epoch 1.the primary VM is suspended 2.all state changes are copied to a buffer 3.the primary VM is resumed 4.an asynchronous message is sent to the backup containing all state changes

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 8 Remus Checkpoints After a failure, backup resumes execution from the latest checkpoint –any work done by the primary during epoch C will be lost (unsafe) Remus provides a consistent view of execution to clients –any network packets sent during an epoch are buffered until the next checkpoint –guarantees that a client will see results only if they are based on safe execution –same principle is also applied to disk writes

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 9 VM Checkpointing with Database Workloads DBMS Client Primary Server query response (unprotected) no protection processing response time (unprotected) Remus protection network buffering response (protected) overhead of protection up to 32 % response time (protected)

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 10 RemusDB Remus, optimized for protecting DBMS Memory Optimizations –database workloads tend to modify more memory in each epoch as compared to other workloads –reduce checkpointing overhead by Network Optimization –exploit DBMS transaction semantics to avoid message buffering latency –commit protection (CP) 1.sending less data –asynchronous checkpoint compression (ASC) 2.protecting less memory –disk read tracking (RT) –memory deprotection

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 11 Asynchronous Checkpoint Compression Goal: Reduce overhead by sending less checkpoint data Key observations 1.Database workloads typically involve a large set of frequently changing pages of memory e.g., buffer pool pages results in a large amount of replication traffic 2.Memory writes often change only a small part of the pages data to be replicated contains redundancy Replication traffic can be significantly reduced by only sending the actual changes to the memory pages

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 12 Asynchronous Checkpoint Compression Protected VM Xen Dirty Pages (epoch i) LRU Cache Dirty pages from epochs [1 … i-1] to backup Compute delta and compress Domain 0

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 13 Disk Read Tracking DB BP Active VM Standby VM DBMS P Changes to VM State P DB P BP DBMS DBMS loads page from disk into buffer pool (BP) –clean to DBMS, dirty to Remus Remus synchronizes dirty BP pages in every checkpoint Synchronization of clean BP pages is unnecessary –can be read from the disk at the backup on failover P

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 14 Disk Read Tracking Goal: Reduce overhead by avoiding unnecessary page synchronizations Disk read tracking in RemusDB –tracks the set of memory pages into which disk reads are placed –does not mark these pages dirty unless they are actually modified –adds an annotation to the replication stream indicating the disk sectors to read to reconstruct these pages

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 15 Network Optimization Remus requires buffering of outgoing network packets –ensures clients can never see results of unsafe computation –adds 2 to 3 orders of magnitude in latency per round trip –single largest source of overhead for many database workloads Key idea: Exploit consistency and durability semantics provided by database transactions –allow DBMS to decide which packets to protect Commit Protection (CP) –protect only transaction control packets i.e., COMMIT and ABORT –any committed transaction is safe Reduces latency but not fully transparent

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 16 Implementing Commit Protection Added a new setsockopt() option to Linux –an interface for the DBMS to selectively protect packets DBMS changes –use setsockopt() to switch client connection to protected mode before sending COMMIT or ABORT –after failover, a recovery handler runs in the DBMS at the backup aborts all in-flight transactions where the client connection was in unprotected mode CP is not transparent to the DBMS –103 LoC for PostgreSQL, 85 LoC for MySQL

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 17 Outline Introduction VM Based HA (Remus) RemusDB Experimental Evaluation Conclusion

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 18 Experimental Setup Backup Server DB PostgreSQL / MySQL (Active VM) Primary Server Xen 4.0 Gigabit Ethernet DB PostgreSQL / MySQL (Standby VM) Xen 4.0 TPC-C / TPC-H

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 19 Behavior of RemusDB During Failover (MySQL) Primary server fails

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 20 Overhead During Normal Operation (TPC-C)

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 21 Overhead During Normal Operation (TPC-H)

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 22 Conclusion Maintaining availability in the face of hardware failures is an important goal for any DBMS Traditional HA solutions are expensive and complex by nature RemusDB is an efficient HA solution implemented at the virtualization layer –offers HA as a service –relies on whole VM checkpointing –runs on commodity hardware RemusDB can make any DBMS highly available with little or no modification while imposing very little performance overhead

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 23 Behavior of RemusDB During Failover (MySQL) Primary server fails

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 24 Effects of DB Buffer Pool Size (TPC-H)

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 25 Effects of DB Buffer Pool Size (TPC-H)

Umar Farooq Minhas RemusDB: Transparent High Availability for Database Systems 26 Effect of Database Size on RemusDB (TPC-C)