VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Trade-offs in Cloud.

Slides:



Advertisements
Similar presentations
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services Authored by: Seth Gilbert and Nancy Lynch Presented by:
Advertisements

Replication. Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
Replication and Consistency (2). Reference r Replication in the Harp File System, Barbara Liskov, Sanjay Ghemawat, Robert Gruber, Paul Johnson, Liuba.
BREWER’S CONJECTURE AND THE FEASIBILITY OF CAP WEB SERVICES (Eric Brewer) Seth Gilbert Nancy Lynch Presented by Kfir Lev-Ari.
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
CS 582 / CMPE 481 Distributed Systems
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Computer Science Lecture 14, page 1 CS677: Distributed OS Consistency and Replication Introduction Consistency models –Data-centric consistency models.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Lecture-10 Distributed Database System A distributed database system consists of loosely.
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
NoSQL Database.
Distributed Databases
IBM Haifa Research 1 The Cloud Trade Off IBM Haifa Research Storage Systems.
Massively Parallel Cloud Data Storage Systems S. Sudarshan IIT Bombay.
Plan for Intro to Cloud Databases
Databases with Scalable capabilities Presented by Mike Trischetta.
Distributed Systems Tutorial 11 – Yahoo! PNUTS written by Alex Libov Based on OSCON 2011 presentation winter semester,
Replication and Consistency. Reference The Dangers of Replication and a Solution, Jim Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. In Proceedings.
1. Big Data A broad term for data sets so large or complex that traditional data processing applications ae inadequate. 2.
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
© , OrangeScape Technologies Limited. Confidential 1 Write Once. Cloud Anywhere. Building Highly Scalable Web applications BASE gives way to ACID.
Database Management System Module 5 DeSiaMorewww.desiamore.com/ifm1.
Transactions Sylvia Huang CS 157B. Transaction A transaction is a unit of program execution that accesses and possibly updates various data items. A transaction.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Amazon’s Dynamo Lecturer.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Transaction Lectured by, Jesmin Akhter, Assistant professor, IIT, JU.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Data Versioning Lecturer.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Exam and Lecture Overview.
Replicated Databases. Reading Textbook: Ch.13 Textbook: Ch.13 FarkasCSCE Spring
CAP + Clocks Time keeps on slipping, slipping…. Logistics Last week’s slides online Sign up on Piazza now – No really, do it now Papers are loaded in.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Architecture.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Partitioning and Replication.
PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
CSE 486/586 CSE 486/586 Distributed Systems Consistency Steve Ko Computer Sciences and Engineering University at Buffalo.
CAP Theorem Justin DeBrabant CIS Advanced Systems - Fall 2013.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
CSE 486/586 Distributed Systems Consistency --- 3
1 Advanced Database Concepts Transaction Management and Concurrency Control.
“Distributed Algorithms” by Nancy A. Lynch SHARED MEMORY vs NETWORKS Presented By: Sumit Sukhramani Kent State University.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Advanced Database Design.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Introduction to Cloud.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Cassandra Architecture.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Amazon’s Dynamo Lecturer.
Intro to NoSQL Databases Tony Hannan November 2011.
CSCI5570 Large Scale Data Processing Systems
Cloud Computing and Architecuture
Trade-offs in Cloud Databases
MongoDB Distributed Write and Read
CSE 486/586 Distributed Systems Consistency --- 1
Partitioning and Replication
Cassandra Transaction Processing
NOSQL.
Lecturer : Dr. Pavle Mogin
Database Concepts.
Introduction to NewSQL
Consistency and CAP.
Chapter 19: Distributed Databases
NOSQL databases and Big Data Storage Systems
Massively Parallel Cloud Data Storage Systems
CSE 486/586 Distributed Systems Consistency --- 3
NoSQL Databases An Overview
EECS 498 Introduction to Distributed Systems Fall 2017
Outline Announcements Fault Tolerance.
CSE 486/586 Distributed Systems Consistency --- 1
PERSPECTIVES ON THE CAP THEOREM
Transaction Properties: ACID vs. BASE
Transactions and Concurrency
CSE 486/586 Distributed Systems Consistency --- 3
Presentation transcript:

VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Trade-offs in Cloud Databases Lecturer : Dr. Pavle Mogin

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 1 Plan for Trade-offs in Cloud Databases Features of relational SQL databases ACID properties NoSQL databases CAP Theorem BASE Properties –More about consistency/availability trade-offs –Readings: Have a look at Useful Links at the Course Home Page

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 2 Traditional Databases Most traditional databases are relational Relational data model has been developed to serve applications that insist on data consistency and reliability RDBMSs are OLTP DBMSs Data consistency and reliability are guaranteed by ACID properties ACID stands for Atomicity, Consistency, Isolation, and Durability

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 3 ACID Database Properties Atomicity –Transactions must act as a whole, or not at all –No changes within a transaction can persist if any change within the transaction fails Consistency –Changes made by a transaction respect all database integrity constraints and the database remains in a consistent state at the end of a transaction Isolation –Transactions cannot see changes made by other concurrent transactions, –Concurrent transactions leave the database in the same state as if the transactions were executed serially Durability –Database data are persistent, –Changes made by a transaction that completed successfully are stored permanently

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 4 Techniques to Achieve ACID Properties Relational databases: –Have a schema that provides a strict structure and integrity constraints for database data, –Schemes are normalized to avoid data redundancy and update anomalies, –Normalization is achieved by a vertical splitting of database tables, –Queries frequently ask for joins. Integrity constraints (checked by writes) provide for consistency Locking (to avoid transaction anomalies) provides for isolation (to avoid e.g. lost update) Logging – provides for durability –But changes made by each transaction have to be written on disk two times

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 5 ACID Properties and Cloud Databases Many cloud applications (like social networks) require databases that: –Are highly scalable and available while consistency is considered to be of some lesser importance, –Have high throughput and –Be network partition tolerant These three requirements are in a direct collision with ACID properties, particularly with integrity constraint checking, locking, logging, and performing joins These applications also consider strict data structuring not being suitable

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 6 NoSQL Database Systems Many cloud databases are not relational They are termed NoSQL databases (Not only SQL) Generally, NoSQL database systems: –Adopt the key-value data model for storing semi structured or unstructured data –Are either scheme less, or their schemes lack rigidity and integrity constraints of relational schemes –Use data partitioning, replication and distribution to several independent network nodes made of commodity machines for: Scalability, Availability, and Network partition tolerance –Are not normalized, and –Do not support joins and use data sharding (horizontal partitioning) to avoid need for joins and achieve a good throughput

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 7 CAP Theorem NoSQL database systems do not support ACID properties –Locking and logging would severely impair scalability, availability, and throughput Still, Consistency, Availability, and Network Partition Tolerance (CAP) represent their desirable properties But, according the CAP Theorem, also known as Brewer’s Conjecture, any networked, shared data system can have at most two of these three desirable properties

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 8 CAP Theorem (2) For a distributed database system as a service, the components of the CAP acronym stand for: –Consistency - A service is considered to be consistent if after an update operation of some writer all readers see his updates in some shared data source (all nodes containing replicas of a datum have the same value), –Availability - A service is considered to have a high availability if it allows read and write operations a high proportion of time (even in the case of a node crush or some hardware or software parts are down due to upgrades) –Network partition tolerance is the ability of a service to perform expected operations in the presence of a network partition, unless the whole network crushes A network partition occurs if two or more “islands” of network nodes cannot connect to each other Dynamic addition and removal of nodes is also considered to be a network partition.)

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 9 Comments on CAP Properties Contrary to traditional databases, many cloud database are not expected to satisfy any integrity constraints High availability of a service is often characterized by small latency –Amazon claims that raise of 0.1 sec in their response time will cost them 1% in sales, –Google claims that just.5 sec in latency caused traffic to drop by 20% It is hard to draw a clear border between availability and network partitions –Network partitions may induce delays in operations of a service, and hence a greater latency

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 10 An Informal Proof of the CAP Theorem (1) Eric Brewer presented his conjecture in 2000, Gilbert&Lynch proved it formally in 2002 Here is an informal presentation of their proof showing that a highly available and partition tolerant service may be inconsistent Let us consider two nodes N 1 and N 2 –Both nodes contain the same copy of data D having a value D 0, –A program W (write) runs on N 1 –A program R (read) runs on N 2 W D0D0 R D0D0 N1N1 N2N2

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 11 An Informal Proof of the CAP Theorem (2) In a regular situation: –The program W writes a new value of D, say D 1 on N 1 –Then, N 1 sends a message M to N 2 to update the copy of D to D 1, –The program R reads D and returns D 1 W D0D0 R D1D1 N1N1 N2N2 W D1D1 N1N1 W D1D1 N1N1 D1D1 R D0D0 N2N2 R D0D0 N2N2 M D1D1

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 12 An Informal Proof of the CAP Theorem (3) Assume now the network partitions and the message from N 1 to N 2 is not delivered –The program R reads D and returns D 0 W D0D0 R D0D0 N1N1 N2N2 W D1D1 N1N1 W D1D1 N1N1 D1D1 R D0D0 N2N2 R D0D0 N2N2 M D0D0

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 13 Comments on the CAP Theorem If applications in a distributed and replicated network have to be: –Highly available (i.e. working with minimal latency), and –Tolerant to network partitions (performs expected user initiated operations), then It may happen that some of N i (i = 1, 2, …) nodes have stale data The decision on which of {CA, CP, AP} property pairs to satisfy is made in early stages of a cloud application design, according to requirements and expectations defined for the application NoSQL database systems fall in the AP class –Users of web applications expect the AP behavior from the backing cloud database

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 14 BASE Properties of Cloud DBs(1) BASE (as defined by Eric Brewer) stands for: –BA (Basically Available) meaning that the system is always available, although there may be intervals of time when some of its components are not available A basically available system answers each request in such a way that it appears that system works correctly and is always available and responsive –S (Soft State) means that the system has not to be in a consistent state after each transaction The database state may change without user’s intervention (due to eventual consistency) –E (Eventually Consistent) guarantees that: If no new updates are made to a given data item, accesses to each replica of that item will eventually return the last updated value A system that has achieved eventual consistency is often said to have converged, or achieved replica convergence

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 15 BASE Properties of Cloud DBs(2) The reconciliation of differences between multiple replicas requires exchanging and comparing versions of data between nodes –Usually "last writer wins“ –The reconciliation [ is performed as a: [ read, write, or asynchronous repair Asynchronous repairs are performed in a background non transactional process Stale and approximate data in answers are tolerated

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 16 Consistency – Availability Trade-offs Assume: –A cloud database is divided into p partitions and each partition is replicated on n (> 2) servers (replica nodes), –The database is unavailable if any of its parts is unavailable, –Each node stores only one replica Strict consistency requires p*n servers to be available –This is hard to achieve with commodity hardware, –If only one node fails, the database is unavailable Strong consistency, where w + r > p*n and –w and r are numbers of nodes acknowledging a write and returning a read, respectively –At least 1 node in each partition can fail and the database is still available A special case: quorum consistency, requires al least replica nodes to be available in each partition Eventual consistency requires at least 1 node to be available in each partition

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 17 Example: Consistency vs. Availability(1) Let the number of partitions p = 10 and the replication factor n = 3 –So, the database is stored on 30 nodes Assume strong consistency Question –How many nodes can fail and still the whole database to be available? Answer: –Since n = 3, quorum q = 2, –In the worst case, all unavailable nodes belong to the same partition Then, to have the whole database available under strong consistency, only 1 node in total is allowed to be unavailable –In the best case, all unavailable nodes are evenly distributed over all partitions Then, to have the whole database available under strong consistency, one node in each partition is allowed to be unavailable, giving 10 nodes in total

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 18 Example: Consistency vs. Availability(2) Let the number of partitions p = 10 and the replication factor n = 3 –So, the database is stored on 30 nodes Assume eventual consistency Question –How many nodes can fail and still the whole database to be available? Answer: –In the worst case, all unavailable nodes belong to the same partition Then, to have the whole database available under eventual consistency, only 2 nodes in total are allowed to be unavailable –In the best case, all unavailable nodes are evenly distributed over all partitions Then, to have the whole database available under eventual consistency, two nodes in each partition are allowed to be unavailable, giving 20 nodes in total

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 19 Summary (1) NoSQL cloud databases are scalable, highly available and tolerate network partitions Cloud database system can have only two out of Consistency, Availability, and Network Partition Tolerance BASE represents a “best effort” approach to ensuring database reliability –BASE forfeits ACID properties, also by its name (base versus acid, as found in chemistry) –Basically Available, –Soft state, –Eventually consistent

Advanced Database Design and Implementation 2015 Trade-offs in Cloud Databases 20 Summary(2) Eventual Consistency is a model used in distributed systems that guarantees that all accesses to replicas of an updated item will eventually return the last updated value Consistency levels: –Strict consistency, –Strong consistency, –Eventual consistency Consistency and availability mutually restrict each other: –As greater the consistency requirement as lower the database availability