Consistency Rationing in the Cloud: Pay only when it matters Tim Kraska, Martin Hentschel, Gustavo Alonso, Donald Kossmann 27.09.2009 Systems Modified.

Presentation transcript:

Consistency Rationing in the Cloud: Pay only when it matters
Tim Kraska, Martin Hentschel, Gustavo Alonso, Donald Kossmann
Systems
Modified by Raja Amadala Sandeep N


3 You own a kiosk
o Items are not as valuable
o Strong protection is not required
o It is easier to handle
o It is less expensive
o It scales better
Of course, more can go wrong!

4 Key Observation
Not all data needs to be treated at the same level of consistency.
Example (web shop):
o Credit card information
o User preferences (e.g., "who bought this item also …")
Why look at data divisions? In cloud storage, consistency determines not only correctness but also the actual cost per transaction.

5 Relation between cost, consistency and transactions
o We are using cloud database services, and we have to pay for them
o Payment is on a per-transaction basis
o How do we measure the transaction cost? By the number of service calls
o Higher consistency requires more service calls
o Lower consistency requires fewer service calls

6 Consistency Rationing - Idea
o Strong consistency is expensive
o ACID prevents scaling and availability
o But not everything is worth gold!
Transaction cost vs. inconsistency cost:
o Use ABC analysis to categorize the data
o Apply different consistency strategies per category

7 Outline
o Consistency Rationing
o Adaptive Guarantees
o Implementation & Experiments
o Conclusion and Future Work


Consistency Rationing (Classification) 9

10 Consistency Rationing (Category A)
o Serializability
o Data always stays consistent
o In the cloud: expensive transaction cost and performance issues
o Complex protocols are needed
o Requires more interaction with additional services (e.g., lock services, queuing services)
o Higher cost
o Lower performance (response time)
o Protocol: 2PL, which provides serializability

11 Consistency Rationing (Category C)
o Session consistency
o It is the minimum consistency level
o Below this level, an application cannot see its own updates
o Clients connect to the system in terms of sessions
o The system guarantees read-your-own-writes monotonicity within a session
o Sessions of different clients will not always see each other's updates
o Across sessions, only eventual consistency is provided
o Conflict resolution depends on the type of update
  o Non-commutative updates (overrides)
  o Commutative updates (numerical operations, e.g., add)
o It is very cheap
o Permits extensive caching
o Data should be placed in category C whenever possible
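The two kinds of conflict resolution listed for category C can be illustrated with a minimal sketch. The function names and the last-writer-wins rule for overrides are assumptions made for illustration; the slides do not specify how overrides are ordered.

```python
def merge_commutative(base: int, deltas: list[int]) -> int:
    """Commutative updates (e.g., add): the order of application does not matter."""
    return base + sum(deltas)


def merge_overrides(base, writes: list[tuple[float, object]]):
    """Non-commutative updates (overrides), given as (timestamp, value) pairs.

    Assumption: the write with the latest timestamp wins.
    """
    if not writes:
        return base
    return max(writes, key=lambda w: w[0])[1]


# Deltas from different sessions can be merged in any order.
stock = merge_commutative(10, [-2, 3, -1])
# For overrides, only the most recent write survives.
address = merge_overrides("old street", [(2.0, "new street"), (1.0, "temp street")])
```

Commutative updates can be reconciled without losing any of them, whereas an override necessarily discards the writes it replaces.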

12 Consistency Rationing (Category B)
o Adaptive consistency
o Switches between serializability and session consistency

13 Consistency Rationing – Transactions on mixes
o Consistency guarantees are defined per category instead of at the transaction level
o Different categories can mix in a single operation/transaction
o For joins, unions, etc., the lowest category wins
o Issue: transactions may see different consistency levels
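A small sketch of the "lowest category wins" rule. The `Category` enum and `effective_category` are hypothetical names, and reading "lowest" as "weakest guarantee" (C below B below A) is an interpretation of the slide.

```python
from enum import IntEnum


class Category(IntEnum):
    C = 0  # session consistency (weakest guarantee)
    B = 1  # adaptive consistency
    A = 2  # serializability (strongest guarantee)


def effective_category(categories):
    """Guarantee for an operation that touches data of several categories."""
    return min(categories)


# A join between A data (e.g., credit card information) and C data
# (e.g., user preferences) would only get C guarantees.
joined = effective_category([Category.A, Category.C])
```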


15 Adaptive Guarantees for B-Data
o B-data: inconsistency has a cost, but it might be tolerable
o Often the bottleneck in the system
o Here, we can make big improvements
o Let B-data automatically switch between A and C guarantees
o Use a policy to optimize transaction cost vs. inconsistency cost

B-Data Consistency Classes 16

17 General Policy - Idea
Apply strong consistency protocols only if the likelihood of a conflict is high:
1. Gather temporal statistics at runtime
2. Derive the likelihood of a conflict by means of a simple stochastic model
3. Use strong consistency if the likelihood of a conflict is higher than a certain threshold
Consistency becomes a probabilistic guarantee.

18 General Policy - Model
o n servers
o Servers cache data with cache interval CI
o Load is equally distributed
o Two updates to the same record are considered a conflict
o Conflicts for A and B data can be detected and resolved after every CI
o Every server makes local decisions (no synchronization)

19 General Policy - Model
Likelihood of a conflict inside one CI:
o X is a stochastic variable corresponding to the number of updates to the same record within the cache interval
o P(X > 1) is the probability of more than one update to the same record in one cache interval
o The model also accounts for the probability that the concurrent updates happen on the same server
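Under the stated model (n servers, equally distributed load), the description above can be written out as follows. This is a reconstruction from the definitions on the slide, not a copy of the slide's own formula. The symbol P_C(X) and the assumption that updates arriving at the same server do not conflict are introduced here.

```latex
P_C(X) \;=\; P(X > 1) \;-\; \sum_{k=2}^{\infty} P(X = k)\,\left(\frac{1}{n}\right)^{k-1}
```

The factor (1/n)^{k-1} is the probability that k updates, each routed uniformly at random to one of n servers, all land on the same server: n choices of server times (1/n)^k.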

20 General Policy - Model
Simplification of this equation using a Poisson process, where λ is the arrival rate.
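A hedged sketch of what the Poisson simplification could look like in code. It assumes that `lam` is the expected number of updates to one record within a cache interval, that updates hit servers uniformly, and that updates on the same server do not conflict. The closed form is derived here from those assumptions and is not taken from the slide.

```python
import math


def conflict_probability(lam: float, n: int) -> float:
    """Probability of more than one update in a CI, not all on the same server.

    Assumes X ~ Poisson(lam) and n equally loaded servers.
    """
    if lam < 0 or n < 1:
        raise ValueError("lam must be >= 0 and n must be >= 1")
    # P(X > 1) = 1 - P(X = 0) - P(X = 1) for a Poisson variable.
    p_more_than_one = 1.0 - math.exp(-lam) * (1.0 + lam)
    # sum_{k>=2} P(X = k) * (1/n)^(k-1) = n * e^(-lam) * (e^(lam/n) - 1 - lam/n)
    x = lam / n
    p_same_server = n * math.exp(-lam) * (math.expm1(x) - x)
    return p_more_than_one - p_same_server
```

With a single server the result is zero, because every pair of updates lands on the same server. As n grows, the result approaches P(X > 1).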

21 General Policy – Temporal Statistics
On every server: collect the update rate per item for a window ω with sliding factor δ
o Window size ω is a smoothing factor
o Sliding factor and window size influence adaptability and space requirements
o The sliding factor specifies the granularity at which the window moves

22 General Policy – Temporal Statistics
o All complete intervals of a window build a histogram of the updates
o The larger the window size, the better the statistics
o Issues remain, such as hot-spot data
o From the histogram we can calculate the arrival rate λ used in the model, where
  o X is the arithmetic mean of the number of updates to a record
  o n is the number of servers
  o δ is the sliding factor
o Issue: local statistics can easily be misleading
o Local statistics can therefore be combined into a centralized view
o Statistics are broadcast from time to time to all other servers
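The sliding-window statistics can be sketched as below. The class `UpdateWindow` and its methods are hypothetical. The rate estimate (mean updates per interval, scaled by n and divided by δ) assumes that each server sees about 1/n of the updates. It is one plausible combination of the symbols listed above, not necessarily the slide's formula.

```python
from collections import deque


class UpdateWindow:
    """Per-record sliding window of local update counts on one server."""

    def __init__(self, window: float, slide: float):
        # window is omega and slide is delta, both in seconds.
        # Assumption: omega is a multiple of delta.
        self.slide = slide
        self.counts = deque(maxlen=int(round(window / slide)))
        self.current = 0

    def record_update(self) -> None:
        self.current += 1

    def close_interval(self) -> None:
        """Call every delta seconds; the oldest interval drops out automatically."""
        self.counts.append(self.current)
        self.current = 0

    def arrival_rate(self, n_servers: int) -> float:
        """Estimated system-wide updates per second for this record."""
        if not self.counts:
            return 0.0
        mean = sum(self.counts) / len(self.counts)
        return mean * n_servers / self.slide
```

Multiplying the returned rate by the cache interval CI gives an expected number of updates per CI, which is the kind of input the Poisson model needs.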

23 General Policy – Setting the Threshold
Use strong consistency if the savings are bigger than the penalty cost:
o Cost of an A transaction
o Cost of a C transaction
o Cost of inconsistency (O)
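One way to read this rule is as a comparison of expected costs. Running a transaction with C guarantees costs its own price plus the conflict probability times the inconsistency cost O; running it with A guarantees costs the A price. The sketch below derives a probability threshold from that comparison. The function names and the example prices are assumptions for illustration.

```python
def conflict_threshold(cost_a: float, cost_c: float, cost_inconsistency: float) -> float:
    """Conflict probability above which an A transaction is the cheaper choice.

    Derived from cost_a < cost_c + p * cost_inconsistency.
    """
    if cost_inconsistency <= 0:
        raise ValueError("cost_inconsistency must be positive")
    return (cost_a - cost_c) / cost_inconsistency


def use_strong_consistency(p_conflict: float, cost_a: float, cost_c: float,
                           cost_inconsistency: float) -> bool:
    return p_conflict > conflict_threshold(cost_a, cost_c, cost_inconsistency)


# Hypothetical prices: A costs 0.15, C costs 0.05, an inconsistency costs 1.0.
# Strong consistency pays off once the conflict probability exceeds about 0.1.
decision = use_strong_consistency(0.2, 0.15, 0.05, 1.0)
```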

24 Numeric Data – Value Constraint
o Most conflicting updates occur around numerical values, e.g., the stock of items in a store or the available tickets in a reservation system
o These values are often characterized by an integrity constraint that defines a limit
o Optimize the general policy by considering the actual update values to decide which consistency level to enforce

25 Numeric Data – Value Constraint
Fixed threshold policy
o If the value is below a certain threshold, the record is handled under strong consistency: current value – amount <= threshold
o Issue: inconsistency can occur if the sum of all updates on different servers lets the value under watch drop below the limit
o Finding the optimal threshold: determined experimentally over time
o The threshold is static
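The fixed threshold test and the issue noted above can be shown with a small sketch. The function name and the numbers (stock 12, threshold 2, three servers each selling 5) are made up for illustration.

```python
def needs_strong_consistency(current_value: int, amount: int, threshold: int) -> bool:
    """Fixed threshold rule from the slide: current value - amount <= threshold."""
    return current_value - amount <= threshold


# Hypothetical scenario: three servers each read a cached stock of 12 and sell 5.
cached_stock, threshold = 12, 2
decisions = [needs_strong_consistency(cached_stock, 5, threshold) for _ in range(3)]

# Every server sees 12 - 5 = 7 > 2 and stays on weak consistency,
# yet the combined effect drives the real stock below the limit of 0.
real_stock = cached_stock - 3 * 5
```

Each local decision looks safe, but the sum of the updates oversells the item, which is the inconsistency the slide warns about.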

26 Numeric Data – Value Constraint
Demarcation policy
o The idea of the protocol is to assign a certain amount of the value (e.g., the stock) to every server, with the overall amount being distributed across the servers
o Servers may request additional shares from other servers
o Notation: n is the number of servers, v is the value, and T is the threshold
o Issue: for a large number of servers n, this policy treats B data as A data, because almost all transactions require locking
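A minimal sketch of the share idea behind the demarcation policy. The class name, the equal split of v across n servers, and the boolean fallback signal are assumptions. The slide's own threshold formula is not reproduced here.

```python
class DemarcationServer:
    """One server's local share of a numeric value such as a stock count."""

    def __init__(self, value: int, n_servers: int):
        # Assumption: the value is split equally, rounding down.
        self.share = value // n_servers

    def try_local_decrement(self, amount: int) -> bool:
        """Consume from the local share without any synchronization.

        Returns False when the share is too small. The caller would then
        fall back to the strong path, e.g., request shares from other servers.
        """
        if amount <= self.share:
            self.share -= amount
            return True
        return False


server = DemarcationServer(value=100, n_servers=10)   # local share of 10
first = server.try_local_decrement(6)                 # fits in the share
second = server.try_local_decrement(6)                # only 4 left, must synchronize
crowded = DemarcationServer(value=100, n_servers=200) # share rounds down to 0
```

With many servers the per-server share shrinks toward zero, so nearly every decrement takes the synchronized path. This matches the issue noted on the slide.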

27 Value Constraint – Dynamic Policy
Apply strong consistency protocols only if the likelihood of violating a value constraint becomes high
o Commutative updates
o Without loss of generality, the limit is assumed to be 0
o Assumptions as for the general policy
o Statistics: value change over time

28 Numeric Data – Value Constraint
Dynamic policy: implements probabilistic consistency guarantees by adjusting the threshold.
Y is a stochastic variable corresponding to the sum of update values within the cache interval CI.

29 Dynamic policy
Temporal statistics: probability density function (PDF) f for the cache interval.

30 Dynamic policy
Using the central limit theorem to approximate the CDF.
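A hedged sketch of how a normal approximation could be used for the dynamic policy. It assumes the limit is 0, that `interval_changes` holds observed value changes per statistics interval, and that a cache interval spans `intervals_per_ci` such intervals with independent changes. The function name and these inputs are illustrative, not taken from the slides.

```python
import math
from statistics import NormalDist, mean, pstdev


def violation_probability(value: float, interval_changes: list[float],
                          intervals_per_ci: int) -> float:
    """Approximate P(value + Y < 0), where Y is the sum of changes within one CI.

    Y is approximated as normal with mean k * m and standard deviation
    sqrt(k) * s, where m and s are the sample mean and standard deviation
    of the per-interval changes and k = intervals_per_ci.
    """
    mu = intervals_per_ci * mean(interval_changes)
    sigma = math.sqrt(intervals_per_ci) * pstdev(interval_changes)
    if sigma == 0:
        # Degenerate case: every observed change was identical.
        return 1.0 if value + mu < 0 else 0.0
    return NormalDist(mu, sigma).cdf(-value)


# Hypothetical stock that drops by about 5 per interval, with 4 intervals per CI.
changes = [-4, -6, -5, -5]
p_low_stock = violation_probability(20, changes, 4)    # expected drop equals the stock
p_high_stock = violation_probability(100, changes, 4)  # far from the limit
```

A record would then be handled under strong consistency whenever this probability exceeds the chosen threshold.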


32 Implementation
o Architecture of "Building a DB on S3" [Sigmod08]
o Extended protocols
o Additional services
  o Locking service
  o Reliable queues
  o Counter service
o Every call is priced
o Approach is not restricted to this architecture
  o PNUTS
  o Distributed DB
  o Traditional DB

33 Implementation

34 Implementation: alternative using PNUTS (figure; labels: session consistency, serializability, metadata)

35 Experiments
Modified TPC-W
o No refilling of the stock
o Different demand distributions
o Stock is uniformly distributed between
o Order mix: up to 6 items per basket (80-20 rule)
o 10 app servers, 1000 products
o But up to 12,000 products are sold in 300 sec
o Rationed consistency
Results represent just one possible scenario!

Overall Cost (including the penalty cost) per TRX [$/1000] 36

Overall Cost per TRX [$/1000], vary penalty costs 37

Response Time [ms] 38

39 Cost per TRX [$/1000]

40 Response Time [ms]


44 Conclusion and Future Work
o Rationing the consistency can provide big performance and cost benefits
o With consistency rationing we introduced the notion of probabilistic consistency guarantees
o Self-optimizing system: just the penalty cost is required
o Future work
  o Applying it to other architectures
  o Other optimization parameters (e.g., energy)
  o Better and faster statistics