When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication. Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues.


When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal

Talk Structure
- Motivation and related work
- The GMU protocol
- Experimental results

Motivation and related work

Distributed STMs
STMs are being employed in new scenarios:
- Database caches in three-tier web apps (FénixEDU)
- An HPC programming language (X10)
- In-memory cloud data grids (Coherence, Infinispan)
New challenges: scalability and fault-tolerance => REPLICATION

Full Replication
All sites store the whole set of data. Full replication in transactional systems is a well-studied problem.
Several solutions in the DBMS world:
- Update anywhere-anytime-anyway solutions [SIGMOD96]
- Deferred-update replication techniques [JDPD03, VLDB00]
- Lazy techniques that relax consistency properties [SOSP07]
Specific solutions for DSTMs:
- Efficient coding of the read-set [PRDC09]
- Communication/computation overlapping [NCA10]
- Lease-based commits [Middleware10]

Partial Replication
A way to increase scalability: each site stores a partial copy of the data.
Genuine partial replication schemes maximize scalability by ensuring that only the sites replicating data items read or written by a transaction T exchange messages to execute/commit T.
Existing 1-Copy Serializable implementations enforce distributed validation of read-only transactions [SRDS10], which incurs considerable overheads in typical workloads.

Objectives
- Partially replicated DSTM
- Scalability and performance as first-class targets
- Find a sweet spot in the consistency/performance tradeoff
Requirements
- Read-only transactions never abort or block
- Genuine certification mechanism

Issues with Partial Replication
Extending existing local multiversion (MV) STMs is not enough: local MV STMs rely on a single global counter to track version advancement.
Problem: the commit of a transaction would have to involve ALL nodes.
NO GENUINENESS = POOR SCALABILITY

GMU: Genuine Multiversion Update-serializable replication [ICDCS12]

Key concepts
In the execution/commit phase of a transaction T, ONLY the nodes that store data items accessed by T are involved.
GMU uses multiple versions of each data item and builds visible snapshots, i.e. the freshest consistent snapshots that take into account:
1. causal dependencies on transactions already committed at the time T began,
2. the reads previously executed by T itself.
Vector clocks are used to establish visible snapshots.

Main data structures (i)
For each node N:
- VCLog: sequence of vector clocks of "recently" committed transactions on N
- PrepareVC: a vector clock greater than or equal to the most recent vector clock in VCLog

Main data structures (ii)
For each transaction T:
- VC: a vector clock initialized with the most recent vector clock in the local VCLog. It is updated upon reads during execution, to ensure that T observes the most recent serializable snapshot, and at commit time, to assign the final vector clock to the transaction (and to its write-set).

Main data structures (iii)
A chain of versions per data item. Each version stores a value, a version number VN, and a link to the previous version, e.g. for item id (newest first):
(value: 2, VN: 8) -> (value: 1, VN: 5) -> (value: 0, VN: 2)
When a transaction T commits on node i, the VN of the versions it installs is taken from entry i of T's vector clock.
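The node-level, transaction-level, and item-level structures above can be sketched as plain records. This is an illustrative Python model, not the real Infinispan-based implementation; the names `VCLog`, `PrepareVC`, and `VN` follow the slides, everything else is an assumption:

```python
class Version:
    """One entry in a data item's version chain (value, VN, previous)."""
    def __init__(self, value, vn, previous=None):
        self.value = value        # the written value
        self.vn = vn              # version number installed at commit time
        self.previous = previous  # link to the next older version

class DataItem:
    """A key and its chain of versions, newest first."""
    def __init__(self):
        self.newest = None

    def install(self, value, vn):
        # Committing a write prepends a new version to the chain.
        self.newest = Version(value, vn, previous=self.newest)

class Node:
    """Per-node replication metadata: VCLog and PrepareVC."""
    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.vc_log = [[0] * num_nodes]    # VCs of recently committed txs
        self.prepare_vc = [0] * num_nodes  # >= most recent VC in vc_log

    def most_recent_vc(self):
        return self.vc_log[-1]

class Transaction:
    """Per-transaction metadata: VC starts from the local node's freshest VC."""
    def __init__(self, local_node):
        self.vc = list(local_node.most_recent_vc())
        self.read_nodes = set()  # nodes this transaction has already read from
```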

T reads id on node i: Rule 1
Informally: it avoids reading remotely "too old" versions.
Formally: if this is the first read of T on node i, wait until node i's VCLog.mostRecVC[i] >= T.VC[i].
This ensures that causal dependencies are enforced.
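Rule 1 reduces to a wait condition evaluated on T's first read at node i. A minimal sketch (function and parameter names are illustrative; a real implementation would block or re-schedule the read rather than poll):

```python
def rule1_can_proceed(most_recent_vc, t_vc, i):
    """Rule 1 (sketch): T's first read at node i may proceed only once
    node i's most recent committed vector clock has caught up with T's
    causal past, i.e. VCLog.mostRecVC[i] >= T.VC[i]."""
    return most_recent_vc[i] >= t_vc[i]

# Example: T carries T.VC = (1, 2, 2), but node 1 has only committed
# up to (1, 1, 1) locally, so T's read on node 1 must wait.
```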

Rule 1 in action
(Message-sequence diagram: Node 1 stores X, Node 2 stores Y. T0 writes X and Y and commits with vector clock (1,2,2). T1, whose T1.VC is (1,2,2), then reads X on Node 1 and Y on Node 2; where needed, each read waits until the local most recent VC in VCLog has caught up to (1,2,2), so T1 observes both of T0's writes.)

T reads id on node i: Rule 2
Informally: it maximizes freshness by moving T's VC ahead in time "as much as possible" in the commit log.
Formally: if this is the first read of T on node i, select the most recent VC in i's commit log such that VC[j] <= T.VC[j] for each node j on which T has already read, and set T.VC = MAX{VC, T.VC}.
Note: this updates only the entries of T.VC for the nodes from which T has not read yet.
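The selection-and-merge step of Rule 2 can be sketched as follows (an illustrative model; the commit log is a list of vector clocks, oldest to newest):

```python
def rule2_advance_vc(commit_log, t_vc, already_read_nodes):
    """Rule 2 (sketch): on T's first read at a node, pick the most recent
    VC in that node's commit log that does not contradict T's prior reads
    (VC[j] <= T.VC[j] for every already-read node j), then merge it into
    T.VC with an entry-wise max. Entries of already-read nodes are
    unchanged by construction."""
    for vc in reversed(commit_log):  # newest first
        if all(vc[j] <= t_vc[j] for j in already_read_nodes):
            return [max(a, b) for a, b in zip(vc, t_vc)]
    return list(t_vc)  # no compatible entry: leave T.VC unchanged
```

For instance, a transaction with T.VC = (1,1,1) that has read only on node 0 can adopt a fresher clock (1,20,1) from the contacted node's log, since entry 0 is compatible with its history.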

Rule 2 in action
(Message-sequence diagram: Node 1 stores X, Node 2 stores Y; T1 writes both and commits with vector clock (1,21,21). T0 first reads X on Node 1, advancing T0.VC from (1,1,1) to (1,20,1) via the most recent compatible VC in Node 1's VCLog and observing X(20). Its later read of Y on Node 2 then selects the freshest VC compatible with that earlier read, yielding T0.VC = (1,20,11) and version Y(11) rather than T1's newer, inconsistent Y(21).)

T reads id on node i: Rule 3
Informally: observe the most recent consistent version of id, based on T's history (previous reads).
Formally: iterate over the versions of id and return the most recent one such that id.version.VN <= T.VC[i].
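Rule 3 is a walk down the version chain. A minimal sketch, representing the chain as (value, VN) pairs ordered newest first (names are illustrative):

```python
def rule3_read_version(version_chain, t_vc, i):
    """Rule 3 (sketch): starting from the newest version, return the value
    of the first (i.e. most recent) version whose version number fits T's
    visible snapshot: version.VN <= T.VC[i].
    version_chain: list of (value, vn) pairs, newest first."""
    for value, vn in version_chain:
        if vn <= t_vc[i]:
            return value
    return None  # no visible version (cannot happen with an initial version)
```

With the chain from the data-structures slide, (2, VN 8) -> (1, VN 5) -> (0, VN 2), a transaction with T.VC[i] = 6 skips VN 8 and reads value 1.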

Committing read-only transactions
Read-only transactions commit locally:
- no additional validation
- no possibility of aborts
…and they are never blocked, as in typical multiversion schemes.

Committing update transactions
Update transactions run 2PC.
Upon receiving the prepare message (participant side, node i):
- acquire read & write locks
- validate the read-set
- increase PrepareVC[i] and send PrepareVC back
If all replies are positive (coordinator side):
- build a commit vector clock
- broadcast the commit message
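The participant side of the prepare phase can be sketched as follows. Lock acquisition and read-set validation are stubbed out as boolean inputs, since the slides do not detail them; only the PrepareVC bookkeeping follows the slide:

```python
class Participant:
    """Sketch of a GMU participant's role in 2PC (illustrative names)."""
    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.prepare_vc = [0] * num_nodes

    def on_prepare(self, locks_acquired, read_set_valid):
        """On a prepare message: if locking or validation fails, vote NO
        (None); otherwise advance the local entry of PrepareVC and send
        it back as this node's vote."""
        if not (locks_acquired and read_set_valid):
            return None  # negative vote: the coordinator will abort T
        self.prepare_vc[self.node_id] += 1  # increase PrepareVC[i]
        return list(self.prepare_vc)        # positive vote with proposed clock
```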

Building the commit Vector Clock
A variant of Skeen's algorithm is used [SKEEN85]. This keeps track of the causal dependencies developed by:
- transaction T during its execution,
- the most recent transactions committed at the nodes contacted by T.
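The slides leave the Skeen-style agreement step abstract. One plausible sketch, under the assumption (for illustration only, not the exact published algorithm) that the coordinator combines the participants' PrepareVC replies by entry-wise maximum:

```python
def build_commit_vc(prepare_replies):
    """Combine the PrepareVC votes of all participants into a single
    commit vector clock via entry-wise maximum, so the result dominates
    both the dependencies T developed while executing and the clocks
    proposed by every contacted node. Simplified sketch only."""
    commit_vc = list(prepare_replies[0])
    for vc in prepare_replies[1:]:
        commit_vc = [max(a, b) for a, b in zip(commit_vc, vc)]
    return commit_vc
```

For example, votes (1, 21, 1) and (1, 1, 12) would yield the commit clock (1, 21, 12).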

Consistency criterion
GMU ensures Extended Update Serializability.
Update Serializability (US) ensures:
- 1-Copy Serializability (1CS) on the history restricted to committed update transactions;
- 1CS on the history restricted to committed update transactions plus any single read-only transaction;
- but it can admit non-1CS histories containing at least two read-only transactions.
Extended Update Serializability additionally ensures the US property for executing transactions, analogously to opacity in STMs.

Experimental Results

Experiments on private cluster
8-core physical nodes, TPC-C:
- 90% read-only xacts, 10% update xacts
- 4 threads per node
- moderate contention (15% abort rate at 20 nodes)


FutureGrid Experiments
All nodes are 2-core VMs deployed in the same site, TPC-C:
- 90% read-only xacts, 10% update xacts
- 1 thread per node
- low/moderate contention, also at 40 nodes

Thank you for your attention.

References
[ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. "When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication". Proc. of the 32nd IEEE International Conference on Distributed Computing Systems, June 2012.
[JDPD03] Fernando Pedone, Rachid Guerraoui, André Schiper. "The Database State Machine Approach". Journal of Distributed and Parallel Databases, vol. 14, issue 1, 71-98, July 2003.
[Middleware10] Nuno Carvalho, Paolo Romano, Luís Rodrigues. "Asynchronous lease-based replication of software transactional memory". Proc. of the 11th ACM/IFIP/USENIX International Conference on Middleware, 2010.
[NCA10] Roberto Palmieri, Francesco Quaglia, Paolo Romano. "AGGRO: Boosting STM Replication via Aggressively Optimistic Transaction Processing". Proc. of the 9th IEEE International Symposium on Network Computing and Applications, 20-27, 2010.
[PRDC09] Maria Couceiro, Paolo Romano, Nuno Carvalho, Luís Rodrigues. "D2STM: Dependable Distributed Software Transactional Memory". Proc. of the 15th IEEE Pacific Rim International Symposium on Dependable Computing, 2009.
[SIGMOD96] Jim Gray, Pat Helland, Patrick O'Neil, Dennis Shasha. "The Dangers of Replication and a Solution". Proc. of the 1996 ACM SIGMOD International Conference on Management of Data, vol. 25, issue 2, June 1996.
[SKEEN85] D. Skeen. Unpublished communication, 1985. Referenced in K. Birman, T. Joseph, "Reliable Communication in the Presence of Failures", ACM Trans. on Computer Systems, 47-76, 1987.
[SOSP07] G. DeCandia et al. "Dynamo: Amazon's Highly Available Key-value Store". Proc. of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007.
[SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. "P-Store: Genuine Partial Replication in Wide Area Networks". Proc. of the 29th IEEE Symposium on Reliable Distributed Systems, 2010.
[VLDB00] Bettina Kemme, Gustavo Alonso. "Don't Be Lazy, Be Consistent: Postgres-R, A New Way to Implement Database Replication". Proc. of the 26th International Conference on Very Large Data Bases, 2000.