Preventive Replication in a Database Cluster
Esther Pacitti, Cedric Coulon, Patrick Valduriez, M. Tamer Özsu*
LINA / INRIA – Atlas Group, University of Nantes, France
* University of Waterloo, Canada
July 2005

Outline
- Motivations
- Cluster Architecture
- Preventive Replication
- Multi-Master Partially Replicated Configurations
- Replication Manager Architecture
- Optimizations
- RepDB* Prototype
- Experiments
- Conclusions, Current and Future Work

Motivations
- Applications and data are asynchronously replicated among a set of cluster nodes connected by a fast and reliable network, to improve the response time of user requests.
- Lazy preventive replication is used to enforce data consistency.
(Figure: external user requests submitted to a cluster of n PC nodes.)

Cluster Architecture
(Figure: cluster system architecture.)

Preventive Replication (1)
Properties:
- Strong consistency
- Non-blocking
- Scale-up and speed-up
- High data availability

Preventive Replication (2)
Assumptions:
- The network interface provides FIFO reliable multicast.
- Max is the upper bound on the time needed for a message multicast by a node i to be received at a node j.
- Clocks are ε-synchronized.
- Each transaction carries a timestamp value C (its arrival time).

Preventive Replication (3)
Consistency criterion: total order enforcement. Transactions are received in the same order at all involved nodes, and this order corresponds to the execution order.
To enforce total order, transactions are chronologically ordered at each node using their delivery_time value:
delivery_time = C + Max + ε
(Figure: T is received at nodes i and j; each node waits until T's delivery_time before executing T.)

Preventive Replication (4)
Whenever a node i receives T:
- Propagation: it multicasts T to all nodes, including itself.
- Scheduling: at each node, T's delivery_time expires only when T is the oldest pending transaction.
- Execution: when T's delivery_time expires, T is entirely executed.
A minimal sketch of these three steps follows.

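The sketch below illustrates the three steps on one node, in Java. It assumes a FIFO reliable multicast primitive and ε-synchronized clocks; Multicast, Database, Txn and the constants MAX and EPSILON are illustrative names and values, not RepDB*'s actual API.

import java.util.Comparator;
import java.util.PriorityQueue;

// A minimal sketch of the three preventive-replication steps on one node.
public class PreventiveNode {
    static final long MAX = 50;     // assumed upper bound on multicast delay (ms)
    static final long EPSILON = 5;  // assumed clock synchronization precision (ms)

    interface Multicast { void send(Txn t); }          // FIFO reliable multicast to all nodes
    interface Database { void execute(String sql); }   // local black-box DBMS

    static class Txn {
        final long c;        // timestamp C: arrival time at the origin node
        final String sql;
        Txn(long c, String sql) { this.c = c; this.sql = sql; }
        long deliveryTime() { return c + MAX + EPSILON; }   // delivery_time = C + Max + epsilon
    }

    private final Multicast net;
    private final Database db;
    // Pending transactions, chronologically ordered by delivery_time.
    private final PriorityQueue<Txn> pending =
            new PriorityQueue<>(Comparator.comparingLong(Txn::deliveryTime));

    PreventiveNode(Multicast net, Database db) { this.net = net; this.db = db; }

    // Propagation: an incoming update transaction is multicast to all nodes, including this one.
    void onClientTransaction(Txn t) { net.send(t); }

    // Scheduling: transactions delivered by the multicast layer wait in delivery_time order.
    void onMulticastDelivery(Txn t) { pending.add(t); }

    // Execution: when the oldest transaction's delivery_time expires, it is executed entirely.
    void onClockTick(long now) {
        while (!pending.isEmpty() && pending.peek().deliveryTime() <= now) {
            db.execute(pending.poll().sql);
        }
    }
}

Because every node orders transactions by the same delivery_time, all nodes execute updates in the same total order, which is what yields strong consistency without blocking.
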
Partial Architecture
(Figure.)

Preventive Replication (4)
Copy types:
- Primary copies (R): can be updated only on their master node.
- Secondary copies (r): read-only.
- Multi-master copies (R1, R2, ...): can be updated on more than one node.
(Figure: example configurations — bowtie (primary copies R, S with secondary copies r', s', r'', s''), fully replicated (R1, S1 through R4, S4), and two partially replicated placements, e.g. R1, S1 / S2 / R2.)

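As a minimal data-structure sketch of these copy types, the Java enum and map below encode the partially replicated placement used on the following slides (N1 holds R1 and S1, N2 holds only S2, N3 holds only R2); both are illustrative only, not RepDB* structures.

import java.util.Map;

// A minimal sketch of the copy types and of one partially replicated placement.
public class CopyTypes {
    enum CopyType {
        PRIMARY,      // R: updatable only on its master node
        MULTIMASTER,  // R1, R2, ...: updatable on several nodes
        SECONDARY     // r: read-only copy, refreshed from a primary or multi-master copy
    }

    // Partially replicated configuration: N1 holds R1 and S1, N2 holds only S2, N3 holds only R2.
    static final Map<String, Map<String, CopyType>> PLACEMENT = Map.of(
            "N1", Map.of("R", CopyType.MULTIMASTER, "S", CopyType.MULTIMASTER),
            "N2", Map.of("S", CopyType.MULTIMASTER),
            "N3", Map.of("R", CopyType.MULTIMASTER));
}
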
Preventive Replication (5)
- Introduces a Max + ε delay time: negligible in cluster networks, but critical under bursty workloads.
- Data placement restrictions: lazy-master or fully replicated configurations only.
- With full replication: overhead of message exchanges, and not all nodes may have enough space to store all replicas => free data placement is needed.

Partially Replicated Configurations (1)
When the data are not fully replicated, some transactions cannot be executed on the target nodes.
Example: transaction T1(R, S) reads S and writes R:
UPDATE r SET c1 = ... WHERE c2 IN (SELECT c3 FROM s);
(Figure: N1 holds R1 and S1, N2 holds only S2, N3 holds only R2, so N2 and N3 each miss one of the copies T1 accesses.)

Partially Replicated Configurations (2)
- On the target nodes, T1 waits after being selected (Step 3).
- At the end of the execution on the origin node, a refresh transaction RT1 is multicast to the target nodes (Step 4).
- RT1 is executed to update the replicated data (Step 5).
(Figure: Steps 1–5 of T1(rS, wR) on N1, N2, N3 — submission by the client, multicast, execution on the origin node N1 with the answer returned to the client while the target nodes stand by, multicast of RT1(wR), and execution of RT1 on the target nodes.)

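The sketch below illustrates this origin/target behaviour. It assumes each transaction declares the copies it accesses and that the origin node can capture T's write set as the refresh transaction RT; Database, Multicast and all method names are illustrative, not RepDB*'s actual interfaces.

import java.util.List;
import java.util.Set;

// A minimal sketch of origin and target behaviour under partial replication.
public class PartialReplication {
    interface Database {
        List<String> executeAndCaptureWrites(String sql); // execute T, return its write statements
        void applyWrites(List<String> writes);            // apply a refresh transaction RT
    }
    interface Multicast { void sendRefresh(List<String> writes); } // RT to the target nodes

    private final Set<String> localCopies;   // copies held by this node, e.g. {"R", "S"} on N1
    private final Database db;
    private final Multicast net;

    PartialReplication(Set<String> localCopies, Database db, Multicast net) {
        this.localCopies = localCopies;
        this.db = db;
        this.net = net;
    }

    // Called when T's delivery time expires at this node.
    void deliver(String sql, Set<String> copiesAccessed) {
        if (localCopies.containsAll(copiesAccessed)) {
            // Origin node (holds every copy T needs): execute T and multicast
            // the resulting refresh transaction RT to the target nodes.
            List<String> rt = db.executeAndCaptureWrites(sql);
            net.sendRefresh(rt);
        }
        // Otherwise this is a target node: T stands by until RT arrives (see onRefresh).
    }

    // Target node: RT updates the locally replicated data.
    void onRefresh(List<String> rt) {
        db.applyWrites(rt);
    }
}
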
Data Placement
- Tables must have a primary key.
- A node i cannot hold a primary copy of a table whose foreign keys reference tables that are not held by node i.
Example with ITEM and ORDER: on N3, an order could otherwise be placed for an item that does not exist there.
(Figure: placement of ITEM and ORDER across N1, N2, N3.)

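A minimal sketch of this placement rule, using illustrative table names and metadata maps (not the paper's or RepDB*'s actual catalog): a placement is rejected when a node holds a primary copy of a table whose foreign keys reference a table the node does not hold.

import java.util.Map;
import java.util.Set;

// A minimal sketch of the data placement rule.
public class PlacementCheck {
    // foreignKeys maps each table to the set of tables it references.
    static boolean validPlacement(Set<String> tablesOnNode,
                                  Set<String> primaryCopiesOnNode,
                                  Map<String, Set<String>> foreignKeys) {
        for (String table : primaryCopiesOnNode) {
            for (String referenced : foreignKeys.getOrDefault(table, Set.of())) {
                if (!tablesOnNode.contains(referenced)) {
                    // e.g. a node holding a primary copy of ORDER without ITEM
                    // could accept an order for an item it does not know about.
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The slide's ITEM/ORDER example: ORDER references ITEM through a foreign key.
        Map<String, Set<String>> fks = Map.of("ORDER", Set.of("ITEM"));
        // A node holding only ORDER as a primary copy is an invalid placement.
        System.out.println(validPlacement(Set.of("ORDER"), Set.of("ORDER"), fks)); // false
        // A node holding both ORDER and ITEM is valid.
        System.out.println(validPlacement(Set.of("ORDER", "ITEM"), Set.of("ORDER"), fks)); // true
    }
}
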
Replication Manager Architecture
(Figure.)

Optimization: Eliminating delay times (1)
- In a cluster network, messages are naturally totally ordered.
- Schedule a transaction in parallel with its execution: submit a transaction for execution as soon as it is received.
- Schedule only the commit order of the transactions: a transaction can be committed only after Max + ε.
- When a transaction is received out of order, abort and re-execute all younger transactions.
- Non-conflicting transactions execute concurrently.
A sketch of this optimistic execution with delayed commit follows.

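The sketch below illustrates the scheme in Java, assuming the database can keep a transaction executed but uncommitted; the Database interface, the Txn record and the re-execution strategy are illustrative, not RepDB*'s actual scheduler, and conflict detection between concurrent transactions is elided.

import java.util.ArrayDeque;
import java.util.Deque;

// A minimal sketch of the optimization: execute each transaction as soon as it
// is received, delay only its commit until Max + epsilon has elapsed, and abort
// and re-execute younger transactions if a message arrives out of order.
public class OptimisticScheduler {
    interface Database {
        void beginAndExecute(String txnId, String sql);  // run the transaction, keep it uncommitted
        void commit(String txnId);
        void abort(String txnId);
    }
    record Txn(String id, long timestampC, String sql) {}

    private final Database db;
    // Transactions executed optimistically but not yet committed, in execution order.
    private final Deque<Txn> uncommitted = new ArrayDeque<>();

    OptimisticScheduler(Database db) { this.db = db; }

    // Called when the multicast layer delivers a transaction.
    void onReceive(Txn t) {
        // Out-of-order arrival: abort all younger uncommitted transactions...
        Deque<Txn> toRedo = new ArrayDeque<>();
        while (!uncommitted.isEmpty() && uncommitted.peekLast().timestampC() > t.timestampC()) {
            Txn younger = uncommitted.pollLast();
            db.abort(younger.id());
            toRedo.addFirst(younger);
        }
        // ...execute t immediately, in parallel with its scheduling...
        db.beginAndExecute(t.id(), t.sql());
        uncommitted.addLast(t);
        // ...then re-execute the aborted transactions after t, preserving their order.
        for (Txn redo : toRedo) {
            db.beginAndExecute(redo.id(), redo.sql());
            uncommitted.addLast(redo);
        }
    }

    // Called once Max + epsilon has elapsed for the oldest uncommitted transaction,
    // i.e. when its position in the total order is certain.
    void onDeliveryTimeExpired() {
        Txn oldest = uncommitted.pollFirst();
        if (oldest != null) db.commit(oldest.id());
    }
}

With this scheme the Max + ε wait overlaps with the execution of T, which is why the refreshment time on the following slides drops from Max + ε + t to max(Max + ε, t).
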
Optimization: Eliminating delay times (2)
(Figure: two timelines. Preventive replication: scheduling, then execution of T, then validation. Optimized preventive replication: scheduling and validation overlap with the execution of T, which may be aborted.)

Optimization Example (3)
(Figure.)

Optimization: Eliminating delay times (4)
- Without the optimization, the refreshment time of a transaction T is always Max + ε + t, where t is the time spent executing T.
- With the optimization, the refreshment time is max(Max + ε, t).
For example, with Max + ε = 55 ms and t = 200 ms, the refreshment time drops from 255 ms to 200 ms.

RepDB* Prototype: Architecture
(Figure: clients connect through JDBC to the Replica Interface (JDBC server); RepDB* comprises a DBMS-specific Log Monitor and the Propagator, Receiver, Refresher and Deliver components, placed between the DBMS and the network.)

RepDB* Prototype: Implementation
- Java (around 10,000 lines)
- The DBMS is a black box
- JDBC interface (RMI-JDBC)
- Uses the Spread toolkit to manage the network (Center for Networking and Distributed Systems - CNDS)
- Simulation version (SimJava)
- http://www.sciences.univnantes.fr/ATLAS/RepDB

Replicas definition (1)
A file contains the replica placement specification (a listing assigning copies of R, S and T to the cluster nodes).

Interface: Applications / RepDB* (2)

Connection c;
Statement s;
Class.forName("org.atlas.repdb.jdbc.Driver");
c = DriverManager.getConnection("jdbc:repdb://node0:4444/", "login", "password");
s = c.createStatement();
s.executeUpdate(
    " R, S T " +
    "UPDATE R SET att2 = 1 WHERE att1 IN " +
    "(SELECT att3 FROM T); " +
    "UPDATE S SET att2 = 1 WHERE att1 NOT IN " +
    "(SELECT att3 FROM T);");
s.close();
c.close();

Experiments (1): TPC-C benchmark
- 1 / 5 / 10 warehouses
- 10 clients per warehouse
- Transaction arrival rate: 1 s / 200 ms / 100 ms
- 4 types of transactions:
  - New-order: read-write, high frequency (45%)
  - Payment: read-write, high frequency (45%)
  - Order-status: read-only, low frequency (5%)
  - Stock-level: read-only, low frequency (5%)

Experiments (2)
- Cluster of 64 nodes
- PostgreSQL 7.3.2
- 1 Gb/s network
- 2 configurations:
  - Fully Replicated (FR)
  - Partially Replicated (PR): each type of TPC-C transaction runs using ¼ of the nodes

Experiments (3): Scale-up
(Figure: a) Fully Replicated (FR); b) Partially Replicated (PR).)

Experiments (4): Speed-up
In addition, 128 clients are launched that submit Order-status transactions (read-only).
(Figure: a) Fully Replicated (FR); b) Partially Replicated (PR).)

Experiments (5): Unordered messages
(Figure: a) Fully Replicated (FR); b) Partially Replicated (PR).)

Experiments (6): Delay vs. transaction size
(Figure.)

Conclusions
Preventive replication:
- Strong consistency
- Prevents conflicts for partially replicated databases
- Full node autonomy
- Scales up and speeds up
Experiments show that the configuration and the placement of the copies should be tuned to the selected types of transactions.

Current and Future Work
- Preventive replication for P2P systems:
  - Small and dynamic multi-master groups
  - Max is computed dynamically
  - Small and dynamic slave groups
- Optimistic replication:
  - Distributed semantic reconciliation

Thanks! Merci! Obrigado! Questions?