
Slide 1: Middle-R: A Middleware for Dynamically Adaptive Database Replication
ADAPT IST-2001-37126
R. Jiménez-Peris, M. Patiño-Martínez, Jesús Milán
Distributed Systems Laboratory, Universidad Politécnica de Madrid (UPM)

Slide 2: Symmetric vs. Asymmetric Processing
Transactions in a replicated system can be processed either:
– Symmetrically: all replicas process the whole transaction. This approach can only scale by introducing queries into the workload.
– Asymmetrically: one replica processes the transaction and the other replicas just apply the resulting updates. This approach can scale depending on the ratio between the cost of executing the whole transaction and the cost of just applying the updates. A first-order model is sketched below.
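As a rough illustration (our sketch, not a formula from the slides; the precise analytical model is the [SRDS'01] one cited later in this deck): let C be the capacity of a single site, u the fraction of update transactions in the workload, and w the cost of applying a transaction's writeset relative to executing it in full. The aggregate capacity of n asymmetric replicas is then roughly

\[ T_{\mathrm{asym}}(n) \;\approx\; \frac{n\,C}{1 + (n-1)\,u\,w} \;\xrightarrow{\;n\to\infty\;}\; \frac{C}{u\,w}, \]

so the approach scales as long as u·w stays well below 1. A symmetric system is the special case w = 1 (compare the next slide).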

Slide 3: Scalability of Symmetric Systems (graph; the w = 1 case)

Slide 4: Scalability of Asymmetric Systems
The transaction is fully executed at its master site; non-master sites only apply the updates. This approach leaves some spare computing power, which enables scalability. (Graph.)

Slide 5: Comparing the Scalability (graphs: the scalability of our middleware using asymmetric processing vs. the potential scalability of a symmetric system)

Slide 6: Taxonomy of Eager Database Replication
– White box: modifying the database engine (Bettina Kemme's Postgres-R [VLDB'00, TODS'00]). It can use either symmetric or asymmetric processing.
– Black box: at the middleware level, without assuming anything about the database (Yair Amir [ICDCS'02]). An inherently symmetric approach; transactions are executed sequentially by all replicas.
– Gray box: at the middleware level, based on get/set update services (our approach [ICDCS'02]). It can use symmetric processing, and it can also use asymmetric processing provided the database offers two services to get and set the updates of a transaction. This is the approach we have taken.

Slide 7: Assumptions in Middle-R
Each site holds the entire database (no partial replication). Read one, write all available. We work on a LAN, with virtually synchronous group communication available. The underlying database provides two basic services (similar to the CORBA ones):
– get state: returns a list of the physical updates performed by a transaction.
– set state: applies the physical updates of a transaction at a site.
Our approach exploits application semantics: we assume that the database is partitioned in some arbitrary way and that it is known which data partitions a transaction is going to access.
– This allows us to execute transactions from different partitions in parallel. Transactions spanning several partitions are also considered.
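As a minimal sketch, the two services could look like the following Java interface. The names and the byte[] writeset encoding are our assumptions; the slides only specify the semantics.

```java
import java.util.List;

/**
 * Sketch of the two services Middle-R requires from the underlying
 * database (names and types are illustrative, not Middle-R's real API).
 */
interface ReplicationSupport {
    /** Returns the list of physical updates performed by a transaction. */
    List<byte[]> getState(long transactionId);

    /** Applies the physical updates of a transaction at this site. */
    void setState(long transactionId, List<byte[]> updates);
}
```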

Slide 8: Protocol Overview [DISC'00] (diagram: a client transaction arrives at the replica manager of Replica 1, which executes it on its database and extracts the writeset via GetState; the updates are propagated to Replicas 2-4, whose replica managers apply them via SetState. The figure separates a middleware layer of replica managers from a database layer.)

Slide 9: Integrating the Middleware with the Application Server
JBoss accesses databases through JDBC. In order to integrate the middleware with JBoss, it will be necessary to develop a JDBC driver. This driver will access the middleware by multicasting requests to the middleware instances at each site.
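A sketch of that request path; the GroupChannel interface and the wire format are hypothetical stand-ins, since the slides do not specify the driver's internals:

```java
import java.io.IOException;

/** Hypothetical stand-in for the group communication layer (e.g. Ensemble bindings). */
interface GroupChannel {
    void multicast(byte[] message) throws IOException;   // reaches every middleware instance
    byte[] awaitReply(long requestId) throws IOException;
}

/** Sketch of how the driver could forward one JDBC request to Middle-R. */
final class MiddleRRequestPath {
    private final GroupChannel channel;
    private long nextRequestId = 0;

    MiddleRRequestPath(GroupChannel channel) { this.channel = channel; }

    byte[] execute(String clientId, long txnId, String sql) throws IOException {
        long id = nextRequestId++;
        // Encode (request id, client, transaction, statement) as one logical request.
        byte[] request = (id + "|" + clientId + "|" + txnId + "|" + sql).getBytes();
        channel.multicast(request);          // all middleware instances see it
        return channel.awaitReply(id);       // the responsible replica answers
    }
}
```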

Slide 10: Integrating the Middleware with the Application Server (diagram: several JBoss instances, each using the JDBC driver, connected through a group communication bus to the Middle-R instances and their databases)

Slide 11: Integrating the Middleware with the Application Server
If JBoss is replicated, some issues must be tackled:
– Independently of the kind of replication used in JBoss, duplicated requests might reach the replicated database. Active replication duplicates every request; other replication strategies might generate duplicate requests upon fail-over (i.e., requests made by the failed primary might be resubmitted by the new primary).
– The middleware imposes the requirement that duplicate requests be identified identically (a resubmitted request must carry the same identifier as the original).
– Provided the above guarantee, the middleware will enforce the removal of duplicate requests, as sketched below.
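A minimal sketch of such duplicate suppression, assuming each request carries a stable unique id (the guarantee asked of JBoss); a production version would also have to evict old entries:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class DuplicateFilter {
    // Remembers the reply of every request id already executed.
    private final Map<String, byte[]> executed = new ConcurrentHashMap<>();

    /** Executes a request once; duplicates (e.g. after fail-over) get the cached reply. */
    byte[] handle(String requestId, Supplier<byte[]> execute) {
        return executed.computeIfAbsent(requestId, id -> execute.get());
    }
}
```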

Slide 12: Automatic DB Partitioning
Middle-R exploits application semantics; that is, it requires the DB to be partitioned in some arbitrary way, and it must know in advance which partitions each transaction is going to access. In our previous work, this partitioning was performed by the programmer:
– For each stored procedure accessing the DB, a function was provided that, given the parameters of the invocation, determined the partitions that the stored procedure invocation would access.
This is a limitation of the previous approach that has to be overcome in Adapt:
– The DB partitioning should be transparent to users and therefore performed automatically, on a partition-per-table basis (at least).

Slide 13: Automatic DB Partitioning
The second issue is how to know in advance which partitions a particular transaction is going to access. Our new approach will analyze the submitted SQL statements on the fly to determine which partitions they will access.
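A toy sketch of that on-the-fly analysis under the partition-per-table scheme. A real implementation would use a proper SQL parser; this regex only covers simple SELECT/INSERT/UPDATE/DELETE statements:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartitionAnalyzer {
    // Table names follow FROM, JOIN, INTO, or UPDATE in simple statements.
    private static final Pattern TABLES = Pattern.compile(
        "(?i)\\b(?:from|join|into|update)\\s+([a-zA-Z_][a-zA-Z0-9_]*)");

    /** Returns the partitions (here: table names) an SQL statement accesses. */
    static Set<String> partitionsOf(String sql) {
        Set<String> partitions = new HashSet<>();
        Matcher m = TABLES.matcher(sql);
        while (m.find()) partitions.add(m.group(1).toLowerCase());
        return partitions;
    }
}
```

For example, partitionsOf("SELECT * FROM accounts JOIN branches ON ...") would yield {accounts, branches}, so the transaction would be serialized against both partitions.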

Slide 14: DB Interaction Model
Our previous work assumed that each transaction was submitted to the middleware in a single message.
– This model was suitable for stored procedures. However, it does not match the interaction model adopted by JDBC.
– Under JDBC, a transaction might span an arbitrary number of requests.
– Under JDBC, a transaction might be distributed, so the XA interface should be supported for distributed atomic commit.
For this reason, we are extending the underlying replication protocol to deal with transactions spanning multiple messages. The snippet below illustrates the mismatch.
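An ordinary JDBC example (ours, not from the slides) showing why the single-message model breaks down: the transaction's full writeset is only known after several round trips.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class TransferExample {
    /** One transaction, several requests: typical JDBC usage. */
    static void transfer(Connection con, int from, int to, int amount) throws SQLException {
        con.setAutoCommit(false);                       // open a transaction
        try (PreparedStatement s = con.prepareStatement(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            s.setInt(1, -amount); s.setInt(2, from); s.executeUpdate();  // request 1
            s.setInt(1,  amount); s.setInt(2, to);   s.executeUpdate();  // request 2
        }
        con.commit();                                   // request 3: only now complete
    }
}
```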

Slide 15: Dynamic Adaptability
The following dynamic adaptability properties are considered:
– Online recovery. While a new (or failed) replica is being recovered, the system continues its regular processing without disruption (the [SRDS'02] approach, which extends ideas from [DSN'01] to the middleware context).
– Load balancing. The masters of the different partitions are reassigned to balance the load dynamically.
– Admission control. The optimal number of transactions active in the system changes with the workload. A limit on active transactions is dynamically adapted to reach the maximum throughput for each workload.

Slide 16: Dynamic Adaptability: Online Recovery [SRDS'02]
Recovery is performed on a per-partition basis. Recovery is not performed during the state transfer associated with the view change, so as not to block regular requests. Once a partition is recovered at a recovering replica, the replica can start processing requests on that partition even though the other partitions have not been recovered yet. Recovery is flexible, so that load balancing policies can take the recovery load into account:
– The recovery can use one or more recoverers.
– Each recoverer can recover one or more partitions.

Slide 17: Dynamic Adaptability: Online Recovery
Replicas might recover in a cascading fashion. The online recovery protocol deals efficiently with cascading recoveries. Basically, it prevents redundancy in the recovery process as follows (see the sketch after this list):
– A replica that starts recovering while the recovery of another replica is under way is not delayed until that whole recovery completes.
– Nor is a new recovery started in parallel (which would yield redundant recoveries).
– Instead, this replica joins the recovery process at the next partition to be recovered.
– In this way, cascading recovering replicas share the recovery of common partitions.
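A structural sketch of that sharing. The names, and the detail that partitions a late joiner missed are transferred to it afterwards, are our assumptions beyond what the slides state:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class RecoverySession {
    private final Queue<String> pending;                    // partitions still to transfer
    private final List<String> joiners = new ArrayList<>(); // replicas being recovered
    private final List<String> done = new ArrayList<>();
    private final Map<String, List<String>> missedBy = new HashMap<>();

    RecoverySession(List<String> partitions, String firstReplica) {
        this.pending = new ArrayDeque<>(partitions);
        this.joiners.add(firstReplica);
    }

    /** A replica joining mid-session shares the transfers from the next partition on. */
    void join(String replica) {
        joiners.add(replica);
        missedBy.put(replica, new ArrayList<>(done));       // caught up separately later
    }

    void run() {
        while (!pending.isEmpty()) {
            String p = pending.poll();
            transfer(p, joiners);   // one transfer serves every current joiner
            done.add(p);            // joiners may now serve requests on p
        }
        // Partitions a replica missed because it joined late are transferred
        // to it afterwards, without restarting the whole session.
        missedBy.forEach((replica, parts) ->
            parts.forEach(p -> transfer(p, List.of(replica))));
    }

    private void transfer(String partition, List<String> to) { /* per-partition state transfer */ }
}
```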

Slide 18: Dynamic Adaptability: Load Balancing
The middleware approach has the advantage that every replica knows the load of every other replica without any additional information (requests are multicast, so each replica sees the whole workload). This makes it possible to achieve load balancing with very little overhead. One of the main difficulties of load balancing is determining the current load of each replica. We are currently modeling the behavior of the DB to be able to determine each replica's current load dynamically. These models will enable the middleware to determine which replicas have become saturated, so that their load can be redistributed. The load is redistributed by reducing the number of partitions mastered by an overloaded replica, as sketched below.
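A minimal sketch of one such rebalancing rule. The saturation threshold and the greedy most-to-least-loaded choice are our assumptions; the slides only state that mastership is moved away from overloaded replicas:

```java
import java.util.Collections;
import java.util.Map;

final class LoadBalancer {
    /** One round: move one partition's mastership from the most to the least loaded replica. */
    static void rebalanceOnce(Map<String, String> masterOf,   // partition -> master replica
                              Map<String, Double> loadOf,     // replica -> measured load
                              double saturationThreshold) {
        String busiest = Collections.max(loadOf.entrySet(), Map.Entry.comparingByValue()).getKey();
        if (loadOf.get(busiest) < saturationThreshold) return;   // nothing is saturated
        String idlest = Collections.min(loadOf.entrySet(), Map.Entry.comparingByValue()).getKey();
        masterOf.entrySet().stream()
                .filter(e -> e.getValue().equals(busiest))
                .findFirst()                                      // any partition mastered there
                .ifPresent(e -> masterOf.put(e.getKey(), idlest)); // hand mastership over
    }
}
```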

Slide 19: Dynamic Adaptability: Load Balancing during Online Recovery
The load balancing will also control online recovery, adapting it to the load conditions. When the system load is low, it will increase the resources devoted to recovery, taking advantage of the spare computing resources to accelerate it. When the system load increases, it will dynamically decrease the resources devoted to recovery in order to cope with the new load.

Slide 20: Dynamic Adaptability: Admission Control
The maximum throughput for a workload is reached with a given number of concurrent transactions in the system; once this threshold is exceeded, the DB begins to thrash. The threshold is different for each workload, so it needs to be adapted dynamically to achieve the maximum throughput for the changing workload. The middleware has a pool of connections to the DB, and it can control transaction admission to attain the optimal degree of concurrency. We are developing behavior models that will enable us to find the thrashing point dynamically and adapt the admission-control threshold accordingly (a sketch follows).
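A sketch of admission control over the connection pool with a retunable bound on active transactions. The resizable semaphore is our choice of mechanism; choosing the right limit is exactly what the behavior models are for:

```java
import java.util.concurrent.Semaphore;

final class AdmissionControl {
    /** Semaphore subclass exposing the protected reducePermits for shrinking the limit. */
    private static final class ResizableSemaphore extends Semaphore {
        ResizableSemaphore(int permits) { super(permits, true); }  // fair: FIFO admission
        void reduce(int n) { super.reducePermits(n); }
    }

    private final ResizableSemaphore slots;
    private int limit;

    AdmissionControl(int initialLimit) {
        this.limit = initialLimit;
        this.slots = new ResizableSemaphore(initialLimit);
    }

    void admit() throws InterruptedException { slots.acquire(); }  // blocks past the limit
    void done()                              { slots.release(); }

    /** Retune the bound toward the currently estimated thrashing point. */
    synchronized void setLimit(int newLimit) {
        if (newLimit > limit) slots.release(newLimit - limit);
        else if (newLimit < limit) slots.reduce(limit - newLimit);
        limit = newLimit;
    }
}
```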

Slide 21: Wide Area Replication
The underlying protocols of the middleware are amenable to use in a WAN. We are currently studying which new requirements arise in a WAN, to find problems that might require changes to the protocols. Replication across a WAN helps to survive catastrophic failures, and it is also needed by many multinational companies with branches spanning different countries.
– For the former scenario we contemplate a replica at each geographic location.
– For the latter scenario we contemplate a cluster at each geographic location.

Slide 22: Partial Replication
Scalability in the middleware, although good, is limited by the overhead of propagating the updates to all the replicas (see [SRDS'01] for an analytical model determining the precise scalability of the approach). This limitation can be overcome by means of partial replication: each partition can then be dynamically replicated to the optimal level. However, partial replication introduces new complications, such as queries spanning multiple partitions, which cannot be performed on a replica that does not hold a copy of all the accessed partitions.

Slide 23: Conclusions
Extensions to our previous work, together with the JDBC driver, will enable the use of our middleware approach to provide dynamically adaptable DB replication for JBoss. The flexibility of the middleware approach enables us to contribute to different issues regarding dynamic adaptability, such as online recovery, dynamic admission control, dynamic load balancing, dynamically changing the degree of partial replication, etc.

Slide 24: Optimistic Delivery [KPAS99] (timeline diagrams: (a) with a non-optimistic total-order multicast, transaction execution starts only after totally ordered delivery, so the ordering latency adds to the transaction latency; (b) with optimistic delivery, execution starts at opt-delivery and overlaps with the ordering phase, shortening the transaction latency)

Slide 25: Advantages of Optimistic Delivery
For the optimism to create problems, two things must happen at the same time:
– Messages get out of order (unlikely in a LAN).
– The corresponding transactions conflict.
The resulting probability is very low, and we can make it even lower (transaction reordering at the primary). The cost of group communication is thereby minimized. A sketch of the mechanism follows.
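A deliberately simplified sketch of how a replica can exploit this (our illustration; the Txn interface and the conflict check are placeholders): execute tentatively at opt-delivery, and make the result definitive at total-order delivery unless the final order disagrees with the optimistic one and the reordered transactions conflict.

```java
import java.util.ArrayList;
import java.util.List;

interface Txn {
    void executeTentatively();      // run against the DB; result not yet definitive
    void abortTentative();          // undo a tentative execution
    void commit();                  // make the tentative execution definitive
    boolean conflictsWith(Txn t);   // do the read/write sets intersect?
}

final class OptimisticDeliveryHandler {
    private final List<Txn> optimistic = new ArrayList<>();  // tentative execution order

    void onOptDeliver(Txn t) {
        optimistic.add(t);
        t.executeTentatively();      // overlap execution with the ordering phase
    }

    void onTotalOrderDeliver(Txn t) {
        if (!optimistic.isEmpty() && optimistic.get(0) == t) {
            optimistic.remove(0);
            t.commit();              // optimistic order confirmed: latency was hidden
            return;
        }
        // Final order differs from the optimistic one. This only matters if a
        // transaction tentatively executed ahead of t actually conflicts with it.
        boolean conflicting = optimistic.stream().anyMatch(o -> o != t && o.conflictsWith(t));
        optimistic.remove(t);
        if (conflicting) {
            t.abortTentative();      // rare case: out of order AND conflicting
            t.executeTentatively();  // redo in the definitive position
        }
        t.commit();
    }
}
```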

Slide 26: Experimental Setup
Database: PostgreSQL. Group communication: Ensemble. Network: 100 Mbit Ethernet. 15 database sites (each a Sun Ultra 5 running Solaris). Two kinds of transactions were used in the workload:
– Queries (reads only).
– Pure updates (writes only).

Slide 27: Experiments
#1: using replication does not make the system worse.
#2: adding more replicas increases the throughput of the system.
#3: the increase in throughput does not affect the response time.
#4: acceptable overhead in worst-case scenarios.
The dangers of replication: none of these statements is true in conventional eager replication protocols.

Slide 28: 1. Comparison with Distributed Locking (graph, load of 5 tps: distributed locking degrades very fast with an increasing number of replicas, while our middleware's response time is stable)

Slide 29: 2. Throughput Scalability (graph, 15 tps to 225 tps, with reference lines for scalability at 1/2 and 2/3 of the nominal capacity)

Slides 30-32: 3. Response Time Analysis (graphs only)

Slide 33: 4. Coordination Overhead (graph only)

Slide 34: Conclusions
Consistent replication can be implemented at the middleware level. Achieving efficiency requires understanding the dangers of replication:
– Only one message per transaction.
– An asymmetric system.
– Reduce communication latency.
– Reduce abort rates.
Our system demonstrates different ways to address all of these problems.

Slide 35: Ongoing Work
We are using the middleware to implement replication in object containers (e.g., J2EE, CORBA). Tests are under way to use the system to implement replication across the Internet. Porting the system to Spread [Amir et al.]. Load balancing for web servers based on replicated databases. Online recovery and dynamic system reconfiguration:
– DSN 2001 [Kemme, Bartoli, Babaoglu].
– SRDS 2002 [Jiménez, Patiño, Alonso].

Slide 36: Analytical vs. Empirical Measures (graph only)

Slide 37: How Will the Middleware Perform with Faster Databases?
A 1-update transaction took 10 ms to execute, whilst an 8-update transaction took 55 ms. This means that with a faster database, for transactions lasting within these ranges, we can obtain similar scalabilities (until some bottleneck is reached, most likely group communication). The determinant factor of scalability is the ratio between the cost of applying a transaction's updates and the cost of executing it in full; this factor, although it can be reduced, will always be significant (in Postgres it was 0.16 for 8-update transactions and 0.2 for 1-update transactions).
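Plugging those ratios into the first-order model sketched under slide 2 (our extrapolation, assuming a worst-case update-only workload, u = 1) gives the asymptotic speed-up over a single site:

\[ \lim_{n\to\infty} T_{\mathrm{asym}}(n) = \frac{C}{w} \;\Longrightarrow\; \tfrac{1}{0.2} = 5\times \text{ for 1-update transactions}, \qquad \tfrac{1}{0.16} = 6.25\times \text{ for 8-update transactions.} \]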

Slide 38: Background
Replication has been used for two different and mutually exclusive purposes in transactional systems:
– To increase availability (eager replication) by providing redundancy, at the cost of throughput and scalability.
– To increase throughput and scalability by distributing the work among replicas (lazy replication), at the cost of consistency.
We want both availability and performance. However, Gray stated in "The Dangers of Replication" (SIGMOD'96) that eager replication could not scale.

Slide 39: Motivation
Postgres-R [KA00] showed how to combine database replication with group communication to implement a scalable solution within a database. We extended this work [PJKA00] by exploring how to implement replication outside the database:
– The protocol is provably correct.
– It can be implemented as middleware.
– It scales (e.g., adding more sites increases the capacity).
In this talk we discuss the performance of this protocol as implemented on a cluster of computers connected through a LAN, and we show that it can be used in a wide range of applications.

Slide 40: Eager Data Replication
There is a copy of the database at each site. Every replica can perform update transactions (update everywhere). Transaction updates must be propagated to the rest of the replicas. Queries (read-only transactions) are executed at a single replica.

Slide 41: Understanding the Scalability of Data Replication
Each transaction executed by a site induces a load of one transaction on each other site. Assume sites with a processing capacity of 4 tps. In a symmetric system, the capacity of the whole system is at most the capacity of a single site: 4 tps.
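The arithmetic behind this claim: with n sites of capacity C = 4 tps each, and every transaction fully executed at all n sites, the aggregate capacity is

\[ T_{\mathrm{sym}}(n) = \frac{n \cdot C}{n} = C = 4\ \text{tps}, \]

independent of n: each added site brings 4 tps of capacity and immediately spends it re-executing the other sites' transactions.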

Slide 42: Asymmetric Systems
In an asymmetric system, the work performed by a replica consists of:
– Local transactions, i.e., transactions submitted to the replica itself.
– Remote transactions, i.e., update transactions submitted to other replicas (of which only the updates are applied).

Slide 43: A Middleware Replication Layer (diagram: two replica managers, X and Y, each consisting of a connection manager, a queue manager, and a communication manager on top of PostgreSQL, serving clients A-B and C-D respectively and linked by group communication)

Slide 44: A Middleware Replication Layer
The replication system has been implemented as a middleware layer that runs on top of off-the-shelf, non-distributed databases or other data stores (e.g., an object container such as CORBA). This layer only requires two simple services from the underlying data repository:
– get state: returns a list of the physical updates performed by a transaction.
– set state: applies the physical updates of a transaction at a replica.

Slide 45: Exp. 1: Comparison with Distributed Locking
In this experiment we compared our system with a commercial database using distributed locking and eager replication to guarantee full consistency of the replicas. A small load of 5 transactions per second was used.

Slide 46: Response Time Analysis
The goal of this experiment is to show that transaction latency remains stable for loads within the scalability interval. For each configuration and update rate, the load is increased until the response time degenerates.

Slide 47: Exp. 2: Throughput Scalability
This experiment tested how the throughput of the system varies with an increasing number of replicas. In particular, we wanted to know the power of the cluster relative to a single site.

Slide 48: Measuring the Overhead
The latency of short transactions is extremely sensitive to any overhead. The goal of this experiment is to measure how the response time is affected by the overhead introduced by the middleware layer. The shortest update transaction was used: a transaction with a single update.

Slide 49: Motivation and Background
Eager replication is the textbook approach to achieving availability, yet very few database products provide consistent replication. The reasons were explained by Gray in "The Dangers of Replication" (SIGMOD'96). Postgres-R [KA00] showed how to avoid these dangers and implement eager replication within a DB:
– It combines transaction processing and group communication.
– It uses asymmetric processing.
– It showed how to embed these techniques in a real database engine.

Slide 50: Motivation and Background
A subsequent approach explored scalable eager DB replication outside the DB, at the middleware level [DISC'00, ICDCS'02]. Experiments showed that it was possible to achieve replication at the middleware level with scalability close to that achieved within the database.

Slide 51: Two Crucial Issues
Processing should be asymmetric:
– otherwise it does not scale,
– but it is difficult to do outside the database.
Avoid the latency introduced by group communication (especially for large groups):
– otherwise the response time suffers,
– but we need the group communication semantics.

