Database Replication Policies for Dynamic Content Applications Gokul Soundararajan, Cristiana Amza, Ashvin Goel University of Toronto EuroSys 2006: Leuven, Belgium April 19, 2006
2 Dynamic Content Web Server
4 Today’s Server Farms Data centers can run multiple applications E.g., IBM/HP Service providers can multiplex resources E.g., applications have peaks at different times Challenge: database server becomes the bottleneck
5 Motivation Scale the database backend on clusters Handle more clients Run multiple applications Handle failures in the backend Our approach: Database replication Dynamic replica allocation Adapt to changing load or failures
6 Database Replication Read-one, write-all Plattner & Alonso, MW 04 Lin et al., SIGMOD 05 Amza et al., ICDE 05 Scaling for E-Commerce (TPC-W)
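The read-one, write-all rule can be illustrated with a minimal routing sketch. The class name `RowaScheduler`, the round-robin read policy, and the "anything that is not a SELECT is a write" test are assumptions made for illustration; they are not taken from the cited systems.

```python
import itertools


class RowaScheduler:
    """Hypothetical scheduler: a read goes to one replica, a write to all."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        # Round-robin over replicas for reads (an assumed policy).
        self._next = itertools.cycle(self.replicas)

    def route(self, query):
        # Simplification: treat every statement that is not a SELECT as a write.
        if query.lstrip().upper().startswith("SELECT"):
            return [next(self._next)]
        return list(self.replicas)
```

With three replicas, successive reads would land on different machines, while each update is returned with the full replica list so that every copy stays consistent.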
7 Dynamic Replication Assume a cluster hosts 2 applications App1 (Red) using 2 machines App2 (Blue) using 2 machines Assume App1 has a load spike
8 Dynamic Replication Choose nr. of replicas to allocate to App1 Say, we adapt by allocating one more replica Then, two options App2 still uses two replicas (overlap replica sets) App2 loses one replica (disjoint replica sets)
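The two options can be written down as a small sketch. The function `add_replica`, the machine names, and the choice of which App2 machine to take are hypothetical; the sketch only shows how the replica sets differ between the overlapping and disjoint cases.

```python
def add_replica(app1, app2, overlap):
    """Give app1 one more replica taken from app2's machines.

    Returns the new (app1, app2) replica sets. With overlap=True app2 keeps
    the machine; with overlap=False app2 gives it up.
    """
    app1, app2 = set(app1), set(app2)
    candidates = sorted(app2 - app1)  # machines currently serving only app2
    if not candidates:
        raise ValueError("no machine left to allocate to app1")
    chosen = candidates[0]  # arbitrary but deterministic choice (assumption)
    app1.add(chosen)
    if not overlap:
        app2.discard(chosen)
    return app1, app2
```

Starting from two machines per application, the overlapping call leaves App2 with two replicas (one of them shared), while the disjoint call leaves App2 with a single replica.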
11 Challenges Adding a replica can take time Bring replica up-to-date Warm-up memory Can avoid adaptation with fully-overlapped replica sets
13 Challenges However, overlapping applications compete for memory causing interference Can avoid interference with disjoint replica sets Tradeoff between adaptation delay and interference
15 Insight for Dynamic Content Apps Database reads are much heavier than writes Reads are multi-table joins Writes are single row updates Overlapping reads – high interference Overlapping writes – little interference Solution: Separate reads and overlap writes
16 Our Solution – Partial Overlap Reads of applications sent to disjoint replica sets Avoids interference Read-Set Set of replicas where reads are sent
17 Our Solution – Partial Overlap Writes of apps sent to overlapping replica sets Reduces replica addition time Write-Set Set of replicas where writes are sent
18 Optimization For a given application, Replicas in Write-Set – Fully Up-to-Date Other Replicas – Periodic Batch Updates
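The read-set, write-set, and batch-update ideas can be sketched per application as follows. The class `PartialOverlapApp`, its method names, and the in-memory pending log are hypothetical. The sketch also assumes that the read-set is contained in the write-set, so that reads only reach fully up-to-date replicas.

```python
class PartialOverlapApp:
    """Hypothetical per-application routing state for partial overlap."""

    def __init__(self, read_set, write_set, all_replicas):
        if not set(read_set) <= set(write_set):
            # Assumption: reads must only reach fully up-to-date replicas.
            raise ValueError("read-set must be a subset of write-set")
        self.read_set = list(read_set)
        self.write_set = list(write_set)
        # Replicas outside the write-set receive updates in periodic batches.
        self.pending = {r: [] for r in all_replicas if r not in self.write_set}
        self._rr = 0

    def route_read(self):
        # Round-robin over the read-set (an assumed policy).
        replica = self.read_set[self._rr % len(self.read_set)]
        self._rr += 1
        return replica

    def route_write(self, stmt):
        # Log the write for lagging replicas, apply it now to the write-set.
        for log in self.pending.values():
            log.append(stmt)
        return list(self.write_set)

    def flush_batch(self, replica):
        # Hand over and clear the batch queued for one lagging replica.
        batch, self.pending[replica] = self.pending[replica], []
        return batch
```

Under these assumptions, a replica that is already in the write-set but not in the read-set is fully up to date, so adding it to the read-set would mainly cost memory warm-up rather than a full catch-up.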
19 When do we adapt? Add when application’s requirements not met Due to either load spikes or failures Remove when replica not needed Application requirements defined through a Service Level Agreement (SLA)
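A minimal sketch of this add/remove rule, assuming the SLA is a bound on average query latency. The function `decide` and the `low_fraction` threshold used to judge that a replica is "not needed" are hypothetical; the slides do not state how that condition is detected.

```python
def decide(avg_latency_ms, sla_bound_ms, low_fraction=0.5):
    """Return 'add', 'remove', or 'hold' for one application."""
    if avg_latency_ms > sla_bound_ms:
        return "add"  # requirement not met: load spike or failure
    if avg_latency_ms < low_fraction * sla_bound_ms:
        return "remove"  # comfortably under the bound (assumed criterion)
    return "hold"
```

For example, with a 600 ms bound, a measured 700 ms would request an add, and 400 ms would leave the allocation unchanged.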
21 Resource Manager Feedback Loop Global Resource Manager Monitor Analyze Request Add/Remove Execute When does the feedback loop end?
22 Possible Oscillations Change not seen immediately Replica addition takes time Bring replica fully up-to-date, warm-up memory May trigger more adds Oscillations cause interference between applications Global Resource Manager Monitor Analyze Request Add/Remove Execute
23 Avoiding Oscillations Delay-Awareness Use load-balance as heuristic for stabilization after replica addition Removes are conservative Tentative removes Global Resource Manager Monitor Analyze Request Add/Remove Execute
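The delay-aware add path can be sketched as a small state machine: after requesting a replica, the manager ignores further SLA violations until the load across replicas looks balanced. The class `DelayAwareManager`, the imbalance metric, and the `balance_tol` value are assumptions for illustration; tentative removes are not modeled here.

```python
class DelayAwareManager:
    """Hypothetical feedback loop that waits for a new replica to settle."""

    def __init__(self, sla_ms, balance_tol=0.2):
        self.sla_ms = sla_ms
        self.balance_tol = balance_tol
        self.adding = False  # True while a requested replica is warming up

    @staticmethod
    def imbalance(loads):
        # Spread of per-replica load relative to the mean (assumed metric).
        mean = sum(loads) / len(loads)
        return 0.0 if mean == 0 else (max(loads) - min(loads)) / mean

    def step(self, latency_ms, loads):
        if self.adding:
            if self.imbalance(loads) > self.balance_tol:
                return "wait"  # new replica not yet carrying its share
            self.adding = False  # load is balanced: addition has stabilized
        if latency_ms > self.sla_ms:
            self.adding = True
            return "add"
        return "hold"
```

The design choice is that a latency violation observed while a replica is still warming up is treated as stale information, which avoids requesting extra replicas that would later have to be removed.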
24 Cluster Architecture
25 Experimental Setup Hardware AMD Athlon running at 2.1 GHz 512 MB of RAM 60 GB Hard Drive Software RedHat Fedora Core 2 Linux Apache with PHP 4.0 MySQL with InnoDB tables Benchmarks TPC-W: E-Commerce Retail Store RUBIS: Online Bidding
26 Outline of Results Defined SLA in terms of query latency bound Query latency < 600 ms Cluster Size Up to 8 database replicas 10 web/application servers Experiments Interference between Workloads Adapting to Load Changes Adapting to Faults
27 Disjoint
28 Partial Overlap
29 Full Overlap
30 Interference
31 Adaptation to Load Changes
32 Adapting to Load Changes Three schemes Disjoint – 4/4 Dynamic allocation using Partial overlap Full Overlap – 8/8
33 Disjoint (figure panels: TPC-W, RUBIS)
34 Full Overlap (figure panels: TPC-W, RUBIS)
35 Partial Overlap (figure panels: TPC-W, RUBIS)
36 Adaptation to Faults
38 More Results - In the Paper More complex load scenarios Including overload Effect of delay-awareness Avoiding oscillations
39 Conclusion Database replication Handle more clients Dynamic replica allocation Handle multiple workloads with different peaks Handle faults
40 Thanks!