Database Replication in Tashkent
CSEP 545 Transaction Processing
Sameh Elnikety
Replication for Performance
– Scaling up a single machine: expensive, limited scalability
– Replicate instead
DB Replication is Challenging
Single database system:
– Large, persistent state
– Transactions
– Complex software
Replication challenges:
– Maintain consistency
– Middleware replication
Background
[Diagram: a standalone DBMS serves as Replica 1.]
Background
[Diagram: a load balancer in front of Replicas 1, 2, and 3.]
Read Tx
A read transaction T does not change DB state, so the load balancer sends it to a single replica.
[Diagram: T routed to one of Replicas 1–3.]
Update Tx (1/2)
An update transaction T changes DB state. Executing T at one replica produces a writeset (ws).
[Diagram: the load balancer routes T to one replica, which produces ws.]
Update Tx (1/2), continued
The writeset of T must be applied (or committed) everywhere. Example: T1: { set x = 1 }.
[Diagram: ws propagated from the executing replica to all other replicas.]
Ordering Update Tx (2/2)
Multiple update transactions execute concurrently, each producing a writeset that propagates to all replicas.
[Diagram: two transactions in flight, writesets fanning out to Replicas 1–3.]
Ordering Update Tx (2/2), continued
Replicas must commit updates in the same order. Example: T1: { set x = 1 }, T2: { set x = 7 }. If replicas apply these writesets in different orders, some end with x = 1 and others with x = 7, as the tiny example below shows.
[Diagram: writesets of both transactions arriving at every replica.]
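A tiny illustration of the divergence (plain Python standing in for replica state; not from the talk):

```python
# Applying the same writesets in different orders leaves replicas in
# different states, which is why all replicas must commit in one order.
t1 = {"x": 1}            # T1: set x = 1
t2 = {"x": 7}            # T2: set x = 7

replica1, replica2 = {}, {}
for ws in (t1, t2):      # Replica 1 applies T1 then T2
    replica1.update(ws)
for ws in (t2, t1):      # Replica 2 applies T2 then T1
    replica2.update(ws)
print(replica1, replica2)  # {'x': 7} {'x': 1} -> replicas diverge
```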
Sub-linear Scalability Wall
As replicas are added (Replica 4, ...), every replica must still apply every writeset in order, so scalability is sub-linear; a back-of-the-envelope model follows.
[Diagram: writesets fanning out to all four replicas.]
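A rough model shows where the wall comes from (all symbols here are assumptions for illustration, not from the talk): let $C$ be one replica's capacity, $f_r, f_u$ the read/update fractions of the workload, $r, u$ the costs of executing a read or update tx, and $w$ the cost of applying one writeset. Each replica executes its $1/N$ share of the load but also applies the other replicas' writesets, so system throughput is

$$
T(N) \;=\; \frac{N\,C}{f_r\,r + f_u\,u + (N-1)\,f_u\,w}
\;\xrightarrow{\;N\to\infty\;}\; \frac{C}{f_u\,w},
$$

a hard ceiling set entirely by writeset application: adding replicas beyond that point buys almost nothing.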
This Talk
General scaling techniques:
– Address fundamental bottlenecks
– Synergistic, implemented in middleware
– Evaluated experimentally
Super-linear Scalability
[Chart: throughput (TPS) relative to a standalone DBMS: 1X, 7X, 12X, 25X, and 37X. The rest of the talk builds up to the 37X configuration, which exceeds linear speedup on 16 replicas.]
Big Picture: Let's Oversimplify
– Standalone DBMS: its capacity goes to reading (R), updates (U), and logging.
– Replica 1/N (traditional): the system as a whole serves N·R reads and N·U updates, so each replica executes R reads and U updates of its own, plus applies the writesets of the other replicas: (N−1)·ws.
– Replica 1/N (optimized): the same work shrinks to R*, U*, and (N−1)·ws* using the techniques in this talk: MALB, update filtering, and uniting ordering & durability (O & D).
Key Points
1. Commit updates in order
   – Problem: commits require serial synchronous disk writes
   – Solution: unite ordering and durability
2. Load balancing
   – Problem: optimizing for equal load causes memory contention
   – Solution: MALB optimizes for in-memory execution
3. Update propagation
   – Problem: updates are propagated everywhere
   – Solution: update filtering propagates them only where needed
Roadmap
– Ordering: commit updates in order ← this part
– Load balancing
– Update propagation
[Diagram: Tx A arriving at the load balancer in front of Replicas 1–3.]
Key Idea
– Traditionally: commit ordering and durability are separated
– Key idea: unite commit ordering and durability
All Replicas Must Agree
All replicas agree on:
– which update txs commit
– their commit order
Total order:
– determined by the middleware
– followed by each replica
[Diagram: Tx A and Tx B headed for the durability layer of Replicas 1–3.]
Order Outside DBMS
The middleware assigns a total order, Tx A before Tx B, and hands that order (A, B) to every replica to follow.
[Diagram: the sequence A, B queued at the ordering layer above the durability layer of Replicas 1–3.]
Enforce External Commit Order
A proxy in front of each DBMS submits Tx A and Tx B as tasks through the SQL interface. The DBMS schedules commits internally, so if both are submitted concurrently it may commit B before A: the proxy cannot commit A and B concurrently and still guarantee the external order.
[Diagram: proxy above the DBMS; external order A, B vs. internal commit order B, A.]
Enforce Order = Serial Commit
To guarantee the external order, the proxy commits A, waits for its acknowledgment, and only then commits B. Commits become serial (sketched below).
[Diagram: proxy issuing commit A, then commit B, one at a time.]
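A minimal sketch of this naive proxy loop (hypothetical `db` handle; not the Tashkent code):

```python
# Naive proxy: enforce the middleware's commit order by serializing
# commits. 'db.commit' is a hypothetical synchronous call that returns
# only after the DBMS has made the transaction durable on disk.
def commit_in_order(db, ordered_txids):
    for txid in ordered_txids:   # total order from the middleware
        db.commit(txid)          # blocks on a synchronous disk write
        # The next commit cannot start until this one is acknowledged,
        # so throughput is limited to one log write per transaction.
```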
Commit Serialization is Slow
Each commit includes a synchronous disk write for durability: commit A → disk write → ack A, then commit B, then commit C. The CPU sits idle while each log write completes.
Problem: durability and ordering are separated → serial disk writes.
[Diagram: timeline of commits A, B, C and their acks at the proxy and DBMS, each waiting on its own disk write.]
Unite Durability & Ordering in Middleware
Turn durability off at the database and let the middleware log transactions in commit order; commits A, B, C are made durable by one log write and acknowledged together.
Solution: move durability to the middleware. Durability & ordering united in the middleware → group commit.
[Diagram: the proxy logs A, B, C in one batch; DBMS durability is OFF.]
Implementation: Uniting D & O in MW
– Middleware logs tx effects: durability of update txs is guaranteed in the middleware, so durability is turned off at the database
– Middleware performs durability & ordering: united → group commit → fast
– Database commits update txs serially, but each commit is now a quick main-memory operation
A sketch of the idea follows.
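To make the group-commit idea concrete, here is a minimal sketch, assuming a middleware that receives writesets already in the global commit order (names and structure are illustrative, not the actual Tashkent implementation):

```python
import os
import threading

class GroupCommitLog:
    """Middleware log uniting ordering and durability: commit records
    are appended in commit order and made durable with one fsync per
    batch (group commit)."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
        self.cond = threading.Condition()
        self.pending = []        # (txid, writeset), in commit order
        self.durable_upto = 0    # all txids <= this are on disk

    def submit(self, txid, writeset: bytes):
        """Called once per update tx, in increasing txid order
        (the middleware's total order); blocks until durable."""
        with self.cond:
            self.pending.append((txid, writeset))
            self.cond.notify_all()        # wake the flusher
            while self.durable_upto < txid:
                self.cond.wait()

    def flusher(self):
        """Background thread: one synchronous write covers a batch."""
        while True:
            with self.cond:
                while not self.pending:
                    self.cond.wait()
                batch, self.pending = self.pending, []
            os.write(self.fd, b"".join(ws for _, ws in batch))
            os.fsync(self.fd)             # single disk write per batch
            with self.cond:
                self.durable_upto = batch[-1][0]
                self.cond.notify_all()    # release the waiting txs
```

Many transactions share one fsync, so the cost of durability is amortized across the batch while the on-disk order matches the middleware's total order; the database itself can then commit each tx as a fast in-memory operation.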
Uniting Improves Throughput
– Metric: throughput (TPS)
– Workload: TPC-W Ordering (50% updates)
– System: Linux cluster, PostgreSQL, 16 replicas, serializable execution
[Chart: 1X standalone, 7X for the traditional system, 12X with uniting.]
Roadmap
– Ordering: commit updates in order (done)
– Load balancing ← this part
– Update propagation
Key Idea
– Traditional load balancing: equal load on all replicas
– MALB (Memory-Aware Load Balancing): optimize for in-memory execution
[Diagram: load balancer in front of two replicas, each with memory and disk.]
How Does MALB Work?
[Diagram: the database holds tables 1, 2, and 3; a replica's memory fits only two of them. Workload: tx A accesses tables 2 and 1; tx B accesses tables 2 and 3.]
Read Data From Disk
A least-loaded balancer sends the interleaved mix A, B, A, B to both replicas. Each replica then needs tables 1, 2, and 3 resident at once, more than its memory holds, so both replicas keep reading data from disk. Slow.
[Diagram: both replicas cycling tables 1, 2, 3 through memory.]
Data Fits in Memory
MALB instead sends A, A, A, A to Replica 1 and B, B, B, B to Replica 2. Replica 1 caches tables 2 and 1; Replica 2 caches tables 2 and 3; each working set fits in memory. Fast.
Remaining questions: where does the memory information come from, and how does this work with many tx types and replicas?
[Diagram: each replica serving its tx group from memory.]
Estimate Tx Memory Needs
Exploit the tx execution plan:
– which tables & indices are accessed
– their access pattern: linear scan vs. direct access
Metadata from the database:
– sizes of tables and indices
(See the sketch below.)
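A toy sketch of such an estimate (table names, sizes, and the 10% hot fraction for direct access are assumptions for illustration, not values from the talk):

```python
# Estimate a transaction's memory needs from its execution plan and
# catalog metadata (hypothetical names; not the Tashkent code).

# Catalog metadata: relation name -> size in MB (from the database).
RELATION_SIZE_MB = {"orders": 600, "orders_pk": 40, "items": 900}

def memory_needs_mb(plan):
    """plan: list of (relation, access_pattern) pairs from the plan.
    A linear scan touches the whole relation; direct (index) access
    touches only a fraction of it."""
    total = 0.0
    for relation, pattern in plan:
        size = RELATION_SIZE_MB[relation]
        if pattern == "linear_scan":
            total += size          # whole relation is read
        else:                      # "direct_access"
            total += 0.1 * size    # assumed hot fraction
    return total

# Example: a tx that scans 'orders' and probes 'items' via an index.
plan_a = [("orders", "linear_scan"), ("orders_pk", "direct_access"),
          ("items", "direct_access")]
print(memory_needs_mb(plan_a))     # 600 + 4 + 90 = 694 MB
```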
Grouping Transactions
Objective: construct tx groups that fit together in memory.
Bin packing (sketched below):
– item: a tx type's memory needs
– bin: the memory of a replica
– heuristic: Best Fit Decreasing
Then allocate replicas to tx groups, adjusting for group loads.
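A minimal Best Fit Decreasing sketch under these definitions (illustrative only; the real MALB heuristic also balances load across groups):

```python
# Group tx types so each group's combined working set fits in one
# replica's memory, using the Best Fit Decreasing bin-packing heuristic.

def best_fit_decreasing(needs_mb, replica_mem_mb):
    """needs_mb: {tx_type: estimated memory in MB}.
    Returns a list of groups (lists of tx types)."""
    groups, residual = [], []   # residual[i] = free memory in group i
    # Decreasing: place the largest working sets first.
    for tx, need in sorted(needs_mb.items(), key=lambda kv: -kv[1]):
        # Best fit: the group with the least leftover room that fits.
        best = min((i for i in range(len(groups)) if residual[i] >= need),
                   key=lambda i: residual[i], default=None)
        if best is None:        # nothing fits: open a new group
            groups.append([tx])
            residual.append(replica_mem_mb - need)
        else:
            groups[best].append(tx)
            residual[best] -= need
    return groups

# Example: replicas with 1 GB of memory.
print(best_fit_decreasing({"A": 700, "B": 500, "C": 300, "D": 200}, 1024))
# -> [['A', 'C'], ['B', 'D']]: each group fits in one replica's memory
```

Placing big items first and packing each into the tightest remaining bin keeps groups dense, so fewer groups (and hence fewer dedicated replica sets) are opened overall.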
MALB in Action
Given tx types A–F, MALB estimates the memory needs of each, packs them into groups that fit in a replica's memory (Group {A}, Group {B, C}, Group {D, E, F}), and allocates replicas to the groups.
[Diagram: tx types flowing through MALB into three groups, each assigned replicas with their tables on disk and in memory.]
MALB Summary
Objective: optimize for in-memory execution.
Method:
– estimate tx memory needs
– construct tx groups
– allocate replicas to tx groups
Experimental Evaluation
Implementation:
– no change in consistency
– still in the middleware
Compared systems:
– United: the efficient baseline system
– MALB: exploits working-set information
Same environment:
– Linux cluster running PostgreSQL
– workload: TPC-W Ordering (50% update txs)
MALB Doubles Throughput
TPC-W Ordering, 16 replicas: MALB reaches 25X versus 12X for United, a 105% improvement, with a corresponding drop in read I/O.
[Charts: TPS (1X, 7X, 12X, 25X) and read I/O, normalized.]
Big Gains with MALB
[Table: MALB's gain over United as memory size and database size vary: 0–4% when the database runs entirely from memory, 12% when it runs entirely from disk, and intermediate points of 29%, 45%, 48%, 75%, 105%, and 182% in between. Gains peak when memory holds some, but not all, of the working set.]
Roadmap
– Ordering: commit updates in order (done)
– Load balancing (done)
– Update propagation ← this part
Key Idea
– Traditional: propagate updates everywhere
– Update filtering: propagate updates only to where they are needed
Update Filtering Example
With MALB + update filtering (UF), Replica 1 serves Group A (tables 2 and 1 in memory) and Replica 2 serves Group B (tables 2 and 3 in memory). An update to table 1 is now propagated only to Replica 1, and an update to table 3 only to Replica 2; neither replica applies writesets for tables its group does not use. (A routing sketch follows.)
[Diagram: writesets routed through MALB + UF only to the replicas that need them.]
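A minimal sketch of this routing decision, assuming the middleware knows which tables each writeset touches and which tables each replica's group uses (names are illustrative, not the Tashkent code):

```python
# Update filtering: forward a writeset only to replicas whose tx group
# reads the tables the writeset modifies.

# Which tables each replica's tx group accesses (from MALB's grouping).
REPLICA_TABLES = {
    "replica1": {"table1", "table2"},   # serves Group A
    "replica2": {"table2", "table3"},   # serves Group B
}

def route_writeset(ws_tables, send):
    """ws_tables: set of tables the writeset modifies.
    send(replica): propagate the writeset to that replica."""
    for replica, tables in REPLICA_TABLES.items():
        if ws_tables & tables:          # replica needs this update
            send(replica)

route_writeset({"table1"}, print)   # -> replica1 only
route_writeset({"table2"}, print)   # -> both (shared table)
route_writeset({"table3"}, print)   # -> replica2 only
```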
Update Filtering in Action
[Animation: an update to the red table is propagated only to the replicas whose groups use the red table; an update to the green table only to those using the green table.]
MALB+UF Triples Throughput
TPC-W Ordering, 16 replicas: MALB+UF reaches 37X, 49% above MALB's 25X, because each update is propagated to about 7 replicas instead of all 15 others.
[Charts: TPS (1X, 7X, 12X, 25X, 37X) and the number of replicas each update is propagated to.]
Filtering Opportunities
[Chart: MALB+UF / MALB throughput ratio versus the updates ratio, for the 50% Ordering mix and the 5% Browsing mix.]
Conclusions
1. Commit updates in order
   – Problem: commits require serial synchronous disk writes
   – Solution: unite ordering and durability
2. Load balancing
   – Problem: optimizing for equal load causes memory contention
   – Solution: MALB optimizes for in-memory execution
3. Update propagation
   – Problem: updates are propagated everywhere
   – Solution: update filtering propagates them only where needed