
Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.




1 Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety

2 Replication for Performance – A single large server is expensive and has limited scalability.

3 DB Replication is Challenging – Single database system: large, persistent state; transactions; complex software. Replication challenges: maintain consistency; middleware replication.

4 Background – Replica 1 is a standalone DBMS.

5 Background – A load balancer sits in front of Replica 1, Replica 2, and Replica 3.

6 Read Tx – A read transaction T is routed by the load balancer to one replica; a read tx does not change DB state.

7 Update Tx 1/2 – An update tx T changes DB state; it executes at one replica and produces a writeset (ws).

8 Update Tx 1/2 – The writeset (ws) of T must be applied (or T committed) at every replica. Example: T1: { set x = 1 }.

9 Ordering Update Tx 2/2 – Two update txs may execute at different replicas; each produces a writeset that must reach all replicas.

10 Ordering Update Tx 2/2 – All replicas must commit the updates in the same order. Example: T1: { set x = 1 }, T2: { set x = 7 }; the final value of x depends on which commits last (a small sketch follows).
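A minimal Python sketch (not the Tashkent implementation; the Replica class and sequence numbers are illustrative) of why the commit order must be the same everywhere: each replica applies the middleware-assigned writeset sequence in order, so all replicas end with the same value of x.

```python
# Illustrative sketch: every replica applies writesets in the same
# middleware-assigned total order, so all replicas converge to one state.

class Replica:
    def __init__(self):
        self.state = {}          # in-memory stand-in for the database
        self.next_seq = 1        # next writeset sequence number to apply

    def apply(self, seq, writeset):
        # Writesets must be applied strictly in sequence order.
        assert seq == self.next_seq, "out-of-order writeset"
        self.state.update(writeset)
        self.next_seq += 1

# T1: { set x = 1 } then T2: { set x = 7 } -- same order at every replica.
replicas = [Replica() for _ in range(3)]
for seq, ws in [(1, {"x": 1}), (2, {"x": 7})]:
    for r in replicas:
        r.apply(seq, ws)

assert all(r.state["x"] == 7 for r in replicas)   # identical final state
```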

11 Sub-linear Scalability Wall – As replicas are added (Replica 4, ...), every replica must apply every writeset in order, so scaling becomes sub-linear.

12 This Talk – General scaling techniques: address fundamental bottlenecks; synergistic, implemented in middleware; evaluated experimentally.

13 Super-linear Scalability – [Throughput chart: 1X, 7X, 12X, 25X, 37X TPS.]

14 Big Picture: Let’s Oversimplify – Standalone DBMS work: reading (R), update (U), logging.

15 Big Picture: Let’s Oversimplify – Replica 1/N (traditional): the replicated system serves N.R reads and N.U updates; each replica does R reads, U updates, applies (N-1).ws writesets from the other replicas, and its own logging.

16 Big Picture: Let’s Oversimplify – Replica 1/N (optimized): the same N.R reads and N.U updates, but the per-replica costs drop to R*, U*, and (N-1).ws*.

17 Big Picture: Let’s Oversimplify – The reductions come from Uniting O & D, MALB, and Update Filtering.

18 Key Points – 1. Commit updates in order: serial synchronous disk writes → unite ordering and durability. 2. Load balancing: optimizing for equal load causes memory contention → MALB optimizes for in-memory execution. 3. Update propagation: propagating updates everywhere → update filtering propagates only where needed.

19 Roadmap – Ordering, load balancing, update propagation. First: ordering, i.e. commit updates in order.

20 Key Idea – Traditionally, commit ordering and durability are separated. Key idea: unite commit ordering and durability.

21 All Replicas Must Agree – All replicas agree on which update txs commit and on their commit order. The total order is determined by the middleware and followed by each replica; each replica performs durability locally for Tx A and Tx B.

22 Order Outside DBMS – Tx A and Tx B arrive; the ordering component in the middleware sits outside the DBMS, while each replica handles durability.

23 Order Outside DBMS – The middleware decides the total order A → B, and every replica must commit A before B.

24 Enforce External Commit Order – At a replica, a proxy receives Tx A and Tx B (ordered A → B) and runs them as Task A and Task B in the DBMS through its SQL interface.

25 Enforce External Commit Order – If both commits are issued at once, the DBMS may make B durable before A, i.e. commit order B → A.

26 Enforce External Commit Order – So the proxy cannot commit A and B concurrently!

27 Enforce Order = Serial Commit – The proxy commits A and waits for it to finish.

28 Enforce Order = Serial Commit – Only then does it commit B, so commits are fully serial (a sketch of such a proxy follows).
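A hedged Python sketch of what such a proxy might look like: it releases each commit to the DBMS only after the previous one in the external order has completed. The class name and the injected db_commit callback are hypothetical, not the actual Tashkent-MW API.

```python
import threading

class SerialCommitProxy:
    def __init__(self, db_commit):
        self.db_commit = db_commit          # blocking call: commits one tx at the DBMS
        self.lock = threading.Lock()
        self.expected = 0                   # position of the next tx allowed to commit
        self.done = threading.Condition(self.lock)

    def commit_in_order(self, position, tx_id):
        # Block until all earlier transactions in the external order have committed.
        with self.done:
            while position != self.expected:
                self.done.wait()
            self.db_commit(tx_id)           # serial, synchronous commit
            self.expected += 1
            self.done.notify_all()
```

Each transaction's worker thread would call commit_in_order with the position the middleware assigned to it, which is exactly what makes the commits serial.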

29 Commit Serialization is Slow – With commit order A → B → C, the proxy issues Commit A, Commit B, Commit C one at a time; the DBMS makes A durable, then B, then C, acknowledging each (Ack A, Ack B, Ack C) before the next commit can start, while the CPU sits idle.

30 Commit Serialization is Slow – Problem: durability and ordering are separated, which forces serial synchronous disk writes.

31 Unite D. & O. in Middleware – The proxy itself provides durability and ordering for A → B → C; durability is turned OFF at the DBMS, which now just commits A, B, C.

32 Unite D. & O. in Middleware – Solution: move durability to the middleware; with durability and ordering united there, the log writes for A, B, C can be group-committed.

33 Implementation: Uniting D & O in MW – The middleware logs tx effects, so durability of update txs is guaranteed in the middleware and durability is turned off at the database. The middleware performs durability and ordering united → group commit → fast. The database still commits update txs serially, but each commit is a quick main-memory operation (sketch below).
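A sketch of the middleware side of this idea, under the assumption that the database is configured with synchronous durability off: the middleware appends the ordered writesets to its own log and flushes a whole batch with a single fsync (group commit) before releasing the commits to the database. The file format and names are illustrative.

```python
import os

class MiddlewareLog:
    def __init__(self, path="mw-commit.log"):
        self.f = open(path, "ab")

    def group_commit(self, ordered_batch):
        """ordered_batch: list of (seq, tx_id, writeset_bytes) in commit order."""
        for seq, tx_id, ws in ordered_batch:
            header = b"%d %s %d\n" % (seq, tx_id.encode(), len(ws))
            self.f.write(header + ws + b"\n")
        self.f.flush()
        os.fsync(self.f.fileno())        # one disk write covers the whole batch
        # The batch is now durable and ordered.
        return [tx_id for _, tx_id, _ in ordered_batch]
```

Only after group_commit returns would the proxy release Commit A, Commit B, Commit C to the DBMS, where each commit is a quick in-memory operation because database-level durability is off.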

34 Uniting Improves Throughput – Metric: throughput. Workload: TPC-W Ordering (50% updates). System: Linux cluster, PostgreSQL, 16 replicas, serializable execution. [TPC-W throughput chart: 1X, 7X, 12X TPS.]

35 Roadmap – Ordering, load balancing, update propagation. Next: load balancing.

36 Key Idea – The load balancer traditionally places equal load on the replicas.

37 Key Idea – Instead of equal load, MALB (Memory-Aware Load Balancing) optimizes for in-memory execution.

38 How Does MALB Work? – Database: tables 1, 2, 3. Workload: tx A → tables 1, 2; tx B → tables 2, 3. A replica's memory cannot hold all three tables.

39 Read Data From Disk – Least-Loaded balancing sends the mixed stream A, B, A, B to both replicas, so each replica needs tables 1, 2, 3, which exceed its memory.

40 Read Data From Disk – Both replicas keep reading data from disk: slow.

41 Data Fits in Memory – MALB splits the stream A, B, A, B: all A's go to replica 1 and all B's to replica 2.

42 Data Fits in Memory – Replica 1 keeps tables 1, 2 in memory and replica 2 keeps tables 2, 3; each working set fits, so execution is fast. Open questions: where does the memory information come from, and what about many tx types and replicas?

43 Estimate Tx Memory Needs – Exploit the tx execution plan: which tables and indices are accessed, and their access pattern (linear scan vs. direct access). Combine with metadata from the database: sizes of tables and indices (sketch below).
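An illustrative Python sketch of such an estimate: relations that are scanned contribute their full size, relations reached by direct (index) access contribute little. The plan structure, names, and sizes below are hypothetical; in the real system they would come from the execution plan and database metadata.

```python
# Hypothetical table and index sizes (MB), standing in for database metadata.
TABLE_SIZES = {"orders": 800, "items": 1200, "customers": 400}
INDEX_SIZES = {"orders_pk": 60, "items_pk": 90, "customers_pk": 30}

def memory_needs(plan):
    """plan: list of (table, index, access) with access in {'scan', 'direct'}."""
    need = 0
    for table, index, access in plan:
        if access == "scan":                    # linear scan touches the whole relation
            need += TABLE_SIZES[table] + INDEX_SIZES.get(index, 0)
        else:                                   # direct access touches only a few pages
            need += INDEX_SIZES.get(index, 0) + 10
    return need

# e.g. a tx type that scans items but probes orders through its primary key:
tx_a_plan = [("items", "items_pk", "scan"), ("orders", "orders_pk", "direct")]
print(memory_needs(tx_a_plan))   # -> 1360 (MB), an estimate of that tx type's working set
```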

44 Grouping Transactions – Objective: construct tx groups that fit together in memory. Bin packing: item = a tx type's memory needs, bin = memory of a replica, heuristic = Best Fit Decreasing. Then allocate replicas to tx groups, adjusting for group loads (sketch below).
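A small sketch of the grouping step with Best Fit Decreasing, assuming per-tx-type memory estimates as the items and replica memory as the bin size; the load-based adjustment of how many replicas each group gets is omitted.

```python
def best_fit_decreasing(tx_memory, replica_mem):
    """tx_memory: dict tx_type -> estimated MB; replica_mem: MB per replica."""
    groups = []                                   # each group: {"free": MB, "txs": [...]}
    for tx, need in sorted(tx_memory.items(), key=lambda kv: -kv[1]):
        # Best fit: the group whose remaining memory is smallest but still sufficient.
        fitting = [g for g in groups if g["free"] >= need]
        if fitting:
            g = min(fitting, key=lambda g: g["free"])
        else:
            g = {"free": replica_mem, "txs": []}  # open a new group (a new replica)
            groups.append(g)
        g["txs"].append(tx)
        g["free"] -= need
    return [g["txs"] for g in groups]

# Hypothetical estimates (MB) and a 2000 MB replica:
print(best_fit_decreasing({"A": 1360, "B": 900, "C": 700, "D": 300}, 2000))
# -> [['A'], ['B', 'C', 'D']]   (each group fits in one replica's memory)
```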

45 MALB in Action – Transaction types A, B, C, D, E, F arrive at MALB.

46 MALB in Action – MALB estimates the memory needs for A, B, C, D, E, F.

47 MALB in Action – Bin packing produces the groups {A}, {B, C}, and {D, E, F}.

48 MALB in Action – Each group ({A}, {B, C}, {D, E, F}) is allocated its own replica, whose memory now holds that group's working set.

49 MALB Summary – Objective: optimize for in-memory execution. Method: estimate tx memory needs, construct tx groups, allocate replicas to tx groups.

50 Experimental Evaluation – Implementation: no change in consistency, still in middleware. Compare: United (efficient baseline system) vs. MALB (exploits working-set information). Same environment: Linux cluster running PostgreSQL; workload: TPC-W Ordering (50% update txs).

51 MALB Doubles Throughput – TPC-W Ordering, 16 replicas. [Throughput chart: 1X, 7X, 12X, 25X TPS; MALB gains 105% over the United baseline.]

52 MALB Doubles Throughput – [Charts: TPS (1X, 7X, 12X, 25X; +105%) alongside normalized read I/O.]

53 Big Gains with MALB – [Table: MALB gain as a function of memory size (Big/Small) and DB size (Big/Small); reported values: 0%, 4%, 12%, 29%, 45%, 48%, 75%, 105%, 182%.]

54 Big Gains with MALB – [Same table, annotated: small gains where the workload runs from memory, large gains where it runs from disk.]

55 Roadmap – Ordering, load balancing, update propagation. Next: update propagation.

56 Key Idea – Traditional: propagate updates everywhere. Update Filtering: propagate updates only to where they are needed.

57 Update Filtering Example – Same setup as before: tx A → tables 1, 2; tx B → tables 2, 3; MALB and UF (update filtering) sit in the middleware in front of replicas 1 and 2.

58 Update Filtering Example – MALB routes group A (tx A) to replica 1 and group B (tx B) to replica 2.

59 Update Filtering Example – An update to table 1 arrives.

60 Update Filtering Example – Without filtering, it would be propagated to replica 2 as well, even though group B never reads table 1.

61 Update Filtering Example – With update filtering, the update to table 1 is sent only to replica 1.

62 Update Filtering Example – Next, an update to table 3 arrives.

63 Update Filtering Example – It is sent only to replica 2, which serves group B (tables 2, 3).

64 Update Filtering Example – Each replica receives and applies only the updates to the tables its tx group needs (see the sketch below).
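A sketch of the filtering decision for the example above (replica 1 hosts group A's tables 1 and 2, replica 2 hosts group B's tables 2 and 3): a writeset is forwarded only to replicas that host a table it touches. Names and the table layout are illustrative.

```python
# Which tables each replica's tx group uses (from the example above).
REPLICA_TABLES = {
    "replica1": {"table1", "table2"},    # group A
    "replica2": {"table2", "table3"},    # group B
}

def destinations(writeset_tables):
    """Return the replicas that must receive a writeset touching these tables."""
    return [r for r, hosted in REPLICA_TABLES.items()
            if hosted & set(writeset_tables)]

print(destinations({"table1"}))   # -> ['replica1']              (filtered away from replica2)
print(destinations({"table3"}))   # -> ['replica2']
print(destinations({"table2"}))   # -> ['replica1', 'replica2']  (needed by both groups)
```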

65 Update Filtering in Action – [Diagram: many replicas, partitioned by the tables (red, green) their tx groups use.]

66 Update Filtering in Action – An update to the red table arrives.

67 Update Filtering in Action – An update to the green table arrives.

68 Update Filtering in Action – The red-table update is propagated only to the replicas that use the red table.

69 Update Filtering in Action – The green-table update is propagated only to the replicas that use the green table.

70 MALB+UF Triples Throughput – TPC-W Ordering, 16 replicas. [Throughput chart: 1X, 7X, 12X, 25X, 37X TPS; MALB+UF gains 49% over MALB.]

71 MALB+UF Triples Throughput – [Charts: TPS (1X, 7X, 12X, 25X, 37X; +49%) and propagated updates (15 vs. 7).]

72 Filtering Opportunities – [Chart: MALB+UF / MALB throughput ratio vs. update ratio, for the 50% Ordering mix and the 5% Browsing mix.]

73 Conclusions – 1. Commit updates in order: serial synchronous disk writes → unite ordering and durability. 2. Load balancing: optimizing for equal load causes memory contention → MALB optimizes for in-memory execution. 3. Update propagation: propagating updates everywhere → update filtering propagates only where needed.

