1 Tashkent: Uniting Durability & Ordering in Replicated Databases Sameh Elnikety, EPFL Steven Dropsho, EPFL Fernando Pedone, USI.

2 Write-Many Replicated Database
All replicas agree on
– which update txs commit
– their commit order
Total order
– Determined by middleware
– Followed by each replica
[Diagram: Tx A and Tx B submitted to Replicas 1, 2 and 3, each with its own durability.]

3 Order Determined Outside DB
[Diagram: the replication MW (global ordering) receives Tx A and Tx B and fixes the order A → B; Replicas 1, 2 and 3, each with durability, apply A → B.]

4 Enforce External Commit Order
Middleware commit order: A → B
[Diagram: middleware, proxy, and database (SQL interface, durability) at one replica; Task A and Task B carry Tx A and Tx B, and the database side is labeled "B A".]
Cannot commit A & B concurrently! Must serialize.

5 Enforce Order = Serial Commit
Middleware commit order: A → B
[Diagram: middleware, proxy, and database (SQL interface, durability) at one replica; Task A and Task B carry Tx A and Tx B, and the database side is labeled "A B".]
Serialization is slow.

6 Commit Serialization is Slow
Middleware order: A → B → C; commit order: A → B → C
[Timeline: the proxy issues Commit A, Commit B, Commit C one at a time; the database interleaves CPU work with three separate durability steps (A, then A → B, then A → B → C) and returns Ack A, Ack B, Ack C.]
Root cause: Durability & ordering separated → serial disk writes
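The cost difference behind this slide can be sketched by counting log flushes. This is a minimal illustration, not the authors' code: the function names and the `batch_size` parameter are hypothetical, and a "flush" here is just a counter standing in for a synchronous disk write.

```python
def serial_commit_flushes(order):
    """Serial commit: every transaction pays for its own log flush."""
    flushes = 0
    acks = []
    for tx in order:
        flushes += 1      # one synchronous disk write per transaction
        acks.append(tx)   # ack only after that write completes
    return flushes, acks


def group_commit_flushes(order, batch_size):
    """Group commit: several commit records share one log flush."""
    flushes = 0
    acks = []
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        flushes += 1          # one disk write covers the whole batch
        acks.extend(batch)    # acks still follow the agreed order
    return flushes, acks


serial = serial_commit_flushes(["A", "B", "C"])
grouped = group_commit_flushes(["A", "B", "C"], batch_size=3)
```

Both variants acknowledge A, B, C in the same order; only the number of disk writes differs.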

7 Solution: Unite Durability & Ordering
1- Pass order info to DB (unite in DB): the middleware (ordering) sends the order to each replica, which keeps durability on
2- Move durability to MW: the middleware performs ordering and durability; replicas run with durability OFF

8 1- Unite Dur. & Ord. in Database
Middleware order: A → B → C; commit order: A → B → C
[Timeline: the proxy passes the order with each commit (Commit A at 1, Commit B at 2, Commit C at 3); the database performs one durability step covering A → B → C, then returns Ack A, Ack B, Ack C.]
Solution 1: pass order info to DB
Durability & ordering in database → group commit
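One way to picture "Commit A at 1, Commit B at 2, Commit C at 3" is a database-side buffer that accepts commits tagged with a sequence number and makes every contiguous run durable in a single write. The sketch below is an assumption-laden illustration: `OrderedCommitter`, `commit_at` and `flush` are hypothetical names, not the Tashkent-API interface, and the "log" is an in-memory list.

```python
class OrderedCommitter:
    """Buffers commits tagged with a global sequence number (hypothetical API)."""

    def __init__(self):
        self.next_seq = 1   # next position in the global order to make durable
        self.pending = {}   # seq -> tx, may arrive in any order
        self.log = []       # stand-in for the durable commit log
        self.flushes = 0    # number of simulated disk writes

    def commit_at(self, tx, seq):
        """Register a commit request together with its position in the order."""
        self.pending[seq] = tx

    def flush(self):
        """Write every commit that is next in order, using one simulated flush."""
        batch = []
        while self.next_seq in self.pending:
            batch.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        if batch:
            self.log.extend(batch)
            self.flushes += 1
        return batch


committer = OrderedCommitter()
committer.commit_at("B", 2)
committer.commit_at("C", 3)
early = committer.flush()    # nothing can commit yet: position 1 is missing
committer.commit_at("A", 1)
batch = committer.flush()    # A, B and C become durable together
```

Because the database knows the order, it no longer needs the proxy to wait for each acknowledgment before sending the next commit.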

9 Solution: Unite Durability & Ordering
1- Pass order info to DB (unite in DB): the middleware (ordering) sends the order to each replica, which keeps durability on
2- Move durability to MW: the middleware performs ordering and durability; replicas run with durability OFF

10 2- Unite D. & O. in Middleware
Middleware order: A → B → C; commit order: A → B → C
[Timeline: the middleware performs one durability step covering A → B → C; the proxy issues Commit A, Commit B, Commit C to the database, which runs with durability OFF, and receives Ack A, Ack B, Ack C.]
Solution 2: move durability to MW
Durability & ordering in middleware → group commit

11 Roadmap
Durability & ordering
– Separated → serial commit → slow
– United → group commit → fast
Two implementations
– Tashkent-API: united in DB
– Tashkent-MW: united in MW
Tashkent-MW
– Implementation
– Recovery
– Performance

12 Tashkent-MW
[Diagram: the replication MW (global ordering) orders Tx A, Tx B and Tx C as A → B → C and holds durability for A → B → C; Replicas 1, 2 and 3 apply A → B → C with durability OFF.]

13 Tashkent-MW: Durability & Ordering in Middleware
Middleware logs tx effects
– Durability of update txs guaranteed in middleware
– Turn durability off at database
Middleware performs durability & ordering
– United → group commit → fast
Database commits update txs serially
– Commit = quick main-memory operation
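The division of labor on this slide can be sketched as follows: the middleware appends a batch of transaction effects to its own log with a single `fsync`, then the replica applies them one by one in memory. This is a minimal sketch under stated assumptions; `MiddlewareLog`, `commit_batch`, the JSON record layout and the dict standing in for a replica are all hypothetical.

```python
import json
import os
import tempfile


class MiddlewareLog:
    """Append-only log of transaction effects kept by the middleware (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.fsyncs = 0

    def append_batch(self, writesets):
        """Write all records of the batch, then force them to disk once."""
        with open(self.path, "a", encoding="utf-8") as f:
            for ws in writesets:
                f.write(json.dumps(ws) + "\n")
            f.flush()
            os.fsync(f.fileno())   # one disk write for the whole batch
        self.fsyncs += 1


def commit_batch(log, replica, writesets):
    """Durability and ordering in the middleware, serial in-memory commits at the replica."""
    log.append_batch(writesets)       # durable, in the agreed order
    for ws in writesets:              # replica has durability off:
        replica.update(ws["writes"])  # each commit is a memory operation


with tempfile.TemporaryDirectory() as tmp:
    mw_log = MiddlewareLog(os.path.join(tmp, "mw.log"))
    replica = {}
    commit_batch(mw_log, replica, [
        {"tx": "A", "writes": {"x": 1}},
        {"tx": "B", "writes": {"x": 2}},
        {"tx": "C", "writes": {"y": 3}},
    ])
    with open(mw_log.path, encoding="utf-8") as f:
        logged = [json.loads(line)["tx"] for line in f]
```

Three transactions commit serially at the replica, yet only one synchronous write is issued.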

14 Recovery in Tashkent-MW
[Diagram: replication MW (global ordering) with durability; Replicas 1, 2 and 3 with durability OFF.]

15 Standard Database I/O
[Diagram: database with data in memory and data plus log on disk; Tx A updates A, then a crash occurs ("A bad").]
Log flushed for
1- Durability
2- Allowing dirty data pages to be cleaned (physical integrity)

16 Database I/O with Durability=off
Middleware order: A → B → C; middleware durability: A
[Diagram: database with data in memory and data plus log on disk; Tx A updates A, then a crash occurs ("A bad").]
Simple solution: recover from a data dump (checkpoint)
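A rough sketch of recovering from a data dump: restore the checkpointed state, then reapply whatever the middleware logged after that checkpoint, in the global order. The replay step, the `last_seq` field and the tuple layout of the log are assumptions made for illustration; the slide itself only names the data dump.

```python
def recover(checkpoint, mw_log):
    """Rebuild a replica from a dump plus the middleware's ordered log (sketch)."""
    state = dict(checkpoint["state"])   # start from a copy of the dumped data
    for seq, _tx, writes in mw_log:     # mw_log is assumed sorted by seq
        if seq > checkpoint["last_seq"]:
            state.update(writes)        # reapply effects logged after the dump
    return state


checkpoint = {"last_seq": 1, "state": {"x": 1}}   # dump taken after Tx A
mw_log = [
    (1, "A", {"x": 1}),
    (2, "B", {"x": 2}),
    (3, "C", {"y": 3}),
]
recovered = recover(checkpoint, mw_log)
```

The database's own possibly corrupt data files play no part here, which is why running with durability off is tolerable in this sketch.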

17 Roadmap
Durability & ordering
– Separated → serial commit → slow
– United → group commit → fast
Two implementations
– Tashkent-API: united in DB
– Tashkent-MW: united in MW
Tashkent-MW
– Implementation
– Recovery
– Performance

18 Performance - Setup
Metrics:
– Throughput
– Response time
Workload:
– AllUpdates: tx = {1 update}, mix = 100% updates
– TPC-B: tx = {4 updates, 1 read}, mix = 100% updates
– TPC-W: mix of long & short txs
System configuration:
– Linux cluster running PostgreSQL

19 AllUpdates Throughput

20 AllUpdates Throughput

21 AllUpdates Throughput

22 AllUpdates Response Time

23 In the Paper
Design & implementation
– Tashkent-API
Performance results
– TPC-B & TPC-W
– Recovery times
– Other I/O subsystems

24 Conclusions
Durability & ordering
– Separated → serial commit → slow
– United → group commit → fast
Two implementations
– Tashkent-API: united in DB
– Tashkent-MW: united in MW
Tashkent-MW system
– Pure middleware replication
– Significant performance improvement

27 Concurrency Control
Generalized Snapshot Isolation – GSI
Conclusions valid whenever replicas agree
1- on which update transactions commit
2- on their commit order
Example (bank database)
– T1: set balance = $1000
– T2: set balance = $2000
– Replica1: sees T1 then T2 → balance = $2000
– Replica2: sees T2 then T1 → balance = $1000
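The bank example can be replayed in a few lines to show why replicas must agree on commit order. The function name `apply_in_order` and the tuple representation of a transaction are illustrative choices only.

```python
def apply_in_order(updates):
    """Apply blind 'set balance' updates in the given order; return the final balance."""
    balance = None
    for _name, new_balance in updates:
        balance = new_balance   # each update overwrites the previous value
    return balance


T1 = ("T1", 1000)   # set balance = $1000
T2 = ("T2", 2000)   # set balance = $2000

replica1_balance = apply_in_order([T1, T2])   # sees T1 then T2
replica2_balance = apply_in_order([T2, T1])   # sees T2 then T1
```

The two replicas commit the same transactions yet end with different balances, so agreeing on the set of committed transactions is not enough without also agreeing on their order.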

28 Durability and Ordering 1/2
[Diagram: Replica 1 (proxy, database) sends T4 and T9 to the certifier. Cert. log: T4 T9. DB1 log: T4 T9.]
Scalability problem: one write per trans.

29 Durability and Ordering 2/2
[Diagram: Replica 1 (T4, T9), Replica 2 (T3, T8), and further replicas (Ti's) send transactions to the certifier. Cert. log: T1 T2 T3 T4 T5 T6 T7 T8 T9, one disk write. DB1 log: T1,T2,T3 T4 T5,T6,T7,T8 T9.]
Scalability problem: two writes per trans.

30 AllUpdates 1-Replica Throughput: low replication overhead, 1-replica system == standalone DB

31 AllUpdates Response Time

32 TPC-B Throughput: low replication overhead, 1-replica system == standalone DB, performance scales with multiple replicas

