Transaction chains: achieving serializability with low latency in geo-distributed storage systems. Yang Zhang, Russell Power, Siyuan Zhou, Yair Sovran, *Marcos K. Aguilera, Jinyang Li. New York University; *Microsoft Research Silicon Valley
Why geo-distributed storage? Large-scale Web applications rely on geo-distributed storage with replication across datacenters.
Geo-distribution is hard. Applications want low latency, O(intra-datacenter RTT), and strong semantics: relational tables with transactions.
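Prior work trades latency against semantics. Dynamo [SOSP’07] (eventual consistency), COPS [SOSP’11], Walter [SOSP’11] and Eiger [NSDI’13] (various non-serializable semantics) achieve low latency but give up serializability, and most also restrict the interface to key/value access or limited forms of transactions. Spanner [OSDI’12] provides strictly serializable general transactions at high latency; according to CAP, strict serializability provably implies high latency. Our work: serializable general transactions with low latency.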
Our contributions: 1. A new primitive, the transaction chain, which allows low-latency, serializable transactions. 2. Lynx, a geo-distributed storage system built with chains, which supports relational tables, secondary indices and materialized join views.
Talk outline: Motivation; Transaction chains; Lynx; Evaluation
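Why transaction chains? Example: an auction service with tables Bids (Bidder, Item, Price) and Items (Seller, Item, Highest bid), holding rows such as (Alice, Book, $100) in Bids and (Bob, Camera, $100) in Items. Alice’s and Bob’s data live in different datacenters (Datacenter-1 and Datacenter-2).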
Operation: Alice bids on Bob’s camera. Alice’s Bids (containing (Alice, Book, $100)) and Bob’s Items (containing (Bob, Camera, $100)) are stored in different datacenters. The operation takes two steps: 1. Insert the bid into Alice’s Bids. 2. Update the highest bid on Bob’s Items.
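Low latency with first-hop return: the "bid on Bob’s camera" chain returns to Alice as soon as its first hop, inserting (Alice, Camera, $500) into Alice’s Bids in her own datacenter, has committed; the update of Bob’s Items in the other datacenter completes afterwards.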
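First-hop return can be sketched as a single-process toy, assuming each hop is a blind write into one in-memory dict that stands in for the per-datacenter stores. The class `FirstHopReturnExecutor` and its methods are hypothetical, not Lynx’s API: the first hop runs synchronously before `submit` returns, and the remaining hops run on a background thread, much as they would be forwarded to remote datacenters.

```python
import queue
import threading


class FirstHopReturnExecutor:
    """Toy model of first-hop return (hypothetical class, not Lynx's API).

    A hop is a blind write (table, key, value); one dict stands in for the
    stores of every datacenter.
    """

    def __init__(self):
        self.store = {}
        self._lock = threading.Lock()
        self._backlog = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _apply(self, hop):
        table, key, value = hop
        with self._lock:
            self.store[(table, key)] = value

    def submit(self, hops):
        """Execute hop 0 synchronously, queue the rest, and return immediately."""
        self._apply(hops[0])
        self._backlog.put(hops[1:])
        return "committed at first hop"

    def _drain(self):
        while True:
            remaining = self._backlog.get()
            for hop in remaining:
                # Stand-in for forwarding the hop to the next (possibly remote) server.
                self._apply(hop)
            self._backlog.task_done()

    def wait_for_completion(self):
        """Block until every submitted chain has executed all of its hops."""
        self._backlog.join()


executor = FirstHopReturnExecutor()
status = executor.submit([
    ("Bids", ("Alice", "Camera"), 500),  # hop 1: Alice's Bids, local datacenter
    ("Items", ("Bob", "Camera"), 500),   # hop 2: Bob's Items (blind write; a real update takes the max)
])
print(status)  # committed at first hop
executor.wait_for_completion()
```

Calling `wait_for_completion` corresponds to waiting for chain completion instead of returning at the first hop.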
Problem: what if chains fail? 1. What if servers fail after executing the first hop? 2. What if a chain is aborted in the middle?
Solution: provide all-or-nothing atomicity. 1. Chains are durably logged at the first hop; the logs are replicated to a nearby datacenter, and chains are re-executed upon recovery. 2. Chains allow user aborts only at the first hop. Guarantee: if the first hop commits, all hops eventually commit.
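A minimal sketch of durable logging at the first hop and re-execution on recovery, assuming hops are idempotent blind writes (so re-running a hop that already executed is harmless) and that the log is a local JSON-lines file. Replicating the log to a nearby datacenter is not modeled, and `ChainLog`, `run_chain` and `recover` are hypothetical names.

```python
import json
import os
import tempfile


class ChainLog:
    """Append-only, fsync'd log at a chain's first-hop server (hypothetical).

    Assumption: replicating this log to a nearby datacenter is not modeled.
    """

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def records(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f]


def apply_hop(store, hop):
    # Assumption: hops are idempotent blind writes, so re-running one is safe.
    table, key, value = hop
    store[f"{table}/{key}"] = value


def run_chain(log, store, chain_id, hops, crash_after=None):
    # The chain is logged durably before its first hop takes effect.
    log.append({"type": "start", "chain": chain_id, "hops": hops})
    for i, hop in enumerate(hops):
        apply_hop(store, hop)
        log.append({"type": "done", "chain": chain_id, "hop": i})
        if crash_after == i:
            return  # simulate a failure after hop i


def recover(log, store):
    """Re-execute, in order, every logged hop that has not been marked done."""
    chains, next_hop = {}, {}
    for r in log.records():
        if r["type"] == "start":
            chains[r["chain"]], next_hop[r["chain"]] = r["hops"], 0
        else:
            next_hop[r["chain"]] = r["hop"] + 1
    for chain_id, hops in chains.items():
        for i in range(next_hop[chain_id], len(hops)):
            apply_hop(store, hops[i])
            log.append({"type": "done", "chain": chain_id, "hop": i})


log = ChainLog(os.path.join(tempfile.mkdtemp(), "chains.log"))
store = {}
bid = [["Bids", "Alice/Camera", 500], ["Items", "Bob/Camera", 500]]
run_chain(log, store, "bid-1", bid, crash_after=0)
print("Items/Bob/Camera" in store)  # False: crashed after the first hop
recover(log, store)
print(store["Items/Bob/Camera"])    # 500
```

Because recovery only replays hops that are not marked done, running it twice leaves the store unchanged.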
Problem: non-serializable interleaving. Concurrent chains can be ordered inconsistently at different hops. T1 writes X=1 and then Y=1; T2 writes X=2 and then Y=2. If Server-X orders T1 < T2 but Server-Y orders T2 < T1, the final state is X=2, Y=1, which no serial order produces: not serializable! Traditional 2PL+2PC prevents non-serializable interleaving, at the cost of high latency.
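The T1/T2 example can be checked mechanically with a precedence graph: each server’s local order T_a < T_b adds an edge from T_a to T_b, and a cycle means no serial order agrees with every server. This sketch assumes every pair of transactions at a server conflicts (as here, where both write that server’s item); the function names are illustrative.

```python
from collections import defaultdict


def precedence_edges(per_server_order):
    """Turn each server's local order into edges earlier -> later.

    Assumption: every pair of transactions at a server conflicts.
    """
    edges = defaultdict(set)
    for order in per_server_order.values():
        for i, earlier in enumerate(order):
            for later in order[i + 1:]:
                edges[earlier].add(later)
    return edges


def has_cycle(edges):
    """Recursive DFS with three colors; reaching a GRAY node means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def visit(node):
        color[node] = GRAY
        for nxt in edges.get(node, ()):
            if color[nxt] == GRAY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))


inconsistent = {"Server-X": ["T1", "T2"], "Server-Y": ["T2", "T1"]}
consistent = {"Server-X": ["T1", "T2"], "Server-Y": ["T1", "T2"]}
print(has_cycle(precedence_edges(inconsistent)))  # True: not serializable
print(has_cycle(precedence_edges(consistent)))    # False: equivalent to T1, T2
```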
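Solution: detect non-serializable interleaving via static analysis. Statically analyze all chains to be executed, which is feasible because Web applications invoke a fixed set of operations. Execution is serializable if the SC-graph has no SC-cycle [Shasha et al., TODS’95]. An SC-cycle is a cycle that contains both S-edges (between hops of the same chain) and C-edges (between conflicting hops of different chains); in the T1/T2 example, the conflicts on X and on Y close such a cycle.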
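The SC-cycle test can be sketched on a tiny model, assuming each hop is summarized as (table, op) with op "r" or "w", that two hops conflict when they touch the same table and at least one writes, and that instantiating each chain twice is enough to capture concurrent instances of the same chain (a simplification). The brute-force cycle search only suits the handful of chains a Web application defines; it is not Lynx’s implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_sc_graph(chains):
    """chains: {name: [(table, op), ...]}, op in {"r", "w"}.

    Each chain is instantiated twice to capture concurrent instances of the
    same chain (a simplifying assumption). Nodes are (instance, hop index);
    the result maps frozenset({u, v}) to "S" or "C".
    """
    hops = {}
    for name, steps in chains.items():
        for copy in (1, 2):
            for i, step in enumerate(steps):
                hops[(f"{name}#{copy}", i)] = step
    edges = {}
    for u, v in combinations(hops, 2):
        if u[0] == v[0]:
            edges[frozenset((u, v))] = "S"  # sibling hops of one chain instance
        else:
            (t1, op1), (t2, op2) = hops[u], hops[v]
            if t1 == t2 and "w" in (op1, op2):
                edges[frozenset((u, v))] = "C"  # conflicting hops of two instances
    return edges


def has_sc_cycle(edges):
    """True if some simple cycle uses at least one S-edge and one C-edge."""
    adj = defaultdict(set)
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    rank = {n: i for i, n in enumerate(adj)}

    def extend(start, node, path, kinds):
        # Only visit nodes ranked above `start`, so each cycle is found from its lowest node.
        for nxt in adj[node]:
            kind = edges[frozenset((node, nxt))]
            if nxt == start and len(path) >= 3 and {"S", "C"} <= kinds | {kind}:
                return True
            if nxt not in path and rank[nxt] > rank[start]:
                if extend(start, nxt, path + [nxt], kinds | {kind}):
                    return True
        return False

    return any(extend(n, n, [n], frozenset()) for n in adj)


put_bid = {"put_bid": [("Bids", "w"), ("Items", "w")]}
read_bids = {"read_bids": [("Bids", "r")]}
print(has_sc_cycle(build_sc_graph(put_bid)))    # True
print(has_sc_cycle(build_sc_graph(read_bids)))  # False
```

Two Put-bid instances form the cycle hop 0, hop 1 (S), other instance’s hop 1 (C), its hop 0 (S), back to the start (C).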
How Lynx uses chains: user chains are used by programmers to implement application logic; system chains are used internally to maintain secondary indexes, materialized join views and geo-replicas.
Example: secondary index. The base table Bids (Bidder, Item, Price) and a secondary index on Bids hold the same rows under different keys; rows such as (Alice, Camera, $100) and (Bob, Car, $20) appear in both.
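Example user and system chain: Alice bids on Bob’s camera. The user chain inserts (Alice, Camera, $100) into Alice’s Bids (which already holds (Alice, Book, $100)) in one datacenter and updates Bob’s Items ((Bob, Camera, $100)) in the other; system chains then maintain the affected secondary indexes, join views and geo-replicas.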
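A sketch of how a user put-bid could be extended with system hops, assuming Bids is partitioned by bidder, its secondary index by item, and a geo-replica table named `Bids-geo-replica`. All three layouts are hypothetical choices, as are the function names; the user chain’s Items update is left out for brevity.

```python
def put_bid_chain(bidder, item, price):
    """Hypothetical expansion of a put-bid into its user hop plus system hops.

    A hop is (table, partition_key, row_key, row); the partition key decides
    which server (and datacenter) runs the hop. The user chain's Items update
    is omitted for brevity.
    """
    row = {"bidder": bidder, "item": item, "price": price}
    return [
        ("Bids", bidder, item, row),              # user hop: base table, by bidder
        ("Bids-secondary", item, bidder, row),    # system hop: index, by item (assumed)
        ("Bids-geo-replica", bidder, item, row),  # system hop: geo-replica (assumed)
    ]


def execute(chain, tables):
    for table, partition, row_key, row in chain:
        tables.setdefault(table, {}).setdefault(partition, {})[row_key] = row


def bidders_for(tables, item):
    """Answer 'who bid on this item?' from a single index partition."""
    return sorted(tables.get("Bids-secondary", {}).get(item, {}))


tables = {}
execute(put_bid_chain("Alice", "Camera", 100), tables)
execute(put_bid_chain("Bob", "Car", 20), tables)
execute(put_bid_chain("Bob", "Camera", 100), tables)
print(bidders_for(tables, "Camera"))  # ['Alice', 'Bob']
```

Without the index hop, answering the same question would require contacting every bidder partition.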
Lynx statically analyzes all chains beforehand. Put-bid inserts into the Bids table and then updates the Items table; Read-bids reads the Bids table. The SC-graph built from two Put-bid and two Read-bids instances contains an SC-cycle. One solution: execute the chain as a distributed transaction.
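Executing a chain as a distributed transaction usually means two-phase commit. This is a textbook 2PC over in-process participants, not Lynx’s protocol: each participant votes by trying to take its lock and stage the writes, and the coordinator commits only if every participant votes yes; otherwise it releases the locks and retries. `Participant`, `two_phase_commit` and the retry limit are hypothetical, and no coordinator log is kept, so the sketch is not fault tolerant.

```python
import threading


class Participant:
    """One server in the transaction; a single lock per server keeps it simple."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self._lock = threading.Lock()
        self._staged = None

    def prepare(self, writes):
        # Phase 1: vote yes only if the lock is free; stage the writes meanwhile.
        if not self._lock.acquire(blocking=False):
            return False
        self._staged = dict(writes)
        return True

    def commit(self):
        # Phase 2 (commit): install the staged writes, then unlock.
        self.data.update(self._staged)
        self._staged = None
        self._lock.release()

    def abort(self):
        # Phase 2 (abort): discard the staged writes, then unlock.
        self._staged = None
        self._lock.release()


def two_phase_commit(writes_by_participant, max_retries=3):
    """writes_by_participant: {Participant: {key: value}}; True if committed."""
    for _ in range(max_retries):
        prepared = []
        for participant, writes in writes_by_participant.items():
            if not participant.prepare(writes):
                break
            prepared.append(participant)
        if len(prepared) == len(writes_by_participant):
            for participant in prepared:
                participant.commit()
            return True
        for participant in prepared:  # someone voted no: release locks, retry
            participant.abort()
    return False


server_a, server_b = Participant("A"), Participant("B")
print(two_phase_commit({server_a: {"A": 1}, server_b: {"B": 1}}))  # True
```

Every participant is contacted before anything commits, which is why this route costs cross-datacenter round trips.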
SC-cycle source #1: false conflicts in user chains. Two Put-bid chains (insert into Bids table, then update Items table) appear to conflict on both tables, but the conflict is false: updating the highest price with max(bid, current_price) commutes.
Solution: users annotate commutativity. Once the Put-bid hops (insert into Bids table, update Items table) are annotated as commutative, concurrent Put-bid chains no longer form C-edges with each other, and the SC-cycle disappears.
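Why the Put-bid conflict is false can be checked by brute force, assuming a toy model in which each bid has a unique id and raises the item’s highest price with max (both properties of the auction example; `apply_bids` and `apply_overwrites` are hypothetical helpers). Every serial order of the same bids yields the same final state, whereas a plain overwrite of the price does not.

```python
from itertools import permutations


def apply_bids(bids):
    """Apply Put-bid effects serially: insert into bid history, raise the item's max."""
    history, highest = {}, {}
    for bid_id, item, price in bids:
        history[bid_id] = (item, price)                   # insert keyed by unique bid id
        highest[item] = max(price, highest.get(item, 0))  # max(bid, current_price)
    return history, highest


def apply_overwrites(bids):
    """Contrast: a last-writer-wins price update, which does not commute."""
    highest = {}
    for _bid_id, item, price in bids:
        highest[item] = price
    return highest


bids = [(1, "Camera", 100), (2, "Camera", 500), (3, "Camera", 250)]
results = [apply_bids(order) for order in permutations(bids)]
print(all(r == results[0] for r in results))  # True: order does not matter
print(results[0][1])                          # {'Camera': 500}
print(sorted({apply_overwrites(o)["Camera"] for o in permutations(bids)}))  # [100, 250, 500]
```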
SC-cycle source #2: system chains. Each Put-bid inserts into the Bids table, and the system chain that follows inserts into Bids-secondary; two concurrent Put-bid chains therefore conflict at both tables and form an SC-cycle.
Solution: chains provide origin-ordering. Observation: conflicting system chains originate at the same first-hop server, because both write the same row of the Bids table. Origin-ordering: if chains T1 < T2 at the same first hop, then T1 < T2 at all subsequent overlapping hops. This can be implemented cheaply with sequence-number vectors. Example: T1 and T2 each insert into the Bids table and then into Bids-secondary.
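A minimal sketch of origin-ordering with sequence-number vectors, assuming the first-hop server stamps each chain, when its first hop commits, with one sequence number per later-hop server it will visit, and that each later-hop server executes a given origin’s chains in sequence-number order, buffering early arrivals. Keeping one counter per destination means a server never waits for a chain that will not visit it. Class names and server names are hypothetical.

```python
import heapq
from collections import defaultdict


class OriginSequencer:
    """Runs at a first-hop server; keeps one counter per later-hop server."""

    def __init__(self, origin):
        self.origin = origin
        self.counters = defaultdict(int)

    def stamp(self, destinations):
        """Call when a chain's first hop commits; returns {server: sequence number}."""
        vector = {}
        for dest in destinations:
            self.counters[dest] += 1
            vector[dest] = self.counters[dest]
        return vector


class HopServer:
    """A later-hop server executing each origin's chains in sequence order."""

    def __init__(self, name):
        self.name = name
        self.expected = defaultdict(lambda: 1)  # next sequence number per origin
        self.pending = defaultdict(list)        # origin -> min-heap of (seq, chain)
        self.executed = []

    def receive(self, origin, vector, chain):
        heap = self.pending[origin]
        heapq.heappush(heap, (vector[self.name], chain))
        # Run every buffered chain from this origin whose turn has come.
        while heap and heap[0][0] == self.expected[origin]:
            _, ready = heapq.heappop(heap)
            self.executed.append(ready)
            self.expected[origin] += 1


origin = OriginSequencer("bids-server")
v1 = origin.stamp(["bids-secondary"])  # T1's first hop commits first
v2 = origin.stamp(["bids-secondary"])  # then T2's
index = HopServer("bids-secondary")
index.receive("bids-server", v2, "T2")  # T2 arrives early and is buffered
index.receive("bids-server", v1, "T1")
print(index.executed)  # ['T1', 'T2']
```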
Limitations of Lynx/chains: 1. Chains are not strictly serializable, only serializable. 2. Programmers can abort only at the first hop. Our application experience: these limitations are manageable.
Simple Twitter clone on Lynx, with geo-replicated data. Tweets (Author, Tweet): (Alice, "New York rocks"), (Bob, "Time to sleep"), (Eve, "Hi there"). Follow-Graph (From, To): (Alice, Bob), (Alice, Eve). Follow-Graph secondary index (To, From): (Bob, Alice), (Bob, Clark). Timeline, the materialized join view Tweets JOIN Follow-Graph on Author = To, with columns (Author, From, Tweet): (Bob, Alice, "Time to sleep"), (Eve, Alice, "Hi there").
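The Timeline view can be maintained incrementally: posting a tweet extends the timeline of every follower found through the Follow-Graph secondary index, and a new follow copies the followee’s existing tweets. This toy model keeps everything in one process and runs the hops inline; `TwitterClone` and its methods are hypothetical names, and the distribution of hops across datacenters is not modeled.

```python
from collections import defaultdict


class TwitterClone:
    """Toy model of the slide's tables (names and layout are assumptions)."""

    def __init__(self):
        self.tweets = defaultdict(list)     # Author -> [tweet]
        self.follow = defaultdict(set)      # From -> {To}
        self.followers = defaultdict(set)   # secondary index: To -> {From}
        self.timeline = defaultdict(list)   # join view: From -> [(Author, tweet)]

    def follow_user(self, src, dst):
        # User hop on Follow-Graph, then a system hop on its secondary index.
        self.follow[src].add(dst)
        self.followers[dst].add(src)
        # Join-view hop: dst's existing tweets now belong in src's timeline.
        for tweet in self.tweets[dst]:
            self.timeline[src].append((dst, tweet))

    def post(self, author, tweet):
        # User hop on Tweets, then join-view hops for every follower
        # found through the secondary index.
        self.tweets[author].append(tweet)
        for reader in self.followers[author]:
            self.timeline[reader].append((author, tweet))


app = TwitterClone()
app.post("Alice", "New York rocks")
app.follow_user("Alice", "Bob")
app.follow_user("Alice", "Eve")
app.follow_user("Clark", "Bob")
app.post("Bob", "Time to sleep")
app.post("Eve", "Hi there")
print(app.timeline["Alice"])  # [('Bob', 'Time to sleep'), ('Eve', 'Hi there')]
print(app.timeline["Clark"])  # [('Bob', 'Time to sleep')]
```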
Experimental setup: datacenters in us-west, europe and us-east, with round-trip times of 82 ms, 153 ms and 102 ms between them. Lynx prototype: in-memory database; local disk logging only.
Returning on the first hop allows low latency. [Figure: latency of first-hop return vs. chain completion]
Applications achieve good throughput
Related work. Transaction decomposition: Sagas [SIGMOD’87], step-decomposed transactions. Incremental view maintenance: views for PNUTS [SIGMOD’09]. Geo-distributed/replicated storage: Spanner [OSDI’12], MDCC [EuroSys’13], Megastore [CIDR’11], COPS [SOSP’11], Eiger [NSDI’13], RedBlue [OSDI’12].
Conclusion: chains support serializability at low latency, using static analysis of SC-cycles. Key techniques to reduce SC-cycles: origin ordering and commutativity annotations. Chains are useful for performing application logic and for maintaining indices, join views and geo-replicas.
Limitations of Lynx/chains: 1. Chains are not strictly serializable, only serializable: the serial order need not respect real-time order. Remedies: programmers can wait for chain completion, and Lynx provides read-your-own-writes. 2. Programmers can only abort at the first hop. Our application experience shows these limitations are manageable.
2PC and chains, the easy way. Chains: T1 = W(A), W(B); T2 = R(A). With 2PC, T1 becomes a single hop, 2PC-W(AB), while T2 stays R(A).
2PC and chains, the hard way. Chains: T1 = W(A), W(B); T2 = R(A), R(B). Even with T1 executed as 2PC-W(AB), T2’s separate reads R(A) and R(B) still conflict with it at both A and B.
2PC and chains, the hard way (diagram): a chain over DC1, DC2, DC3 and DC4, holding A, B, C and D, executed with 2PC, retry and parallel unlock.
Lynx is scalable
Challenge of static analysis: false conflicts. T1 and T2 each 1. insert a bid into the bid history and 2. update the max price on the item. They conflict on the bid history and on the item, forming an SC-cycle: not serializable.
Solution: commutativity annotations. T1 and T2 each 1. insert a bid into the bid history and 2. update the max price on the item. Both operations are annotated as commutative: there is no real conflict on the bid history because bid ids are unique, and updating the max commutes. With no SC-cycle, the execution is serializable.
ACID: all-or-nothing atomicity. Chains’ failure guarantee: if the first hop of a chain commits, then all hops eventually commit; users may abort a chain only at the first hop. This is achievable with low latency: log chains durably at the first hop, replicate the logs to a nearby datacenter, and re-execute stalled chains upon failure recovery.
ACID: serializability. Serializability means the execution result appears as if it obeyed some serial order of all transactions, with no restriction on which serial order: Ordering 1 and Ordering 2 are equally acceptable.
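The definition can be turned into a brute-force check for the write-only T1/T2 example: an interleaving is (final-state) serializable if its final state equals that of some serial order, whichever one it is. This sketch suits tiny inputs only, ignores reads, and its helper names are hypothetical.

```python
from itertools import permutations


def run(ops, state=None):
    """Apply a list of (key, value) writes in order and return the final state."""
    state = dict(state or {})
    for key, value in ops:
        state[key] = value
    return state


def final_state_serializable(transactions, interleaved, initial=None):
    """True if the interleaved result equals the result of some serial order."""
    target = run(interleaved, initial)
    for order in permutations(transactions):
        serial = [op for txn in order for op in txn]
        if run(serial, initial) == target:
            return True
    return False


T1 = [("X", 1), ("Y", 1)]
T2 = [("X", 2), ("Y", 2)]
ordering_1 = T1 + T2                             # a serial order
bad = [("X", 1), ("Y", 2), ("X", 2), ("Y", 1)]   # Server-X: T1 < T2, Server-Y: T2 < T1
print(final_state_serializable([T1, T2], ordering_1))  # True
print(final_state_serializable([T1, T2], bad))         # False
```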
Chains are not linearizable. Serializability allows any total order of the chains; linearizability additionally requires that the total order obey the chains’ issue order in time.
Transaction chains recap: chains provide all-or-nothing atomicity and ensure serializability via static analysis. Practical challenges: how to use chains, and how to avoid SC-cycles?
Example user chain: Alice bids on Bob’s camera. 1. Insert the bid into Alice’s bid history (Bids table: Bidder, Item, Price; row (Alice, Camera)). 2. Update the max price on Bob’s camera (Items table: Seller, Item, Highest; row (Bob, Camera)).
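A sketch of the first-hop abort rule applied to this chain, assuming hops are Python functions over a shared state dict and that hop 0 performs its checks before writing anything. `ChainAbort`, `run_user_chain` and the budget check in `insert_bid` are hypothetical; only the rule that a chain may abort at the first hop and nowhere else comes from the slides.

```python
class ChainAbort(Exception):
    """Raised by a hop's code to request an abort (hypothetical)."""


class HopAbortNotAllowed(RuntimeError):
    """A hop after the first tried to abort; chains forbid this."""


def run_user_chain(hops, state):
    """Run hop functions in order; only hop 0 may abort the chain."""
    for i, hop in enumerate(hops):
        try:
            hop(state)
        except ChainAbort:
            if i == 0:
                return "aborted"  # hop 0 checks before writing, so nothing was applied
            raise HopAbortNotAllowed(f"hop {i} tried to abort after the first hop")
    return "committed"


def insert_bid(bidder, item, price, budget):
    def hop(state):
        if price > budget:  # hypothetical check, done before any write
            raise ChainAbort("bid exceeds budget")
        state.setdefault("Bids", {})[(bidder, item)] = price
    return hop


def raise_highest(seller, item, price):
    def hop(state):
        items = state.setdefault("Items", {})
        items[(seller, item)] = max(price, items.get((seller, item), 0))
    return hop


state = {"Items": {("Bob", "Camera"): 100}}
print(run_user_chain([insert_bid("Alice", "Camera", 500, budget=1000),
                      raise_highest("Bob", "Camera", 500)], state))   # committed
print(run_user_chain([insert_bid("Alice", "Camera", 5000, budget=1000),
                      raise_highest("Bob", "Camera", 5000)], state))  # aborted
print(state["Items"][("Bob", "Camera")])                              # 500
```

Any check that could fail must therefore be expressible against data available at the first hop; later hops must always succeed.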
Lynx implementation: 5000 lines of C++ plus a 3500-line RPC library; uses an in-memory key/value store; supports user chains written in JavaScript (via V8).
Geo-distributed storage is hard. Applications demand simplicity and performance: a friendly programming model (relational tables, transactions) and fast response, ideally with operation latency = O(intra-datacenter RTT). But geo-distribution leads to high latency: coordinating data access across datacenters makes operation latency = O(inter-datacenter RTT) = O(100 ms).