Download presentation
Presentation is loading. Please wait.
Published byMelissa Main Modified over 10 years ago
1
Omid Efficient Transaction Management and Incremental Processing for HBase Copyright © 2013 Yahoo! All rights reserved. No reproduction or distribution allowed without express written permission. Daniel Gómez Ferro 21/03/2013
2
About me Research engineer at Yahoo! Main Omid developer Apache S4 committer Contributor to: ZooKeeper HBase Omid - Yahoo! Research 2
3
Outline Motivation Goal: transactions on HBase Architecture Examples Future work 3 Omid - Yahoo! Research
4
Motivation Traditional DBMS: SQL Transactions Scalable NoSQL stores: SQL Transactions Scalable Some use cases require Scalability Consistent updates Omid - Yahoo! Research 4
5
Motivation Requested feature Support for transactions in HBase is a common request Percolator Google implemented transactions on BigTable Incremental Processing MapReduce latencies measured in hours Reduce latency to seconds Requires transactions Omid - Yahoo! Research 5
6
Incremental Processing State 0 State 1 Shared State Shared State Offline MapReduce Online Incremental Processing Fault tolerance Scalability Fault tolerance Scalability Shared state Using Percolator, Google dramatically reduced crawl-to-index time Omid - Yahoo! Research 6
7
Transactions on HBase
8
Omid ‘Hope’ in Persian Optimistic Concurrency Control system Lock free Aborts transactions at commit time Implements Snapshot Isolation Low overhead http://yahoo.github.com/omid http://yahoo.github.com/omid Omid - Yahoo! Research 8
9
Snapshot Isolation Transactions read from their own snapshot Snapshot: Immutable Identified by creation time Values committed before creation time Transactions conflict if two conditions apply: A)Overlap in time B)Write to the same row Omid - Yahoo! Research 9
10
Transaction conflicts r1r2r3r4 Time Transaction B overlaps with A and C B ∩ A = ∅ B ∩ C = { r4 } B and C conflict Transaction D doesn’t conflict Omid - Yahoo! Research 10 Write set: r3r4 r2r4 r1r3 Id: A B C D Time overlap:
11
Omid Architecture
12
Omid Clients Omid Clients Architecture 1.Centralized server 2.Transactional metadata replicated to clients 3.Store transaction ids in HBase timestamps HBase Region Servers Status Oracle (SO) HBase Region Servers HBase Region Servers HBase Region Servers (1) (2) (3) Omid - Yahoo! Research 12 Omid Clients SOSO
13
Status Oracle Limited memory Evict old metadata … maintaining consistency guarantees! Keep LowWatermark Updated on metadata eviction Abort transactions older than LowWatermark Necessary for Garbage Collection Omid - Yahoo! Research 13
14
Transactional Metadata Status Oracle Aborted T2T18… LowWatermark T4 Omid - Yahoo! Research 14
15
Metadata Replication Transactional metadata is replicated to clients Status Oracle not in critical path Minimize overhead Clients guarantee Snapshot Isolation Ignore data not in their snapshot Talk directly to HBase Conflicts resolved by the Status Oracle Expensive, but scalable up to 1000 clients Omid - Yahoo! Research 15
16
Omid Clients Omid Clients Architecture recap HBase Region Servers HBase Region Servers HBase Region Servers HBase Region Servers Omid - Yahoo! Research 16 Omid Clients SOSO Aborted T2T18… LowWatermark T4 Status Oracle
17
Architecture + Metadata Status Oracle Abort … LowW … Abort … LowW … Client Omid - Yahoo! Research 17 Conflict detection (not replicated) Conflict detection (not replicated) HBase rowTSval r110hi HBase values tagged with Start TimeStamp
18
Comparison with Percolator 18 Omid - Yahoo! Research Omid ClientsHBase Status Oracle Percolator ClientsBigTable TS Server
19
Simple API Omid - Yahoo! Research 19
20
Example Transaction
21
Abort … LowW … Client Transaction Example Status Oracle Abort … LowW … HBase rowTSval r110hi >> | Omid - Yahoo! Research 21
22
Status Oracle Abort … LowW … Client Transaction Example Abort … LowW … >> t = TM.begin() Transaction { ST: 25 } >> >> t = TM.begin() Transaction { ST: 25 } >> t ST: 25 t ST: 25 | Omid - Yahoo! Research 22 HBase rowTSval r110hi
23
Status Oracle Abort … LowW … Client Transaction Example Abort … LowW … >> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >> >> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >> t ST: 25 t ST: 25 | Omid - Yahoo! Research 23 HBase rowTSval r110hi
24
Status Oracle Abort … LowW … Client Transaction Example Abort … LowW … >> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >> >> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >> | Omid - Yahoo! Research 24 HBase rowTSval r110hi r1
25
Status Oracle Abort … LowW … Client Transaction Example Abort … LowW >> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >> TM.commit(t) Committed{ CT: 30 } >> >> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >> TM.commit(t) Committed{ CT: 30 } >> t ST: 25 R: {r1} t ST: 25 R: {r1} | Omid - Yahoo! Research 25 HBase rowTSval r110hi r125bye
26
Status Oracle Abort … LowW … Client Transaction Example Abort >> t_old Transaction{ ST: 15 } >> TT.put(t_old, ‘r1’, ‘yo’) >> TM.commit(t_old) Aborted >> >> t_old Transaction{ ST: 15 } >> TT.put(t_old, ‘r1’, ‘yo’) >> TM.commit(t_old) Aborted >> t_old ST: 15 R: {r1} t_old ST: 15 R: {r1} Omid - Yahoo! Research 26 HBase rowTSval r1 25bye LowW … |
27
Centralized Approach 27 Omid - Yahoo! Research
28
Omid Performance Centralized server = bottleneck? Focus on good performance In our experiments, it never became the bottleneck Omid - Yahoo! Research 28 512 rows/t 1K TPS 10 ms 2 rows/t 100K TPS 5 ms
29
Fault Tolerance Omid uses BookKeeper for fault tolerance BookKeeper is a distributed Write Ahead Log Recovery: reads log’s tail (bounded time) Downtime < 25 s Omid - Yahoo! Research 29
30
Example Use Case News Recommendation System
31
Model-based recommendations Similar users belong to a cluster Each cluster has a trained model New article Evaluate against clusters Recommend to users User actions Train model Split cluster Omid - Yahoo! Research 31
32
Transactions Advantages All operations are consistent Updates and queries Needed for a recommendation system? We avoid corner cases Implementing existing algorithms is straightforward Problems we solve Two operations concurrently splitting a cluster Query while cluster splits … 32 Omid - Yahoo! Research
33
Example overview 33 Omid - Yahoo! Research Index Articles & User actions Queries Index Data Store Wor kers Updates
34
Other use cases Update indexes incrementally Original Percolator use case Generic graph processing Adapt existing transactional solutions to HBase 34 Omid - Yahoo! Research
35
Future Work
36
Current Challenges Load isn’t distributed automatically Complex logic requires big transactions More likely to conflict More expensive to retry API is low level for data processing Omid - Yahoo! Research 36
37
Future work Framework for Incremental Processing Simpler API Trigger-based, easy to decompose operations Auto scalable Improve user friendliness Debug tools, metrics, etc Hot failover Omid - Yahoo! Research 37
38
Questions? All time contributors Ben Reed Flavio Junqueira Francisco Pérez-Sorrosal Ivan Kelly Matthieu Morel Maysam Yabandeh Partially supported by CumuloNimbo project (ICT-257993) funded by the European Commission Try it! yahoo.github.com/omid
39
Backup 39 Omid - Yahoo! Research
40
Commit Algorithm Receive commit request for txn_i Set CommitTimestamp(txn_i) <= new timestamp For each modified row: if LastCommit(row) > StartTimestamp(txn_i) => abort For each modified row: Set LastCommit(row) <= CommitTimestamp(txn_i) Omid - Yahoo! Research 40
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.