Published by Lucas Quinn. Modified over 8 years ago.
1
LogBase: A Scalable Log-structured Database System in the Cloud. VLDB 2012. Hoang Tam Vo #1, Sheng Wang #2, Divyakant Agrawal †3, Gang Chen §4, Beng Chin Ooi #5 (#National University of Singapore; †University of California, Santa Barbara; §Zhejiang University). 15300240024 王夏青
2
Abstract; Introduction; Background & Related Work; Design & Implementation; Performance Evaluation; Conclusion
3
Introduction: Requirements. High write throughput; dynamic scalability; efficient multiversion data access; transactional semantics; fast recovery from machine failures.
4
Introduction: Characteristics. The log serves as the unique data repository in the system. Adopts an architecture similar to HBase and BigTable, where each machine in the system is responsible for some tablets. Builds an index per tablet for retrieving data from the log.
5
Introduction: Contributions. Propose LogBase, a scalable log-structured database system that can be dynamically deployed in the cloud. Design a multiversion index strategy in LogBase to provide efficient access to multiversion data. Enhance LogBase to support transactional semantics. Conduct an extensive performance study of LogBase.
6
Background & Related Work. No-overwrite strategies: System R (shadow paging strategy); POSTGRES (delta records). WAL+Data: most storage systems. Log-structured systems: LFS, BlueSky, Berkeley DB, PrimeBase, Hyder, RAMCloud.
7
Design & Implementation: Data Model. Model: relational data model. Data partitioning: vertical, into column groups; horizontal, into tablets.
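The two partitioning directions on this slide can be illustrated with a minimal sketch. The column-group names, the split keys, and the use of range partitioning on the primary key are all assumptions made for illustration; the slide only states that rows are split vertically into column groups and horizontally into tablets.

```python
from bisect import bisect_right

# Hypothetical column-group layout: which columns are stored together.
COLUMN_GROUPS = {
    "profile": ["name", "email"],
    "activity": ["last_login", "clicks"],
}

# Hypothetical split points on the primary key (range partitioning is assumed).
# tablet 0: key < "g"; tablet 1: "g" <= key < "p"; tablet 2: key >= "p"
SPLIT_KEYS = ["g", "p"]


def vertical_partition(row):
    """Split one row (a dict) into one sub-row per column group."""
    return {
        group: {col: row[col] for col in cols if col in row}
        for group, cols in COLUMN_GROUPS.items()
    }


def tablet_for(primary_key):
    """Return the index of the tablet whose key range contains primary_key."""
    return bisect_right(SPLIT_KEYS, primary_key)
```

For example, `tablet_for("mike")` falls in the middle range, and `vertical_partition` returns one small dict per column group so that each group can be stored and scanned separately.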
8
Design & Implementation: Architecture Overview. Log Repository; Data Access Manager; Transaction Manager.
9
Design & Implementation: Log Repository. Guarantee (stable storage): the log-only approach provides a capability for recovering data from machine failures similar to that of the WAL+Data approach. Stores the log in HDFS. Design choices for the implementation of the log. Log record: LogKey: LSN, table name, tablet information; Data:
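As a rough illustration of the log record described on this slide, the sketch below models a record as a LogKey (LSN, table name, tablet information) plus an opaque data payload, appended to a log that is never overwritten. The class names are hypothetical, the payload layout is left unspecified because the slide does not give it, and an in-memory list stands in for the HDFS-backed log.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List


@dataclass(frozen=True)
class LogKey:
    lsn: int      # log sequence number
    table: str    # table name
    tablet: str   # tablet information (format assumed to be a plain string here)


@dataclass(frozen=True)
class LogRecord:
    key: LogKey
    data: Any     # payload layout is not specified on the slide


class AppendOnlyLog:
    """In-memory stand-in for the log; records are only ever appended."""

    def __init__(self) -> None:
        self._records: List[LogRecord] = []

    def append(self, table: str, tablet: str, data: Any) -> int:
        """Append a record and return the LSN assigned to it."""
        lsn = len(self._records)
        self._records.append(LogRecord(LogKey(lsn, table, tablet), data))
        return lsn

    def read(self, lsn: int) -> LogRecord:
        return self._records[lsn]

    def scan_from(self, lsn: int) -> Iterator[LogRecord]:
        """Iterate over records with LSN >= lsn, e.g. for replay."""
        return iter(self._records[lsn:])
```

Because the log is the only data repository, every other structure (such as a per-tablet index) only needs to hold pointers like the returned LSN.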
10
Design & Implementation: In-memory Multiversion Index. Index: provides efficient access to the data. In-memory index. Index structure: B-link trees. Index entry: IdxKey: primary key + timestamp. Consumption analysis.
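The composite index key (primary key + timestamp) makes "latest version as of time T" a single ordered lookup. The sketch below shows that idea with a sorted Python list and `bisect` swapped in for the B-link tree named on the slide; the class name, the `ptr` field (a pointer into the log), and the method names are assumptions for illustration.

```python
from bisect import bisect_right, insort


class MultiversionIndex:
    """Sorted list of (primary_key, timestamp, ptr) entries.

    A sorted list stands in for the B-link tree; ptr is assumed to be an
    integer location of the record in the log.
    """

    def __init__(self):
        self._entries = []

    def put(self, key, ts, ptr):
        """Index a new version of key written at timestamp ts."""
        insort(self._entries, (key, ts, ptr))

    def get(self, key, as_of=None):
        """Return ptr of the newest version with timestamp <= as_of, or None."""
        bound = float("inf") if as_of is None else as_of
        # Probe sorts after every entry of `key` whose timestamp is <= bound.
        pos = bisect_right(self._entries, (key, bound, float("inf")))
        if pos == 0:
            return None
        found_key, _, ptr = self._entries[pos - 1]
        return ptr if found_key == key else None
```

Usage: after `put("k", 1, 100)` and `put("k", 5, 200)`, a call to `get("k")` returns the pointer of the version written at timestamp 5, while `get("k", as_of=4)` returns the older one.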
11
Design & Implementation: Tablet Serving(1)
12
Design & Implementation: Tablet Serving(2). Write; Read; Delete; Scan; Compaction.
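The five operations listed on this slide can be sketched on top of a log plus an in-memory index. This is one plausible reading under stated assumptions, not the system's actual code: the `Tablet` class is hypothetical, a delete is assumed to drop the key's index entries and append a marker record, and compaction is assumed to rewrite the log keeping only records the index still points to, sorted by key.

```python
class Tablet:
    def __init__(self):
        self.log = []     # append-only list of (key, timestamp, value)
        self.index = {}   # key -> list of (timestamp, log offset)

    def write(self, key, ts, value):
        """Append the record to the log, then index its offset."""
        self.log.append((key, ts, value))
        self.index.setdefault(key, []).append((ts, len(self.log) - 1))

    def read(self, key, as_of=None):
        """Return the newest value with timestamp <= as_of (or newest overall)."""
        versions = [(t, off) for t, off in self.index.get(key, [])
                    if as_of is None or t <= as_of]
        if not versions:
            return None
        _, off = max(versions)
        return self.log[off][2]

    def delete(self, key, ts):
        """Assumed behaviour: log a marker record and drop the index entries."""
        self.log.append((key, ts, None))
        self.index.pop(key, None)

    def scan(self, start, end):
        """Return (key, newest value) for start <= key < end, in key order."""
        return [(k, self.read(k)) for k in sorted(self.index) if start <= k < end]

    def compact(self):
        """Rewrite the log with only indexed records, sorted by (key, timestamp)."""
        live = [(k, ts, self.log[off][2])
                for k, versions in self.index.items() for ts, off in versions]
        live.sort(key=lambda rec: (rec[0], rec[1]))
        self.log = live
        self.index = {}
        for off, (k, ts, _) in enumerate(live):
            self.index.setdefault(k, []).append((ts, off))
```

In this sketch, writes are sequential appends, reads go through the index to a single log offset, and compaction is the only step that rewrites data.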
13
Design & Implementation: Transaction Management(1). Concurrency control and isolation: the rationale of MVOCC; validation with write locks; snapshot isolation in LogBase. Guarantee (isolation): the hybrid scheme of multiversion optimistic concurrency control (MVOCC) in LogBase guarantees snapshot isolation.
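The slide names the ingredients (multiversion reads, optimistic execution, validation under write locks) without spelling out the steps. The sketch below shows a generic textbook form of snapshot isolation built from those ingredients, under these assumptions: reads see the snapshot taken at transaction start, writes are buffered, and at commit the transaction locks its write set and aborts if any of those keys gained a newer committed version (first committer wins). The `Store` and `Transaction` classes are hypothetical, and the version lists are not protected for real multi-threaded use.

```python
import threading


class Store:
    def __init__(self):
        self.versions = {}              # key -> list of (commit_ts, value), in commit order
        self.locks = {}                 # key -> per-record write lock
        self.clock = 0                  # logical timestamp counter
        self._meta = threading.Lock()   # guards clock and the locks dict

    def now(self):
        with self._meta:
            return self.clock

    def next_ts(self):
        with self._meta:
            self.clock += 1
            return self.clock

    def lock_for(self, key):
        with self._meta:
            return self.locks.setdefault(key, threading.Lock())

    def latest_ts(self, key):
        versions = self.versions.get(key)
        return versions[-1][0] if versions else 0


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.now()     # snapshot timestamp
        self.writes = {}                # buffered write set

    def read(self, key):
        """Read own writes first, else the newest version in the snapshot."""
        if key in self.writes:
            return self.writes[key]
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Validate under write locks; return True on commit, False on abort."""
        # Lock in sorted key order so two committers cannot deadlock.
        locks = [self.store.lock_for(k) for k in sorted(self.writes)]
        for lock in locks:
            lock.acquire()
        try:
            if any(self.store.latest_ts(k) > self.start_ts for k in self.writes):
                return False            # a concurrent writer committed first
            commit_ts = self.store.next_ts()
            for k, v in self.writes.items():
                self.store.versions.setdefault(k, []).append((commit_ts, v))
            return True
        finally:
            for lock in locks:
                lock.release()
```

Read-only transactions never take locks in this sketch, which is the usual motivation for combining multiversioning with optimistic validation.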
14
Design & Implementation: Transaction Management(2). Commit protocol and atomicity. Guarantee (atomicity): LogBase's commit protocol guarantees an atomicity property similar to that of the WAL+Data approach. Commit procedure.
15
Design & Implementation: Failures and Recovery. Guarantee (durability): LogBase's recovery protocol guarantees a data durability property similar to that of the WAL+Data approach. Checkpoint operation. Recovery procedure.
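The relationship between the checkpoint operation and the recovery procedure can be sketched as follows. This is an illustration under assumptions, since the slide does not describe the steps: a checkpoint is assumed to save a copy of the in-memory index together with the log position it covers, and recovery is assumed to reload that copy and replay only the log records written after it. The function names and the record layout `(key, timestamp, value)` are hypothetical.

```python
import copy


def checkpoint(index, log):
    """Snapshot the index and remember how much of the log it covers."""
    return {"index": copy.deepcopy(index), "next_lsn": len(log)}


def recover(log, ckpt=None):
    """Rebuild the index: start from the checkpoint (if any), replay the log tail."""
    index = copy.deepcopy(ckpt["index"]) if ckpt else {}
    start = ckpt["next_lsn"] if ckpt else 0
    for lsn in range(start, len(log)):
        key, ts, _value = log[lsn]
        # Index entries hold only (timestamp, log position); data stays in the log.
        index.setdefault(key, []).append((ts, lsn))
    return index
```

Recovering with a checkpoint yields the same index as replaying the whole log, but scans fewer records, which is why more frequent checkpoints shorten recovery in this model.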
16
Performance Evaluation: Experimental Setup. An in-house cluster of 24 machines, each with a quad-core processor, 8 GB of physical memory, 500 GB of disk capacity and 1 gigabit Ethernet. Implemented in Java; inherits basic infrastructure from the open-source HBase. Compares the performance of LogBase with HBase. Workload: 5000 operations; 15000 operations for warming up the cache.
17
Performance Evaluation: Micro-benchmarks(1). Basic data operations: write; random read; sequential scan; range scan.
18
Performance Evaluation: Micro-benchmarks(2)
19
Performance Evaluation: Micro-benchmarks(3)
20
Performance Evaluation: Micro-benchmarks(4)
21
Performance Evaluation: YCSB Benchmark(1). Mixed workloads: 95% and 75% updates in the workload. Varying system sizes: 3 to 24 nodes.
22
Performance Evaluation: YCSB Benchmark(2)
23
Performance Evaluation: YCSB Benchmark(3)
24
Performance Evaluation: TPC-W Benchmark(1). Examines performance when a transaction accesses multiple data records, possibly from different tables. Models a webshop application workload. Browsing, shopping and ordering mixes: 5%, 20% and 50% update transactions, respectively.
25
Performance Evaluation: TPC-W Benchmark(2)
26
Performance Evaluation: Checkpoint and Recovery
27
Performance Evaluation: Comparison with Log-structured Systems(1). RAMCloud: stores its data and indexes entirely in memory. Hyder: scales its database in shared-flash environments without data partitioning. LRS: has a distributed architecture and data partitioning strategy similar to RAMCloud and LogBase, but stores data on disk.
28
Performance Evaluation: Comparison with Log-structured Systems(2)
29
Performance Evaluation: Comparison with Log-structured Systems(3)
30
Conclusion. Introduced a scalable log-structured database system called LogBase. It can be elastically deployed in the cloud. It can provide sustained write throughput and effective recovery time. The in-memory indexes support efficient data retrieval. It provides the widely accepted snapshot isolation for transactions. Extensive experiments. Future work: the design and implementation of efficient secondary indexes and query processing for LogBase.
31
Thanks