HBase Accelerated: In-Memory Flush and Compaction

Presentation transcript:

HBase Accelerated: In-Memory Flush and Compaction Eshcar Hillel, Anastasia Braginsky, Edward Bortnikov ⎪ HBase Meetup, San Francisco, December 8, 2016 Hello everybody! My name is Anastasia and I am from Yahoo Research. Today I am going to present two new enhancements to the in-memory component of HBase and show how they affect HBase performance. This is work we have done together with Eshcar Hillel and Eddie Bortnikov, both from Yahoo Research.

Outline Motivation & Background In-Memory Basic Compaction In-Memory Eager Compaction Evaluation Here is the plan for this talk: First, I am going to give you some background and motivation for what we have done. In a nutshell, what we have done is (first) compact the data in memory and (second) reduce the per-cell overhead; in both cases the changes are confined to the MemStore component. After the background I am going to explain how we apply in-memory compaction and what we gain from it. The last item is about the twist we gave to the conventional index in the MemStore component.

HBase Accelerated: Mission Definition Goal: Real-time performance in persistent KV-stores How: Use the memory more efficiently → less I/O So what do we want? We want better performance, just like everyone… And how do we plan to get it? We plan to use the memory more efficiently, i.e., to squeeze more data into the same memory than before, so that the first flush to disk is delayed.

Use the memory more efficiently → less I/O Upon a write request, data is first written into the MemStore; when a certain memory limit is reached, the MemStore data is flushed into an HFile. (Diagram: writes 1–4 accumulating in the MemStore in memory; HFiles in HDFS.)

Use the memory more efficiently → less I/O Upon a write request, data is first written into the MemStore; when a certain memory limit is reached, the MemStore data is flushed into an HFile. (Diagram: writes 1–6 accumulating in the MemStore before a flush.)

Use the memory more efficiently → less I/O Due to more efficient usage we let the MemStore keep more data in the same limited memory, so fewer flushes to disk are required. (Diagram: the same writes 1–6 held in the MemStore without triggering a flush.)
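
From the client's point of view, the write path above is just a put; the WAL append, the buffering in the MemStore, and the eventual flush all happen on the region server. Here is a minimal sketch using the standard HBase Java client API; the table name, column family, qualifier, and values are placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
  public static void main(String[] args) throws Exception {
    // Placeholder table and column family names.
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("usertable"))) {
      Put put = new Put(Bytes.toBytes("row-0001"));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-1"));
      // On the region server the cell is appended to the WAL and then inserted
      // into the MemStore; it reaches an HFile only when a flush happens.
      table.put(put);
    }
  }
}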

Prolong the in-memory lifetime of data before flushing to disk Reduce the number of new HFiles Reduce the write amplification effect (overall I/O) Reduce retrieval latencies

HBase Accelerated: Two Basic Ideas In-Memory Index Compaction Reduce the index memory footprint, less overhead per cell In-Memory Data Compaction Exploit redundancies in the workload to eliminate duplicates in memory As I have already said, in order to squeeze more into memory we have two ideas. The first is compaction, which means eliminating in-memory duplication: if the same key is updated again and again, its obsolete versions can be removed while still in memory, with no need to flush them to disk and compact them there. Following this idea, the more duplication you have, the more you gain from in-memory compaction. The second idea is reduction. What can we reduce? If a cell is written and it is not a duplicate, then it needs to be there, right? Yes, that is right; we reduce only the overhead per cell, the metadata that serves the cell. In this case, the smaller the cells are, the better the relative improvement. So what do we get? We keep the MemStore small, so it takes longer until a flush to disk happens. Every computer science student knows that using less memory is better, and that staying in RAM is better than going to disk. But I want to draw your attention to a few more benefits: a delayed flush to disk means fewer files on disk and fewer on-disk compactions, and those on-disk compactions are expensive: they cost disk bandwidth, CPU power, network traffic between the machines in the HDFS cluster, and so on.

HBase Writes (Diagram: DefaultMemStore with an active segment and a snapshot in memory, the WAL, HFiles in HDFS, and the prepare-for-flush, flush, and on-disk compaction steps.) Random writes are absorbed in an active segment. When the active segment is full it becomes an immutable segment (snapshot), and a new mutable (active) segment serves writes. The snapshot is flushed to disk and the WAL is truncated. On-disk compaction reads a few files, merge-sorts them, and writes back new files.
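
To make the last step concrete, here is a minimal sketch, not HBase code, of what the merge-sort phase of a compaction does: it takes several inputs that are each already sorted by key and produces one sorted output by repeatedly emitting the smallest remaining head element. The run and key types are simplified placeholders (strings instead of HBase cells).

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CompactionMergeSketch {
  // Merge several already-sorted runs (stand-ins for sorted HFiles) into one sorted run.
  static List<String> merge(List<List<String>> runs) {
    // Each heap entry is {runIndex, offsetInRun}, ordered by the key it currently points at.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>(Comparator.comparing((int[] c) -> runs.get(c[0]).get(c[1])));
    for (int r = 0; r < runs.size(); r++) {
      if (!runs.get(r).isEmpty()) heap.add(new int[] {r, 0});
    }
    List<String> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] cur = heap.poll();
      merged.add(runs.get(cur[0]).get(cur[1]));     // emit the smallest remaining key
      if (cur[1] + 1 < runs.get(cur[0]).size()) {   // advance this run's cursor
        heap.add(new int[] {cur[0], cur[1] + 1});
      }
    }
    return merged;
  }
}

A real compaction additionally drops older versions of the same key while merging; that duplicate-elimination step is sketched after the in-memory compaction slide below.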

Outline Motivation & Background In-Memory Basic Compaction (current section) In-Memory Eager Compaction Evaluation Let's talk about the details of in-memory compaction. I am going to briefly recap how reads and writes work in HBase.

In-Memory Flush We introduce cell segments: a segment is a set of cells inserted in a period of time, consisting of a key-to-cell map (index) and the cells' data. The MemStore is a collection of segments. (Diagram: the active segment.)

Segments in MemStore A segment is a set of cells inserted in a period of time, consisting of a key-to-cell map (index) and the cells' data; the MemStore is a collection of segments. Immutable segments are simpler: they are inherently thread-safe. (Diagram: active segment, immutable segment, snapshot segment.)

HBase In-Memory Flush (Diagram: CompactingMemStore with an active segment, a compaction pipeline, and a snapshot in memory; the WAL, the block cache, and HFiles in HDFS.) New compaction pipeline: the active segment is flushed into the pipeline by an in-memory flush; a flush to disk happens only when needed.

Efficient index representation (Diagram: active segment, snapshot, WAL, block cache, and HFiles, as in the previous slide.)

Efficient index representation (Diagram, continued.)

Efficient index representation Mutable segment: a Concurrent Skip-List Map is the "ordering" engine; it holds cell-redirecting objects that point to the bytes of the entire cell's information, stored in MSLAB chunks or on the Java heap. (Diagram: index objects pointing into raw cell data buffers.)

No Dynamic Ordering for Immutable Segment (Diagram: the same raw cell data buffers in MSLAB and on the Java heap.)

No Dynamic Ordering for Immutable Segment (Diagram, continued.)

No Dynamic Ordering for Immutable Segment New immutable segment: a flat cell array with binary search redirects to the cell's information in MSLAB or on the Java heap. (Diagram: array index over the same raw cell data buffers.)

Exploit the Immutability of a Segment after In-Memory Flush New design: flat layout for the immutable segment's index, less overhead per cell, and management (allocate, store, release) of data buffers off-heap. Pros: better utilization of memory and CPU, locality in access to the index, reduced memory fragmentation.
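
A minimal sketch of the contrast between the two index layouts, under the assumption that a cell can be reduced to a key and a value; it is not the actual HBase code, and the class and field names are placeholders. The mutable segment needs a concurrent, dynamically ordered map, while a flattened immutable segment can be a plain sorted array of cell references searched with binary search:

import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;

public class SegmentIndexSketch {
  // Simplified stand-in for an HBase cell reference.
  static final class Cell {
    final byte[] key; final byte[] value;
    Cell(byte[] key, byte[] value) { this.key = key; this.value = value; }
  }
  static final Comparator<byte[]> KEY_ORDER = SegmentIndexSketch::compareBytes;

  // Mutable segment: a concurrent skip-list keeps cells ordered while writes arrive.
  static final class MutableSegment {
    final ConcurrentSkipListMap<byte[], Cell> index = new ConcurrentSkipListMap<>(KEY_ORDER);
    void add(Cell c) { index.put(c.key, c); }
  }

  // Immutable segment: the same cells flattened into a sorted array,
  // one reference per cell and no per-node skip-list overhead.
  static final class FlatSegment {
    final Cell[] cells;
    FlatSegment(MutableSegment m) { cells = m.index.values().toArray(new Cell[0]); }
    Cell get(byte[] key) {
      int i = Arrays.binarySearch(cells, new Cell(key, null),
          (a, b) -> KEY_ORDER.compare(a.key, b.key));
      return i >= 0 ? cells[i] : null;
    }
  }

  static int compareBytes(byte[] a, byte[] b) {
    for (int i = 0; i < Math.min(a.length, b.length); i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) return d;
    }
    return a.length - b.length;
  }
}

The flat array stores one reference per cell with no skip-list node overhead, and scanning it walks sequentially through memory, which is where the locality and footprint benefits listed above come from.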

Outline Motivation & Background In-Memory Basic Compaction In-Memory Eager Compaction (current section) Evaluation

HBase In-Memory Compaction (Diagram: CompactingMemStore with an active segment, a compaction pipeline, and a snapshot in memory; the WAL, the block cache, and HFiles in HDFS; in-memory compaction operates on the pipeline.) New compaction pipeline: the active segment is flushed into the pipeline, pipeline segments are compacted in memory, and a flush to disk happens only when needed.

Remove duplicate cells while still in memory New design: eliminate redundancy while the data is still in memory, which means less I/O upon flush and less work for on-disk compaction. Pros: data stays in memory longer, larger files are flushed to disk, and write amplification is reduced. It includes index compaction.
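
A minimal sketch of the redundancy elimination itself, not the actual HBase implementation; versions, timestamps, and delete markers are ignored, and segments are modeled as plain sorted maps of strings. Scanning the pipeline from the newest segment to the oldest and keeping only the first occurrence of each key drops all obsolete versions before anything reaches disk:

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InMemoryCompactionSketch {
  // Segments are modeled as key -> value maps already sorted by key.
  // The pipeline list is ordered newest segment first, so the newest version of a
  // key is inserted first and older duplicates are never copied into the result.
  static TreeMap<String, String> compact(List<TreeMap<String, String>> pipelineNewestFirst) {
    TreeMap<String, String> compacted = new TreeMap<>();
    for (TreeMap<String, String> segment : pipelineNewestFirst) {
      for (Map.Entry<String, String> e : segment.entrySet()) {
        compacted.putIfAbsent(e.getKey(), e.getValue());  // keep only the newest version seen
      }
    }
    return compacted;
  }
}

Under the BASIC policy described on the next slide this duplicate elimination is skipped and only the index is flattened; the EAGER policy pays the extra CPU and copying cost on every in-memory flush in exchange for the smaller memory footprint.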

Three policies to compact a MemStore NONE: no CompactingMemStore is used → everything as before. BASIC: CompactingMemStore with flattening only (changing the index); each active segment is flattened upon in-memory flush, and multiple segments remain in the compaction pipeline. EAGER: CompactingMemStore with data compaction; each active segment is compacted with the compaction pipeline upon in-memory flush, and the result of the previous compaction is the single segment in the compaction pipeline.

How to use? Globally and per column family By default, all tables apply basic in-memory compaction. This global configuration can be overridden in hbase-site.xml, as follows:
<property>
  <name>hbase.hregion.compacting.memstore.type</name>
  <value><none|eager></value>
</property>

How to use per column family? The policy can also be configured in the HBase shell per column family: create '<tablename>', {NAME => '<cfname>', IN_MEMORY_COMPACTION => '<NONE|BASIC|EAGER>'}

Outline Motivation & Background In-Memory Basic Compaction In-Memory Eager Compaction In-Memory Compaction Evaluation (current section)

Metrics Write throughput Write latency Write amplification (data written to disk / data submitted by the client) Read latency

Framework Benchmarking tool: YCSB Data: 1 table, 100 regions (50 per RS), cell size 100 bytes Cluster: 3 SSD machines, 48GB RAM, 12 cores; HBase: 2 RS + master; HDFS: 3 datanodes Configuration: 16GB heap, of which 40% is allocated to the MemStore and 40% to the block cache; 10 flush threads

Workload #1: 100% Writes Focus: Write throughput, latency, amplification Zipfian distribution sampled from 200M keys 100GB data written 20 threads driving workload

Performance

Data Written

Write Amplification
Policy | Write Amplification | Improvement
None   | 2.201               | –
Basic  | 1.865               | 15%
Eager  | 1.743               | 21%
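
As a rough sanity check on these factors, assuming they refer to workload #1 where the client submits 100GB: a write amplification of 2.201 corresponds to roughly 220GB actually written to disk, 1.865 to roughly 187GB, and 1.743 to roughly 174GB.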

A Very Different Experiment … HDD hardware 1 table, 1 region Small heap (1GB, simulating contention at large scale) 5GB of data 50% writes, 50% reads

Zoom-In Read Latency over Time

Zoom-In Read Latency over Time

Zoom-In Read Latency over Time

Read Latency

Summary Targeted for HBase 2.0.0 New design pros over the default implementation: less compaction on disk reduces the write amplification effect; less disk I/O and network traffic reduces the load on HDFS; a new space-efficient index representation; reduced retrieval latency by serving (mainly) from memory. We would like to thank the reviewers Michael Stack, Anoop Sam John, Ramkrishna S. Vasudevan, and Ted Yu.

Thank You!