1
HBase Accelerated: In-Memory Flush and Compaction
Eshcar Hillel, Anastasia Braginsky, Edward Bortnikov ⎪ HBase Meetup, San Francisco, December 8, 2016 Hello everybody! My name is Anastasia and I am from Yahoo Research. Today I am going to present two new enhancements to the in-memory component of HBase and show how they affect HBase performance. This is work we have done together with Eshcar Hillel and Eddie Bortnikov, both from Yahoo Research.
2
Outline Motivation & Background In-Memory Basic Compaction
In-Memory Eager Compaction Evaluation Here is the plan for this talk: first, I will give you some background and motivation for what we have done. What we have done, in a nutshell, is (first) compact the data in memory and (second) reduce the per-cell overhead; in both cases the changes are confined to the MemStore component. After the background I will explain how we apply in-memory compaction and what we gain from it. The last item is about the twist we gave to the conventional index in the MemStore component.
3
HBase Accelerated: Mission Definition
Goal: Real-time performance in persistent KV-stores How: Use the memory more efficiently: less I/O So what do we want? We want better performance, just like everyone. And how do we plan to get it? We plan to use the memory more efficiently, i.e., to squeeze more into memory than before, so that the first flush to disk is delayed.
4
Use the memory more efficiently: less I/O
Upon a write request, data is first written into the MemStore; when a certain memory limit is reached, the MemStore data is flushed into an HFile. (Diagram: cells 1 through 4 accumulate in the MemStore in memory; HFiles reside in HDFS.)
5
Use the memory more efficiently: less I/O
Upon a write request, data is first written into the MemStore; when a certain memory limit is reached, the MemStore data is flushed into an HFile. (Diagram: cells 1 through 6 accumulate in the MemStore in memory; HFiles reside in HDFS.)
6
Use the memory more efficiently: less I/O
Thanks to the more efficient usage we let the MemStore keep more data within the same limited memory, so fewer flushes to disk are required. (Diagram: cells 1 through 6 held in the MemStore in memory; HFiles reside in HDFS.)
7
Prolong in-memory lifetime before flushing to disk
Reduce the number of new HFiles Reduce the write amplification effect (overall I/O) Reduce retrieval latencies
8
HBase Accelerated: Two Basic Ideas
In-Memory Index Compaction Reduce the index memory footprint, less overhead per cell In-Memory Data Compaction Exploit redundancies in the workload to eliminate duplicates in memory As I have already said, in order to squeeze more into memory we have two ideas. The first one is compaction, which means eliminating in-memory duplications: when the same key is updated again and again, its obsolete versions can be removed while still in memory, with no need to flush them to disk and compact there. Following this idea, the more duplications you have, the more you gain from in-memory compaction. The second idea is reduction. What can we reduce? If a cell is written and it is not a duplicate, it needs to be there, right? Yes, that is right; we reduce only the overhead per cell, the metadata that serves the cell, and here the smaller the cells are, the better the relative gain is. So what do we get? We try to keep the MemStore component small, so it takes longer until the flush to disk happens. Every computer science student knows that using less memory is better, and everybody knows that staying in RAM is better than going to disk. But I want to draw your attention to a few more benefits: a delayed flush to disk means fewer files on disk and fewer on-disk compactions, and those on-disk compactions are expensive in disk bandwidth, CPU power, network between the machines in the HDFS cluster, and so on.
9
HBase Writes
(Diagram: writes go through the WAL into the DefaultMemStore's active segment; the snapshot is flushed to HFiles in HDFS, which are later merged by on-disk compaction.) Random writes are absorbed in an active segment. When the active segment is full it becomes an immutable segment (snapshot), and a new mutable (active) segment serves writes. The snapshot is flushed to disk and the WAL is truncated. On-disk compaction reads a few files, merge-sorts them, and writes back new files.
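To make this flow concrete, here is a minimal, illustrative Java sketch of a default-style MemStore (simplified types and class names of my own, not the actual HBase code): writes land in a mutable active segment backed by a concurrent skip-list map, and when a size threshold is crossed the segment becomes an immutable snapshot that is flushed to a new file.

import java.util.concurrent.ConcurrentSkipListMap;

class DefaultMemStoreSketch {
    private final long flushThresholdBytes;
    private ConcurrentSkipListMap<String, byte[]> active = new ConcurrentSkipListMap<>();
    private long activeSize = 0;

    DefaultMemStoreSketch(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    synchronized void write(String key, byte[] value) {
        active.put(key, value);                       // random writes are absorbed in the active segment
        activeSize += key.length() + value.length;
        if (activeSize >= flushThresholdBytes) {
            ConcurrentSkipListMap<String, byte[]> snapshot = active;  // full segment becomes the immutable snapshot
            active = new ConcurrentSkipListMap<>();                   // a new mutable segment serves writes
            activeSize = 0;
            flushToDisk(snapshot);                                    // each flush produces a new file (an HFile in HBase)
        }
    }

    private void flushToDisk(ConcurrentSkipListMap<String, byte[]> snapshot) {
        // in the real system the snapshot is written as an HFile and the WAL is truncated (omitted here)
    }
}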
10
Outline Motivation & Background In-Memory Basic Compaction
In-Memory Eager Compaction Evaluation Let's now talk about the details of in-memory basic compaction. I will briefly recap how reads and writes work in HBase.
11
In-Memory Flush
We introduce cell segments: a segment is a set of cells inserted during a period of time, and the MemStore is a collection of segments. (Diagram: the active segment holds a key-to-cell map, the index, plus the cells' data.)
12
Segments in MemStore
A segment is a set of cells inserted during a period of time; the MemStore is a collection of segments. Immutable segments are simpler and inherently thread-safe. (Diagram: each segment holds a key-to-cell map, the index, plus the cells' data; the MemStore contains the active segment, an immutable segment, and the snapshot segment.)
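As a minimal sketch of this segment notion (illustrative Java with String keys and class names of my own, not HBase's actual Segment class), a segment couples an index with the cells' data, and only the active segment is mutable:

import java.util.NavigableMap;

abstract class SegmentSketch {
    abstract NavigableMap<String, byte[]> index();  // key-to-cell map: the segment's index
    abstract boolean isMutable();                   // only the active segment accepts writes

    byte[] get(String key) {
        return index().get(key);                    // cell data is always reached through the index
    }
}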
13
HBase In-Memory Flush
(Diagram: in the CompactingMemStore, writes go through the WAL into the active segment; an in-memory flush moves it into the compaction pipeline; the snapshot is flushed to HFiles in HDFS; reads are served via the block cache.) New compaction pipeline: the active segment is flushed into the pipeline in memory, and data is flushed to disk only when needed.
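A minimal sketch of the pipeline idea (illustrative Java, not HBase's actual CompactingMemStore): a full active segment is pushed onto an in-memory pipeline of immutable segments instead of being snapshotted straight to disk, and a disk flush happens only when the overall MemStore limit is reached.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;

class CompactingMemStoreSketch {
    private ConcurrentSkipListMap<String, byte[]> active = new ConcurrentSkipListMap<>();
    private final Deque<SortedMap<String, byte[]>> pipeline = new ArrayDeque<>();

    void inMemoryFlush() {
        pipeline.addFirst(active);               // the full active segment moves into the pipeline (newest first)
        active = new ConcurrentSkipListMap<>();  // a fresh active segment keeps serving writes
    }

    void flushToDiskIfNeeded(long memStoreLimitBytes, long currentSizeBytes) {
        if (currentSizeBytes >= memStoreLimitBytes) {
            // merge the pipeline segments and write one larger file; only now does data leave memory (omitted)
            pipeline.clear();
        }
    }
}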
14
Efficient index representation
(Diagram: the CompactingMemStore write path again: WAL, active segment, snapshot, flush to HFiles in HDFS, block cache.)
15
Efficient index representation
(Diagram: the same write path, now with an in-memory segment Cm in place of the snapshot.)
16
Efficient index representation
Mutable Segment: a Concurrent Skip-List Map, the "ordering" engine, holds cell-redirecting objects; the bytes with the entire cell's information are stored in the MSLAB or on the Java heap. (Diagram: skip-list entries pointing into raw cell-data buffers.)
17
No Dynamic Ordering for Immutable Segment
(Diagram: the raw cell data remains in the MSLAB and on the Java heap; the dynamic ordering structure is no longer needed once the segment is immutable.)
18
No Dynamic Ordering for Immutable Segment
19
No Dynamic Ordering for Immutable Segment
New Immutable Segment: a Cell Array; a binary search leads to each cell's info. (Diagram: a flat array of references into the raw cell-data buffers in the MSLAB / on the Java heap.)
20
Exploit the Immutability of a Segment after In-Memory Flush
New Design: a flat layout for the immutable segments' index, with less overhead per cell; data buffers are managed (allocated, stored, released) off-heap. Pros: better utilization of memory and CPU, locality in access to the index, and reduced memory fragmentation.
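A minimal sketch of the flattening step (illustrative Java with String keys and on-heap arrays, not the actual HBase code): once a segment is immutable, its concurrent skip-list index can be copied into a plain sorted array and searched with binary search, removing the per-node skip-list overhead.

import java.util.Arrays;
import java.util.concurrent.ConcurrentSkipListMap;

class FlatSegmentSketch {
    private final String[] keys;    // sorted keys in a plain array: no per-node skip-list overhead
    private final byte[][] values;  // cell data referenced by the same position

    FlatSegmentSketch(ConcurrentSkipListMap<String, byte[]> immutableIndex) {
        // the segment is immutable at this point, so the index no longer changes while we copy it
        keys = immutableIndex.keySet().toArray(new String[0]);  // iteration order is already sorted
        values = new byte[keys.length][];
        int i = 0;
        for (byte[] v : immutableIndex.values()) {
            values[i++] = v;
        }
    }

    byte[] get(String key) {
        int pos = Arrays.binarySearch(keys, key);  // ordering is fixed, so binary search replaces the skip-list
        return pos >= 0 ? values[pos] : null;
    }
}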
21
Outline Motivation & Background In-Memory Basic Compaction
In-Memory Eager Compaction Evaluation
22
HBase In-Memory Compaction
(Diagram: in the CompactingMemStore, the active segment is flushed in memory into the compaction pipeline, the pipeline segments are compacted in memory, and the snapshot is flushed to HFiles in HDFS only when needed.) New compaction pipeline: the active segment is flushed into the pipeline, pipeline segments are compacted in memory, and data is flushed to disk only when needed.
23
Remove Duplicate Cells While Still in Memory
New Design: eliminate redundancy while the data is still in memory, so there is less I/O upon flush and less work for on-disk compaction. Pros: data stays in memory for longer, larger files are flushed to disk, and write amplification is reduced. It includes index compaction.
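A minimal sketch of the eager in-memory compaction idea (illustrative Java, assuming the pipeline segments are ordered newest first and keyed by a single String; not the actual HBase code): the pipeline segments are merged and only the newest version of each key is kept, so obsolete versions never reach the disk.

import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class InMemoryCompactionSketch {
    // Merge the pipeline segments, newest first, keeping only the newest version of each key.
    static SortedMap<String, byte[]> compact(List<SortedMap<String, byte[]>> segmentsNewestFirst) {
        SortedMap<String, byte[]> compacted = new TreeMap<>();
        for (SortedMap<String, byte[]> segment : segmentsNewestFirst) {
            for (Map.Entry<String, byte[]> e : segment.entrySet()) {
                compacted.putIfAbsent(e.getKey(), e.getValue());  // older duplicates are dropped while still in memory
            }
        }
        return compacted;  // a single compacted segment replaces the whole pipeline
    }
}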
24
Three policies to compact a MemStore
NONE: no CompactingMemStore is used; everything works as before. BASIC: CompactingMemStore with flattening only (changing the index); each active segment is flattened upon in-memory flush, and multiple segments remain in the compaction pipeline. EAGER: CompactingMemStore with data compaction; each active segment is compacted with the compaction pipeline upon in-memory flush, and the result of the previous compaction is the single segment in the compaction pipeline.
25
How to use? Globally and per column family
By default, all tables apply basic in-memory compaction. This global configuration can be overridden in hbase-site.xml, as follows:
<property>
  <name>hbase.hregion.compacting.memstore.type</name>
  <value><none|eager></value>
</property>
26
How to use per column family?
The policy can also be configured in the HBase shell per column family:
create '<tablename>', {NAME => '<cfname>', IN_MEMORY_COMPACTION => '<NONE|BASIC|EAGER>'}
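For example, with a hypothetical table and column-family name, eager in-memory compaction could be enabled like this:
create 'usertable', {NAME => 'cf', IN_MEMORY_COMPACTION => 'EAGER'}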
27
Outline Motivation & Background In-Memory Basic Compaction
In-Memory Eager Compaction Evaluation
28
Metrics: Write Throughput, Write Latency, Write Amplification (defined as data written to disk / data submitted by the client), and Read Latency
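For example (illustrative numbers): if clients submit 100GB of data and a total of 220GB ends up being written to disk across flushes and on-disk compactions, the write amplification is 2.2.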
29
Framework Benchmarking tool: YCSB Data:
1 table, 100 regions (50 per RS), cell size 100 bytes. Cluster: 3 SSD machines, 48GB RAM, 12 cores; HBase: 2 RS + Master; HDFS: 3 datanodes. Configuration: 16GB heap, of which 40% is allocated to the MemStore and 40% to the Block Cache; 10 flush threads.
30
Workload #1: 100% Writes. Focus: write throughput, latency, amplification. Zipfian distribution sampled from 200M keys; 100GB of data written; 20 threads driving the workload.
31
Performance
32
Data Written
33
Write Amplification
Policy   Write Amplification   Improvement
None     2.201                 -
Basic    1.865                 15%
Eager    1.743                 21%
34
A Very Different Experiment …
HDD hardware; 1 table, 1 region; small heap (1GB, simulating contention at a large scale); 5GB of data; 50% writes, 50% reads
35
Zoom-In Read Latency over Time
36
Zoom-In Read Latency over Time
37
Zoom-In Read Latency over Time
38
Read Latency
39
Summary Targeted for HBase 2.0.0
New design pros over the default implementation: less compaction on disk reduces the write amplification effect; less disk I/O and network traffic reduces the load on HDFS; a new space-efficient index representation; reduced retrieval latency by serving (mainly) from memory. We would like to thank the reviewers Michael Stack, Anoop Sam John, Ramkrishna S. Vasudevan, and Ted Yu.
40
Thank You!