
1 HBase
Amit Ohayon, seminar in databases, 2017

2 What is HBase? TL;DR: HBase is a distributed, column(-family)-oriented database built on top of HDFS. The longer version, from the project's main page: "Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. … This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable…"

3 So, What is BigTable? A BigTable is a sparse, distributed, persistent multidimensional sorted map. Let's break it down, one word at a time…

4 So, What is BigTable?
Map – A datatype composed of keys and values, where each key is associated with a value.
Persistent – The state of a BigTable outlives the process that created it.
Distributed – BigTable is built upon the Google File System (GFS), which replicates its data across several nodes (somewhat like RAID systems).
Sorted – The keys of the map are ordered alphabetically (or rather, byte-ordered).
Multidimensional
Sparse

6 So, What is HBase?
Map – A datatype composed of keys and values, where each key is associated with a value.
Persistent – The state of an HBase table outlives the process that created it.
Distributed – HBase is built upon the Hadoop Distributed File System (HDFS), which replicates its data across several nodes (somewhat like RAID systems).
Sorted – The keys of the map are ordered alphabetically (or rather, byte-ordered).
Multidimensional
Sparse

7 Some Terminology
Table - An HBase table consists of multiple rows.
Row - A row in HBase consists of a row key and one or more columns with values associated with them. Rows are sorted alphabetically by the row key as they are stored.
Column - A column in HBase consists of a column family and a column qualifier, delimited by a : (colon) character.
Column Family - Column families physically colocate a set of columns and their values.
Column Qualifier - A column qualifier is added to a column family to provide the index for a given piece of data.
Cell - A cell is a combination of row, column family, and column qualifier. It contains a value and a timestamp; the timestamp is written alongside each value and identifies the value's version.
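To make the terminology concrete, here is a minimal sketch using the HBase Java client. The table name users, the row key, and the value are hypothetical; the column written below is info:format (family info, qualifier format), echoing the examples on the following slides.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class TerminologyExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {   // hypothetical table
            // A cell is addressed by (row key, column family, column qualifier).
            Put put = new Put(Bytes.toBytes("row-001"));       // row key
            put.addColumn(Bytes.toBytes("info"),               // column family
                          Bytes.toBytes("format"),             // column qualifier
                          Bytes.toBytes("PNG"));               // cell value
            table.put(put);  // HBase stamps the cell with a timestamp = the value's version
        }
    }
}
```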

8 An HBase Table
Can be viewed as a multidimensional map.
[Slide diagram: Row Key (ordered) → Column Family → Column Qualifier → Cell (Timestamp, Value)]

9 An HBase Table
Can be viewed as a multidimensional map, or as some kind of complex table.
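To illustrate the "multidimensional sorted map" view, here is a hedged sketch in plain Java. The nesting (row key → column family → column qualifier → timestamp → value) mirrors the conceptual model above; the keys are made up, and this is only a mental model, not how HBase stores data internally.

```java
import java.util.TreeMap;

public class MapViewSketch {
    public static void main(String[] args) {
        // row key -> column family -> column qualifier -> timestamp -> value
        TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> table = new TreeMap<>();

        table.computeIfAbsent("row-001", r -> new TreeMap<>())
             .computeIfAbsent("info", f -> new TreeMap<>())
             .computeIfAbsent("format", q -> new TreeMap<>())
             .put(1L, "PNG");   // timestamp 1 = first version of this cell

        // Every level is sorted, so rows come back in key order,
        // and a missing cell simply has no entry (sparseness costs nothing).
        System.out.println(table);
    }
}
```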

10 A Word About Sparseness
Notice the empty cell in row 00…02, column info:geo. A given row can have any number of columns in each column family, or none at all. There is also row-based sparseness: HBase allows gaps between row keys.

11 Data Model Operations Four primary data model operations:
Get - returns attributes for a specified row.
Put - either adds new rows to a table (if the key is new) or updates existing rows (if the key already exists).
Scan - allows iteration over multiple rows for specified attributes.
Delete - removes a row from a table.
Other operations: batch put, incrementColumnValue, checkAndPut.
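A hedged sketch of the four primary operations with the HBase Java client; the table demo and the column family cf are assumptions (they must already exist on the cluster this runs against).

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CrudExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo"))) {   // assumed table
            byte[] cf = Bytes.toBytes("cf"), q = Bytes.toBytes("q");

            // Put: create or update a row
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(cf, q, Bytes.toBytes("value-1"));
            table.put(put);

            // Get: read attributes of one row
            Result r = table.get(new Get(Bytes.toBytes("row-1")));
            System.out.println(Bytes.toString(r.getValue(cf, q)));

            // Scan: iterate over a range of rows
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }

            // Delete: remove a row
            table.delete(new Delete(Bytes.toBytes("row-1")));
        }
    }
}
```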

12 Guarantees – ACID? HBase is not ACID-compliant, but does guarantee:
Atomicity - All mutations are atomic within a row. Any put will either wholly succeed or wholly fail. Mutations to several rows will not be atomic across the multiple rows. For example, a multiput that operates on rows 'a', 'b', and 'c' may return having mutated some but not all of the rows. Mutations are seen to happen in a well-defined order for each row, with no interleaving.
User 1: write "a=1,b=1,c=1".
User 2: write "a=2,b=2,c=2".
Result: either "a=1,b=1,c=1" or "a=2,b=2,c=2".
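A hedged illustration of the row-level guarantee: a single Put touching several columns of one row is applied atomically, while a batch across rows may partially succeed. The table and column names are made up.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class AtomicityExample {
    static void demo(Connection conn) throws IOException, InterruptedException {
        try (Table table = conn.getTable(TableName.valueOf("demo"))) {   // assumed table
            byte[] cf = Bytes.toBytes("cf"), q = Bytes.toBytes("x");

            // Atomic: all three columns of row "r1" become visible together or not at all.
            Put p = new Put(Bytes.toBytes("r1"));
            p.addColumn(cf, Bytes.toBytes("a"), Bytes.toBytes("1"));
            p.addColumn(cf, Bytes.toBytes("b"), Bytes.toBytes("1"));
            p.addColumn(cf, Bytes.toBytes("c"), Bytes.toBytes("1"));
            table.put(p);

            // NOT atomic across rows: some of these puts may succeed while others fail.
            List<Put> multi = Arrays.asList(
                new Put(Bytes.toBytes("a")).addColumn(cf, q, Bytes.toBytes("2")),
                new Put(Bytes.toBytes("b")).addColumn(cf, q, Bytes.toBytes("2")),
                new Put(Bytes.toBytes("c")).addColumn(cf, q, Bytes.toBytes("2")));
            Object[] results = new Object[multi.size()];
            table.batch(multi, results);   // per-row outcomes land in `results`
        }
    }
}
```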

13 Guarantees – ACID? HBase is not ACID-compliant, but does guarantee:
Consistency and Isolation
Single-row consistency and isolation - Any row returned via any access API will consist of a complete row that existed at some point in the table's history.
Consistency across several rows - the previous guarantee does not hold for scans; rows from several points in history can be returned by a single scan operation.
User 1: writes "a=1" in row 1, then scans the table.
User 2: writes "a=0" in row 1, then "a=1" in row 2.
The scan can return "a=1" in both rows 1 and 2, even though no such state ever existed.

14 Guarantees – ACID? HBase is not ACID-compliant, but does guarantee:
Durability All visible data is also durable data. That is to say, a read will never return data that has not been made durable on disk. Any operation that returns a "success" code (e.g. does not throw an exception) will be made durable. Any operation that returns a "failure" code will not be made durable (subject to the Atomicity guarantees above). All reasonable failure scenarios will not affect any of the guarantees above.

15 DEMO

16 Diving Deeper: The architecture of HBase

17 Column Family
All column family members have a common prefix (info:format and info:geo are members of the info column family, whereas contents:image belongs to the contents family).
The prefix must be composed of printable characters (as opposed to row keys and column qualifiers, which can be arbitrary bytes).
Column families must be specified up front as part of the table schema definition, but new column family members (column qualifiers) can be added on demand.
Physically, all column family members are stored together on the filesystem.
Tuning and optimizations are done at the column family level, so it is advised that all column family members have the same general access pattern and size characteristics.
HBase currently does not do well with anything above two or three column families. Only introduce a second or third column family when you usually query one column family or the other, but not both at the same time.
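Because column families are part of the schema, they are declared at table-creation time; qualifiers are not. A hedged sketch with the HBase 1.x-style Admin API; the table name webpages and the families info and contents mirror the slide's example and are otherwise assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

public class CreateTableExample {
    static void createTable(Connection conn) throws IOException {
        try (Admin admin = conn.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("webpages"));
            desc.addFamily(new HColumnDescriptor("info"));       // families are fixed up front...
            desc.addFamily(new HColumnDescriptor("contents"));   // ...and kept to a small number
            admin.createTable(desc);
            // Qualifiers such as info:format or contents:image need no declaration;
            // they are simply written later with Put.
        }
    }
}
```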

18 Region
[Slide diagram: Table → Region → Store → MemStore / StoreFile → Block]

19 Region Regions are the basic element of availability and distribution for tables. Tables are automatically partitioned horizontally by HBase into regions. A region is denoted by the table it belongs to, its first row (inclusive), and its last row (exclusive). When a region grows over a configurable threshold, it splits into two (approximately) equal regions. Regions are the units that get distributed over an HBase cluster. This way the table is distributed across several servers, which benefits both storage requirements and load balancing. The online set of sorted regions comprises the table's total content.
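A region's boundaries are just a byte range over row keys, so the containment check is a simple comparison. A purely illustrative sketch (not HBase internals); it also assumes the usual convention that an empty end key stands for the last region of the table.

```java
import org.apache.hadoop.hbase.util.Bytes;

public class RegionRangeSketch {
    /** True if rowKey falls in [startKey, endKey); an empty endKey means "until the end of the table". */
    static boolean inRegion(byte[] rowKey, byte[] startKey, byte[] endKey) {
        boolean afterStart = Bytes.compareTo(rowKey, startKey) >= 0;   // start row is inclusive
        boolean beforeEnd = endKey.length == 0 || Bytes.compareTo(rowKey, endKey) < 0;  // end row is exclusive
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        byte[] start = Bytes.toBytes("row-5000"), end = Bytes.toBytes("row-9000");
        System.out.println(inRegion(Bytes.toBytes("row-7342"), start, end));   // true
        System.out.println(inRegion(Bytes.toBytes("row-9500"), start, end));   // false
    }
}
```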

20 Inside a Region
[Slide diagram: Store → MemStore / StoreFile → Block]

21 Inside a Region
Each column family has its own Store, which is composed of a single MemStore and several StoreFiles.
MemStore – When a write arrives, it is added to the in-memory MemStore. When a MemStore fills up, its content is flushed to the filesystem. Row locks are only needed here: read operations first look for the data in the MemStore and only then in the StoreFiles. If old data is present, we lock and update; otherwise we just write the new data to the MemStore.
StoreFiles are where your data lives. These files are persistent and are stored on HDFS. HDFS takes care of replicating the data (default: one replica on the local rack, two replicas on different nodes of a remote rack).

22 RegionServer Hosts zero or more regions.
Serves client read/write requests. Manages region splits.
When a write arrives at a RegionServer, it is first appended to a commit log (Write-Ahead Log, WAL), and only after that append succeeds is it added to the in-memory MemStore. The WAL is hosted on HDFS, so it remains available through a RegionServer crash. If writing to the WAL fails, the entire operation to modify the data fails.
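The write path described on the last two slides, reduced to a purely illustrative toy sketch. This is not HBase's actual code; the class, fields, and threshold are invented to show the ordering (WAL first, then MemStore, flush on overflow).

```java
import java.io.IOException;
import java.util.TreeMap;

// Illustrative only: a toy region-server write path.
public class WritePathSketch {
    private final TreeMap<String, String> memStore = new TreeMap<>();   // sorted in-memory buffer
    private static final int FLUSH_THRESHOLD = 100_000;                 // invented threshold

    void write(String rowKey, String value) throws IOException {
        appendToWal(rowKey, value);     // 1. durable commit-log record on HDFS; failure aborts the write
        memStore.put(rowKey, value);    // 2. only now does the edit land in the in-memory MemStore
        if (memStore.size() >= FLUSH_THRESHOLD) {
            flushToStoreFile();         // 3. persist the sorted buffer as a new StoreFile on HDFS
            memStore.clear();
        }
    }

    private void appendToWal(String rowKey, String value) throws IOException { /* append to commit log */ }
    private void flushToStoreFile() throws IOException { /* write a new immutable StoreFile */ }
}
```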

23 Master The HBase master is responsible for assigning regions to registered RegionServers and for recovering RegionServer failures. The master node is lightly loaded. When the master notices that a RegionServer is no longer reachable (how?), it splits the dead RegionServer’s commit log by region.

24 hbase:meta The hbase:meta table keeps a list of all regions in the system. It is a table just like any other (although hidden from HBase’s shell). The key’s format is ([table],[region start key],[region id]). Note the usage of the key order feature! The data itself is less important for us.
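Because hbase:meta row keys sort first by table and then by region start key, a client can find the region hosting a row with a single reversed scan starting at the would-be key. A heavily hedged sketch: the user table name, row key, and trailing region-id placeholder are assumptions, and real clients go through an internal locator rather than scanning hbase:meta by hand.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaLookupSketch {
    static Result locateRegion(Connection conn, String userTable, String rowKey) throws IOException {
        // Build a probe key that sorts at or after the meta entry of the hosting region.
        byte[] probe = Bytes.toBytes(userTable + "," + rowKey + ",99999999999999");
        Scan scan = new Scan();
        scan.setReversed(true);    // walk backwards from the probe key...
        scan.setStartRow(probe);
        scan.setCaching(1);
        try (Table meta = conn.getTable(TableName.META_TABLE_NAME);
             ResultScanner rs = meta.getScanner(scan)) {
            return rs.next();      // ...so the first hit is the region whose start key <= rowKey
        }
    }
}
```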

25 A Side Note on ZooKeeper
ZooKeeper is Apache's project for enabling highly reliable distributed coordination. It can tell which nodes are online, can host small chunks of data (usually text or metadata), and generally helps manage distributed systems in a simpler way. More on that in Oren's lecture!

26 All Together Now! Fresh clients connect to the ZooKeeper cluster first to learn the location of hbase:meta. The client then does a lookup against the appropriate hbase:meta region to figure out the hosting user-space region and its location. Thereafter, the client interacts directly with the hosting RegionServer.
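In practice this means a client only has to be told where the ZooKeeper ensemble is; the hbase:meta location and the RegionServers are discovered from there. A minimal sketch, assuming hypothetical host names:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class BootstrapExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // The only addresses the client is given are the ZooKeeper nodes (hypothetical hosts):
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // From here the client learns the hbase:meta location via ZooKeeper,
            // resolves the hosting regions, and then talks to RegionServers directly.
        }
    }
}
```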

27 Some Caching The previously described process takes three round-trips per row operation: ZooKeeper, then hbase:meta, and then a region. What can we do to save time? Cache! Clients cache region locations as well as user-space region start and stop rows, so they can figure out hosting regions themselves without having to go back to the hbase:meta table. Clients continue to use the cached entries as they work, until there is a fault. When this happens (i.e., when the region has moved), the client consults the hbase:meta table again to learn the new location. If the consulted hbase:meta region has moved, then ZooKeeper is consulted again.
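A purely illustrative sketch of such a client-side cache: known regions are kept in a map sorted by start key, a floor lookup finds the cached region for a row, and a fault simply evicts the stale entry. All class and field names here are invented; this is not the HBase client's actual cache.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: region locations for one table, keyed by region start key.
public class LocationCacheSketch {
    static class RegionLocation {
        final String startKey, endKey, server;
        RegionLocation(String s, String e, String srv) { startKey = s; endKey = e; server = srv; }
    }

    private final TreeMap<String, RegionLocation> cache = new TreeMap<>();

    /** Returns the cached location hosting rowKey, or null (meaning: go back to hbase:meta). */
    RegionLocation lookup(String rowKey) {
        Map.Entry<String, RegionLocation> e = cache.floorEntry(rowKey);  // region with start key <= rowKey
        if (e == null) return null;
        RegionLocation loc = e.getValue();
        boolean beforeEnd = loc.endKey.isEmpty() || rowKey.compareTo(loc.endKey) < 0;
        return beforeEnd ? loc : null;
    }

    void evict(String rowKey) {                  // called when a request faults (the region has moved)
        Map.Entry<String, RegionLocation> e = cache.floorEntry(rowKey);
        if (e != null) cache.remove(e.getKey());
    }

    void add(RegionLocation loc) { cache.put(loc.startKey, loc); }
}
```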

28 Worth Mentioning
Read Replicas – Some applications prefer higher availability over stronger consistency. For them, Read Replicas were introduced. These replicas offer higher availability at the price of possibly stale reads (timeline consistency).
Cluster Replication – support for synchronizing data between clusters. It uses a source-push methodology, where each cluster ships the edits recorded in its WALs to the other clusters. HBase's replication system provides at-least-once delivery of client edits, and does not guarantee message ordering.

29 Thank You for Listening!
Questions?

