Amit Ohayon, seminar in databases, 2017


HBase

What is HBase? TL;DR: HBase is a distributed, column(-family)-oriented database built on top of HDFS. The longer version, from the project's main page: "Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. … This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. … Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable."

So, What is BigTable? A BigTable is a sparse, distributed, persistent multidimensional sorted map. Let's break it down, one word at a time…

So, What is BigTable? Map – A datatype composed of keys and values, where each key is associated with a value. Persistent – The state of a BigTable outlives the process that created it. Distributed – BigTable is built upon Google File System (GFS), which replicates its data across several nodes (somewhat like RAID systems). Sorted – The keys of the map are ordered alphabetically (or rather – byte ordered). Multidimensional – Each value is indexed by several dimensions: row key, column, and timestamp. Sparse – Cells that hold no value consume no storage.

So, What is HBase? Map – A datatype composed of keys and values, where each key is associated with a value. Persistent – The state of an HBase table outlives the process that created it. Distributed – HBase is built upon the Hadoop Distributed File System (HDFS), which replicates its data across several nodes (somewhat like RAID systems). Sorted – The keys of the map are ordered alphabetically (or rather – byte ordered). Multidimensional – Each value is indexed by several dimensions: row key, column, and timestamp. Sparse – Cells that hold no value consume no storage.

Some Terminology Table - An HBase table consists of multiple rows. Row - A row in HBase consists of a row key and one or more columns with values associated with them. Rows are sorted alphabetically by the row key as they are stored. Column - A column in HBase consists of a column family and a column qualifier, which are delimited by a : (colon) character. Column Family - Column families physically colocate a set of columns and their values. Column Qualifier - A column qualifier is added to a column family to provide the index for a given piece of data. Cell - A cell is a combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value’s version. A timestamp is written alongside each value, and is the identifier for a given version of a value.
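The terminology above composes into the map view of the data model. A minimal Python sketch (illustrative only, not the real HBase client API; the rows and columns here are invented) models a table as a sparse map keyed by (row key, "family:qualifier", timestamp):

```python
# Illustrative sketch of the HBase data model: a sparse, sorted,
# multidimensional map. Keys are (row, "family:qualifier", timestamp)
# tuples; only cells that exist consume an entry (sparseness).
table = {}

def put(row, column, value, ts):
    table[(row, column, ts)] = value

def get(row, column):
    """Return the newest version of a cell, or None if the cell is empty."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    if not versions:
        return None          # sparse: a missing cell is simply absent
    return max(versions)[1]  # the highest timestamp wins

put(b"row1", "info:format", b"JPG", ts=1)
put(b"row1", "info:format", b"PNG", ts=2)   # a newer version of the same cell
put(b"row9", "contents:image", b"\x89PNG", ts=1)

assert get(b"row1", "info:format") == b"PNG"   # latest version returned
assert get(b"row5", "info:geo") is None        # empty cell: no storage used
```

Note how the timestamp is part of the key: a cell is really a small, versioned history, exactly as the Cell definition above describes.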

An HBase Table Can be viewed as a multidimensional map. The map's dimensions, in order: Row Key (ordered), Column Family, Column Qualifier, and Cell (Timestamp, Value).

An HBase Table Can be viewed as a multidimensional map Or as some kind of complex table

A Word About Sparseness Notice the empty cell in row 00…02, column info:geo. A given row can have any number of columns in each column family, or none at all. There is also row-level sparseness: HBase allows gaps between row keys.

Data Model Operations Four primary data model operations: Get - returns attributes for a specified row. Put - either adds new rows to a table (if the key is new) or updates existing rows (if the key already exists). Scan - allows iteration over multiple rows for specified attributes. Delete - removes a row from a table. Other operations: batch put, incrementColumnValue, checkAndPut.
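The four operations can be sketched against a toy row-keyed store (illustrative, not the real client API; the function names are stand-ins). Keeping rows sorted by key is what makes Scan a cheap range iteration:

```python
# Toy sketch of the four primary data model operations.
store = {}  # row key -> {column: value}

def put(row, col, val):
    store.setdefault(row, {})[col] = val   # adds a new row or updates in place

def get(row):
    return store.get(row)                  # attributes for one row, or None

def delete(row):
    store.pop(row, None)

def scan(start, stop):
    """Iterate rows with start <= key < stop, in row-key order."""
    for row in sorted(store):
        if start <= row < stop:
            yield row, store[row]

put(b"a", "info:x", 1)
put(b"b", "info:x", 2)
put(b"c", "info:x", 3)
assert [r for r, _ in scan(b"a", b"c")] == [b"a", b"b"]  # half-open range
delete(b"b")
assert get(b"b") is None
```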

Guarantees – ACID? HBase is not ACID-compliant, but does guarantee: Atomicity All mutations are atomic within a row. Any put will either wholly succeed or wholly fail. Mutations to several rows will not be atomic across the multiple rows. For example, a multiput that operates on rows 'a', 'b', and 'c' may return having mutated some but not all of the rows. Mutations are applied in a well-defined order for each row, with no interleaving. User 1: writes "a=1,b=1,c=1". User 2: writes "a=2,b=2,c=2". Result: either "a=1,b=1,c=1" or "a=2,b=2,c=2".

Guarantees – ACID? HBase is not ACID-compliant, but does guarantee: Consistency and Isolation Single-row consistency and isolation - All rows returned via any access API will consist of a complete row that existed at some point in the table's history. Consistency of several rows – the previous guarantee does not hold for scans, and different rows from several points in history can be returned by a scan operation. User 1: writes "a=1" in row 1, then scans the table. User 2: writes "a=0" in row 1, then "a=1" in row 2. The result can be "a=1" in both rows 1 and 2, even though no such state ever existed in the history.

Guarantees – ACID? HBase is not ACID-compliant, but does guarantee: Durability All visible data is also durable data. That is to say, a read will never return data that has not been made durable on disk. Any operation that returns a "success" code (e.g. does not throw an exception) will be made durable. Any operation that returns a "failure" code will not be made durable (subject to the Atomicity guarantees above). All reasonable failure scenarios will not affect any of the guarantees above.

DEMO

Diving Deeper The Architecture of HBase

Column Family All column family members have a common prefix (info:format and info:geo are members of the info column family, whereas contents:image belongs to the contents family). The prefix must be composed of printable characters (as opposed to row keys and column qualifiers). Column families must be specified up front as part of the table schema definition, but new column family members (column qualifiers) can be added on demand. Physically, all column family members are stored together on the filesystem. Tuning and optimizations are done at the column family level, so it is advised that all column family members have the same general access pattern and size characteristics. HBase currently does not do well with anything above two or three column families. Only introduce a second or third column family where you usually query one column family or the other, but not both at once.

Region Containment hierarchy: Table → Region → Store → MemStore + StoreFiles → Blocks …

Region Regions are the basic element of availability and distribution for tables. Tables are automatically partitioned horizontally by HBase into regions. A region is denoted by the table it belongs to, its first row (inclusive), and its last row (exclusive). When a region grows over a configurable threshold, it splits into two (approximately) equal regions. Regions are the units that get distributed over an HBase cluster. This way the table is distributed across several servers, which benefits both storage and load balancing. The online set of sorted regions comprises the table's total content.
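The split rule can be sketched as follows. This is a simplification: the real threshold is a configurable store size in bytes, not a row count, and the boundary key is picked from file metadata rather than by scanning rows:

```python
# Sketch of region splitting: a region is a half-open row-key range
# [start, stop); past a threshold it splits at a middle key into two
# roughly equal regions.
SPLIT_THRESHOLD = 4  # hypothetical; real HBase uses a configurable byte size

def maybe_split(region):
    """region = (start_key, stop_key, sorted row keys) -> 1 or 2 regions."""
    start, stop, rows = region
    if len(rows) <= SPLIT_THRESHOLD:
        return [region]
    mid = rows[len(rows) // 2]               # middle key becomes the boundary
    left = (start, mid, [r for r in rows if r < mid])
    right = (mid, stop, [r for r in rows if r >= mid])
    return [left, right]

r = ("", None, ["a", "b", "c", "d", "e", "f"])  # None = end of table
parts = maybe_split(r)
assert len(parts) == 2
assert parts[0][2] == ["a", "b", "c"] and parts[1][2] == ["d", "e", "f"]
```

Note that the two halves share the boundary key "d": it is the exclusive end of the left region and the inclusive start of the right one, matching the [first row, last row) definition above.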

Inside a Region Hierarchy: Store → MemStore + StoreFiles → Blocks …

Inside a Region Each column family has its own Store, which is composed of a single MemStore and several StoreFiles. MemStore – When a write arrives, it is added to an in-memory MemStore. When a MemStore fills, its content is flushed to the filesystem. Row locks are only needed here – read operations first look for the data in the MemStore and only then in StoreFiles. If old data is present, we lock and update; otherwise we just write the new data to the MemStore. StoreFiles are where your data lives. These files are persistent and are stored on HDFS. HDFS takes care of replication of the data (default – one replica on the local rack, two replicas on different nodes of a remote rack).
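The MemStore-then-StoreFiles path can be sketched as follows (illustrative; the flush threshold here is a made-up entry count, whereas real HBase flushes by size in bytes):

```python
# Sketch of a Store's write and read path: writes land in an in-memory
# MemStore; a full MemStore is flushed as an immutable StoreFile; reads
# check the MemStore first, then StoreFiles from newest to oldest.
FLUSH_SIZE = 3  # hypothetical flush threshold (real HBase flushes by bytes)

memstore = {}
storefiles = []  # immutable snapshots, newest last

def write(key, value):
    memstore[key] = value
    if len(memstore) >= FLUSH_SIZE:
        storefiles.append(dict(memstore))  # flush: persist a snapshot
        memstore.clear()

def read(key):
    if key in memstore:                    # MemStore first
        return memstore[key]
    for sf in reversed(storefiles):        # then StoreFiles, newest first
        if key in sf:
            return sf[key]
    return None

for i in range(5):
    write(f"k{i}", i)
assert len(storefiles) == 1 and len(memstore) == 2   # k0-k2 were flushed
assert read("k1") == 1 and read("k4") == 4           # both paths resolve
```

Checking newest files first is what makes a later write shadow an older flushed value without rewriting old files.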

RegionServer Hosts zero or more regions. Serves client read/write requests. Manages region splits. When a write arrives at a RegionServer, it is first appended to a commit log (Write-Ahead Log, WAL), and only after that write succeeds is it added to the in-memory MemStore. The WAL is hosted on HDFS, so it remains available through a RegionServer crash. If writing to the WAL fails, the entire operation to modify the data fails.
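The WAL-first ordering can be sketched as follows (illustrative; a list stands in for the HDFS-hosted log, and the failure is simulated):

```python
# Sketch of write-ahead logging: the edit is appended to the WAL first, and
# the MemStore is updated only if the WAL append succeeds, so a RegionServer
# crash can always be replayed from the log.
wal = []        # stand-in for the HDFS-hosted commit log
memstore = {}

def append_to_wal(edit, fail=False):
    if fail:
        raise IOError("WAL append failed")  # simulated HDFS failure
    wal.append(edit)

def write(key, value, wal_fails=False):
    try:
        append_to_wal((key, value), fail=wal_fails)
    except IOError:
        return False                # whole operation fails; MemStore untouched
    memstore[key] = value
    return True

assert write("k1", "v1") is True
assert write("k2", "v2", wal_fails=True) is False
assert "k2" not in memstore         # failed WAL write left no partial state
assert wal == [("k1", "v1")]
```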

Master The HBase master is responsible for assigning regions to registered RegionServers and for recovering from RegionServer failures. The master node is lightly loaded. When the master notices that a RegionServer is no longer reachable (how?), it splits the dead RegionServer's commit log by region.

hbase:meta The hbase:meta table keeps a list of all regions in the system. It is a table just like any other (although hidden from HBase’s shell). The key’s format is ([table],[region start key],[region id]). Note the usage of the key order feature! The data itself is less important for us.
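The payoff of that key format can be sketched: because hbase:meta is itself sorted by key, the region hosting a given row is found with a single ordered lookup, the last meta entry whose start key is <= the row key. (The entries and server names below are invented for illustration.)

```python
# Sketch of a region lookup against a sorted hbase:meta, exploiting key order.
import bisect

# hypothetical meta entries: (table, region start key) -> hosting server
meta = [
    (("users", ""),  "rs1.example:16020"),
    (("users", "m"), "rs2.example:16020"),
    (("users", "t"), "rs3.example:16020"),
]

def locate(table, row):
    """Find the region whose [start, next start) range covers the row."""
    keys = [k for k, _ in meta]
    i = bisect.bisect_right(keys, (table, row)) - 1  # last start key <= row
    return meta[i][1]

assert locate("users", "alice") == "rs1.example:16020"
assert locate("users", "oren") == "rs2.example:16020"
assert locate("users", "zed") == "rs3.example:16020"
```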

A Side Note on ZooKeeper ZooKeeper is Apache's project for enabling highly reliable distributed coordination. It can tell which nodes are online, host small chunks of data (usually text or metadata), and generally makes managing distributed systems simpler. More on that in Oren's lecture!

All Together Now! Fresh clients connect to the ZooKeeper cluster first to learn the location of hbase:meta. The client then does a lookup against the appropriate hbase:meta region to figure out the hosting user-space region and its location. Thereafter, the client interacts directly with the hosting RegionServer.

Some Caching The previously described process takes three round-trips per row operation – ZooKeeper, then hbase:meta, then a region. What can we do to save time? Cache! Clients cache locations as well as user-space region start and stop rows, so they can figure out hosting regions themselves without having to go back to the hbase:meta table. Clients continue to use the cached entries as they work, until there is a fault. When this happens (i.e., when the region has moved), the client consults the hbase:meta table again to learn the new location. If the consulted hbase:meta region has moved, then ZooKeeper is reconsulted.

Worth Mentioning Read Replicas – Some applications prefer higher availability over stronger consistency. For them, Read Replicas were introduced. These replicas offer higher availability at the price of possibly stale reads (timeline consistency). Cluster Replication – support for synchronization between clusters. It uses a source-push methodology, where each cluster pushes its WAL edits to other clusters. HBase's replication system provides at-least-once delivery of client edits, and does not guarantee message ordering.

Thank You for Listening! Questions?