HBase – NoSQL Database
Presented By: 13MCEC13
What is HBase?
HBase is a column-oriented, distributed, scalable, versioned big data store.
HBase can manage structured and semi-structured data.
It is a database management system that runs on top of HDFS and uses HDFS for storage.
Installation
Download: www.apache.org/dyn/closer.cgi/hbase
Installing HBase on Linux: https://hbase.apache.org/book/quickstart.html
Installing HBase on Windows: https://hbase.apache.org/cygwin.html
HDFS vs. HBase
HDFS is a distributed file system that is well suited for storing large files.
HDFS
  Is suited for high-latency batch processing operations.
  Data is primarily accessed through MapReduce.
  Is designed for batch processing and hence has no concept of random reads/writes.
HBase
  Is built for low-latency operations.
  Provides access to single rows from billions of records.
  Data is accessed through shell commands and client APIs in Java, REST, Avro or Thrift.
HBase run modes
Standalone
  HBase doesn't use HDFS; it uses the local file system.
  Doesn't provide durability.
Distributed
  Pseudo-distributed (local file system or HDFS)
  Fully distributed (HDFS)
HBase architecture
HMaster and Region Server
HMaster
  Manages and monitors the cluster.
  Assigns regions to Region Servers.
  Checks the health of Region Servers.
  Handles load balancing.
Region Server
  Contains multiple regions.
  Splits regions automatically.
  Handles read and write requests.
  Communicates with clients directly.
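The region assignment and splitting described above can be pictured with a small Ruby sketch. It assumes each region covers a contiguous range of row keys, with an empty key meaning "unbounded". The names Region, locate_region, split_region and the server names are made up for illustration and are not HBase APIs.

```ruby
# Toy model: a region covers row keys in [start_key, end_key).
# An empty start_key or end_key means "unbounded" on that side.
Region = Struct.new(:start_key, :end_key, :server) do
  def covers?(row_key)
    (start_key.empty? || row_key >= start_key) &&
      (end_key.empty? || row_key < end_key)
  end
end

# Find the region (and so the region server) responsible for a row key.
def locate_region(regions, row_key)
  regions.find { |r| r.covers?(row_key) }
end

# Split one region into two at split_key. Both halves stay on the same
# server in this sketch; a real master may move them for load balancing.
def split_region(region, split_key)
  [Region.new(region.start_key, split_key, region.server),
   Region.new(split_key, region.end_key, region.server)]
end

regions = [
  Region.new('', 'm', 'rs1'),   # hypothetical server names
  Region.new('m', '', 'rs2')
]
puts locate_region(regions, 'row1').server
```

Because every row key falls into exactly one range, a client that knows the ranges can go straight to the right Region Server.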
ZooKeeper
  Keeps track of Region Servers in HBase.
  Helps recover from Region Server crashes.
  The master gets details of Region Servers by contacting ZooKeeper.
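A rough way to picture this tracking is a registry that remembers when each Region Server last checked in and treats silent servers as dead. The sketch below is only a stand-in: ToyRegistry, its methods, and the timeout value are hypothetical and are not the ZooKeeper API.

```ruby
# Toy liveness registry: a server is "live" while its last heartbeat
# is no older than session_timeout (all times are plain numbers).
class ToyRegistry
  def initialize(session_timeout)
    @session_timeout = session_timeout
    @last_seen = {}   # server name => time of last heartbeat
  end

  # Record that a server checked in at time `now`.
  def heartbeat(server, now)
    @last_seen[server] = now
  end

  # Servers whose last heartbeat is still within the timeout.
  def live_servers(now)
    @last_seen.select { |_server, t| now - t <= @session_timeout }.keys.sort
  end

  # Servers that registered once but have since gone silent.
  def dead_servers(now)
    @last_seen.keys.sort - live_servers(now)
  end
end

registry = ToyRegistry.new(30)
registry.heartbeat('rs1', 0)
registry.heartbeat('rs2', 0)
registry.heartbeat('rs1', 40)
puts registry.dead_servers(50).inspect
```

A master that asks the registry for dead servers knows which regions need to be reassigned.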
HBase Data Model
The data model in HBase is designed to accommodate semi-structured data that varies in size, data type, and columns.
The data model makes it easier to partition data and distribute it across the cluster.
Data model elements
The data model consists of:
  Tables
  Rows
  Column families
  Columns
  Cells
  Versions
(row, column family, column, timestamp) -> value
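The mapping above can be modelled as nested hashes. The following is a minimal in-memory sketch, assuming columns are written as "family:qualifier" strings and timestamps are plain integers. ToyTable and its methods are illustrative names, not part of HBase.

```ruby
# Toy in-memory model of one table:
#   row key -> "family:qualifier" -> timestamp -> value
class ToyTable
  def initialize
    @rows = Hash.new do |rows, row|
      rows[row] = Hash.new { |cols, col| cols[col] = {} }
    end
  end

  # Store a value for (row, column, timestamp).
  def put(row, column, value, timestamp)
    @rows[row][column][timestamp] = value
  end

  # Return the value with the highest timestamp, or nil if nothing is stored.
  def get_latest(row, column)
    return nil unless @rows.key?(row) && @rows[row].key?(column)
    versions = @rows[row][column]
    return nil if versions.empty?
    versions[versions.keys.max]
  end
end

table = ToyTable.new
table.put('row1', 'vi:make', 'bmw', 1)
table.put('row1', 'vi:make', 'audi', 2)
puts table.get_latest('row1', 'vi:make')   # prints "audi"
```

Keeping the timestamp as the innermost key is what makes a cell "versioned": older values stay addressable next to the newest one.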
HBase features
  Horizontal scalability
  Consistent reads and writes
  Automatic sharding
  Automatic failover support between Region Servers
JRuby-based Shell
COMMAND GROUPS:
1) Group name: general
   Commands: version, whoami
2) Group name: ddl
   Commands: alter, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, is_disabled, is_enabled, list
3) Group name: dml
   Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
Contd.
4) Group name: security
   Commands: grant, revoke
You can get detailed help for a group: help 'security'
You can get detailed help for a command: help 'grant'
Basic Shell
Create a table with column families
    create 'table', 'f1', 'f2'
    create 'table', {NAME => 'f1'}, {NAME => 'f2'}
Add a column family to a table
    hbase> alter 't1', NAME => 'f1', VERSIONS => 5
To delete the 'f1' column family in table 't1', do:
    hbase> alter 't1', NAME => 'f1', METHOD => 'delete'
or
    hbase> alter 't1', 'delete' => 'f1'
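The VERSIONS => 5 setting caps how many timestamped values a cell in that column family keeps. As a rough illustration of the idea only (the real system may enforce the limit lazily rather than on every write), the hypothetical helper trim_versions below keeps the newest N entries of one cell.

```ruby
# versions: Hash of timestamp => value for a single cell.
# Returns a new Hash holding only the max_versions newest entries.
def trim_versions(versions, max_versions)
  versions.sort_by { |ts, _value| -ts }.first(max_versions).to_h
end

cell = { 1 => 'a', 2 => 'b', 3 => 'c', 4 => 'd', 5 => 'e', 6 => 'f' }
puts trim_versions(cell, 5).keys.inspect   # the five newest timestamps
```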
Contd.
Manually insert data into HBase
    create 'cars', 'vi'
Let’s insert 3 column qualifiers (make, model, year) and the associated values into the first row (row1). The trailing timestamp argument is optional.
1)  put 'cars', 'row1', 'vi:make', 'bmw', timestamp
    put 'cars', 'row1', 'vi:model', '5 series'
    put 'cars', 'row1', 'vi:year', '2012'
2)  put 'cars', 'row2', 'vi:make', 'mercedes'
    put 'cars', 'row2', 'vi:model', 'e class'
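To see what these puts add up to, the sketch below replays them into nested hashes, splitting each column name into family and qualifier at the first colon. CAR_PUTS and rows_from_puts are illustrative names; this is plain Ruby, not a call into HBase.

```ruby
# The puts from the slide as [row, column, value] triples.
CAR_PUTS = [
  ['row1', 'vi:make',  'bmw'],
  ['row1', 'vi:model', '5 series'],
  ['row1', 'vi:year',  '2012'],
  ['row2', 'vi:make',  'mercedes'],
  ['row2', 'vi:model', 'e class']
].freeze

# Build row -> family -> qualifier -> value from a list of puts.
def rows_from_puts(puts_list)
  rows = {}
  puts_list.each do |row, column, value|
    family, qualifier = column.split(':', 2)
    families = (rows[row] ||= {})
    columns = (families[family] ||= {})
    columns[qualifier] = value
  end
  rows
end

rows = rows_from_puts(CAR_PUTS)
puts rows['row1']['vi'].keys.inspect
puts rows['row2']['vi'].keys.inspect
```

row2 ends up with no year qualifier at all, which is the semi-structured behaviour the data model allows: rows in the same column family need not share the same columns.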
Contd.
Scan a table
    scan 'cars'
    scan 'cars', {COLUMNS => ['vi:make']}
Get a single row
    get 'cars', 'row1'
    get 'cars', 'row1', {TIMERANGE => [ts1, ts2]}
    get 'cars', 'row1', {COLUMN => ['vi:model', 'vi:year']}
Delete a cell (value)
    delete 'cars', 'row2', 'vi:year'
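The COLUMNS and TIMERANGE options act as filters over the stored cells. The sketch below mimics that filtering on a plain array, assuming the time range includes its lower bound and excludes its upper bound. scan_cells is a hypothetical helper, and unlike the real commands it returns every matching version rather than only the newest.

```ruby
# cells: Array of [row, column, timestamp, value].
# columns:   optional list of "family:qualifier" names to keep.
# timerange: optional [min, max] pair; keeps min <= timestamp < max.
def scan_cells(cells, columns: nil, timerange: nil)
  cells.select do |_row, column, ts, _value|
    (columns.nil? || columns.include?(column)) &&
      (timerange.nil? || (ts >= timerange[0] && ts < timerange[1]))
  end
end

cells = [
  ['row1', 'vi:make',  1, 'bmw'],
  ['row1', 'vi:model', 5, '5 series'],
  ['row2', 'vi:make',  9, 'mercedes']
]
puts scan_cells(cells, columns: ['vi:make']).length
puts scan_cells(cells, timerange: [1, 5]).inspect
```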
Contd.
Count (counts the number of rows in a table)
    count 'cars'
Incr (increments a cell value)
    incr 't1', 'r1', 'c1'
    incr 't1', 'r1', 'c1', 1
    incr 't1', 'r1', 'c1', 10
Disable and delete a table
    disable 'cars'
    drop 'cars'
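The three incr calls above differ only in the amount added, which defaults to 1. A minimal sketch of that behaviour on a plain hash follows. The method here is a local stand-in that happens to share the shell command's name, and it ignores the atomicity a real server-side counter would provide.

```ruby
# counters: Hash keyed by [row, column]; a missing counter starts at 0.
# Returns the counter's new value.
def incr(counters, row, column, amount = 1)
  key = [row, column]
  counters[key] = counters.fetch(key, 0) + amount
end

counters = {}
incr(counters, 'r1', 'c1')        # counter becomes 1
incr(counters, 'r1', 'c1', 1)     # counter becomes 2
puts incr(counters, 'r1', 'c1', 10)   # prints 12
```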
Contd.
Enable a table
    enable 'cars'
List (lists all tables in HBase; an optional regular expression parameter can be used to filter the output)
    list
    list 'abc.*'
Truncate (disables, drops and recreates the specified table)
    truncate 'cars'
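The table lifecycle commands can be tied together in a toy catalog: a table must be disabled before it can be dropped, truncate is disable + drop + create, and list filters names by a regular expression. ToyCatalog is a hypothetical class, and matching the pattern against the whole table name is an assumption of this sketch.

```ruby
# Toy table catalog: name => { enabled: true/false, rows: {} }
class ToyCatalog
  def initialize
    @tables = {}
  end

  def create(name)
    raise ArgumentError, "table exists: #{name}" if @tables.key?(name)
    @tables[name] = { enabled: true, rows: {} }
  end

  def disable(name)
    lookup(name)[:enabled] = false
  end

  def enable(name)
    lookup(name)[:enabled] = true
  end

  # Dropping is only allowed once the table has been disabled.
  def drop(name)
    raise "table #{name} must be disabled first" if lookup(name)[:enabled]
    @tables.delete(name)
  end

  # Disable, drop and recreate: the table survives, its rows do not.
  def truncate(name)
    disable(name)
    drop(name)
    create(name)
  end

  # List table names, optionally filtered by a regular expression
  # that must match the whole name (an assumption of this sketch).
  def list(pattern = '.*')
    regex = Regexp.new("\\A(?:#{pattern})\\z")
    @tables.keys.select { |name| regex.match?(name) }.sort
  end

  def rows(name)
    lookup(name)[:rows]
  end

  private

  def lookup(name)
    @tables.fetch(name) { raise KeyError, "no such table: #{name}" }
  end
end

catalog = ToyCatalog.new
catalog.create('cars')
catalog.create('abc1')
puts catalog.list('abc.*').inspect
```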
Thank You