Introduction to HBase
Reporter: Hu Yi
Overview
HBase is an Apache open source project whose goal is to provide storage for the Hadoop distributed computing environment. Data is logically organized into tables, rows, and columns.
Outline
1. Data Model
2. Architecture and Implementation
3. Examples & Tests
Conceptual View
A data row has a sortable row key and an arbitrary number of columns. A timestamp is assigned automatically if the client does not supply one.
Row key            Time Stamp   Column "contents:"   Column "anchor:"
"com.apache.www"   t12          "…"
                   t11          "…"
                   t10                               "anchor:apache.com" "APACHE"
"com.cnn.www"      t15                               "anchor:cnnsi.com" "CNN"
                   t13                               "anchor:my.look.ca" "CNN.com"
                   t6           "…"
                   t5           "…"
                   t3           "…"
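The conceptual view can be pictured as nested sorted maps: row key, then column, then timestamp (newest first). The following is a minimal in-memory sketch of that idea only; the class `ConceptualTable` and its methods are hypothetical names for illustration and are not part of the HBase API.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of the conceptual view; not the HBase API. */
class ConceptualTable {
    // row key -> column name -> (timestamp, newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one cell; when timestamp is null, the current time is used instead. */
    void put(String rowKey, String column, Long timestamp, String value) {
        long ts = (timestamp != null) ? timestamp : System.currentTimeMillis();
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    /** Returns the newest version of a cell, or null if the cell was never written. */
    String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return (versions == null) ? null : versions.firstEntry().getValue();
    }

    /** Row keys are kept sorted, so the smallest key is always first. */
    String firstRowKey() {
        return rows.firstKey();
    }

    public static void main(String[] args) {
        ConceptualTable table = new ConceptualTable();
        // Row keys and columns taken from the example table; timestamps are plain numbers here.
        table.put("com.cnn.www", "anchor:cnnsi.com", 15L, "CNN");
        table.put("com.apache.www", "anchor:apache.com", 10L, "APACHE");
        System.out.println(table.firstRowKey());
        System.out.println(table.getLatest("com.apache.www", "anchor:apache.com"));
        // A cell that was never written simply does not exist in the maps.
        System.out.println(table.getLatest("com.apache.www", "anchor:cnnsi.com"));
    }
}
```

Cells that were never written occupy no space in this model, which matches the sparse layout of the table above.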
Physical Storage View
Physically, tables are stored on a per-column-family basis. Because the storage format is column-oriented, empty cells are not stored. Each column family is managed by an HStore.
Column family "contents:"
Row key            Time Stamp   Column "contents:"
"com.apache.www"   t12          "…"
                   t11          "…"
"com.cnn.www"      t6           "…"
                   t5           "…"
                   t3           "…"
Column family "anchor:"
Row key            Time Stamp   Column "anchor:"
"com.apache.www"   t10          "anchor:apache.com" "APACHE"
"com.cnn.www"      t9           "anchor:cnnsi.com" "CNN"
                   t8           "anchor:my.look.ca" "CNN.com"
[Diagram: an HStore with its Memcache, a data MapFile (key/value), and an index MapFile (index key).]
Row Ranges: Regions
Cells are sorted by row key and column in ascending order and by timestamp in descending order.
Physically, tables are broken into row ranges (regions); each region contains the rows from its start key to its end key.
[Table: example region with row keys aaaa, aaab, aaac, aaad, and aaae, columns "contents:" and "anchor:", and timestamps from t15 down to t3.]
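The sort order and the row-range split can be sketched in a few lines. This is an illustration under assumptions, not HBase code: `CellKey`, `ORDER`, and `regionFor` are hypothetical names, the two regions and their start keys are invented for the example, and a region is assumed to cover the rows from its start key up to the next region's start key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of cell ordering and region lookup; not the HBase API. */
class RegionLookupSketch {
    /** A cell key: row key, column name, timestamp. */
    static final class CellKey {
        final String row;
        final String column;
        final long timestamp;

        CellKey(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    /** Row key ascending, then column ascending, then timestamp descending. */
    static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) {
            return c;
        }
        c = a.column.compareTo(b.column);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.timestamp, a.timestamp); // newest first
    };

    // Region start key -> region name (both invented for this example).
    static final TreeMap<String, String> REGIONS_BY_START_KEY = new TreeMap<>();
    static {
        REGIONS_BY_START_KEY.put("", "region-1");     // first region starts at the empty key
        REGIONS_BY_START_KEY.put("aaac", "region-2"); // rows from "aaac" onward
    }

    /** Finds the region whose start key is the greatest one not above the row key. */
    static String regionFor(String rowKey) {
        return REGIONS_BY_START_KEY.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        List<CellKey> keys = new ArrayList<>(Arrays.asList(
                new CellKey("aaab", "anchor:cc", 14L),
                new CellKey("aaaa", "anchor:cc", 13L),
                new CellKey("aaaa", "anchor:cc", 15L)));
        keys.sort(ORDER);
        for (CellKey k : keys) {
            System.out.println(k.row + " " + k.column + " t" + k.timestamp + " in " + regionFor(k.row));
        }
    }
}
```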
Three major components
The HBaseMaster
The HRegionServer
The HBase client
HBaseMaster
Assigns regions to HRegionServers:
1. The ROOT region locates all the META regions.
2. Each META region maps a number of user regions.
3. User regions are assigned to the HRegionServers.
Enables/disables tables and changes table schemas.
Monitors the health of each server.
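ROOT/META Table
Each row in the ROOT and META tables is approximately 1KB in size. At the default region size of 256MB, one ROOT or META region can therefore hold about 262,144 rows, so a single ROOT region can address roughly 2^64 bytes of user data (about 16.8 million TB, counting 1 TB as 2^40 bytes).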
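The capacity that follows from these two figures can be worked out directly. The sketch below assumes a single ROOT region, one catalog row per mapped region, and completely full 256MB regions; the class and method names are hypothetical.

```java
import java.math.BigInteger;

/** Back-of-the-envelope capacity of the ROOT/META hierarchy under the stated assumptions. */
class CatalogCapacity {
    static final long REGION_BYTES = 256L * 1024 * 1024; // default region size: 256MB
    static final long ROW_BYTES = 1024L;                 // approximate catalog row size: 1KB

    /** Rows that fit in one ROOT or META region. */
    static long rowsPerCatalogRegion() {
        return REGION_BYTES / ROW_BYTES;
    }

    /** One ROOT region maps that many META regions, each mapping that many user regions. */
    static long addressableUserRegions() {
        long rows = rowsPerCatalogRegion();
        return rows * rows;
    }

    /** Total user data if every user region is full; needs BigInteger because it is 2^64. */
    static BigInteger addressableBytes() {
        return BigInteger.valueOf(addressableUserRegions())
                .multiply(BigInteger.valueOf(REGION_BYTES));
    }

    public static void main(String[] args) {
        System.out.println(rowsPerCatalogRegion());            // prints 262144
        System.out.println(addressableUserRegions());          // prints 68719476736
        System.out.println(addressableBytes());                // prints 18446744073709551616
        System.out.println(addressableBytes().shiftRight(40)); // prints 16777216 (TB of 2^40 bytes)
    }
}
```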
HRegionServer
Handles write requests, read requests, cache flushes, compactions, and region splits.
Write Requests: [Diagram: the HLog, Hstore1 (Memcache1, Mapfile1.1, Mapfile1.2), and Hstore2 (Memcache2), shown over the example table.]
Read Requests: [Diagram: Hstore1 with Memcache1, Mapfile1.1, and Mapfile1.2, shown over the example table.]
Cache Flushes: [Diagram: Hstore1 with Memcache1 and the HLog; the MapFiles go from Mapfile1.1, Mapfile1.2 to Mapfile1.1, Mapfile1.2, Mapfile1.3.]
Compactions: [Diagram: Hstore1 with Memcache1; Mapfile1.1 and Mapfile1.2 are combined into Mapfile1.]
[Diagram: Hstore1 with Memcache1 and a single Mapfile1.]
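The interplay of HLog, Memcache, and MapFiles in one HStore can be sketched as a toy model. This is only an illustration under simplifying assumptions: keys are plain strings without columns or timestamps, the HLog is an in-memory list, a flush is triggered by entry count, and region splits are not modeled. `HStoreSketch` and its methods are hypothetical names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one HStore: log, in-memory cache, and immutable sorted files. */
class HStoreSketch {
    private final List<String> hlog = new ArrayList<>();             // stand-in for the HLog
    private NavigableMap<String, String> memcache = new TreeMap<>(); // stand-in for the Memcache
    private final List<NavigableMap<String, String>> mapFiles = new ArrayList<>(); // oldest first
    private final int flushThreshold;

    HStoreSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Write request: record in the log first, then in the cache; flush when the cache is full. */
    void write(String key, String value) {
        hlog.add(key + "=" + value);
        memcache.put(key, value);
        if (memcache.size() >= flushThreshold) {
            flush();
        }
    }

    /** Read request: check the cache first, then the files from newest to oldest. */
    String read(String key) {
        String value = memcache.get(key);
        if (value != null) {
            return value;
        }
        for (int i = mapFiles.size() - 1; i >= 0; i--) {
            value = mapFiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Cache flush: the current cache becomes a new file and an empty cache replaces it. */
    void flush() {
        if (memcache.isEmpty()) {
            return;
        }
        mapFiles.add(memcache);
        memcache = new TreeMap<>();
    }

    /** Compaction: merge all files into one; entries from newer files win. */
    void compact() {
        NavigableMap<String, String> merged = new TreeMap<>();
        for (NavigableMap<String, String> file : mapFiles) {
            merged.putAll(file); // files are iterated oldest first, so newer values overwrite
        }
        mapFiles.clear();
        if (!merged.isEmpty()) {
            mapFiles.add(merged);
        }
    }

    int mapFileCount() {
        return mapFiles.size();
    }

    int logSize() {
        return hlog.size();
    }

    public static void main(String[] args) {
        HStoreSketch store = new HStoreSketch(2);
        store.write("a", "1");
        store.write("b", "2"); // second entry reaches the threshold and triggers a flush
        store.write("a", "3");
        store.write("c", "4"); // second flush
        System.out.println(store.mapFileCount()); // prints 2
        store.compact();
        System.out.println(store.mapFileCount()); // prints 1
        System.out.println(store.read("a"));      // prints 3
    }
}
```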
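HBase Client
1. The client contacts the ROOT region to locate the META region.
2. The client contacts the META region to locate the user region.
3. The client contacts the user region; the location information is cached.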
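The lookup chain from ROOT to META to user region, with caching on the client, can be sketched as follows. This is a simplified model under assumptions: the catalog is held in local maps rather than fetched over the network, region end keys are exclusive, and all names (`ClientLookupSketch`, `Region`, `locate`, the server names) are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the client-side region lookup with a location cache. */
class ClientLookupSketch {
    static final class Region {
        final String startKey;
        final String endKey; // exclusive; null means "no upper bound"
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey == null || row.compareTo(endKey) < 0);
        }
    }

    // ROOT: start key of a META region -> contents of that META region
    private final TreeMap<String, TreeMap<String, Region>> root = new TreeMap<>();
    // Client cache: user region start key -> region location
    private final TreeMap<String, Region> cache = new TreeMap<>();
    int catalogLookups = 0;

    void addMetaRegion(String metaStartKey, Region... userRegions) {
        TreeMap<String, Region> meta = new TreeMap<>();
        for (Region r : userRegions) {
            meta.put(r.startKey, r);
        }
        root.put(metaStartKey, meta);
    }

    /** Returns the server hosting the row, consulting ROOT and META only on a cache miss. */
    String locate(String row) {
        Map.Entry<String, Region> cached = cache.floorEntry(row);
        if (cached != null && cached.getValue().contains(row)) {
            return cached.getValue().server;
        }
        catalogLookups++;
        Map.Entry<String, TreeMap<String, Region>> metaEntry = root.floorEntry(row); // ROOT step
        if (metaEntry == null) {
            return null;
        }
        Map.Entry<String, Region> userEntry = metaEntry.getValue().floorEntry(row);  // META step
        if (userEntry == null || !userEntry.getValue().contains(row)) {
            return null;
        }
        cache.put(userEntry.getKey(), userEntry.getValue()); // remember the location
        return userEntry.getValue().server;
    }

    public static void main(String[] args) {
        ClientLookupSketch client = new ClientLookupSketch();
        client.addMetaRegion("", new Region("", "m", "server-a"), new Region("m", null, "server-b"));
        System.out.println(client.locate("com.apache.www")); // prints server-a
        System.out.println(client.locate("com.cnn.www"));    // prints server-a (served from cache)
        System.out.println(client.catalogLookups);           // prints 1
    }
}
```

Checking the end key before trusting a cached entry matters: without it, a row belonging to a region that has not been cached yet would be routed to the nearest cached region below it.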
Create MyTable
// Create a table named "MyTable" with two column families.
HBaseAdmin admin = new HBaseAdmin(config);
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);
Resulting empty table:
Row Key   Timestamp   columnFamily1:   columnFamily2:
Insert Values
// Write two cells of row "myRow" in a single batch.
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);
Resulting table:
Row Key   Timestamp   columnFamily1:
myRow     ts1         labela   labela value
          ts2         labelb   labelb value
Search
Select value from table where key='com.apache.www' AND label='anchor:apache.com'
Row key            Time Stamp   Column "anchor:"
"com.apache.www"   t12
                   t11
                   t10          "anchor:apache.com" "APACHE"
"com.cnn.www"      t9           "anchor:cnnsi.com" "CNN"
                   t8           "anchor:my.look.ca" "CNN.com"
                   t6
                   t5
                   t3
Search: Scanner
Select value from table where anchor='cnnsi.com'
Row key            Time Stamp   Column "anchor:"
"com.apache.www"   t12
                   t11
                   t10          "anchor:apache.com" "APACHE"
"com.cnn.www"      t9           "anchor:cnnsi.com" "CNN"
                   t8           "anchor:my.look.ca" "CNN.com"
                   t6
                   t5
                   t3
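The two searches differ in how much data they touch: the first names a row key and goes straight to it, while the second has no row key and must walk the rows with a scanner. The sketch below contrasts the two on an in-memory copy of the example table. It is an illustration only; `ScanSketch`, `get`, and `scan` are hypothetical names and not the HBase client API, and only the latest value of each cell is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Point lookup versus scan on a small in-memory copy of the example table. */
class ScanSketch {
    /** Builds row key -> (column name -> latest value) from the example table. */
    static TreeMap<String, Map<String, String>> exampleTable() {
        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        Map<String, String> apache = new TreeMap<>();
        apache.put("anchor:apache.com", "APACHE");
        table.put("com.apache.www", apache);
        Map<String, String> cnn = new TreeMap<>();
        cnn.put("anchor:cnnsi.com", "CNN");
        cnn.put("anchor:my.look.ca", "CNN.com");
        table.put("com.cnn.www", cnn);
        return table;
    }

    /** Point lookup: the row key is known, so only one row is touched. */
    static String get(TreeMap<String, Map<String, String>> table, String rowKey, String column) {
        Map<String, String> row = table.get(rowKey);
        return (row == null) ? null : row.get(column);
    }

    /** Scanner-style search: visit every row in key order and keep those that have the column. */
    static List<String> scan(TreeMap<String, Map<String, String>> table, String column) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) {
                hits.add(row.getKey() + " -> " + value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> table = exampleTable();
        System.out.println(get(table, "com.apache.www", "anchor:apache.com")); // prints APACHE
        System.out.println(scan(table, "anchor:cnnsi.com")); // prints [com.cnn.www -> CNN]
    }
}
```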
Summary
The column-oriented design makes modification more flexible.
Performance is higher when accesses are clustered by row key.
Future work
More testing
Search optimization
Thank you