Published by Amberly Cole, modified over 9 years ago
1
Introduction to HBase
Reporter: Hu Yi, 2009-3-11
2
Overview
HBase is an Apache open-source project whose goal is to provide storage for the Hadoop distributed computing environment. Data is logically organized into tables, rows, and columns.
3
Outline: Data Model; Architecture and Implementation; Examples & Tests
4
Conceptual View
A data row has a sortable row key and an arbitrary number of columns. A timestamp is assigned automatically if one is not specified explicitly.
Row key          | Time stamp | Column "contents:" | Column "anchor:"
"com.apache.www" | t12        | "…"                |
                 | t11        | "…"                |
                 | t10        |                    | "anchor:apache.com" = "APACHE"
"com.cnn.www"    | t15        |                    | "anchor:cnnsi.com" = "CNN"
                 | t13        |                    | "anchor:my.look.ca" = "CNN.com"
                 | t6         | "…"                |
                 | t5         | "…"                |
                 | t3         | "…"                |
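The conceptual view can be read as a sorted, three-level map: row key, then column, then time stamp. The sketch below is a hypothetical in-memory model of that idea, not HBase code; the class `ConceptualTable` and its methods are invented for illustration, and the automatic time stamp is simulated with a logical counter rather than the wall-clock time a real server would use.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the conceptual view (not HBase code):
// row key -> column ("family:qualifier") -> time stamp -> value.
class ConceptualTable {
    // Row keys are kept in sorted order, as on the slide.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();
    // Logical clock standing in for the automatically assigned time stamp.
    private long clock = 0;

    // Store a value under an explicit time stamp.
    void put(String row, String column, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(ts, value); // versions are kept newest first
        clock = Math.max(clock, ts);
    }

    // Store a value without a time stamp; one is assigned automatically and returned.
    long put(String row, String column, String value) {
        long ts = clock + 1;
        put(row, column, ts, value);
        return ts;
    }

    // Newest version of a cell, or null if the cell is empty.
    String latest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) return null;
        return columns.get(column).firstEntry().getValue();
    }

    public static void main(String[] args) {
        ConceptualTable t = new ConceptualTable();
        t.put("com.cnn.www", "anchor:cnnsi.com", 15, "CNN");
        t.put("com.cnn.www", "anchor:my.look.ca", 13, "CNN.com");
        long ts = t.put("com.cnn.www", "anchor:cnnsi.com", "CNN (updated)");
        System.out.println(ts + " " + t.latest("com.cnn.www", "anchor:cnnsi.com")); // prints 16 CNN (updated)
    }
}
```

Keeping each cell's versions in descending time-stamp order means the newest value is always the first entry, which mirrors how the slide lists t12 before t11.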
5
Physical Storage View
Physically, tables are stored on a per-column-family basis. In this column-oriented storage format, empty cells are not stored. Each column family is managed by an HStore.
Row key          | TS  | Column "contents:"
"com.apache.www" | t12 | "…"
                 | t11 | "…"
"com.cnn.www"    | t6  | "…"
                 | t5  | "…"
                 | t3  | "…"

Row key          | TS  | Column "anchor:"
"com.apache.www" | t10 | "anchor:apache.com" = "APACHE"
"com.cnn.www"    | t9  | "anchor:cnnsi.com" = "CNN"
                 | t8  | "anchor:my.look.ca" = "CNN.com"
[Figure: structure of an HStore: a Memcache plus MapFiles; each MapFile consists of a data file of key/value pairs and an index file of index keys]
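To make "one store per column family, no empty cells" concrete, here is a minimal sketch assuming logical rows arrive as maps from column name to value, with `null` marking an empty cell. The class `FamilyStores` is hypothetical, time stamps are left out for brevity, and every column name is assumed to contain a `:` separating family from qualifier.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not HBase code): split logical rows into one store per
// column family, in the spirit of "each column family is managed by an HStore".
// Time stamps are omitted; every column name is assumed to contain ':'.
class FamilyStores {
    // input:  row key -> column -> value (null = empty cell)
    // output: family (e.g. "anchor:") -> row key -> column -> value
    static Map<String, Map<String, Map<String, String>>> split(Map<String, Map<String, String>> rows) {
        Map<String, Map<String, Map<String, String>>> stores = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (cell.getValue() == null) continue; // empty cells are simply not written
                String column = cell.getKey();
                String family = column.substring(0, column.indexOf(':') + 1);
                stores.computeIfAbsent(family, f -> new TreeMap<>())
                      .computeIfAbsent(row.getKey(), r -> new TreeMap<>())
                      .put(column, cell.getValue());
            }
        }
        return stores;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        rows.put("com.apache.www", new TreeMap<>(Map.of("contents:", "…", "anchor:apache.com", "APACHE")));
        rows.put("com.cnn.www", new TreeMap<>(Map.of("anchor:cnnsi.com", "CNN", "anchor:my.look.ca", "CNN.com")));
        System.out.println(split(rows)); // one entry per family: "anchor:" and "contents:"
    }
}
```

A row that has nothing in a family never appears in that family's store, so sparse tables cost nothing for their missing columns.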
6
Row Ranges: Regions
Row keys and columns are sorted in ascending order; timestamps are sorted in descending order. Physically, tables are broken into row ranges called regions, each containing the rows from its start key to its end key.
[Table: example rows with keys aaaa through aaae, columns "contents:" and "anchor:", and timestamps from t15 down to t3]
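The sort order and the region boundaries can both be expressed with standard sorted collections. The following is a minimal sketch, not HBase's own key class or lookup code: `RegionOrder`, `CellKey`, and `regionFor` are hypothetical names, and the sketch assumes the first region starts at the empty key and that each region's end key is exclusive (it equals the next region's start key).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch (not HBase's own key class or lookup code).
class RegionOrder {
    // One stored cell coordinate: row key, column, time stamp.
    static final class CellKey {
        final String row;
        final String column;
        final long timestamp;

        CellKey(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() { return row + "/" + column + "/t" + timestamp; }
    }

    // Row key ascending, then column ascending, then time stamp descending (newest first).
    static final Comparator<CellKey> ORDER =
            Comparator.comparing((CellKey k) -> k.row)
                      .thenComparing(k -> k.column)
                      .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    // Regions are keyed by start key; a row belongs to the region with the greatest
    // start key <= row. Assumes the first region starts at "" and end keys are exclusive.
    static String regionFor(TreeMap<String, String> regionsByStartKey, String row) {
        return regionsByStartKey.floorEntry(row).getValue();
    }

    public static void main(String[] args) {
        List<CellKey> keys = new ArrayList<>(List.of(
                new CellKey("aaab", "anchor:cd", 14),
                new CellKey("aaaa", "contents:", 13),
                new CellKey("aaaa", "contents:", 15)));
        keys.sort(ORDER);
        System.out.println(keys); // prints [aaaa/contents:/t15, aaaa/contents:/t13, aaab/anchor:cd/t14]

        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-1");     // rows from "" up to (not including) "aaac"
        regions.put("aaac", "region-2"); // rows from "aaac" to the end of the table
        System.out.println(regionFor(regions, "aaab") + " " + regionFor(regions, "aaad")); // prints region-1 region-2
    }
}
```

Because regions are contiguous key ranges, finding the region for a row is a single floor lookup on the start keys rather than a search through every region.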
8
Three major components
The HBaseMaster, the HRegionServer, and the HBase client.
9
HBaseMaster
Assigns regions to HRegionServers:
1. The ROOT region locates all the META regions.
2. Each META region maps a number of user regions.
3. User regions are assigned to the HRegionServers.
Enables/disables tables and changes table schemas.
Monitors the health of each server.
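The slide does not say how the master chooses a server for each region. As an illustration only, the sketch below swaps in a simple "least-loaded server wins" policy; it is not the HBaseMaster's actual algorithm, and `RegionAssigner` and the region and server names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified assignment policy (not the HBaseMaster's actual algorithm):
// each region goes to the server currently holding the fewest regions;
// ties go to the server listed first.
class RegionAssigner {
    static Map<String, List<String>> assign(List<String> servers, List<String> regions) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String server : servers) assignment.put(server, new ArrayList<>());
        for (String region : regions) {
            String target = Collections.min(assignment.keySet(),
                    Comparator.comparingInt((String s) -> assignment.get(s).size()));
            assignment.get(target).add(region);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // ROOT first, then META, then user regions, following the order on the slide
        System.out.println(assign(List.of("server1", "server2"),
                List.of("ROOT", "META-1", "user-1", "user-2")));
    }
}
```

Assigning ROOT and META before user regions matters because clients cannot find any user region until the catalog regions are online.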
10
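ROOT/META Table
Each row in the ROOT and META tables is approximately 1KB in size. At the default region size of 256MB, a region holds about 2^28 / 2^10 = 2^18 such rows, so the ROOT region can map 2^18 META regions, which in turn map 2^36 user regions. That addresses 2^36 × 2^28 = 2^64 bytes, i.e. 2^24 TB of user data.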
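The capacity figure on this slide can be reproduced, or recomputed for other sizes, with a few lines of arithmetic. This sketch assumes, as the slide does, one catalog row of about 1KB per region and a 256MB region size; `CatalogCapacity` is a hypothetical helper, and `BigInteger` is used because 2^64 does not fit in a Java `long`.

```java
import java.math.BigInteger;

// Hypothetical helper reproducing the ROOT/META capacity estimate, assuming
// each catalog row costs rowBytes and each region holds regionBytes.
class CatalogCapacity {
    static BigInteger userBytes(long regionBytes, long rowBytes) {
        BigInteger rowsPerRegion = BigInteger.valueOf(regionBytes / rowBytes);
        BigInteger metaRegions = rowsPerRegion;                       // ROOT: one row per META region
        BigInteger userRegions = metaRegions.multiply(rowsPerRegion); // META: one row per user region
        return userRegions.multiply(BigInteger.valueOf(regionBytes)); // each user region holds regionBytes
    }

    public static void main(String[] args) {
        long region = 256L << 20; // 256MB = 2^28 bytes
        long row = 1L << 10;      // ~1KB per catalog row
        BigInteger bytes = userBytes(region, row);
        System.out.println("META regions: " + (region / row));              // prints META regions: 262144
        System.out.println("user bytes: 2^" + (bytes.bitLength() - 1));    // prints user bytes: 2^64
        System.out.println("in TB: 2^" + (bytes.shiftRight(40).bitLength() - 1)); // prints in TB: 2^24
    }
}
```

Doubling the region size multiplies the total by 2^3: the rows per region and the bytes per user region both double, and the rows-per-region factor appears twice.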
11
HRegionServer
Responsibilities: write requests, read requests, cache flushes, compactions, region splits.
[Figure: write requests; a write goes to the HLog and to the Memcaches (Memcache1, Memcache2) of the HStores (Hstore1, Hstore2); Hstore1 also holds Mapfile1.1 and Mapfile1.2]
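The HLog acts as a write-ahead log: an edit is recorded in the log before it is applied to the in-memory Memcache, so a crash does not lose it. The sketch below models only that ordering; `WritePath` is hypothetical, the log is an in-memory list rather than a file, and keys are plain strings instead of HBase's row/column/time-stamp keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of the write path (not HBase code): every edit is appended
// to a log standing in for the HLog, then applied to a sorted in-memory map
// standing in for the Memcache.
class WritePath {
    final List<String[]> hlog = new ArrayList<>();            // append-only {key, value} records
    final TreeMap<String, String> memcache = new TreeMap<>(); // kept sorted by key

    void write(String key, String value) {
        hlog.add(new String[] {key, value}); // 1. log first, so the edit survives a crash
        memcache.put(key, value);            // 2. then apply it in memory
    }

    // After a crash the Memcache is lost; replaying the log rebuilds it.
    static TreeMap<String, String> replay(List<String[]> log) {
        TreeMap<String, String> rebuilt = new TreeMap<>();
        for (String[] edit : log) rebuilt.put(edit[0], edit[1]);
        return rebuilt;
    }

    public static void main(String[] args) {
        WritePath rs = new WritePath();
        rs.write("com.cnn.www/anchor:cnnsi.com", "CNN");
        rs.write("com.cnn.www/anchor:my.look.ca", "CNN.com");
        System.out.println(replay(rs.hlog).equals(rs.memcache)); // prints true
    }
}
```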
12
HRegionServer: Read Requests
[Figure: read requests; a read of Hstore1 consults Memcache1, Mapfile1.1 and Mapfile1.2]
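A read has to look in both the Memcache and the MapFiles. A minimal sketch, assuming the Memcache always holds the most recent edits and MapFiles are supplied newest first: the first place a key is found then holds its newest value. `ReadPath` is hypothetical, and deletes and multiple versions per cell are deliberately ignored.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified read path (not HBase code): check the Memcache first,
// then the MapFiles from newest to oldest, and return the first value found.
// Deletes and multiple versions per cell are ignored here.
class ReadPath {
    static String get(Map<String, String> memcache,
                      List<Map<String, String>> mapFilesNewestFirst, String key) {
        String value = memcache.get(key); // most recent edits live in memory
        if (value != null) return value;
        for (Map<String, String> mapFile : mapFilesNewestFirst) {
            value = mapFile.get(key);
            if (value != null) return value; // newer files shadow older ones
        }
        return null; // cell is empty
    }

    public static void main(String[] args) {
        Map<String, String> memcache = Map.of("row2", "from memcache");
        List<Map<String, String>> files = List.of(
                Map.of("row1", "from Mapfile1.2"),                 // newer
                Map.of("row1", "from Mapfile1.1", "row3", "old")); // older
        System.out.println(get(memcache, files, "row1")); // prints from Mapfile1.2
    }
}
```

The more MapFiles a store accumulates, the more places a read may have to check, which is one motivation for the compactions on a later slide.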
13
HRegionServer: Cache Flushes
[Figure: cache flush; Memcache1 of Hstore1 is written out as a new Mapfile1.3 next to Mapfile1.1 and Mapfile1.2; the HLog is also shown]
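A cache flush turns the sorted in-memory Memcache into one more immutable, sorted file (Mapfile1.3 in the figure). The sketch below is a hypothetical model, not HBase code: `CacheFlush` keeps "MapFiles" as unmodifiable sorted maps, and as a simplification it clears the whole log after a flush, on the assumption that all logged edits are now persisted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of a cache flush (not HBase code): the Memcache is written
// out as a new immutable, sorted "MapFile" and then emptied.
class CacheFlush {
    final TreeMap<String, String> memcache = new TreeMap<>();
    final List<SortedMap<String, String>> mapFiles = new ArrayList<>(); // oldest first
    final List<String[]> hlog = new ArrayList<>();

    void write(String key, String value) {
        hlog.add(new String[] {key, value});
        memcache.put(key, value);
    }

    void flush() {
        if (memcache.isEmpty()) return; // nothing to write
        // The new file is a sorted snapshot of the Memcache (e.g. Mapfile1.3).
        mapFiles.add(Collections.unmodifiableSortedMap(new TreeMap<>(memcache)));
        memcache.clear();
        // Simplification: once the edits are in a MapFile, this sketch drops the
        // log records, since they are no longer needed to rebuild the Memcache.
        hlog.clear();
    }

    public static void main(String[] args) {
        CacheFlush store = new CacheFlush();
        store.write("com.cnn.www/anchor:cnnsi.com", "CNN");
        store.flush();
        System.out.println(store.mapFiles.size() + " " + store.memcache.isEmpty()); // prints 1 true
    }
}
```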
14
HRegionServer: Compactions
[Figure: compaction; Mapfile1.1 and Mapfile1.2 of Hstore1 are merged into a single Mapfile1]
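A compaction merges several sorted MapFiles into one, so later reads have fewer files to consult. This sketch assumes files are passed oldest first and that, for a key present in several files, the newest value wins; `Compaction` is hypothetical, and it merges through a `TreeMap` for brevity rather than streaming a k-way merge over the files as a disk-based implementation would.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical compaction (not HBase code): merge several sorted MapFiles into one;
// when the same key appears in more than one file, the value from the newest file wins.
class Compaction {
    static SortedMap<String, String> compact(List<? extends SortedMap<String, String>> filesOldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> file : filesOldestFirst) {
            merged.putAll(file); // later (newer) files overwrite older values
        }
        return Collections.unmodifiableSortedMap(merged); // e.g. the single Mapfile1
    }

    public static void main(String[] args) {
        TreeMap<String, String> mapfile11 = new TreeMap<>(Map.of("a", "old", "b", "old"));
        TreeMap<String, String> mapfile12 = new TreeMap<>(Map.of("b", "new", "c", "new"));
        System.out.println(compact(List.of(mapfile11, mapfile12))); // prints {a=old, b=new, c=new}
    }
}
```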
15
HRegionServer: Region Splits
[Figure: region split of the example table; Hstore1 holds Memcache1 and Mapfile1]
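When a region grows too large it is split into two daughter regions that each cover half of its key range. The sketch below is a simplification: it picks the middle row key of an in-memory sorted map as the split point, which is not necessarily how HBase chooses it. `RegionSplit` is hypothetical, and the daughters are copied so they are independent of the parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified split (not HBase's split-point logic): cut a region's
// sorted rows at the middle row key into two daughter regions.
class RegionSplit {
    static List<TreeMap<String, String>> split(TreeMap<String, String> rows) {
        if (rows.size() < 2) throw new IllegalArgumentException("need at least two rows to split");
        String midKey = new ArrayList<>(rows.keySet()).get(rows.size() / 2);
        TreeMap<String, String> lower = new TreeMap<>(rows.headMap(midKey, false)); // [start, midKey)
        TreeMap<String, String> upper = new TreeMap<>(rows.tailMap(midKey, true));  // [midKey, end)
        return List.of(lower, upper);
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        for (String key : new String[] {"aaaa", "aaab", "aaac", "aaad", "aaae"}) region.put(key, "…");
        List<TreeMap<String, String>> daughters = split(region);
        System.out.println(daughters.get(0).keySet() + " " + daughters.get(1).keySet());
        // prints [aaaa, aaab] [aaac, aaad, aaae]
    }
}
```

The split key becomes the end key of the lower daughter and the start key of the upper one, so the two daughters together cover exactly the parent's range.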
16
HBase Client
17
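[Figure: region lookup by the HBase client, step 1: the ROOT region]
18
[Figure: region lookup, step 2: the HBase client and the META region]
19
[Figure: region lookup, step 3: the HBase client and the user region; the region information is cached]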
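The client lookup goes ROOT region, then META region, then user region, and the answer is cached. The sketch below is a hypothetical model of that flow, not the HBase client API: `ClientLookup`, `Region`, and the counter `catalogReads` are invented names. It assumes each catalog is a map from start key to entry, that the catalogs cover every row key starting from the empty key, and that region end keys are exclusive.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the client-side lookup (not the HBase client API):
// ROOT locates the META region, the META region locates the user region,
// and the client caches the answer so later lookups skip the catalog reads.
class ClientLookup {
    // A user region covering rows in [startKey, endKey); endKey "" means "to the end".
    static final class Region {
        final String startKey;
        final String endKey;
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, String> root;               // META start key -> META region name
    private final Map<String, TreeMap<String, Region>> metas; // META region name -> user regions by start key
    private final TreeMap<String, Region> cache = new TreeMap<>(); // client-side cache
    int catalogReads = 0;

    ClientLookup(TreeMap<String, String> root, Map<String, TreeMap<String, Region>> metas) {
        this.root = root;
        this.metas = metas;
    }

    // Returns the server hosting the row.
    String locate(String row) {
        Map.Entry<String, Region> cached = cache.floorEntry(row);
        if (cached != null && cached.getValue().contains(row)) return cached.getValue().server;
        String metaRegion = root.floorEntry(row).getValue();              // step 1: ROOT
        catalogReads++;
        Region region = metas.get(metaRegion).floorEntry(row).getValue(); // step 2: META
        catalogReads++;
        cache.put(region.startKey, region);                               // step 3: cache it
        return region.server;
    }

    public static void main(String[] args) {
        TreeMap<String, Region> meta1 = new TreeMap<>();
        meta1.put("", new Region("", "m", "serverA"));
        meta1.put("m", new Region("m", "", "serverB"));
        TreeMap<String, String> root = new TreeMap<>();
        root.put("", "meta1");
        ClientLookup client = new ClientLookup(root, Map.of("meta1", meta1));
        client.locate("com.apache.www");
        client.locate("com.cnn.www");            // answered from the cache
        System.out.println(client.catalogReads); // prints 2
    }
}
```

Caching whole regions (start and end key) rather than individual rows lets one catalog lookup serve every later request for rows in the same range.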
21
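Create MyTable
// HBase 0.19-era client API, as used on these slides
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
// Two column families; the family names keep the trailing ':' used on these slides
desc.addFamily(new HColumnDescriptor("columnFamily1:"));
desc.addFamily(new HColumnDescriptor("columnFamily2:"));
admin.createTable(desc);
Resulting table layout: Row Key | Timestamp | columnFamily1: | columnFamily2: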
22
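Insert Values
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

// config is the HBaseConfiguration used to create the table; timestamp is a long chosen by the caller
HTable table = new HTable(config, "MyTable");
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);
Resulting table:
Row Key | Timestamp | columnFamily1:
myRow   | ts1       | labela = "labela value"
        | ts2       | labelb = "labelb value"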
25
Search
Select value from table where key='com.apache.www' AND label='anchor:apache.com'
Row key          | Time stamp | Column "anchor:"
"com.apache.www" | t12        |
                 | t11        |
                 | t10        | "anchor:apache.com" = "APACHE"
"com.cnn.www"    | t9         | "anchor:cnnsi.com" = "CNN"
                 | t8         | "anchor:my.look.ca" = "CNN.com"
                 | t6         |
                 | t5         |
                 | t3         |
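The SQL-like query on this slide names both the row key and the column, so the cell can be located directly without examining other rows. Here is a minimal in-memory analogue, not the HBase client API; `PointGet` is hypothetical, and versions are stored in ascending time-stamp order so the newest is the last entry.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory analogue (not the HBase client API) of
//   Select value from table where key='com.apache.www' AND label='anchor:apache.com'
// The row key and column name locate the cell directly; no other row is examined.
class PointGet {
    // row key -> column -> time stamp (ascending) -> value
    static String get(NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> table,
                      String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = table.get(row);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null || versions.isEmpty()) return null;
        return versions.lastEntry().getValue(); // newest version
    }

    public static void main(String[] args) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> table = new TreeMap<>();
        table.computeIfAbsent("com.apache.www", r -> new TreeMap<>())
             .computeIfAbsent("anchor:apache.com", c -> new TreeMap<>())
             .put(10L, "APACHE");
        System.out.println(get(table, "com.apache.www", "anchor:apache.com")); // prints APACHE
    }
}
```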
26
Search: Scanner
Select value from table where anchor='cnnsi.com'
Row key          | Time stamp | Column "anchor:"
"com.apache.www" | t12        |
                 | t11        |
                 | t10        | "anchor:apache.com" = "APACHE"
"com.cnn.www"    | t9         | "anchor:cnnsi.com" = "CNN"
                 | t8         | "anchor:my.look.ca" = "CNN.com"
                 | t6         |
                 | t5         |
                 | t3         |
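Unlike the previous query, this one does not name a row key, so the rows holding the requested column have to be found by scanning. A minimal in-memory analogue, not the HBase Scanner API: `ColumnScan` is hypothetical, only the newest value per cell is modelled, and the sketch assumes no secondary index, so every row is visited in row-key order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory analogue (not the HBase Scanner API) of
//   Select value from table where anchor='cnnsi.com'
// With no secondary index in this sketch, the scanner visits every row in
// row-key order and keeps the rows that have the requested column.
class ColumnScan {
    // table: row key -> column -> newest value
    static Map<String, String> scan(NavigableMap<String, NavigableMap<String, String>> table, String column) {
        Map<String, String> hits = new LinkedHashMap<>(); // preserves row-key order
        for (Map.Entry<String, NavigableMap<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) hits.put(row.getKey(), value);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();
        table.put("com.apache.www", new TreeMap<>(Map.of("anchor:apache.com", "APACHE")));
        table.put("com.cnn.www", new TreeMap<>(Map.of("anchor:cnnsi.com", "CNN", "anchor:my.look.ca", "CNN.com")));
        System.out.println(scan(table, "anchor:cnnsi.com")); // prints {com.cnn.www=CNN}
    }
}
```

The cost of this query grows with the number of rows scanned, whereas the direct lookup by row key and column touches a single row.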
27
Summary
Column-oriented storage makes modification more flexible.
Performance is higher when accessing clusters of adjacent row keys.
28
Future work
More testing.
Optimization of search.
29
Thank you