Published by Estella McCarthy. Modified over 6 years ago.
1
How did it start?
• At Google
• Lots of semi-structured data
• Commodity hardware
• Horizontal scalability
• Tight integration with MapReduce
2
Why NoSQL?
• RDBMSs don't scale
  • Typically large, monolithic systems
  • Hard to shard
  • Specialized hardware... expensive!
• Buzzword!
3
Google BigTable
• Distributed, multi-level map
• Fault-tolerant, persistent
• Scalable
• Runs on commodity hardware
• Self-managing
• Large number of read/write ops
• Fast scans
4
HBase
• Open-source BigTable
• HDFS as the underlying distributed file system (DFS)
• ZooKeeper as the lock service
• Tight integration with Hadoop MapReduce
5
HBase
• Data model
• Architecture, implementation
  • Regions, Region Servers, etc.
• API
• Current status and future direction
• Use cases
  • How to think in HBase (or NoSQL)?
6
Data Model
• Sparse, multi-dimensional map: (row, column, timestamp) → cell
• Column = Column Family:Column Qualifier
[Figure: row AK, column Fam1:Qual1, holding value v1 at timestamp t1 and value v2 at timestamp t2, where t2 > t1.]
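To make the (row, column, timestamp) → cell model concrete, here is a minimal in-memory sketch in Python. The class name SparseTable and its methods are hypothetical, chosen for illustration; this is not the HBase API. It shows two points from the slide: only written cells take up space (sparseness), and writing a new timestamp adds a version rather than overwriting the old one.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model of the HBase data model, not the real client API.
# Key: (row, column, timestamp), where column is "family:qualifier".
Cell = Tuple[bytes, str, int]


class SparseTable:
    def __init__(self) -> None:
        # Only cells that were actually written are stored, so the map is sparse.
        self.cells: Dict[Cell, bytes] = {}

    def put(self, row: bytes, column: str, ts: int, value: bytes) -> None:
        # A new timestamp adds a version; older versions are kept.
        self.cells[(row, column, ts)] = value

    def get(self, row: bytes, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        # Return the newest version at or before ts (or the newest overall).
        versions = [
            (t, v)
            for (r, c, t), v in self.cells.items()
            if r == row and c == column and (ts is None or t <= ts)
        ]
        if not versions:
            return None
        return max(versions)[1]
```

With t1 = 1 and t2 = 2, putting v1 and then v2 into AK / Fam1:Qual1 makes a plain get return v2, while a get bounded at timestamp 1 still returns v1, matching the figure on the slide.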
8
Regions
• Region: contiguous set of lexicographically sorted rows
• A region is split once it grows past hbase.hregion.max.filesize (default 256MB)
• Regions hosted by Region Servers
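Because a region is a contiguous range of lexicographically sorted rows, finding the region for a row key amounts to a binary search over region start keys. The sketch below assumes a hypothetical two-region layout and a hypothetical helper region_for_row; the first region uses the empty start key so that it covers the lowest keys.

```python
import bisect
from typing import List

# Hypothetical layout: each region is named by its start key and covers
# every row key >= its start key and < the next region's start key.
region_start_keys: List[bytes] = [b"", b"row257"]


def region_for_row(row: bytes, start_keys: List[bytes]) -> int:
    """Return the index of the region whose key range contains `row`.

    Keys are compared as raw bytes, i.e. lexicographically, not numerically.
    """
    # bisect_right finds the first start key strictly greater than `row`;
    # the region just before that position holds `row`.
    return bisect.bisect_right(start_keys, row) - 1
```

Note what lexicographic order implies: b"row1000" sorts before b"row257" (because "1" < "2"), so it lands in the first region even though 1000 > 257 numerically.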
9
Regions and Splitting
[Figure: a table split into two regions, row1 to row256 and row257 to row600. Writes arrive at the second region, which then splits into row257 to row400 and row401 to row600.]
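The splitting slide can be sketched as: keep a region's rows sorted as writes arrive, and once it grows past a threshold, cut it into two daughter regions at a middle key. This is a simplification under stated assumptions: the threshold MAX_ROWS_PER_REGION counts rows, whereas HBase uses hbase.hregion.max.filesize (a size in bytes), and the helper name write is hypothetical.

```python
import bisect
from typing import List

# Hypothetical threshold in rows; HBase itself uses a file size in bytes.
MAX_ROWS_PER_REGION = 4


def write(region: List[bytes], row: bytes) -> List[List[bytes]]:
    """Insert `row` into a region (kept sorted) and split it if it grew too big."""
    if row not in region:
        bisect.insort(region, row)  # keep rows in lexicographic order
    if len(region) <= MAX_ROWS_PER_REGION:
        return [region]
    mid = len(region) // 2
    # Two daughters, [start, split key) and [split key, end): still sorted,
    # non-overlapping, and together covering exactly the parent's rows.
    return [region[:mid], region[mid:]]
```

The split point here is simply the middle row key, which need not match the row400/row401 boundary drawn on the slide.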
12
System Structure
[Figure: system components: Master, Region Servers, ZooKeeper, HDFS, and MapReduce.]
13
Master
• Region splitting
• Load balancing
• Metadata operations
• Multiple masters for failover
14
ZooKeeper
• Master election
• Locate -ROOT- region
• Region Server membership
15
Where is my row?
• 3-level hierarchical lookup scheme
• ZooKeeper points to the -ROOT- region
• -ROOT- holds one row per .META. region
• .META. holds one row per table region
• The matching MyTable region holds MyRow
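The three-level lookup above can be sketched as two "floor" searches: ZooKeeper names the server holding -ROOT-, -ROOT- names the server holding the right .META. region, and .META. names the server holding the right table region. Everything in this sketch is hypothetical: the server names, the catalog contents, the SERVERS dict standing in for RPCs, and the simplified "table,startkey" catalog keys. Real HBase clients also cache these locations rather than repeating the lookup on every request.

```python
import bisect
from typing import Dict, List, Tuple

Catalog = List[Tuple[bytes, str]]  # sorted (start key, hosting server) pairs

# Hypothetical cluster state, for illustration only.
ZOOKEEPER: Dict[str, str] = {"root_region_location": "rs1"}
# -ROOT-: one row per .META. region (here a single .META. region).
ROOT: Catalog = [(b"", "rs2")]
# .META.: one row per user-table region, keyed by "table,start key".
META: Catalog = [(b"MyTable,", "rs3"), (b"MyTable,row257", "rs4")]
# Which catalog each server hosts; stands in for RPCs to region servers.
SERVERS: Dict[str, Catalog] = {"rs1": ROOT, "rs2": META}


def floor_lookup(catalog: Catalog, key: bytes) -> str:
    """Return the server of the last entry whose start key is <= key."""
    keys = [k for k, _ in catalog]
    return catalog[bisect.bisect_right(keys, key) - 1][1]


def locate_row(table: str, row: bytes) -> str:
    """Return the server hosting the region of `table` that contains `row`."""
    meta_key = table.encode() + b"," + row
    root_server = ZOOKEEPER["root_region_location"]             # ZooKeeper
    meta_server = floor_lookup(SERVERS[root_server], meta_key)  # -ROOT-
    return floor_lookup(SERVERS[meta_server], meta_key)         # .META.
```

With this state, MyRow sorts before row257 (byte "M" < "r"), so it resolves to rs3, while row300 resolves to rs4.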
20
Write Path
[Figure: a Region Server hosting several Regions, each with a Memstore; one HLog per Region Server (append-only write-ahead log on HDFS, stored as a SequenceFile); and HFiles on HDFS.]
• HFile: immutable sorted map (byte[] → byte[]) from (row, column, timestamp) to cell value
• Write: each edit is appended to the HLog, then applied to the Region's Memstore
• Flush: a full Memstore is written out as a new, small HFile
• Compaction: HFiles are merged into fewer, larger HFiles
28
Read Path
[Figure: a read consults the Region's Memstore and all of its HFiles on HDFS.]
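The write and read paths above follow a log-then-memory, flush, then merge pattern. Here is a single-process Python sketch of one region's storage under that pattern. RegionSketch, its flush_threshold parameter, and the list-based "HFiles" are hypothetical stand-ins; nothing here touches HDFS, and the log is never truncated, unlike a real WAL.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[bytes, str, int]  # (row, column, timestamp)


class RegionSketch:
    """Hypothetical model of one region: WAL + memstore + immutable sorted files."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[Key, bytes]] = []           # append-only log (HLog's role)
        self.memstore: Dict[Key, bytes] = {}
        self.hfiles: List[List[Tuple[Key, bytes]]] = []  # sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, row: bytes, column: str, ts: int, value: bytes) -> None:
        key = (row, column, ts)
        self.wal.append((key, value))  # 1. log the edit for durability
        self.memstore[key] = value     # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the memstore out as a new immutable, sorted "HFile".
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def compact(self) -> None:
        # Merge all files into one; iterating oldest first lets newer copies win.
        merged: Dict[Key, bytes] = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        self.hfiles = [sorted(merged.items())] if merged else []

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        # A read consults the memstore and every file, keeping the newest timestamp.
        best: Optional[Tuple[int, bytes]] = None
        sources = [list(self.memstore.items())] + self.hfiles
        for source in sources:
            for (r, c, ts), value in source:
                if r == row and c == column and (best is None or ts > best[0]):
                    best = (ts, value)
        return None if best is None else best[1]
```

The design point the slides make shows up directly: writes only ever append (to the log) or touch memory, files are never modified in place, and reads pay for that by merging several sources, which compaction keeps in check.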
30
Ways to access
• Java
• REST
• Thrift
• Scala
• Jython
• Groovy DSL
• Ruby shell
• Java MR, Cascading, Pig, Hive
31
Java API
• Get
• Put
• Delete
• Scan
• IncrementColumnValue
• TableInputFormat: MapReduce source
• TableOutputFormat: MapReduce sink
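To illustrate what the listed operations do, without guessing at exact Java client signatures, here is a hypothetical in-memory stand-in in Python. MockTable and its method names are invented for illustration and keep only the latest value per cell. Two behaviors are modeled on HBase as we understand it: a Scan returns rows in sorted order with an inclusive start row and an exclusive stop row, and counters are stored as 8-byte big-endian longs that are updated by an atomic read-modify-write.

```python
import threading
from typing import Dict, Iterator, Optional, Tuple


class MockTable:
    """Hypothetical stand-in for a table; not the HBase client API."""

    def __init__(self) -> None:
        self.rows: Dict[bytes, Dict[str, bytes]] = {}
        self.lock = threading.Lock()

    def put(self, row: bytes, column: str, value: bytes) -> None:
        with self.lock:
            self.rows.setdefault(row, {})[column] = value

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        with self.lock:
            return self.rows.get(row, {}).get(column)

    def delete(self, row: bytes, column: Optional[str] = None) -> None:
        with self.lock:
            if column is None:
                self.rows.pop(row, None)  # delete the whole row
            elif row in self.rows:
                self.rows[row].pop(column, None)
                if not self.rows[row]:    # drop rows left with no cells
                    del self.rows[row]

    def scan(self, start: bytes, stop: bytes) -> Iterator[Tuple[bytes, Dict[str, bytes]]]:
        # Sorted rows in [start, stop): start inclusive, stop exclusive.
        with self.lock:
            snapshot = {r: dict(cols) for r, cols in self.rows.items()}
        for row in sorted(snapshot):
            if start <= row < stop:
                yield row, snapshot[row]

    def increment_column_value(self, row: bytes, column: str, amount: int) -> int:
        # Read-modify-write under one lock so concurrent increments are not lost.
        with self.lock:
            cols = self.rows.setdefault(row, {})
            current = int.from_bytes(cols.get(column, bytes(8)), "big", signed=True)
            new = current + amount
            cols[column] = new.to_bytes(8, "big", signed=True)
            return new
```

For example, after putting row1 to row3, scanning from row1 to row3 returns row1 and row2 only, and two increments of 5 and -2 on a fresh counter return 5 and then 3.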
32
Other Features
• Compression
• In-memory column families
• Multiple masters
• Rolling restart
• Bloom filters
• Efficient bulk loads
• Source and sink for Hive, Pig, Cascading
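Bloom filters matter on the read path: HBase can keep a Bloom filter per HFile so a read can skip files that definitely do not contain the requested row. Below is a minimal, generic Bloom filter sketch in Python (not HBase's implementation); the bit count, the number of hashes, and the salted SHA-256 hashing are arbitrary illustrative choices. It never reports a false negative, but it can report false positives.

```python
import hashlib
from typing import Iterable


class BloomFilter:
    """Generic Bloom filter sketch: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterable[int]:
        # Derive k bit positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

In this picture, each file's filter is built from the row keys it stores, and a Get checks might_contain before opening the file.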
33
How to think in HBase?
34
HBase v/s RDBMS
• Neither solves all problems
• It's really the wrong comparison
• But it puts things in context
35
HBase v/s RDBMS
HBase                                      | RDBMS
Column oriented                            | Row oriented (mostly)
Flexible schema, add columns on the fly    | Fixed schema
Good with sparse tables                    | Not optimized for sparse tables
No query language                          | SQL
Wide tables                                | Narrow tables
Joins using MR, not optimized              | Optimized for joins (small, fast ones too!)
Tight integration with MR                  | Not really...
36
HBase v/s RDBMS
HBase                                                  | RDBMS
De-normalize your data                                 | Normalize as you can
Horizontal scalability: just add hardware              | Hard to shard and scale
Consistent                                             | Consistent
No transactions                                        | Transactional
Good for semi-structured as well as structured data    | Good for structured data
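"De-normalize your data" in practice often means folding what would be a joined child table into wide rows, so that a single row read returns everything a query needs. The sketch below assumes a hypothetical users/orders schema and invented column families ("info", "orders"); it shows the reshaping step only, with plain Python dicts standing in for rows.

```python
from typing import Dict, List, Tuple

# Hypothetical normalized (RDBMS-style) data: two tables joined by user_id.
users: List[Tuple[str, str]] = [("u1", "Alice"), ("u2", "Bob")]
orders: List[Tuple[str, str, str]] = [("o1", "u1", "book"), ("o2", "u1", "pen"), ("o3", "u2", "lamp")]


def denormalize(
    users: List[Tuple[str, str]], orders: List[Tuple[str, str, str]]
) -> Dict[str, Dict[str, str]]:
    """Fold each user's orders into that user's row as extra columns.

    Each order becomes a column in the hypothetical "orders" family, with the
    order id as the qualifier, so no join is needed at read time. Rows can
    have different numbers of columns, which a wide, sparse table allows.
    """
    rows: Dict[str, Dict[str, str]] = {uid: {"info:name": name} for uid, name in users}
    for order_id, uid, item in orders:
        rows[uid]["orders:" + order_id] = item
    return rows
```

The trade-off matches the table above: reads of a user and all their orders become a single-row Get, while ad hoc queries that cut across users (the kind SQL joins handle well) become harder.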
37
HBase v/s RDBMS
Rule: You probably don't need HBase if your data can easily fit and be processed on a single RDBMS box.
But then, you are at Hadoop Day, so it probably can't!
39
Q&A