
1 How did it start? • At Google
• Lots of semi-structured data
• Commodity hardware
• Horizontal scalability
• Tight integration with MapReduce

2 Why NoSQL? • RDBMS don't scale
  - Typically large monolithic systems
  - Hard to shard
  - Specialized hardware... expensive!
• Buzzword!

3 Google BigTable • Distributed multi-level map
• Fault tolerant, persistent
• Scalable
• Runs on commodity hardware
• Self managing
• Large number of read/write ops
• Fast scans

4 HBase • Open source BigTable
• HDFS as underlying DFS
• ZooKeeper as lock service
• Tight integration with Hadoop MapReduce

5 HBase • Data model
• Architecture and implementation (Regions, Region Servers, etc.)
• API
• Current status and future direction
• Use cases
• How to think in HBase (or NoSQL)?

6 Data Model • Sparse, multi-dimensional map: (row, column, timestamp) -> cell value
• Column = Column Family:Column Qualifier (e.g. Fam1:Qual1)
• Example: row AK, column Fam1:Qual1, timestamp t1 -> value v1

7 Data Model • A cell keeps multiple timestamped versions of its value
• Example: row AK, column Fam1:Qual1 holds v1 at t1 and v2 at t2, with t2 > t1
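
As an illustration (not part of the original deck), here is a minimal sketch of how that (row, column family:qualifier, timestamp) -> value model maps onto the classic pre-1.0 Java client; the table, family, qualifier, and value names are hypothetical.

```java
// Sketch: write one cell under the (row, family:qualifier, timestamp) model.
// Assumes the classic (pre-1.0) HBase Java client; "MyTable", "Fam1", "Qual1" are hypothetical names.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "MyTable");

        Put put = new Put(Bytes.toBytes("AK"));   // row key
        put.add(Bytes.toBytes("Fam1"),            // column family
                Bytes.toBytes("Qual1"),           // column qualifier
                1L,                               // timestamp t1 (normally server-assigned)
                Bytes.toBytes("v1"));             // cell value
        table.put(put);
        table.close();
    }
}
```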

8 Regions • Region: contiguous set of lexicographically sorted rows
• Maximum region size: hbase.hregion.max.filesize (default 256 MB)
• Regions are hosted by Region Servers

9 Regions and Splitting • Example: a table held in two regions, [row1, row256] and [row257, row600]

10 Regions and Splitting • Writes accumulate in the region that covers the row key

11 Regions and Splitting • A region that grows past the size limit splits, e.g. [row257, row600] into [row257, row400] and [row401, row600]
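
The split threshold is just a configuration key. As a hedged sketch (the 1 GB value is illustrative, not from the deck), it can be overridden per client Configuration or cluster-wide in hbase-site.xml:

```java
// Illustrative only: override the region split threshold in configuration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RegionSizeSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Same key can be set cluster-wide in hbase-site.xml; the default is 256 MB.
        conf.setLong("hbase.hregion.max.filesize", 1024L * 1024 * 1024);
        System.out.println(conf.getLong("hbase.hregion.max.filesize", -1));
    }
}
```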

12 System Structure • Master, Region Servers, and ZooKeeper, with HDFS underneath and MapReduce on top

13 Master • Region splitting
• Load balancing
• Metadata operations
• Multiple masters for failover

14 ZooKeeper • Master election
• Locate -ROOT- region
• Region Server membership
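
Since ZooKeeper is the entry point for locating -ROOT-, a client only needs the quorum address. A minimal connection sketch, assuming the classic Java client; the hostnames and table name are hypothetical:

```java
// Sketch: a client only needs the ZooKeeper quorum to find the cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class ZkConnectSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Reads and writes never go through the Master; the client asks ZooKeeper
        // where -ROOT- lives and walks the catalog from there. Hostnames are hypothetical.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        HTable table = new HTable(conf, "MyTable");
        System.out.println("Opened table: " + Bytes.toString(table.getTableName()));
        table.close();
    }
}
```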

15 Where is my row? • 3-level hierarchical lookup scheme
• ZooKeeper tells the client where the -ROOT- region lives
• -ROOT- holds a row per .META. region
• .META. holds a row per table region, which finally locates MyRow in MyTable
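
From the client's point of view the whole lookup is hidden behind one call. A hedged sketch with the classic client API (table and row names hypothetical); getRegionLocation just surfaces the result of the ZooKeeper -> -ROOT- -> .META. walk:

```java
// Sketch: ask the client where a given row is served.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class WhereIsMyRow {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "MyTable");

        // The catalog walk happens inside the client and is cached afterwards.
        HRegionLocation loc = table.getRegionLocation(Bytes.toBytes("MyRow"));
        System.out.println("'MyRow' is served by region "
                + loc.getRegionInfo().getRegionNameAsString());
        table.close();
    }
}
```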

20 Write path • Each Region keeps recent edits in an in-memory Memstore
• HLog: append-only write-ahead log (WAL) on HDFS, a SequenceFile, one per Region Server
• HFile: immutable sorted map (byte[] -> byte[]) on HDFS; key = (row, column, timestamp), value = cell value

21 Write • A write is appended to the HLog, then applied to the region's Memstore

23 Flush • When the Memstore fills up, it is flushed to a new small HFile on HDFS

25 Compaction • Small HFiles are periodically merged (compacted) into fewer, larger HFiles

29 Read • A read merges the Memstore with the region's HFiles
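
Flushes and compactions happen automatically on the Region Servers, but the same steps can be triggered by hand, which makes the path easier to see. A hedged sketch against the classic HBaseAdmin API (table name hypothetical):

```java
// Sketch: trigger a flush and a major compaction explicitly for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushCompactSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        admin.flush("MyTable");        // Memstore contents -> new small HFile on HDFS
        admin.majorCompact("MyTable"); // merge each region's HFiles into one
    }
}
```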

30 Ways to access • Java
• REST
• Thrift
• Scala
• Jython
• Groovy DSL
• Ruby shell
• Java MR, Cascading, Pig, Hive

31 Java API • Get
• Put
• Delete
• Scan
• IncrementColumnValue
• TableInputFormat - MapReduce source
• TableOutputFormat - MapReduce sink
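
A hedged end-to-end sketch of those core operations with the classic pre-1.0 client (table, family, and row names are hypothetical; the MapReduce input/output formats are not shown):

```java
// Sketch: Put, Get, IncrementColumnValue, Scan, and Delete against one table.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class JavaApiSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "MyTable");

        // Put: write one cell (timestamp assigned by the server)
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("Fam1"), Bytes.toBytes("Qual1"), Bytes.toBytes("v1"));
        table.put(put);

        // Get: read it back
        Result r = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("Fam1"), Bytes.toBytes("Qual1"))));

        // IncrementColumnValue: atomic counter
        table.incrementColumnValue(Bytes.toBytes("row1"),
                Bytes.toBytes("Fam1"), Bytes.toBytes("hits"), 1L);

        // Scan: iterate a row-key range
        ResultScanner scanner = table.getScanner(new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row9")));
        for (Result row : scanner) {
            System.out.println(Bytes.toString(row.getRow()));
        }
        scanner.close();

        // Delete: remove the row
        table.delete(new Delete(Bytes.toBytes("row1")));
        table.close();
    }
}
```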

32 Other Features • Compression
• In-memory column families
• Multiple masters
• Rolling restart
• Bloom filters
• Efficient bulk loads
• Source and sink for Hive, Pig, Cascading
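
Compression and in-memory caching are per-column-family settings. A hedged sketch of declaring them at table-creation time with the classic API (table and family names hypothetical; bloom filters and bulk loads are configured separately and not shown):

```java
// Sketch: per-column-family compression and in-memory caching at table creation.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class FeatureSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HColumnDescriptor fam = new HColumnDescriptor(Bytes.toBytes("Fam1"));
        fam.setCompressionType(Compression.Algorithm.GZ); // compress HFile blocks on disk
        fam.setInMemory(true);                            // favor this family in the block cache

        HTableDescriptor desc = new HTableDescriptor("MyTable");
        desc.addFamily(fam);
        admin.createTable(desc);
    }
}
```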

33 How to think in HBase?

34 HBase v/s RDBMS • Neither solves all problems
• It's really a wrong comparison, but it puts things in context

35 HBase v/s RDBMS • HBase: column oriented / RDBMS: row oriented (mostly)
• HBase: flexible schema, add columns on the fly / RDBMS: fixed schema
• HBase: good with sparse tables / RDBMS: not optimized for sparse tables
• HBase: no query language / RDBMS: SQL
• HBase: wide tables / RDBMS: narrow tables
• HBase: joins via MapReduce, not optimized / RDBMS: optimized for joins (small, fast ones too!)
• HBase: tight integration with MapReduce / RDBMS: not really...

36 HBase v/s RDBMS • HBase: de-normalize your data / RDBMS: normalize as you can
• HBase: horizontal scalability, just add hardware / RDBMS: hard to shard and scale
• HBase: consistent / RDBMS: consistent
• HBase: no transactions / RDBMS: transactional
• HBase: good for semi-structured as well as structured data / RDBMS: good for structured data

37 HBase v/s RDBMS • Rule: you probably don't need HBase if your data can easily fit and be processed on a single RDBMS box.

38 HBase v/s RDBMS • Rule: you probably don't need HBase if your data can easily fit and be processed on a single RDBMS box.
• But then, you are at Hadoop Day, so it probably can't!

39 Q&A

