
1 How did it start? • At Google
• Lots of semi-structured data
• Commodity hardware
• Horizontal scalability
• Tight integration with MapReduce

2 Why NoSQL? • RDBMS don't scale
  - Typically large monolithic systems
  - Hard to shard
  - Specialized hardware... expensive!
• Buzzword!

3 Google BigTable • Distributed multi-level map
• Fault tolerant, persistent
• Scalable
• Runs on commodity hardware
• Self managing
• Large number of read/write ops
• Fast scans

4 HBase • Open source BigTable
• HDFS as underlying DFS
• ZooKeeper as lock service
• Tight integration with Hadoop MapReduce

5 HBase • Data model
• Architecture and implementation (Regions, Region Servers, etc.)
• API
• Current status and future direction
• Use cases
• How to think in HBase (or NoSQL)?

6 Data Model • Sparse, multi-dimensional map: (row, column, timestamp) -> cell value
• Column = Column Family:Column Qualifier (e.g. Fam1:Qual1)
• Example: row AK, column Fam1:Qual1, timestamp t1 -> value v1

7 Data Model • A cell keeps multiple timestamped versions of its value
• Example: row AK, column Fam1:Qual1 holds v1 at t1 and v2 at t2, with t2 > t1
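
As an illustration (not part of the original deck), here is a minimal sketch of how that (row, column family:qualifier, timestamp) -> value model maps onto the classic pre-1.0 Java client; the table, family, qualifier, and value names are hypothetical.

```java
// Sketch: write one cell under the (row, family:qualifier, timestamp) model.
// Assumes the classic (pre-1.0) HBase Java client; "MyTable", "Fam1", "Qual1" are hypothetical names.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "MyTable");

        Put put = new Put(Bytes.toBytes("AK"));   // row key
        put.add(Bytes.toBytes("Fam1"),            // column family
                Bytes.toBytes("Qual1"),           // column qualifier
                1L,                               // timestamp t1 (normally server-assigned)
                Bytes.toBytes("v1"));             // cell value
        table.put(put);
        table.close();
    }
}
```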

8 Regions • Region: contiguous set of lexicographically sorted rows
• Maximum region size: hbase.hregion.max.filesize (default 256 MB)
• Regions are hosted by Region Servers

9 Regions and Splitting • Example: a table held in two regions, [row1, row256] and [row257, row600]

10 Regions and Splitting • Writes accumulate in the region that covers the row key

11 Regions and Splitting • A region that grows past the size limit splits, e.g. [row257, row600] into [row257, row400] and [row401, row600]
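
The split threshold is just a configuration key. As a hedged sketch (the 1 GB value is illustrative, not from the deck), it can be overridden per client Configuration or cluster-wide in hbase-site.xml:

```java
// Illustrative only: override the region split threshold in configuration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RegionSizeSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Same key can be set cluster-wide in hbase-site.xml; the default is 256 MB.
        conf.setLong("hbase.hregion.max.filesize", 1024L * 1024 * 1024);
        System.out.println(conf.getLong("hbase.hregion.max.filesize", -1));
    }
}
```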

12 System Structure • Master, Region Servers, and ZooKeeper, with HDFS underneath and MapReduce on top

13 Master • Region splitting
• Load balancing
• Metadata operations
• Multiple masters for failover

14 ZooKeeper • Master election
• Locate -ROOT- region
• Region Server membership
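
Since ZooKeeper is the entry point for locating -ROOT-, a client only needs the quorum address. A minimal connection sketch, assuming the classic Java client; the hostnames and table name are hypothetical:

```java
// Sketch: a client only needs the ZooKeeper quorum to find the cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class ZkConnectSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Reads and writes never go through the Master; the client asks ZooKeeper
        // where -ROOT- lives and walks the catalog from there. Hostnames are hypothetical.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        HTable table = new HTable(conf, "MyTable");
        System.out.println("Opened table: " + Bytes.toString(table.getTableName()));
        table.close();
    }
}
```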

15 Where is my row? • 3-level hierarchical lookup scheme
• ZooKeeper tells the client where the -ROOT- region lives
• -ROOT- holds a row per .META. region
• .META. holds a row per table region, which finally locates MyRow in MyTable
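
From the client's point of view the whole lookup is hidden behind one call. A hedged sketch with the classic client API (table and row names hypothetical); getRegionLocation just surfaces the result of the ZooKeeper -> -ROOT- -> .META. walk:

```java
// Sketch: ask the client where a given row is served.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class WhereIsMyRow {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "MyTable");

        // The catalog walk happens inside the client and is cached afterwards.
        HRegionLocation loc = table.getRegionLocation(Bytes.toBytes("MyRow"));
        System.out.println("'MyRow' is served by region "
                + loc.getRegionInfo().getRegionNameAsString());
        table.close();
    }
}
```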

20 Write path • Each Region keeps recent edits in an in-memory Memstore
• HLog: append-only write-ahead log (WAL) on HDFS, a SequenceFile, one per Region Server
• HFile: immutable sorted map (byte[] -> byte[]) on HDFS; key = (row, column, timestamp), value = cell value

21 Write • A write is appended to the HLog, then applied to the region's Memstore

23 Flush • When the Memstore fills up, it is flushed to a new small HFile on HDFS

25 Compaction • Small HFiles are periodically merged (compacted) into fewer, larger HFiles

29 Read • A read merges the Memstore with the region's HFiles
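
Flushes and compactions happen automatically on the Region Servers, but the same steps can be triggered by hand, which makes the path easier to see. A hedged sketch against the classic HBaseAdmin API (table name hypothetical):

```java
// Sketch: trigger a flush and a major compaction explicitly for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushCompactSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        admin.flush("MyTable");        // Memstore contents -> new small HFile on HDFS
        admin.majorCompact("MyTable"); // merge each region's HFiles into one
    }
}
```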

30 Ways to access • Java
• REST
• Thrift
• Scala
• Jython
• Groovy DSL
• Ruby shell
• Java MR, Cascading, Pig, Hive

31 Java API • Get
• Put
• Delete
• Scan
• IncrementColumnValue
• TableInputFormat - MapReduce source
• TableOutputFormat - MapReduce sink
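
A hedged end-to-end sketch of those core operations with the classic pre-1.0 client (table, family, and row names are hypothetical; the MapReduce input/output formats are not shown):

```java
// Sketch: Put, Get, IncrementColumnValue, Scan, and Delete against one table.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class JavaApiSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "MyTable");

        // Put: write one cell (timestamp assigned by the server)
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("Fam1"), Bytes.toBytes("Qual1"), Bytes.toBytes("v1"));
        table.put(put);

        // Get: read it back
        Result r = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("Fam1"), Bytes.toBytes("Qual1"))));

        // IncrementColumnValue: atomic counter
        table.incrementColumnValue(Bytes.toBytes("row1"),
                Bytes.toBytes("Fam1"), Bytes.toBytes("hits"), 1L);

        // Scan: iterate a row-key range
        ResultScanner scanner = table.getScanner(new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row9")));
        for (Result row : scanner) {
            System.out.println(Bytes.toString(row.getRow()));
        }
        scanner.close();

        // Delete: remove the row
        table.delete(new Delete(Bytes.toBytes("row1")));
        table.close();
    }
}
```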

32 Other Features • Compression
• In-memory column families
• Multiple masters
• Rolling restart
• Bloom filters
• Efficient bulk loads
• Source and sink for Hive, Pig, Cascading
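
Compression and in-memory caching are per-column-family settings. A hedged sketch of declaring them at table-creation time with the classic API (table and family names hypothetical; bloom filters and bulk loads are configured separately and not shown):

```java
// Sketch: per-column-family compression and in-memory caching at table creation.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class FeatureSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HColumnDescriptor fam = new HColumnDescriptor(Bytes.toBytes("Fam1"));
        fam.setCompressionType(Compression.Algorithm.GZ); // compress HFile blocks on disk
        fam.setInMemory(true);                            // favor this family in the block cache

        HTableDescriptor desc = new HTableDescriptor("MyTable");
        desc.addFamily(fam);
        admin.createTable(desc);
    }
}
```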

33 How to think in HBase?

34 HBase v/s RDBMS • Neither solves all problems
• It's really a wrong comparison, but it puts things in context

35 HBase v/s RDBMS • HBase: column oriented / RDBMS: row oriented (mostly)
• HBase: flexible schema, add columns on the fly / RDBMS: fixed schema
• HBase: good with sparse tables / RDBMS: not optimized for sparse tables
• HBase: no query language / RDBMS: SQL
• HBase: wide tables / RDBMS: narrow tables
• HBase: joins via MapReduce, not optimized / RDBMS: optimized for joins (small, fast ones too!)
• HBase: tight integration with MapReduce / RDBMS: not really...

36 HBase v/s RDBMS • HBase: de-normalize your data / RDBMS: normalize as you can
• HBase: horizontal scalability, just add hardware / RDBMS: hard to shard and scale
• HBase: consistent / RDBMS: consistent
• HBase: no transactions / RDBMS: transactional
• HBase: good for semi-structured as well as structured data / RDBMS: good for structured data

37 HBase v/s RDBMS • Rule: you probably don't need HBase if your data can easily fit and be processed on a single RDBMS box.

38 HBase v/s RDBMS • Rule: you probably don't need HBase if your data can easily fit and be processed on a single RDBMS box.
• But then, you are at Hadoop Day, so it probably can't!

39 Q&A

