Published by Godfrey Miller. Modified over 8 years ago.
1
1 Breaking with Relational DBMS and Dating with HBase
  Gaurav Kohli, Xebia
2
2 About me
  Gaurav Kohli (gaurav.in@gmail.com)
  Consultant, Xebia IT Architects
3
3 Agenda
  Why are we here?
  Something about RDBMS
  Limitations of RDBMS
  Why HBase or any NoSQL solution
  Overview of HBase
  Specific use cases
  Paradigm shift in schema design
  Architecture of HBase
  HBase interface: Java API, Thrift
  Conclusion
4
4 Relational Databases
5
5 Relational Databases have a lot of limitations
6
6 Limitations
  Data sets are growing into petabytes
  RDBMSs don't scale inherently
  Scale up / scale out (load balancing + replication)
  Hard to shard / partition
  High throughput for both reads and writes is not possible
  Transactional / analytical databases
  Specialized hardware ... is very expensive
  Oracle clustering
7
7 Scaling Out
  Replication
  [Figure labels: Master, Slave, Replication]
8
8 Scaling Out
  Master - many slaves
  MySQL master becomes a problem
  All slaves must have the same write capacity as the master
  Single point of failure, no easy failover
  [Figure labels: Master, Reads, Writes, Slave nodes]
9
9 Dual Master
  [Figure labels: Master, Slave, Replication]
10
10 NoSQL
12
12 Background
  2006.11 Google releases paper on BigTable
  2007.2 Initial HBase prototype created as Hadoop contrib
  2007.10 First usable HBase
  2008.1 Hadoop becomes Apache top-level project and HBase becomes a subproject
  2010.5~ HBase becomes Apache top-level project
  2010.6 HBase 0.20.5 released
  2010.10 HBase 0.89.2010092, third developer release
13
13 HBase
  Distributed: uses HDFS for storage
  Column-oriented
  Multi-dimensional: versions
  High availability
  High performance
  Storage system
14
14 HBase is Not
  A SQL database
  No joins, no query engine, no datatypes, no SQL
  No schema
  Denormalized data
  Wide and sparsely populated data structure (key-value)
  No DBA needed
15
15 Use Case
  Bigness: big data, big number of users, big number of computers
  Massive write performance
  Facebook handles 135 billion messages a month
  Twitter stores 7 TB of data per day
  Fast key-value access
  Write availability
  No single point of failure
16
16 Specific Use Case
  Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.
  Real-time inserts, updates, and queries
  Fraud detection by comparing transactions to known patterns in real time
  Analytics: use MapReduce, Hive, or Pig to perform analytical queries
17
17 Storage Model
  Column-oriented database
  Tables are sorted by row key
  Table schema only defines column families
  A column family can have any number of columns
  Each cell value has a timestamp
18
18 Storage Model
19
19 Storage Model
20
20 Storage Model
  SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
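The nested sorted-map view of the storage model can be sketched with plain Java collections. This is only an illustrative in-memory model, not how HBase lays data out on disk; the class name StorageModelSketch, its methods, and the sample row keys and columns are assumptions made for the example.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative in-memory model of the nested sorted map; not HBase's on-disk format.
final class StorageModelSketch {
    // row key (sorted) -> column (sorted) -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Stores one version of a cell under the given timestamp.
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.firstEntry().getValue(); // first entry = highest timestamp
    }

    // Number of stored versions of a cell (0 if the cell does not exist).
    int versionCount(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null || !columns.containsKey(column)) {
            return 0;
        }
        return columns.get(column).size();
    }

    // Smallest row key in sort order, or null for an empty table.
    String firstRowKey() {
        return rows.isEmpty() ? null : rows.firstKey();
    }

    public static void main(String[] args) {
        StorageModelSketch table = new StorageModelSketch();
        table.put("student2", "info:name", 1L, "Bob");
        table.put("student1", "info:name", 1L, "Ann");
        table.put("student1", "info:name", 2L, "Anna"); // second version of the same cell
        System.out.println(table.getLatest("student1", "info:name"));
        System.out.println(table.versionCount("student1", "info:name"));
        System.out.println(table.firstRowKey());
    }
}
```

Keeping timestamps in descending order means a read of the "current" value is just the first entry of the innermost map, while older versions stay available behind it.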
21
21 Schema Design
  A big sorted map: row key + column key + timestamp => value
  Sorted by row key and column key
  Timestamp is a long value
  [Figure: Student table, annotated with: column family, column qualifier/name, 2 versions of this row]
22
22 Schema Design
  Example of a Student and Subject (m:n)
23
23 Schema Design: RDBMS
  Example of a Student and Subject
  Three tables: Student table, Subject table, Student-Subject table
24
24 Schema Design: HBase
  Student-Subject schema in HBase
  Only two tables: Student table, Subject table
25
25 Schema Design: HBase
  Student-Subject schema in HBase
  Only two tables: Student table, Subject table
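One way to read the two-table design is that the relationship stored in the RDBMS join table is denormalized into both HBase tables. The sketch below models that with in-memory maps. The column family names "subjects" and "students", the use of the related row's ID as the column qualifier, and every class and method name are assumptions for illustration; the slides do not show the actual column layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the two-table Student/Subject design.
final class StudentSubjectSchemaSketch {
    // row key -> "family:qualifier" -> value
    private final Map<String, Map<String, String>> studentTable = new TreeMap<>();
    private final Map<String, Map<String, String>> subjectTable = new TreeMap<>();

    // One enrollment is written to BOTH tables; there is no separate join table.
    void enroll(String studentId, String subjectId) {
        studentTable.computeIfAbsent(studentId, k -> new TreeMap<String, String>())
                    .put("subjects:" + subjectId, "");
        subjectTable.computeIfAbsent(subjectId, k -> new TreeMap<String, String>())
                    .put("students:" + studentId, "");
    }

    // Reads one row and returns the qualifiers of all columns in the given family.
    private static List<String> qualifiers(Map<String, Map<String, String>> table,
                                           String rowKey, String family) {
        List<String> result = new ArrayList<>();
        Map<String, String> row = table.get(rowKey);
        if (row == null) {
            return result;
        }
        String prefix = family + ":";
        for (String column : row.keySet()) {
            if (column.startsWith(prefix)) {
                result.add(column.substring(prefix.length()));
            }
        }
        return result;
    }

    // A single row read in the Student table answers "which subjects?".
    List<String> subjectsOf(String studentId) {
        return qualifiers(studentTable, studentId, "subjects");
    }

    // A single row read in the Subject table answers "which students?".
    List<String> studentsOf(String subjectId) {
        return qualifiers(subjectTable, subjectId, "students");
    }

    public static void main(String[] args) {
        StudentSubjectSchemaSketch schema = new StudentSubjectSchemaSketch();
        schema.enroll("s1", "math");
        schema.enroll("s1", "physics");
        schema.enroll("s2", "math");
        System.out.println(schema.subjectsOf("s1"));
        System.out.println(schema.studentsOf("math"));
    }
}
```

The trade-off in this sketch is that each enrollment costs two writes, in exchange for answering either question with one row lookup and no join.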
26
26 Column family attributes
27
27 Regions
  Region: contiguous set of lexicographically sorted rows
  Each table is partitioned into regions
  Regions are hosted by region servers
  Split threshold: hbase.hregion.max.filesize (default: 256 MB)
28
28 Regions and Splitting
  [Figure labels: row1, row200, row201, row500, new row]
29
29 Regions and Splitting
  [Figure labels: row1, row200, row201, row350, row351, row501]
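Region lookup and splitting can be sketched as a sorted map from region start key to the rows in that region. This is a simplified model under stated assumptions: the split is triggered by a row count (maxRowsPerRegion, a hypothetical stand-in for the byte-size threshold hbase.hregion.max.filesize), the split point is the middle row key, and all class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified model: regions are contiguous, lexicographically sorted key ranges.
final class RegionSplitSketch {
    private final int maxRowsPerRegion; // stand-in for a size threshold in bytes
    // region start key (inclusive) -> rows of that region; "" starts the first region
    private final TreeMap<String, TreeMap<String, String>> regions = new TreeMap<>();

    RegionSplitSketch(int maxRowsPerRegion) {
        this.maxRowsPerRegion = maxRowsPerRegion;
        regions.put("", new TreeMap<String, String>());
    }

    // The region responsible for a row is the one with the greatest start key <= row key.
    String regionStartKeyFor(String rowKey) {
        return regions.floorKey(rowKey);
    }

    void put(String rowKey, String value) {
        TreeMap<String, String> region = regions.floorEntry(rowKey).getValue();
        region.put(rowKey, value);
        if (region.size() > maxRowsPerRegion) {
            split(region);
        }
    }

    // Moves the upper half of an oversized region into a new region.
    private void split(TreeMap<String, String> region) {
        List<String> keys = new ArrayList<>(region.keySet());
        String midKey = keys.get(keys.size() / 2);
        TreeMap<String, String> upperHalf = new TreeMap<>(region.tailMap(midKey, true));
        region.tailMap(midKey, true).clear(); // clearing the view removes rows from the old region
        regions.put(midKey, upperHalf);
    }

    List<String> regionStartKeys() {
        return new ArrayList<>(regions.keySet());
    }

    public static void main(String[] args) {
        RegionSplitSketch table = new RegionSplitSketch(4);
        for (int i = 1; i <= 5; i++) {
            // Zero-padded keys keep lexicographic order equal to numeric order.
            table.put(String.format("row%03d", i), "value" + i);
        }
        System.out.println(table.regionStartKeys());
        System.out.println(table.regionStartKeyFor("row004"));
    }
}
```

Because rows sort lexicographically, unpadded keys such as "row10" would sort before "row2", which is why the example pads the numeric part.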
30
30 Architecture
  Master
  ZooKeeper
  RegionServers
  HDFS
  MapReduce
31
31 Architecture
32
32 Tools: Java API, Thrift...
33
33 Tools: Java API, Thrift...
  Java
  Thrift (Ruby, PHP, Python, Perl, C++...)
  REST
  Groovy DSL
  MapReduce
  HBase Shell
34
34 Tools: Java API, Thrift...
  Java: Get, Put, Delete, Scan, incrementColumnValue
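The rough semantics of these five operations can be sketched against an in-memory sorted map. The class InMemoryTableSketch and its method signatures are hypothetical stand-ins chosen to mirror the operation names; they are not the HBase client API, and cell versions are left out for brevity. The scan follows the usual convention of an inclusive start row and an exclusive stop row, and the counter is kept as an 8-byte long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in; the class and its methods are NOT the HBase client API.
final class InMemoryTableSketch {
    // row key -> "family:qualifier" -> value bytes (versions omitted for brevity)
    private final TreeMap<String, TreeMap<String, byte[]>> rows = new TreeMap<>();

    // Put: write one cell.
    void put(String row, String column, byte[] value) {
        rows.computeIfAbsent(row, k -> new TreeMap<String, byte[]>()).put(column, value);
    }

    // Get: read one cell by row key and column, or null if absent.
    byte[] get(String row, String column) {
        TreeMap<String, byte[]> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Delete: remove a whole row.
    void delete(String row) {
        rows.remove(row);
    }

    // Scan: row keys in sorted order, start inclusive, stop exclusive.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // Increment: treat the cell as an 8-byte long counter and add the amount.
    long incrementColumnValue(String row, String column, long amount) {
        byte[] current = get(row, column);
        long next = (current == null ? 0L : ByteBuffer.wrap(current).getLong()) + amount;
        put(row, column, ByteBuffer.allocate(Long.BYTES).putLong(next).array());
        return next;
    }

    public static void main(String[] args) {
        InMemoryTableSketch table = new InMemoryTableSketch();
        table.put("row1", "info:name", "Ann".getBytes(StandardCharsets.UTF_8));
        table.put("row2", "info:name", "Bob".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(table.get("row1", "info:name"), StandardCharsets.UTF_8));
        System.out.println(table.incrementColumnValue("row1", "stats:visits", 1L));
        System.out.println(table.scan("row1", "row3"));
        table.delete("row1");
        System.out.println(table.get("row1", "info:name") == null);
    }
}
```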
36
36 Conclusion
  HBase vs. RDBMS
  Not a replacement
  Solves only a small subset (~5%)
37
37 Conclusion
  Where SQL makes life easy: joining, secondary indexing, referential integrity (updates), ACID
  Where HBase makes life easy: dataset scale, read/write scale, replication, batch analysis
40
40 References & Credit
  Apache HBase (http://hbase.apache.org/)
  HBase Wiki (wiki.apache.org/hadoop/Hbase)
  HBase blog (blog.hbase.org)
  Images from Google Search
  http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
  http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html