1 Breaking with Relational DBMS and Dating with HBase. Gaurav Kohli, Xebia
2 About me: Gaurav Kohli, Consultant, Xebia IT Architects
3 Agenda: Why are we here?; Something about RDBMS; Limitations of RDBMS; Why HBase or any NoSQL solution; Overview of HBase; Specific use cases; Paradigm shift in schema design; Architecture of HBase; HBase interface (Java API, Thrift); Conclusion
4 Relational Databases
5 Relational Databases have a lot of limitations
6 Limitations: Data sets growing into petabytes; RDBMSs don't scale inherently; Scale up / scale out (load balancing + replication); Hard to shard / partition; High read and write throughput at the same time is not possible; Transactional vs. analytical databases; Specialized hardware ... is very expensive; Oracle clustering
7 Scaling Out: Replication [Diagram: master-slave replication]
8 Scaling Out: Master, Many Slaves. The MySQL master becomes a problem; All slaves must have the same write capacity as the master; Single point of failure, no easy failover [Diagram labels: Master, Reads, Writes, Slave nodes]
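The write-capacity point on this slide can be made concrete with a little arithmetic. The sketch below is a simplified model, not a benchmark: the function name `per_node_load` and the request rates are illustrative assumptions, and it assumes reads are spread evenly across the slaves while every write is replayed on every slave.

```python
def per_node_load(reads_per_sec, writes_per_sec, num_slaves):
    """Return (master_load, slave_load) in operations per second.

    Assumed model: the master takes all writes, each slave replays
    every write and serves an equal share of the reads.
    """
    if num_slaves < 1:
        raise ValueError("need at least one slave")
    master_load = writes_per_sec
    slave_load = writes_per_sec + reads_per_sec / num_slaves
    return master_load, slave_load


# Hypothetical workload: 10,000 reads/s and 2,000 writes/s.
for n in (1, 2, 4, 8):
    master, slave = per_node_load(10_000, 2_000, n)
    print(f"{n} slaves: master={master:.0f} ops/s, each slave={slave:.0f} ops/s")
```

Adding slaves shrinks the read share per node, but the write term never drops below the full write rate, which is why replication alone does not scale writes.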
9 Dual Master [Diagram labels: Master, Slave, Replication]
10 NoSQL
11
12 Background: Google releases the BigTable paper; Initial HBase prototype created as a Hadoop contrib module; First usable HBase; Hadoop becomes an Apache top-level project and HBase becomes a subproject; HBase becomes an Apache top-level project; HBase released; HBase third developer release
13 HBase: Distributed (uses HDFS for storage); Column-oriented; Multi-dimensional (versions); High availability; High performance; Storage system
14 HBase is Not: A SQL database (no joins, no query engine, no datatypes, no SQL); No schema; Denormalized data; Wide and sparsely populated data structure (key-value); No DBA needed
15 Use Case: Bigness (big data, big number of users, big number of computers); Massive write performance (Facebook needs to store 135 billion messages a month; Twitter stores 7 TB of data per day); Fast key-value access; Write availability; No single point of failure
16 Specific Use Case: Managing large streams of non-transactional data (Apache logs, application logs, MySQL logs, etc.); Real-time inserts, updates, and queries; Fraud detection by comparing transactions to known patterns in real time; Analytics (use MapReduce, Hive, or Pig to perform analytical queries)
17 Storage Model: Column-oriented database; Tables are sorted by row key; The table schema defines only column families; A column family can have any number of columns; Each cell value has a timestamp
18 Storage Model
19 Storage Model
20 Storage Model: SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
21 Schema Design: A Big Sorted Map. Row key + column key + timestamp => value; Sorted by row key and column key; The timestamp is a long value [Student table figure, annotated: column family; column qualifier/name; 2 versions of this row]
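The "big sorted map" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how HBase stores data on disk: the class name `SortedCellMap`, the `info:name` column, and the student values are all hypothetical, and sorting happens at read time for brevity.

```python
import time
from collections import defaultdict


class SortedCellMap:
    """Toy model: (row key, column key, timestamp) -> value."""

    def __init__(self):
        # row -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp=None):
        # Timestamps are long integers (milliseconds here, by assumption).
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self._rows[row][column][timestamp] = value

    def get(self, row, column, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        versions = self._rows.get(row, {}).get(column, {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]

    def items(self):
        """Yield cells sorted by row key, then column key, newest version first."""
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                cells = sorted(self._rows[row][column].items(), reverse=True)
                for ts, value in cells:
                    yield row, column, ts, value


# Hypothetical student rows; the second put creates a second version.
students = SortedCellMap()
students.put("student2", "info:name", "Asha", timestamp=1)
students.put("student1", "info:name", "Ravi", timestamp=1)
students.put("student1", "info:name", "Ravi K.", timestamp=2)
print(students.get("student1", "info:name", max_versions=2))
```

A plain `get` returns only the newest version, while older versions stay available on request, which mirrors the "2 versions of this row" annotation on the slide.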
22 Schema Design: Example of a Student and Subject [Diagram: m:n relationship]
23 RDBMS Schema Design: Example of a Student and Subject. Three tables: Student table; Subject table; Student-Subject table
24 HBase Schema Design: Student-Subject schema in HBase. Only two tables: Student table; Subject table
25 HBase Schema Design: Student-Subject schema in HBase. Only two tables: Student table; Subject table
26 Column family attributes
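The slide's table figures did not survive, so the following is only one plausible reading of the two-table design: each student row carries one column per subject, and each subject row carries one column per student, which removes the join table. All names here (`enroll`, the `info`, `subjects`, and `students` column families, the sample IDs) are assumptions for illustration.

```python
# Assumed layout: row key -> column family -> {qualifier: value}.
student_table = {}
subject_table = {}


def enroll(student_id, student_name, subject_id, subject_title):
    """Record an enrollment in both tables (denormalized, no join table)."""
    student = student_table.setdefault(student_id, {"info": {}, "subjects": {}})
    student["info"]["name"] = student_name
    # The column qualifier is the subject id; the value duplicates the title.
    student["subjects"][subject_id] = subject_title

    subject = subject_table.setdefault(subject_id, {"info": {}, "students": {}})
    subject["info"]["title"] = subject_title
    subject["students"][student_id] = student_name


def subjects_of(student_id):
    """One row lookup replaces the RDBMS join through Student-Subject."""
    return sorted(student_table[student_id]["subjects"].items())


def students_of(subject_id):
    """The reverse lookup is served by the subject table."""
    return sorted(subject_table[subject_id]["students"].items())


enroll("s1", "Ravi", "math", "Mathematics")
enroll("s1", "Ravi", "phys", "Physics")
enroll("s2", "Asha", "math", "Mathematics")
print(subjects_of("s1"))
print(students_of("math"))
```

The trade-off is that every enrollment is written twice, so reads in either direction stay a single-row lookup at the cost of duplicated data.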
27 Regions: Each table is partitioned into regions; Region: a contiguous set of lexicographically sorted rows; Regions are hosted by region servers; hbase.hregion.max.filesize (default: 256 MB)
28 Regions and Splitting [Diagram: regions row1 to row200 and row201 to row500; a new row arrives]
29 Regions and Splitting [Diagram: regions row1 to row200, row201 to row350, and row351 to row501]
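The splitting shown on these two slides can be sketched as follows. This is a simplified model under assumptions: the class name `Table` and its methods are hypothetical, the size limit is counted in rows rather than bytes (a stand-in for hbase.hregion.max.filesize), and the split point is simply the middle row.

```python
import bisect


class Table:
    """Toy table: an ordered list of regions, each a sorted list of row keys."""

    def __init__(self, max_rows_per_region):
        if max_rows_per_region < 1:
            raise ValueError("max_rows_per_region must be at least 1")
        self.max_rows = max_rows_per_region
        self.regions = [[]]  # a new table starts with a single region

    def _region_index(self, row_key):
        # Every region after the first is non-empty; its first key is its
        # start key. Pick the last region whose start key is <= row_key.
        start_keys = [region[0] for region in self.regions[1:]]
        return bisect.bisect_right(start_keys, row_key)

    def region_for(self, row_key):
        """Return the region (list of row keys) responsible for row_key."""
        return self.regions[self._region_index(row_key)]

    def put(self, row_key):
        index = self._region_index(row_key)
        region = self.regions[index]
        position = bisect.bisect_left(region, row_key)
        if position < len(region) and region[position] == row_key:
            return  # row already present
        region.insert(position, row_key)
        if len(region) > self.max_rows:
            middle = len(region) // 2
            # Split into two contiguous, still-sorted halves.
            self.regions[index:index + 1] = [region[:middle], region[middle:]]


table = Table(max_rows_per_region=4)
for i in range(1, 11):
    table.put(f"row{i:02d}")  # zero-padded so string order matches numeric order
print([len(region) for region in table.regions])
```

Because rows are compared as strings, a key such as `row035` falls between `row03` and `row04`; zero-padding numeric parts of a row key keeps the lexicographic order intuitive.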
30 Architecture: Master; ZooKeeper; RegionServers; HDFS; MapReduce
31 Architecture
32 Tools: Java API, Thrift...
33 Tools (Java API, Thrift...): Java; Thrift (Ruby, PHP, Python, Perl, C++...); REST; Groovy DSL; MapReduce; HBase Shell
34 Tools (Java API, Thrift...): Java operations: Get; Put; Delete; Scan; incrementColumnValue
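To show what these five operations mean, here is a small in-memory stand-in written in Python. It is not the HBase client API and needs no cluster: the class `ToyTable`, its method signatures, and the sample row keys are assumptions chosen to mirror the operation names on the slide, with scans running over sorted row keys from an inclusive start row to an exclusive stop row.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table; not the HBase client API."""

    def __init__(self):
        self._row_keys = []  # kept sorted, like rows in a table
        self._rows = {}      # row key -> {column: value}

    def put(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        """Return a copy of the row's columns, or {} if the row is absent."""
        return dict(self._rows.get(row, {}))

    def delete(self, row):
        if row in self._rows:
            del self._rows[row]
            self._row_keys.remove(row)

    def scan(self, start_row="", stop_row=None):
        """Yield (row, columns) for start_row <= row < stop_row, in key order."""
        begin = bisect.bisect_left(self._row_keys, start_row)
        if stop_row is None:
            end = len(self._row_keys)
        else:
            end = bisect.bisect_left(self._row_keys, stop_row)
        for row in self._row_keys[begin:end]:
            yield row, dict(self._rows[row])

    def increment_column_value(self, row, column, amount=1):
        """Add amount to a counter cell, creating it at 0 if missing."""
        new_value = self.get(row).get(column, 0) + amount
        self.put(row, column, new_value)
        return new_value


# Hypothetical log rows keyed by date and host.
logs = ToyTable()
logs.put("2011-01-02#host1", "log:msg", "disk full")
logs.put("2011-01-01#host2", "log:msg", "restart")
logs.put("2011-01-01#host1", "log:msg", "login ok")
logs.put("tmp", "log:msg", "scratch")
logs.delete("tmp")
logs.increment_column_value("counters", "hits:total")
logs.increment_column_value("counters", "hits:total", 4)
print([row for row, _ in logs.scan("2011-01-01", "2011-01-02")])
```

Putting the date first in the row key is what makes the range scan useful here: all rows for one day sit next to each other in sort order.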
35
36 Conclusion: HBase vs. RDBMS. Not a replacement; Solves only a small subset (~5%)
37 Conclusion: Where SQL makes life easy: joining; secondary indexing; referential integrity (updates); ACID. Where HBase makes life easy: dataset scale; read/write scale; replication; batch analysis
38
39
40 References & Credit: Apache HBase; HBase Wiki (wiki.apache.org/hadoop/Hbase); HBase blog (blog.hbase.org); Images from Google Search; architecture-101-storage.html; heck-are-you-actually-using-nosql-for.html