The Hadoop RDBMS Replace Oracle with Hadoop John Leach CTO and Co-Founder J.


1 The Hadoop RDBMS Replace Oracle with Hadoop John Leach CTO and Co-Founder J

2 Who We Are: The Hadoop RDBMS. Standard ANSI SQL, horizontal scale-out, real-time updates, ACID transactions, powers OLAP and OLTP, seamless BI integration. (Splice Machine Proprietary and Confidential)

3 Serialization and Write Pipelining. Serialization goals: disk usage at parity with the data supplied; predicate evaluation using byte[] comparisons (sorted); memory and CPU efficiency (fast); lazy serialization and deserialization. Write pipelining goals: non-blocking writes; transactional awareness; small network footprint; handling of failure, location, and retry semantics.
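The goal of evaluating predicates with sorted byte[] comparisons requires an encoding whose byte order matches value order. The following is a minimal sketch of one common way to do that for signed 64-bit integers (fixed-width big-endian with the sign bit flipped). It is an assumption for illustration only; the slides do not describe the actual scalar encoding beyond its 1-9 byte length, and the function names are hypothetical.

```python
import struct

OFFSET = 1 << 63  # shifts the signed 64-bit range onto the unsigned range


def encode_sortable_int64(value: int) -> bytes:
    """Encode a signed 64-bit integer so that bytes compare in numeric order."""
    if not -OFFSET <= value < OFFSET:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Adding the offset is equivalent to flipping the sign bit; big-endian
    # layout puts the most significant byte first for lexicographic compares.
    return struct.pack(">Q", value + OFFSET)


def decode_sortable_int64(data: bytes) -> int:
    """Inverse of encode_sortable_int64."""
    return struct.unpack(">Q", data)[0] - OFFSET


# A range predicate such as "a < 10" can then be checked on raw bytes:
assert encode_sortable_int64(-7) < encode_sortable_int64(10)
```

With an encoding like this, a filter can compare stored bytes against an encoded constant without deserializing the value first.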

4 Single Column Encoding. All columns are encoded in a single cell, separated by a 0x00 byte. Nulls are encoded either as an “explicit null” or as an absent field. The cell value is prefixed by an index recording which fields are present in the cell and whether each field is a Scalar (1-9 bytes), Float (4 bytes), Double (8 bytes), or Other (1-N bytes).

5 Example: Insert. Table schema: (a int, b string). Insert row (1,‘bob’). All columns packed together: 1 0x00 ‘bob’. Index prepended: {1(s),2(o)} 0x00 1 0x00 ‘bob’.

6 Example: Insert with nulls. Row (1,null). Nulls left absent: 1. Index prepended (field B is not present): {1(s)} 0x00 1.

7 Example: Update. Row already present: {1(s),2(o)}. Set a = 2. Pack entry: 2. Prepend index (field B is not present): {1(s)} 0x00 2.
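The insert, insert-with-null, and update examples can be reproduced with a small sketch. It assumes a readable text form of the index (the `{1(s),2(o)}` notation from the slides written as ASCII) and ASCII digits for scalars; the real format is binary and is not specified here. The function name `encode_row` is hypothetical.

```python
SEPARATOR = b"\x00"


def encode_row(values):
    """Pack the non-null fields of a row into one cell, prefixed by an index.

    Integers are tagged 's' (scalar) and everything else 'o' (other).
    Null fields are simply left out of both the index and the payload.
    """
    index_parts = []
    payloads = []
    for position, value in enumerate(values, start=1):
        if value is None:
            continue  # absent field: nothing is written for it
        if isinstance(value, int):
            type_code, payload = "s", str(value).encode("ascii")
        else:
            type_code, payload = "o", str(value).encode("utf-8")
        if SEPARATOR in payload:
            # Assumption of this sketch: payloads never contain the separator.
            raise ValueError("payload contains the 0x00 separator")
        index_parts.append(f"{position}({type_code})")
        payloads.append(payload)
    index = ("{" + ",".join(index_parts) + "}").encode("ascii")
    return SEPARATOR.join([index] + payloads)


print(encode_row((1, "bob")))   # b'{1(s),2(o)}\x001\x00bob'
print(encode_row((1, None)))    # b'{1(s)}\x001'
print(encode_row((2, None)))    # b'{1(s)}\x002'  (the "set a = 2" update)
```

Note that an update is encoded exactly like a partial row: only the changed field appears, and the untouched field is absent rather than rewritten.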

8 Decoding. Indexes are cached (most data looks like its predecessor). Values are read in reverse timestamp order, so updates come before inserts. Seek through the bytes for the fields of interest. Once a field is populated, ignore all other values for that field.

9 Example: Decoding. Start with (NULL,NULL). Two KeyValues are present: {1(s)} 0x00 2 and {1(s),2(o)} 0x00 1 0x00 ‘bob’. Read the first KeyValue and fill field 1; row: (2,NULL). Read the second KeyValue, skip field 1 (already filled), and fill field 2; row: (2,‘bob’).
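The decoding walk-through can be sketched as a newest-first merge. As before, the text form of the index and the ASCII scalar payloads are assumptions made for readability, and `parse_cell` and `merge_versions` are hypothetical names. Explicit nulls, which would mark a field as filled with NULL, are left out of this sketch.

```python
def parse_cell(cell: bytes) -> dict:
    """Split one cell into {field position: value} using its index prefix."""
    parts = cell.split(b"\x00")
    index = parts[0].decode("ascii").strip("{}")
    fields = {}
    if not index:
        return fields
    for entry, payload in zip(index.split(","), parts[1:]):
        position = int(entry.split("(")[0])
        type_code = entry[entry.index("(") + 1]
        if type_code == "s":
            fields[position] = int(payload.decode("ascii"))
        else:
            fields[position] = payload.decode("utf-8")
    return fields


def merge_versions(cells_newest_first, column_count):
    """Fill each field from the newest cell that contains it."""
    row = [None] * column_count
    filled = [False] * column_count
    for cell in cells_newest_first:
        for position, value in parse_cell(cell).items():
            if not filled[position - 1]:
                row[position - 1] = value
                filled[position - 1] = True
        if all(filled):
            break  # every field is populated; older versions are ignored
    return tuple(row)


update = b"{1(s)}\x002"
insert = b"{1(s),2(o)}\x001\x00bob"
print(merge_versions([update, insert], column_count=2))  # (2, 'bob')
```

Stopping as soon as every field is filled is what makes reverse timestamp order useful: older versions are never parsed once the row is complete.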

10 Index Decoding. The index is encoded differently depending on the number and type of columns present. Uncompressed: 1 bit for present, 2 bits for type. Compressed: run-length encoded (fields 1-3 scalar, 5-8 double, …). Sparse: delta-encoded (index,type) pairs. Sparse compressed: run-length encoded (index,type) pairs.
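Two of the index variants can be illustrated at the logical level. The sketch below shows the run-length idea ("fields 1-3 scalar, 5-8 double") and the delta idea for sparse indexes. It works on Python tuples rather than bits, since the slides do not give the bit layout; the function names and the single-letter type codes are assumptions.

```python
def run_length_index(present: dict) -> list:
    """Collapse consecutive fields of the same type into (first, last, type) runs.

    `present` maps a 1-based field position to its type code.
    """
    runs = []
    for position in sorted(present):
        type_code = present[position]
        if runs and runs[-1][1] == position - 1 and runs[-1][2] == type_code:
            first, _, _ = runs[-1]
            runs[-1] = (first, position, type_code)  # extend the current run
        else:
            runs.append((position, position, type_code))
    return runs


def delta_index(present: dict) -> list:
    """Store each present field as (gap from the previous position, type)."""
    pairs = []
    previous = 0
    for position in sorted(present):
        pairs.append((position - previous, present[position]))
        previous = position
    return pairs


dense = {1: "s", 2: "s", 3: "s", 5: "d", 6: "d", 7: "d", 8: "d"}
print(run_length_index(dense))   # [(1, 3, 's'), (5, 8, 'd')]

sparse = {1: "s", 100: "d", 103: "o"}
print(delta_index(sparse))       # [(1, 's'), (99, 'd'), (3, 'o')]
```

Run-length encoding pays off when many adjacent columns share a type, while delta encoding keeps the numbers small when only a few columns of a wide table are present.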

11 Write Pipeline. Asynchronous but guaranteed delivery. Operates in bulk, row- or size-bounded. Highly configurable. Utilizes cached region locations. Server component modeled after Java’s NIO: attach handlers for different RDBMS features. Handles retries, failure, and SQL semantics: Wrong Region, Region Too Busy, Primary Key Violation, Unique Constraint Violation.
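The retry semantics can be sketched as a loop that treats each failure kind differently, following the client slide's description (re-bucket on a wrong region, delay and back off when a region is too busy). Everything named here is hypothetical: the exception classes, the `send` and `relocate` callables, and the doubling delay are illustrative assumptions, not the product's API.

```python
import time


class WrongRegion(Exception):
    """The batch was sent to a region that no longer owns these rows."""


class RegionTooBusy(Exception):
    """The region rejected the batch because it is overloaded."""


def write_with_retry(send, relocate, batch, max_attempts=5,
                     base_delay=0.01, sleep=time.sleep):
    """Send a batch, re-bucketing or backing off depending on the failure."""
    delay = base_delay
    region = relocate(batch)
    for _ in range(max_attempts):
        try:
            return send(region, batch)
        except WrongRegion:
            region = relocate(batch)  # refresh the cached location, retry now
        except RegionTooBusy:
            sleep(delay)              # wait, then retry with a longer delay
            delay *= 2
    raise RuntimeError(f"write failed after {max_attempts} attempts")
```

Constraint failures such as a primary key violation are deliberately not caught in this sketch: retrying cannot fix them, so they propagate to the caller.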

12 Write Pipeline Base Element. Rows are encoded into custom KVPairs; all rows for a family and column are grouped together. Exploded into a Put only to write to HBase. Timestamps are added on the server side. Supports Snappy compression.

13 Write Pipeline Client. Tree-based buffer: Table -> Region -> N buffers (N is configurable). Rows are buffered on the client side in memory. When a buffer fills, the batch is written asynchronously to the Region. Handles HBase “difficulties” gracefully: on Wrong Region, re-bucket; on Too Busy, add a delay and possibly back off; etc.
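The client-side buffering can be sketched as follows for a single table with one buffer per region. The class name `BufferedWriter`, the `send_batch` callable, and the use of sorted region start keys as the cached region locations are assumptions for illustration; a real client would also bound buffers by size and handle failed batches.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


class BufferedWriter:
    """Buffer rows per region and hand full batches to a background pool."""

    def __init__(self, region_start_keys, send_batch, max_rows=2, workers=2):
        self._starts = sorted(region_start_keys)  # cached region boundaries
        self._send_batch = send_batch
        self._max_rows = max_rows
        self._buffers = {start: [] for start in self._starts}
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = []

    def _region_for(self, row_key):
        # The owning region is the one with the greatest start key <= row_key.
        i = bisect.bisect_right(self._starts, row_key) - 1
        return self._starts[max(i, 0)]

    def write(self, row_key, value):
        region = self._region_for(row_key)
        self._buffers[region].append((row_key, value))
        if len(self._buffers[region]) >= self._max_rows:
            self._flush(region)

    def _flush(self, region):
        batch, self._buffers[region] = self._buffers[region], []
        if batch:
            # submit() returns immediately, so write() never blocks on I/O.
            future = self._pool.submit(self._send_batch, region, batch)
            self._pending.append(future)

    def close(self):
        """Flush the remaining rows and wait for every batch to finish."""
        for region in self._starts:
            self._flush(region)
        for future in self._pending:
            future.result()
        self._pool.shutdown()
```

Waiting on every future in `close()` is what turns asynchronous writes into guaranteed delivery in this sketch: an exception raised by any batch surfaces there instead of being lost.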

14 Write Pipeline Server Side. Coprocessor based. Limited number of concurrent writes to a server: excess write requests are rejected, which prevents IPC thread starvation. SQL-based handlers for parallel writes: indexes, primary key constraints, unique constraints. Writes occur in a single WALEdit on each region.
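Rejecting excess writes instead of queueing them can be sketched with a non-blocking semaphore. This is a generic illustration of the admission-control idea, not the coprocessor code; `WriteGate` and `WriteRejected` are hypothetical names.

```python
import threading


class WriteRejected(Exception):
    """Raised when the server is already handling its maximum number of writes."""


class WriteGate:
    """Admit at most `max_concurrent` writes; reject the rest immediately."""

    def __init__(self, max_concurrent: int):
        self._permits = threading.BoundedSemaphore(max_concurrent)

    def run(self, write):
        # A non-blocking acquire means a request thread is never parked
        # waiting for a permit, so handler threads stay available.
        if not self._permits.acquire(blocking=False):
            raise WriteRejected("too many concurrent writes")
        try:
            return write()
        finally:
            self._permits.release()
```

A rejected request returns quickly, which pairs with the client's handling of a too-busy region: the client delays and retries rather than tying up a server thread.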

15 Interests. Other items we have done or are interested in: burstable tries implementation of Memstore; pluggable cost-based genetic algorithm for Assignment Manager; columnar representations and in-memory processing; concurrent Bloom filter (i.e. thread-safe BitSet). We are hiring: careers@splicemachine.com. Just completed a $15M Series B raise.

