Cloudera Kudu Introduction
Zbigniew Baranowski
Based on: http://slideshare.net/cloudera/kudu-new-hadoop-storage-for-fast-analytics-on-fast-data
What is KUDU?
- A new storage engine for structured data (tables) – it does not use HDFS!
- Columnar store
- Mutable (insert, update, delete, scan)
- Written in C++
- Apache-licensed – open source
- Currently in beta
Hadoop and KUDU
Yet another engine to store data?
HDFS excels at:
- scanning large amounts of data at speed
- accumulating data with high throughput
HBase (on HDFS) excels at:
- fast random lookups and writes by key
- making data mutable
KUDU tries to fill the gap
Addressing the evolution of hardware
- Hard drives -> solid-state disks: random scans in milliseconds
- Analytics is no longer IO-bound but CPU-bound
- RAM is getting cheaper – it can be used at greater scale
Table-oriented storage
- A table has a Hive/RDBMS-like schema:
  - a primary key (one or many columns), NO secondary indexes
  - a finite number of columns, each with a name and a type
- Tables are horizontally partitioned (by range or hash) into tablets
- Tablets typically have 3 or 5 replicas
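As an illustration, here is a minimal sketch of defining such a schema and partitioning through the client API (written in Scala against the Java client). The package and method names are taken from the current org.apache.kudu client (the beta shipped them under org.kududb), and the table name metrics_simple is hypothetical:

import java.util.Arrays
import org.apache.kudu.{ColumnSchema, Schema, Type}
import org.apache.kudu.client.{CreateTableOptions, KuduClient}

// Every column has a name and a type; "host" is the (single-column) primary key
val columns = Arrays.asList(
  new ColumnSchema.ColumnSchemaBuilder("host", Type.STRING).key(true).build(),
  new ColumnSchema.ColumnSchemaBuilder("value", Type.DOUBLE).build())
val schema = new Schema(columns)

// Horizontal partitioning: hash the key into 16 tablets, 3 replicas each
val options = new CreateTableOptions()
  .addHashPartitions(Arrays.asList("host"), 16)
  .setNumReplicas(3)

val client = new KuduClient.KuduClientBuilder("quickstart.cloudera:7051").build()
client.createTable("metrics_simple", schema, options)
client.shutdown()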
Table access and manipulations
- Operations on tables (NoSQL style): insert, update, delete, scan
- Java and C++ APIs (see the sketch below)
- Integrated with Impala, MapReduce, and Spark; more are coming
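For instance, a hedged sketch of a single-row insert through the Java API (from Scala), against the metrics table defined on the next slide; as before, the org.apache.kudu.client package name is the current one and is an assumption for the beta:

import org.apache.kudu.client._

val client = new KuduClient.KuduClientBuilder("quickstart.cloudera:7051").build()
val table = client.openTable("metrics")
val session = client.newSession()

val insert = table.newInsert()
val row = insert.getRow // all key columns plus the value must be filled in
row.addString("host", "myhost")
row.addString("metric", "temp1")
row.addInt("timestamp", 1449005452)
row.addDouble("value", 0.5343242)
session.apply(insert)

session.close()
client.shutdown()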
KUDU table with Impala
Source: https://github.com/cloudera/kudu-examples/tree/master/java/collectl

CREATE TABLE `metrics` (
  `host` STRING,
  `metric` STRING,
  `timestamp` INT,
  `value` DOUBLE
)
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'metrics',
  'kudu.master_addresses' = 'quickstart.cloudera:7051',
  'kudu.key_columns' = 'host, metric, timestamp'
);

insert into metrics values ('myhost', 'temp1', 1449005452, 0.5343242);
select count(distinct metric) from metrics;
delete from metrics where host='myhost' and metric='temp1' ...;
KUDU with Spark

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.NullWritable
import org.kududb.client._
import org.kududb.mapreduce._

val conf = new Configuration
conf.set("kudu.mapreduce.master.address", "quickstart.cloudera")
conf.set("kudu.mapreduce.input.table", "metrics")
conf.set("kudu.mapreduce.column.projection", "host,metric,timestamp,value")

val kuduRdd = sc.newAPIHadoopRDD(conf, classOf[KuduTableInputFormat],
  classOf[NullWritable], classOf[RowResult])

// Print the first five values
kuduRdd.values.map(r => r.rowToString()).take(5).foreach(x => print(x + "\n"))
Data consistency
Reading:
- snapshot consistency
- point-in-time queries (based on a provided timestamp)
Writing:
- single-row mutations are done atomically across all columns
- no multi-row transactions
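A sketch of a point-in-time read with the Java client's scanner; READ_AT_SNAPSHOT and snapshotTimestampMicros are the names in the current org.apache.kudu.client API, and the timestamp value is illustrative:

import org.apache.kudu.client._

val client = new KuduClient.KuduClientBuilder("quickstart.cloudera:7051").build()
val table = client.openTable("metrics")

// Snapshot consistency: the scan sees the table as of the given timestamp
// (microseconds since the epoch)
val scanner = client.newScannerBuilder(table)
  .readMode(AsyncKuduScanner.ReadMode.READ_AT_SNAPSHOT)
  .snapshotTimestampMicros(1449005452000000L)
  .build()

while (scanner.hasMoreRows) {
  val batch = scanner.nextRows()
  while (batch.hasNext) println(batch.next().rowToString())
}
client.shutdown()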
Architecture overview
Master server (single, fast failover supported):
- keeps the metadata replicated
- catalog (table definitions) stored in a KUDU table (cached)
- coordination: a full view of the cluster
- tablets directory (tablet locations), kept in memory
Tablet servers (worker nodes):
- store/serve tablets on local disks (no HDFS)
- track the status of tablet replicas (followers)
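As a small illustration of the master's catalog role, a sketch (same client assumptions as above) that asks the master for the list of tables it knows about:

import org.apache.kudu.client._

// The client contacts the (single) master, which serves the catalog
val client = new KuduClient.KuduClientBuilder("quickstart.cloudera:7051").build()
val tables = client.getTablesList.getTablesList
for (i <- 0 until tables.size) println(tables.get(i))
client.shutdown()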
Tables and tablets [diagram: how table metadata and data map onto the master and the tablet servers]
When to use?
- when sequential and random data access are required simultaneously
- when a simplified data-ingestion path is needed
- when updates to the data are required
Examples:
- time series
- streaming data that is immediately available for queries
- online reporting
Typical low-latency ingestion flow
Simplified ingestion flow with KUDU
Benchmarking by Cloudera
Benchmarking by a customer (Xiaomi)
KUDU under the hood
Data replication – Raft consensus
[diagram] The client asks the master for the tablet locations, then sends the write to the tablet server hosting the leader replica of tablet 1 (server X). The leader writes to its WAL and forwards the write to the follower replicas on tablet servers Y and Z; each follower writes its own WAL and confirms. Once a majority has confirmed, the leader acknowledges success to the client.
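A toy sketch (not Kudu code) of the majority rule behind this flow: the leader reports success once more than half of the replicas, itself included, have appended the write to their WALs:

// A write is committed once a majority of the tablet's replicas have logged it
def isCommitted(acks: Int, replicas: Int): Boolean = acks > replicas / 2

assert(isCommitted(acks = 2, replicas = 3))  // leader + 1 follower suffice
assert(!isCommitted(acks = 1, replicas = 3)) // the leader alone is not a majority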
Data insertion (without uniqueness check)
- An INSERT first goes to the MemRowSet: an in-memory B+tree whose leaves are sorted by primary key
- A full MemRowSet is flushed to a DiskRowSet (32 MB): a columnar store encoded similarly to Parquet, with rows sorted by PK
- Each DiskRowSet records its PK {min, max} and Bloom filters for PK ranges, stored in a cached B-tree
- An interval tree keeps track of the PK ranges within the DiskRowSets
- There might be thousands of sets per tablet
Column encoding within a DiskRowSet
- Each column is stored in 256 KB pages, each with page metadata
- A B-tree index locates the pages: for the PK it maps row keys to pages; for a standard column it maps row offsets to pages
- Pages are encoded with a variety of encodings, such as dictionary encoding, bitshuffle, or front coding
- Pages can be compressed: LZ4, gzip, or bzip2
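Encoding and compression can be chosen per column when it is defined. A sketch with the current Java client's enums (the exact names are assumptions; ZLIB is the closest client-side match to the gzip family mentioned above):

import org.apache.kudu.ColumnSchema.{ColumnSchemaBuilder, CompressionAlgorithm, Encoding}
import org.apache.kudu.Type

// Dictionary-encode this column's pages and compress them with LZ4
val metricCol = new ColumnSchemaBuilder("metric", Type.STRING)
  .encoding(Encoding.DICT_ENCODING)
  .compressionAlgorithm(CompressionAlgorithm.LZ4)
  .build()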
DiskRowSet compaction
- A periodic task
- Removes deleted rows
- Reduces the number of DiskRowSets with overlapping PK ranges
- Does not create bigger DiskRowSets: the 32 MB size of each DRS is preserved
- Example: DiskRowSet1 with PK range {A, G} and DiskRowSet2 with PK range {B, E} are compacted into DiskRowSet1 {A, D} and DiskRowSet2 {E, G}
Data updates
- The PK has to be provided for UPDATE and DELETE, e.g. set Col2=x where Col1=y
- The row is looked up in the MemRowSet and, using the DiskRowSets' PK ranges and Bloom filters, in the DiskRowSets
- If the row is in a DiskRowSet, its offset is retrieved and the mutation is written to the MemDeltaStore (a B+tree) as (row_offset, time): Col2=x
- MemDeltaStores are flushed to DeltaStores on disk
- Compactions merging deltas into the base data are done periodically
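A sketch of an update through the Java client, where the full primary key identifies the row (same org.apache.kudu.client assumptions as in the earlier sketches):

import org.apache.kudu.client._

val client = new KuduClient.KuduClientBuilder("quickstart.cloudera:7051").build()
val table = client.openTable("metrics")
val session = client.newSession()

// UPDATE: the whole primary key (host, metric, timestamp) must be provided
val update = table.newUpdate()
val row = update.getRow
row.addString("host", "myhost")
row.addString("metric", "temp1")
row.addInt("timestamp", 1449005452)
row.addDouble("value", 0.75) // the new value is recorded as a delta, not in place
session.apply(update)

session.close()
client.shutdown()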
Summary
KUDU is NOT:
- a SQL database, a filesystem, or an in-memory database
- a direct replacement for HBase or HDFS
KUDU is trying to be a compromise between:
- fast sequential scans
- fast random reads
It also simplifies the data-ingestion model.
Learn more
- Video: https://www.oreilly.com/ideas/kudu-resolving-transactional-and-analytic-trade-offs-in-hadoop
- Whitepaper: http://getkudu.io/kudu.pdf
- KUDU project: https://github.com/cloudera/kudu
- Get the Cloudera Quickstart VM and test it