Hypertable Doug Judd www.hypertable.org. Background  Zvents plan is to become the “Google” of local search  Identified the need for a scalable DB 

Slides:



Advertisements
Similar presentations
Tomcy Thankachan  Introduction  Data model  Building Blocks  Implementation  Refinements  Performance Evaluation  Real applications  Conclusion.
Advertisements

CS525: Special Topics in DBs Large-Scale Data Management HBase Spring 2013 WPI, Mohamed Eltabakh 1.
Big Table Alon pluda.
The Hadoop RDBMS Replace Oracle with Hadoop John Leach CTO and Co-Founder J.
Kafka high-throughput, persistent, multi-reader streams
SwatI Agarwal, Thomas Pan eBay Inc.
Bigtable: A Distributed Storage System for Structured Data Presenter: Guangdong Liu Jan 24 th, 2012.
Google File System 1Arun Sundaram – Operating Systems.
Lecture 6 – Google File System (GFS) CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation.
Lecture 7 – Bigtable CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation is licensed.
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
7/2/2015EECS 584, Fall Bigtable: A Distributed Storage System for Structured Data Jing Zhang Reference: Handling Large Datasets at Google: Current.
 Pouria Pirzadeh  3 rd year student in CS  PhD  Vandana Ayyalasomayajula  1 st year student in CS  Masters.
Distributed storage for structured data
BigTable CSE 490h, Autumn What is BigTable? z “A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by.
Google Distributed System and Hadoop Lakshmi Thyagarajan.
Gowtham Rajappan. HDFS – Hadoop Distributed File System modeled on Google GFS. Hadoop MapReduce – Similar to Google MapReduce Hbase – Similar to Google.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
Bigtable: A Distributed Storage System for Structured Data F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach M. Burrows, T. Chandra, A. Fikes, R.E.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
Hadoop File Formats and Data Ingestion
Introduction to Hadoop 趨勢科技研發實驗室. Copyright Trend Micro Inc. Outline Introduction to Hadoop project HDFS (Hadoop Distributed File System) overview.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
Goodbye rows and tables, hello documents and collections.
Introduction to Hadoop and HDFS
VLDB2012 Hoang Tam Vo #1, Sheng Wang #2, Divyakant Agrawal †3, Gang Chen §4, Beng Chin Ooi #5 #National University of Singapore, †University of California,
Bigtable: A Distributed Storage System for Structured Data Google’s NoSQL Solution 2013/4/1Title1 Chao Wang Fay Chang, Jeffrey Dean, Sanjay.
NoSQL Databases Oracle - Berkeley DB. Content A brief intro to NoSQL About Berkeley Db About our application.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.
Bigtable: A Distributed Storage System for Structured Data 1.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
Key/Value Stores CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook.
1 HBase Intro 王耀聰 陳威宇
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
Data storing and data access. Adding a row with Java API import org.apache.hadoop.hbase.* 1.Configuration creation Configuration config = HBaseConfiguration.create();
CS 347Lecture 9B1 CS 347: Parallel and Distributed Data Management Notes 13: BigTable, HBASE, Cassandra Hector Garcia-Molina.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface.
GFS : Google File System Ömer Faruk İnce Fatih University - Computer Engineering Cloud Computing
HBase Elke A. Rundensteiner Fall 2013
GFS. Google r Servers are a mix of commodity machines and machines specifically designed for Google m Not necessarily the fastest m Purchases are based.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition File System Implementation.
HADOOP DISTRIBUTED FILE SYSTEM HDFS Reliability Based on “The Hadoop Distributed File System” K. Shvachko et al., MSST 2010 Michael Tsitrin 26/05/13.
1 MSRBot Web Crawler Dennis Fetterly Microsoft Research Silicon Valley Lab © Microsoft Corporation.
CSC590 Selected Topics Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
 Introduction  Architecture NameNode, DataNodes, HDFS Client, CheckpointNode, BackupNode, Snapshots  File I/O Operations and Replica Management File.
HDB++: High Availability with
Cloudera Kudu Introduction
Bigtable : A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows,
Bigtable: A Distributed Storage System for Structured Data
Lock Tuning. Overview Data definition language (DDL) statements are considered harmful DDL is the language used to access and manipulate catalog or metadata.
Bigtable: A Distributed Storage System for Structured Data Google Inc. OSDI 2006.
Apache Accumulo CMSC 491 Hadoop-Based Distributed Computing Spring 2016 Adam Shook.
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Presenter: Chao-Han Tsai (Some slides adapted from the Google’s series lectures)
Bigtable A Distributed Storage System for Structured Data.
Google Cloud computing techniques (Lecture 03) 18th Jan 20161Dr.S.Sridhar, Director, RVCT, RVCE, Bangalore
JAFKA A fast MQ. overview ● ● 271KB single jar ● 3.5MB with all dependencies and configurations.
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
Bigtable A Distributed Storage System for Structured Data
HBase Mohamed Eltabakh
Bigtable: A Distributed Storage System for Structured Data
How did it start? • At Google • • • • Lots of semi structured data
Data Management in the Cloud
CSE-291 (Cloud Computing) Fall 2016
Google Filesystem Some slides taken from Alan Sussman.
آزمايشگاه سيستمهای هوشمند علی کمالی زمستان 95
THE GOOGLE FILE SYSTEM.
by Mikael Bjerga & Arne Lange
Presentation transcript:

Hypertable Doug Judd

Background  Zvents plan is to become the “Google” of local search  Identified the need for a scalable DB  No solutions existed  Bigtable was the logical choice  Project started February 2007

Zvents Deployment  Traffic Reports  Change Log  Writing 1 Billion cells/day

Baidu Deployment  Log processing/viewing app injecting approximately 500GB of data per day  120-node cluster running Hypertable and HDFS  16GB RAM  4x dual core Xeon  8TB storage  Developed in-house fork with modifications for scale  Working on a new crawl DB to store up to 1 petabyte of crawl data

hypertable.org Hypertable  What is it?  Open source Bigtable clone  Manages massive sparse tables with timestamped cell versions  Single primary key index  What is it not?  No joins  No secondary indexes (not yet)  No transactions (not yet)

hypertable.org Scaling (part I)

hypertable.org Scaling (part II)

hypertable.org Scaling (part III)

System Overview

hypertable.org Table: Visual Representation

hypertable.org Table: Actual Representation

hypertable.org Anatomy of a Key  MVCC - snapshot isolation  Bigtable uses copy-on-write  Timestamp and revision shared by default  Simple byte-wise comparison

hypertable.org Range Server  Manages ranges of table data  Caches updates in memory (CellCache)  Periodically spills (compacts) cached updates to disk (CellStore)

Range Server: CellStore  Sequence of 65K blocks of compressed key/value pairs

hypertable.org Compression  CellStore and CommitLog Blocks  Supported Compression Schemes  zlib --best  zlib --fast  lzo  quicklz  bmz  none

hypertable.org Performance Optimizations  Block Cache  Caches CellStore blocks  Blocks are cached uncompressed  Bloom Filter  Avoids unnecessary disk access  Filter by rows or rows+columns  Configurable false positive rate  Access Groups  Physically store co-accessed columns together  Improves performance by minimizing I/O

Commit Log  One per RangeServer  Updates destined for many Ranges  One commit log write  One commit log sync  Log is directory  100MB fragment files  Append by creating a new fragment file  NO_LOG_SYNC option  Group commit (TBD)

Request Throttling  RangeServer tracks memory usage  Config properties  Hypertable.RangeServer.MemoryLimit  Hypertable.RangeServer.MemoryLimit.Percentage (70%)  Request queue is paused when memory usage hits threshold  Heap fragmentation  tcmalloc - good  glibc - not so good

hypertable.org C++ vs. Java  Hypertable is CPU intensive  Manages large in-memory key/value map  Lots of key manipulation and comparisons  Alternate compression codecs (e.g. BMZ)  Hypertable is memory intensive  GC less efficient than explicitly managed memory  Less memory means more merging compactions  Inefficient memory usage = poor cache performance

Language Bindings  Primary API is C++  Thrift Broker provides bindings for:  Java  Python  PHP  Ruby  And more ( Perl, Erlang, Haskell, C#, Cocoa, Smalltalk, and Ocaml )

Client API class Client { void create_table(const String &name, const String &schema); Table *open_table(const String &name); void alter_table(const String &name, const String &schema); String get_schema(const String &name); void get_tables(vector &tables); void drop_table(const String &name, bool if_exists); };

hypertable.org Client API (cont.) class Table { TableMutator *create_mutator(); TableScanner *create_scanner(ScanSpec &scan_spec); }; class TableMutator { void set(KeySpec &key, const void *value, int value_len); void set_delete(KeySpec &key); void flush(); }; class TableScanner { bool next(CellT &cell); };

hypertable.org Client API (cont.) class ScanSpecBuilder { void set_row_limit(int n); void set_max_versions(int n); void add_column(const String &name); void add_row(const String &row_key); void add_row_interval(const String &start, bool sinc, const String &end, bool einc); void add_cell(const String &row, const String &column); void add_cell_interval(…) void set_time_interval(int64_t start, int64_t end); void clear(); ScanSpec &get(); }

Testing: Failure Inducer  Command line argument --induce-failure= : :  Command line argument --induce-failure= : :  Class definition class FailureInducer { public: void parse_option(String option); void maybe_fail(const String &label); };  In the code if (failure_inducer) failure_inducer->maybe_fail("split-1");

hypertable.org 1TB Load Test  1TB data  8 node cluster  GHz dual-core Opteron  4 GB RAM  3 x 7200 RPM 250MB SATA drives  Key size = 10 bytes  Value size = 20KB (compressible text)  Replication factor: 3  4 simultaneous insert clients  ~ 50 MB/s load (sustained)  ~ 30 MB/s scan

Performance Test (random read/write)  Single machine  1 x 1.8 GHz dual-core Opteron  4 GB RAM  Local Filesystem  250MB / 1KB values  Normal Table / lzo compression Batched writes 31K inserts/s (31MB/s) Non-batched writes (serial) 500 inserts/s (500KB/s) Random reads (serial) 5800 queries/s (5.8MB/s)

hypertable.org Project Status  Current release is “alpha”  Waiting for Hadoop 0.21 (fsync)  TODO for “beta”  Namespaces  Master directed RangeServer recovery  Range balancing

hypertable.org Questions? 