Databases Architectures & Hypertable

Doug Judd, CEO, Hypertable, Inc.

Database Terminology

Structured, Semi-Structured, and Unstructured Data
Structured data is what an RDBMS stores: data is broken into discrete components, and each component has an associated type such as integer, floating point, date, or string. Unstructured data is free-form text. Semi-structured data is a combination of structured and unstructured data.

Document-Oriented
Stores semi-structured documents. Accepts documents in a format such as JSON, XML, or YAML. Often schema-less. Auto-indexes fields. Examples: CouchDB, MongoDB. Best fit: XML or web documents.
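To make "schema-less" and "auto-index fields" concrete, here is a minimal sketch of a toy document store. The `DocumentStore` class and its methods are hypothetical and are not the API of CouchDB or MongoDB; the sketch assumes JSON input and indexes only top-level scalar fields.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Hypothetical toy document store: no schema, every top-level scalar field indexed."""

    def __init__(self):
        self.docs = {}                 # doc id -> parsed document
        self.index = defaultdict(set)  # (field, value) -> set of doc ids
        self.next_id = 0

    def insert(self, text):
        doc = json.loads(text)         # documents need not share any fields
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = doc
        for field, value in doc.items():
            # Only hashable scalar values are indexed in this sketch.
            if value is None or isinstance(value, (str, int, float, bool)):
                self.index[(field, value)].add(doc_id)
        return doc_id

    def find(self, field, value):
        """Look up documents by field value using the automatic index."""
        return [self.docs[i] for i in sorted(self.index.get((field, value), ()))]


store = DocumentStore()
store.insert('{"title": "Bigtable", "year": 2006}')
store.insert('{"title": "Dynamo", "tags": ["kv"]}')
```

The two inserted documents have different fields, yet both can be queried by `title` without declaring a schema first.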

Graph Databases
Databases designed to represent graphs. They provide APIs for performing graph operations: traversal (depth-first, breadth-first), shortest/cheapest path, and partitioning. Some allow hypergraphs. Examples: Neo4j, HyperGraphDB, InfoGrid, AllegroGraph, Sones, DEX, FlockDB, OrientDB, VertexDB, InfiniteGraph, Filament. More info: sones graphdb landscape.
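As an illustration of the "shortest/cheapest path" operation, here is a sketch of Dijkstra's algorithm over a plain dict-of-dicts graph. It is a generic textbook version, not the API of any of the products listed; it assumes non-negative edge costs.

```python
import heapq


def cheapest_path(graph, start, goal):
    """Dijkstra's algorithm; graph maps node -> {neighbor: non-negative cost}.

    Returns (total_cost, path), or (float("inf"), []) if goal is unreachable.
    """
    frontier = [(0, start, [start])]  # (cost so far, node, path taken)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(frontier, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
```

Here `cheapest_path(graph, "A", "D")` follows A, B, C, D for a total cost of 3, beating both the direct A, C, D route (5) and A, B, D (6).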

Column-Oriented
Data is physically stored by column, whereas an RDBMS is typically row-oriented. This improves performance for column operations and gives better data compression, because values within a column tend to be similar. Examples: Hypertable, HBase, Cassandra, Vertica.
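The sketch below lays out the same four records both ways. It is a simplified, in-memory illustration, not how any of the listed systems encode data on disk. The column layout lets an aggregate read one list, and run-length encoding (one simple compression scheme, chosen here for illustration) shrinks a column whose adjacent values repeat.

```python
from itertools import groupby

rows = [
    ("alice", "US", 31),
    ("bob", "US", 27),
    ("carol", "US", 45),
    ("dave", "DE", 38),
]

# Row-oriented: the values of one record are stored next to each other.
row_store = [value for record in rows for value in record]

# Column-oriented: all values of one column are stored next to each other.
column_store = {
    "name": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "age": [r[2] for r in rows],
}


def run_length_encode(values):
    """Compress runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A column operation touches a single contiguous list.
average_age = sum(column_store["age"]) / len(column_store["age"])

# Repeated neighbors in a column compress well.
encoded_country = run_length_encode(column_store["country"])
```

`encoded_country` comes out as `[("US", 3), ("DE", 1)]`; the interleaved `row_store` has no such runs to exploit.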

In-Memory
The data set is stored in RAM, which gives extremely fast access but limited capacity. Examples: Memcached, Redis, MonetDB, VoltDB.

Horizontal Scalability
Scale out: increase capacity by adding machines, typically commodity hardware. This is the opposite of vertical scalability (scale up).

Distributed Hash Table (DHT)
Horizontally scalable, decentralized, and fast to access, with a restricted API: GET, SET, DELETE. Peer-to-peer file sharing systems: BitTorrent, Napster, Gnutella, Freenet. Examples: Dynamo, Cassandra, Riak, Project Voldemort, SimpleDB, S3, Redis, Scalaris, Membase.
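The restricted GET/SET/DELETE API can be sketched as a hypothetical `SimpleDHT` class in which each dict stands in for one machine. For brevity, it places keys with hash-modulo-N. Real DHTs usually use consistent hashing instead, because modulo placement remaps most keys whenever a node is added.

```python
import hashlib


class SimpleDHT:
    """Hypothetical toy DHT: keys spread across N nodes, only GET/SET/DELETE."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]  # one dict per "machine"

    def _node_for(self, key):
        # Use a stable hash; Python's built-in hash() of str changes between runs.
        digest = hashlib.md5(key.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def set(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

    def delete(self, key):
        self._node_for(key).pop(key, None)


dht = SimpleDHT(4)
dht.set("cart:42", ["book"])
```

Every operation touches exactly one node, which is what keeps access fast. The same design explains what is missing: there is no range scan or secondary lookup, because related keys end up on unrelated nodes.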

Scalable Database Architectures

Auto-Sharding
Splits table data into horizontal “shards”. The shards are managed by a traditional RDBMS (e.g. MySQL, Postgres), and automated “glue” code handles sharding and request routing. Examples: MongoDB, AsterData, Greenplum.
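The request-routing part of the "glue" code can be as small as a sorted list of split points. This is a sketch assuming range-based sharding on a row key. `ShardRouter` and the shard names `mysql-1` through `mysql-3` are hypothetical.

```python
import bisect


class ShardRouter:
    """Hypothetical router: maps a row key to the shard whose key range contains it.

    split_points[i] is the first key owned by shards[i + 1]; shards[0] owns
    every key below split_points[0].
    """

    def __init__(self, split_points, shards):
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shards = shards

    def shard_for(self, key):
        # bisect_right counts how many split points are <= key.
        return self.shards[bisect.bisect_right(self.split_points, key)]


router = ShardRouter(["g", "p"], ["mysql-1", "mysql-2", "mysql-3"])
```

With these split points, `"apple"` routes to `mysql-1`, `"g"` routes to `mysql-2`, and `"zebra"` routes to `mysql-3`. Splitting a hot shard amounts to inserting a new split point and moving the affected rows.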

MongoDB

Dynamo
Developed by Amazon.com for its shopping cart. Designed for high write availability. An eventually consistent DHT. Implementations: Cassandra, Project Voldemort, Riak, Dynomite.

Eventual Consistency
Describes database update semantics in a distributed system with data replication. Strong consistency: after an update completes, all processes see the updated value. Eventual consistency: all processes will eventually see the updated value, but may read a stale value in the meantime. The best-known eventually consistent system is DNS.
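The difference between the two semantics shows up in a small simulation. This is a sketch under simplifying assumptions: a single coordinator assigns increasing version numbers, and replicas keep the highest-versioned write ("last writer wins"). Dynamo itself uses richer mechanisms such as vector clocks. All class names are hypothetical.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0  # version of the last write this replica applied

    def apply(self, value, version):
        # Last writer wins: ignore writes older than what we already have.
        if version > self.version:
            self.value, self.version = value, version


class EventuallyConsistentStore:
    def __init__(self, num_replicas):
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.clock = 0
        self.pending = []  # (replica index, value, version) not yet delivered

    def write(self, value):
        """Acknowledge after updating replica 0; queue the update for the rest."""
        self.clock += 1
        self.replicas[0].apply(value, self.clock)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, value, self.clock))

    def read(self, replica_index):
        return self.replicas[replica_index].value

    def propagate(self):
        """Deliver queued updates, as background replication eventually would."""
        for i, value, version in self.pending:
            self.replicas[i].apply(value, version)
        self.pending.clear()


store = EventuallyConsistentStore(3)
store.write("v1")
```

Right after the write, reading replica 0 returns `"v1"` while replica 2 still returns `None`. That stale read is allowed under eventual consistency but not under strong consistency. Once `propagate()` runs, every replica returns `"v1"`.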

Eventual Consistency

Consistent Hashing
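Consistent hashing places both nodes and keys on a hash ring. A key belongs to the first node found clockwise from the key's position. Here is a minimal sketch. The use of MD5 and 100 virtual nodes per physical node are arbitrary choices for illustration.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a position on a 64-bit hash ring."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node smooth the load
        self.ring = []        # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def node_for(self, key):
        # First virtual node at or after the key's position, wrapping past the end.
        index = bisect.bisect_left(self.ring, (ring_position(key), ""))
        return self.ring[index % len(self.ring)][1]


ring = ConsistentHashRing(["a", "b", "c"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When node `d` joins, only the keys that now fall just before one of `d`'s positions move, and they all move to `d`. That is roughly a quarter of them. Under hash-modulo-N placement, most keys would change nodes.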

Amazon AWS S3 SimpleDB RDS Online storage web service
Designed for larger amounts of data Cost $0.15/GB per month SimpleDB Designed for smaller amounts of data Provides indexing and richer query capability Cost $027/GB per month + machine utilization fee RDS Managed MySQL instances

Order Preserving Partitioner (Cassandra)
… + … / 2 = …

Order Preserving Partitioner Balance Problem
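The balance problem follows from keeping keys in sorted order across nodes: keys that share a prefix, such as time-ordered log keys, all fall into the same node's range. The sketch below compares that with hashing the key first. Cassandra's alternative, RandomPartitioner, hashes keys with MD5. The node tokens, the key format, and the hash-modulo assignment are simplified, hypothetical choices.

```python
import hashlib
from collections import Counter

# Hypothetical tokens: node i owns keys in (NODE_TOKENS[i - 1], NODE_TOKENS[i]].
NODE_TOKENS = ["f", "m", "t", "~"]


def order_preserving_node(key):
    """Assign a key to the first node whose token is >= the key."""
    for i, token in enumerate(NODE_TOKENS):
        if key <= token:
            return i
    return 0  # wrap around the ring


def hashed_node(key):
    """Hash the key first, so adjacent keys scatter across nodes."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % len(NODE_TOKENS)


# Time-ordered keys that all share the prefix "log-".
keys = [f"log-2010-06-{day:02d}-{n:04d}" for day in range(1, 31) for n in range(100)]

opp_load = Counter(order_preserving_node(k) for k in keys)
hashed_load = Counter(hashed_node(k) for k in keys)
```

Every one of the 3000 keys lands on node 1 under order-preserving placement, while hashing spreads them over all four nodes. The price of hashing is that range scans over sorted keys are no longer possible.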

Bigtable: the infrastructure that Google is built on
Bigtable underpins 100+ Google services, including YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database… Implementations: Hypertable, HBase.

Google Stack
GFS: replicates data across machines.
MapReduce: efficiently processes data in GFS.
Bigtable: indexed table structure.
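The MapReduce programming model can be shown with the classic word count, run here in a single process. This is a sketch of the model only. The real system distributes map and reduce tasks across many machines, close to the GFS data. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit (word, 1) for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = map_reduce(["Bigtable stores data", "GFS stores data"])
```

Because each map call sees one document and each reduce call sees one key, both phases can run in parallel on independent machines.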

Google File System

System Overview

Data Model
Sparse, two-dimensional table with cell versions. Cells are identified by a 4-part key: row (string), column family (byte), column qualifier (string), and timestamp (long integer).
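The 4-part key can be modeled as one sorted map. This is a minimal in-memory sketch, not Hypertable's actual storage format. It represents the column family as a small integer to match the "byte" type, and it negates timestamps so that newer versions of a cell sort first (Bigtable stores versions in decreasing timestamp order). Only cells that exist are stored, which is what makes the table sparse. `SparseTable` is a hypothetical name.

```python
import bisect


class SparseTable:
    """Hypothetical sorted map: (row, family, qualifier, timestamp) -> value."""

    def __init__(self):
        self.keys = []    # sorted cell keys; only non-empty cells are stored
        self.values = {}  # cell key -> value

    @staticmethod
    def _key(row, family, qualifier, timestamp):
        # Negating the timestamp sorts newer versions of a cell first.
        return (row, family, qualifier, -timestamp)

    def put(self, row, family, qualifier, timestamp, value):
        key = self._key(row, family, qualifier, timestamp)
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def get_latest(self, row, family, qualifier):
        """Return the newest version of a cell, or None if the cell is empty."""
        start = bisect.bisect_left(self.keys, (row, family, qualifier, float("-inf")))
        if start < len(self.keys) and self.keys[start][:3] == (row, family, qualifier):
            return self.values[self.keys[start]]
        return None


table = SparseTable()
table.put("com.cnn.www", 1, "", 100, "<html>v1")
table.put("com.cnn.www", 1, "", 200, "<html>v2")
```

`get_latest("com.cnn.www", 1, "")` returns `"<html>v2"`, the version written at timestamp 200. Sorting by the full key also keeps all cells of a row adjacent, which makes row-range scans cheap.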

Table: Visual Representation

Table: Actual Representation

Scaling (part I)

Scaling (part II)

Scaling (part III)

Request Routing

Hypertable

Hypertable Overview
Massively scalable database modeled after Google’s Bigtable. High-performance implementation (C++). Thrift interface for all popular high-level languages: Java, Ruby, Python, PHP, etc. Open source (GPL license). Project started in March at Zvents.

Hypertable In Use Today

Hypertable vs. HBase

Hypertable vs. HBase
Test                                Hypertable advantage relative to HBase (%)
Random read, Zipfian, 80 GB         925
Random read, Zipfian, 20 GB         777
Random read, Zipfian, 2.5 GB        100
Random write, 10 KB values          51
Random write, 1 KB values           102
Random write, 100 byte values       427
Random write, 10 byte values        931
Sequential read, 10 KB values       1060
Sequential read, 1 KB values        68
Sequential read, 100 byte values    129
Scan, 10 KB values                  2
Scan, 1 KB values                   58
Scan, 100 byte values               75
Scan, 10 byte values                220

Annual EC2 Cost Savings, Assuming 200% Improvement
Extra-large reserved instances

Resources
Project site. Twitter: hypertable. Commercial support. Performance evaluation write-up: blog.hypertable.com/?p=14
