
Databases Architectures & Hypertable

1 Databases Architectures & Hypertable
Doug Judd CEO, Hypertable, Inc.

2 Database Terminology

3 Structured, Semi-Structured, and Unstructured Data
Structured data is what an RDBMS stores: data is broken into discrete components, with a type associated with each component (integer, floating point, date, string). Unstructured data is free-form text. Semi-structured data is a combination of structured and unstructured data.

4 Document-Oriented
Stores semi-structured documents, accepted in a format such as JSON, XML, or YAML. Often schema-less. Fields are automatically indexed. Examples: CouchDB, MongoDB. Best fit: XML or web documents.
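A minimal sketch of the two properties above (schema-less documents and automatic field indexing), assuming JSON input and an in-memory index over top-level scalar fields. `TinyDocumentStore` is a hypothetical name; CouchDB and MongoDB each have their own storage and indexing rules.

```python
import json
from collections import defaultdict


class TinyDocumentStore:
    """Hypothetical schema-less store that indexes every top-level scalar field."""

    def __init__(self):
        self.docs = {}                   # doc_id -> parsed document
        self.index = defaultdict(set)    # (field, value) -> ids of matching docs
        self.next_id = 1

    def insert(self, json_text):
        doc = json.loads(json_text)      # no schema check: any JSON object is accepted
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = doc
        for field, value in doc.items():
            # Only hashable scalars are indexed in this sketch; lists/objects are skipped.
            if isinstance(value, (str, int, float)) or value is None:
                self.index[(field, value)].add(doc_id)
        return doc_id

    def find(self, field, value):
        """Look up documents through the field index instead of scanning them all."""
        ids = self.index.get((field, value), set())
        return [self.docs[i] for i in sorted(ids)]


store = TinyDocumentStore()
store.insert('{"title": "Bigtable", "year": 2006}')
# The second document has an extra field; no schema change is needed.
store.insert('{"title": "Dynamo", "year": 2007, "tags": ["dht", "eventual"]}')
print(store.find("year", 2007))
```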

5 Graph Databases
Databases designed to represent graphs, with APIs for performing graph operations: traversal (depth-first, breadth-first), shortest/cheapest path, and partitioning. Some allow hypergraphs. Examples: Neo4j, HyperGraphDB, InfoGrid, AllegroGraph, Sones, DEX, FlockDB, OrientDB, VertexDB, InfiniteGraph, Filament. More info: sones graphdb landscape.
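To make the listed operations concrete, here is a small sketch over a hypothetical weighted adjacency list: breadth-first and depth-first traversal, plus Dijkstra's algorithm for the cheapest path. It illustrates the operations in general and is not the API of any of the products named above.

```python
import heapq
from collections import deque

# Hypothetical weighted, directed graph: node -> {neighbor: edge cost}.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}


def bfs(graph, start):
    """Breadth-first traversal: visit nodes in order of hop distance."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def dfs(graph, start, seen=None):
    """Depth-first traversal: follow one branch fully before backtracking."""
    if seen is None:
        seen = []
    seen.append(start)
    for nbr in graph[start]:
        if nbr not in seen:
            dfs(graph, nbr, seen)
    return seen


def cheapest_path(graph, src, dst):
    """Dijkstra's algorithm: path with the lowest total edge cost."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


print(bfs(GRAPH, "A"))                 # ['A', 'B', 'C', 'D']
print(dfs(GRAPH, "A"))                 # ['A', 'B', 'C', 'D']
print(cheapest_path(GRAPH, "A", "D"))  # (['A', 'B', 'C', 'D'], 4)
```

Note that the cheapest path (A, B, C, D with cost 4) is not the path with the fewest hops (A, C, D with cost 5), which is why "shortest" and "cheapest" are listed separately.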

6 Column-Oriented
Data is physically stored by column, whereas an RDBMS is typically row-oriented. This improves performance for operations over a column and gives better data compression. Examples: Hypertable, HBase, Cassandra, Vertica.
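A rough sketch of why a column layout helps, using a hypothetical three-column table: a column aggregate reads one contiguous list, and because adjacent values in a column are of the same type and often repeat, simple run-length encoding (one compression scheme among many) finds runs that the interleaved row layout does not have.

```python
from itertools import groupby

# Hypothetical table rows: (user_id, country, status)
ROWS = [
    (1, "US", "active"),
    (2, "US", "active"),
    (3, "US", "inactive"),
    (4, "DE", "active"),
    (5, "DE", "active"),
]

# Row-oriented layout: all fields of a record are stored next to each other.
row_store = [value for row in ROWS for value in row]

# Column-oriented layout: all values of one column are stored next to each other.
column_store = {
    name: [row[i] for row in ROWS]
    for i, name in enumerate(["user_id", "country", "status"])
}


def run_length_encode(values):
    """Compress runs of identical adjacent values into (value, count) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


# A column operation reads a single list instead of every full record.
active_count = column_store["status"].count("active")

print(active_count)                                # 4
print(run_length_encode(column_store["country"]))  # [('US', 3), ('DE', 2)]
print(run_length_encode(column_store["status"]))   # [('active', 2), ('inactive', 1), ('active', 2)]
print(len(run_length_encode(row_store)))           # 15: no adjacent repeats to exploit
```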

7 In-Memory
The data set is stored in RAM, giving extremely fast access but limited capacity. Examples: Memcached, Redis, MonetDB, VoltDB.

8 Horizontal Scalability
Scale out: increase capacity by adding machines, typically commodity hardware. The opposite of vertical scalability (scale up).

9 Distributed Hash Table (DHT)
Horizontally scalable, decentralized, fast access. Restricted API: GET, SET, DELETE. Related peer-to-peer file-sharing systems: BitTorrent, Napster, Gnutella, Freenet. Examples: Dynamo, Cassandra, Riak, Project Voldemort, SimpleDB, S3, Redis, Scalaris, Membase.
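A single-process simulation of the restricted DHT interface: every operation hashes the key to find the one node that owns it, so there are no range scans or joins. For simplicity this sketch assigns keys with hash(key) mod N; the class and node names are hypothetical, and production DHTs usually use consistent hashing instead so that adding a node does not remap most keys.

```python
import hashlib


class Node:
    """One peer; holds only the keys that hash to it."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class SimpleDHT:
    """Hypothetical DHT simulation exposing only GET / SET / DELETE."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _owner(self, key):
        # Stable hash; Python's built-in hash() is randomized per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def set(self, key, value):
        self._owner(key).data[key] = value

    def get(self, key):
        return self._owner(key).data.get(key)

    def delete(self, key):
        self._owner(key).data.pop(key, None)


dht = SimpleDHT(["node-a", "node-b", "node-c"])
dht.set("cart:42", ["book", "lamp"])
print(dht.get("cart:42"))   # ['book', 'lamp']
dht.delete("cart:42")
print(dht.get("cart:42"))   # None
```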

10 Scalable Database Architectures

11 Auto-Sharding
Splits table data into horizontal “shards”. Shards are managed by a traditional RDBMS (e.g. MySQL, Postgres), and automated “glue” code handles sharding and request routing. Examples: MongoDB, AsterData, Greenplum.
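One possible shape for the “glue” code, sketched under assumptions: shards are split by key range, each shard is a plain dict standing in for a separate MySQL/Postgres instance, and a shard is split at its median key once it exceeds a hypothetical `max_rows` limit. Real auto-sharding systems differ in how they choose split points and move data.

```python
import bisect


class ShardRouter:
    """Hypothetical glue layer: routes keys to range shards and splits big shards."""

    def __init__(self, max_rows=4):
        self.max_rows = max_rows
        self.split_points = []   # sorted lower bounds of shards 1..n-1
        self.shards = [{}]       # shard i holds keys in [split_points[i-1], split_points[i])

    def _shard_index(self, key):
        # Number of split points <= key is the index of the owning shard.
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, row):
        i = self._shard_index(key)
        self.shards[i][key] = row
        if len(self.shards[i]) > self.max_rows:
            self._split(i)

    def lookup(self, key):
        return self.shards[self._shard_index(key)].get(key)

    def _split(self, i):
        keys = sorted(self.shards[i])
        mid = keys[len(keys) // 2]    # the new upper shard starts at the median key
        lower = {k: v for k, v in self.shards[i].items() if k < mid}
        upper = {k: v for k, v in self.shards[i].items() if k >= mid}
        self.shards[i:i + 1] = [lower, upper]
        self.split_points.insert(i, mid)


router = ShardRouter(max_rows=4)
for user_id in range(10):
    router.insert(user_id, {"name": f"user{user_id}"})
print(router.split_points)   # [2, 4, 6]
print(len(router.shards))    # 4
print(router.lookup(7))      # {'name': 'user7'}
```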

12 MongoDB

13 Dynamo
Developed by Amazon.com for its shopping cart and designed for high write availability. An eventually consistent DHT. Implementations: Cassandra, Project Voldemort, Riak, Dynomite.

14 Eventual Consistency
Database update semantics in a distributed system with data replication. Strong consistency: after an update completes, all processes see the updated value. Eventual consistency: all processes will eventually see the updated value. The best-known eventually consistent system is DNS.
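A toy simulation of the difference: a write is acknowledged by one replica and only reaches the others when queued updates are propagated, so a read from another replica can be stale in the meantime. The conflict rule (last writer wins by timestamp) and the single shared clock are simplifying assumptions of this sketch; real systems use other mechanisms, such as per-node clocks or version vectors.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}   # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Assumed conflict rule: last writer wins (highest timestamp kept).
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentCluster:
    """Writes hit one replica immediately and reach the others later."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.pending = []   # (key, value, ts) updates not yet replicated
        self.clock = 0      # single shared clock: a simplification

    def write(self, replica_name, key, value):
        self.clock += 1
        self.replicas[replica_name].write(key, value, self.clock)
        self.pending.append((key, value, self.clock))

    def read(self, replica_name, key):
        return self.replicas[replica_name].read(key)

    def propagate(self):
        """Deliver every queued update to every replica."""
        for key, value, ts in self.pending:
            for replica in self.replicas.values():
                replica.write(key, value, ts)
        self.pending.clear()


cluster = EventuallyConsistentCluster(["r1", "r2", "r3"])
cluster.write("r1", "www.example.com", "192.0.2.1")
print(cluster.read("r1", "www.example.com"))  # 192.0.2.1
print(cluster.read("r2", "www.example.com"))  # None (stale: not yet propagated)
cluster.propagate()
print(cluster.read("r2", "www.example.com"))  # 192.0.2.1
```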

15 Eventual Consistency

16 Consistent Hashing
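Consistent hashing places nodes and keys on the same hash ring, and each key belongs to the first node clockwise from its position. A minimal sketch of the general technique, assuming SHA-256 positions and a hypothetical 64 virtual points per node to even out the load. When a fourth node joins, only the keys that now fall in its slices move (roughly a quarter); with plain hash mod N, going from 3 to 4 nodes would move about three quarters of the keys.

```python
import bisect
import hashlib


def ring_position(value):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []    # sorted ring positions of all virtual nodes
        self._owners = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._points, pos)
            self._owners[pos] = node

    def remove_node(self, node):
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            self._points.remove(pos)
            del self._owners[pos]

    def node_for(self, key):
        # First virtual node clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self._points, ring_position(key)) % len(self._points)
        return self._owners[self._points[idx]]


keys = [f"user:{i}" for i in range(10_000)]
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # roughly a quarter
```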

17 Amazon AWS: S3, SimpleDB, RDS
S3: online storage web service designed for larger amounts of data; costs $0.15/GB per month. SimpleDB: designed for smaller amounts of data; provides indexing and richer query capability; costs $0.27/GB per month plus a machine utilization fee. RDS: managed MySQL instances.

18 Order Preserving Partitioner (Cassandra)
[Figure: new token chosen as the midpoint of two existing tokens, (A + B) / 2]

19 Order Preserving Partitioner Balance Problem
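A sketch of order-preserving placement and its balance problem, assuming each node's token is itself a key and a node owns the keys in (previous token, its token], wrapping around the ring. Because keys are not hashed, neighboring keys land on neighboring nodes (good for range scans), but keys sharing a prefix, such as timestamps, can all pile onto one node. Tokens, node names, and keys here are hypothetical.

```python
import bisect
from collections import Counter


class OrderPreservingRing:
    """Keys map to the first node token >= key; no hashing, so order is kept."""

    def __init__(self, tokens_to_nodes):
        self.tokens = sorted(tokens_to_nodes)
        self.nodes = tokens_to_nodes

    def node_for(self, key):
        # Keys greater than the last token wrap around to the first node.
        idx = bisect.bisect_left(self.tokens, key) % len(self.tokens)
        return self.nodes[self.tokens[idx]]


ring = OrderPreservingRing({"g": "node1", "n": "node2", "t": "node3"})

words = ["apple", "hypertable", "mongodb", "riak", "voldemort"]
print({w: ring.node_for(w) for w in words})
# {'apple': 'node1', 'hypertable': 'node2', 'mongodb': 'node2', 'riak': 'node3', 'voldemort': 'node1'}

# Balance problem: date-like keys all sort before "g", so one node gets everything.
timestamps = [f"2011-03-{day:02d}" for day in range(1, 31)]
print(Counter(ring.node_for(k) for k in timestamps))  # Counter({'node1': 30})
```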

20 Bigtable: the infrastructure that Google is built on
Bigtable underpins 100+ Google services, including YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database… Implementations: Hypertable, HBase. (Speaker note: describe the 360-degree panoramic view feature of Google Maps.)

21 Google Stack
GFS replicates data across machines. MapReduce efficiently processes data stored in GFS. Bigtable provides an indexed table structure.
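The MapReduce programming model can be illustrated with the classic word-count example. This single-process sketch shows only the map, shuffle, and reduce phases; it does not model the distribution of work across machines or reading input from GFS, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


docs = ["Bigtable stores data in GFS", "MapReduce reads data from GFS"]
counts = map_reduce(docs)
print(counts["gfs"])    # 2
print(counts["data"])   # 2
```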

22 Google File System

23 Google File System

24 System Overview

25 Data Model
Sparse, two-dimensional table with cell versions. Cells are identified by a 4-part key: row (string), column family (byte), column qualifier (string), and timestamp (long integer).
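A small model of the 4-part cell key and the sparse, versioned table, assuming cells sort by row, column family, and qualifier ascending, with newer timestamps first within a cell. `CellKey` and `SparseTable` are hypothetical names; the sketch keeps everything in a dict and is not how Hypertable or Bigtable store data on disk.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    row: str
    column_family: int      # family id; the slide describes it as a byte (0-255)
    column_qualifier: str
    timestamp: int          # e.g. microseconds since the epoch

    def sort_key(self):
        # Ascending row/family/qualifier; newest version of a cell first.
        return (self.row, self.column_family, self.column_qualifier, -self.timestamp)


class SparseTable:
    """Only written cells are stored, so empty columns cost nothing."""

    def __init__(self):
        self.cells = {}   # CellKey -> value

    def put(self, row, family, qualifier, timestamp, value):
        if not 0 <= family <= 255:
            raise ValueError("column family must fit in one byte")
        self.cells[CellKey(row, family, qualifier, timestamp)] = value

    def scan(self):
        """Yield (key, value) pairs in sorted key order."""
        for key in sorted(self.cells, key=CellKey.sort_key):
            yield key, self.cells[key]

    def latest(self, row, family, qualifier):
        """Return the newest version of one cell, or None if it was never written."""
        versions = [k for k in self.cells
                    if (k.row, k.column_family, k.column_qualifier) == (row, family, qualifier)]
        if not versions:
            return None
        return self.cells[max(versions, key=lambda k: k.timestamp)]


t = SparseTable()
t.put("com.example/index", 1, "title", 100, "Old title")
t.put("com.example/index", 1, "title", 200, "New title")
t.put("com.example/about", 2, "anchor:home", 150, "About")
print(t.latest("com.example/index", 1, "title"))   # New title
for key, value in t.scan():
    print(key.row, key.column_family, key.column_qualifier, key.timestamp, value)
# com.example/about 2 anchor:home 150 About
# com.example/index 1 title 200 New title
# com.example/index 1 title 100 Old title
```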

26 Table: Visual Representation

27 Table: Actual Representation

28 Scaling (part I)

29 Scaling (part II)

30 Scaling (part III)

31 Request Routing

32 Hypertable

33 Hypertable Overview
Massively scalable database modeled after Google’s Bigtable. High-performance implementation (C++). Thrift interface for all popular high-level languages: Java, Ruby, Python, PHP, etc. Open source (GPL license). Project started in March at Zvents.

34 Hypertable In Use Today

35 Hypertable vs. HBase

36 Hypertable vs. HBase
Test                              Hypertable advantage relative to HBase (%)
Random Read, Zipfian, 80 GB       925
Random Read, Zipfian, 20 GB       777
Random Read, Zipfian, 2.5 GB      100
Random Write, 10 KB values        51
Random Write, 1 KB values         102
Random Write, 100-byte values     427
Random Write, 10-byte values      931
Sequential Read, 10 KB values     1060
Sequential Read, 1 KB values      68
Sequential Read, 100-byte values  129
Scan, 10 KB values                2
Scan, 1 KB values                 58
Scan, 100-byte values             75
Scan, 10-byte values              220

37 Annual EC2 Cost Savings
Assuming a 200% improvement; extra-large reserved instances.
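The arithmetic behind a savings estimate like this can be sketched as follows, assuming "200% improvement" means 3x the throughput (the same reading as the advantage percentages in the table above), that the same workload then needs proportionally fewer instances, and that cost scales linearly with instance count. The instance count and effective hourly price below are hypothetical placeholders, not figures from the slides; reserved-instance pricing (upfront fee plus hourly rate) is folded into one effective hourly price.

```python
import math


def throughput_ratio(advantage_pct):
    """Convert an 'X% advantage' into a throughput multiplier (200% -> 3.0x)."""
    return 1 + advantage_pct / 100


def annual_savings(instances, hourly_price, advantage_pct, hours_per_year=8760):
    """Annual cost saved if a faster system needs proportionally fewer instances.

    Assumes the workload parallelizes evenly across instances.
    """
    needed = math.ceil(instances / throughput_ratio(advantage_pct))
    return (instances - needed) * hourly_price * hours_per_year


# Hypothetical inputs: 30 instances at an effective $1.00/hour.
print(throughput_ratio(200))           # 3.0
print(annual_savings(30, 1.00, 200))   # 175200.0
```

With a 200% improvement, roughly two thirds of the fleet (20 of 30 instances here) can be retired, whatever the actual per-instance price is.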

38 Resources
Project site
Twitter: hypertable
Commercial support
Performance evaluation write-up: blog.hypertable.com/?p=14

39 Q&A

