1
Database Architectures & Hypertable
Doug Judd, CEO, Hypertable, Inc.
2
Database Terminology
3
Structured, Semi-Structured, and Unstructured Data
Structured data is what an RDBMS stores
Data is broken into discrete components
Types associated with each component: integer, floating point, date, string
Unstructured data is free-form text
Semi-structured data is a combination of structured and unstructured
4
Document-Oriented
Semi-structured documents
Accepts documents in a format such as JSON, XML, YAML
Often schema-less
Auto-indexes fields
Examples: CouchDB, MongoDB
Best fit: XML or Web documents
5
Graph Databases
Database designed to represent graphs
APIs for performing graph operations:
  Traversal (depth-first, breadth-first)
  Shortest/cheapest path
  Partitioning
Some allow hypergraphs
Examples: Neo4j, HyperGraphDB, InfoGrid, AllegroGraph, Sones, DEX, FlockDB, OrientDB, VertexDB, InfiniteGraph, Filament
More info: sones graphdb landscape
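The traversal and shortest-path operations listed on this slide can be illustrated with a minimal sketch. It uses a plain Python dictionary as an adjacency list rather than any of the products named above; the graph contents and the function name `bfs_shortest_path` are made up for illustration.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return the path with the fewest edges from start to goal, or None."""
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical directed graph stored as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}
path = bfs_shortest_path(graph, "a", "e")
print(path)
```

Breadth-first traversal visits nodes in order of distance from the start, which is why the first time the goal is dequeued the recorded path is a shortest one in number of edges. A cheapest-path query over weighted edges would need a different algorithm, such as Dijkstra's.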
6
Column-Oriented
Data physically stored by column
RDBMS typically row-oriented
Improved performance for column operations
Better data compression
Examples: Hypertable, HBase, Cassandra, Vertica
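A small sketch may help show why a column layout favors column operations and compression. The sample records and the helper `run_length_encode` are illustrative assumptions, not taken from any of the systems listed.

```python
from itertools import groupby

# Hypothetical records: (name, country, age).
rows = [
    ("alice", "US", 30),
    ("bob", "US", 25),
    ("carol", "US", 41),
    ("dave", "DE", 37),
]

# Row-oriented layout: each record is stored contiguously.
row_store = list(rows)

# Column-oriented layout: all values of one column are stored contiguously.
names, countries, ages = (list(column) for column in zip(*rows))


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A column operation touches only the one column it needs.
average_age = sum(ages) / len(ages)

# Values in one column share a type and often repeat, so they compress well.
encoded_countries = run_length_encode(countries)

print(average_age)
print(encoded_countries)
```

In the row layout, computing the average age would read every full record; in the column layout it reads only the `ages` list, and the repeated country values collapse into two pairs.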
7
In-Memory
Data set stored in RAM
Extremely fast access
Limited capacity
Examples: Memcached, Redis, MonetDB, VoltDB
8
Horizontal Scalability
Scale out
Increase capacity by adding machines
Opposite of vertical scalability (scale up)
Commodity hardware
9
Distributed Hash Table (DHT)
Horizontally scalable
Decentralized
Fast access
Restricted API: GET, SET, DELETE
Peer-to-peer file sharing systems: BitTorrent, Napster, Gnutella, Freenet
Examples: Dynamo, Cassandra, Riak, Project Voldemort, SimpleDB, S3, Redis, Scalaris, Membase
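The restricted GET/SET/DELETE interface can be sketched as follows. This is a single-process toy, assuming each node is just an in-memory dictionary and that keys are placed by hashing modulo the node count; a real DHT spreads routing state across peers and is not organized around one coordinating object. The class name `SimpleDHT` and the node names are hypothetical.

```python
import hashlib


class SimpleDHT:
    """Toy hash table spread over several named nodes (in-memory dicts)."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}

    def _owner(self, key):
        # Hash the key and map it onto one of the nodes.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % len(self.node_names)
        return self.node_names[index]

    def set(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        return self.nodes[self._owner(key)].get(key)

    def delete(self, key):
        self.nodes[self._owner(key)].pop(key, None)


dht = SimpleDHT(["node-a", "node-b", "node-c"])
dht.set("user:42", "Doug")
print(dht.get("user:42"))
dht.delete("user:42")
print(dht.get("user:42"))
```

Because a key's location depends only on its hash, every operation goes straight to one node, which is what makes access fast. The same property is why the API is restricted: there is no cheap way to scan keys in order.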
10
Scalable Database Architectures
11
Auto-Sharding
Splits table data into horizontal “shards”
Shards managed by traditional RDBMS (e.g. MySQL, Postgres)
Automated “glue” code to handle sharding and request routing
Examples: MongoDB, AsterData, Greenplum
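The "glue" code for sharding and request routing might look roughly like the sketch below. It assumes range-based shards that split in half when they grow past a threshold, and it uses in-memory dictionaries where a real deployment would have RDBMS instances. The class `ShardRouter`, the threshold, and the key format are all illustrative assumptions.

```python
import bisect


class ShardRouter:
    """Routes keys to range-based shards and splits shards that grow too big."""

    def __init__(self, max_rows_per_shard=4):
        self.max_rows = max_rows_per_shard
        self.boundaries = []   # sorted split keys
        self.shards = [{}]     # shard i holds keys in [boundaries[i-1], boundaries[i])

    def _shard_index(self, key):
        return bisect.bisect_right(self.boundaries, key)

    def put(self, key, value):
        index = self._shard_index(key)
        self.shards[index][key] = value
        if len(self.shards[index]) > self.max_rows:
            self._split(index)

    def get(self, key):
        return self.shards[self._shard_index(key)].get(key)

    def _split(self, index):
        # Split at the median key so both halves are non-empty.
        keys = sorted(self.shards[index])
        middle = keys[len(keys) // 2]
        left = {k: v for k, v in self.shards[index].items() if k < middle}
        right = {k: v for k, v in self.shards[index].items() if k >= middle}
        self.shards[index:index + 1] = [left, right]
        self.boundaries.insert(index, middle)


router = ShardRouter(max_rows_per_shard=4)
for i in range(10):
    router.put(f"user{i:02d}", {"id": i})

print(router.boundaries)
print([len(shard) for shard in router.shards])
```

Callers only ever use `put` and `get`; the router decides which shard owns a key and reorganizes shards behind the scenes, which is the "auto" part of auto-sharding.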
12
MongoDB
13
Dynamo
Developed by Amazon.com for their shopping cart
Designed for high write availability
Eventually consistent DHT
Implementations: Cassandra, Project Voldemort, Riak, Dynomite
14
Eventual Consistency
Database update semantics in a distributed system with data replication
Strong consistency: after an update completes, all processes see the updated value
Eventual consistency: eventually all processes will see the updated value
The best-known eventually consistent system is DNS
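The difference between the two semantics can be sketched with a toy replicated value. The sketch assumes writes land on one replica first, are queued for the others, and are reconciled with a last-write-wins rule based on a simple counter; none of this is claimed to match how Dynamo or DNS actually propagate updates. The class and method names are hypothetical.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        # Last-write-wins: only accept updates newer than what we hold.
        if version > self.version:
            self.value, self.version = value, version


class EventuallyConsistentStore:
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []   # queued (replica_index, value, version) updates
        self.clock = 0

    def write(self, value):
        # The write completes once the first replica has it.
        self.clock += 1
        self.replicas[0].apply(value, self.clock)
        for index in range(1, len(self.replicas)):
            self.pending.append((index, value, self.clock))

    def read(self, replica_index):
        return self.replicas[replica_index].value

    def propagate(self):
        # Deliver all queued updates; afterwards every replica agrees.
        for index, value, version in self.pending:
            self.replicas[index].apply(value, version)
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("v1")
print(store.read(0), store.read(2))   # replica 2 is still stale here
store.propagate()
print(store.read(0), store.read(2))   # now both return the new value
```

Under strong consistency the stale read between `write` and `propagate` could not happen, because the write would not complete until every replica had applied it.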
15
Eventual Consistency
16
Consistent Hashing
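The slide's diagram is not reproduced in this transcript. As a rough stand-in, here is a minimal sketch of a consistent hash ring, assuming MD5-derived tokens and a fixed number of virtual nodes per machine. The class `HashRing`, the `vnodes` parameter, and the node names are illustrative choices, not details from the talk.

```python
import bisect
import hashlib


def _hash(value):
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Keys and nodes share one hash space; a key belongs to the next node token."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._tokens = []    # sorted token positions on the ring
        self._owners = {}    # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            token = _hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owners[token] = node

    def remove_node(self, node):
        self._tokens = [t for t in self._tokens if self._owners[t] != node]
        self._owners = {t: n for t, n in self._owners.items() if n != node}

    def lookup(self, key):
        # First token clockwise from the key's hash, wrapping around the ring.
        index = bisect.bisect_right(self._tokens, _hash(key)) % len(self._tokens)
        return self._owners[self._tokens[index]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}

ring.add_node("node-d")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved")
```

The point of the technique is visible in `moved`: adding a machine relocates only the keys that the new machine takes over, whereas hashing modulo the node count would reshuffle most keys.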
17
Amazon AWS
S3
  Online storage web service
  Designed for larger amounts of data
  Cost: $0.15/GB per month
SimpleDB
  Designed for smaller amounts of data
  Provides indexing and richer query capability
  Cost: $0.27/GB per month + machine utilization fee
RDS
  Managed MySQL instances
18
Order Preserving Partitioner (Cassandra)
… + … / 2 = …
19
Order Preserving Partitioner Balance Problem
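The balance problem can be illustrated with a small sketch. It assumes four nodes, hand-picked range tokens for the order-preserving case, and a workload whose keys share a common prefix; the node names, tokens, and keys are all made up, and the code does not use Cassandra's actual partitioner classes.

```python
import bisect
import hashlib
from collections import Counter

NODES = ["node0", "node1", "node2", "node3"]

# Order-preserving placement: each node owns one contiguous key range.
# node0 owns keys below "g", node1 owns ["g", "n"), and so on.
RANGE_TOKENS = ["g", "n", "t"]


def order_preserving_owner(key):
    return NODES[bisect.bisect_right(RANGE_TOKENS, key)]


def hashed_owner(key):
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest, "big") % len(NODES)]


# Skewed workload: every key starts with the same prefix.
keys = [f"com.example/page{i:04d}" for i in range(1000)]

range_load = Counter(order_preserving_owner(key) for key in keys)
hash_load = Counter(hashed_owner(key) for key in keys)

print("order-preserving:", dict(range_load))
print("hashed:          ", dict(hash_load))
```

Keeping keys in sorted order is what makes range scans possible, but it also means real-world key distributions land on whichever node owns that part of the key space. Hashing spreads the same keys across all nodes at the cost of losing key order.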
20
Bigtable: the infrastructure that Google is built on
Bigtable underpins 100+ Google services, including: YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database…
Implementations: Hypertable, HBase
(Speaker note: describe the 360 degree panoramic view feature of Google Maps)
21
Google Stack
GFS: replicates data inter-machine
MapReduce: efficiently processes data in GFS
Bigtable: indexed table structure
22
Google File System
23
Google File System
24
System Overview
25
Data Model
Sparse, two-dimensional table with cell versions
Cells are identified by a 4-part key:
  Row (string)
  Column Family (byte)
  Column Qualifier (string)
  Timestamp (long integer)
(Speaker note: spend some time)
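The 4-part key can be made concrete with a short sketch. It assumes cells are kept in a dictionary keyed by the four components and that a scan returns versions of the same cell newest first; the ordering rule, the class names, and the sample data are assumptions for illustration, not Hypertable's storage format.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    row: str
    column_family: int      # a small integer, standing in for the byte on the slide
    column_qualifier: str
    timestamp: int


class SparseTable:
    """Stores only the cells that exist; absent cells take no space."""

    def __init__(self):
        self.cells = {}     # CellKey -> value

    def put(self, row, family, qualifier, timestamp, value):
        self.cells[CellKey(row, family, qualifier, timestamp)] = value

    def get_latest(self, row, family, qualifier):
        versions = [
            (key.timestamp, value)
            for key, value in self.cells.items()
            if (key.row, key.column_family, key.column_qualifier)
            == (row, family, qualifier)
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self):
        # Sort by row, family, qualifier, then newest timestamp first (assumed order).
        return sorted(
            self.cells.items(),
            key=lambda item: (
                item[0].row,
                item[0].column_family,
                item[0].column_qualifier,
                -item[0].timestamp,
            ),
        )


table = SparseTable()
table.put("com.example", 1, "title", 100, "Old title")
table.put("com.example", 1, "title", 200, "New title")
table.put("org.sample", 2, "lang", 150, "en")

print(table.get_latest("com.example", 1, "title"))
for key, value in table.scan():
    print(key, value)
```

Writing a new value never overwrites an old one; it adds a cell with a later timestamp, which is how cell versions arise from the key structure.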
26
Table: Visual Representation
(Speaker note: spend some time.)
27
Table: Actual Representation
28
Scaling (part I)
29
Scaling (part II)
30
Scaling (part III)
31
Request Routing
32
Hypertable
33
Hypertable Overview
Massively scalable database
Modeled after Google’s Bigtable
High-performance implementation (C++)
Thrift interface for all popular high-level languages: Java, Ruby, Python, PHP, etc.
Open source (GPL license)
Project started March Zvents
34
Hypertable In Use Today
35
Hypertable vs. HBase
36
Hypertable vs. HBase

Test                               Hypertable Advantage Relative to HBase (%)
Random Read Zipfian 80 GB          925
Random Read Zipfian 20 GB          777
Random Read Zipfian 2.5 GB         100
Random Write 10KB values           51
Random Write 1KB values            102
Random Write 100 byte values       427
Random Write 10 byte values        931
Sequential Read 10KB values        1060
Sequential Read 1KB values         68
Sequential Read 100 byte values    129
Scan 10KB values                   2
Scan 1KB values                    58
Scan 100 byte values               75
Scan 10 byte values                220
37
Annual EC2 Cost Savings Assuming 200% improvement
Extra large reserved instances
38
Resources
Project Site
Twitter: hypertable
Commercial Support
Performance Evaluation Write-up: blog.hypertable.com/?p=14
39
Q&A