the NewSQL database you’ll never outgrow
Taming the Big Data Fire Hose
John Hugg, Sr. Software Engineer, VoltDB

Big Data Defined
Velocity
  + Moves at very high rates (think sensor-driven systems)
  + Valuable in its temporal, high velocity state
Volume
  + Fast-moving data creates massive historical archives
  + Valuable for mining patterns, trends and relationships
Variety
  + Structured (logs, business transactions)
  + Semi-structured and unstructured

Example Big Data Use Cases
Data Source | High-frequency operations | Lower-frequency operations
Capital markets | Write/index all trades, store tick data | Show consolidated risk across traders
Call initiation request | Real-time authorization | Fraud detection/analysis
Inbound HTTP requests | Visitor logging, analysis, alerting | Traffic pattern analytics
Online game | Rank scores: defined intervals, player “bests” | Leaderboard lookups
Real-time ad trading systems | Match form factor, placement criteria, bid/ask | Report ad performance from exhaust stream
Mobile device location sensor | Location updates, QoS, transactions | Analytics on transactions

Big Data and You
Incoming data streams are different than traditional business apps
  + You need to write data quickly and reliably, but …
It’s not just about high speed writes
  + You need to validate in real-time
  + You need to count and aggregate
  + You need to analyze in real-time
  + You need to scale on demand
  + You may need to transact

Big Data Management Infrastructure
[Diagram: data sources feed high velocity engines, which feed high volume analytic datastores]
Data sources: Online gaming, Ad serving, Sensor data, Internet commerce, SaaS, Web 2.0, Mobile platforms, Financial trade
High Velocity
  NewSQL
    + Structured data
    + ACID guarantees
    + Relational/SQL
    + Real-time analytics
  NoSQL
    + Unstructured data
    + Eventual consistency
    + Schemaless
    + KV, document
High Volume
  Analytic Datastore
    + Other OLAP data stores
High Velocity Data Management

High Velocity DBMS Requirements
Ingest at very high speeds and rates
Scale easily to meet growth and demand peaks
Support integrated fault tolerance
Support a wide range of real-time (or “near-time”) analytics
Integrate easily with high volume analytic datastores

High Speed Data Ingestion
Support millions of write operations per second at scale
Read and write latencies below 50 milliseconds
Provide ACID-level consistency guarantees (maybe)
Support one or more well-known application interfaces
  + SQL
  + Key/Value
  + Document

Scale to Meet Growth and Demand
Scale-out on commodity hardware
Built-in database partitioning
  + Manual sharding and/or add-on solutions are brittle, require apps to do “heavy lifting”, and can be an operational nightmare
Database must automatically implement defined partitioning strategy
  + Application should “see” a single database instance
Database should encourage scalability best practices
  + For example, replication of reference data minimizes need for multi-partition operations

A Look Inside Partitioning
[Diagram: three partitions; each holds its own slice of the orders rows plus a full copy of the products rows: knife, spoon (product_id 2), fork (product_id 3)]
table orders (partitioned): customer_id (partition key), order_id, product_id
table products (replicated): product_id, product_name
select count(*) from orders where customer_id = 5
  + single-partition
select count(*) from orders where product_id = 3
  + multi-partition
insert into orders (customer_id, order_id, product_id) values (3,303,2)
  + single-partition
update products set product_name = 'spork' where product_id = 3
  + multi-partition
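
The single-partition versus multi-partition labels above can be illustrated with a small in-memory sketch. This is not VoltDB's implementation: the modulo placement rule, the `Cluster` class, its method names, and product_id 1 for "knife" are all assumptions made for illustration. Each method returns the list of partitions it had to touch, which is the property the slide is pointing at.

```python
NUM_PARTITIONS = 3  # the diagram shows three partitions


def partition_for(customer_id):
    # Hypothetical placement rule: modulo on the partition key.
    return customer_id % NUM_PARTITIONS


class Cluster:
    def __init__(self):
        # orders is partitioned: each partition stores only its own rows.
        self.orders = [[] for _ in range(NUM_PARTITIONS)]
        # products is replicated: every partition stores a full copy.
        # product_id 1 for "knife" is an assumption.
        self.products = [
            {1: "knife", 2: "spoon", 3: "fork"} for _ in range(NUM_PARTITIONS)
        ]

    def insert_order(self, customer_id, order_id, product_id):
        # The partition key is known, so exactly one partition is touched.
        p = partition_for(customer_id)
        self.orders[p].append((customer_id, order_id, product_id))
        return [p]

    def count_by_customer(self, customer_id):
        # Filter on the partition key: single-partition read.
        p = partition_for(customer_id)
        n = sum(1 for row in self.orders[p] if row[0] == customer_id)
        return n, [p]

    def count_by_product(self, product_id):
        # Filter on a non-key column: every partition must be scanned.
        n = sum(1 for rows in self.orders for row in rows if row[2] == product_id)
        return n, list(range(NUM_PARTITIONS))

    def rename_product(self, product_id, name):
        # Writing a replicated table must update every copy.
        for copy in self.products:
            copy[product_id] = name
        return list(range(NUM_PARTITIONS))
```

Reads of the replicated products table can be served by any one partition, which is why replicating reference data reduces the need for multi-partition operations; only writes to it pay the multi-partition cost.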

Integrated Fault Tolerance
Database should transparently support built-in “Tandem-style” HA
  + Users should be able to easily increase/decrease fault tolerance levels
Database should be easily and quickly recoverable in the event of severe hardware failures
Database should be able to automatically detect and manage a variety of partition fault conditions
Downed nodes should be “rejoinable” without the need for service windows

Partition Detection & Recovery
Network fault protection [Diagram: Server A, Server B, Server C]
  + Detects partition event
  + Determines which side of fault to disable
  + Snapshots and disables orphaned node(s)
Live node rejoin [Diagram: Server A, Server B, Server C]
  + Allows “downed” nodes to rejoin live cluster
  + Automatically re-synchs all node data
  + Coordinates transactions during re-synch
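
One way to decide "which side of fault to disable" is a majority rule with a deterministic tie-breaker. The sketch below is an assumption for illustration, not a description of how VoltDB decides: the function names and the choice of the lowest-sorted node name as tie-breaker are hypothetical.

```python
def surviving_side(cluster, groups):
    """Pick the group of mutually reachable nodes that stays live.

    cluster: set of all node names before the fault.
    groups:  list of sets, one per side of the network partition.
    Returns the surviving set, or an empty set if no side qualifies.
    """
    tie_breaker = min(cluster)  # assumed rule: lowest-sorted node name
    for group in groups:
        # A strict majority of the original cluster always survives.
        if 2 * len(group) > len(cluster):
            return set(group)
        # On an exact half split, only the side holding the tie-breaker survives.
        if 2 * len(group) == len(cluster) and tie_breaker in group:
            return set(group)
    return set()


def orphaned_nodes(cluster, groups):
    # Everything outside the surviving side would be snapshotted and disabled.
    return set(cluster) - surviving_side(cluster, groups)
```

With three servers split into {A, B} and {C}, this rule keeps A and B live and marks C as orphaned. At most one side can ever qualify, so two halves of a split cluster never both continue accepting writes.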

Real-time Analytics
Database should support a wide variety of high performance reads
  + High-frequency single-partition
  + Lower-frequency multi-partition
Common analytic queries should be optimized in the database
  + Multi-partition aggregations, limits, etc.
Database should accommodate a flexible range of relational data operations
  + Particularly relevant to structured data
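
A multi-partition aggregation with a limit is commonly handled as scatter-gather: each partition aggregates its own rows, and a coordinator merges the partial results before applying the limit. The following is a minimal sketch of that general pattern under assumed names (`local_counts`, `top_products`, and the row layout are hypothetical); it does not show how any particular database optimizes such queries.

```python
from collections import Counter


def local_counts(rows):
    # Runs independently on one partition: count orders per product_id.
    return Counter(row["product_id"] for row in rows)


def top_products(partitions, limit):
    # Coordinator step: merge per-partition counts, then apply the limit.
    total = Counter()
    for rows in partitions:
        total.update(local_counts(rows))
    # Sort by count descending, then product_id ascending for a stable order.
    ranked = sorted(total.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

The limit is applied only after merging, because a product that ranks low on every individual partition can still rank high overall.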

Integration with Analytic Datastores
Database should offer high performance, transactional export
Export should allow a wide variety of common data enrichment operations
  + Normalize and de-normalize
  + De-duplicate
  + Aggregate
Architecture should support loosely-coupled integrations
  + Impedance mismatches
  + Durability
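
The enrichment operations listed above can be sketched on a small batch of order rows before they are handed to an analytic store. The `enrich` function, the tuple layout (customer_id, order_id, product_id), and the sample product names are assumptions for illustration only.

```python
def enrich(orders, products):
    """De-duplicate, de-normalize, and aggregate a batch of order rows.

    orders:   list of (customer_id, order_id, product_id) tuples.
    products: dict mapping product_id to product_name.
    Returns (enriched_rows, orders_per_product_name).
    """
    seen = set()
    enriched = []
    for row in orders:
        if row in seen:
            continue  # de-duplicate: drop exact repeats of a row
        seen.add(row)
        # De-normalize: attach the product name to each exported row.
        enriched.append(row + (products[row[2]],))

    totals = {}
    for row in enriched:
        name = row[3]
        totals[name] = totals.get(name, 0) + 1  # aggregate: orders per product
    return enriched, totals
```

De-normalizing at export time means the analytic store does not need its own copy of the reference table to answer simple per-product questions.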

VoltDB Export Data Flow
[Diagram: High Velocity Database Cluster exporting through a queue]
Loosely-coupled, asynchronous
Queue must be durable
Bi-directional durability
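
A durable, asynchronous export queue can be approximated with an append-only log plus a persisted acknowledgement offset, so that rows survive a restart until the downstream consumer confirms them. This is a minimal sketch under stated assumptions: the `DurableQueue` class, its file names, and its JSON-lines format are hypothetical and are not VoltDB's export mechanism.

```python
import json
import os


class DurableQueue:
    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self.log_path = os.path.join(directory, "export.log")
        self.ack_path = os.path.join(directory, "export.ack")

    def append(self, row):
        # Producer side: persist the row before reporting success.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(row) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def _acked(self):
        # Number of rows the consumer has already confirmed.
        try:
            with open(self.ack_path, encoding="utf-8") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def pending(self):
        # Consumer side: every row that has not been acknowledged yet.
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as f:
            rows = [json.loads(line) for line in f]
        return rows[self._acked():]

    def ack(self, count):
        # Record the new offset atomically via write-then-rename.
        new_offset = self._acked() + count
        tmp_path = self.ack_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(str(new_offset))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.ack_path)
```

Because the producer only appends and the consumer only advances an offset, neither side blocks on the other, which is one way to read "loosely-coupled, asynchronous". Rows that were delivered but not acknowledged are redelivered after a restart, so the consumer should tolerate duplicates.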

Summary
Big Data infrastructures will usually require more than one engine
  + High velocity engine for “fast” data
  + Analytic engine for “deep” data
Data characteristics will often determine which high velocity engine to use
  + NewSQL is often well-suited to structured data
  + NoSQL is often a good fit for unstructured data
Choose solutions that suit your needs and are designed for interoperability