Introduction to VoltDB

06/27/17
Big Data & Analytics – United States AFPOA
Fred Holahan, CMO, VoltDB, Inc.
February 2012

Big Data – 3 Vs

Velocity
- Properties: Data that’s moving at very high speeds, often coming from real-time acquisition sources such as scanners, sensors and software-based monitors/collectors.
- Applications: Hot caching; real-time analytics; real-time alerting; pre-export enrichment
- Solutions: VoltDB and other in-memory RDBMSs

Volume
- Properties: Data coming from a variety of sources, accumulating into massive (petabyte+) historical volumes.
- Applications: Cold storage; batch analytics (patterns, trends, anomalies)
- Solutions: Hadoop and analytic datastores

Variety
- Properties: Data with properties that are best supported by purpose-built datastores. Examples include document, graph and scientific data.
- Applications: Blogs; online forums; social networks
- Solutions: NoSQL datastores

Connecting Velocity and Volume

[Diagram: Incoming Events, High Velocity Engine, Processed Events, High Volume Analytic Engine]
- High Velocity Engine: gigabytes to terabytes of hot state; transactions, dashboards, fast analytics (milliseconds of latency)
- High Volume Analytic Engine: terabytes and up of cold history; deep analytics (hours and up of latency)
- Draft note: Do we put a Variety "stream" in this image? Skipping the Velocity Engine? Others

High Velocity Database Requirements
- Handle lots of independent events arriving at a very high frequency
  - Update state, decisioning, transactions, enrichment, etc.
- Stay up in the face of failures
  - Make handling failures and recovery as automatic as possible
- Support complex manipulations of state per event
- Support a range of real-time (or "near-time") analytics
- Integrate easily with high volume analytic datastores
  - Raw, enriched or sampled data is migrated to companion stores

What Is VoltDB?
- In-memory relational DBMS
- Ultra-high performance
  - Millions of ACID TPS
  - Single-millisecond latencies
- Scale out on commodity gear
  - Choose a partitioning key, VoltDB does the heavy lifting
- Built-in fault tolerance and crash recovery
- Standard programming interfaces
  - Build apps in the language of your choice
  - Call Java stored procedures with parameterized, embedded SQL
- Open source (GPL3) and commercial licenses

SQL in Stored Procedures
- SQL can be parameterized, but not dynamic
  - "select * from foo where bar = ?;" (YES)
  - "select * from ? where bar = ?;" (NO)

Schema Changes
- Traditional OLTP
  - add table…
  - alter table…
- VoltDB
  - modify schema and stored procedures
  - build catalog
  - deploy catalog
- V1.0: Add/drop users, stored procedures
- V1.1: Add/drop tables
- Future: Add/drop column, …

Table/Index Storage
- VoltDB is entirely in-memory
- Cluster must collectively have enough RAM to hold all tables/indexes (k + 1 copies)
- Even data distribution is important
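
The (k + 1) rule above can be turned into a quick back-of-the-envelope calculation. The following is a minimal sketch, not VoltDB code: the class and method names are hypothetical, the data size is assumed to already include index overhead, and the per-node figure assumes the perfectly even distribution the slide asks for.

```java
class MemorySizing {
    /** Total bytes the cluster must hold: one primary copy plus k replicas. */
    static long clusterBytes(long dataBytes, int kFactor) {
        return dataBytes * (kFactor + 1L);
    }

    /** Bytes each node must hold if data is spread evenly (rounded up). */
    static long perNodeBytes(long dataBytes, int kFactor, int nodes) {
        long total = clusterBytes(dataBytes, kFactor);
        return (total + nodes - 1) / nodes;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long data = 100 * gib; // assumed size of all tables and indexes
        System.out.println(clusterBytes(data, 1) / gib);    // 200
        System.out.println(perNodeBytes(data, 1, 5) / gib); // 40
    }
}
```

If distribution is skewed, the busiest node needs more than this average, which is why even distribution matters.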

Throughput & Scaling
- Scales to dozens of nodes
- Can easily scale to millions of events/transactions per second
- Most deployments use fewer than 10 nodes
- Meeting note (6/2/11 13:30): Barron Schwartz quote

Technical Overview – Partitions (1/3)
- 1 partition per physical CPU core
  - Each physical server has multiple VoltDB partitions
- Data - Two types of tables
  - Partitioned
    - Single column serves as partitioning key
    - Rows are spread across all VoltDB partitions by partition column
    - Transactional data (high frequency of modification)
  - Replicated
    - All rows exist within all VoltDB partitions
    - Relatively static data (low frequency of modification)
- Code - Two types of work – both ACID
  - Single-Partition
    - All insert/update/delete operations within single partition
    - Majority of transactional workload
  - Multi-Partition
    - CRUD against partitioned tables across multiple partitions
    - Insert/update/delete on replicated tables

Technical Overview – Partitions (2/3)
Single-partition vs. Multi-partition
- select count(*) from orders where customer_id = 5 (single-partition)
- select count(*) from orders where product_id = 3 (multi-partition)
- insert into orders (customer_id, order_id, product_id) values (3,303,2) (single-partition)
- update products set product_name = 'spork' where product_id = 3 (multi-partition)

Example schema, spread across Partition 1, Partition 2 and Partition 3:
- table orders (partitioned): customer_id (partition key), order_id, product_id
- table products (replicated): product_id, product_name; rows: 1 knife, 2 spoon, 3 fork
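
The single-partition versus multi-partition distinction for the orders table can be sketched as a small routing function. This is an illustration only: the class name, the three-partition layout and the modulo placement rule are assumptions made for the example, and the slides do not describe the hash function VoltDB actually uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PartitionRouter {
    static final int PARTITIONS = 3;

    // Hypothetical placement rule (simple modulo), not VoltDB's real hashing.
    static int partitionFor(long partitionKey) {
        return (int) Math.floorMod(partitionKey, (long) PARTITIONS);
    }

    /**
     * A statement on the partitioned orders table is single-partition only
     * when it pins the partition column (customer_id) to one value;
     * filtering on any other column involves every partition.
     */
    static List<Integer> partitionsTouched(String filterColumn, long value) {
        if (filterColumn.equals("customer_id")) {
            return Collections.singletonList(partitionFor(value));
        }
        List<Integer> all = new ArrayList<>();
        for (int p = 0; p < PARTITIONS; p++) {
            all.add(p);
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(partitionsTouched("customer_id", 5)); // [2]
        System.out.println(partitionsTouched("product_id", 3));  // [0, 1, 2]
    }
}
```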

Technical Overview – Partitions (3/3)
Looking inside a VoltDB partition…
- Each partition contains data and an execution engine.
- The execution engine contains a queue for transaction requests.
- Requests are executed sequentially (single threaded).

[Diagram: Work Queue, execution engine, Table Data, Index Data]
- Complete copy of all replicated tables
- Portion of rows (about 1/partitions) of all partitioned tables

VoltDB Scaling Model
- Tables are horizontally split into partitions
- Partitions deployed to CPU cores – scale up and out
- Infrequently-changing tables replicated across partitions

Inside a VoltDB Partition
- Each partition contains data and an execution engine
- The execution engine contains a queue for transaction requests
- Requests run to completion, serially, at each partition

[Diagram: Work Queue, execution engine, Table Data, Index Data]
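
The partition model described above (private data plus a queue drained by one thread) can be sketched in a few lines. This is a toy model under stated assumptions, not VoltDB's execution engine: the class name, the map standing in for table data, and the request type are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

class PartitionSketch {
    // Table data owned by this partition only; nothing else touches it.
    private final Map<Integer, Integer> orderCountByCustomer = new HashMap<>();
    // Work queue of pending transaction requests.
    private final Queue<Consumer<Map<Integer, Integer>>> workQueue = new ArrayDeque<>();

    void submit(Consumer<Map<Integer, Integer>> request) {
        workQueue.add(request);
    }

    /** Drains the queue on the calling thread: one request at a time, each run to completion. */
    int runAll() {
        int executed = 0;
        Consumer<Map<Integer, Integer>> next;
        while ((next = workQueue.poll()) != null) {
            next.accept(orderCountByCustomer);
            executed++;
        }
        return executed;
    }

    int ordersFor(int customerId) {
        return orderCountByCustomer.getOrDefault(customerId, 0);
    }

    public static void main(String[] args) {
        PartitionSketch partition = new PartitionSketch();
        partition.submit(data -> data.merge(5, 1, Integer::sum));
        partition.submit(data -> data.merge(5, 1, Integer::sum));
        partition.runAll();
        System.out.println(partition.ordersFor(5)); // 2
    }
}
```

Because only one request runs at a time against the partition's data, the sketch needs no locks or latches.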

Technical Overview – Compiling
The database is constructed from:
- The schema (DDL)
- The work load (Java stored procedures)
- The Project (users, groups, partitioning)

Build and deploy:
- VoltCompiler creates application catalog
- Copy to servers along with 1 .jar and 1 .so
- Start servers

Schema:
    CREATE TABLE HELLOWORLD (
        HELLO CHAR(15),
        WORLD CHAR(15),
        DIALECT CHAR(15),
        PRIMARY KEY (DIALECT)
    );

Stored Procedures (overlapping fragments; lines are cut off on the slide):
    import org.voltdb.*;
    @ProcInfo(
        partitionInfo = "HELLOWORLD.DIA…
        singlePartition = true
    )
    public class Insert extends VoltPr…
        public final SQLStmt sql = new SQLStmt("INSERT INTO HELLO…
        public VoltTable[] run( String hel…

Project.xml (fragment; lines are cut off on the slide):
    <?xml version="1.0"?>
    <project>
      <database name='data…
        <schema path='ddl.…
        <partition table='…
      </database>
    </project>

Technical Overview - Transactions
- All access to VoltDB is via Java stored procedures (Java + SQL)
- A single invocation of a stored procedure is a transaction (committed on success)
- Limits round trips between DBMS and application
- High performance client applications communicate asynchronously with VoltDB

VoltDB Transactions
- Transaction == Single SQL Statement or Stored Procedure Invocation
  - Committed on Success
- Java Stored Procedures
  - Java statements with embedded, parameterized SQL
  - Efficiently process SQL at the server
  - Move the code to the data, not the other way around
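
The "one invocation is one transaction, committed on success" rule can be illustrated with a toy in-memory store. This is a sketch under assumptions, not VoltDB's API: the class name and the copy-then-swap rollback mechanism are invented for illustration and say nothing about how VoltDB implements rollback internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class TransactionSketch {
    private Map<String, Integer> committed = new HashMap<>();

    /**
     * Runs one procedure invocation as a transaction: its changes become
     * visible only if the procedure returns normally; an exception discards them.
     */
    boolean invoke(Consumer<Map<String, Integer>> procedure) {
        // Work on a copy (illustration only) so a failure leaves 'committed' untouched.
        Map<String, Integer> working = new HashMap<>(committed);
        try {
            procedure.accept(working);
        } catch (RuntimeException abort) {
            return false; // rolled back
        }
        committed = working; // committed on success
        return true;
    }

    int get(String key) {
        return committed.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        TransactionSketch db = new TransactionSketch();
        db.invoke(state -> state.put("stock", 10));
        boolean ok = db.invoke(state -> {
            state.put("stock", state.get("stock") - 11);
            if (state.get("stock") < 0) {
                throw new IllegalStateException("insufficient stock");
            }
        });
        System.out.println(ok + " " + db.get("stock")); // false 10
    }
}
```

The whole multi-statement procedure succeeds or fails as a unit, and the client needs only one round trip per invocation.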

Client Application Interfaces
- Client Options
  - Libraries for Java, C++, C#, PHP, Python, Node.js (JavaScript) and other popular languages
  - JSON via HTTP
- Client connects to the cluster
  - Data location is transparent
  - Topology is transparent
  - Cluster manages routing, data movement and consistency

VoltDB Transaction Model
- Procedures routed to, ordered and run at partitions

Transaction Execution
- Single partition transactions
  - All data is in one partition
  - Each partition operates autonomously
- Multi-partition transactions
  - One partition distributes and coordinates work plans

[Diagram: VoltDB Cluster]
- Server 1: Partition 1, Partition 2, Partition 3
- Server 2: Partition 4, Partition 5, Partition 6
- Server 3: Partition 7, Partition 8, Partition 9
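
A multi-partition read such as "select count(*) from orders where product_id = 3" can be pictured as a scatter/gather: a coordinator hands each partition a local piece of work and combines the partial results. The sketch below is a simplified, sequential model with hypothetical names; it does not reproduce VoltDB's actual plan distribution or ordering protocol.

```java
import java.util.Arrays;
import java.util.List;

class ScatterGatherSketch {
    /**
     * Each inner list holds the product_id values of the orders rows stored
     * in one partition. The coordinator sums the per-partition counts.
     */
    static long countOrdersForProduct(List<List<Integer>> partitions, int productId) {
        long total = 0;
        for (List<Integer> partition : partitions) {
            // Work plan fragment: a local count inside one partition.
            long local = 0;
            for (int storedProductId : partition) {
                if (storedProductId == productId) {
                    local++;
                }
            }
            total += local; // coordinator combines partial results
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(3, 1),
                Arrays.asList(2, 3),
                Arrays.asList(3));
        System.out.println(countOrdersForProduct(partitions, 3)); // 3
    }
}
```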

Data Availability and Durability
- High Availability
  - Data stored on server replicas (user configurable)
  - Failover data redundancy
  - No single point of failure
- Database Snapshots
  - Simplifies backup/restore
  - Scheduled, continuous, on demand
  - Cluster-wide consistent copy of all data
- Command Logging
  - Between Snapshots, every transaction is durable to disk

Command Logging
- Tunable fsync* frequency
- Tunable snapshot interval
- Synchronous logging provides highest durability at reduced performance
- Asynchronous logging provides best performance at reduced durability
* fsync is when command log buffers are flushed to disk (or SSD)
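
One way to picture how snapshots and the command log work together is recovery as "load the last snapshot, then replay the commands logged since". The sketch below assumes a hypothetical command format (add a delta to a named counter) and hypothetical class names; it is not VoltDB's log format or recovery code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecoverySketch {
    /** One logged command (hypothetical format): add 'delta' to the counter 'key'. */
    static final class Command {
        final String key;
        final int delta;

        Command(String key, int delta) {
            this.key = key;
            this.delta = delta;
        }
    }

    /** Rebuilds state from the last snapshot plus the commands logged after it. */
    static Map<String, Integer> recover(Map<String, Integer> snapshot, List<Command> logSinceSnapshot) {
        Map<String, Integer> state = new HashMap<>(snapshot);
        for (Command c : logSinceSnapshot) {
            state.merge(c.key, c.delta, Integer::sum); // replay in logged order
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Integer> snapshot = new HashMap<>();
        snapshot.put("orders", 100);
        List<Command> log = new ArrayList<>();
        log.add(new Command("orders", 1));
        log.add(new Command("orders", 1));
        System.out.println(recover(snapshot, log).get("orders")); // 102
    }
}
```

Under this model, only commands that reached disk before a crash can be replayed, which is the durability that asynchronous logging trades away for performance.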

Database Management & Monitoring

VoltDB Customers

VoltDB Resources
- Technical white papers
- VoltDB documentation
- Software downloads
- Community forums
- Sales contact

Thank You – Questions?
