Introduction to VoltDB

Big Data & Analytics – United States AFPOA
Fred Holahan, CMO, VoltDB, Inc.
e: fholahan@voltdb.com p: +1.978.528.0560
February 2012
06/27/17

Big Data – 3 Vs
Velocity
- Properties: Data that's moving at very high speeds, often coming from real-time acquisition sources such as scanners, sensors and software-based monitors/collectors.
- Applications: Hot caching, real-time analytics, real-time alerting, pre-export enrichment
- Solutions: VoltDB and other in-memory RDBMSs
Volume
- Properties: Data coming from a variety of sources, accumulating into massive (Petabyte+) historical volumes.
- Applications: Cold storage, batch analytics (patterns, trends, anomalies)
- Solutions: Hadoop and analytic datastores
Variety
- Properties: Data with properties that are best supported by purpose-built datastores. Examples include document, graph and scientific data.
- Applications: Blogs, online forums, social networks
- Solutions: NoSQL datastores

Connecting Velocity and Volume
Incoming events → High Velocity Engine → processed events → High Volume Analytic Engine
- High Velocity Engine: transactions, dashboards, fast analytics (milliseconds of latency); gigabytes to terabytes of hot state
- High Volume Analytic Engine: deep analytics (hours and up of latency); terabytes and up of cold history

High Velocity Database Requirements
- Handle lots of independent events arriving at a very high frequency
  - Update state, decisioning, transactions, enrichment, etc.
- Stay up in the face of failures
  - Make handling failures and recovery as automatic as possible
- Support complex manipulations of state per event
- Support a range of real-time (or “near-time”) analytics
- Integrate easily with high volume analytic datastores
  - Raw, enriched or sampled data is migrated to companion stores
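
The per-event requirements above (update state, enrich, hand data on to a companion store) can be pictured with a small sketch. Everything here is hypothetical: the class, the sensor-style events and the in-memory export queue are illustrative stand-ins, not VoltDB code or its export feature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical high-velocity engine: hot state in memory, processed events queued for a cold store. */
final class VelocityEngineSketch {
    /** An incoming event: which sensor reported, and its reading. */
    record Event(String sensorId, double value) {}

    /** The same event enriched with the running count for its sensor. */
    record EnrichedEvent(String sensorId, double value, long eventsSoFar) {}

    private final Map<String, Long> hotState = new HashMap<>();          // per-sensor counters
    private final List<EnrichedEvent> exportQueue = new ArrayList<>();   // bound for the analytic store

    /** Processes one event: update hot state, enrich, queue for export. */
    EnrichedEvent process(Event e) {
        long count = hotState.merge(e.sensorId(), 1L, Long::sum);
        EnrichedEvent enriched = new EnrichedEvent(e.sensorId(), e.value(), count);
        exportQueue.add(enriched);
        return enriched;
    }

    /** Hands every queued event to the (imaginary) analytic store and clears the queue. */
    List<EnrichedEvent> drainExport() {
        List<EnrichedEvent> batch = new ArrayList<>(exportQueue);
        exportQueue.clear();
        return batch;
    }

    long countFor(String sensorId) {
        return hotState.getOrDefault(sensorId, 0L);
    }
}
```

The split mirrors the velocity/volume picture: the map is the small, hot state kept for millisecond decisions, while the drained batches are what would accumulate as cold history elsewhere.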

What Is VoltDB?
- In-memory relational DBMS
- Ultra-high performance
  - Millions of ACID TPS
  - Single-millisecond latencies
- Scale out on commodity gear
  - Choose a partitioning key, VoltDB does the heavy lifting
- Built-in fault tolerance and crash recovery
- Standard programming interfaces
  - Build apps in the language of your choice
  - Call Java stored procedures with parameterized, embedded SQL
- Open source (GPL3) and commercial licenses

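SQL in Stored Procedures
SQL can be parameterized, but not dynamic:
- "select * from foo where bar = ?;" (YES)
- "select * from ? where bar = ?;" (NO)
Each statement is planned when the application catalog is compiled, so table and column names must be fixed in the SQL text; only values can be supplied as ? parameters at run time.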
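
A toy model of the YES/NO rule: the statement text is checked once, up front, and a `?` in the table position is rejected because the table would be unknown at compile time. `FixedStatement` is a hypothetical class with a deliberately naive whitespace tokenizer; it is not VoltDB's planner and does not parse real SQL.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a precompiled statement: values may vary, the table may not. */
final class FixedStatement {
    private final String sql;
    private final int parameterCount;

    FixedStatement(String sql) {
        String[] tokens = sql.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            // A '?' right after FROM would leave the table unknown when the statement is planned.
            if (tokens[i].toLowerCase(Locale.ROOT).equals("from") && tokens[i + 1].startsWith("?")) {
                throw new IllegalArgumentException("table name cannot be a parameter: " + sql);
            }
        }
        this.sql = sql;
        // Naive count: treats every '?' character as a value placeholder.
        this.parameterCount = (int) sql.chars().filter(c -> c == '?').count();
    }

    /** Binds run-time values to the placeholders; the number of values must match. */
    List<Object> bind(Object... values) {
        if (values.length != parameterCount) {
            throw new IllegalArgumentException(
                "expected " + parameterCount + " values, got " + values.length);
        }
        return List.of(values);
    }

    String sql() {
        return sql;
    }
}
```

`new FixedStatement("select * from foo where bar = ?;").bind(42)` succeeds, while constructing `"select * from ? where bar = ?;"` throws before any value is ever supplied.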

Schema Changes
Traditional OLTP:
- add table…
- alter table…
VoltDB:
- Modify schema and stored procedures
- Build catalog
- Deploy catalog
- V1.0: Add/drop users, stored procedures
- V1.1: Add/drop tables
- Future: Add/drop column, …

Table/Index Storage
- VoltDB is entirely in-memory
- The cluster must collectively have enough RAM to hold all tables and indexes k + 1 times over, where k is the K-safety factor (the number of redundant copies)
- Even data distribution is important
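
The sizing rule is simple arithmetic: (table data + index data) × (k + 1), divided across the servers when distribution is even. `MemorySizing` is a hypothetical helper, not a VoltDB tool, and it deliberately ignores JVM heap, buffers and other overhead that a real capacity plan has to add.

```java
/** Hypothetical back-of-the-envelope RAM sizing for an in-memory, K-safe cluster. */
final class MemorySizing {
    /** Total RAM (GB) the cluster needs collectively: every byte is stored k + 1 times. */
    static double clusterRamGb(double tableGb, double indexGb, int kFactor) {
        if (kFactor < 0) throw new IllegalArgumentException("k must be >= 0");
        return (tableGb + indexGb) * (kFactor + 1);
    }

    /** RAM (GB) per server, valid only if data is spread evenly across servers. */
    static double perServerRamGb(double tableGb, double indexGb, int kFactor, int servers) {
        if (servers <= 0) throw new IllegalArgumentException("servers must be > 0");
        return clusterRamGb(tableGb, indexGb, kFactor) / servers;
    }
}
```

For example, 60 GB of tables plus 20 GB of indexes with k = 1 needs 160 GB across the cluster, or 40 GB on each of 4 servers. A skewed partitioning key breaks the per-server figure, since the fullest server sets the requirement.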

Throughput & Scaling
- Scales to dozens of nodes
- Can easily scale to millions of events/transactions per second
- Most deployments use fewer than 10 nodes

Technical Overview – Partitions (1/3)
- 1 partition per physical CPU core
  - Each physical server has multiple VoltDB partitions
- Data – two types of tables
  - Partitioned
    - A single column serves as the partitioning key
    - Rows are spread across all VoltDB partitions by the partition column
    - Transactional data (high frequency of modification)
  - Replicated
    - All rows exist within all VoltDB partitions
    - Relatively static data (low frequency of modification)
- Code – two types of work, both ACID
  - Single-Partition
    - All insert/update/delete operations within a single partition
    - Majority of transactional workload
  - Multi-Partition
    - CRUD against partitioned tables across multiple partitions
    - Insert/update/delete on replicated tables
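
The two table types route writes differently: a partitioned row lands in exactly one partition chosen from its key, while a replicated row is written everywhere. `PartitionRouter` is a hypothetical sketch that uses `hashCode()` modulo the partition count; VoltDB uses its own hash function, so real placement will differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical router: partitioned rows go to one partition, replicated rows to all. */
final class PartitionRouter {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionRouter(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) partitions.add(new ArrayList<>());
    }

    /** Picks a partition from the key's hash; floorMod keeps the index non-negative. */
    int partitionFor(Object partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), partitions.size());
    }

    /** Partitioned table: a single-partition write. Returns the partition used. */
    int insertPartitioned(Object partitionKey, String row) {
        int p = partitionFor(partitionKey);
        partitions.get(p).add(row);
        return p;
    }

    /** Replicated table: the write touches every partition (multi-partition work). */
    void insertReplicated(String row) {
        for (List<String> rows : partitions) rows.add(row);
    }

    int rowCount(int partition) {
        return partitions.get(partition).size();
    }
}
```

Two inserts with the same key always reach the same partition, which is what lets a query on that key stay single-partition; the replicated insert, by contrast, costs one write per partition.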

Technical Overview – Partitions (2/3)
Single-partition vs. multi-partition:
- select count(*) from orders where customer_id = 5 → single-partition
- select count(*) from orders where product_id = 3 → multi-partition
- insert into orders (customer_id, order_id, product_id) values (3,303,2) → single-partition
- update products set product_name = 'spork' where product_id = 3 → multi-partition
table orders (partitioned; customer_id is the partition key): customer_id, order_id, product_id
table products (replicated): product_id, product_name
Partition 1
  orders: (1, 101, 2), (1, 101, 3), (4, 401, 2)
  products: (1, knife), (2, spoon), (3, fork)
Partition 2
  orders: (2, 201, 1), (5, 501, 3), (5, 502, 2)
  products: (1, knife), (2, spoon), (3, fork)
Partition 3
  orders: (3, 201, 1), (6, 601, 1), (6, 601, 2)
  products: (1, knife), (2, spoon), (3, fork)
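
The orders example can be replayed on a toy in-memory model. The rule partition = ((customer_id - 1) % 3) + 1 is an assumption chosen only because it reproduces the slide's placement; it is not VoltDB's hash. `OrdersExample` and its methods are hypothetical.

```java
import java.util.List;

/** Hypothetical replay of the orders example: key lookups hit one partition, others scan all. */
final class OrdersExample {
    record Order(int customerId, int orderId, int productId) {}

    // Index 0 = Partition 1, index 1 = Partition 2, index 2 = Partition 3 (rows from the slide).
    static final List<List<Order>> PARTITIONS = List.of(
        List.of(new Order(1, 101, 2), new Order(1, 101, 3), new Order(4, 401, 2)),
        List.of(new Order(2, 201, 1), new Order(5, 501, 3), new Order(5, 502, 2)),
        List.of(new Order(3, 201, 1), new Order(6, 601, 1), new Order(6, 601, 2)));

    /** 1-based partition owning a customer's orders under the assumed rule. */
    static int partitionOf(int customerId) {
        return ((customerId - 1) % PARTITIONS.size()) + 1;
    }

    /** where customer_id = ?: the key names one partition, so only that partition is read. */
    static long countByCustomer(int customerId) {
        return PARTITIONS.get(partitionOf(customerId) - 1).stream()
            .filter(o -> o.customerId() == customerId)
            .count();
    }

    /** where product_id = ?: not the partitioning key, so every partition must be read. */
    static long countByProduct(int productId) {
        return PARTITIONS.stream()
            .flatMap(List::stream)
            .filter(o -> o.productId() == productId)
            .count();
    }
}
```

Both counts come to 2, but the customer query reads only Partition 2 while the product query reads all three; the insert for customer 3 would go to Partition 3 alone.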

Technical Overview – Partitions (3/3)
Looking inside a VoltDB partition…
- Each partition contains data and an execution engine.
- The execution engine contains a queue for transaction requests.
- Requests are executed sequentially (single threaded).
Partition contents: work queue, execution engine, table data, index data
- Complete copy of all replicated tables
- Portion of rows (about 1/(number of partitions)) of all partitioned tables
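
A single-threaded executor is a compact way to model "a queue plus an engine that runs requests one at a time". `PartitionSketch` is hypothetical: the `counter` field stands in for partition data, and because only the queue's one thread ever touches it, no locks are needed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical partition: private state plus a single-threaded work queue. */
final class PartitionSketch implements AutoCloseable {
    private final ExecutorService workQueue = Executors.newSingleThreadExecutor();
    private long counter; // partition-local state, only touched by the queue's thread

    /** Queues a request; it runs to completion before the next queued request starts. */
    <T> CompletableFuture<T> submit(Function<PartitionSketch, T> request) {
        return CompletableFuture.supplyAsync(() -> request.apply(this), workQueue);
    }

    /** Example "transaction" body; safe without locks because execution is serial. */
    long increment() {
        return ++counter;
    }

    @Override
    public void close() {
        workQueue.shutdown();
    }
}
```

Submitting 1,000 `increment` requests yields 1 through 1,000 in submission order. The design trade-off is that a slow request delays everything queued behind it, which is why short, single-partition transactions dominate the workload.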

VoltDB Scaling Model
- Tables are horizontally split into partitions
- Partitions are deployed to CPU cores – scale up and out
- Infrequently-changing tables are replicated across partitions
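
Scaling up (more cores per server) and out (more servers) both raise the partition count. `ClusterLayout` is a hypothetical helper that numbers partitions server by server, as in the Transaction Execution slide (Server 1 holds partitions 1 to 3, Server 2 holds 4 to 6); it ignores K-safety replicas.

```java
/** Hypothetical layout: one partition per core, partitions numbered server by server. */
final class ClusterLayout {
    /** Total partitions grows with either more cores per server or more servers. */
    static int totalPartitions(int servers, int coresPerServer) {
        return servers * coresPerServer;
    }

    /** 1-based server hosting a 1-based partition number. */
    static int serverFor(int partition, int coresPerServer) {
        if (partition < 1) throw new IllegalArgumentException("partitions are 1-based");
        return (partition - 1) / coresPerServer + 1;
    }
}
```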

Inside a VoltDB Partition
- Each partition contains data and an execution engine
- The execution engine contains a queue for transaction requests
- Requests run to completion, serially, at each partition
Partition contents: work queue, execution engine, table data, index data

Technical Overview – Compiling
The database is constructed from:
- The schema (DDL)
- The workload (Java stored procedures)
- The project (users, groups, partitioning)
VoltCompiler creates the application catalog. Copy it to the servers along with 1 .jar and 1 .so, then start the servers.
Schema:
    CREATE TABLE HELLOWORLD (
        HELLO CHAR(15),
        WORLD CHAR(15),
        DIALECT CHAR(15),
        PRIMARY KEY (DIALECT)
    );
Stored procedure (excerpt):
    import org.voltdb.*;

    @ProcInfo(
        partitionInfo = "HELLOWORLD.DIA…",
        singlePartition = true
    )
    public class Insert extends VoltProcedure {
        public final SQLStmt sql = new SQLStmt("INSERT INTO HELLO…
        public VoltTable[] run( String hel…
    }
Project.xml (excerpt):
    <?xml version="1.0"?>
    <project>
      <database name='data…
        <schema path='ddl.…
        <partition table='…
      </database>
    </project>

Technical Overview – Transactions
- All access to VoltDB is via Java stored procedures (Java + SQL)
- A single invocation of a stored procedure is a transaction (committed on success)
- Limits round trips between DBMS and application
- High performance client applications communicate asynchronously with VoltDB
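
Asynchronous calling means the client fires an invocation and moves on, keeping many in flight rather than paying one round trip at a time. `AsyncClientSketch` and `invokeAsync` are hypothetical names; a thread pool stands in for the cluster and a plain function stands in for a stored procedure. This is not the VoltDB client library.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical asynchronous client: each invocation returns a future immediately. */
final class AsyncClientSketch implements AutoCloseable {
    private final ExecutorService server = Executors.newFixedThreadPool(4); // stands in for the cluster
    private final Function<Object[], Integer> procedure;                    // stands in for a stored procedure

    AsyncClientSketch(Function<Object[], Integer> procedure) {
        this.procedure = procedure;
    }

    /** Sends one invocation without waiting for its result. */
    CompletableFuture<Integer> invokeAsync(Object... params) {
        return CompletableFuture.supplyAsync(() -> procedure.apply(params), server);
    }

    @Override
    public void close() {
        server.shutdown();
    }
}
```

A caller can issue 100 `invokeAsync` calls in a loop and only then collect the futures, so total latency approaches one round trip instead of 100.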

VoltDB Transactions
- Transaction == single SQL statement or stored procedure invocation
  - Committed on success
- Java stored procedures
  - Java statements with embedded, parameterized SQL
  - Efficiently process SQL at the server
  - Move the code to the data, not the other way around
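
"Committed on success" means an invocation's writes become visible only if the procedure finishes normally; an exception discards them all. `TxnStore` is a hypothetical toy that copies the whole map per invocation for brevity; a production engine would not copy the full dataset to get rollback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical store where each procedure invocation is one all-or-nothing transaction. */
final class TxnStore {
    private Map<String, Integer> committed = new HashMap<>();

    /** Runs the procedure on a private copy; commits it only if no exception escapes. */
    boolean invoke(Consumer<Map<String, Integer>> procedure) {
        Map<String, Integer> working = new HashMap<>(committed);
        try {
            procedure.accept(working);
        } catch (RuntimeException abort) {
            return false; // rolled back: the working copy is simply dropped
        }
        committed = working; // commit: all of the invocation's writes appear at once
        return true;
    }

    Integer get(String key) {
        return committed.get(key);
    }
}
```

If a procedure writes `a = 2` and then throws, a later read still sees the previously committed `a = 1`.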

Client Application Interfaces
- Client options
  - Libraries for Java, C++, C#, PHP, Python, Node.js (JavaScript) and other popular languages
  - JSON via HTTP
- Client connects to the cluster
  - Data location is transparent
  - Topology is transparent
  - Cluster manages routing, data movement and consistency
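
With JSON via HTTP, a stored-procedure call is just a URL naming the procedure and a JSON array of parameters. Assumption to verify against the documentation for your version: the `/api/1.0/` path, port 8080 and the `Procedure`/`Parameters` query names are taken from VoltDB's JSON interface as documented in later releases. `JsonHttpCall` is hypothetical, handles only string parameters, and escapes only quotes and backslashes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical builder for a JSON-over-HTTP stored-procedure call URL. */
final class JsonHttpCall {
    static String url(String host, String procedure, List<String> params) {
        // Encode parameters as a JSON array of strings (simplified escaping).
        String json = params.stream()
            .map(p -> "\"" + p.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(",", "[", "]"));
        // Assumed endpoint shape; check path, port and parameter names for your release.
        return "http://" + host + ":8080/api/1.0/?Procedure="
            + URLEncoder.encode(procedure, StandardCharsets.UTF_8)
            + "&Parameters=" + URLEncoder.encode(json, StandardCharsets.UTF_8);
    }
}
```

Calling `url("localhost", "Insert", List.of("Hello", "World", "English"))` produces a URL whose decoded `Parameters` value is `["Hello","World","English"]`, matching the three columns of the HELLOWORLD table.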

VoltDB Transaction Model
Procedures are routed to, ordered and run at partitions.

Transaction Execution
- Single-partition transactions
  - All data is in one partition
  - Each partition operates autonomously
- Multi-partition transactions
  - One partition distributes and coordinates work plans
VoltDB cluster layout:
- Server 1: Partitions 1, 2, 3
- Server 2: Partitions 4, 5, 6
- Server 3: Partitions 7, 8, 9
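
A multi-partition read follows a scatter/gather shape: the coordinator sends the same fragment to every partition and combines the partial results, whereas a single-partition transaction goes straight to the one owner. `ScatterGather` is a hypothetical sketch using a thread pool; it does not model VoltDB's actual coordination protocol.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

/** Hypothetical coordinator contrasting multi-partition and single-partition counts. */
final class ScatterGather {
    /** Multi-partition: fan out one fragment per partition, then sum the partial counts. */
    static <R> long countAcross(List<List<R>> partitions, Predicate<R> predicate) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            // Scatter: each partition counts matching rows in its own slice of the data.
            List<CompletableFuture<Long>> partials = partitions.stream()
                .map(rows -> CompletableFuture.supplyAsync(
                    () -> rows.stream().filter(predicate).count(), pool))
                .toList();
            // Gather: the coordinator combines the partial results.
            return partials.stream().mapToLong(CompletableFuture::join).sum();
        } finally {
            pool.shutdown();
        }
    }

    /** Single-partition: only the owning partition is consulted. */
    static <R> long countIn(List<List<R>> partitions, int owner, Predicate<R> predicate) {
        return partitions.get(owner).stream().filter(predicate).count();
    }
}
```

The extra fan-out and merge step is why multi-partition work costs more than single-partition work and is kept to a minority of the workload.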

Data Availability and Durability
- High availability
  - Data stored on server replicas (user configurable)
  - Failover data redundancy
  - No single point of failure
- Database snapshots
  - Simplifies backup/restore
  - Scheduled, continuous, on demand
  - Cluster-wide consistent copy of all data
- Command logging
  - Between snapshots, every transaction is durable to disk
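
Replicas on separate servers are what make "no single point of failure" hold: with k + 1 copies of each partition on k + 1 different servers, any k server failures leave at least one copy of everything. `KSafetySketch` is a hypothetical round-robin placement for illustration; VoltDB's real placement logic differs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical K-safe placement: k + 1 copies of each partition on distinct servers. */
final class KSafetySketch {
    /** Returns, for each partition, the set of servers holding a copy of it. */
    static List<Set<Integer>> place(int partitions, int servers, int k) {
        if (k + 1 > servers) throw new IllegalArgumentException("need at least k + 1 servers");
        List<Set<Integer>> copies = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            Set<Integer> holders = new HashSet<>();
            // Consecutive servers (mod the server count) are distinct because k + 1 <= servers.
            for (int c = 0; c <= k; c++) holders.add((p + c) % servers);
            copies.add(holders);
        }
        return copies;
    }

    /** True if every partition still has a copy on at least one server that did not fail. */
    static boolean survives(List<Set<Integer>> copies, Set<Integer> failed) {
        return copies.stream().allMatch(h -> h.stream().anyMatch(s -> !failed.contains(s)));
    }
}
```

With 6 partitions on 3 servers and k = 1, losing any one server is survivable, but losing servers 0 and 1 together wipes out both copies of partition 0.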

Command Logging
- Tunable fsync* frequency
- Tunable snapshot interval
- Synchronous logging provides the highest durability at reduced performance
- Asynchronous logging provides the best performance at reduced durability
* fsync is when command log buffers are flushed to disk (or SSD)
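
A command log records each procedure invocation rather than the rows it changed, and the fsync frequency decides how many recent invocations could be lost in a crash. `CommandLog` is a hypothetical toy with a made-up one-line-per-invocation format: `fsyncEvery = 1` behaves synchronously, larger values force to disk only every N appends. Recovery would restore the last snapshot and replay the logged invocations.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical command log with a tunable fsync interval (1 = synchronous). */
final class CommandLog implements AutoCloseable {
    private final Path file;
    private final FileChannel channel;
    private final int fsyncEvery;
    private int unsynced; // appends written but not yet forced to disk

    CommandLog(Path file, int fsyncEvery) {
        if (fsyncEvery < 1) throw new IllegalArgumentException("fsyncEvery must be >= 1");
        try {
            this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.file = file;
        this.fsyncEvery = fsyncEvery;
    }

    /** Logs one invocation; forces to disk once fsyncEvery appends have accumulated. */
    void append(String procedure, String params) {
        ByteBuffer line = ByteBuffer.wrap(
            (procedure + " " + params + "\n").getBytes(StandardCharsets.UTF_8));
        try {
            while (line.hasRemaining()) channel.write(line);
            if (++unsynced >= fsyncEvery) {
                channel.force(false); // the "fsync": flush log data to disk
                unsynced = 0;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Invocations to replay after restoring a snapshot. */
    List<String> replay() {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            channel.force(false);
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```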

Database Management & Monitoring

VoltDB Customers

VoltDB Resources
- Technical white papers: http://voltdb.com/resources/whitepapers
- VoltDB documentation: http://community.voltdb.com/documentation
- Software downloads: http://voltdb.com/products-services/downloads
- Community forums: http://community.voltdb.com/forum
- Sales contact: +1.978.528.4660, sales@voltdb.com

- Thank You -
Questions?