VoltDB: an SQL Developer’s Perspective Tim Callaghan, VoltDB Field Engineer

Slides:



Advertisements
Similar presentations
Yukon – What is New Rajesh Gala. Yukon – What is new.NET Framework Programming Data Types Exception Handling Batches Databases Database Engine Administration.
Advertisements

Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Database System Concepts and Architecture
Database Architectures and the Web
Andy Pavlo April 13, 2015April 13, 2015April 13, 2015 NewS QL.
The open source database you’ll never outgrow Big Data. Fast Data. June 2011 Ryan Betts, VoltDB Engineering
The NewSQL database you’ll never outgrow Taming the Big Data Fire Hose John Hugg Sr. Software Engineer, VoltDB.
Management Information Systems, Sixth Edition
SQL Server Replication
Chapter 13 (Web): Distributed Databases
NoSQL Databases: MongoDB vs Cassandra
Fundamentals, Design, and Implementation, 9/e Chapter 12 ODBC, OLE DB, ADO, and ASP.
Overview Distributed vs. decentralized Why distributed databases
Fundamentals, Design, and Implementation, 9/e COS 346 DAY 22.
Chapter 9: The Client/Server Database Environment
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Overview of Database Languages and Architectures.
Chapter 7 Managing Data Sources. ASP.NET 2.0, Third Edition2.
Module 14: Scalability and High Availability. Overview Key high availability features available in Oracle and SQL Server Key scalability features available.
Distributed Databases
Lecture The Client/Server Database Environment
Discover, Master, InfluenceSlide 1 SQL Server Compact Edition and the Entity Framework Rob Sanders Readify.
The Client/Server Database Environment
Cloud Computing for the Enterprise November 18th, This work is licensed under a Creative Commons.
Software Engineer, #MongoDBDays.
CSCI 6962: Server-side Design and Programming
Overview of SQL Server Alka Arora.
IMS 4212: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida Distributed Databases Business needs.
Training Workshop Windows Azure Platform. Presentation Outline (hidden slide): Technical Level: 200 Intended Audience: Developers Objectives (what do.
Practical Database Design and Tuning. Outline  Practical Database Design and Tuning Physical Database Design in Relational Databases An Overview of Database.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Goodbye rows and tables, hello documents and collections.
Lecture Set 14 B new Introduction to Databases - Database Processing: The Connected Model (Using DataReaders)
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
NoSQL Databases Oracle - Berkeley DB. Content A brief intro to NoSQL About Berkeley Db About our application.
Copyright 2006 MySQL AB The World’s Most Popular Open Source Database MySQL Cluster: An introduction Geert Vanderkelen MySQL AB.
Module 11: Programming Across Multiple Servers. Overview Introducing Distributed Queries Setting Up a Linked Server Environment Working with Linked Servers.
H-Store: A Specialized Architecture for High-throughput OLTP Applications Evan Jones (MIT) Andrew Pavlo (Brown) 13 th Intl. Workshop on High Performance.
Applications hitting a wall today with SQL Server Locking/Latching Scale-up Throughput or latency SLA Applications which do not use SQL Server.
Overview – Chapter 11 SQL 710 Overview of Replication
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Architecture.
IN-MEMORY OLTP By Manohar Punna SQL Server Geeks – Regional Mentor, Hyderabad Blogger, Speaker.
1 CS 430 Database Theory Winter 2005 Lecture 2: General Concepts.
1 Biometric Databases. 2 Overview Problems associated with Biometric databases Some practical solutions Some existing DBMS.
ESRI User Conference 2004 ArcSDE. Some Nuggets Setup Performance Distribution Geodatabase History.
Chapter 9 Database Systems © 2007 Pearson Addison-Wesley. All rights reserved.
SQL Server 2005 Implementation and Maintenance Chapter 12: Achieving High Availability Through Replication.
Chapter 5 Introduction To Form Builder. Lesson A Objectives  Display Forms Builder forms in a Web browser  Use a data block form to view, insert, update,
Chapter 1 Database Access from Client Applications.
Lock Tuning. Overview Data definition language (DDL) statements are considered harmful DDL is the language used to access and manipulate catalog or metadata.
CPT-S Advanced Databases 11 Yinghui Wu EME 49.
Introduction to Core Database Concepts Getting started with Databases and Structure Query Language (SQL)
Log Shipping, Mirroring, Replication and Clustering Which should I use? That depends on a few questions we must ask the user. We will go over these questions.
Physical Layer of a Repository. March 6, 2009 Agenda – What is a Repository? –What is meant by Physical Layer? –Data Source, Connection Pool, Tables and.
Introduction to Node.js® Jitendra Kumar Patel Saturday, January 31, 2015.
ISC321 Database Systems I Chapter 2: Overview of Database Languages and Architectures Fall 2015 Dr. Abdullah Almutairi.
Introducing Hekaton The next step in SQL Server OLTP performance Mladen Prajdić
CSCI5570 Large Scale Data Processing Systems
CPT-S 415 Big Data Yinghui Wu EME B45 1.
Introduction to VoltDB
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
UFC #1433 In-Memory tables 2014 vs 2016
The Client/Server Database Environment
Introduction to NewSQL
Agenda VoltDB Technical Overview Comparing VoltDB to Traditional OLTP
SQL 2014 In-Memory OLTP What, Why, and How
Predictive Performance
Taming the Big Data Fire Hose
Introduction of Week 14 Return assignment 12-1
Presentation transcript:

VoltDB: an SQL Developer’s Perspective Tim Callaghan, VoltDB Field Engineer

March 3, 2009 | Agenda VoltDB Technical Overview Comparing VoltDB to Traditional OLTP Migration and Development Q+A 2

March 3, 2009 | About Me VoltDB Field Engineer/Community Advocate Joined VoltDB in September, 2009 Support commercial and community customers Built many examples and POCs Technical background 18 years of Oracle design/development/administration User of many programming languages – database and traditional See last slide for full contact information 3

March 3, 2009 | VoltDB Technical Overview 4

March 3, 2009 | Before I Begin… VoltDB is available in community and commercial editions Runs on Linux + Mac* 5

March 3, 2009 | XX X XX Scaling Traditional OLTP Databases Sharding improves performance but introduces… Management complexity +disjointed backup/recovery and replication +manual effort to re-partition data Application complexity +shard awareness +cross partition joins +cross partition transactions And, each shard still suffers from traditional OLTP performance limitations If you can shard, your application is probably great in VoltDB.

March 3, 2009 | Technical Overview “OLTP Through the Looking Glass” VoltDB avoids the overhead of traditional databases K-safety for fault tolerance - no logging In memory operation for maximum throughput - no buffer management Partitions operate autonomously and single-threaded - no latching or locking Built to horizontally scale X X X X 7

March 3, 2009 | XX X XX Technical Overview – Partitions (1/3) 1 partition per physical CPU core Each physical server has multiple VoltDB partitions Data - Two types of tables Partitioned +Single column serves as partitioning key +Rows are spread across all VoltDB partitions by partition column +Transactional data (high frequency of modification) Replicated +All rows exist within all VoltDB partitions +Relatively static data (low frequency of modification) Code - Two types of work – both ACID Single-Partition +All insert/update/delete operations within single partition +Majority of transactional workload Multi-Partition +CRUD against partitioned tables across multiple partitions +Insert/update/delete on replicated tables

March 3, 2009 | Technical Overview – Partitions (2/3) Single-partition vs. Multi-partition knife 2spoon 3fork Partition knife 2spoon 3fork Partition knife 2spoon 3fork Partition 3 table orders : customer_id (partition key) (partitioned)order_id product_id table products : product_id (replicated)product_name select count(*) from orders where customer_id = 5 single-partition select count(*) from orders where product_id = 3 multi-partition insert into orders (customer_id, order_id, product_id) values (3,303,2) single-partition update products set product_name = ‘spork’ where product_id = 3 multi-partition

March 3, 2009 | Technical Overview – Partitions (3/3) Looking inside a VoltDB partition… Each partition contains data and an execution engine. The execution engine contains a queue for transaction requests. Requests are executed sequentially (single threaded). Work Queue execution engine Table Data Index Data - Complete copy of all replicated tables - Portion of rows (about 1/partitions) of all partitioned tables

March 3, 2009 | Technical Overview – Compiling The database is constructed from The schema (DDL) The work load (Java stored procedures) The Project (users, groups, partitioning) VoltCompiler creates application catalog Copy to servers along with 1.jar and 1.so Start servers CREATE TABLE HELLOWORLD ( HELLO CHAR(15), WORLD CHAR(15), DIALECT CHAR(15), PRIMARY KEY (DIALECT) ); Schema import org.voltdb. * partitionInfo = "HELLOWORLD.DIA singlePartition = true ) public class Insert extends VoltPr public final SQLStmt sql = new SQLStmt("INSERT INTO HELLO public VoltTable[] run( String hel import org.voltdb. * partitionInfo = "HELLOWORLD.DIA singlePartition = true ) public class Insert extends VoltPr public final SQLStmt sql = new SQLStmt("INSERT INTO HELLO public VoltTable[] run( String hel import org.voltdb. * partitionInfo = "HE singlePartition = t public final SQLStmt public VoltTable[] run Stored Procedures <database name='data <schema path='ddl. <partition table=‘ Project.xml

March 3, 2009 | Technical Overview - Transactions All access to VoltDB is via Java stored procedures (Java + SQL) A single invocation of a stored procedure is a transaction (committed on success) Limits round trips between DBMS and application High performance client applications communicate asynchronously with VoltDB SQL

March 3, 2009 | Technical Overview – Clusters/Durability Scalability Increase RAM in servers to add capacity Add servers to increase performance / capacity Consistently measuring 90% of single-node performance increase per additional node High availability K-safety for redundancy Snapshots Scheduled, continuous, on demand Spooling to data warehouse Disaster Recovery/WAN replication (Future) Asynchronous replication

March 3, 2009 | Comparing VoltDB to Traditional OLTP (in no particular order) 14

March 3, 2009 | Asynchronous Communications Client applications communicate asynchronously with VoltDB Stored procedure invocations are placed “on the wire” Responses are pulled from the server Allows a single client application to generate > 100K TPS Our client library will simulate synchronous if needed Traditional salary := get_salary(employee_id); VoltDB callProcedure(asyncCallback, “get_salary”, employee_id); 15

March 3, 2009 | Transaction Control VoltDB does not support client-side transaction control Client applications cannot: +insert into t_colors (color_name) values (‘purple’); +rollback; Stored procedures commit if successful, rollback if failed Client code in stored procedure can call for rollback 16

March 3, 2009 | Interfacing with VoltDB Client applications interface with VoltDB via stored procedures Java stored procedures – Java and SQL No ODBC/JDBC 17

March 3, 2009 | Lack of concurrency Single-threaded execution within partitions (single- partition) or across partitions (multi-partition) No need to worry about locking/dead-locks great for “inventory” type applications +checking inventory levels +creating line items for customers Because of this, transactions execute in microseconds. However, single-threaded comes at a price Other transactions wait for running transaction to complete Don’t do anything crazy in a SP (request web page, send ) Useful for OLTP, not OLAP 18

March 3, 2009 | Throughput vs. Latency VoltDB is built for throughput over latency Latency measured in mid single-digits in a properly sized cluster Do not estimate latency as (1 / TPS) 19

March 3, 2009 | SQL Support SELECT, INSERT (using values), UPDATE, and DELETE Aggregate SQL supports AVG, COUNT, MAX, MIN, SUM Materialized views using COUNT and SUM Hash and Tree Indexes SQL functions and functionality will be added over time, for now I do it in Java Execution plan for all SQL is created at compile time and available for analysis 20

March 3, 2009 | SQL in Stored Procedures SQL can be parameterized, but not dynamic “select * from foo where bar = ?;” (YES) “select * from ? where bar = ?;”(NO)

March 3, 2009 | Connecting to the Cluster Clients connect to one or more nodes in the VoltDB cluster, transactions are forwarded to the correct node Clients are not aware of partitioning strategy In the future we may send back data in the response indicating if the transaction was sent to the correct node.

March 3, 2009 | Schema Changes Traditional OLTP add table… alter table… VoltDB modify schema and stored procedures build catalog deploy catalog V1.0: Add/drop users, stored procedures V1.1: Add/drop tables Future: Add/drop column, …

March 3, 2009 | Table/Index Storage VoltDB is entirely in-memory Cluster must collectively have enough RAM to hold all tables/indexes (k + 1 copies) Even data distribution is important

March 3, 2009 | Migration and Development 25

March 3, 2009 | Migrating to VoltDB 1.Grab your existing DDL, SQL, and stored procedures. 2.Compile your VoltDB Application. 3.Sorry… Review your application as a whole Partitioning is HUGE Everything inside stored procedures SQL requirements

March 3, 2009 | Client Libraries We support Java and C++ natively PHP is via C++/SWIG SWIG can produce many others Long-term roadmap is native PHP, Python, C#, and others. Community has developed Erlang (wire protocol) and Ruby (HTTP/JSON) HTTP/JSON interface also available Easily used by most languages Server can handle ~1,000 requests per second

March 3, 2009 | Getting Data In and Out In: Create simple client application to read flat file and load into table(s). 1 SQL stored procedures can be defined in XML Working on a generic utility for loading Out: Snapshot to flat file Snapshots can be converted to CSV/TSV data Out: EL Special type of tables in VoltDB +export (insert/update/delete) or export only (insert only) +client application reads from buffers and acks when done currently targeting file-system, JDBC coming

March 3, 2009 | Q & A Visit to… Download VoltDB Get sample app code Join the VoltDB community VoltDB user groups: Follow VoltDB on Contact me (Twitter) 29