IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland.

Presentation transcript:

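What is a verifiable database?
  [Diagram] The owner holds a database (a small example table), outsources it to the server, and publishes a digest δ.
  The client sends a SQL query to the server, receives the result with a proof, and checks it against δ (accept or reject).
  An update turns the database into database' and the digest δ into δ'.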

Our contributions
  IntegriDB: a system for verifiable SQL
  Expressive: multidimensional RANGE, JOIN, SUM, COUNT, MAX, MIN, etc.; limited nesting
    Validated on native SQL queries from the TPC-H benchmark
  Dynamic: efficient updates
  Scalable: validated on tables from the TPC-H benchmark (6 million rows)

Example: Query #19 of the TPC-H benchmark

  SELECT SUM(l_extendedprice * (1 - l_discount)) AS revenue
  FROM lineitem, part
  WHERE
    ( p_partkey = l_partkey
      AND p_brand = 'Brand#41'
      AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
      AND l_quantity >= 7 AND l_quantity <= [...]
      AND p_size BETWEEN 1 AND 5
      AND l_shipmode IN ('AIR', 'AIR REG')
      AND l_shipinstruct = 'DELIVER IN PERSON' )
    OR
    ( p_partkey = l_partkey
      AND p_brand = 'Brand#14'
      AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
      AND l_quantity >= 14 AND l_quantity <= [...]
      AND p_size BETWEEN 1 AND [...]
      AND l_shipmode IN ('AIR', 'AIR REG')
      AND l_shipinstruct = 'DELIVER IN PERSON' )
    OR
    ( p_partkey = l_partkey
      AND p_brand = 'Brand#23'
      AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
      AND l_quantity >= 25 AND l_quantity <= [...]
      AND p_size BETWEEN 1 AND [...]
      AND l_shipmode IN ('AIR', 'AIR REG')
      AND l_shipinstruct = 'DELIVER IN PERSON' );

Example: Query #19 of the TPC-H benchmark (the query above)
  Executed on tables with 6 million rows (2.8GB)
  Proof size: 184.16KB
  Verification time: 232ms

Metrics
  [Diagram: owner, server, and client exchanging the database, the digest δ, the SQL query, the result, and the proof]
  setup time
  prover time
  proof size. "Optimal": O(|R|)
  verification time
  update time

Prior solutions
  Generic verifiable computation systems
    Circuit-based VC [PHCM13, BCTV13, CFHKNPZ14]
    RAM-based VC [BFRSBW13, BCGTV13, BCTV14]
    Not practical
  Authenticated data structures
    [Comparison table; the per-cell marks were lost in extraction]
    Columns: Join, Multidim range, Functions, Nested queries, Update
    Rows: Tree-based [YPPK09], Signature-based [PZM09], Multi-range [PPT14], IntegriDB

Multidimensional range queries
  Table T:
    row_ID  student_ID  age  GPA    First_name
    1       747         18   [...]  Alice
    2       781         24   3.3    Bob
    3       715         21   3.0    Cathy
    4       721         20   [...]  David

Multidimensional range queries
  Table T: as above
  SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)
  Result:
    row_ID  student_ID  age  GPA    First_name
    1       747         18   [...]  Alice
    4       721         20   [...]  David

Multidimensional range queries
  Table T: as above
  SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)
  (age, row_ID) tree:
    root:     18, {1,2,3,4}
    internal: 18, {1,4}   21, {2,3}
    leaves:   18, {1}   20, {4}   21, {3}   24, {2}

Multidimensional range queries
  Table T: as above
  SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)
  (age, row_ID) tree:
    root:     18, {1,2,3,4}
    internal: 18, {1,4}   21, {2,3}
    leaves:   18, {1}   20, {4}   21, {3}   24, {2}
  Result of age: {1,4} ∪ {3} = {1,3,4}
  (student_ID, row_ID) tree:
    root:     715, {1,2,3,4}
    internal: 715, {3,4}   747, {1,2}
    leaves:   715, {3}   721, {4}   747, {1}   781, {2}
  Result of student_ID: {4} ∪ {1,2} = {1,2,4}
  Final result: {1,3,4} ∩ {1,2,4} = {1, 4}

Multidimensional range queries (same example as above)
  Not efficient!!
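The per-column lookups and the final intersection in this example can be sketched in Python. This is only an illustration of the plain (non-cryptographic) data flow, not the IntegriDB implementation: the names `TABLE_T`, `Node`, `build`, and `cover` are hypothetical, the row values are the ones read off the two trees, and the GPA column is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Rows of Table T as (row_ID, student_ID, age), read off the slides' trees.
TABLE_T = [(1, 747, 18), (2, 781, 24), (3, 715, 21), (4, 721, 20)]


@dataclass
class Node:
    lo: int                      # smallest key in this subtree
    hi: int                      # largest key in this subtree
    rows: frozenset              # row_IDs of all leaves below this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(pairs):
    """Build a balanced tree over (key, row_ID) pairs, sorted by key."""
    pairs = sorted(pairs)
    if len(pairs) == 1:
        key, row = pairs[0]
        return Node(key, key, frozenset([row]))
    mid = len(pairs) // 2
    left, right = build(pairs[:mid]), build(pairs[mid:])
    return Node(left.lo, right.hi, left.rows | right.rows, left, right)


def cover(node, lo, hi):
    """Return the sets stored at the maximal nodes lying fully inside [lo, hi]."""
    if node is None or node.hi < lo or node.lo > hi:
        return []
    if lo <= node.lo and node.hi <= hi:
        return [node.rows]
    return cover(node.left, lo, hi) + cover(node.right, lo, hi)


age_tree = build([(age, row) for row, _, age in TABLE_T])
sid_tree = build([(sid, row) for row, sid, _ in TABLE_T])

age_cover = cover(age_tree, 17, 22)              # age BETWEEN 17 AND 22
sid_cover = cover(sid_tree, 721, float("inf"))   # student_ID > 720

age_rows = frozenset().union(*age_cover)
sid_rows = frozenset().union(*sid_cover)
result = age_rows & sid_rows
print(sorted(result))  # [1, 4]
```

The covering sets come out as {1,4} and {3} for age and {4} and {1,2} for student_ID, matching the slide. Note that the sets being intersected can be larger than the final answer.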

Multidimensional range queries
  (age, row_ID) tree:
    root:     18, {1,2,3,4}
    internal: 18, {1,4}   21, {2,3}
    leaves:   18, {1}   20, {4}   21, {3}   24, {2}
  Final result: {1,3,4} ∩ {1,2,4} = {1, 4}

Multidimensional range queries
  (age, row_ID) tree, with each set replaced by its accumulator value:
    root:     18, acc({1,2,3,4})
    internal: 18, acc({1,4})   21, acc({2,3})
    leaves:   18, acc({1})   20, acc({4})   21, acc({3})   24, acc({2})
  Final result: {1,3,4} ∩ {1,2,4} = {1, 4}
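To give a feel for what acc(·) stores at each node, here is a toy Python sketch of an exponent-style set accumulator. It is not the bilinear accumulator used by IntegriDB: there are no pairings, the verifier here holds the secret `X` directly, and the modulus `P`, base `G`, and all function names are assumptions made for illustration. It only shows that a set collapses to one short value and that a subset claim can be checked against that value.

```python
from math import prod

P = 2**127 - 1   # toy modulus (a Mersenne prime); assumption for this sketch
G = 3            # toy base; assumption for this sketch
X = 123456789    # secret trapdoor; a real scheme hides it behind public parameters


def acc(values):
    """Accumulate a set as G^(product of (X + v)) mod P."""
    return pow(G, prod(X + v for v in values), P)


def subset_witness(full, part):
    """Witness that `part` is contained in `full`: accumulate the leftover elements."""
    assert set(part) <= set(full)
    return acc(set(full) - set(part))


def verify_subset(acc_full, part, witness):
    """Check that witness^(product of (X + v) for v in part) equals acc(full)."""
    return pow(witness, prod(X + v for v in part), P) == acc_full


node_set = {1, 2, 3, 4}          # the set at the root of the (age, row_ID) tree
digest = acc(node_set)           # one short value instead of the whole set
w = subset_witness(node_set, {1, 4})
print(verify_subset(digest, {1, 4}, w))  # True
```

In a pairing-based accumulator the same kind of relation is checked with public parameters only, so the verifier never needs `X`.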

Join queries
  Table T: as above
  Table T':
    row_ID  student_ID  course_ID
    1       715         ENEE[...]
    2       779         ENEE[...]
    3       781         ENEE340
  SELECT * FROM T JOIN T' ON T.student_ID = T'.student_ID
  Result:
    student_ID  age  course_ID  GPA  First_name
    715         21   ENEE[...]  3.0  Cathy
    781         24   ENEE340    3.3  Bob

Supporting summation
  We extend the bilinear accumulator to support summation.

SQL SUM
  Table T: as in the range-query example
  SELECT SUM(age) FROM T: 83
  (age, age): [tree contents lost in extraction]

SQL functions
  Table T: as above, with an additional column row_ID'
  MAX:
    SELECT MAX(age) FROM A WHERE A.GPA BETWEEN 3.0 AND 3.5
    reduces to: SELECT age FROM A WHERE (A.GPA BETWEEN 3.0 AND 3.5) AND (A.age >= 24)
  COUNT:
    SELECT COUNT(*) FROM A WHERE A.age BETWEEN 17 AND 22: 2
    reduces to: SELECT SUM(row_ID') – SUM(row_ID) FROM A WHERE A.age BETWEEN 17 AND 22
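The two reductions on this slide can be sketched in Python. The sketch assumes that row_ID' equals row_ID + 1, which the slide does not state explicitly, and it uses made-up GPA values for rows 1 and 4; the function names are hypothetical. It shows the arithmetic only, without any proofs.

```python
# (row_ID, age, GPA). Ages follow the slides; the GPAs of rows 1 and 4 are hypothetical.
ROWS = [(1, 18, 3.8), (2, 24, 3.3), (3, 21, 3.0), (4, 20, 2.7)]


def gpa_between(lo, hi):
    """Predicate for A.GPA BETWEEN lo AND hi."""
    return lambda row: lo <= row[2] <= hi


def count_via_sums(rows, pred):
    """COUNT(*) as SUM(row_ID') - SUM(row_ID), assuming row_ID' = row_ID + 1."""
    selected = [r for r in rows if pred(r)]
    sum_shifted = sum(r[0] + 1 for r in selected)   # SUM(row_ID')
    sum_plain = sum(r[0] for r in selected)         # SUM(row_ID)
    return sum_shifted - sum_plain


def check_max(rows, pred, claimed_max):
    """Accept claimed_max iff the range query `pred AND age >= claimed_max`
    returns at least one row and every returned age equals claimed_max."""
    hits = [r for r in rows if pred(r) and r[1] >= claimed_max]
    return bool(hits) and all(r[1] == claimed_max for r in hits)


pred = gpa_between(3.0, 3.5)
print(count_via_sums(ROWS, pred))   # 2
print(check_max(ROWS, pred, 24))    # True
print(check_max(ROWS, pred, 21))    # False
```

Both aggregates are thereby expressed through operations the earlier slides already cover: a sum and a multidimensional range query.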

Nested queries
  Tables T and T': as in the join example
  SELECT COUNT(student_ID) FROM (SELECT * FROM T WHERE age > 19) JOIN (SELECT * FROM T' WHERE row_ID > 1) ON T.student_ID = T'.student_ID
  (T.age, T.student_ID) tree:
    root:     18, {715,721,747,781}
    internal: 18, {721,747}   21, {715,781}
    leaves:   18, {747}   20, {721}   21, {715}   24, {781}
  (T'.row_ID, T'.student_ID) tree:
    root:           1, {715,779,781}
    children:       1, {715}   2, {779,781}
    leaves under 2: 2, {779}   3, {781}

Nested queries (same query and trees as above)
  Final result: COUNT( ({721} ∪ {715,781}) ∩ {779,781} )
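The set expression for this nested query can be evaluated with a few lines of Python. This is a plain illustration without proofs; the variable names are hypothetical, and the covering sets are the ones read off the two trees: the nodes 20, {721} and 21, {715,781} for age > 19, and the node 2, {779,781} for row_ID > 1.

```python
# Covering nodes for T.age > 19 in the (T.age, T.student_ID) tree.
t_age_gt_19 = [{721}, {715, 781}]
# Covering node for T'.row_ID > 1 in the (T'.row_ID, T'.student_ID) tree.
t2_row_gt_1 = [{779, 781}]

left = set().union(*t_age_gt_19)    # student_IDs selected by the inner query on T
right = set().union(*t2_row_gt_1)   # student_IDs selected by the inner query on T'
joined = left & right               # JOIN on student_ID, as a set intersection
count = len(joined)
print(sorted(joined), count)        # [781] 1
```

Treating the join as a set intersection only works when the join column has no duplicates, which is why join with duplicates in nested queries is listed under future work.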

Efficient update
  Updates can be done in logarithmic time.
  See our paper for details.

Implementation
  [Architecture diagram]
  Data owner: IntegriDB (data owner side) → database, ADS, digest
  Client: SQL client → SQL query → IntegriDB client → IntegriDB query
  Server: IntegriDB server ↔ SQL server (subqueries, intermediate results) → result & proof
  Client: IntegriDB client → result or reject

TPC-H #19 (query as in the example above), decomposed into:
  Join query
  Multi-range on part
  Multi-range on lineitem

Performance on query #19
  Tables:
    lineitem: 6 million rows x 16 columns (2.8GB)
    part: 200,000 rows x 9 columns (54MB)
  Results:
    Setup time: [...] s
    Prover time: [...] s
    Verification time: 232ms
    Proof size: 184.16KB
    Update time: 150ms
    Digest size: 256bits
  Disadvantages:
    Slow setup time (one-time cost)
    Slow prover time
  Advantages:
    Small digest size
    Fast verification time
    Small proof size
    Excellent update time

Expressiveness
  IntegriDB supports
    12 out of 22 queries in the TPC-H benchmark
    94% of the queries in the TPC-C benchmark

Future work
  Support a larger class of SQL queries
    Join with duplicates in nested queries
    Aggregations among columns
    Arbitrary nesting
  Improve prover time
    Parallelization
    Better data structures, precomputation
    Faster crypto primitives

Code available at

Comparison to generic VC
  Problems of generic circuit-based VC
    Cannot support updates
    Cannot support nested queries
    All queries must be merged into one circuit
  Problems of generic RAM-based VC
    Compiler of SQL queries introduces high overhead
    All queries must be merged into one RAM program
  Our comparison: only 1 query, no compiler included for generic VC

Sum on a 10-dimensional range query on a 1000 x 10 table:
  SELECT SUM(c1) FROM T WHERE (c1 BETWEEN a1 AND b1) AND ... AND (c10 BETWEEN a10 AND b10)

                     Libsnark (circuit-based) [BCTV13]   SNARKs for C (RAM-based) [BCTV14]   IntegriDB
  setup time         187.96s                             2000s*                              13.878s
  prover time        47.57s                              1000s*                              10.420s
  verification time  8ms                                 10ms*                               112ms
  proof size         288 Bytes                                                               84 KB

  * estimation