Computer Science
Integrity Assurance for Outsourced Databases without DBMS Modification
DBSec 2014
Wei Wei, Ting Yu


Overview
• Motivation
  o Database outsourcing is a cost-effective solution
  o Integrity for outsourced databases
    - It has been an active research area for decades
    - Existing solutions require modifying DBMSs
    - No existing cloud database service supports integrity checking
• Our Focus
  o Provide integrity assurance for outsourced databases without DBMS modification
• Basic Idea
  o Build a Merkle Hash Tree based Authenticated Data Structure per table
  o De-serialize authentication data into tables with a well-designed format
    - Support highly efficient authentication data retrieval

Database Outsourcing Model
[Figure: three parties, the Data Owner, the Database Service Provider (DSP), and Clients, around a sample data table (id, col 1, …, col n). Arrow labels: "Upload Data and Authentication Data"; "Update Data and Authentication Data"; "Send Data Queries and Authentication Data Queries"; "Query Results Including Data and Authentication Data".]

System Model
• Assumptions
  o DSPs are not fully trusted by data owners and clients
  o The data owner has a public/private key pair, and the public key is known to all
  o The data owner is the only party who can update data
  o Public communications are through a secure channel
• Attacks from a DSP
  o Return incorrect data by tampering with some data
  o Return incomplete results by discarding some data
  o Report that data doesn’t exist, or return old data

System Model cont’d
• Goal
  o Provide integrity assurance for outsourced databases without DBMS modification
• Design Goals
  o Security (Integrity)
    - Correctness, completeness, freshness
  o Practicability
    - Simplicity, flexibility, efficiency

Running Example
A Relational Data Table
  id  col 1  col 2  col 3  col 4  …  col n
  0   Alice  F      20     NC     …  1000
  10  Ben    M      30     NY     …  2000
  20  Cary   F      42     CA     …  1500
  30  Lisa   F      15     CA     …  3000
  40  Kate   F      18     NY     …  2300
  50  Mike   M      24     SC     …
  60  Nancy  F      36     VA     …  2300
  70  Smith  M      12     TA     …  4500

System Design
• Authentication Data Structure
• Identify Authentication Data
• Store Authentication Data
• Extract Authentication Data

Authentication Data Structure
• Signature Aggregation based ADS
• Merkle Hash Tree based ADS
[Figure: a Merkle B-tree built over the data table (id, col 1, …, col n; records Alice through Smith). Node entries are labeled p_i, k_i, …, with $h_i = H(h_1 \,|\, \dots \,|\, h_f)$.]
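
The node-hash rule h_i = H(h_1 | … | h_f) can be illustrated with a minimal sketch. It assumes SHA-256 for H and plain byte concatenation for "|"; the slides name neither, and `leaf_hash`, `node_hash`, and `root_hash` are hypothetical helper names, not part of the authors' implementation.

```python
import hashlib

def leaf_hash(record: tuple) -> bytes:
    """Hash one data record (assumed encoding: fields joined by '|')."""
    return hashlib.sha256("|".join(map(str, record)).encode()).digest()

def node_hash(child_hashes: list) -> bytes:
    """h_i = H(h_1 | ... | h_f): hash the concatenation of the child hashes."""
    return hashlib.sha256(b"".join(child_hashes)).digest()

def root_hash(tree) -> bytes:
    """Recursively hash a tree given as nested lists; tuples are data records."""
    if isinstance(tree, tuple):
        return leaf_hash(tree)
    return node_hash([root_hash(child) for child in tree])

# Toy tree (not the slide's tree): nested lists are nodes, tuples are records.
tree = [[(0, "Alice"), (10, "Ben")], [(20, "Cary"), (30, "Lisa")]]
original = root_hash(tree)
tampered = root_hash([[(0, "Alice"), (10, "Bob")], [(20, "Cary"), (30, "Lisa")]])
print(original != tampered)  # a change to any record changes the root hash
```

Because every node hash depends on all hashes beneath it, a client that trusts the root hash (for example, through the data owner's signature) can detect tampering anywhere in the table.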

Identify Authentication Data
• Existing Approaches
  o Adjacency list
    - Multiple steps to find an ancestor, no order of pointers or records in a node
  o Path enumeration
    - No order of pointers or records in a node, inefficient string operations
  o Nested set
    - Requires joining two tables to find the parent node, hard to find siblings
  o Closure table
    - Consumes more storage, no order of pointers or records in a node
• Our Approach
  o Radix-Path Identifier
    - Combines Radix-based labeling and Dewey labeling

Identify Authentication Data
• Radix-Path Identifier

Identify Authentication Data
• Radix-Path Identifier (r_b = 4)

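Identify Authentication Data
• Radix-Path Identifier Properties
  1. Identifiers in a node are continuous, but not continuous between sibling nodes
  2. $rpid_{parent} = \lfloor rpid / r_b \rfloor$
  3. Min = $\lfloor rpid / r_b \rfloor \cdot r_b$, Max = $\lfloor rpid / r_b \rfloor \cdot r_b + (r_b - 1)$
  4. Easy to find the index in a node, which is $rpid \bmod r_b$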
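
A minimal sketch of radix-path arithmetic, assuming the rule "child rpid = parent rpid × r_b + index in node". That rule is inferred from the identifiers in the running example (entry 4 has child 16, entry 10 has children 40 and 41) and from the `rpid/4` expressions in the SingleJoin queries. The function names are hypothetical.

```python
RB = 4  # radix base r_b used in the running example

def child_rpid(parent_rpid: int, index: int, rb: int = RB) -> int:
    """Identifier of the entry at position `index` in the child node of `parent_rpid`."""
    if not 0 <= index < rb:
        raise ValueError("index must be smaller than the radix base")
    return parent_rpid * rb + index

def parent_rpid(rpid: int, rb: int = RB) -> int:
    """Identifier of the parent entry: floor(rpid / r_b)."""
    return rpid // rb

def node_range(rpid: int, rb: int = RB) -> tuple:
    """Smallest and largest identifiers that can appear in the same node as `rpid`."""
    first = (rpid // rb) * rb
    return first, first + rb - 1

def index_in_node(rpid: int, rb: int = RB) -> int:
    """Position of the entry inside its node: rpid mod r_b."""
    return rpid % rb

print(child_rpid(10, 1), parent_rpid(41), node_range(41), index_in_node(41))
```

All four operations are plain integer arithmetic, so they can be evaluated inside ordinary SQL expressions without string handling or extra joins.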

Store Authentication Data
• SAT: Single Authentication Table
data_auth (max level: 2)
  id  rpid  hash    level
  0   0     TvJtus  2
  20  1     asdwS   2
  40  2     DFsQ    2
  0   0     Kjdaw   1
  10  1     Ujrw    1
  20  4     JHds    1
  30  5     iueDs   1
  40  8     Jdiw    1
  50  9     dkaw    1
  60  10    Udew    1
  0   0     nudg    0
  10  4     Q9ej    0
  20  16    wVi     0
  30  20    kidDs   0
  40  32    Kdie*   0
  50  36    dFes    0
  60  40    Iurw    0
  70  41    KJdw    0

Store Authentication Data
• SAT: Single Authentication Table
  o Pros
    - Simple and straightforward
  o Cons
    - Index is built based on all records (inefficient queries)
    - Updates could be inefficient
    - Concurrent updates may cost more resources

Store Authentication Data
• LBAT: Level-based Authentication Table
data_auth0 (Level 2)
  id  rpid  hash
  0   0     TvJtus
  20  1     asdwS
  40  2     DFsQ
data_auth1 (Level 1)
  id  rpid  hash
  0   0     Kjdaw
  10  1     Ujrw
  20  4     JHds
  30  5     iueDs
  40  8     Jdiw
  50  9     dkaw
  60  10    Udew
data (Level 0)
  id  col 1  …  col n  rpid  hash
  0   Alice  …  1000   0     nudg
  10  Ben    …  2000   4     Q9ej
  20  Cary   …  1500   16    wVi2
  30  Lisa   …  3000   20    kidDs
  40  Kate   …  2300   32    Kdie*
  50  Mike   …         36    dFes
  60  Nancy  …  2300   40    Iurw
  70  Smith  …  4500   41    KJdw
data_mapping
  level  table
  2      data_auth0
  1      data_auth1
  0      data

Store Authentication Data
• LBAT: Level-based Authentication Table
  o Pros
    - Indexes are more efficient
    - Updates could be more efficient (root split)
    - Enable concurrent updates with table level lock
  o Cons
    - Multiple authentication tables
    - Relatively complicated for authentication data generation

Extract Authentication Data
[Figure: retrieval of authentication data, showing the authentication data for the record with id 50]

Extract Authentication Data
• SingleJoin
[Tables: data_auth0 (Level 0), data_auth1 (Level 1), and data (Level 2), with the same rows as in the LBAT tables]
    select l1.rpid,l1.hash from data t0 left join data l1 on l1.rpid/4 = t0.rpid/(4) where t0.id=50;
    select l1.rpid,l1.hash from data t0 left join data_auth1 l1 on l1.rpid/4 = t0.rpid/(4*4) where t0.id=50;
    select l1.rpid,l1.hash from data t0 left join data_auth0 l1 on l1.rpid/4 = t0.rpid/(4*4*4) where t0.id=50;
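
The three SingleJoin queries can be tried on the running example with a small sketch. It uses Python's built-in `sqlite3` module in place of SQL Server, which is an assumption of this example, and relies on SQLite performing integer division on integer operands. Only the `id`, `rpid`, and `hash` columns are loaded. The rpid 36 for id 50 and a few ids in `data_auth1` are not legible in the transcript and are assumed here from the radix-path rule. `single_join` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table data       (id integer primary key, rpid integer, hash text);
create table data_auth1 (id integer, rpid integer, hash text);
create table data_auth0 (id integer, rpid integer, hash text);
""")
# Rows follow the running example; rpid 36 for id 50 is assumed (parent rpid 9 * 4 + 0).
conn.executemany("insert into data values (?, ?, ?)", [
    (0, 0, "nudg"), (10, 4, "Q9ej"), (20, 16, "wVi2"), (30, 20, "kidDs"),
    (40, 32, "Kdie*"), (50, 36, "dFes"), (60, 40, "Iurw"), (70, 41, "KJdw")])
conn.executemany("insert into data_auth1 values (?, ?, ?)", [
    (0, 0, "Kjdaw"), (10, 1, "Ujrw"), (20, 4, "JHds"), (30, 5, "iueDs"),
    (40, 8, "Jdiw"), (50, 9, "dkaw"), (60, 10, "Udew")])
conn.executemany("insert into data_auth0 values (?, ?, ?)", [
    (0, 0, "TvJtus"), (20, 1, "asdwS"), (40, 2, "DFsQ")])

def single_join(table: str, divisor: int, key: int) -> list:
    """Return the (rpid, hash) entries of the node in `table` on the path of record `key`."""
    sql = (f"select l1.rpid, l1.hash from data t0 left join {table} l1 "
           f"on l1.rpid/4 = t0.rpid/({divisor}) where t0.id = ?")
    return sorted(conn.execute(sql, (key,)).fetchall())

# One query per level, from the data table up to the root table.
for table, divisor in [("data", 4), ("data_auth1", 4 * 4), ("data_auth0", 4 * 4 * 4)]:
    print(table, single_join(table, divisor, 50))
```

Each query returns all entries of one node on the path from the record to the root, which is the sibling information a client needs to recompute the root hash.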

Extract Authentication Data
• RangeCondition
    -- find the rpid of the data record with the id 50
    declare … AS int;
    set … = (select top 1 rpid from data where id=50);
    -- level 2, 1, 0 (from leaf level to root level)
    select rpid,hash from data where … and …;
    select rpid,hash from data_auth1 where … and …;
    select rpid,hash from data_auth0 where … and …;
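
The range predicates of the RangeCondition queries are not legible in this transcript. As a hedged illustration only, the sketch below assumes that each query selects the rpid interval of one node on the path, computed from the record's rpid with the Min/Max arithmetic of the Radix-Path Identifier. `sibling_range` and `range_queries` are hypothetical names, and the generated SQL is not the authors' exact text.

```python
def sibling_range(leaf_rpid: int, levels_up: int, rb: int = 4) -> tuple:
    """Inclusive rpid interval of the node on the path, `levels_up` levels above the leaf level."""
    first = (leaf_rpid // rb ** (levels_up + 1)) * rb
    return first, first + rb - 1

def range_queries(leaf_rpid: int, tables: list, rb: int = 4) -> list:
    """Build one range query per level table, ordered from leaf level to root level."""
    queries = []
    for levels_up, table in enumerate(tables):
        low, high = sibling_range(leaf_rpid, levels_up, rb)
        queries.append(
            f"select rpid,hash from {table} where rpid >= {low} and rpid <= {high};")
    return queries

# rpid 36 is the assumed identifier of the record with id 50 in the running example.
for query in range_queries(36, ["data", "data_auth1", "data_auth0"]):
    print(query)
```

Under this assumption the bounds are constants once the record's rpid is known, so each query is a plain range scan on `rpid` with no join, which would be consistent with RangeCondition being reported as the fastest retrieval method.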

Extract Authentication Data
• SingleJoin
• RangeCondition

Data Operations
• Select
  o Unique Select
  o Range Select
• Update
  o Single Record Update
  o Batch Update and Optimization
• Insert
  o Single Record Insert
  o Batch Insert and Optimization

Range Select
• Steps
  o Find two boundaries
  o Retrieve authentication data for two boundaries
[Figure: retrieve authentication data for a range select from 15 to 45, marking the query range, the left boundary, and the right boundary]
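
A minimal sketch of the first step, finding the two boundaries. It assumes that the boundaries are the nearest existing records just outside the query range, as is common in Merkle-tree range authentication so that a client can check that nothing inside the range was dropped. The slide's figure is not preserved, so this reading is an assumption, and `find_boundaries` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right

def find_boundaries(sorted_ids: list, low: int, high: int) -> tuple:
    """Return (left, right): the nearest ids just outside [low, high], or None at a table edge."""
    i = bisect_left(sorted_ids, low)    # first position with id >= low
    j = bisect_right(sorted_ids, high)  # first position with id > high
    left = sorted_ids[i - 1] if i > 0 else None
    right = sorted_ids[j] if j < len(sorted_ids) else None
    return left, right

ids = [0, 10, 20, 30, 40, 50, 60, 70]  # primary keys of the running example
print(find_boundaries(ids, 15, 45))    # (10, 50)
```

Authentication data would then be retrieved for the two boundary records, so the returned rows 20, 30, and 40 can be verified as a contiguous run of leaves between them.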

Single Record Update
• Steps
  o Retrieve authentication data
  o Generate update statements
  o h updates (h is the tree height)
  o Execute updates in one transaction
[Figure: VO for 20; VO update for 20; update 20]
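
A minimal sketch of the "generate update statements" step, assuming the LBAT layout of the running example (tables `data`, `data_auth1`, `data_auth0`, r_b = 4) and that each level needs exactly one hash update on the path from the record to the root. The new hash values are placeholders, and `update_statements` is a hypothetical helper.

```python
def update_statements(leaf_rpid: int, new_hashes: list, tables: list, rb: int = 4) -> list:
    """One UPDATE per level: the record's own row, then each ancestor entry up to the root."""
    statements = []
    rpid = leaf_rpid
    for table, new_hash in zip(tables, new_hashes):
        statements.append(f"update {table} set hash = '{new_hash}' where rpid = {rpid};")
        rpid //= rb  # move to the parent entry on the next level up
    return statements

# The record with id 20 has rpid 16 in the running example; the tree height h is 3.
stmts = update_statements(16, ["new_h0", "new_h1", "new_h2"],
                          ["data", "data_auth1", "data_auth0"])
for s in stmts:
    print(s)
```

The three statements touch rpid 16, 4, and 1, which are the rows for id 20 at each level, and would be executed together in one transaction.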

Batch Update and Optimization
• Update x records
  o x * h update statements
  o Some of them update the same authentication data record
[Figure example: the number of updates is 4 * 3 = 12, but only 8 updates are actually necessary]

Batch Update and Optimization
• Optimization – MergeUpdate
  o Track all updates to each table
  o Find the set of updates to one authentication data record
  o Keep the latest one and remove others
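
The MergeUpdate idea can be sketched as a de-duplication keyed on (table, rpid). The batch below is a hypothetical one chosen from the running example (leaf rpids 32, 36, 40, 41 with r_b = 4); it happens to reproduce the 12-to-8 reduction, but it is not necessarily the batch shown in the slide's figure. `merge_updates` is a hypothetical helper.

```python
def merge_updates(updates: list) -> list:
    """Keep only the latest update per (table, rpid); `updates` is in issue order."""
    latest = {}
    for table, rpid, new_hash in updates:
        latest.pop((table, rpid), None)   # discard any earlier update to this row
        latest[(table, rpid)] = new_hash  # re-insert so the newest update goes last
    return [(table, rpid, h) for (table, rpid), h in latest.items()]

tables = ["data", "data_auth1", "data_auth0"]  # leaf level up to root level
updates = []
for i, leaf_rpid in enumerate([32, 36, 40, 41]):
    rpid = leaf_rpid
    for table in tables:
        updates.append((table, rpid, f"hash_after_update_{i}"))
        rpid //= 4  # parent entry on the next level up

merged = merge_updates(updates)
print(len(updates), len(merged))  # 12 8
```

Only the last hash written to each authentication row survives, since it already reflects every earlier change in the batch.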

Experimental Evaluation
• System Implementation
  o Implementation based on .NET and SQL Server
  o Merkle B-tree based on .NET
  o MultiJoin, SingleJoin, ZeroJoin and RangeCondition
  o Query rewrite algorithm
  o A tool to generate authentication data
• Experiment Setup
  o A synthetic database containing one table with 100,000 records
  o Each record is about 1KB
  o .NET 3.5, SQL Server 2008 R2, Windows OS
  o Client network with 30Mbps download and 4Mbps upload

Experiments – Unique Select
• Setup
  o Run a select statement based on the primary key to retrieve one data record
  o Compute the overhead of our scheme when using different authentication data retrieval methods
• Results
  1. RangeCondition is much more efficient than the other methods
  2. The overhead of our scheme could be as low as 5%

Experiments – Update
• Setup
  o Execute batch updates for different numbers of rows based on different cases
    - D – Direct Update, C – Cached Update, RC – RangeCondition, MU – MergeUpdate
  o Compute the overhead of our scheme
• Results
  1. The overhead of Direct Update could go up to 200%
  2. The overhead of C-RC ranges from 3% to 30%
  3. The overhead of C-RC-MU is about 3%
  4. MergeUpdate effectively reduces the number of update statements

Experiments – Scalability
• Setup
  o Run a range query to select 512 rows as the number of rows in the data table changes
  o Record the time spent to complete the select query
• Results
  1. The overhead of our scheme is about 2 to 3% in all cases

Experiments – Comparison
• Setup
  o Run queries to retrieve authentication data as the number of rows in the table increases, for three schemes: our scheme, OPEN-XML and DT-XML
  o Record the time spent to retrieve authentication data
• Results
  1. Our scheme takes about 250ms in all cases
  2. Our scheme is about 100 times faster than the two XML-based schemes

Related Work
• Authentication Data Structure
  o Verifiable B-tree [Pang et al. ICDE 2004]
  o Embedded Merkle B-tree [Hakan et al. SIGMOD 2006]
  o Signature aggregation and chaining [Narasimha et al. DASFAA 2006]
• Others
  o Protect data privacy [Hakan et al. SIGMOD 2002]
  o Only handle read-only queries [Sion et al. VLDB 2005]
  o Authenticated join processing [Yin et al. SIGMOD 2009]
  o Partially materialized digest scheme [Mouratidis et al. VLDB 2009]
• Similar to our work
  o Probabilistic approaches [Xie et al. VLDB 2007]
  o Tamper-Evident Database System [Gerome et al. ASIAN 2005]
  o Authenticated Relational Tables [Giuseppe et al. DBSec 2007]

Conclusion
• Contributions
  o Proposed a novel approach called Radix-Path Identifier to serialize/de-serialize MHT-based authentication data
  o Explored the efficiency of different methods to retrieve authentication data, and optimized the update processing
  o Built a proof-of-concept prototype and conducted extensive experiments to evaluate the performance overhead and efficiency

Thank you
Questions?