Computer Science
Integrity Assurance for Outsourced Databases without DBMS Modification
DBSec 2014
Wei Wei, Ting Yu

Overview
Motivation
o Database outsourcing is a cost-effective solution
o Integrity for outsourced databases
    It has been an active research area for decades
    Existing solutions require modifying the DBMS
    No existing cloud database service supports integrity checking
Our Focus
o Provide integrity assurance for outsourced databases without DBMS modification
Basic Idea
o Build a Merkle Hash Tree (MHT) based Authenticated Data Structure (ADS) per table
o Serialize authentication data into tables with a well-designed format
    Supports highly efficient authentication data retrieval

Database Outsourcing Model
Parties: Data Owner, Database Service Provider (DSP), Clients
o Data Owner to DSP: upload data and authentication data; update data and authentication data
o Clients to DSP: send data queries and authentication data queries
o DSP to Clients: query results, including data and authentication data
Example table stored at the DSP:
    id   col 1   …   col n
    0    Alice   …   1000
    10   Ben     …   2000
    …    …       …   …
    70   Smith   …   4500

System Model
Assumptions
o DSPs are not fully trusted by data owners and clients
o The data owner has a public/private key pair, and the public key is known to all
o The data owner is the only party who can update data
o Public communications go through a secure channel
Attacks from a DSP
o Return incorrect data by tampering with some data
o Return an incomplete result by discarding some data
o Report that data doesn’t exist, or return old data

System Model (cont’d)
Goal
o Provide integrity assurance for outsourced databases without DBMS modification
Design Goals
o Security (Integrity)
    Correctness, completeness, freshness
o Practicability
    Simplicity, flexibility, efficiency

Running Example
A Relational Data Table
    id   col 1   col 2   col 3   col 4   …   col n
    0    Alice   F       20      NC      …
    10   Ben     M       30      NY      …
    20   Cary    F       42      CA      …
    30   Lisa    F       15      CA      …
    40   Kate    F       18      NY      …
    50   Mike    M       24      SC      …
    60   Nancy   F       36      VA      …
    70   Smith   M       12      TA      …   4500

System Design
o Authentication Data Structure
o Identify Authentication Data
o Store Authentication Data
o Extract Authentication Data

Authentication Data Structure
o Signature Aggregation based ADS
o Merkle Hash Tree based ADS
Figure: a Merkle B-tree built over the data table (id, col 1, …, col n; rows Alice, Ben, Cary, Lisa, Kate, Mike, Nancy, Smith). Node entries hold a pointer p_i, a key k_i and a hash h_i = H(h_1 | … | h_f).
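
The hash rule on this slide, h_i = H(h_1 | … | h_f), can be sketched in a few lines. This is only an illustration: SHA-256 as H, plain concatenation for "|", and the simple left-to-right grouping into nodes are assumptions, not details taken from the slides. The fanout must be at least 2.

```python
import hashlib

def H(data: bytes) -> bytes:
    """Hash function H; SHA-256 is an assumption, the slides do not name one."""
    return hashlib.sha256(data).digest()

def node_hash(child_hashes) -> bytes:
    """h_i = H(h_1 | ... | h_f): hash the concatenation of the child hashes."""
    return H(b"".join(child_hashes))

def merkle_root(leaf_hashes, fanout: int) -> bytes:
    """Fold the leaf hashes level by level until a single root hash remains."""
    level = list(leaf_hashes)
    while len(level) > 1:
        level = [node_hash(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]

names = [b"Alice", b"Ben", b"Cary", b"Lisa", b"Kate", b"Mike", b"Nancy", b"Smith"]
root = merkle_root([H(n) for n in names], fanout=3)

# Changing any record changes the root hash, which is how tampering shows up.
tampered = [b"Alice", b"Bob"] + names[2:]
tampered_root = merkle_root([H(n) for n in tampered], fanout=3)
print(root != tampered_root)
```

In a deployment along the lines of the slides, the data owner would sign the root hash, so a client that recomputes the root from returned data can check it against the signature.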

Identify Authentication Data
Existing Approaches
o Adjacency list
    Multiple steps to find an ancestor; no order of pointers or records in a node
o Path enumeration
    No order of pointers or records in a node; inefficient string operations
o Nested set
    Requires joining two tables to find the parent node; hard to find siblings
o Closure table
    Consumes more storage; no order of pointers or records in a node
Our Approach
o Radix-Path Identifier
    Combines radix-based labeling and Dewey labeling

Identify Authentication Data
Figure: Radix-Path Identifier

Identify Authentication Data
Figure: Radix-Path Identifier (r_b = 4)

Identify Authentication Data
Radix-Path Identifier Properties
1. Identifiers in a node are continuous, but not continuous between sibling nodes
2. … = …
3. Min = …, Max = …
4. It is easy to find the index in a node, which is …
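
A small sketch of how such identifiers can be computed. The rule used here (a child entry gets parent * r_b + index) is an assumption inferred from the rpid values in the example tables and from the integer divisions in the SingleJoin queries; the function names are hypothetical.

```python
RB = 4  # radix base r_b, as in the slide example

def child_rpid(parent: int, index: int, rb: int = RB) -> int:
    """Identifier of the entry at position `index` in the child node of `parent`."""
    if not 0 <= index < rb:
        raise ValueError("index must be smaller than the radix base")
    return parent * rb + index

def parent_rpid(rpid: int, rb: int = RB) -> int:
    """Identifier of the parent entry: integer division by the radix base."""
    return rpid // rb

def index_in_node(rpid: int, rb: int = RB) -> int:
    """Position of the entry inside its node: remainder modulo the radix base."""
    return rpid % rb

def path_to_root(rpid: int, height: int, rb: int = RB):
    """rpids on the path from a leaf-level entry up to the root level."""
    path = []
    for _ in range(height):
        path.append(rpid)
        rpid //= rb
    return path

print(path_to_root(41, height=3))   # [41, 10, 2]
print(index_in_node(41))            # 1
```

The values 41, 10 and 2 are the identifiers that appear for the record with id 70 and its ancestors in the example tables.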

Store Authentication Data
SAT: Single Authentication Table
Table data_auth (max level - 2); "?" marks a cell that did not survive extraction:
    id   rpid   hash     level
    0    0      TvJtus   2
    20   1      asdwS    2
    40   2      DFsQ     2
    0    0      Kjdaw    1
    10   1      Ujrw     1
    ?    4      JHds     1
    30   5      iueDs    1
    ?    8      Jdiw     1
    50   9      dkaw     1
    60   10     Udew     1
    0    0      nudg     0
    10   4      Q9ej     0
    20   16     wVi2     0
    30   20     kidDs    0
    40   32     Kdie*    0
    50   ?      dFes     0
    60   40     Iurw     0
    70   41     KJdw     0

Store Authentication Data
SAT: Single Authentication Table
o Pros
    Simple and straightforward
o Cons
    The index is built over all records (inefficient queries)
    Updates could be inefficient
    Concurrent updates may cost more resources

Store Authentication Data
LBAT: Level-based Authentication Table
("?" marks a cell that did not survive extraction)
data_mapping:
    level   table
    2       data_auth0
    1       data_auth1
    0       data
data_auth0 (Level 2):
    id   rpid   hash
    0    0      TvJtus
    20   1      asdwS
    40   2      DFsQ
data_auth1 (Level 1):
    id   rpid   hash
    0    0      Kjdaw
    10   1      Ujrw
    ?    4      JHds
    30   5      iueDs
    ?    8      Jdiw.
    50   9      .dkaw
    60   10     Udew
data (Level 0):
    id   col 1   …   col n   rpid   hash
    0    Alice   …   1000    0      nudg
    10   Ben     …   2000    4      Q9ej
    20   Cary    …   1500    16     wVi2
    30   Lisa    …   3000    20     kidDs
    40   Kate    …   2300    32     Kdie*
    50   Mike    …   ?       ?      dFes
    60   Nancy   …   2300    40     Iurw
    70   Smith   …   4500    41     KJdw

Store Authentication Data
LBAT: Level-based Authentication Table
o Pros
    Indexes are more efficient
    Updates could be more efficient (root split)
    Enables concurrent updates with table-level locks
o Cons
    Multiple authentication tables
    Authentication data generation is relatively complicated

Extract Authentication Data
Retrieval of Authentication Data
Figure: authentication data for 50

Extract Authentication Data
SingleJoin
Tables: data_auth0 (Level 0), data_auth1 (Level 1) and data (Level 2), with the same contents as on the LBAT slide
    select l1.rpid,l1.hash from data t0 left join data l1 on l1.rpid/4 = t0.rpid/(4) where t0.id=50;
    select l1.rpid,l1.hash from data t0 left join data_auth1 l1 on l1.rpid/4 = t0.rpid/(4*4) where t0.id=50;
    select l1.rpid,l1.hash from data t0 left join data_auth0 l1 on l1.rpid/4 = t0.rpid/(4*4*4) where t0.id=50;
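
The three SingleJoin queries can be tried against an in-memory database. The sketch below uses SQLite through Python rather than SQL Server, keeps only the rpid and hash columns of the authentication tables, and assumes rpid 36 for the record with id 50 (inferred as 9 * 4 + 0, since that cell is not legible in the slide text). The hash strings are the placeholder values from the example tables.

```python
import sqlite3

RB = 4  # radix base used on the slide

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table data       (id integer primary key, rpid integer, hash text);
    create table data_auth1 (rpid integer primary key, hash text);
    create table data_auth0 (rpid integer primary key, hash text);
""")
# rpid 36 for id 50 is an inferred value, not one shown on the slide.
conn.executemany("insert into data values (?, ?, ?)", [
    (0, 0, "nudg"), (10, 4, "Q9ej"), (20, 16, "wVi2"), (30, 20, "kidDs"),
    (40, 32, "Kdie*"), (50, 36, "dFes"), (60, 40, "Iurw"), (70, 41, "KJdw"),
])
conn.executemany("insert into data_auth1 values (?, ?)", [
    (0, "Kjdaw"), (1, "Ujrw"), (4, "JHds"), (5, "iueDs"),
    (8, "Jdiw"), (9, "dkaw"), (10, "Udew"),
])
conn.executemany("insert into data_auth0 values (?, ?)", [
    (0, "TvJtus"), (1, "asdwS"), (2, "DFsQ"),
])

def single_join(record_id):
    """Run one join per table, from the leaf table up to the root table."""
    result = []
    divisor = RB
    for table in ("data", "data_auth1", "data_auth0"):
        sql = (f"select l1.rpid, l1.hash from data t0 "
               f"left join {table} l1 on l1.rpid / {RB} = t0.rpid / {divisor} "
               f"where t0.id = ? order by l1.rpid")
        result.append(conn.execute(sql, (record_id,)).fetchall())
        divisor *= RB  # each step up divides the leaf rpid by one more factor of 4
    return result

for rows in single_join(50):
    print(rows)
```

Each query returns all entries of the node on the path, because entries of one node share the same value of rpid / 4 under integer division.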

Extract Authentication Data
RangeCondition
    -- find the rpid of the data record with the id 50
    … AS int;
    … top 1 rpid from data where id=50);
    -- level 2, 1, 0 (from leaf level to root level)
    select rpid,hash from data where … and …
    select rpid,hash from data_auth1 where … and …
    select rpid,hash from data_auth0 where … and …
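
The range conditions themselves are not legible in the slide text. One plausible reading, given as an assumption only, is that each query selects the rpid interval covered by the node on the path, so that the database can use an index on rpid instead of evaluating a division per row. The sketch below computes those intervals; the function names and the exact form of the where clause are hypothetical.

```python
RB = 4  # radix base used on the slides

def node_ranges(leaf_rpid: int, height: int, rb: int = RB):
    """Half-open rpid range [low, high) of the node on the path, leaf level first."""
    ranges = []
    rpid = leaf_rpid
    for _ in range(height):
        low = (rpid // rb) * rb      # first identifier in the node
        ranges.append((low, low + rb))
        rpid //= rb                  # move up to the parent entry
    return ranges

def range_queries(leaf_rpid: int, tables, rb: int = RB):
    """Pair each table (leaf to root) with a parameterized range query."""
    return [(f"select rpid, hash from {t} where rpid >= ? and rpid < ?", r)
            for t, r in zip(tables, node_ranges(leaf_rpid, len(tables), rb))]

# 36 is the assumed rpid of the record with id 50 (it is not shown on the slide).
for sql, params in range_queries(36, ["data", "data_auth1", "data_auth0"]):
    print(sql, params)
```

For rpid 36 the intervals are [36, 40), [8, 12) and [0, 4), which cover the same nodes that the SingleJoin queries return.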

Extract Authentication Data
Figure panels: SingleJoin, RangeCondition

Data Operations
Select
o Unique Select
o Range Select
Update
o Single Record Update
o Batch Update and Optimization
Insert
o Single Record Insert
o Batch Insert and Optimization

Range Select
Steps
o Find the two boundaries
o Retrieve authentication data for the two boundaries
Figure: retrieving authentication data for a range select from 15 to 45 (query range, left boundary, right boundary)
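
Finding the two boundaries can be sketched as below. The assumption, common in Merkle-tree completeness proofs but not spelled out on the slide, is that the boundaries are the nearest records just outside the query range; the function name is hypothetical and the ids are those of the running example.

```python
import bisect

def range_boundaries(sorted_ids, low, high):
    """Return (left boundary, ids inside [low, high], right boundary).

    A boundary is None when the range reaches the end of the table.
    """
    i = bisect.bisect_left(sorted_ids, low)    # first id >= low
    j = bisect.bisect_right(sorted_ids, high)  # first id > high
    left = sorted_ids[i - 1] if i > 0 else None
    right = sorted_ids[j] if j < len(sorted_ids) else None
    return left, sorted_ids[i:j], right

ids = [0, 10, 20, 30, 40, 50, 60, 70]
print(range_boundaries(ids, 15, 45))   # (10, [20, 30, 40], 50)
```

With both boundary records authenticated, a client can check that no record between them was dropped from the result.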

Single Record Update
Steps
o Retrieve authentication data
o Generate update statements
o h updates (h is the tree height)
o Execute the updates in one transaction
Figure: VO for 20, update of 20, VO update for 20
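
A minimal sketch of why one record update leads to h hash updates. It keeps each level as an in-memory dictionary from rpid to hash instead of database tables, numbers the leaf level as 0, and assumes SHA-256 and the parent rule rpid // 4; all function names are hypothetical.

```python
import hashlib

RB = 4  # radix base used on the slides

def H(data: bytes) -> bytes:
    """Hash function; SHA-256 is an assumption."""
    return hashlib.sha256(data).digest()

def node_hash(level: dict, parent_rpid: int) -> bytes:
    """Hash of the child node of `parent_rpid`: H(h_1 | ... | h_f)."""
    children = sorted(r for r in level if r // RB == parent_rpid)
    return H(b"".join(level[r] for r in children))

def build(leaf_records: dict, height: int) -> list:
    """Build all levels (index 0 = leaf level) as dicts rpid -> hash."""
    levels = [{r: H(rec) for r, rec in leaf_records.items()}]
    for _ in range(height - 1):
        below = levels[-1]
        levels.append({p: node_hash(below, p) for p in {r // RB for r in below}})
    return levels

def update_record(levels: list, leaf_rpid: int, new_record: bytes) -> list:
    """Apply an update and return the h hash updates as (level, rpid, hash)."""
    levels[0][leaf_rpid] = H(new_record)
    updates = [(0, leaf_rpid, levels[0][leaf_rpid])]
    rpid = leaf_rpid
    for k in range(1, len(levels)):
        rpid //= RB  # entry on the next level up that covers the changed node
        levels[k][rpid] = node_hash(levels[k - 1], rpid)
        updates.append((k, rpid, levels[k][rpid]))
    return updates

leaves = {0: b"Alice", 4: b"Ben", 16: b"Cary", 20: b"Lisa",
          32: b"Kate", 36: b"Mike", 40: b"Nancy", 41: b"Smith"}
levels = build(leaves, height=3)
updates = update_record(levels, 16, b"Cary-v2")   # rpid 16 is the record with id 20
print([(level, rpid) for level, rpid, _ in updates])
```

In a database setting, each tuple would become one update statement, and the h statements would be sent together in one transaction, as the slide describes.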

Batch Update and Optimization
Update x records
o x * h update statements
o Some of them update the same authentication data record
Example (figure): the number of updates is 4 * 3 = 12, but only 8 updates are actually necessary

Batch Update and Optimization
Optimization: MergeUpdate
o Track all updates to each table
o Find the set of updates to one authentication data record
o Keep the latest one and remove the others
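
The three MergeUpdate steps can be sketched as a de-duplication keyed by table and rpid. The table names follow the LBAT slide; the four leaf rpids, the placeholder hash strings and the function names are hypothetical and chosen only for illustration.

```python
RB = 4  # radix base used on the slides
TABLES = ["data", "data_auth1", "data_auth0"]  # leaf table first, root table last

def path_updates(leaf_rpid: int, version: int):
    """The h updates caused by changing one record: one per table on the path."""
    updates = []
    rpid = leaf_rpid
    for table in TABLES:
        updates.append((table, rpid, f"hash-v{version}"))  # placeholder hash value
        rpid //= RB
    return updates

def merge_updates(updates):
    """Keep only the latest update for each (table, rpid) pair."""
    latest = {}
    for table, rpid, new_hash in updates:
        latest[(table, rpid)] = new_hash  # a later update overwrites an earlier one
    return [(table, rpid, h) for (table, rpid), h in latest.items()]

batch = []
for version, leaf in enumerate([32, 36, 40, 41], start=1):
    batch.extend(path_updates(leaf, version))

merged = merge_updates(batch)
print(len(batch), len(merged))   # 12 8
```

With these four leaf identifiers the counts happen to match the slide's 12 and 8; other choices of records give different savings, depending on how many ancestors they share.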

Experimental Evaluation
System Implementation
o Implementation based on .NET and SQL Server
o Merkle B-tree based on .NET
o MultiJoin, SingleJoin, ZeroJoin and RangeCondition
o Query rewrite algorithm
o A tool to generate authentication data
Experiment Setup
o A synthetic database containing one table with 100,000 records
o Each record is about 1 KB
o .NET 3.5, SQL Server 2008 R2, Windows OS
o Client network with 30 Mbps download and 4 Mbps upload

Experiments: Unique Select
Setup
o Run a select statement on the primary key to retrieve one data record
o Compute the overhead of our scheme under different authentication data retrieval methods
Results
1. RangeCondition is much more efficient than the other methods
2. The overhead of our scheme can be as low as 5%

Experiments: Update
Setup
o Execute batch updates for different numbers of rows under different cases
    D: Direct Update, C: Cached Update, RC: RangeCondition, MU: MergeUpdate
o Compute the overhead of our scheme
Results
1. The overhead of Direct Update can reach 200%
2. The overhead of C-RC ranges from 3% to 30%
3. The overhead of C-RC-MU is about 3%
4. MergeUpdate effectively reduces the number of update statements

Experiments: Scalability
Setup
o Run a range query that selects 512 rows as the number of rows in the data table changes
o Record the time spent to complete the select query
Results
1. The overhead of our scheme is about 2% to 3% in all cases

Experiments: Comparison
Setup
o Run queries to retrieve authentication data as the number of rows in the table increases, for three schemes: our scheme, OPEN-XML and DT-XML
o Record the time spent to retrieve authentication data
Results
1. Our scheme takes about 250 ms in all cases
2. Our scheme is about 100 times faster than the two XML-based schemes

Related Work
Authentication Data Structure
o Verifiable B-tree [Pang et al. ICDE 2004]
o Embedded Merkle B-tree [Hakan et al. SIGMOD 2006]
o Signature aggregation and chaining [Narasimha et al. DASFAA 2006]
Others
o Protect data privacy [Hakan et al. SIGMOD 2002]
o Only handle read-only queries [Sion et al. VLDB 2005]
o Authenticated join processing [Yin et al. SIGMOD 2009]
o Partially materialized digest scheme [Mouratidis et al. VLDB 2009]
Similar to our work
o Probabilistic approaches [Xie et al. VLDB 2007]
o Tamper-Evident Database System [Gerome et al. ASIAN 2005]
o Authenticated Relational Tables [Giuseppe et al. DBSec 2007]

Conclusion
Contributions
o Proposed a novel approach, the Radix-Path Identifier, to serialize/de-serialize MHT-based authentication data
o Explored the efficiency of different methods to retrieve authentication data, and optimized update processing
o Built a proof-of-concept prototype and conducted extensive experiments to evaluate the performance overhead and efficiency

Thank you
Questions?