Authentic Publication The TRUTHSAYER Project Chip Martel Premkumar Devanbu Michael Gertz April Kwong Glen Nuckolls Stuart Stubblebine Department of Computer.

Presentation transcript:

Authentic Publication The TRUTHSAYER Project Chip Martel Premkumar Devanbu Michael Gertz April Kwong Glen Nuckolls Stuart Stubblebine Department of Computer Science, University of California, Davis

Databases Play a Vital Role
1) Commerce: credit card data, finding goods
2) Financial: investment sites
3) Health: treatments, doctors/credentials, drugs
4) Many more

Answering queries. [Figure: a User sends a Query to a Database Server holding the Data and receives Answers.] Concerns about the Server: Integrity? Correct query processing? Performance? Reliability?

Goals: correct and complete answers (with assurance); efficient protocols.

Example Queries: Is credit card number 5543… valid? List all Hong Kong to San Francisco flights. Find digital cameras with 3-5 megapixels and cost < $200. List all bars within one mile of HKU.

What is a Correct Answer? We assume a trusted Data Owner with the official copy of the Database, which defines the "correct answer". Problems with a single Data Owner:
1) May not want, or be able, to answer queries
2) Hard to keep an online DB secure
3) Scalability

Solution: Third-Party Servers. Third-party sites (Publishers) get information from the Data Owner and answer queries. Example: travel sites (Expedia, Travelocity, Orbitz) answer using government airline data (FAA).

Server Replication: Can I trust this server? [Figure: the FAA's Data is replicated to Orbitz, Expedia, and Travelocity.]

Trust Issues: Sites have left out cheaper flights from non-preferred airlines (deliberate). Sites may be corrupted by an outside hacker or an insider. Errors.

Authentic Publication: The TRUTHSAYER project. [Figure: the Owner sends the Data and a Digest of the Data; the Publisher receives a Query and returns an Answer + Verification Object.] Initially for RDB (DBSEC 2000, Jnl. Comp. Sec.); then a General Model for a Variety of Data (Algorithmica, 2004).

Talk Outline: Introduction; Background: Merkle Trees; Range Queries (Multi-attribute Queries); A General Model for Authenticated Data Structures; Conclusion.

Authentic Publication
1) A trusted Owner digests the Data Set and signs the digest.
2) Untrusted Publishers receive the data and the signature.
3) Clients submit queries to untrusted Publishers.
4) Publishers return Answers (A) and Verification Objects (VO).
5) Clients use A + VO to prove the answer is correct and complete.
The protocol is correct and secure.

Verifying answers. The protocol provides: Correctness: returns exactly the elements matching the query. Completeness: returns all elements matching the query. Security: cheating is infeasible. Efficiency: overhead is low. Recall: no signatures are needed beyond the Owner's single signature on the digest.

Merkle hashing a data set. Leaves: the data in some lexical order. One-way hash function h; h_1 = h(d_1). Bottom-up hashing, starting with the data. Root hash value = the digest of the data set.
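The bottom-up hashing described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the choice of SHA-256, the length-prefixed encoding, the "leaf"/"node" tags, and the rule that an unpaired node is promoted unchanged are all assumptions made here.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """One-way hash h. SHA-256 over length-prefixed parts (an assumption)."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.digest()


def merkle_root(sorted_items: list[bytes]) -> bytes:
    """Hash bottom-up: leaves first, then pairs, until one root remains."""
    if not sorted_items:
        raise ValueError("need at least one data item")
    level = [h(b"leaf", item) for item in sorted_items]  # h_i = h(d_i)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(h(b"node", level[i], level[i + 1]))
        if len(level) % 2 == 1:
            parents.append(level[-1])  # unpaired node is promoted unchanged
        level = parents
    return level[0]


# Illustrative data set (values are made up), kept in sorted order.
items = [str(v).encode() for v in sorted([1, 3, 5, 6, 8, 10])]
digest = merkle_root(items)
print(digest.hex())
```

The single value `digest` is what the Owner would sign; any change to any item changes it.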

Merkle Trees. Classic use: prove that a data value d is in the data set. Solves: Is credit card number 5543… valid? But it can also verify all items in a range, e.g. camcorders from $400 to $900.
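The classic membership proof can be sketched as follows: the prover supplies the sibling hash at each level, and the verifier rehashes up to the root. This is a hedged sketch; the helper names (`build_levels`, `prove`, `verify`), the SHA-256 choice, and the power-of-two restriction are assumptions for illustration.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (an assumed instantiation of h)."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.digest()


def build_levels(items: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first. Assumes a power-of-two size."""
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("sketch assumes a power-of-two number of items")
    levels = [[h(b"leaf", item) for item in items]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(b"node", prev[i], prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling hash, sibling_is_left) from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(item: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Rehash from the claimed item to the root and compare with the digest."""
    current = h(b"leaf", item)
    for sibling, sibling_is_left in path:
        if sibling_is_left:
            current = h(b"node", sibling, current)
        else:
            current = h(b"node", current, sibling)
    return current == root


items = [str(v).encode() for v in [1, 3, 5, 6, 8, 10, 12, 15]]  # made-up data
levels = build_levels(items)
root = levels[-1][0]
proof = prove(levels, 3)              # proof that b"6" is in the set
print(verify(b"6", proof, root))      # valid item
print(verify(b"7", proof, root))      # item not in the set
```

The proof holds one hash per tree level, so its size grows with the tree height rather than with the size of the data set.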

Verifying a Range: to show that q = (5, 6, 8) is the answer to 4 < d < 10, the proof uses the lower bound 3, the upper bound 10, and the starred hash values to compute and verify the root hash.

Verifying a Range. Query: 4 < d < 10. Answer: 5, 6, 8 (in practice, key + data). Verification Object: [ ((h(1), 3), (5, 6)), ((8, 10), *) ]
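One way to check a VO shaped like the one above is to walk it recursively: plain values are hashed as leaves, opaque hashes (such as h(1) and the starred value) are used as given, and pairs are hashed together. The sketch below is an assumption-laden illustration rather than the authors' protocol: the nested-tuple encoding, the leaves 12 and 15 hidden behind the starred hash, and the requirement that both boundary elements exist are all choices made here.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (an assumed instantiation of h)."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.digest()


def leaf_hash(value: int) -> bytes:
    return h(b"leaf", str(value).encode())


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"node", left, right)


def fold(vo, revealed: list) -> bytes:
    """Recompute the hash of a VO subtree, recording leaves left to right.

    bytes -> opaque subtree hash (recorded as None)
    int   -> revealed leaf value
    tuple -> (left, right) internal node
    """
    if isinstance(vo, bytes):
        revealed.append(None)
        return vo
    if isinstance(vo, int):
        revealed.append(vo)
        return leaf_hash(vo)
    left, right = vo
    return node_hash(fold(left, revealed), fold(right, revealed))


def verify_range(vo, digest: bytes, lo: int, hi: int):
    """Return the verified answer to lo < d < hi, or None if the VO is bad."""
    revealed = []
    if fold(vo, revealed) != digest:
        return None                      # root hash does not match the digest
    idx = [i for i, v in enumerate(revealed) if v is not None]
    if not idx or idx[-1] - idx[0] + 1 != len(idx):
        return None                      # a hidden subtree sits inside the run
    run = [revealed[i] for i in idx]
    if not (run[0] <= lo and run[-1] >= hi):
        return None                      # boundary elements do not bracket range
    return [v for v in run if lo < v < hi]


# Owner's tree over made-up sorted data: ((1,3),(5,6)) and ((8,10),(12,15)).
L = leaf_hash
root = node_hash(
    node_hash(node_hash(L(1), L(3)), node_hash(L(5), L(6))),
    node_hash(node_hash(L(8), L(10)), node_hash(L(12), L(15))),
)
star = node_hash(L(12), L(15))           # the starred hash value
vo = (((L(1), 3), (5, 6)), ((8, 10), star))
print(verify_range(vo, root, 4, 10))
```

Completeness comes from the contiguity check: a Publisher that hides an answer element behind an opaque hash leaves a gap between the revealed boundary elements, and the VO is rejected even though the root hash still matches.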

Authentic Publication. [Figure labels: Merkle Tree; Hash Digest.]

Security Property: if the Answer and VO are correct, the user accepts.

Security Property: the user accepts an invalid answer only if a specific collision in h is found (provable). That is, a correct VO has h(x, y) = z, where x, y, z are the hash values of tree nodes, and the forged VO uses different x', y' with h(x', y') = z.

Good Features: Proofs are short (size proportional to tree height and answer size). They use hashes, a fast cryptographic operation. Proofs are as easy to compute as finding the answer. No secret keys: the hash function and digests are all public (no insider attack once the data set is digested).

Extensions: we want to handle more complex queries. Find digital cameras with 3-5 megapixels and cost < $200. List all bars within one mile of HKU.

Multi-Attribute Queries: model as a 2-D range query. Find points (x, y) with a < x < b AND c < y < d. [Figure: the query rectangle with corners (a,c), (b,c), (a,d), (b,d); axes: Cost and Pixels.]

2-Dimensional range tree: leaves are 2D points, i.e. 2 attributes (cost, pixels), sorted by x-value in the X-tree. There is a Y-tree for each internal node.

Searching a 2D-range Tree: find (x, y) with 4 < x < 50 AND 4 < y < 10. All points in the associated Y-trees match the x-range.

Searching a 2D-range Tree: find pairs (x, y) with 4 < x < 50 AND 4 < y < 10. In the X-tree: the subtrees rooted at 5 and 13. Search in the associated Y-trees.

Searching a 2D-range Tree: find (x, y) with 4 < x < 50 AND 4 < y < 10. Answer: (12,5) and (23,8), and the values in 5's Y-tree.

Digesting a 2D-range Tree: digest each Y-tree as a Merkle tree. Each internal node in the X-tree gets the hash of three values: its two children and the associated Y-tree value.
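The three-value hash at each X-tree node can be sketched as below. This is an illustration under stated assumptions: the function names, the SHA-256 choice, the tags, the split-in-half tree shape, and all sample points other than (12,5) and (23,8) are made up here.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (an assumed instantiation of h)."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.digest()


def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Plain Merkle root; an unpaired node is promoted unchanged."""
    level = list(leaf_hashes)
    while len(level) > 1:
        parents = [h(b"node", level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])
        level = parents
    return level[0]


def y_tree_digest(points: list[tuple[int, int]]) -> bytes:
    """Digest the associated Y-tree: the same points, sorted by y."""
    by_y = sorted(points, key=lambda p: (p[1], p[0]))
    return merkle_root([h(b"leaf", f"{y},{x}".encode()) for x, y in by_y])


def x_tree_digest(points: list[tuple[int, int]]) -> bytes:
    """Digest an X-tree node covering `points` (sorted by x, non-empty)."""
    if len(points) == 1:
        x, y = points[0]
        return h(b"point", f"{x},{y}".encode())
    mid = len(points) // 2
    left = x_tree_digest(points[:mid])
    right = x_tree_digest(points[mid:])
    # Internal node: hash of two children and the associated Y-tree value.
    return h(b"xnode", left, right, y_tree_digest(points))


points = sorted([(5, 2), (12, 5), (23, 8), (60, 9)])
print(x_tree_digest(points).hex())
```

Because each Y-tree digest is folded into its X-tree node, the root of the X-tree still serves as a single digest for the whole structure.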

Range Trees: let k be the number of answers (out of n). Search: O(k + log^2 n) time and O(n log n) space; this improves to O(k + log n) time with extra pointers (can still get a hash digest). VO (proof) size is also O(k + log n). Extends to d dimensions (d-attribute queries): search time O(k + log^(d-1) n), VO size the same.

Authenticated Data Structures. Problem: we may want to use a variety of efficient data structures: B-trees (reduce disk access); suffix arrays (string queries); geometric data structures (items within one mile); many more.

Authenticated Data Structures. Solution: a general method to digest a data structure (produce a single summary hash value). Efficient: proof size and construction time = search time. Secure: a similar security property holds; it breaks only with a specific collision in h.

Search DAGs. Our general setting is any data structure modeled by: a labeled Directed Acyclic Graph (DAG); and a search process that visits DAG nodes and determines which neighboring nodes to visit next (based on the labels of visited nodes). This models a wide range of structures.

A Search DAG: search starts at the unique source node s of in-degree zero. Digesting starts from the sinks (here u, v): hash the associated values. [Figure: a DAG on nodes s, a, b, c, u, v.]

A Search DAG. D(u): the digest of u. Node u has data d_u. D(u) = h(d_u), D(v) = h(d_v). [Figure: a DAG on nodes s, a, b, c, u, v.]

A Search DAG: other digests use the node's data and its successors. D(c) = h(d_c, D(v)); D(b) = h(d_b, D(v), D(c)). D(s) is the DAG digest. [Figure: a DAG on nodes s, a, b, c, u, v.]
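The digest rules above amount to a memoized walk from the sinks upward. The sketch below uses the node names from the slides; the edge a → u, the byte strings standing in for node data, and the SHA-256 instantiation of h are assumptions, since the transcript does not spell them out.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (an assumed instantiation of h)."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.digest()


def digest_dag(data: dict[str, bytes],
               successors: dict[str, list[str]]) -> dict[str, bytes]:
    """Compute D(node) = h(d_node, D(succ_1), D(succ_2), ...) for every node."""
    digests: dict[str, bytes] = {}

    def D(node: str) -> bytes:
        if node not in digests:          # each node is hashed only once
            digests[node] = h(data[node], *(D(n) for n in successors[node]))
        return digests[node]

    for node in data:
        D(node)
    return digests


# Placeholder node data; real data would be keys, labels, pointers, etc.
data = {name: f"d_{name}".encode() for name in "sabcuv"}
successors = {
    "s": ["a", "b", "c"],
    "a": ["u"],            # assumed edge, not stated in the transcript
    "b": ["v", "c"],
    "c": ["v"],
    "u": [],
    "v": [],
}
digests = digest_dag(data, successors)
print(digests["s"].hex())  # D(s), the digest of the whole DAG
```

Memoization matters because v is a successor of both b and c; in a DAG a shared node must contribute the same digest on every path.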

Verification for Search DAG: traditional Merkle tree verification is bottom-up (hash path values to the root). We use top-down verification to simulate a correct search. The Owner provides the search procedure P and the root digest D(s).

Authentic Publication. [Figure labels: "DAG, P" and "D(s), P".]

Verification Object for DAG. VO: the information the User needs to reproduce the search (and thus verify answers). "Lines" of the VO match steps of P; each line holds the data of a node and its successor hashes: d_s, D(v_1), D(v_2), … (successors of s); d_{v_1}, D(u_1), D(u_2), … (successors of v_1).

An Example: the search starts at s, then visits b, then v. VO: d_s, D(a), D(b), D(c) (line 1). Check D(s) = h(d_s, D(a), D(b), D(c)); if it holds, we know the data d_s is OK. [Figure: a DAG on nodes s, a, b, c, u, v.]

An Example: the search starts at s, processes d_s, and decides b is next. VO: d_s, D(a), D(b), D(c) [line 1]; d_b, D(v), D(c) [line 2]. If D(b) = h(d_b, D(v), D(c)) (using D(b) from line 1), then the data d_b is correct. [Figure: a DAG on nodes s, a, b, c, u, v.]
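The line-by-line check in this example can be sketched as a loop that always knows which digest the next VO line must hash to. This is a simplified, single-path illustration: the function names, the toy search procedure `P` (a lookup table), the assumed edge a → u, and the SHA-256 instantiation of h are assumptions, not the authors' definitions.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (an assumed instantiation of h)."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.digest()


# Toy DAG with the node names from the slides (edge a -> u is assumed).
DATA = {name: f"d_{name}".encode() for name in "sabcuv"}
SUCC = {"s": ["a", "b", "c"], "a": ["u"], "b": ["v", "c"],
        "c": ["v"], "u": [], "v": []}


def digest(node: str) -> bytes:
    return h(DATA[node], *(digest(n) for n in SUCC[node]))


def vo_line(node: str):
    """Publisher side: one VO line = node data plus successor digests."""
    return (DATA[node], [digest(n) for n in SUCC[node]])


def P(node_data: bytes):
    """Toy search procedure: index of the successor to visit, or None to stop."""
    return {b"d_s": 1, b"d_b": 0}.get(node_data)


def verified_search(vo, root_digest: bytes, procedure):
    """Replay the search top-down; return visited data or None on failure."""
    expected, finished, visited = root_digest, False, []
    for node_data, succ_digests in vo:
        if finished:
            return None                  # VO has lines after the search ended
        if h(node_data, *succ_digests) != expected:
            return None                  # line does not match the known digest
        visited.append(node_data)
        step = procedure(node_data)
        if step is None:
            finished = True
        elif step >= len(succ_digests):
            return None                  # procedure asked for a missing successor
        else:
            expected = succ_digests[step]
    return visited if finished else None


vo = [vo_line("s"), vo_line("b"), vo_line("v")]
print(verified_search(vo, digest("s"), P))
```

The User trusts only D(s) and P. Each accepted line authenticates the digests of the next candidates, which is why the check can run top-down in time proportional to the original search.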

Verified Search The verified computation proceeds until all nodes in the actual search are visited (the VO has one line for each node visited). The correct answer is now returned by search procedure P.

Verified Search: the verified computation takes time proportional to the original search (it visits the same nodes). Security proof: shows that a User accepts a wrong answer only if a specific collision in the hash function h is found, e.g. D(b) = h(d'_b, D'(v), D'(c)).

Updates: typically, digests are updated with work similar to the data structure's update time (e.g. the length of the search paths to the updated items). If updates are frequent, the overall scheme doesn't work well (can use time-stamped digests).
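For a plain Merkle tree, the update cost described above can be sketched as rehashing only the path from the changed leaf to the root. The stored-levels layout, the function names, the power-of-two restriction, and the SHA-256 instantiation of h are assumptions made for this illustration.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (an assumed instantiation of h)."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.digest()


def build_levels(items: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first. Assumes a power-of-two size."""
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("sketch assumes a power-of-two number of items")
    levels = [[h(b"leaf", item) for item in items]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(b"node", prev[i], prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels


def update_leaf(levels: list[list[bytes]], index: int, new_item: bytes) -> int:
    """Replace one leaf in place and rehash its path; return hashes computed."""
    levels[0][index] = h(b"leaf", new_item)
    recomputed = 1
    for depth in range(1, len(levels)):
        index //= 2
        below = levels[depth - 1]
        levels[depth][index] = h(b"node", below[2 * index], below[2 * index + 1])
        recomputed += 1
    return recomputed


items = [str(v).encode() for v in [1, 3, 5, 6, 8, 10, 12, 15]]  # made-up data
levels = build_levels(items)
old_root = levels[-1][0]
work = update_leaf(levels, 3, b"7")     # replace item 6 with 7
print(work, levels[-1][0] != old_root)
```

After such an update the Owner would still need to sign and redistribute the new root digest, which is the part that becomes costly when updates are frequent.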

Generalizations. Allowing multiple Owners: users often want to query data collected from several owners. This can be done, but the user now needs to trust the owners and the data collector. Privacy: VOs may reveal information about the data set; there are methods to conceal the extra data.

Generalizations. I/O-efficient digests/VOs: can use a multi-way tree to store multiple values in one disk block (still logically a binary tree for VO purposes, but stored more efficiently). The top-down search DAG approach may be improved for specific data structures (e.g. 2D range trees).

Generalizations. Collections of structured data: XML documents (can answer path queries). Relational operations (joins, selection, projection). Fancier crypto operations (to reduce VO size).

References
P. Devanbu, M. Gertz, C. Martel, and S. G. Stubblebine. Authentic Third Party Data Publication. 14th IFIP 11.3 Working Conf. in DB Security (DBSec 2000). (Original Authentic Publication paper.)
A General Model for Authenticated Data Structures. Algorithmica, 2004. (Many data structures and the Search DAG; above group and G. Nuckolls.)

References
Certifying Data from Multiple Sources. Proceedings of the 17th Database Security Conference, 2003. (Shows how to use multiple Owners.)
Flexible Authentication of XML Documents. Journal of Computer Security, 2004.

Survey Chapters
Li, Hadjieleftheriou, Kollios, Reyzin. Authenticated Index Structures for Outsourced Databases. (Overview of the area and efficiency issues.)
R. Sion. Towards Secure Data Outsourcing.
Both in: Michael Gertz and Sushil Jajodia (eds.), "Handbook of Database Security: Applications and Trends", Springer, 2007, to appear.

A. Anagnostopoulos, M. Goodrich, R. Tamassia. Persistent Authenticated Dictionaries and Their Applications. (Allows queries of prior DB versions.)
Authenticated Data Structures for Graph and Geometric Searching. (Fancy geometric data structures.)

Pointer for more information

Conclusion: a single signed digest can authenticate answers to many queries. Secure against hackers and insiders. Can handle a wide range of data structures. Efficient protocols: fast query processing and small VOs.

Future Work: better update mechanisms; integration of database optimization methods; actual implementation (partly done by others) and evaluation.