Foundations of Privacy, Informal Lecture: Anti-Persistence or History Independent Data Structures
Lecturer: Moni Naor

Why hide your history?
Core dumps
Losing your laptop
–The entire memory representation of data structures is exposed
E-mailing files
–The editing history may be exposed (e.g. Word)
Maintaining lists of people
–Sports teams, party invitees

Election Day
Elections for class president:
–Each student whispers in Mr. Drew’s ear
–Mr. Drew writes down the votes
Problem: Mr. Drew’s notebook leaks sensitive information
–First student voted for Carol
–Second student voted for Alice
–…

Learning from history: only what’s necessary
A data structure has:
–A “legitimate” interface: the set of operations allowed to be performed on it
–A memory representation
The memory representation should reveal no information that cannot be obtained from the legitimate interface.

History of history independence
The issue has been studied in both the cryptography and data structures communities.
Micciancio (1997): history-independent trees
–Motivation: incremental cryptography
–Based on the “shape” of the data structure, not including the memory representation
–Stronger performance model!
Uniquely represented data structures
–Treaps (Seidel & Aragon), uniquely represented dictionaries
–Ordered hash tables (Amble & Knuth 1974)

More history
Persistent data structures: it is possible to reconstruct all previous states of the data structure (Sarnak and Tarjan)
–We want the opposite: anti-persistence
Oblivious RAM (Goldreich and Ostrovsky)

Overview
Definitions
History-independent open-addressing hashing
History-independent dynamic perfect hashing
–Memory management (Union-Find)
Open problems

Precise definitions
A data structure is
–history independent: if any two sequences of operations S1 and S2 that yield the same content induce the same probability distribution on the memory representation.
–strongly history independent: if, given any two sets of breakpoints along S1 and S2 such that corresponding points have identical contents, S1 and S2 induce the same probability distributions on the memory representation at those points.
Alternative definition: via transition probabilities.

Relaxations
Statistical closeness
Computational indistinguishability
–Example where this helps: erasing
Allow some information to be leaked
–Total number of operations
–n-history independent: identical distributions if the last n operations were identical as well
Under-defined data structures: the same query can yield several legitimate answers
–e.g., an approximate priority queue
–Define identical content: no suffix T such that the set of permitted results returned by S1∘T differs from the one returned by S2∘T

History independence is easy (sort of)
If it is possible to determine the (lexicographically) “first” sequence of operations that produces a given content, just store the result of that sequence.
This gives a history-independent version of a huge class of data structures.
Efficiency is the problem…

Dictionaries
Operations are insert(x), lookup(x) and possibly delete(x).
The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted).
Elements x ∈ U, for some universe U.
Size of table/memory: N.

Goal
Find a history-independent implementation of dictionaries with good provable performance.
Develop general techniques for history independence.

Approaches
Unique representation, e.g. an array in sorted order
–Yields strong history independence
Secret randomness, e.g. an array in random order
–Yields only (weak) history independence, not strong

Open addressing: traditional version
Each element x has a probe sequence h1(x), h2(x), h3(x), …
–Linear probing: h2(x) = h1(x)+1, h3(x) = h1(x)+2, …
–Double hashing
–Uniform hashing
An element is inserted into the first free cell in its probe sequence.
–A search ends unsuccessfully at a free cell.
Efficient space utilization
–Almost all of the table can be full.
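For reference, the traditional insertion and search rules can be sketched as follows (a minimal Python sketch; the table size, hash function, and helper names are my own, and linear probing stands in for a general probe sequence):

```python
# Traditional open addressing, sketched with linear probing.
# Parameters and names are illustrative, not from the lecture.

N = 11  # table size

def probe_seq(x):
    """Linear probing: h1(x), h1(x)+1, h1(x)+2, ... (mod N)."""
    h1 = hash(x) % N
    for i in range(N):
        yield (h1 + i) % N

def insert(table, x):
    """Insert x into the first free cell of its probe sequence."""
    for cell in probe_seq(x):
        if table[cell] is None:
            table[cell] = x
            return cell
    raise RuntimeError("table full")

def lookup(table, x):
    """A search ends unsuccessfully at the first free cell."""
    for cell in probe_seq(x):
        if table[cell] is None:
            return False
        if table[cell] == x:
            return True
    return False
```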

Open addressing: traditional version (example)
–No clash, so insert y.
–x arrived before y, so move y.
Not history independent: later-inserted elements move further along in their probe sequences.

History-independent version
At each cell i, decide elements’ priorities independently of insertion order; call the priority function pi(x,y).
If there is a clash, move the element of lower priority.
At each cell, the priorities must form a total order.
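A minimal sketch of this insertion rule (Python; the cell representation, the linear probe sequence, and the global “smaller element wins” priority are my own illustrative choices): each cell stores the element together with how far it has probed, so a displaced element can resume its own probe sequence.

```python
N = 7  # small table; assumes it never fills up

def probe(elem, step):
    """Linear probing as the probe sequence h_{step+1}(elem)."""
    return (elem + step) % N

def higher(cell, a, b):
    """A global priority: the smaller element wins every cell
    (any fixed total order per cell would do)."""
    return a < b

def insert_hi(table, x):
    """On a clash, the lower-priority element moves on. Cells hold
    (element, step) pairs so the displaced occupant continues from
    its own position in its probe sequence."""
    elem, step = x, 0
    while True:
        cell = probe(elem, step)
        if table[cell] is None:
            table[cell] = (elem, step)
            return
        occ, occ_step = table[cell]
        if occ == elem:
            return  # already present
        if higher(cell, elem, occ):
            table[cell] = (elem, step)      # displace the occupant,
            elem, step = occ, occ_step + 1  # which probes onward
        else:
            step += 1                       # we move on instead
```

With elements 3, 10 and 17 (all hashing to cell 3), any insertion order leaves the same final configuration.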

Insertion (example)
–p2(x,y)? No, so move x along its probe sequence.

Search
Same as in the traditional algorithm.
–In an unsuccessful search, one can quit as soon as a lower-priority element is found.
No deletions
–Deletion is problematic in open addressing.
–A possible way out: clusters.

Strong history independence
Claim: For all hash functions and priority functions, the final configuration of the table is independent of the order of insertion.
Conclusion: strongly history independent.

Proof of history independence
A static insertion algorithm (clearly history independent), illustrated step by step:
–p1(x2,x1), so insert x2
–p6(x6,x4) and p6(x3,x6), so insert x3
–insert x5
–Gather up the rejects and restart
–p3(x4,x5) and p3(x4,x6), so insert x4 and remove x5

Proof of history independence
Nothing moves further in the static algorithm than in the dynamic one
–By induction on the rounds of the static algorithm.
Vice versa
–By induction on the steps of the dynamic algorithm.
Hence: strongly history independent.
Alternative view [Blelloch-Golovin]: stable matching.

Some priority functions
Global
–A single priority function, independent of the cell
Random
–Choose a random order at each cell
Youth-rules
–Call an element “younger” if it has moved less far along its probe sequence; younger elements get higher priority

Youth-rules (example)
p2(x,y) holds because x has taken fewer steps than y.
Use a tie-breaker if the number of steps is the same.
This is a priority function.
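The comparison itself can be sketched as follows (Python; the fixed order on elements used as a tie-breaker here is my own arbitrary choice, not necessarily the lecture’s):

```python
def youth_rules(cell, a, a_steps, b, b_steps):
    """Youth-rules priority at a cell: the element that has probed
    fewer steps ("younger") wins; ties are broken by a fixed order
    on elements, giving a total order at every cell."""
    if a_steps != b_steps:
        return a_steps < b_steps  # younger (fewer steps) wins
    return a < b                  # illustrative deterministic tie-breaker
```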

Specifying a scheme
Priority rule
–Choice of priority functions
–In Youth-rules: determined by the probe sequence
Probe functions
–How are they chosen, maintained, and computed?

Implementing Youth-rules
Let each hi be chosen from a pairwise-independent collection
–For any two x and y, the random variables hi(x) and hi(y) are uniform and independent.
Let h1, h2, h3, … be chosen independently
–Example: hi(x) = ((ai·x + bi) mod U) mod N, where U is prime
Space: two elements per function
Need only log N functions
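Such a family might be realized as follows (a sketch; the prime, the fixed seed, and the function names are illustrative, and a real deployment would keep the coefficients secret):

```python
import random

def make_probe_functions(num_funcs, N, p=(1 << 61) - 1):
    """h_i(x) = ((a_i * x + b_i) mod p) mod N, with p a prime at least
    the universe size: a pairwise-independent family. Each function
    needs only the two stored values (a_i, b_i), and the functions for
    different probe positions are chosen independently."""
    rng = random.Random(0)  # fixed seed for reproducibility of the sketch
    coeffs = [(rng.randrange(1, p), rng.randrange(p))
              for _ in range(num_funcs)]

    def h(i, x):
        a, b = coeffs[i]
        return ((a * x + b) % p) % N

    return h
```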

Performance analysis
Based on a worst-case insertion sequence.
The important parameter: α, the fraction of the table that is used (α·N elements).
Analysis of expected insertion time and search time (number of probes to the table)
–Have to distinguish successful and unsuccessful search.

Analysis via the static algorithm
For insertions, the total number of probes in the static and dynamic algorithms is identical
–It is easier to analyze the static algorithm.
Key point for Youth-rules: in phase i, all unsettled elements are at the i-th probe of their sequence
–This assures fresh randomness of hi(x).

Performance
For Youth-rules, implemented as specified:
–For any sequence of insertions, the expected probe-time for insertion is at most 1/(1-α).
–For any sequence of insertions, the expected probe-time for successful or unsuccessful search is at most 1/(1-α).
Analysis is based on the static algorithm.
α is the fraction of the table that is used.

Comparison to double hashing
Analysis of double hashing with truly random functions [Guibas & Szemeredi; Lueker & Molodowitch]
–The truly random functions can be replaced by (log n)-wise independent functions [Schmidt & Siegel]
–(log n)-wise independence is relatively expensive: either a lot of space or log n time
Youth-rules is a simple and provably efficient scheme with very little extra storage
–An extra benefit of considering history independence

Other priority functions
[Amble & Knuth]: log(1/(1-α)) for the global rule
–With truly random hash functions
Experiments show about log(1/(1-α)) for most priority functions tried
–Performance is for amortized search

Other types of data structures
Memory management (dealing with pointers)
–Memory allocation
Other state-related issues

Dynamic perfect hashing: the FKS scheme, dynamized
Top-level table: O(n) space; a top-level hash function h maps the n elements into buckets.
Low-level tables: O(n) space in total; bucket i, with si elements, gets a table of size about si².
The hi are perfect on their respective sets.
Rechoose h or some hi to maintain perfection and linear space.
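For orientation, the static FKS construction underlying the scheme can be sketched as follows (Python; this is the history-dependent baseline, not the dynamized or history-independent version, and the constants, seed, and names are my own):

```python
import random

def build_fks(elems, seed=7, p=(1 << 31) - 1):
    """Static FKS sketch: a top-level hash h splits the (distinct)
    elements into n buckets; bucket i with s_i elements gets a table
    of size s_i**2 whose hash h_i is rechosen until it is perfect
    (collision-free). Rejection keeps total low-level space linear."""
    rng = random.Random(seed)
    n = max(1, len(elems))

    def rand_hash(m):
        a, b = rng.randrange(1, p), rng.randrange(p)
        return lambda x: ((a * x + b) % p) % m

    while True:  # rechoose h until the sum of s_i^2 is O(n)
        h = rand_hash(n)
        buckets = [[] for _ in range(n)]
        for x in elems:
            buckets[h(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break

    tables, subhashes = [], []
    for b in buckets:
        if not b:
            tables.append([])
            subhashes.append(None)
            continue
        while True:  # rechoose h_i until it is perfect on its bucket
            hi = rand_hash(len(b) ** 2)
            t = [None] * (len(b) ** 2)
            for x in b:
                if t[hi(x)] is not None:
                    break  # collision: try another h_i
                t[hi(x)] = x
            else:
                tables.append(t)
                subhashes.append(hi)
                break

    def lookup(x):
        i = h(x)
        return subhashes[i] is not None and tables[i][subhashes[i](x)] == x

    return lookup
```

Lookup inspects exactly one top-level and one low-level cell, matching the two-step lookup claimed for the full scheme.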

A subtle problem: the intersection bias problem
Suppose we have:
–a set of states {σ1, σ2, …}
–a set of objects {h1, h2, …}
–a way to decide whether hi is “good” for σj.
Keep a “current” h as the state changes; change h only if it is no longer “good”.
–When changing, choose uniformly from the ones that are “good” for σ.
Then this is not history independent:
–h is biased towards the intersection of those good for the current σ and for previous states.

Dynamized FKS is not history independent
It does not erase upon deletion.
It uses history-dependent memory allocation.
Hash functions (h, h1, h2, …) are changed whenever they cease to be “good”
–Hence they suffer from the intersection bias problem: they are biased towards functions that were “good” for previous sets of elements
–Hence they leak information about past sets of elements

Making it history independent
Use history-independent memory allocation.
Upon deletion, erase the element and rechoose the appropriate hi; this solves the low-level intersection bias problem.
Some other minor changes.
Solve the top-level intersection bias problem…

Solving the top-level intersection bias problem
We can’t afford a top-level rehash on every deletion.
Generate two “potential h”s, η1 and η2, at the beginning
–Always use the first “good” one
–If neither is good, rehash at every deletion
–If not using η1, keep a top-level table for it for easy “goodness” checking (likewise for η2)

Proof of history independence
The table’s state is defined by:
–The current set of elements
–Top-level hash functions: always the first “good” ηi, or rechosen each step
–Low-level hash functions: uniformly chosen from the perfect functions
–Arrangement of sub-tables in memory: use history-independent memory allocation
–Some other history-independent components

Performance
Lookup takes two steps.
Insertion and deletion take expected amortized O(1) time.
–There is a 1/poly chance that they will take more.

SHI and unique representation
Theorem [Hartline et al.]: for a reversible data structure to be SHI (strongly history independent), a canonical (unique) representation for each state must be determined during the data structure’s initialization.

SHI with deletions
Blelloch and Golovin: a dictionary based on linear probing
–Goal: search in O(1) time (guaranteed)
–Each cluster has size O(log n)
–Can be obtained using 5-wise independence [Pagh et al., STOC 2007]
–Needs a ‘random oracle’ for the high-level intersection bias

Open problems
Better analysis for Youth-rules, as well as for other priority functions, with no random oracles.
Efficient memory allocation
–Ours is O(s log s).
Separations
–Between strong and weak history independence [Buchbinder-Petrank]
–Between history-independent and traditional versions, e.g. for Union-Find
Can persistence and (computational) history independence co-exist efficiently?

References
–Moni Naor and Vanessa Teague, Anti-persistence: History Independent Data Structures, STOC 2001.
–Hartline, Hong, Mohr, Pentney and Rocke, Characterizing History Independent Data Structures, Algorithmica, 2005.
–Buchbinder and Petrank, Lower and Upper Bounds on Obtaining History Independence, Information and Computation.
–Guy Blelloch and Daniel Golovin, Strongly History-Independent Hashing with Applications, FOCS 2007.
–Tal Moran, Moni Naor and Gil Segev, Deterministic History-Independent Strategies for Storing Information in Write-Once Memories, ICALP 2007.