History-Independent Cuckoo Hashing
Moni Naor, Gil Segev (Weizmann Institute of Science, Israel) and Udi Wieder (Microsoft Research Silicon Valley)
2 Election Day
Elections for class president: each student whispers a vote in Mr. Drew's ear, and Mr. Drew writes down the votes.
Problem: Mr. Drew's notebook leaks sensitive information:
– First student voted for Carol
– Second student voted for Alice
– ...
This may compromise the privacy of the elections.

3 Election Day
What about more involved applications?
– Write-in candidates
– Votes which are subsets or rankings
– ...
A simple solution:
– Lexicographically sorted list of candidates
– Unary counters

4 Learning From History
A simple example: a sorted list (Alice, Bob, Carol)
– Canonical memory representation
– Not really efficient...
The two levels of a data structure:
– "Legitimate" interface
– Memory representation
History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface.
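The sorted-list example can be made concrete. Below is a minimal sketch (the class name `SortedListSet` is illustrative, not from the talk): whatever order the operations arrive in, the memory layout is always the sorted array of the current content, so it is trivially history independent, though insertion costs O(n).

```python
import bisect

class SortedListSet:
    """A trivially history-independent set: the memory layout is always
    the sorted array of the current elements, regardless of the order of
    past operations."""

    def __init__(self):
        self.items = []  # canonical representation: sorted, no duplicates

    def insert(self, x):
        i = bisect.bisect_left(self.items, x)
        if i == len(self.items) or self.items[i] != x:
            self.items.insert(i, x)  # O(n) shift: canonical but inefficient

    def delete(self, x):
        i = bisect.bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x:
            self.items.pop(i)

# Two different histories that lead to the same content end up with
# the same memory representation:
a = SortedListSet()
b = SortedListSet()
for v in ["Carol", "Alice", "Bob"]:
    a.insert(v)
for v in ["Bob", "Carol", "Alice", "Dan"]:
    b.insert(v)
b.delete("Dan")
```

Here `a.items` and `b.items` are both `["Alice", "Bob", "Carol"]`: the notebook no longer reveals who voted first.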

5 Typical Applications
– Incremental cryptography [BGG94, Mic97]
– Voting [MKSW06, MNS07]
– Set comparison & reconciliation [MNS08]
– Computational geometry [BGV08]
– ...

6 Our Contribution
The first history-independent dictionary that simultaneously achieves the following:
– Efficiency:
   Lookup time: O(1) worst case
   Update time: O(1) expected amortized
– Memory utilization: 50% (25% with deletions)
– Strongest notion of history independence
– Simple and fast

7 Notions of History Independence
Micciancio (1997): oblivious trees
– Motivated by incremental cryptography
– Considered only the shape of the trees, not their memory representation
Naor and Teague (2001):
– Memory representation
– Weak & strong history independence

8 Notions of History Independence
Weak history independence: memory is revealed at the end of an activity period.
– Any two sequences of operations S₁ and S₂ that lead to the same content induce the same distribution on the memory representation.
Strong history independence: memory is revealed several times during an activity period.
– Any two sets of breakpoints along S₁ and S₂ with the same content at each breakpoint induce the same distributions on the memory representation at all these points.
– Completely randomizing the memory after each operation is not good enough.

9 Notions of History Independence
Weak & strong history independence are not equivalent:
– WHI for reversible data structures is possible without a canonical representation
– Provable efficiency gaps [BP06] (in restricted models)
We consider strong history independence:
– A canonical representation (up to initial randomness) implies SHI
– The other direction was shown to hold for reversible data structures [HHMPR05]

10 SHI Dictionaries

                          Lookup time      Update time    Memory utilization  Deletions              Practical?
Naor & Teague '01         O(1) expected    O(1) expected  99%                 ? (mem. util. < 50%)   yes
Blelloch & Golovin '07    O(1) worst case  O(1) expected  < 9%                yes                    ?
This work                 O(1) worst case  O(1) expected  > 25% (> 50%)       yes                    yes

11 Our Approach
Cuckoo hashing [PR01]: a simple & practical scheme with worst-case constant lookup time.
Force a canonical representation on cuckoo hashing:
– No significant loss in efficiency
– Avoid rehashing by using a small stash
What happens when the hash functions fail?
– Rehashing is highly problematic in SHI data structures
– All hash functions need to be sampled in advance
– When an item is deleted, we may need to roll back to previous functions
We use a secondary storage to reduce the failure probability exponentially [KMW08].

12 Cuckoo Hashing
Two tables T₁ and T₂ with hash functions h₁ and h₂. Store x in one of T₁[h₁(x)] and T₂[h₂(x)].
Insert(x):
– Greedily insert into T₁ or T₂; if both cells are full, insert into T₁
– Repeat in the other table with the previous occupant (if any)
(Figure: a successful insertion, with elements shifting between T₁ and T₂.)

13 Cuckoo Hashing
Two tables T₁ and T₂ with hash functions h₁ and h₂. Store x in one of T₁[h₁(x)] and T₂[h₂(x)].
Insert(x):
– Greedily insert into T₁ or T₂; if both cells are full, insert into T₁
– Repeat in the other table with the previous occupant (if any)
(Figure: a failed insertion: the displacements run in a cycle and a rehash is required.)
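The insertion procedure on the last two slides can be sketched as follows. This is an illustration, not the authors' code: `max_loop` is an assumed cutoff for detecting the failure case, and the simple modular hash functions in the usage below stand in for the strongly independent ones the analysis requires.

```python
def cuckoo_insert(T1, T2, h1, h2, x, max_loop=100):
    """Cuckoo insertion: greedily use an empty nest if one exists;
    otherwise insert into T1 and repeatedly push the evicted element
    into its cell in the other table. Returns False when the
    displacement chain exceeds max_loop (a rehash, or a stash, is
    then required)."""
    if T1[h1(x)] is None:
        T1[h1(x)] = x
        return True
    if T2[h2(x)] is None:
        T2[h2(x)] = x
        return True
    cur, table = x, 1                    # both full: insert into T1, evicting
    for _ in range(max_loop):
        if table == 1:
            i = h1(cur)
            cur, T1[i] = T1[i], cur      # evict previous occupant of T1[i]
        else:
            i = h2(cur)
            cur, T2[i] = T2[i], cur
        if cur is None:
            return True                  # the chain reached an empty cell
        table = 3 - table                # continue in the other table
    return False                         # likely an unresolvable cycle

def cuckoo_lookup(T1, T2, h1, h2, x):
    # worst-case constant time: exactly two probes
    return T1[h1(x)] == x or T2[h2(x)] == x
```

For example, with `r = 8`, `h1 = lambda x: x % 8` and `h2 = lambda x: (x * 5 + 1) % 8`, the keys 3, 11 and 5 insert successfully, while 3, 11 and 19 cannot coexist (all three compete for the same two cells), so the third insert returns False.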

14 The Cuckoo Graph
A set S ⊆ U of n keys, with h₁, h₂ : U → {1,...,r}:
– Bipartite graph with two sides of size r
– Edge (h₁(x), h₂(x)) for every x ∈ S
S is successfully stored if and only if every connected component has at most one cycle.
Main theorem: if r ≥ (1 + ε)n and h₁, h₂ are log(n)-wise independent, then the failure probability is Θ(1/n).
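The storability condition can be checked directly on the cuckoo graph. A sketch, assuming 0-based cell indexing and an illustrative function name `storable`: union the two endpoint cells of each key's edge, then verify that no connected component has more edges than nodes (a component with as many edges as nodes has exactly one cycle; with fewer, it is a tree).

```python
from collections import Counter

def storable(keys, h1, h2, r):
    """Build the cuckoo graph on the 2r table cells (one edge
    (h1(x), h2(x)) per key x) and check that every connected component
    has at most one cycle, i.e. no more edges than nodes."""
    parent = list(range(2 * r))          # cells 0..r-1 are T1, r..2r-1 are T2

    def find(v):                         # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x in keys:                       # union the two endpoints of each edge
        a, b = find(h1(x)), find(r + h2(x))
        if a != b:
            parent[a] = b

    edges = Counter(find(h1(x)) for x in keys)      # edges per component
    nodes = Counter(find(v) for v in range(2 * r))  # nodes per component
    return all(edges[c] <= nodes[c] for c in edges)
```

For instance, with `h1 = h2 = lambda x: x % 4` and `r = 4`, the keys {0, 4} form one cycle on two cells and are storable, while {0, 4, 8} put three edges on two cells and are not.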

15 The Canonical Representation
Assume that S can be stored using h₁ and h₂. We force a canonical representation on the cuckoo graph; it suffices to consider a single connected component.
Assume that S forms a tree in the cuckoo graph (the typical case):
– One location must be empty
– The choice of the empty location uniquely determines the locations of all elements
Rule: h₁(minimal element) is empty.

16 The Canonical Representation
Assume that S can be stored using h₁ and h₂. We force a canonical representation on the cuckoo graph; it suffices to consider a single connected component.
Assume that S has one cycle:
– Two ways to assign the elements of the cycle
– Each choice uniquely determines the locations of all elements
Rule: the minimal element in the cycle lies in T₁.
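The tree rule can be sketched as code (illustrative names, not the authors' implementation; the unicyclic case, with the minimal cycle element forced into T₁, would be handled analogously): rooting the component at the canonically empty cell h₁(min) forces every key onto the endpoint of its edge farther from the root.

```python
from collections import defaultdict

def canonical_tree_placement(keys, h1, h2, r):
    """Canonical placement for a component whose cuckoo graph is a tree.
    Cells 0..r-1 are T1, r..2r-1 are T2. The cell h1(min(keys)) stays
    empty; rooting the tree there, each key is placed at its endpoint
    farther from the root, which is forced and unique."""
    adj = defaultdict(list)              # cell -> [(neighbour cell, key)]
    for x in keys:
        u, v = h1(x), r + h2(x)
        adj[u].append((v, x))
        adj[v].append((u, x))
    root = h1(min(keys))                 # the canonically empty cell
    placement, seen = {}, {root}
    stack = [root]
    while stack:                         # traverse outwards from the root
        u = stack.pop()
        for v, x in adj[u]:
            if v not in seen:
                seen.add(v)
                placement[v] = x         # x lives at the far endpoint
                stack.append(v)
    return placement                     # cell -> key; root stays empty
```

For example, two keys 1 and 2 with h₁ ≡ 0 and h₂(x) = x − 1 (and r = 4) form a two-edge tree; the rule leaves T₁[0] empty and forces 1 into T₂[0] and 2 into T₂[1].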

17 The Canonical Representation
Updates efficiently maintain the canonical representation.
Insertions:
– New leaf: check whether the new element is smaller than the current minimum
– New cycle, same component, or merging two components: all cases are straightforward
– Update time < size of component = expected (small) constant
Deletions:
– Find the new minimum, split the component, ...
– Requires connecting all elements in the component with a cyclic list
– Memory utilization drops to 25%

18 Rehashing
What if S cannot be stored using h₁ and h₂? This happens with probability 1/n.
Can we simply pick new functions?
– A canonical memory representation implies that all hash functions must be sampled in advance
– Whenever an item is deleted, we need to check whether we can roll back to previous hash functions
– A bad item which is repeatedly inserted and deleted would cause a rehash on every operation!

19 Using a Stash
Whenever an insert fails, put a 'bad' item in a secondary data structure.
– Bad item: the smallest item that belongs to a cycle
– The secondary data structure must itself be SHI
Theorem [KMW08]: Pr[|stash| > s] < n^(-s)
In practice, keeping the stash as a sorted list is probably the best solution:
– Effectively, the query time is constant with (very) high probability
In theory, the stash could be any SHI dictionary with constant lookup time:
– e.g., a deterministic hashing scheme where the elements are rehashed whenever the content changes [AN96, HMP01]
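The stash mechanism can be sketched as follows. This is a simplification: for brevity it stashes whatever item is left over when the displacement loop gives up, whereas the scheme in the talk stashes the smallest item on the offending cycle, which is what the canonical representation requires. The stash is kept as a sorted list so that it too has a canonical layout.

```python
import bisect

def insert_with_stash(T1, T2, h1, h2, stash, x, max_loop=100):
    """Cuckoo insertion backed by a stash (simplified sketch): run the
    displacement loop, and on failure move the leftover item into the
    sorted stash instead of rehashing."""
    if T1[h1(x)] is None:
        T1[h1(x)] = x
        return
    if T2[h2(x)] is None:
        T2[h2(x)] = x
        return
    cur, table = x, 1
    for _ in range(max_loop):
        if table == 1:
            i = h1(cur)
            cur, T1[i] = T1[i], cur
        else:
            i = h2(cur)
            cur, T2[i] = T2[i], cur
        if cur is None:
            return
        table = 3 - table
    bisect.insort(stash, cur)            # stash stays sorted: canonical

def lookup(T1, T2, h1, h2, stash, x):
    # two probes plus a scan of the (w.h.p. tiny) stash
    return T1[h1(x)] == x or T2[h2(x)] == x or x in stash
```

For example, with two cells per table and `h1 = h2 = lambda v: v % 2`, inserting 0, 2 and 4 overflows the two available cells: one of the three keys lands in the stash, and all three remain findable by `lookup`.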

20 Conclusions and Problems
Cuckoo hashing is a robust and flexible hashing scheme, easily 'molded' into a history-independent data structure.
Open problems:
– We don't know how to analyze variants with more than 2 hash functions and/or more than 1 element per bucket (the expected size of a connected component is not constant there)
– Full performance analysis