History-Independent Cuckoo Hashing. Moni Naor and Gil Segev (Weizmann Institute, Israel); Udi Wieder (Microsoft Research Silicon Valley).

2 Election Day. Elections for class president: each student whispers their vote in Mr. Drew’s ear, and Mr. Drew writes the votes down. Problem: Mr. Drew’s notebook leaks sensitive information (the first student voted for Carol, the second student voted for Alice, …), which may compromise the privacy of the elections.

3 Election Day. [Figure: unary tally counters next to Carol, Alice, and Bob] What about more involved applications? Write-in candidates, votes that are subsets or rankings, …. A simple solution: a lexicographically sorted list of candidates, each with a unary counter.
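A minimal sketch of the sorted-list-with-unary-counters idea, assuming votes are plain candidate names; the function name `canonical_ballot_box` is hypothetical. The point it illustrates is that the stored representation depends only on the multiset of votes, so two different voting orders produce identical records.

```python
from collections import Counter


def canonical_ballot_box(votes):
    """Hypothetical sketch: sorted candidate list with unary counters.

    The result depends only on the multiset of votes, not on the order
    in which they were cast.
    """
    counts = Counter(votes)
    # Candidates in lexicographic order, each followed by one "1" per vote.
    return [(name, "1" * counts[name]) for name in sorted(counts)]


first = canonical_ballot_box(["Carol", "Alice", "Bob", "Alice"])
second = canonical_ballot_box(["Alice", "Bob", "Alice", "Carol"])
print(first == second)  # True: the voting order is not recoverable
print(first)
```

Write-in candidates need no special handling here: a new name simply takes its place in the sorted order.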

4 Learning From History. A simple example: a sorted list. It has a canonical memory representation, but it is not really efficient. The two levels of a data structure: the “legitimate” interface and the memory representation. History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface. [Figure: the sorted list Alice, Bob, Carol]

5 Typical Applications: incremental cryptography [BGG94, Mic97]; voting [MKSW06, MNS07]; set comparison and reconciliation [MNS08]; computational geometry [BGV08]; ...

6 Our Contribution. The first HI dictionary that simultaneously achieves all of the following: efficiency (lookup time O(1) worst case; update time O(1) expected amortized); memory utilization of 50% (25% with deletions); the strongest notion of history independence; simplicity and speed.

7 Notions of History Independence. Micciancio (1997): oblivious trees, motivated by incremental cryptography; only the shape of the trees was considered, not their memory representation. Naor and Teague (2001): the memory representation itself; weak and strong history independence.

8 Notions of History Independence. Weak history independence: memory is revealed at the end of an activity period. Any two sequences of operations S1 and S2 that lead to the same content induce the same distribution on the memory representation. Strong history independence: memory is revealed several times during an activity period. Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint induce the same distributions on the memory representation at all these points. Completely randomizing the memory after each operation is not good enough.
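One way to write the two definitions formally, as a sketch in our own notation (not taken from the slides): $\mathrm{Mem}(S)$ is the random memory representation after executing the operation sequence $S$, $\mathrm{content}(S)$ is the resulting set of stored elements, $S[1..i]$ is the prefix of $S$ of length $i$, and $\overset{d}{=}$ denotes equality in distribution. Both statements quantify over all sequences $S_1, S_2$; the strong version also quantifies over all breakpoint lists $i_1 < \dots < i_k$ and $j_1 < \dots < j_k$.

```latex
\text{Weak HI:}\quad
\mathrm{content}(S_1) = \mathrm{content}(S_2)
\;\Longrightarrow\;
\mathrm{Mem}(S_1) \overset{d}{=} \mathrm{Mem}(S_2)

\text{Strong HI:}\quad
\bigl(\forall \ell \le k:\ \mathrm{content}(S_1[1..i_\ell]) = \mathrm{content}(S_2[1..j_\ell])\bigr)
\;\Longrightarrow\;
\bigl(\mathrm{Mem}(S_1[1..i_1]), \dots, \mathrm{Mem}(S_1[1..i_k])\bigr)
\overset{d}{=}
\bigl(\mathrm{Mem}(S_2[1..j_1]), \dots, \mathrm{Mem}(S_2[1..j_k])\bigr)
```

Because the strong notion constrains the joint distribution across breakpoints, re-randomizing memory after every update fails it: if $S_1$ performs only lookups between two breakpoints while $S_2$ inserts and then deletes an element, the content is the same at both breakpoints, yet $S_1$ shows the same memory twice whereas $S_2$ shows two independent samples.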

9 Notions of History Independence. Weak and strong history independence are not equivalent: WHI for reversible data structures is possible without a canonical representation, and there are provable efficiency gaps [BP06] (in restricted models). We consider strong history independence. A canonical representation (up to the initial randomness) implies SHI; the other direction was shown to hold for reversible data structures [HHMPR05].

10 SHI Dictionaries
                         Memory utilization               Update time     Lookup time
Naor & Teague ’01        99%                              O(1) expected   O(1) expected
Blelloch & Golovin ’07   < 9%                             O(1) expected   O(1) worst case
This work                > 25% (> 50% without deletions)  O(1) expected   O(1) worst case
Deletions / Practical?: ? (mem. util. < 50%)

11 Our Approach. Cuckoo hashing [PR01]: a simple and practical scheme with worst-case constant lookup time. We force a canonical representation on cuckoo hashing with no significant loss in efficiency, and avoid rehashing by using a small stash. What happens when the hash functions fail? Rehashing is highly problematic in SHI data structures: all hash functions need to be sampled in advance, and when an item is deleted we may need to roll back to previous functions. We use secondary storage to reduce the failure probability exponentially [KMW08].

12 Cuckoo Hashing. Tables T1 and T2 with hash functions h1 and h2; store x in one of T1[h1(x)] and T2[h2(x)]. Insert(x): greedily insert x in T1 or T2; if both cells are full, insert it in T1 anyway. Repeat in the other table with the previous occupant (if any). [Figure: successful insertion of X into tables T1 and T2 holding Y, Z, V, W]

13 Cuckoo Hashing. Tables T1 and T2 with hash functions h1 and h2; store x in one of T1[h1(x)] and T2[h2(x)]. Insert(x): greedily insert x in T1 or T2; if both cells are full, insert it in T1 anyway. Repeat in the other table with the previous occupant (if any). [Figure: inserting X into tables T1 and T2 holding Y, U, Z, V fails; a rehash is required]
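The insertion procedure on these two slides can be sketched as follows. This is plain cuckoo hashing, not yet history independent (the layout depends on insertion order). The hash functions are an assumption for illustration (Python's built-in `hash` of a seeded salt and the key), and the eviction cutoff `max_loop` is an arbitrary parameter; on failure the method returns the key left without a cell, which is the point where the slide says a rehash is required.

```python
import random


class CuckooTable:
    """Minimal sketch of standard two-table cuckoo hashing (not history independent).

    Hash functions are hypothetical: Python's hash of (salt, key), with the
    salts drawn from a seeded RNG.
    """

    def __init__(self, r, seed=0, max_loop=None):
        rng = random.Random(seed)
        self.r = r
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))
        self.tables = ([None] * r, [None] * r)  # T1 and T2
        # Evictions allowed before giving up (the "rehash required" case).
        self.max_loop = max_loop if max_loop is not None else 2 * r

    def h(self, i, x):
        """h1 for i == 0, h2 for i == 1."""
        return hash((self.salts[i], x)) % self.r

    def lookup(self, x):
        # Worst-case O(1): x can only be at T1[h1(x)] or T2[h2(x)].
        return self.tables[0][self.h(0, x)] == x or self.tables[1][self.h(1, x)] == x

    def insert(self, x):
        """Return None on success, or the key left without a cell on failure."""
        if self.lookup(x):
            return None
        # Greedy step: take an empty cell in T1 or T2 if there is one.
        for i in (0, 1):
            pos = self.h(i, x)
            if self.tables[i][pos] is None:
                self.tables[i][pos] = x
                return None
        # Both cells are full: put x in T1, then move the evicted previous
        # occupant to its cell in the other table, and so on.
        i = 0
        for _ in range(self.max_loop):
            pos = self.h(i, x)
            x, self.tables[i][pos] = self.tables[i][pos], x
            if x is None:  # the cell we wrote into was empty: done
                return None
            i = 1 - i
        return x


table = CuckooTable(r=16, seed=1)
failures = [y for y in (table.insert(k) for k in range(8)) if y is not None]
print(all(table.lookup(k) or k in failures for k in range(8)))  # True
```

Every stored key sits only at one of its two hash positions, which is what keeps lookups worst-case constant; all the work (and all the risk of failure) is pushed into insertion.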

14 The Cuckoo Graph. A set S ⊆ U containing n keys; h1, h2 : U → {1, ..., r}. Bipartite graph with two vertex sets of size r, with an edge (h1(x), h2(x)) for every x ∈ S. S is successfully stored if and only if every connected component has at most one cycle. Main theorem: if r ≥ (1 + ε)n and h1, h2 are log(n)-wise independent, then the failure probability is Θ(1/n).
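The storability condition can be checked with a union-find pass over the cuckoo graph: a connected component has at most one cycle exactly when its number of edges is at most its number of vertices. A minimal sketch, assuming the hash values are given as 0-based indices in [0, r) (the slide uses 1..r) and that T2's vertices are numbered after T1's; the name `storable` is hypothetical.

```python
def storable(edges, r):
    """edges: list of (h1(x), h2(x)) pairs, each value in range(r).

    Returns True iff every connected component of the bipartite cuckoo
    graph has at most one cycle (edges <= vertices per component).
    """
    parent = list(range(2 * r))  # vertices 0..r-1 are T1, r..2r-1 are T2
    n_vertices = [1] * (2 * r)
    n_edges = [0] * (2 * r)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for u, v in edges:
        a, b = find(u), find(r + v)
        if a != b:  # merge the two components, summing their counts
            parent[b] = a
            n_vertices[a] += n_vertices[b]
            n_edges[a] += n_edges[b]
        n_edges[a] += 1
    return all(n_edges[c] <= n_vertices[c] for c in range(2 * r) if find(c) == c)


print(storable([(0, 0), (0, 1), (1, 1), (1, 0)], r=2))  # one cycle: True
print(storable([(0, 0), (0, 0), (0, 0)], r=1))          # 3 keys, 2 cells: False
```

Two keys with identical hash pairs form a 2-cycle (a double edge), which is still storable: one key goes to T1 and the other to T2.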

15 The Canonical Representation. Assume that S can be stored using h1 and h2. We force a canonical representation on the cuckoo graph; it suffices to consider a single connected component. Assume that S forms a tree in the cuckoo graph (the typical case). Then exactly one location in the component must be empty, and the choice of the empty location uniquely determines the locations of all elements. Rule: the cell T1[h1(minimal element)] is empty. [Figure: a tree component with elements a, b, c, d, e]
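The tree rule can be sketched as a breadth-first search from the cell that must stay empty: every key is placed at the endpoint of its edge that is farther from that cell. Assumptions: keys are comparable, h1 and h2 are passed in as ordinary functions, and the input is known to form a single tree component (this is not checked). The name `canonical_tree_placement` and the `("T1", i)` / `("T2", j)` cell labels are hypothetical.

```python
from collections import defaultdict, deque


def canonical_tree_placement(keys, h1, h2):
    """Sketch: canonical placement for keys forming one tree in the cuckoo graph.

    Rule from the slide: T1[h1(min key)] stays empty. Each key occupies the
    endpoint of its edge farther from that empty cell.
    Returns a dict key -> ("T1", index) or ("T2", index).
    """
    adj = defaultdict(list)  # cell -> list of (key, other endpoint)
    for x in keys:
        a, b = ("T1", h1(x)), ("T2", h2(x))
        adj[a].append((x, b))
        adj[b].append((x, a))
    root = ("T1", h1(min(keys)))  # the designated empty cell
    placement, seen = {}, {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for x, v in adj[u]:
            if v not in seen:  # tree edge from parent u to child v
                seen.add(v)
                placement[x] = v  # key x occupies the child cell
                queue.append(v)
    return placement


h1 = {1: 0, 2: 0, 3: 1}.__getitem__
h2 = {1: 0, 2: 1, 3: 1}.__getitem__
print(canonical_tree_placement([3, 1, 2], h1, h2))
```

Given the empty root, this placement is the only valid one (leaves can only be covered by their single edge, and so on inward), so the result is independent of the order of `keys`, which is the property the canonical representation relies on.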

16 The Canonical Representation. Assume that S can be stored using h1 and h2. We force a canonical representation on the cuckoo graph; it suffices to consider a single connected component. Assume that S has one cycle. There are two ways to assign the elements on the cycle, and each choice uniquely determines the locations of all elements. Rule: the minimal element on the cycle lies in T1. [Figure: a component with one cycle and elements a, b, c, d, e]
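The one-cycle rule can be sketched in three steps: peel off leaves to find the cycle, orient the cycle so that its minimal key sits in T1, then place each remaining (tree) key at the endpoint of its edge farther from the cycle. Assumptions as in the tree case: keys are comparable, h1 and h2 are plain functions, and the input is known to form a single component with exactly one cycle (not checked). The function name and cell labels are hypothetical.

```python
from collections import defaultdict, deque


def canonical_unicyclic_placement(keys, h1, h2):
    """Sketch: canonical placement for keys forming one unicyclic component.

    Rule from the slide: the minimal key on the cycle lies in T1.
    Returns a dict key -> ("T1", index) or ("T2", index).
    """
    adj = defaultdict(list)  # cell -> list of (key, other endpoint)
    for x in keys:
        a, b = ("T1", h1(x)), ("T2", h2(x))
        adj[a].append((x, b))
        adj[b].append((x, a))
    # 1. Repeatedly remove degree-1 cells; the surviving edges form the cycle.
    degree = {v: len(e) for v, e in adj.items()}
    removed = set()  # keys that are not on the cycle
    leaves = deque(v for v, d in degree.items() if d == 1)
    while leaves:
        u = leaves.popleft()
        for x, v in adj[u]:
            if x not in removed:
                removed.add(x)
                degree[u] -= 1
                degree[v] -= 1
                if degree[v] == 1:
                    leaves.append(v)
    cycle_keys = [x for x in keys if x not in removed]
    # 2. Orient the cycle: the minimal cycle key goes to its T1 cell, and
    #    each following cycle key takes the cell the previous one left free.
    m = min(cycle_keys)
    start = ("T1", h1(m))
    placement = {m: start}
    prev, v = m, ("T2", h2(m))
    while v != start:
        x, nxt = next((y, w) for y, w in adj[v] if y not in removed and y != prev)
        placement[x] = v
        prev, v = x, nxt
    # 3. Trees hanging off the cycle: each key goes to the endpoint farther
    #    from the cycle, found by BFS starting from all cycle cells.
    seen = set(placement.values())
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for x, v in adj[u]:
            if v not in seen:
                seen.add(v)
                placement[x] = v
                queue.append(v)
    return placement


h1 = {1: 1, 2: 0, 3: 0, 4: 1, 5: 2}.__getitem__
h2 = {1: 0, 2: 0, 3: 1, 4: 1, 5: 0}.__getitem__
print(canonical_unicyclic_placement([5, 3, 1, 4, 2], h1, h2))
```

In the example, keys 1 to 4 form a 4-cycle and key 5 hangs off it; key 1 is the minimal cycle key, so it is placed in T1. The other orientation of the cycle would put key 1 in T2, which the rule excludes.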

17 The Canonical Representation. Updates efficiently maintain the canonical representation. Insertions: a new leaf (check whether the new element is smaller than the current minimum); a new cycle in the same component; merging two components. All cases are straightforward. The update time is at most the size of the component, which is a (small) constant in expectation. Deletions: find the new minimum, split the component, and so on. This requires connecting all elements in the component with a cyclic list, so the memory utilization drops to 25%. All cases are straightforward.

18 Rehashing. What if S cannot be stored using h1 and h2? This happens with probability Θ(1/n). Can we simply pick new functions? A canonical memory representation implies that all hash functions must be sampled in advance, and whenever an item is deleted we need to check whether we can roll back to previous hash functions. A bad item that is repeatedly inserted and deleted would cause a rehash on every operation!

19 Using a Stash. Whenever an insertion fails, put a “bad” item in a secondary data structure (the stash). The bad item is the smallest item that belongs to a cycle. The secondary data structure must itself be SHI. Theorem [KMW08]: Pr[|stash| > s] < n^(-s). In practice, keeping the stash as a sorted list is probably the best solution; effectively, the query time is constant with (very) high probability. In theory, the stash could be any SHI dictionary with constant lookup time, e.g., a deterministic hashing scheme in which the elements are rehashed whenever the content changes [AN96, HMP01].
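A minimal sketch of the practical option on this slide, a stash kept as a sorted list, together with the rule for choosing which item goes to the stash. The class and function names are hypothetical. Only the logical sequence of stored keys is modeled: it depends only on the set of stashed keys, not on the order of operations. A real SHI implementation would also have to control the physical memory layout, which Python lists do not expose.

```python
import bisect


class SortedStash:
    """Sketch of a stash kept as a sorted list of keys.

    The sequence of stored keys depends only on which keys are stashed,
    not on the order of insertions and deletions.
    """

    def __init__(self):
        self.items = []

    def _index(self, x):
        return bisect.bisect_left(self.items, x)

    def contains(self, x):
        i = self._index(x)
        return i < len(self.items) and self.items[i] == x

    def insert(self, x):
        if not self.contains(x):
            bisect.insort(self.items, x)

    def delete(self, x):
        i = self._index(x)
        if i < len(self.items) and self.items[i] == x:
            del self.items[i]


def choose_bad_item(cycle_keys):
    """Slide's rule: when an insertion fails, stash the smallest key on a cycle."""
    return min(cycle_keys)


a, b = SortedStash(), SortedStash()
for k in [5, 1, 3]:
    a.insert(k)
for k in [3, 9, 5, 1]:
    b.insert(k)
b.delete(9)
print(a.items == b.items)  # True: same set, same logical layout
```

A full lookup would first probe T1[h1(x)] and T2[h2(x)] and only then search the stash; since the stash is almost always tiny, this keeps queries effectively constant time. As an arithmetic illustration of the stated bound, with n = 10^6 and s = 2 it gives Pr[|stash| > 2] < 10^(-12).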

20 Conclusions and Problems. Cuckoo hashing is a robust and flexible hashing scheme that is easily “molded” into a history-independent data structure. Open problems: we don’t know how to analyze variants with more than two hash functions and/or more than one element per bucket, where the expected size of a connected component is not constant; a full performance analysis.
