Reconciling Differences: towards a theory of cloud complexity. George Varghese, UCSD, visiting at Yahoo! Labs

Part 1: Reconciling Sets across a Link. Joint work with D. Eppstein, M. Goodrich, F. Uyeda. Appeared in SIGCOMM 2011.

Motivation 1: OSPF Routing (1990). After a partition forms and heals, R1 needs the updates that arrived at R2 during the partition. Must solve the Set-Difference Problem! [Figure: routers R1 and R2; the partition heals.]

Motivation 2: Amazon S3 storage (2007). Synchronizing replicas S1 and S2 with a periodic anti-entropy protocol between replicas. Set-Difference across the cloud again!

What is the Set-Difference problem? What objects are unique to host 1? What objects are unique to host 2? [Figure: Host 1 and Host 2, each holding a set of objects drawn from A through F.]
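As a minimal illustration, the two questions can be answered directly with set operations when both sets sit on one machine. The element placement below is assumed for the example, since the slide's figure did not survive extraction. The rest of Part 1 is about answering the same questions across a link without shipping a whole set.

```python
# Hypothetical contents for the two hosts; the letters are illustrative only.
host1 = {"A", "C", "E", "F"}
host2 = {"A", "B", "D", "F"}

unique_to_host1 = host1 - host2   # objects host 2 is missing
unique_to_host2 = host2 - host1   # objects host 1 is missing

# "d" in the later slides: the total size of the set difference.
difference_size = len(unique_to_host1) + len(unique_to_host2)

print(sorted(unique_to_host1), sorted(unique_to_host2), difference_size)
```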

Use case 1: Data Synchronization. Identify missing data blocks. Transfer blocks to synchronize sets. [Figure: Host 1 and Host 2 exchange the blocks each is missing.]

Use case 2: Data De-duplication. Identify all unique blocks. Replace duplicate data with pointers. [Figure: block sets on Host 1 and Host 2.]

Prior work versus ours
Trade a sorted list of keys.
– Let n be the size of the sets, U the size of the key space
– O(n log U) communication, O(n log n) computation
– Bloom filters can improve to O(n) communication.
Polynomial Encodings (Minsky, Trachtenberg)
– Let d be the size of the difference
– O(d log U) communication, O(dn + d^3) computation
Invertible Bloom Filter (our result)
– O(d log U) communication, O(n + d) computation

Difference Digests
Efficiently solves the set-difference problem. Consists of two data structures:
– Invertible Bloom Filter (IBF): efficiently computes the set difference; needs the size of the difference.
– Strata Estimator: approximates the size of the set difference; uses IBFs as a building block.

IBFs: main idea
– Sum over random subsets: summarize a set by “checksums” over O(d) random subsets.
– Subtract: exchange and subtract checksums.
– Eliminate: hashing for subset choice → common elements disappear after subtraction.
– Invert fast: O(d) equations in d unknowns; randomness allows expected O(d) inversion.

“Checksum” details
Array of IBF cells that form “checksum” words
– For a set difference of size d, use αd cells (α > 1)
Each element ID is assigned to many IBF cells.
Each cell contains:
– idSum: XOR of all IDs assigned to the cell
– hashSum: XOR of hash(ID) of all IDs assigned to the cell
– count: number of IDs assigned to the cell

IBF Encode. Assign each ID to many cells using Hash1, Hash2, Hash3. To “add” an ID A to a cell: idSum ⊕ A, hashSum ⊕ H(A), count++. The IBF has αd cells: not O(n), like Bloom Filters! All hosts use the same hash functions. [Figure: IDs A, B, C hashed into the cells of the IBF.]
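The encode step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: SHA-256 as the hash, integer IDs below 2^64, the helper names `h`, `cells_for`, `new_ibf` and `insert`, and the choice to give each hash function its own region of the table (so an ID never lands in one cell twice) are all choices made for this example.

```python
import hashlib

K = 3  # number of hash functions; the talk reports that 3 or 4 work well


def h(key, salt):
    """64-bit hash of an integer key (SHA-256 is an arbitrary stand-in)."""
    data = salt.to_bytes(2, "big") + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def cells_for(key, num_cells):
    """One cell per hash function, each inside its own region of the table."""
    region = num_cells // K
    return [i * region + h(key, i) % region for i in range(K)]


def new_ibf(num_cells):
    # Each cell is [idSum, hashSum, count].
    return [[0, 0, 0] for _ in range(num_cells)]


def insert(ibf, key):
    for c in cells_for(key, len(ibf)):
        ibf[c][0] ^= key          # idSum
        ibf[c][1] ^= h(key, 99)   # hashSum, using a separate check hash
        ibf[c][2] += 1            # count


ibf = new_ibf(12)  # table size depends on the difference size d, not on n
for key in (101, 202, 303):
    insert(ibf, key)
```

Because every host must map an ID to the same cells, the hash functions and the table size have to be agreed on in advance.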

Invertible Bloom Filters (IBF). Trade IBFs with the remote host. [Figure: Host 1 sends IBF 1; Host 2 sends IBF 2.]

Invertible Bloom Filters (IBF). “Subtract” IBF structures – produces a new IBF containing only unique objects. [Figure: IBF (2 - 1) computed from IBF 2 and IBF 1.]

IBF Subtract 15

Disappearing act
After subtraction, elements common to both sets disappear because:
– Any common element (e.g., W) is assigned to the same cells on both hosts (same hash functions on both sides)
– On subtraction, W XOR W = 0. Thus, W vanishes.
While elements in the set difference remain, they may be randomly mixed → need a decode procedure.
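The disappearing act can be checked with a short sketch. As before, the hash choice, the helper names and the table size of 30 cells are assumptions made for illustration. Subtracting two IBFs cell by cell gives exactly the table obtained by encoding only the elements that differ.

```python
import hashlib

K = 3  # number of hash functions


def h(key, salt):
    """64-bit hash of an integer key (SHA-256 is an arbitrary stand-in)."""
    data = salt.to_bytes(2, "big") + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def encode(keys, num_cells):
    """Build an IBF whose cells are [idSum, hashSum, count]."""
    ibf = [[0, 0, 0] for _ in range(num_cells)]
    region = num_cells // K
    for key in keys:
        for i in range(K):
            c = i * region + h(key, i) % region
            ibf[c][0] ^= key
            ibf[c][1] ^= h(key, 99)
            ibf[c][2] += 1
    return ibf


def subtract(ibf_a, ibf_b):
    """Cell-wise: XOR the idSums and hashSums, subtract the counts."""
    return [[a[0] ^ b[0], a[1] ^ b[1], a[2] - b[2]]
            for a, b in zip(ibf_a, ibf_b)]


common = {11, 22, 33, 44}
diff = subtract(encode(common | {900}, 30), encode(common | {700}, 30))

# The same table results from encoding only the elements that differ.
diff_without_common = subtract(encode({900}, 30), encode({700}, 30))
print(diff == diff_without_common)
```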

IBF Decode. Test for purity: a cell is pure if H(idSum) = hashSum. A cell holding only V passes, since H(V) = H(V). A cell holding V, X and Z fails, since H(V ⊕ X ⊕ Z) ≠ H(V) ⊕ H(X) ⊕ H(Z).

IBF Decode 18

IBF Decode 19

IBF Decode 20
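The decode slides can be summarized in a sketch of the peeling loop: find a pure cell, read off its ID, remove that ID from every cell it was assigned to, and repeat with any cells that became pure. The hash, the helper names and the demo sets are assumptions for illustration. The table is deliberately oversized (300 cells for a difference of 5) so that this toy run decodes reliably; it does not reflect the space overheads reported in the talk.

```python
import hashlib

K = 3  # number of hash functions


def h(key, salt):
    """64-bit hash of an integer key (SHA-256 is an arbitrary stand-in)."""
    data = salt.to_bytes(2, "big") + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def cells_for(key, num_cells):
    region = num_cells // K
    return [i * region + h(key, i) % region for i in range(K)]


def encode(keys, num_cells):
    ibf = [[0, 0, 0] for _ in range(num_cells)]  # [idSum, hashSum, count]
    for key in keys:
        for c in cells_for(key, num_cells):
            ibf[c][0] ^= key
            ibf[c][1] ^= h(key, 99)
            ibf[c][2] += 1
    return ibf


def subtract(ibf_a, ibf_b):
    return [[a[0] ^ b[0], a[1] ^ b[1], a[2] - b[2]]
            for a, b in zip(ibf_a, ibf_b)]


def is_pure(cell):
    """Purity test: count is +1 or -1 and H(idSum) equals hashSum."""
    id_sum, hash_sum, count = cell
    return count in (1, -1) and h(id_sum, 99) == hash_sum


def decode(diff):
    """Peel pure cells; returns (only in A, only in B, fully decoded?)."""
    only_a, only_b = set(), set()
    pending = [i for i, cell in enumerate(diff) if is_pure(cell)]
    while pending:
        i = pending.pop()
        if not is_pure(diff[i]):
            continue  # already emptied by an earlier peel
        key, _, sign = diff[i]
        (only_a if sign == 1 else only_b).add(key)
        for c in cells_for(key, len(diff)):
            diff[c][0] ^= key
            diff[c][1] ^= h(key, 99)
            diff[c][2] -= sign
            if is_pure(diff[c]):
                pending.append(c)
    complete = all(cell == [0, 0, 0] for cell in diff)
    return only_a, only_b, complete


shared = set(range(1000, 1200))
set_a = shared | {7, 8, 9}
set_b = shared | {5, 6}
only_a, only_b, complete = decode(subtract(encode(set_a, 300),
                                           encode(set_b, 300)))
print(sorted(only_a), sorted(only_b), complete)
```

If `complete` comes back false, the table was too small for the actual difference, which is why the size of the difference has to be estimated first.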

How many IBF cells? Overhead α needed to decode at >99%: small differences 1.4x – 2.3x; large differences 1.25x – 1.4x. [Plot: space overhead versus set difference, for hash counts 3 and 4.]

How many hash functions?
– 1 hash function produces many pure cells initially but nothing to undo when an element is removed.
– Many (say 10) hash functions: too many collisions.
– We find by experiment that 3 or 4 hash functions work well. Is there some theoretical reason?

Theory
Let d = difference size, k = # hash functions.
Theorem 1: With (k + 1) d cells, failure probability falls exponentially with k.
– For k = 3, implies a 4x tax on storage, a bit weak.
[Goodrich, Mitzenmacher]: Failure is equivalent to finding a 2-core (loop) in a random hypergraph.
Theorem 2: With c_k d cells, failure probability falls exponentially with k.
– c_4 = 1.3x tax, agrees with experiments

Recall experiments. Large differences: 1.25x – 1.4x overhead to decode at >99%. [Plot: space overhead versus set difference, for hash counts 3 and 4.]

Connection to Coding
Mystery: IBF decode is similar to the peeling procedure used to decode Tornado codes. Why?
Explanation: Set Difference is equivalent to coding with insert-delete channels.
Intuition: Given a code for set A, send checkwords only to B. Think of B as a corrupted form of A.
Reduction: If the code can correct D insertions/deletions, then B can recover A and the set difference.
[Figure: Reed Solomon corresponds to Polynomial Methods; LDPC (Tornado) corresponds to Difference Digest.]

Random Subsets → Fast Elimination. The αd equations are sparse (for example X + Y + Z = .., X = .., Y = ..); pure cells give equations in a single unknown. The system is roughly upper triangular and sparse.

Difference Digests
Consists of two data structures:
– Invertible Bloom Filter (IBF): efficiently computes the set difference; needs the size of the difference.
– Strata Estimator: approximates the size of the set difference; uses IBFs as a building block.

Strata Estimator. Using consistent partitioning, divide the keys into sampled subsets, where subset k contains ~1/2^k of the keys (~1/2, ~1/4, ~1/8, 1/16, ...). Encode each subset into an IBF of small fixed size – log(n) IBFs of ~20 cells each. [Figure: keys A, B, C partitioned into IBF 1 through IBF 4 of the estimator.]

Strata Estimator. Host 1 and Host 2 exchange estimators. Attempt to subtract and decode the IBFs at each level. If level k decodes, then return 2^k × (the number of IDs recovered). [Figure: Estimator 1 and Estimator 2, each with IBF 1 through IBF 4.]
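One way to read the two estimator slides is sketched below. The details are assumptions made for illustration: levels are numbered from 0, so level i samples about 1/2^(i+1) of the keys; decoding proceeds from the sparsest level toward the densest and scales the running count by 2^(i+1) at the first level i that fails; and each level uses 150 cells rather than the ~20 on the slide so that the toy run decodes reliably.

```python
import hashlib

K = 3         # hash functions per IBF
CELLS = 150   # cells per level (demo choice, larger than the slide's ~20)
LEVELS = 16   # number of strata


def h(key, salt):
    """64-bit hash of an integer key (SHA-256 is an arbitrary stand-in)."""
    data = salt.to_bytes(2, "big") + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def cells_for(key):
    region = CELLS // K
    return [i * region + h(key, i) % region for i in range(K)]


def stratum(key):
    """Trailing zero bits of a hash: level i gets about 1/2**(i+1) of keys.
    Every host computes the same level for the same key."""
    x, level = h(key, 77), 0
    while (x & 1) == 0 and level < LEVELS - 1:
        x >>= 1
        level += 1
    return level


def build_estimator(keys):
    strata = [[[0, 0, 0] for _ in range(CELLS)] for _ in range(LEVELS)]
    for key in keys:
        ibf = strata[stratum(key)]
        for c in cells_for(key):
            ibf[c][0] ^= key
            ibf[c][1] ^= h(key, 99)
            ibf[c][2] += 1
    return strata


def is_pure(cell):
    return cell[2] in (1, -1) and h(cell[0], 99) == cell[1]


def peel(diff):
    """Return (number of IDs recovered, whether the IBF fully decoded)."""
    found = 0
    pending = [i for i, cell in enumerate(diff) if is_pure(cell)]
    while pending:
        i = pending.pop()
        if not is_pure(diff[i]):
            continue
        key, _, sign = diff[i]
        found += 1
        for c in cells_for(key):
            diff[c][0] ^= key
            diff[c][1] ^= h(key, 99)
            diff[c][2] -= sign
            if is_pure(diff[c]):
                pending.append(c)
    return found, all(cell == [0, 0, 0] for cell in diff)


def estimate(strata_a, strata_b):
    recovered = 0
    for level in range(LEVELS - 1, -1, -1):   # sparsest level first
        diff = [[a[0] ^ b[0], a[1] ^ b[1], a[2] - b[2]]
                for a, b in zip(strata_a[level], strata_b[level])]
        found, ok = peel(diff)
        if not ok:
            return 2 ** (level + 1) * recovered   # scale up the sample
        recovered += found
    return recovered   # every level decoded, so the count is exact


base = set(range(1, 2001))
same = estimate(build_estimator(base), build_estimator(base))
small = estimate(build_estimator(base | {5000, 5001, 5002}),
                 build_estimator(base))
print(same, small)
```

The estimate then sets the number of cells for the full IBF exchange.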

KeyDiff Service
Promising applications:
– File Synchronization
– P2P file sharing
– Failure Recovery
The Key Service offers applications three calls: Add( key ), Remove( key ), Diff( host1, host2 ).

Difference Digest Summary
Strata Estimator
– Estimates the set difference.
– For 100K sets, a 15KB estimator has <15% error
– O(log n) communication, O(n) computation.
Invertible Bloom Filter
– Identifies all IDs in the set difference.
– 16 to 28 bytes per ID in the set difference.
– O(d) communication, O(n+d) computation
– Worth it if the set difference is < 20% of the set sizes

Connection to Sparse Recovery? If we forget about subtraction, in the end we are recovering a d-sparse vector. Note that the hash check is key for figuring out which cells are pure after differencing. Is there a connection to compressed sensing? Could sensors do the random summing? The hash summing? Connection in the other direction: could compressed sensing be used for differences?

Comparison with Information Theory and Coding
– Worst-case complexity versus average.
– Information theory emphasizes communication complexity, not computation complexity: we focus on both.
– Existence versus constructive: some similar settings (Slepian-Wolf) are existential.
– Estimators: we want bounds based on the difference, and so start by efficiently estimating the difference.

Aside: IBFs in Digital Hardware. A stream of set elements (a, b, x, y) enters logic that reads, hashes and writes. Hash 1, Hash 2 and Hash 3 map to separate banks (Bank 1, Bank 2, Bank 3) for parallelism, at a slight cost in space needed. Decode in software. [Figure also shows a strata hash.]

Part 2: Towards a theory of Cloud Complexity. Complexity of reconciling “similar” objects? [Figure: objects O1, O2, O3.]

Example: Synching Files (X.ppt.v1, X.ppt.v2, X.ppt.v3). Measures: communication bits, computation.

So far: two sets, one link, set difference. Example: {a,b,c} and {d,a,c}.

Mild Sensitivity Analysis: one set much larger than the other. Set A and Set B have a small difference d. Ω(|A|) bits needed, not O(d): Patrascu 2008. Simpler proof: DKS 2011.

Asymmetric set difference in LBFS File System (Mazieres). File A (chunks C1 C2 C3 ... C97 C98 C99) versus Chunk Set B at the server for File B (C1 C5 C3 ... C97 C98 C99): 1 chunk difference. LBFS sends all chunk hashes in File A: O(|A|).

More Sensitivity Analysis: small intersection (database joins). Set A and Set B have a small intersection d. Ω(|A|) bits needed, not O(d): follows from results on hardness of set disjointness.

Sequences under Edit Distance (files, for example). File A: A B C D E F. File B: A C D E F G. Edit distance 2. Insert/delete can renumber all file blocks...

Sequence reconciliation (with J. Ullman). File A: A B C D E F. File B: A C D E F. Edit distance 1. Send 2d+1 piece hashes. Clump unmatched pieces and recurse. O(d log N). [Figure: piece hashes H1, H2, H3.]

21 years of Sequence Reconciliation! Schwartz, Bowdidge, Burkhard (1990): recurse on unmatched pieces, not aggregate. Rsync: widely used tool that breaks file into roughly piece hashes, N is file length. [Photos: UCSD, lunch; Princeton, kids.]

Sets on graphs? [Figure: nodes holding {a,b,c}, {d,c,e}, {b,c,d}, {a,f,g}.]

Generalizes rumor spreading, which has disjoint singleton sets ({a}, {d}, {b}, {g}). CLP10, G11: O( E n log n / conductance)

Generalized Push-Pull (with N. Goyal and R. Kannan). Pick a random edge. Do 2-party set reconciliation. Complexity: C + D, with C as before and D = Σ_i (U – S_i). [Figure: nodes holding {a,b,c}, {d,c,e}, {b,c,d}.]
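The procedure can be simulated in a few lines. This is a sketch under assumptions: the graph must be connected, the three-node path and the node labels are invented for the example, and a plain set union stands in for the 2-party reconciliation (in the talk's setting that step would be a difference-digest exchange). The global union is used only so the simulation knows when to stop.

```python
import random


def push_pull(node_sets, edges, seed=0):
    """Repeat: pick a random edge and reconcile the sets at its endpoints."""
    rng = random.Random(seed)
    sets = {node: set(items) for node, items in node_sets.items()}
    everything = set().union(*sets.values())  # used only to detect the end
    rounds = 0
    while any(s != everything for s in sets.values()):
        u, v = rng.choice(edges)
        merged = sets[u] | sets[v]   # stand-in for 2-party reconciliation
        sets[u], sets[v] = merged, set(merged)
        rounds += 1
    return sets, rounds


# Hypothetical three-node path using the sets shown on the slide.
node_sets = {0: {"a", "b", "c"}, 1: {"d", "c", "e"}, 2: {"b", "c", "d"}}
edges = [(0, 1), (1, 2)]
final_sets, rounds = push_pull(node_sets, edges)
print(rounds, sorted(final_sets[0]))
```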

Sets on Steiner graphs? Only terminals need sets. Push-pull wasteful! [Figure: terminals holding {a} U S and {b} U S, connected through router R1.]

Butterfly example for Sets. Set difference instead of XOR within the network. [Figure: butterfly network with sources S1 and S2, receivers X and Y, and D = Diff(S1,S2) on the shared link.]

How does reconciliation on Steiner graphs relate to network coding?
– Objects in general, not just bits.
– Routers do not need objects but can transform/code objects.
– What transformations within the network allow efficient communication close to the lower bound?

Sequences with d mutations: VM code pages (with Ramjee et al). VM A: A B C D E. VM B: A X C D Y. 2 “errors”. Reconcile Set A = {(A,1),(B,2),(C,3),(D,4),(E,5)} and Set B = {(A,1),(X,2),(C,3),(D,4),(Y,5)}.
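The reduction from sequences to sets can be written out directly with the slide's own example. The helper name `as_position_set` is introduced here for illustration. Pairing each page with its position turns the two mutated pages into a set difference of four pairs.

```python
vm_a = ["A", "B", "C", "D", "E"]
vm_b = ["A", "X", "C", "D", "Y"]


def as_position_set(pages):
    """Pair each page's content with its 1-based position."""
    return {(content, index) for index, content in enumerate(pages, start=1)}


set_a = as_position_set(vm_a)
set_b = as_position_set(vm_b)

only_a = sorted(set_a - set_b, key=lambda pair: pair[1])
only_b = sorted(set_b - set_a, key=lambda pair: pair[1])
mutated_positions = sorted({index for _, index in set_a ^ set_b})

print(only_a, only_b, mutated_positions)
```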

Twist: IBFs for error correction? (with M. Mitzenmacher)
– Write message M[1..n] of n words as the set S = {(M[1], 1), (M[2], 2), ..., (M[n], n)}.
– Calculate IBF(S) and transmit M, IBF(S).
– Receiver uses the received message M’ to find IBF(S’); subtracts from IBF’(S) to locate errors.
– Protect the IBF using Reed-Solomon or redundancy.
– Why: potentially O(e) decoding for e errors -- Raptor codes achieve this for erasure channels.
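The scheme above can be sketched end to end under simplifying assumptions: the IBF itself arrives intact (the slide protects it with Reed-Solomon or redundancy, which is not modeled here), words are 16-bit integers, each (word, position) pair is packed into one integer key, and the table is oversized so that the toy run decodes reliably. All helper names are introduced for this example.

```python
import hashlib

K = 3        # number of hash functions
CELLS = 300  # oversized demo table


def h(key, salt):
    """64-bit hash of an integer key (SHA-256 is an arbitrary stand-in)."""
    data = salt.to_bytes(2, "big") + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def cells_for(key):
    region = CELLS // K
    return [i * region + h(key, i) % region for i in range(K)]


def to_keys(words):
    """Pack each (word, 1-based position) pair into one integer key."""
    return {(pos << 16) | word for pos, word in enumerate(words, start=1)}


def encode(keys):
    ibf = [[0, 0, 0] for _ in range(CELLS)]  # [idSum, hashSum, count]
    for key in keys:
        for c in cells_for(key):
            ibf[c][0] ^= key
            ibf[c][1] ^= h(key, 99)
            ibf[c][2] += 1
    return ibf


def is_pure(cell):
    return cell[2] in (1, -1) and h(cell[0], 99) == cell[1]


def decode(diff):
    """Peel pure cells; returns (keys only in sender, only in receiver, ok)."""
    first, second = set(), set()
    pending = [i for i, cell in enumerate(diff) if is_pure(cell)]
    while pending:
        i = pending.pop()
        if not is_pure(diff[i]):
            continue
        key, _, sign = diff[i]
        (first if sign == 1 else second).add(key)
        for c in cells_for(key):
            diff[c][0] ^= key
            diff[c][1] ^= h(key, 99)
            diff[c][2] -= sign
            if is_pure(diff[c]):
                pending.append(c)
    return first, second, all(cell == [0, 0, 0] for cell in diff)


sent = [500, 501, 502, 503, 504, 505, 506, 507]
sent_ibf = encode(to_keys(sent))   # transmitted alongside the message

received = list(sent)
received[2] = 9999                 # two corrupted words
received[6] = 1234

received_ibf = encode(to_keys(received))
diff = [[a[0] ^ b[0], a[1] ^ b[1], a[2] - b[2]]
        for a, b in zip(sent_ibf, received_ibf)]
correct_keys, wrong_keys, complete = decode(diff)

error_positions = sorted(key >> 16 for key in wrong_keys)
repaired = list(received)
for key in correct_keys:
    repaired[(key >> 16) - 1] = key & 0xFFFF   # restore the sent word

print(error_positions, repaired == sent)
```

Each corrupted word contributes two keys to the difference (the sent pair and the received pair), so e errors need a table sized for a difference of 2e.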

The Cloud Complexity Milieu
                                 | 2 Node | Graph | Steiner Nodes
Sets (Key, values)               | EGUV11 | GKV11 | ?
Sequence, Edit Distance (Files)  | SBB90  | ?     | ?
Sequence, errors only (VMs)      | MV11   | ?     | ?
Sets of sets (database tables)   | ?      | ?     | ?
Streams (movies)                 | ?      | ?     | ?
Other dimensions: approximate, secure, ...

Conclusions: Got Diffs?
Resiliency and fast recoding of random sums → set reconciliation; and error correction?
Sets on graphs
– All terminals: generalizes rumor spreading
– Routers, terminals: resemblance to network coding.
Cloud complexity: some points covered, many remain.
Practical, may be useful to synch devices across the cloud.

Comparison to Logs/Incremental Updates
IBFs work with no prior context. Logs work with prior context, BUT
– Redundant information when sync’ing with multiple parties.
– Logging must be built into the system for each write.
– Logging adds overhead at runtime.
– Logging requires non-volatile storage, which is often not present in network devices.
IBFs may out-perform logs when:
– Synchronizing multiple parties
– Synchronizations happen infrequently
