How to Approximate a Set Without Knowing Its Size in Advance? Rasmus Pagh, Gil Segev, Udi Wieder. IT University of Copenhagen, Stanford, Microsoft Research.


2 Set Membership

3 Approximate Set Membership (|S| = n)

4 ASM in a picture [Figure: a set S ⊆ [100]×[100] with |S| = 188, shown with four approximations A(S) of sizes |A(S)| = 5213, 2699, 1580 and 918.]

5 Applications Many, and very common in practice: databases, networking and more. The data structure serves as a filter in front of slow or bandwidth-bounded data. Example, a web proxy cache: requests arrive first at the filter, which determines which requests reside in the proxy's cache and which should be fetched from the network. The cost of a false positive is a cache miss. [Figure labels: Request; Filter: approximation of the cache; Web proxy cache; External Web.]
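A minimal sketch of the proxy-cache use case, assuming a fingerprint set as the filter and a `fetch` callback standing in for the network; `FilteredCache`, `fingerprint` and `fp_bits` are hypothetical names, and SHA-256 is used only for illustration. It shows why a false positive costs exactly one wasted cache probe (a cache miss) while a true negative skips the cache entirely.

```python
import hashlib


def fingerprint(item: str, bits: int) -> int:
    """Top `bits` bits of SHA-256(item), used as a short fingerprint (illustrative)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


class FilteredCache:
    """Hypothetical proxy cache fronted by an approximate filter of cached URLs."""

    def __init__(self, fp_bits: int = 12):
        self.fp_bits = fp_bits
        self.filter = set()     # fingerprints of cached URLs: the approximate set
        self.cache = {}         # the cache itself, assumed slow to probe
        self.cache_misses = 0   # false positives of the filter end up here
        self.fetches = 0        # requests that went to the external Web

    def store(self, url: str, body: str) -> None:
        self.cache[url] = body
        self.filter.add(fingerprint(url, self.fp_bits))

    def get(self, url: str, fetch) -> str:
        if fingerprint(url, self.fp_bits) in self.filter:
            if url in self.cache:
                return self.cache[url]
            self.cache_misses += 1  # filter said "yes" but the cache said "no"
        self.fetches += 1
        body = fetch(url)
        self.store(url, body)
        return body
```

With no false negatives, a "no" from the filter is always safe to act on; only the rare "yes" for an uncached URL costs an extra probe.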

6 Lower Bounds for Static Case: [CFGMW78]
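For orientation, the static lower bound cited here says, roughly and up to lower-order terms, that storing n items from a universe much larger than n with false-positive rate ε needs about n log2(1/ε) bits. A worked instance with illustrative numbers:

```latex
\text{space}(n,\varepsilon) \;\gtrsim\; n \log_2 \frac{1}{\varepsilon}\ \text{bits},
\qquad \text{e.g. } n = 10^6,\ \varepsilon = 2^{-10}:\quad 10^6 \cdot 10 = 10^7\ \text{bits} \approx 1.25\ \text{MB}.
```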

7 Upper Bounds – Bloom Filters [Figure: a bit array with four bits set to 1 and an item X.]
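The slide's bit-array figure did not survive, so here is a hedged refresher: a minimal Bloom filter with m bits and k hash positions per item, where insertion sets those bits and a query checks that all of them are set. The sizing m ≈ n·log2(1/ε)/ln 2 ≈ 1.44·n·log2(1/ε) bits with k ≈ log2(1/ε) is the textbook choice, about 44% above the static bound. Deriving the k positions from SHA-256 by double hashing is an illustrative assumption, and the class name is hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sized for n items at false-positive rate eps."""

    def __init__(self, n: int, eps: float):
        # Textbook sizing: m = n*log2(1/eps)/ln 2 bits, k = log2(1/eps) hash functions.
        self.m = max(1, math.ceil(n * math.log2(1 / eps) / math.log(2)))
        self.k = max(1, round(math.log2(1 / eps)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: position_i = h1 + i*h2 (mod m) gives k positions from one digest.
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # "Yes" only if every one of the k bits is set; inserted items always pass.
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))
```

Note that m depends on n: the filter must be sized before the first insertion, which is exactly the assumption the talk questions.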

8 Dictionary Based Upper Bounds
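A hedged sketch of the dictionary-based idea: hash each item to a fingerprint in a range of size about n/ε and store the fingerprints in an exact dictionary. A non-member is a false positive only if its fingerprint collides with one of at most n stored ones, which by a union bound happens with probability at most ε (modelling the hash as random). A Python `set` stands in for a space-efficient dictionary here; compact dictionary constructions bring the space down to roughly n log2(1/ε) + O(n) bits. The class and parameter names are hypothetical.

```python
import hashlib
import math


class FingerprintFilter:
    """Approximate membership via an exact dictionary of short fingerprints."""

    def __init__(self, n: int, eps: float):
        self.n = n                         # capacity, fixed in advance
        self.range = math.ceil(n / eps)    # fingerprints take about log2(n/eps) bits
        self.fps = set()                   # stand-in for a space-efficient dictionary
        self.count = 0

    def _fp(self, item: str) -> int:
        d = hashlib.sha256(item.encode()).digest()
        return int.from_bytes(d[:8], "big") % self.range

    def add(self, item: str) -> None:
        # The fingerprint range was chosen for n items; beyond that the eps guarantee fails.
        if self.count >= self.n:
            raise OverflowError("capacity n exceeded; the filter must be rebuilt")
        self.fps.add(self._fp(item))
        self.count += 1

    def __contains__(self, item: str) -> bool:
        return self._fp(item) in self.fps
```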

9 Separation of Static and Dynamic

10 But in practice… The size of the set is not known in advance! This leads to over-provisioning of space up front, which is wasted as long as the set is small. Typically the data structure lies in prime real estate (fast memory); the whole idea is to save space. The problem has been raised and handled in 'practical' papers, typically in a naïve way from a 'theoretical' point of view.
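One naive workaround of this kind, in the spirit of the "multiple Bloom filters" solution mentioned under open problems, is to chain fixed-capacity filters: when the current stage fills, open a new one with twice the capacity and half the error rate, so the union bound keeps the total false-positive rate below ε/2 + ε/4 + … < ε. The sketch below uses fingerprint sets rather than Bloom filters as stages, and all names (`GrowingFilter`, `_Stage`, `initial_capacity`) are hypothetical. By a back-of-the-envelope count, a space-optimal stage i would need about log2(1/ε) + i + 1 bits per item, and most items land in the last stage where i ≈ log2 n, so the overhead over n log2(1/ε) is on the order of n log n bits.

```python
import hashlib
import math


class _Stage:
    """One fixed-capacity fingerprint filter; capacity and error rate fixed at creation."""

    def __init__(self, capacity: int, eps: float, salt: int):
        self.capacity, self.count, self.salt = capacity, 0, salt
        self.range = math.ceil(capacity / eps)  # per-stage false-positive rate <= eps
        self.fps = set()

    def _fp(self, item: str) -> int:
        d = hashlib.sha256(f"{self.salt}:{item}".encode()).digest()
        return int.from_bytes(d[:8], "big") % self.range

    def add(self, item: str) -> None:
        self.fps.add(self._fp(item))
        self.count += 1

    def __contains__(self, item: str) -> bool:
        return self._fp(item) in self.fps


class GrowingFilter:
    """Chain of stages: capacities double, error rates halve (eps/2, eps/4, ...)."""

    def __init__(self, eps: float, initial_capacity: int = 16):
        self.eps = eps
        self.stages = [_Stage(initial_capacity, eps / 2, salt=0)]

    def add(self, item: str) -> None:
        last = self.stages[-1]
        if last.count >= last.capacity:
            i = len(self.stages)  # index of the new stage
            last = _Stage(last.capacity * 2, self.eps / 2 ** (i + 1), salt=i)
            self.stages.append(last)
        last.add(item)

    def __contains__(self, item: str) -> bool:
        # A query must consult every stage, so query time also grows with log n.
        return any(item in s for s in self.stages)
```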

11 Main Results (approximate) Superlinear bound!

12 Lower Bound

13 Lower Bound – proof sketch...

14 Lower Bound: the encoding

15 Upper Bound – Construction 1

16 Getting Constant Query Time

17

18 Analysis

19 Extensions and standard tricks. Rebuilding requires extra space: both the old and the new dictionary must be stored until the rebuild is complete. This can be mitigated by bucketing items into many smaller dictionaries and rebuilding the smaller dictionaries one at a time. De-amortization of Insert: each time an item is inserted, perform O(1) operations on the next dictionary. This is not compatible with the bucketing technique and requires a small increase in space.
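A minimal sketch of de-amortized rebuilding, assuming a doubling rule: when the current dictionary fills, a new one of twice the capacity is started, and every later insert also moves `step` entries from the old dictionary. Plain keys stand in for whatever the dictionary actually stores (fingerprints, in a filter); `DeamortizedTable`, `capacity` and `step` are hypothetical names. With step ≥ 1 the old dictionary is emptied before the new one fills, and both are stored (and queried) until then, which is the extra space the slide mentions.

```python
class DeamortizedTable:
    """Hypothetical de-amortized rebuild: no single insert pays for a full rebuild."""

    def __init__(self, capacity: int = 4, step: int = 2):
        if step < 1:
            raise ValueError("step must be >= 1 so each move finishes before the next rebuild")
        self.capacity = capacity
        self.step = step
        self.current = set()   # the new dictionary, receiving all inserts
        self.pending = set()   # the old dictionary: entries not yet moved

    def insert(self, key) -> None:
        if len(self.current) >= self.capacity:
            # Start a rebuild into a dictionary of twice the capacity.
            # With step >= 1 the previous move has always finished by now.
            assert not self.pending
            self.pending, self.current = self.current, set()
            self.capacity *= 2
        self.current.add(key)
        # O(1) extra work per insert: move up to `step` old entries.
        for _ in range(min(self.step, len(self.pending))):
            self.current.add(self.pending.pop())

    def __contains__(self, key) -> bool:
        # Until the move completes, both dictionaries are stored and queried.
        return key in self.current or key in self.pending
```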

20 Supporting Deletions. Necessary assumption: only items that are in the set are ever deleted, since the removal of a 'false positive' item may introduce false negatives. The assumption makes sense in many applications, e.g. when the data structure filters a cache. The standard approach of storing multisets is problematic: an item generates many signatures, and we can't tell which one to remove. Instead: upon insertion, if the fingerprint already appears, put it in a secondary structure; upon removal, check the secondary structure first. This requires the assumption that each item is inserted only once, as well as some extra bookkeeping.
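A simplified sketch of the insertion/removal rule on this slide, assuming one fingerprint per item (the slide's setting, where an item has many signatures, is not modelled). The primary structure holds each fingerprint at most once; duplicate fingerprints are counted in a secondary structure, which deletion checks first. `DeletableFingerprintFilter` and `fp_bits` are hypothetical names. The invariant is that, for each fingerprint f, the number of current items with fingerprint f equals [f in primary] + secondary[f], which is why deletions are only safe for items actually in the set.

```python
import hashlib
from collections import Counter


class DeletableFingerprintFilter:
    """Assumes: only members are deleted, and each item is inserted at most once."""

    def __init__(self, fp_bits: int = 16):
        self.fp_bits = fp_bits
        self.primary = set()        # each fingerprint stored at most once
        self.secondary = Counter()  # extra copies of fingerprints already in primary

    def _fp(self, item: str) -> int:
        d = hashlib.sha256(item.encode()).digest()
        return int.from_bytes(d[:8], "big") >> (64 - self.fp_bits)

    def insert(self, item: str) -> None:
        f = self._fp(item)
        if f in self.primary:
            self.secondary[f] += 1  # duplicate fingerprint goes to the secondary structure
        else:
            self.primary.add(f)

    def delete(self, item: str) -> None:
        f = self._fp(item)
        if self.secondary[f] > 0:   # check the secondary structure first
            self.secondary[f] -= 1
            if self.secondary[f] == 0:
                del self.secondary[f]
        else:
            self.primary.discard(f)

    def __contains__(self, item: str) -> bool:
        return self._fp(item) in self.primary
```

If a non-member y sharing x's fingerprint were "deleted", x's fingerprint would disappear and x would become a false negative; this is the failure the necessary assumption rules out.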

21 Open Problems. Bridge the theory–practice gap: practitioners seem content with the solution of multiple Bloom filters. But then, practitioners seem content with Bloom filters… Get the leading constant in front of log log n. THANK YOU
