Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines By F. Bonomi et al. Presented by Kenny Cheng, Tonny Mak Yui Kuen.

2 Introduction: A) Motivation, B) Objectives, C) Problem statement

3 A) Motivation: There is an increasing trend to keep per-flow state in routers. Storing the state of a large number of flows takes a lot of memory (~100 bits per flow). If this memory footprint can be reduced, the state can be kept in fast on-chip memory, which improves performance.

4 B) Objectives: Introduce the idea of an Approximate Concurrent State Machine (ACSM), which trades some accuracy for a smaller memory footprint. Introduce and compare several solutions to the ACSM problem. Find the approach with the highest accuracy-to-memory ratio.

5 C) Problem statement: Describe three techniques based on Bloom filters and hashing, and evaluate them using both theoretical analysis and simulation.

6 Bloom Filter: A data structure proposed by Bloom in 1970 for membership tests, i.e. testing whether an element belongs to a set. It is fast and compact. False positives are possible: an element not in the set may be wrongly reported as present. There are no false negatives: an element in the set is always reported as present.
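
The false positive chance mentioned above has a well-known approximation. Writing m for the number of bits, n for the number of inserted elements, and k for the number of hash functions (symbols introduced here, not on the slide), and assuming the hash functions behave like independent uniform random functions:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2
\;\Rightarrow\;
p \approx \left(\tfrac{1}{2}\right)^{k_{\mathrm{opt}}} \approx 0.6185^{\,m/n}
```

For example, with 10 bits per element (m/n = 10) and k = 7, p is about 0.8%.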

7 How a Bloom Filter Works: A bit array, initially all zeros, and k hash functions. [Figure: bit array with hash functions 1, 2, ..., k]

8 How a Bloom Filter Works (Insertion): Hash the element x with each of the k hash functions to get k indices into the bit array, then set those k bits to 1.

9 How a Bloom Filter Works (Lookup): Hash the element x with the k hash functions. If all k corresponding bits are 1, x is reported as in the set (possibly a false positive); if any of them is 0, x is definitely not in the set.

10 How a Bloom Filter Works (Deletion): Deletion is not supported. A bit may also have been set by other elements, so clearing the bits of x could remove those elements as well. [Figure: deleting x from the bit array, with the bits that may be shared marked "?"]
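
Slides 7 to 10 can be sketched as a small class. This is a minimal illustration, not the authors' code: the class name `BloomFilter` is hypothetical, and the k hash functions are simulated by salting SHA-256 with the function number, an assumption made for simplicity.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # all zeros initially

    def _indices(self, item: str) -> list[int]:
        # Simulate k hash functions by salting one hash with the function number i.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        # Set the k bits selected by the hash functions to 1.
        for idx in self._indices(item):
            self.bits[idx] = 1

    def lookup(self, item: str) -> bool:
        # "In the set" only if all k bits are 1; this can be a false positive.
        return all(self.bits[idx] for idx in self._indices(item))

    # There is deliberately no delete(): clearing a bit could erase a bit that
    # another element also set, turning that element into a false negative.
```

For example, after `bf = BloomFilter(m=1024, k=4)` and `bf.insert("flow-1")`, `bf.lookup("flow-1")` returns `True`.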

11 Counting Bloom Filter: Replace each bit with a counter. Insertion increments the k counters and deletion decrements them. Drawbacks: more space is needed, and counters can overflow. [Figure: counter array indexed by hash functions 1, 2, ..., k for element x]
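
A counting variant only changes the cells from bits to counters. In this sketch the class name and the 4-bit default counter width are assumptions, and overflow is handled by letting a counter saturate and never decrementing it again, which is one common option rather than the method from the slides.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose cells are small counters, so deletion is possible."""

    def __init__(self, m: int, k: int, counter_bits: int = 4) -> None:
        self.m, self.k = m, k
        self.max_count = (1 << counter_bits) - 1  # e.g. 15 for 4-bit counters
        self.counters = [0] * m

    def _indices(self, item: str) -> list[int]:
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for idx in self._indices(item):
            # Saturate instead of wrapping around on overflow.
            if self.counters[idx] < self.max_count:
                self.counters[idx] += 1

    def delete(self, item: str) -> None:
        # Only call this for items that were actually inserted.
        for idx in self._indices(item):
            # A saturated counter is "sticky": its true value is unknown.
            if 0 < self.counters[idx] < self.max_count:
                self.counters[idx] -= 1

    def lookup(self, item: str) -> bool:
        return all(self.counters[idx] > 0 for idx in self._indices(item))
```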

12 Three Approaches to ACSM: Approaches: (1) Direct Bloom Filter (DBF), (2) Stateful Bloom Filter, (3) Fingerprint-compressed Filter (FCF). Operations to implement: (1) Insert(flow, state), (2) Lookup(flow), which returns the state, (3) Delete(flow), (4) Update(flow, new_state).
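
To pin down what the four operations mean, here is an exact, non-approximate baseline that stores every flow ID in full (the kind of table whose ~100 bits per flow the ACSM approaches try to shrink). The class name is hypothetical; an ACSM answers the same calls but may return a wrong state or "don't know".

```python
from typing import Dict, Optional


class ExactStateTable:
    """Exact reference for Insert / Lookup / Delete / Update."""

    def __init__(self) -> None:
        self.table: Dict[str, int] = {}  # full flow ID -> state

    def insert(self, flow: str, state: int) -> None:
        self.table[flow] = state

    def lookup(self, flow: str) -> Optional[int]:
        return self.table.get(flow)  # None means "not found"

    def delete(self, flow: str) -> None:
        self.table.pop(flow, None)

    def update(self, flow: str, new_state: int) -> None:
        self.table[flow] = new_state
```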

13 Direct Bloom Filter Approach: Uses a counting Bloom filter. The four operations: Insert: insert the (flow_id, state) pair. Lookup: because the state is not known, every possible state has to be looked up; return "don't know" if more than one state is found. Delete: lookup, then decrement the counters. Update: delete the old pair, then insert the new one. Improvement: use timing-based deletion to handle flows that never terminate.
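
The Direct Bloom Filter operations above can be sketched on top of a counting Bloom filter keyed by the (flow_id, state) pair. This is an illustration under assumptions: the class and method names are hypothetical, states are numbered 1..num_states, timing-based deletion is omitted, and Delete/Update simply refuse to act when the lookup is ambiguous.

```python
import hashlib
from typing import Union

DK = "DK"  # "don't know": more than one state matched


class DirectBloomFilter:
    """Counting Bloom filter over (flow_id, state) pairs; states are 1..num_states."""

    def __init__(self, m: int, k: int, num_states: int) -> None:
        self.m, self.k, self.num_states = m, k, num_states
        self.counters = [0] * m

    def _indices(self, flow: str, state: int) -> list[int]:
        # The (flow_id, state) pair is hashed, not the flow_id alone.
        key = f"{flow}|{state}"
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def _contains(self, flow: str, state: int) -> bool:
        return all(self.counters[i] > 0 for i in self._indices(flow, state))

    def insert(self, flow: str, state: int) -> None:
        for i in self._indices(flow, state):
            self.counters[i] += 1

    def lookup(self, flow: str) -> Union[int, str, None]:
        # The state is not stored, so every possible state has to be tried.
        matches = [s for s in range(1, self.num_states + 1) if self._contains(flow, s)]
        if not matches:
            return None  # not found
        return matches[0] if len(matches) == 1 else DK

    def delete(self, flow: str) -> bool:
        # Delete = lookup + decrement; only possible when the state is unambiguous.
        state = self.lookup(flow)
        if state is None or state == DK:
            return False
        for i in self._indices(flow, state):
            self.counters[i] -= 1
        return True

    def update(self, flow: str, new_state: int) -> bool:
        # Update = delete the old pair + insert the new pair.
        if not self.delete(flow):
            return False
        self.insert(flow, new_state)
        return True
```

The lookup cost grows with the number of states, which is the weakness the Stateful Bloom Filter addresses.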

14 Timing-based Deletion: Add a timing bit to each cell and set it whenever the cell is touched. Periodically clear the cells that were not touched, then reset all timing bits. Alternative to DBF: use a standard Bloom filter instead of a counting one, and delete elements only through timing-based deletion. [Figure: cells 1, 2, ..., k for element x, each with a timing bit]
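
The "alternative to DBF" variant above (a standard Bloom filter whose elements age out) might look like the following sketch. The names `TimedBloomFilter`, `touch`, and `sweep` are hypothetical, and treating only writes by `touch()` as touching a cell is an assumption; how often `sweep()` runs is left to the caller.

```python
import hashlib


class TimedBloomFilter:
    """Standard (non-counting) Bloom filter with one timing bit per cell.

    Elements are never deleted explicitly: sweep(), run once per period,
    clears every cell that was not touched since the previous sweep.
    """

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = [0] * m
        self.timing = [0] * m

    def _indices(self, item: str) -> list[int]:
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def touch(self, item: str) -> None:
        # Insert or refresh an element: set its cells and their timing bits.
        for i in self._indices(item):
            self.bits[i] = 1
            self.timing[i] = 1

    def lookup(self, item: str) -> bool:
        return all(self.bits[i] for i in self._indices(item))

    def sweep(self) -> None:
        # Clear cells untouched during this period, then reset all timing bits.
        for i in range(self.m):
            if not self.timing[i]:
                self.bits[i] = 0
            self.timing[i] = 0
```

A flow that keeps sending packets keeps refreshing its cells; one that goes silent for a full period is cleared at the next sweep.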

15 Stateful Bloom Filter Approach: The Direct Bloom Filter does not store the state of a flow, so every state has to be looked up. Improvement: add a state value to each cell for faster lookup, and hash only the flow_id instead of the (flow_id, state) pair. A "don't know" (DK) state is introduced for cells where a collision occurs. Timing-based deletion is kept.

16 Stateful Bloom Filter Approach: Insert, modify, and delete work as in the Direct Bloom Filter, except that a cell's value is set to DK on a collision (counter > 1). Lookup: if all cells are DK, return DK; if all cells hold either state i or DK, return state i; if the cells hold more than one state other than DK, return "not found".
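
The Stateful Bloom Filter rules on slides 15 and 16 can be sketched with a counter and a value per cell. Assumptions not stated on the slides: the names are hypothetical, a flow that hits an empty cell is reported as "not found", a DK cell stays DK when its counter drops back to 1 (the surviving flow's state is unknown), Update is only called for flows that are present, and timing-based deletion is left out.

```python
import hashlib
from typing import Union

DK = "DK"                # "don't know": the cell is shared by several flows
NOT_FOUND = "NOT_FOUND"
EMPTY = None             # value of a cell whose counter is 0


class StatefulBloomFilter:
    """Cells hold a counter and a state value; only the flow ID is hashed."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.count = [0] * m
        self.value = [EMPTY] * m

    def _indices(self, flow: str) -> list[int]:
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{flow}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, flow: str, state: int) -> None:
        for i in self._indices(flow):
            self.count[i] += 1
            # A cell used by more than one flow (counter > 1) becomes DK.
            self.value[i] = state if self.count[i] == 1 else DK

    def update(self, flow: str, new_state: int) -> None:
        # A private cell takes the new state; a shared cell can only be DK.
        for i in self._indices(flow):
            if self.count[i] == 1:
                self.value[i] = new_state
            elif self.count[i] > 1:
                self.value[i] = DK

    def delete(self, flow: str) -> None:
        for i in self._indices(flow):
            if self.count[i] > 0:
                self.count[i] -= 1
                if self.count[i] == 0:
                    self.value[i] = EMPTY

    def lookup(self, flow: str) -> Union[int, str]:
        values = [self.value[i] for i in self._indices(flow)]
        if any(v is EMPTY for v in values):
            return NOT_FOUND         # an empty cell: flow not present
        states = {v for v in values if v != DK}
        if not states:
            return DK                # all cells are DK
        if len(states) == 1:
            return states.pop()      # state i, possibly mixed with DK cells
        return NOT_FOUND             # more than one non-DK state
```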

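17 Fingerprint-compressed Filter Approach: Store a fingerprint of the flow together with its state in a d-left hash table. [Figure: element x hashed into hash tables 1, 2, ..., d, each cell holding a (Fingerprint, State) pair]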

18 Fingerprint-compressed Filter Approach: Insert: hash the element, find the corresponding bucket in each hash table, and insert the fingerprint + state into the bucket with the fewest elements (choosing the left-most one to break ties). Lookup: retrieve the state stored with the fingerprint. Delete: remove the fingerprint. Update: update the state directly, or remove the old entry and add a new one. DK is used when a fingerprint is found in multiple buckets. Timing-based deletion can still be applied.
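
A compact sketch of the Fingerprint-compressed Filter operations above, using d-left hashing. It is an illustration under stated assumptions: all names are hypothetical, buckets are unbounded Python lists (real d-left buckets have a fixed number of cells), an ambiguous Update marks every matching cell DK, and timing-based deletion is omitted.

```python
import hashlib
from typing import Union

DK = "DK"
NOT_FOUND = "NOT_FOUND"


def _hash(tag: str, flow: str, mod: int) -> int:
    # One salted hash per purpose: the fingerprint, or the bucket in table t.
    digest = hashlib.sha256(f"{tag}:{flow}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % mod


class FingerprintCompressedFilter:
    """d-left hash table whose cells hold [fingerprint, state] pairs."""

    def __init__(self, d: int, buckets: int, fp_bits: int) -> None:
        self.d, self.buckets = d, buckets
        self.fp_mod = 1 << fp_bits
        # tables[t][b] is bucket b of subtable t: a list of [fingerprint, state] cells.
        self.tables = [[[] for _ in range(buckets)] for _ in range(d)]

    def _candidate_buckets(self, flow: str) -> list:
        return [self.tables[t][_hash(f"t{t}", flow, self.buckets)] for t in range(self.d)]

    def _matches(self, flow: str) -> list:
        fp = _hash("fp", flow, self.fp_mod)
        return [cell for b in self._candidate_buckets(flow) for cell in b if cell[0] == fp]

    def insert(self, flow: str, state: int) -> None:
        # Least-loaded candidate bucket; min() keeps the left-most one on ties.
        target = min(self._candidate_buckets(flow), key=len)
        target.append([_hash("fp", flow, self.fp_mod), state])

    def lookup(self, flow: str) -> Union[int, str]:
        cells = self._matches(flow)
        if not cells:
            return NOT_FOUND
        return cells[0][1] if len(cells) == 1 else DK

    def update(self, flow: str, new_state: int) -> None:
        cells = self._matches(flow)
        for cell in cells:
            # Direct update when unambiguous; otherwise mark every match DK.
            cell[1] = new_state if len(cells) == 1 else DK

    def delete(self, flow: str) -> None:
        fp = _hash("fp", flow, self.fp_mod)
        for bucket in self._candidate_buckets(flow):
            for j, cell in enumerate(bucket):
                if cell[0] == fp:
                    del bucket[j]  # remove a single matching cell
                    return
```

Only the short fingerprint and the state are stored, which is where the memory saving over full flow IDs comes from; the price is that two flows with the same fingerprint in their candidate buckets are reported as DK.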

19 Simulation: Goal: investigate the size/accuracy trade-off of the three approaches. State machine: 10 states, with legal state changes 1 → 2 → 3 → … → 10. Run for 1 million flows, with about … simultaneous flows. Each flow has 100 ± 40 packets; some packets trigger a state change.

20 Simulation: Three kinds of simulated flows: Interesting flows (30%): only legal state changes; they always complete. Noise flows (30%): random (legal or illegal) state changes; they never complete. Random flows (40%): no state changes.
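
The workload on slides 19 and 20 can be approximated with a small generator. This is a hypothetical reconstruction, not the authors' simulator: reading "100 ± 40" as a uniform draw from 60 to 140, giving noise flows 1 to 9 random target states below 10 (so they never complete), and placing state changes on random packets are all assumptions.

```python
import random

NUM_STATES = 10  # states 1..10; a flow "completes" when it reaches state 10


def is_legal(old: int, new: int) -> bool:
    # The only legal transitions are 1 -> 2 -> ... -> 10.
    return new == old + 1


def state_changes(kind: str, rng: random.Random) -> list[int]:
    """Sequence of new states for one flow, starting from state 1."""
    if kind == "interesting":
        return list(range(2, NUM_STATES + 1))  # legal steps only, always completes
    if kind == "noise":
        # Random (legal or illegal) targets below 10, so the flow never completes.
        n = rng.randint(1, NUM_STATES - 1)
        return [rng.randint(1, NUM_STATES - 1) for _ in range(n)]
    if kind == "random":
        return []  # no state change at all
    raise ValueError(f"unknown flow kind: {kind}")


def make_flow(rng: random.Random) -> tuple[str, int, list[tuple[int, int]]]:
    kind = rng.choices(["interesting", "noise", "random"], weights=[30, 30, 40])[0]
    packets = rng.randint(60, 140)  # assumed: 100 +/- 40 packets, drawn uniformly
    new_states = state_changes(kind, rng)
    # Choose which packets trigger the state changes, in packet order.
    positions = sorted(rng.sample(range(packets), len(new_states)))
    return kind, packets, list(zip(positions, new_states))
```

Each call returns the flow kind, its packet count, and a list of (packet index, new state) pairs that a tracker under test would replay.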

21 Simulation: False positive rate: percentage of completed flows that are not interesting. False negative rate: percentage of interesting flows that do not complete.
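
The two metrics can be computed directly from their definitions above. Note that the false positive rate is normalized by the number of completed flows, not by all non-interesting flows. The function name and the (is_interesting, completed) input format are assumptions for illustration.

```python
def error_rates(results: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Return (false positive %, false negative %).

    results holds one (is_interesting, completed) pair per flow, where
    `completed` means the tracker reported the flow reaching state 10.
    """
    completed = [is_int for is_int, done in results if done]
    interesting = [done for is_int, done in results if is_int]
    # False positive rate: share of completed flows that are not interesting.
    fp = 100.0 * completed.count(False) / len(completed) if completed else 0.0
    # False negative rate: share of interesting flows that never completed.
    fn = 100.0 * interesting.count(False) / len(interesting) if interesting else 0.0
    return fp, fn
```

For example, `error_rates([(True, True), (True, False), (False, True), (False, False)])` returns `(50.0, 50.0)`.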

22 Applications: Application-level QoS: video congestion control, and peer-to-peer (P2P) traffic identification.

23 Video congestion control: Applied to MPEG video streaming. MPEG video has three kinds of frames: I frames (scene information), P frames (differential information), and B frames (least important information). B frames can be dropped (up to 30%) with acceptable quality. The current frame of each flow must be tracked.
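
The per-flow state needed here is just the type of the frame currently being sent. The sketch below is hypothetical: a plain dict stands in for the ACSM, the frame type is assumed to be readable from the packet that starts each frame, the 30% limit is not modeled, and a DK or missing state is treated conservatively (never dropped).

```python
from enum import Enum
from typing import Dict, Optional, Union


class Frame(Enum):
    I = "I"  # scene information
    P = "P"  # differential information
    B = "B"  # least important information


def on_packet(
    tracker: Dict[str, Union[Frame, str]],
    flow: str,
    frame_start: Optional[Frame],
    congested: bool,
) -> bool:
    """Process one video packet and return True if it should be dropped.

    frame_start is the frame type when this packet begins a new frame
    (assumed to be readable from the packet), otherwise None.
    """
    if frame_start is not None:
        tracker[flow] = frame_start  # Update(flow, new_state)
    state = tracker.get(flow)        # Lookup(flow): may be None or "DK"
    # Drop only under congestion, and only when the flow is known to be in a B frame.
    return congested and state is Frame.B
```

Dropping only on a confident B answer means an ACSM error can cost bandwidth savings but rarely drops an I or P frame; a false positive (an I or P frame reported as B) is the case that hurts quality.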

24 Video congestion control: The FCF ACSM is used to keep track of the state. Experimentally, the highest acceptable false positive rate is 0.37%. This requires 27 bits of memory per flow (about ¼ of the original 100 bits).

25 P2P Traffic Identification: Limiting P2P flows improves quality for other applications. One possible way to identify P2P traffic is to look for concurrent TCP and UDP flows. An ACSM can be used for real-time P2P identification.
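
The TCP-plus-UDP heuristic maps onto a tiny per-key state machine (nothing seen, TCP seen, UDP seen, both seen). In this hypothetical sketch a dict stands in for the ACSM, the key is the unordered pair of host addresses (an assumption; the slide does not say what is keyed), and "concurrent" is only approximated: it would rely on timing-based deletion to age out stale entries.

```python
from typing import Dict, Tuple

TCP_SEEN, UDP_SEEN = 1, 2
P2P_SUSPECT = TCP_SEEN | UDP_SEEN  # both protocols observed for the pair
PROTO_BIT = {"tcp": TCP_SEEN, "udp": UDP_SEEN}


def observe(tracker: Dict[Tuple[str, str], int], src: str, dst: str, proto: str) -> bool:
    """Record one packet; return True once the host pair has used both TCP and UDP."""
    pair = (min(src, dst), max(src, dst))  # same key for both directions
    state = tracker.get(pair, 0) | PROTO_BIT[proto]
    tracker[pair] = state                   # Update(flow, new_state)
    return state == P2P_SUSPECT
```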

26 Conclusion: An ACSM is feasible. The FCF approach is the best of the three approaches. Two potential applications of ACSMs were introduced. ACSMs may benefit QoS applications, which are fault-tolerant.

27 Comments: The authors focus on accuracy and memory size, but not on real performance. The FCF approach may not perform well in hardware.

- End - Question & Answer