A flow aware packet sampling mechanism for high speed links

Slides:



Advertisements
Similar presentations
Why Simple Hash Functions Work : Exploiting the Entropy in a Data Stream Michael Mitzenmacher Salil Vadhan And improvements with Kai-Min Chung.
Advertisements

Author : Xinming Chen,Kailin Ge,Zhen Chen and Jun Li Publisher : ANCS, 2011 Presenter : Tsung-Lin Hsieh Date : 2011/12/14 1.
Fast Firewall Implementation for Software and Hardware-based Routers Lili Qiu, Microsoft Research George Varghese, UCSD Subhash Suri, UCSB 9 th International.
OpenSketch Slides courtesy of Minlan Yu 1. Management = Measurement + Control Traffic engineering – Identify large traffic aggregates, traffic changes.
Why Simple Hash Functions Work : Exploiting the Entropy in a Data Stream Michael Mitzenmacher Salil Vadhan.
A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks Tian Bu 1, Jin Cao 1, Aiyou Chen 1, Patrick P. C. Lee 2 Bell Labs,
Estimating TCP Latency Approximately with Passive Measurements Sriharsha Gangam, Jaideep Chandrashekar, Ítalo Cunha, Jim Kurose.
Paging Hardware With TLB
Indian Statistical Institute Kolkata
11 Packet Sampling for Worm and Botnet Detection in TCP Connections Reporter: 林佳宜 /10/25.
ClassBench: A Packet Classification Benchmark
Router Architecture : Building high-performance routers Ian Pratt
Hit or Miss ? !!!.  Cache RAM is high-speed memory (usually SRAM).  The Cache stores frequently requested data.  If the CPU needs data, it will check.
Models and Security Requirements for IDS. Overview The system and attack model Security requirements for IDS –Sensitivity –Detection Analysis methodology.
Trajectory Sampling for Direct Traffic Observation Matthias Grossglauser joint work with Nick Duffield AT&T Labs – Research.
Polytechnic University,ECE Department1 Detection of “Hot Spots” Paper Title : Joint Data Streaming and Sampling Techniques for Detection of Super Sources.
Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications Robert Schweller 1, Zhichun Li 1, Yan Chen 1, Yan Gao 1, Ashish.
Look-up problem IP address did we see the IP address before?
Payload Attribution via Hierarchical Bloom Filters
1Bloom Filters Lookup questions: Does item “ x ” exist in a set or multiset? Data set may be very big or expensive to access. Filter lookup questions with.
1 The Mystery of Cooperative Web Caching 2 b b Web caching : is a process implemented by a caching proxy to improve the efficiency of the web. It reduces.
Network Layer (3). Node lookup in p2p networks Section in the textbook. In a p2p network, each node may provide some kind of service for other.
Managing Multi-Configuration Hardware via Dynamic Working Set Analysis By Ashutosh S.Dhodapkar and James E.Smith Presented by Kyriakos Yioutanis.
Particle Filtering in Network Tomography
Page 19/17/2015 CSE 30341: Operating Systems Principles Optimal Algorithm  Replace page that will not be used for longest period of time  Used for measuring.
Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic Minho Sung Networking & Telecommunications Group College.
Sujayyendhiren RS, Kaiqi Xiong and Minseok Kwon Rochester Institute of Technology Motivation Experimental Setup in ProtoGENI Conclusions and Future Work.
CEDAR Counter-Estimation Decoupling for Approximate Rates Erez Tsidon Joint work with Iddo Hanniel and Isaac Keslassy Technion, Israel 1.
Author: Haoyu Song, Fang Hao, Murali Kodialam, T.V. Lakshman Publisher: IEEE INFOCOM 2009 Presenter: Chin-Chung Pan Date: 2009/12/09.
Efficient Packet Classification with Digest Caches Francis Chang Wu-chang Feng Wu-chi Feng Kang Li.
Vladimír Smotlacha CESNET Full Packet Monitoring Sensors: Hardware and Software Challenges.
Wire Speed Packet Classification Without TCAMs ACM SIGMETRICS 2007 Qunfeng Dong (University of Wisconsin-Madison) Suman Banerjee (University of Wisconsin-Madison)
ECE 526 – Network Processing Systems Design Packet Processing I: algorithms and data structures Chapter 5: D. E. Comer.
Fast Packet Classification Using Bloom filters Authors: Sarang Dharmapurikar, Haoyu Song, Jonathan Turner, and John Lockwood Publisher: ANCS 2006 Present:
Garo Bournoutian and Alex Orailoglu Proceedings of the 45th ACM/IEEE Design Automation Conference (DAC’08) June /10/28.
Optimal XOR Hashing for a Linearly Distributed Address Lookup in Computer Networks Christopher Martinez, Wei-Ming Lin, Parimal Patel The University of.
A Formal Analysis of Conservative Update Based Approximate Counting Gil Einziger and Roy Freidman Technion, Haifa.
1 A Throughput-Efficient Packet Classifier with n Bloom filters Authors: Heeyeol Yu and Rabi Mahapatra Publisher: IEEE GLOBECOM 2008 proceedings Present:
4/19/20021 TCPSplitter: A Reconfigurable Hardware Based TCP Flow Monitor David V. Schuehler.
Efficient Cache Structures of IP Routers to Provide Policy-Based Services Graduate School of Engineering Osaka City University
D 陳怡安 R 解巽評 R 高榮泰 IEEE/ACM TRANSACTIONS ON NETWORKING OCTOBER 2006 Cristian Estan, George Varghese, Member, IEEE, and Michael Fisk.
High-Speed Policy-Based Packet Forwarding Using Efficient Multi-dimensional Range Matching Lakshman and Stiliadis ACM SIGCOMM 98.
The Bloom Paradox Ori Rottenstreich Joint work with Isaac Keslassy Technion, Israel.
Mining of Massive Datasets Ch4. Mining Data Streams
Queue Scheduling Disciplines
Jiahao Chen, Yuhui Deng, Zhan Huang 1 ICA3PP2015: The 15th International Conference on Algorithms and Architectures for Parallel Processing. zhangjiajie,
Operating Systems, Winter Semester 2011 Practical Session 9, Memory 1.
SketchVisor: Robust Network Measurement for Software Packet Processing
File-System Management
Memory: Page Table Structure
Author: Heeyeol Yu; Mahapatra, R.; Publisher: IEEE INFOCOM 2008
Frequency Counts over Data Streams
Module 11: File Structure
Updating SF-Tree Speaker: Ho Wai Shing.
Lower bounds for approximate membership dynamic data structures
The Variable-Increment Counting Bloom Filter
doc.: IEEE /xxx Matthew B. Shoemake, Ph.D.
Single-Packet IP Traceback
Chapter 8: Subnetting IP Networks
Optimal Elephant Flow Detection Presented by: Gil Einziger,
Bloom Filters Very fast set membership. Is x in S? False Positive
Lecture 29: Virtual Memory-Address Translation
Lecture 3: Main Memory.
Heavy Hitters in Streams and Sliding Windows
By: Ran Ben Basat, Technion, Israel
EMOMA- Exact Match in One Memory Access
Minwise Hashing and Efficient Search
Chapter-5 Traffic Engineering.
Hash Functions for Network Applications (II)
Lu Tang , Qun Huang, Patrick P. C. Lee
Presentation transcript:

A flow aware packet sampling mechanism for high speed links Marco Canini DIST University of Genoa Damien Fay Computer Laboratory University of Cambridge Andrew W. Moore Computer Laboratory University of Cambridge Raffaele Bolla DIST University of Genoa Work done while visiting the Cambridge University Computer Laboratory

Goal Enable for inline, real-time, application-centric traffic classification Based upon a fixed number of packets at the start of all flows Permit implementation at 10+ Gbit/s

Motivation Studies* show that start of flow information allows for accurate identification of traffic Traditional packet sampling and NetFlow don’t provide required information Heavy-tailed nature of Internet traffic allows discarding of significant amount of data without processing * For example: [1] W. Li and A. W. Moore. A Machine Learning Approach for Efficient Traffic Classication. In Proceedings of the IEEE MASCOTS, October 2007. [2] L. Bernaille, R. Teixeira, and K. Salamatian. Early Application Identication. In Proc. of ACM CoNEXT, December 2006.

Method Focus upon the sampling of a fixed number of packets at the start of all flows The scheme consists of two levels Per-packet level Within a time window of length W, an algorithm based on Bloom filters identifies and samples the first N packets from each flow that occur in that window Per-window level The memory allocation mechanism finds the parameters required in the sampling scheme

Bloom filter refresher Simple space-efficient probabilistic data structure for representing a set of elements Supports the insert() and query() operations False positives are possible False negatives are not possible Implemented with an array of m bits and uses k hash functions to map elements to [0,…,m-1]

Per-packet level Packet N+1 Packet N Packet 1 Packet 2 discard … Flow Id (IP tuple) Id1 true true true query() query() query() false false false insert() insert() insert() Id1 Id1 sample …

Per-packet level Packet 1 Packet N discard Id2 … Flow Id (IP tuple) true true true query() query() query() false false insert() insert() Id2 sample Packet 2 Packet N-1 …

False positives False positives introduce sampling errors Packets that should have been sampled get discarded Discarded are only ever at the end Reduce false positives by Resetting the filters periodically (at the end of each time window) Optimize the memory based upon the traffic

Theory 1 After n elements have been inserted, the probability that a specific bit is still 0 is The probability of a false positive is approximated with that is minimized for In this application we are interested in the number of false positives that occur as the Bloom filter is being filled. The expected number of false positives is where n is the expected number of elements that will be inserted within the current window And considering a bank of N Bloom filters

Theory 2 : Per-window level Aim to divide a block of memory M into N different portions optimally The allocation of mj is given by a system of linear equations However the number of elements that will be inserted in the filters, nj, are unknown and have to be estimated (in our case using an AR model)

Theory 3 : Window length Given an acceptable level of false positives the appropriate value of W can be estimated by use of a search algorithm as the number of false positives is a function of the window size

Theory Summary Memory limits number of samples and size of Bloom filters False positive rate allows near-ideal sizing of Bloom filters (for a given amount of memory) False positive rate is estimated via AR Window length sets rate at which filters are reset

Results Software implementation 12 hours trace from the edge of a research institute connected via a full-duplex 1 Gbit/s link

Results: false positives

Results: sampled packets

Results: sampled bytes

Sampled packets, varying M

Sampled bytes, varying M

Effects of false positives M Affected flows Unsampled packets Unsampled bytes 512KB 5529 (0.0021%) 517173 (0.0221%) 199167936 (0.0323%) 1024KB 4863 (0.0018%) 58007 (0.0024%) 22907748 (0.0037%) 1576KB 4274 (0.0016%) 15830 (0.0006%) 6535918 (0.0010%) 2048KB 4237 (0.0016%) 6086 (0.0003%) 2475934 (0.0004%)

Sampling plus flow classifier We used a pattern matching flow classifier derived from open source tool l7-filter Test data contains 2,642,841 flows With sampling (N = 10, W = 120, M = 512 KB) 2,636,549 flows are reported 6,292 (0.0024%) unreported flows 6,021 (0.0023%) flows had classifier error Total error is 12,323 (0.0047%) flows

Future Work FPGA based hardware implementation Flow sampling works Focus shifted to improving classification (identification) of applications