Fast, Memory-Efficient Traffic Estimation by Coincidence Counting Fang Hao 1, Murali Kodialam 1, T. V. Lakshman 1, Hui Zhang 2, 1 Bell Labs, Lucent Technologies.

Presentation transcript:

Fast, Memory-Efficient Traffic Estimation by Coincidence Counting Fang Hao 1, Murali Kodialam 1, T. V. Lakshman 1, Hui Zhang 2, 1 Bell Labs, Lucent Technologies 2 University of Southern California

2 Traffic flow measurement
- Related work: Sample and Hold [Estan et al. 2002], Smart Sampling [Duffield et al. 2004], RATE [Kodialam et al. 2004], ACCEL-RATE [Hao et al. 2004], etc.
- Flow rate p_f: the proportion of packets belonging to flow f during a certain time period.
- Arrival rate r_f = p_f × (total arrival rate, in packets/second).
[Figure: packets from many flows pass through a router on a network link, which produces flow rate statistics.]

3 Problem definition
What is the traffic flow composition through a network link during a certain time period, given an estimation accuracy requirement made up of a confidence percentile α and an estimation error β?
For any given flow f with rate p_f, determine an estimate p'_f of p_f such that p'_f ∈ (p_f − β/2, p_f + β/2) with probability greater than α.

4 One solution – naive counting
Directly count the number of packets for each flow.
- It is simple;
- It estimates the rate of the flows rapidly;
- But it requires frequent access to a large amount of high-speed memory. There can be millions of active flows on backbone links [Duffield et al. 2001].

5 Arrival model
1. Flow rates are stationary during the estimation period.
   - An arriving packet belongs to flow f with probability p_f.
2. Packet arrivals are independent.
   - An arriving packet belongs to flow f independently of other arrivals.
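The arrival model above, together with naive counting, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the flow names, the rates, and the helper names `simulate_arrivals` and `naive_counting` are hypothetical and do not come from the slides.

```python
import random
from collections import Counter

def simulate_arrivals(rates, n, seed=0):
    """Draw n independent arrivals; each belongs to flow f with probability rates[f]."""
    rng = random.Random(seed)
    flows = list(rates)
    weights = [rates[f] for f in flows]
    return rng.choices(flows, weights=weights, k=n)

def naive_counting(packets):
    """Keep one exact counter per flow and report count / total as the estimated rate."""
    counts = Counter(packets)
    total = len(packets)
    return {f: c / total for f, c in counts.items()}

# Hypothetical stationary flow rates (they sum to 1).
rates = {"f1": 0.5, "f2": 0.3, "f3": 0.2}
packets = simulate_arrivals(rates, n=100_000)
estimates = naive_counting(packets)
```

Note that `naive_counting` holds one counter for every flow that appears in the trace, which is the memory cost the slides attribute to this approach.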

6 Performance metrics Sample size (estimation time): the number of arrivals needed to achieve the specified estimation accuracy. Memory size: the number of flows that are tracked during the estimation.

7 Can we …
Can we design a scheme that runs as fast as naïve counting but only catches “interesting” flows?
By counting the exact number of arrivals for each flow, naïve counting requires a minimum number N of arrivals to meet the accuracy requirement (confidence percentile α, estimation error β).
- This will capture at least one packet in expectation for any flow f with p_f ≥ 1/N (example: Z_α = 3).
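A rough way to see how large that sample has to be uses the standard normal approximation to the binomial. This is a textbook bound, not necessarily the expression shown on the slide: the estimate count/N has variance p_f(1 − p_f)/N ≤ 1/(4N), so keeping Z standard deviations inside a half-width of β/2 needs N ≥ (Z/β)². The function name and the value β = 0.01 below are hypothetical.

```python
import math

def naive_sample_size(z, beta):
    """Worst-case sample size under the normal approximation.

    Requires z * sqrt(1 / (4 * N)) <= beta / 2, i.e. N >= (z / beta) ** 2.
    """
    return math.ceil((z / beta) ** 2)

# z = 3 is the example percentile from the slides; beta = 0.01 is an assumed value.
n = naive_sample_size(z=3.0, beta=0.01)

# A flow is expected to contribute at least one packet when its rate is >= 1 / n.
min_rate_seen_in_expectation = 1.0 / n
```

Under these assumed values the sample is on the order of 10^5 packets, so flows with rates far below 10^-5 are unlikely to be seen at all.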

8 Problem re-definition
What is the traffic flow composition through a network link during a certain time period, given an estimation accuracy requirement with four parameters?
- confidence percentile α;
- estimation error β;
- threshold rate;
- error relaxation factor.
For any given flow f with rate p_f, determine an estimate p'_f of p_f such that, with probability greater than α, p'_f ∈ (p_f − β/2, p_f + β/2) if p_f is at or above the threshold rate, and p'_f falls within an interval around p_f widened by the error relaxation factor if p_f is below the threshold rate.
e.g., Z_α = 3, threshold rate = 0.01, error relaxation factor = 10.

9 Our solution – CATE
Coincidence bAsed Traffic Estimation
- It is simple;
- It estimates the size of the flows rapidly;
- It requires a small amount of memory.
A generalization of the RATE scheme [Kodialam et al. 2004].

10 CATE – scheme description (I)
[Figure: scheme diagram with a k-length table.]

11 CATE – scheme description (II)
Estimation procedure
1. Specify the four-parameter estimation accuracy requirement;
2. Calculate the sample size N, the memory size M, and k (the length of PT);
3. Run and count the coincidences CC(f) for each flow f;
4. At the end of the estimation, output the estimated flow rates:
   a) For each flow f in CCT, output the estimate p'(f);
   b) For the rest of the flows, report 0 as the estimated rate.
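The procedure above can be sketched as follows. Several details are assumptions rather than the authors' specification: each arrival is compared with the k most recent arrivals, every match counts as one coincidence, and the estimate is the moment estimator sqrt(CC(f) / (kN)), which follows from E[CC(f)] ≈ k·N·p_f² under the independent-arrival model. The function name `cate_estimate` and the synthetic trace are hypothetical.

```python
import math
import random
from collections import Counter, deque

def cate_estimate(packets, k):
    """Coincidence counting with a k-length predecessor table (illustrative sketch)."""
    pt = deque()           # predecessor table: flow IDs of the k most recent arrivals
    pt_counts = Counter()  # how many PT slots each flow currently occupies
    cc = Counter()         # coincidence counts; only flows with >= 1 coincidence appear
    n = 0
    for f in packets:
        n += 1
        matches = pt_counts[f]     # number of predecessors from the same flow
        if matches:
            cc[f] += matches
        pt.append(f)               # the arrival becomes a predecessor
        pt_counts[f] += 1
        if len(pt) > k:            # evict the oldest predecessor
            old = pt.popleft()
            pt_counts[old] -= 1
            if pt_counts[old] == 0:
                del pt_counts[old]
    # Assumed estimator: invert E[CC(f)] ~ k * n * p_f ** 2.
    return {f: math.sqrt(c / (k * n)) for f, c in cc.items()}

# Hypothetical trace: two large flows plus 5000 small flows sharing half the traffic.
rng = random.Random(1)
flows = ["A", "B"] + [f"s{i}" for i in range(5000)]
weights = [0.3, 0.2] + [0.5 / 5000] * 5000
trace = rng.choices(flows, weights=weights, k=100_000)

est = cate_estimate(trace, k=10)
tracked = len(est)  # flows that ever entered the coincidence table
```

In this sketch only flows that produce at least one coincidence take up an entry in `cc`, so `tracked` stays far below the number of distinct flows in the trace, while the two large flows are estimated close to their true rates.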

12 Intuition behind CATE
Counting coincidences dramatically amplifies (squares) the ratio of catching probability between a large flow and a small flow.
- Good news: CATE sample size is still … for the four-parameter estimation accuracy requirement.
Multiple (i.e., k) comparisons increase the number of membership tests to kN with N arrivals.
- Good news: the reduction in estimation variance due to the larger number of tests is no less than the increase in estimation variance due to comparison correlation.
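The squaring effect can be checked with simple arithmetic. Under the independent-arrival model, one comparison between an arrival and a single predecessor matches on flow f with probability p_f², whereas plain packet sampling picks a packet of flow f with probability p_f. The two rates below are hypothetical values chosen only for illustration.

```python
def catch_ratios(p_large, p_small):
    """Ratio of catching probabilities between two flows, per single test."""
    sampling_ratio = p_large / p_small                   # plain sampling: p_f
    coincidence_ratio = (p_large ** 2) / (p_small ** 2)  # one coincidence test: p_f ** 2
    return sampling_ratio, coincidence_ratio

# Hypothetical rates: a flow at 1% of traffic versus a flow at 0.01% of traffic.
sampling_ratio, coincidence_ratio = catch_ratios(0.01, 0.0001)
```

With these values the large flow is about 100 times more likely to be caught by sampling, but about 10,000 times more likely to be caught by a coincidence test, which is why small flows rarely occupy memory.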

13 CATE – sample size
Given the accuracy requirement (confidence percentile α, estimation error β) and a k-length predecessor table for CATE:
- The minimal sample size: …
- Specifically, for a flow f with p_f …, the minimal sample size: …

14 Theorem 1 – CATE sample size
Given a four-parameter estimation accuracy requirement, if … then setting … gives the sample size = …
Example settings (confidence percentile = 99.9%):
- threshold rate = 0.01, error relaxation factor = 10 → k = 50
- … = 20 → k = 500
- …

15 Theorem 2 – CATE memory size
Given the estimation setting in Theorem 1, the maximum expected memory: …
Example: 100,000 flows in total, rate range [10^-6, 1], Z_α = 3.
- CATE will catch no more than 1650 flows.
- Naïve counting will record all flows with high probability.

16 CATE – experiment 1
- Real IP traces from NLANR;
- A size-1000 buffer between new arrivals and PT.
[Figure: two result plots, one with memory size = 547 and one with memory size = 1464.]

17 CATE – experiment 2 (I): methodology
- 1 million flows in total, synthetic traces;
- 5 “large” flows, with rates uniformly distributed between 0.1 and 0.2 of the entire traffic;
- 1000 “medium-sized” flows, with rates uniformly distributed between … and … of the entire traffic;
- All the rest are “small” flows, each with rate roughly …;
- A deliberately short sample time (1 million packet times) was chosen to illustrate the impact of k, the predecessor table length.

18 CATE – experiment 2 (II)

19 CATE – experiment 2 (III)

20 Conclusion & future work
CATE is a memory-efficient traffic estimation scheme that is as fast as naïve counting.
Future work:
- Extending CATE to byte rate estimation.
- Extending CATE to minimize the impact of arrival dependence without (excessive) additional overhead.