On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena Walter Willinger AT&T Labs-Research [This is joint.

Slides:



Advertisements
Similar presentations
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
Advertisements

1 Methods of Experimental Particle Physics Alexei Safonov Lecture #21.
10 Further Time Series OLS Issues Chapter 10 covered OLS properties for finite (small) sample time series data -If our Chapter 10 assumptions fail, we.
Probability distribution functions Normal distribution Lognormal distribution Mean, median and mode Tails Extreme value distributions.
Model Fitting Jean-Yves Le Boudec 0. Contents 1 Virus Infection Data We would like to capture the growth of infected hosts (explanatory model) An exponential.
Statistical Inference Chapter 12/13. COMP 5340/6340 Statistical Inference2 Statistical Inference Given a sample of observations from a population, the.
Statistics & Modeling By Yan Gao. Terms of measured data Terms used in describing data –For example: “mean of a dataset” –An objectively measurable quantity.
1 Engineering Computation Part 6. 2 Probability density function.
Time Series Basics Fin250f: Lecture 3.1 Fall 2005 Reading: Taylor, chapter
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
A gentle introduction to Gaussian distribution. Review Random variable Coin flip experiment X = 0X = 1 X: Random variable.
Descriptive statistics Experiment  Data  Sample Statistics Experiment  Data  Sample Statistics Sample mean Sample mean Sample variance Sample variance.
Statistical inference
Probability By Zhichun Li.
Presenting: Assaf Tzabari
Performance Evaluation
Review of Probability and Random Processes
Probability Distributions and Frequentist Statistics “A single death is a tragedy, a million deaths is a statistic” Joseph Stalin.
1 Dynamic Models for File Sizes and Double Pareto Distributions Michael Mitzenmacher Harvard University.
Statistical Theory; Why is the Gaussian Distribution so popular? Rob Nicholls MRC LMB Statistics Course 2014.
Chapter 21 Random Variables Discrete: Bernoulli, Binomial, Geometric, Poisson Continuous: Uniform, Exponential, Gamma, Normal Expectation & Variance, Joint.
Principles of the Global Positioning System Lecture 10 Prof. Thomas Herring Room A;
Elec471 Embedded Computer Systems Chapter 4, Probability and Statistics By Prof. Tim Johnson, PE Wentworth Institute of Technology Boston, MA Theory and.
Review of Probability.
Review of Probability.
Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.
1 Chapters 9 Self-SimilarTraffic. Chapter 9 – Self-Similar Traffic 2 Introduction- Motivation Validity of the queuing models we have studied depends on.
Statistical Methods For Engineers ChE 477 (UO Lab) Larry Baxter & Stan Harding Brigham Young University.
Probability distribution functions
Distributions of Randomized Backtrack Search Key Properties: I Erratic behavior of mean II Distributions have “heavy tails”.
Standard Statistical Distributions Most elementary statistical books provide a survey of commonly used statistical distributions. The reason we study these.
Traffic Modeling.
Random Sampling, Point Estimation and Maximum Likelihood.
Models and Algorithms for Complex Networks Power laws and generative processes.
Theory of Probability Statistics for Business and Economics.
Lecture 15: Statistics and Their Distributions, Central Limit Theorem
ENGR 610 Applied Statistics Fall Week 3 Marshall University CITE Jack Smith.
On Internet Traffic Dynamics and Internet Topology II Internet Model Validation Walter Willinger AT&T Labs-Research
Fundamentals of Data Analysis Lecture 3 Basics of statistics.
ELEC 303 – Random Signals Lecture 18 – Classical Statistical Inference, Dr. Farinaz Koushanfar ECE Dept., Rice University Nov 4, 2010.
Lecture V Probability theory. Lecture questions Classical definition of probability Frequency probability Discrete variable and probability distribution.
Stats Probability Theory Summary. The sample Space, S The sample space, S, for a random phenomena is the set of all possible outcomes.
Random Variables (1) A random variable (also known as a stochastic variable), x, is a quantity such as strength, size, or weight, that depends upon a.
Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we.
Review of Probability. Important Topics 1 Random Variables and Probability Distributions 2 Expected Values, Mean, and Variance 3 Two Random Variables.
Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.
Probability distributions
CLASSICAL NORMAL LINEAR REGRESSION MODEL (CNLRM )
Risk Analysis Workshop April 14, 2004 HT, LRD and MF in teletraffic1 Heavy tails, long memory and multifractals in teletraffic modelling István Maricza.
Statistics Sampling Distributions and Point Estimation of Parameters Contents, figures, and exercises come from the textbook: Applied Statistics and Probability.
Stats Term Test 4 Solutions. c) d) An alternative solution is to use the probability mass function and.
1 Review of Probability and Random Processes. 2 Importance of Random Processes Random variables and processes talk about quantities and signals which.
Probability and Statistics in Environmental Modeling.
Fundamentals of Data Analysis Lecture 3 Basics of statistics.
Selecting Input Probability Distributions. 2 Introduction Part of modeling—what input probability distributions to use as input to simulation for: –Interarrival.
CLT and Levy Stable Processes John Rundle Econophysics PHYS 250
Probability Distributions
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005
Oliver Schulte Machine Learning 726
STATISTICS POINT ESTIMATION
Probability Theory and Parameter Estimation I
Probability.
Chapter 7: Sampling Distributions
Review of Probability Theory
MEGN 537 – Probabilistic Biomechanics Ch.3 – Quantifying Uncertainty
Hydrologic Statistics
STOCHASTIC HYDROLOGY Random Processes
Statistics & Flood Frequency Chapter 3
Continuous Statistical Distributions: A Practical Guide for Detection, Description and Sense Making Unit 3.
Applied Statistics and Probability for Engineers
Presentation transcript:

On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena Walter Willinger AT&T Labs-Research [This is joint work with J. Doyle and D. Alderson (Caltech)]

Topics Covered  Motivation  A working definition  Some illustrative examples  Some simple constructions  Some key mathematical results  Resilience to ambiguity  Heavy tails and statistics  A word of caution

Motivation  Internet is full of “high variability” –Link bandwidth: Kbps – Gbps –File sizes: a few bytes – Mega/Gigabytes –Flows: a few packets – 100,000+ packets –In/out-degree (Web graph): 1 – 100,000+ –Delay: Milliseconds – seconds and beyond  How to deal with “high variability” –High variability = large, but finite variance –High variability = infinite variance

A Working Definition  A distribution function F(x) or random variable X is called heavy-tailed if for some  where c>0 and finite  F is also called a power law or scaling distribution  The parameter  is called the tail index  1<  < 2, F has infinite variance, but finite mean  0 <  < 1, the variance and mean of F are infinite  “Mild” vs “wild” (Mandelbrot):   2 vs  < 2

Some Illustrative Examples  Some commonly-used plotting techniques –Probability density functions (pdf) –Cumulative distribution functions (CDF) –Complementary CDF (CCDF)  Different plots emphasize different features –Main body of the distribution vs. tail –Variability vs. concentration –Uni- vs. multi-modal

Probability density functions

Cumulative Distribution Function

Complementary CDFs

20 th Century’s 100 largest disasters worldwide Log(size) Log(rank) Most events are small But the large events are huge

Why “Heavy Tails” Matter …  Risk modeling (insurance)  Load balancing (CPU, network)  Job scheduling (Web server design)  Combinatorial search (Restart methods)  Complex systems studies (SOC vs. HOT)  Towards a theory for the Internet …

Some First Properties  Heavy-tailed or “scaling” distribution – –Compare to exponential distribution:  Linearly increasing mean residual lifetime – –Compare to exponential distribution

Some Simple Constructions  For U uniform in [0,1], set X=1/U –X is heavy-tailed with   For E (standard) exponential, set X=exp(E) –X is heavy-tailed with   The mixture of exponential distributions with parameter 1/  having a (centered) Gamma(a,b) distribution is a Pareto distribution with  a  The distribution of the time between consecutive visits of a symmetric random walk to zero is heavy-tailed with 

Key Mathematical Properties of Scaling Distributions  Invariant under aggregation –Non-classical CLT and stable laws  (Essentially) invariant under maximization –Domain of attraction of Frechet distribution  (Essentially) invariant under mixture –Example: The largest disasters worldwide

Linear Aggregation: Classical Central Limit Theorem  A well-known result –X(1), X(2), … independent and identically distributed random variables with distribution function F (mean  and variance 1) –S(n) = X(1)+X(2)+…+X(n) n-th partial sum –  More general formulations are possible  Often-used argument for the ubiquity of the normal distribution

Linear Aggregation: Non-classical Central Limit Theorem  A not so well-known result –X(1), X(2), … independent and identically distributed with common distribution function F that is heavy- tailed with 1<  <2 –S(n) = X(1)+X(2)+…+X(n) n-th partial sum –  The limit distribution is heavy-tailed with index   More general formulations are possible  Rarely taught in most Stats/Probability courses!

Maximization: Maximum Domain of Attraction  A not so well-known result (extreme-value theory) –X(1), X(2), … independent and identically distributed with common distribution function F that is heavy- tailed with 1<  <2 –M(n)=max(X(1), …, X(n)), n-th successive maxima –  G is the Fréchet distribution  G is heavy-tailed with index 

Intuition for “Mild” vs. “Wild”  The case of “mild” distributions –“Evenness” – large values of S(n) occur as a consequence of many of the X(i)’s being large –The contribution of each X(i), even of the largest, is negligible compared to the sum  The case of “wild” distributions –“Concentration” – large values of S(n) or M(n) occur as a consequence of a single large X(i) –The largest X(i) is dominant compared to S(n)

Weighted Mixture  A little known result –X(1), X(2), … independent and identically distributed with common distribution function F that is heavy- tailed with 1<  <2 –p(1), p(2), …, p(n) iid in [0,1] with p(1)+…p(n)=1 –W(n)=p(1)X(1)+…+p(n)X(n) –  Invariant “distributions” are  Condition on X(i) >a>0

th Century’s 100 largest disasters worldwide US Power outages (10M of customers, ) Natural ($100B) Technological ($10B) Slope = -1 (  =1)

8/14/ US Power outages (10M of customers, ) Slope = -1 (  =1) A large event is not inconsistent with statistics.

Resilience to Ambiguity  Scaling distributions are robust under –… aggregation, maximization, and mixture –… differences in observing/reporting/accounting –… varying environments, time periods  The “value” of robustness –Discoveries are easier/faster –Properties can be established more accurately –Findings are not sensitive to the details of the data gathering process

On the Ubiquity of Heavy Tails  Heavy-tailed distributions are attractors for averaging (e.g., non-classical CLT), but are the only distributions that are also (essentially) invariant under maximizing and mixing.  Gaussians (“normal”) distributions are also attractors for averaging (e.g., classical CLT), but are not invariant under maximizing and mixing  This makes heavy tails more ubiquitous than Gaussians, so no “special” explanations should be required …

Heavy Tails and Statistics  The traditional “curve-fitting” approach  “Curve-fitting” by example  Beyond “curve-fitting” – “Borrowing strength”  “Borrowing strength” by example  What “science” in “scientific modeling”?  Additional considerations

“Curve-fitting” approach  Model selection –Choose parametric family of distributions  Parameter estimation –Take a strictly static view of data –Assume moment estimates exist/converge  Model validation –Select “best-fitting” model –Rely on some “goodness-of-fit” criteria/metrics  “Black box-type” modeling, “data-fitting” exercise

“Curve-fitting” by example  Randomly picked data set –LBL’s WAN traffic (in- and outbound) –1:30, June 24 – 1:30, June 25 (PDT), 1996 –243,092 HTTP connection sizes (bytes) –Courtesy of Vern Paxson (thanks!)  Illustration of widely-accepted approach

CCDF plot on log-log scale HTTP Connections (Sizes) log10(x) log10(1-F(x))

Fitted 2-parameter Lognormal distribution (  =6.75,  =2.05) HTTP Connections (Sizes) log10(x) log10(1-F(x))

Fitted 2-parameter Pareto distribution (  =1.27, m=2000) HTTP Connections (Sizes) log10(x) log10(1-F(x))

The “truth” about “curve-fitting”  Highly predictable outcome –Always doable, no surprises Cause for endless discussions (“Which model is better?”)  When “more” means “better” … –2-parameter distributions (Pareto, Lognormal, …) 3-parameter distributions (Weibull, Gamma, …) –5-parameter distribution (Double-Pareto, -Lognormal, …), etc.  Inadequate “goodness-of-fit” criteria due to –Voluminous data sets Data with strong dependencies (LRD) –Data with high variability (heavy tails) »Data with non-stationary features

“Borrowing strength” approach  Mandelbrot & Tukey to the rescue –Sequential moment plots (Mandelbrot) –Borrowing strength from large data (Tukey)  “Borrowing strength” – dynamic view of data –Rely on traditional approach for initial (small) subset of available data –Consider successively larger subsets –Look out for inherently consistent models –Identify “patchwork “ of “fixes”

Lognormal? HTTP Connections (Sizes) log10(x) log10(1-F(x))

Pareto? HTTP Connections (Sizes) log10(x) log10(1-F(x))

“Borrowing strength” (example 1)  Use same data set as before  Illustration of Mandelbrot-Tukey approach (1) –Sequential standard deviation plots –Lack of robustness –A case against Lognormal distributions  More on sequential standard deviation plots  Scaling distributions to the rescue

Fitting lognormal: n=20,000

Fitting lognormal: n=40,000

Fitting lognormal: n=80,000

Fitting lognormal: n=160,000

Fitting lognormal: All data

The case against lognormal  The lognormal model assumes –existence/convergence of 2 nd moment –parameter estimates are inherently consistent  However, sequential std plot indicates –non-existence/divergence of 2 nd moment –inherently inconsistent parameter estimates  What “science” in “scientific modeling”? –Curve/data fitting is not “science” –“Patchwork” of “fixes” (Mandelbrot)

Randomizing observations

Matching “mild” distributions

Lognormal or scaling distribution

Pareto? HTTP Connections (Sizes) log10(x) log10(1-F(x))

“Borrowing strength” (example 2)  Use same data set as before  Illustration of Mandelbrot-Tukey approach (2) –Sequential tail index plots –Strong robustness properties –A case for scaling distributions  A requirement for future empirical studies

Fitting Pareto: n=20,000

Fitting Pareto: n=40,000

Fitting Pareto: n=80,000

Fitting Pareto: n=160,000

Fitting Pareto: All data

The case for scaling distributions  The “creativity” of scaling distributions –Data: Divergent sequential moment plots –Mathematics: Allow for infinite moments  Re-discover the “science” in “scientific modeling” –Scientifically “economical” modeling (when more data doesn’t mean more parameters) –Statistically “efficient” modeling (when more data mean more model accuracy/confidence) –Trading “goodness-of-fit” for “robustness”

Looking ahead …  Main objective of current empirical studies “The observations are consistent (in the sense of “curve fitting”) with model/distribution X, but are not consistent with model/distribution Y.”  Requirement for future empirical studies “The observations are consistent (in the sense of “borrowing strength”) with model/distribution X, and X is not sensitive to the methods of measuring and collecting the observations.”

Some Words of Caution …  Not every “straight-looking” log-log plot means “heavy tails”!  Never use frequency plots to infer heavy tails – even though physicists do it all the time!

Straight-looking log-log plot?

No! 25 samples from Exp(50) – linear semi-log plot!

 =1  = log 10 (x) log 10 (p) Slope = -(  +1) NEVER infer  from f requency plots on log-log scale!

log 10 (x) log 10 (p)

 =1  = log 10 (x) log 10 (P) ALWAYS infer  from CCDF plots on log-log scale!

A Word of Wisdom … In my view, even if an accumulation of quick “fixes”were to yield an adequately fitting “patchwork”, it would bring no understanding. – B.B. Mandelbrot, 1997