Interesting Links.


On the Self-Similar Nature of Ethernet Traffic
Will E. Leland, Walter Willinger and Daniel V. Wilson (BELLCORE); Murad S. Taqqu (BU)
Analysis and Prediction of the Dynamic Behavior of Applications, Hosts, and Networks

Overview
- What is Self-Similarity?
- Ethernet Traffic is Self-Similar
- Source of Self-Similarity
- Implications of Self-Similarity

Section 1: What is Self-Similarity?

Intuition of Self-Similarity
Something “feels the same” regardless of the scale at which it is viewed (such objects are also called fractals).

Exact self-similarity
The self-similarity may be exact. This normally occurs only in mathematically defined fractals, where the constraints that the physical world places on structures don't apply. A well-known example is the Koch snowflake curve, created by starting with a single line segment and, on each iteration, replacing each line segment by four others, each one third as long, with the middle two forming a triangular bump. As one successively zooms in, the resulting shape is exactly the same no matter how far the zoom is taken.
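To make the iteration concrete, here is a minimal sketch in Python (standard library only; the function names `koch_iteration` and `koch_curve` are hypothetical, chosen for illustration) that applies the replacement rule to a unit segment. After n iterations there are 4^n segments, each 3^-n long, so the total length is (4/3)^n, while the similarity dimension log 4 / log 3 ≈ 1.26 is the same at every zoom level.

```python
import math

def koch_iteration(points):
    """Replace every segment of a polyline by the four Koch segments."""
    c, s = math.cos(math.pi / 3), math.sin(math.pi / 3)
    new_points = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = (x1 - x0) / 3, (y1 - y0) / 3
        a = (x0 + dx, y0 + dy)              # end of the first third
        b = (x0 + 2 * dx, y0 + 2 * dy)      # start of the last third
        # apex of the bump: the vector (dx, dy) rotated by +60 degrees, placed at a
        apex = (a[0] + dx * c - dy * s, a[1] + dx * s + dy * c)
        new_points.extend([a, apex, b, (x1, y1)])
    return new_points

def koch_curve(iterations):
    """Start from the unit segment and apply the Koch rule repeatedly."""
    points = [(0.0, 0.0), (1.0, 0.0)]
    for _ in range(iterations):
        points = koch_iteration(points)
    return points

if __name__ == "__main__":
    for n in range(5):
        pts = koch_curve(n)
        length = sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))
        print(n, len(pts) - 1, round(length, 4))   # segments = 4**n, length = (4/3)**n
    # 4 self-similar copies, each scaled by 1/3
    print("similarity dimension:", math.log(4) / math.log(3))
```

Because each iteration applies the same rule to every segment, any piece of the curve is a scaled copy of the whole, which is exactly the zoom invariance described above.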

Approximate self-similarity
A far more common type of self-similarity is an approximate one: as one looks at the object at different scales, one sees structures that are recognisably similar but not exactly the same. In a mathematically defined system this can be readily demonstrated by almost all the patterns seen in the Mandelbrot set: at each of several successive zooms, a structure similar, but not identical, to the whole Mandelbrot set can be found. In natural objects such as a fern, not only is there a limit to the range of scales at which the self-similarity occurs, but it also occurs only at a few discrete scales.

Statistical self-similarity
Sometimes the self-similarity isn't visually obvious, but there may be numerical or statistical measures that are preserved across scales. One obvious measure is the fractal dimension: for 1/f noise, for example, the fractal dimension stays constant as one zooms in.

What is Self-Similarity?
In the case of stochastic objects such as time series, self-similarity is used in the distributional sense.
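For reference, the standard form of this distributional definition is restated below from the general literature, since the slide's own formula did not survive; Y and a are generic symbols, not taken from the slides. A continuous-time process Y(t) is self-similar with Hurst parameter H if, for every a > 0, the time-stretched process has the same finite-dimensional distributions as the amplitude-rescaled one:

```latex
\{\, Y(at) \,\}_{t \ge 0} \;\overset{d}{=}\; \{\, a^{H}\, Y(t) \,\}_{t \ge 0}, \qquad a > 0
```

Stretching time by a factor a and amplitude by a^H therefore yields a statistically indistinguishable process.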

Pictorial View of Self-Similarity

The Famous Data
Leland and Wilson collected hundreds of millions of Ethernet packets without loss and with recorded time-stamps accurate to within 100µs. The data were collected from several Ethernet LANs at the Bellcore Morristown Research and Engineering Center at different times over the course of approximately 4 years.

Why is Self-Similarity Important?
Network packet traffic has recently been identified as self-similar. Current network traffic models based on Poisson processes (and similar models) do not take the self-similar nature of traffic into account. This leads to inaccurate modeling which, when applied to a huge network like the Internet, can lead to huge financial losses.

Problems with Current Models
- A Poisson process:
  - when observed on a fine time scale, will appear bursty
  - when aggregated on a coarse time scale, will flatten (smooth) to white noise
- A self-similar (fractal) process:
  - when aggregated over a wide range of time scales, will maintain its bursty characteristic
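The smoothing of Poisson traffic can be checked numerically. The sketch below is a hypothetical demonstration (rate, bin count, seed, and the names `poisson_counts` and `aggregate` are arbitrary choices, not from the slides): it builds per-bin packet counts from exponential interarrival times and shows that the variance of the block-averaged counts falls as rate / m, which is the flattening toward white noise described above.

```python
import random
import statistics

def poisson_counts(rate, n_bins, seed=1):
    """Counts per unit-length bin of a Poisson process built from exponential gaps."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    t = rng.expovariate(rate)
    while t < n_bins:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts

def aggregate(series, m):
    """Average over non-overlapping blocks of size m (drops an incomplete tail)."""
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]

if __name__ == "__main__":
    counts = poisson_counts(rate=5.0, n_bins=100_000)
    for m in (1, 10, 100, 1000):
        # for Poisson counts the theoretical variance of the block means is rate / m
        print(m, round(statistics.pvariance(aggregate(counts, m)), 4))
```

A self-similar trace fed through the same `aggregate` function would instead show a variance that decays more slowly than 1/m.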

Consequences of Self-Similarity
- Traffic has similar statistical properties at a range of timescales: ms, secs, mins, hrs, days.
- Merging of traffic (as in a statistical multiplexer) does not result in smoothing of traffic.
(Figure: bursty data streams, after aggregation, give bursty aggregate streams.)

Pictorial View of Current Modeling

Side-by-side View

Definitions and Properties
- Long-range dependence: the autocorrelation decays slowly
- Hurst parameter H:
  - developed by Harold Hurst (1965)
  - a measure of “burstiness”, also considered a measure of self-similarity
  - 0 < H < 1
  - H increases as traffic increases

Definitions and Properties Cont.’d
Over low-, medium-, and high-traffic hours: as traffic increases, the Hurst parameter increases, i.e., traffic becomes more self-similar.

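Properties of Self-Similarity
$X = (X_t : t = 0, 1, 2, \ldots)$ is a covariance-stationary random process (i.e., $\mathrm{Cov}(X_t, X_{t+k})$ does not depend on $t$ for all $k$) with mean $\mu$ and variance $\sigma^2$. Suppose that its autocorrelation function satisfies $r(k) \sim k^{-\beta}$ as $k \to \infty$, with $0 < \beta < 1$.
Let $X^{(m)} = \{X^{(m)}_k\}$ denote the new process obtained by averaging the original series $X$ in non-overlapping sub-blocks of size $m$, i.e., $X^{(m)}_k = \frac{1}{m}(X_{km-m+1} + \cdots + X_{km})$, $k \ge 1$.
E.g., if $X^{(1)} = 4, 12, 34, 2, -6, 18, 21, 35$, then $X^{(2)} = 8, 18, 6, 28$ and $X^{(4)} = 13, 17$.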
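The block averaging that defines X^(m) takes only a few lines. This is a minimal sketch (the name `aggregate` is hypothetical; discarding an incomplete final block is an assumption, since the slide's example uses a series length divisible by m) that reproduces the worked example:

```python
def aggregate(series, m):
    """Return X^(m): means of consecutive non-overlapping blocks of size m.

    Any incomplete block at the end of the series is discarded.
    """
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]

x = [4, 12, 34, 2, -6, 18, 21, 35]
print(aggregate(x, 2))  # [8.0, 18.0, 6.0, 28.0]
print(aggregate(x, 4))  # [13.0, 17.0]
```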

Auto-correlation Definition
X is exactly second-order self-similar if the aggregated processes have the same autocorrelation structure as X, i.e., $r^{(m)}(k) = r(k)$ for all $k \ge 0$ and all $m = 1, 2, \ldots$
X is (asymptotically) second-order self-similar if the above holds in the limit, i.e., $r^{(m)}(k) \to r(k)$ as $m \to \infty$.
Most striking feature of self-similarity: the correlation structures of the aggregated processes do not degenerate as $m \to \infty$.

Traditional Models
This is in contrast to traditional models, whose aggregated processes have correlation structures that degenerate as $m \to \infty$, i.e., $r^{(m)}(k) \to 0$ as $m \to \infty$, for $k = 1, 2, 3, \ldots$
(Figure: example of a Poisson distribution vs. a self-similar distribution.)

Long-Range Dependence
Processes with long-range dependence are characterized by an autocorrelation function that decays hyperbolically as $k$ increases.
Important property: $\sum_{k} r(k) = \infty$. This is also called non-summability of correlation.

Intuition
Short-range processes: exponential decay of autocorrelations, i.e., $r(k) \sim p^k$ as $k \to \infty$, $0 < p < 1$, so the summation $\sum_k r(k)$ is finite.
The intuition behind long-range dependence: while high-lag correlations are all individually small, their cumulative effect is important. This gives rise to features drastically different from those of conventional short-range dependent processes.

The Measure of Self-Similarity
Hurst parameter $H$, $0.5 < H < 1$
Three approaches to estimate $H$ (based on properties of self-similar processes):
- Variance analysis of aggregated processes
- Analysis of the rescaled range (R/S) statistic for different block sizes
- A Whittle estimator
Surprisingly, a single parameter suffices to describe the degree of self-similarity in the data.

Variance Analysis
A very important property of self-similar processes is that the variance of the aggregated processes decays as
$\mathrm{Var}(X^{(m)}) \sim a\,m^{-\beta}$ as $m \to \infty$, with $0 < \beta < 1$.
For short-range dependent processes (e.g., the Poisson process), $\mathrm{Var}(X^{(m)}) \sim a\,m^{-1}$ as $m \to \infty$.
Plot $\mathrm{Var}(X^{(m)})$ against $m$ on a log-log plot: a slope greater than $-1$ is indicative of self-similarity.

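The R/S Statistic
For a given set of observations $X_1, X_2, \ldots, X_n$ with sample mean $\bar{X}(n)$ and sample variance $S^2(n)$, the rescaled adjusted range, or R/S statistic, is given by
$R(n)/S(n) = \frac{1}{S(n)}\left[\max(0, W_1, \ldots, W_n) - \min(0, W_1, \ldots, W_n)\right]$, where $W_k = (X_1 + X_2 + \cdots + X_k) - k\bar{X}(n)$, $k \ge 1$.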

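Example: $X_k = 14, 1, 3, 5, 10, 3$, mean $= 36/6 = 6$, $W_1 = 14 - (1 \cdot 6) = 8$; continuing in the same way, $W_1, \ldots, W_6 = 8, 3, 0, -1, 3, 0$.
$R/S = \frac{1}{S}[8 - (-1)] = 9/S$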
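The worked example can be checked mechanically. In this minimal sketch `rescaled_range` is a hypothetical helper, and S is taken as the standard deviation with a 1/n normalisation; that choice is an assumption, since the slide leaves S unevaluated.

```python
import math

def rescaled_range(xs):
    """Return (W, R, S) for the R/S statistic of the observations xs."""
    n = len(xs)
    mean = sum(xs) / n
    w, partial = [], 0.0
    for k, x in enumerate(xs, start=1):
        partial += x
        w.append(partial - k * mean)        # W_k = (X_1 + ... + X_k) - k * mean
    r = max(0.0, *w) - min(0.0, *w)         # adjusted range: 0 is always included
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)   # assumed 1/n normalisation
    return w, r, s

w, r, s = rescaled_range([14, 1, 3, 5, 10, 3])
print(w)       # [8.0, 3.0, 0.0, -1.0, 3.0, 0.0]
print(r)       # 9.0
print(r / s)
```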

The Hurst Effect
For self-similar data, the rescaled range or R/S statistic grows according to $c\,n^H$, where $H$ is the Hurst parameter, $H > 0.5$. For short-range processes, the R/S statistic grows as $d\,n^{0.5}$.
History: the Nile River. Harold Edwin Hurst studied the 800-year record of flooding along the Nile River (yearly minimum water level) and found long-range dependence.
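The growth law R/S ~ c n^H suggests estimating H as the slope of log R/S against log n. The sketch below is a simplified version under stated assumptions: it averages R/S over non-overlapping blocks of each size, uses a 1/n standard deviation, and fits an ordinary least-squares line; the function names and the white-noise input are hypothetical. For uncorrelated data the fitted slope should be close to 0.5, although R/S is known to be biased upward at small block sizes.

```python
import math
import random
import statistics

def rs_statistic(xs):
    """R/S of one block: adjusted range of the W_k divided by the (1/n) standard deviation."""
    n = len(xs)
    mean = statistics.fmean(xs)
    partial, low, high = 0.0, 0.0, 0.0      # start at 0 so the max/min include 0
    for k, x in enumerate(xs, start=1):
        partial += x
        w = partial - k * mean
        low, high = min(low, w), max(high, w)
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return (high - low) / s

def rs_hurst(series, block_sizes):
    """Slope of log(mean R/S) against log(n), i.e. the exponent H in R/S ~ c * n**H."""
    xs, ys = [], []
    for n in block_sizes:
        blocks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        xs.append(math.log(n))
        ys.append(math.log(statistics.fmean([rs_statistic(b) for b in blocks])))
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

if __name__ == "__main__":
    rng = random.Random(7)
    white_noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 15)]
    print(rs_hurst(white_noise, [2 ** i for i in range(4, 13)]))
```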

Whittle Estimator
Provides a confidence interval for $H$.
Property: any long-range dependent process approaches fractional Gaussian noise (FGN) when aggregated to a certain level.
Test the aggregated observations to ensure that they have converged to the normal distribution.

Recap
Self-similarity manifests itself in several equivalent fashions:
- Non-degenerate autocorrelations
- Slowly decaying variance
- Long-range dependence
- Hurst effect

Section 2: Ethernet Traffic is Self-Similar

Plots Showing Self-Similarity (Ⅰ)
Estimate: H ≈ 0.8

Plots Showing Self-Similarity (Ⅱ)
- High traffic: 5.0%-30.7%
- Mid traffic: 3.4%-18.4%
- Low traffic: 1.3%-10.4%
Higher traffic, higher H.

H: A Function of Network Utilization
Observation shows behavior “contrary to Poisson” (figure: H versus network utilization).
As we shall see shortly, H measures traffic burstiness.
As the number of Ethernet users increases, the resulting aggregate traffic becomes burstier instead of smoother.

Difference in low traffic H values
- Pre-1990: host-to-host workgroup traffic
- Post-1990: router-to-router traffic
Low-period router-to-router traffic consists mostly of machine-generated packets, which tend to form a smoother arrival stream than low-period host-to-host traffic.

H: Measuring “Burstiness”
Intuitive explanation using the M/G/∞ model: as α → 1 (α being the tail index of the heavy-tailed service-time distribution), the service time becomes more variable, making it easier to generate bursts, so H increases!

Summary
Ethernet LAN traffic is statistically self-similar.
- H: the degree of self-similarity
- H: a function of utilization
- H: a measure of “burstiness”
- Models like Poisson are not able to capture self-similarity

Discussions
How to explain self-similarity?
- Heavy-tailed file sizes
How would this impact existing performance?
- Limited effectiveness of buffering
- Effectiveness of FEC (forward error correction)
How to adapt to self-similarity?
- Prediction
- Adaptive FEC

Section 3: Explaining Self-Similarity

Introduction
The superposition of many ON/OFF sources whose ON-periods and OFF-periods exhibit the Noah Effect produces aggregate network traffic that features the Joseph Effect.
- Noah Effect: high variability or infinite variance
- Joseph Effect: self-similarity or long-range dependence
Also known as packet-train models.

The Noah Effect
The Noah Effect is the essential point of departure from traditional to self-similar traffic modeling.
- It results in highly variable ON/OFF periods: train lengths and inter-train distances can be very large with non-negligible probabilities.
- Infinite variance syndrome: many naturally occurring phenomena can be well described by infinite-variance distributions.
- Heavy-tailed distributions, with tail parameter α

Existing Models
Traditional traffic models: finite-variance ON/OFF source models.
The superposition of such sources behaves like white noise, with only short-range correlations.

Idealized ON/OFF Model
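Lengths of ON- and OFF-periods are i.i.d. positive random variables $U_k$. Suppose that $U$ has a hyperbolic tail distribution,
$P(U > u) \sim c\,u^{-\alpha}$ as $u \to \infty$, with $c > 0$ and $1 < \alpha < 2$. (1)
Property (1) is the infinite variance syndrome, or the Noah Effect.
- $\alpha < 2$ implies $E(U^2) = \infty$.
- $\alpha > 1$ ensures that $E(U) < \infty$, and that $S_0$ is not infinite.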
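A Pareto distribution is the simplest law with the hyperbolic tail (1): P(U > u) = (k/u)^α for u ≥ k, so c = k^α exactly. The sketch below draws such ON/OFF lengths by inverse-transform sampling; `pareto_sample`, the seed, and α = 1.5 are hypothetical illustration choices. With 1 < α < 2 the sample mean drifts toward the finite value αk/(α - 1), while the sample variance does not settle as n grows.

```python
import random

def pareto_sample(alpha, k, rng):
    """Draw U with P(U > u) = (k / u)**alpha for u >= k, via inverse transform."""
    v = 1.0 - rng.random()            # uniform on (0, 1], so v**(-1/alpha) is finite
    return k * v ** (-1.0 / alpha)

if __name__ == "__main__":
    rng = random.Random(42)
    alpha = 1.5                       # 1 < alpha < 2: finite mean, infinite variance
    for n in (10**3, 10**4, 10**5):
        xs = [pareto_sample(alpha, 1.0, rng) for _ in range(n)]
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / n
        print(n, round(mean, 3), round(var, 1))   # theoretical mean is 3.0
```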

Explaining Self-Similarity
Consider a set of processes, each of which is either ON or OFF.
- The distributions of ON and OFF times are heavy-tailed, with tail indices $\alpha_1$ and $\alpha_2$.
- The aggregation of these processes leads to a self-similar process with $H = (3 - \min(\alpha_1, \alpha_2))/2$.
So, how do we get heavy-tailed ON or OFF times?

Heavy Tailed ON Times and File Sizes
- Analysis of client logs showed that ON times were, in fact, heavy-tailed: α ≈ 1.2 over about 3 orders of magnitude.
- This led to the analysis of the underlying file sizes: α ≈ 1.1 over about 4 orders of magnitude.
- Similar to FTP traffic
- Files available from UNIX file systems are typically heavy-tailed

Heavy-Tailed OFF Times
Analysis of OFF times showed that they are also heavy-tailed: α ≈ 1.5.
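Purely as arithmetic, the measured tail indices can be plugged into H = (3 - min(α1, α2))/2; the heavier tail (the smaller α) dominates. This minimal sketch uses a hypothetical helper name, and the range check reflects the assumption that the formula applies only when the smaller index lies strictly between 1 and 2.

```python
def hurst_from_on_off(alpha_on, alpha_off):
    """H = (3 - min(alpha_on, alpha_off)) / 2, assumed valid for min index in (1, 2)."""
    alpha_min = min(alpha_on, alpha_off)
    if not 1 < alpha_min < 2:
        raise ValueError("formula assumes the smaller tail index lies in (1, 2)")
    return (3 - alpha_min) / 2

# ON times alpha ~ 1.2, OFF times alpha ~ 1.5: the ON tail dominates
print(hurst_from_on_off(1.2, 1.5))   # 0.9
```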

Ethernet LAN Traffic Measurements at the Source Level
Location: Bellcore Morristown Research and Engineering Center
The first set:
- The busy hour of the August 1989 Ethernet LAN measurements
- About 105 sources, 748 active source-destination pairs
- 95% of the traffic was internal
The second set:
- A 9 day-long measurement period in December 1994
- About 3,500 sources, 10,000 active pairs
- Measurements are made up entirely of remote traffic

Textured Plots of Packet Arrival Times

Checking for the Noah Effect
- Complementary distribution plots
- Hill’s estimate
Let $U_1, U_2, \ldots, U_n$ denote the observed ON- (or OFF-) periods and write $U_{(1)} \le U_{(2)} \le \cdots \le U_{(n)}$ for the corresponding order statistics.
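The slide's formula for Hill's estimate did not survive; the sketch below uses the standard textbook form, assumed here rather than copied from the slides: the reciprocal of the mean of log U_(n-i) - log U_(n-k) over the k largest order statistics (i = 0, ..., k-1). The function name, the choice of k, and the synthetic Pareto data are hypothetical. In practice one plots the estimate against k and looks for a stable region.

```python
import math
import random

def hill_estimate(samples, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    order = sorted(samples)                      # U_(1) <= U_(2) <= ... <= U_(n)
    n = len(order)
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    log_threshold = math.log(order[n - k - 1])   # log U_(n-k); Python index is n-k-1
    excess = sum(math.log(u) - log_threshold for u in order[n - k:])  # U_(n-k+1)..U_(n)
    return k / excess

if __name__ == "__main__":
    rng = random.Random(1)
    true_alpha = 1.2
    # synthetic test data: exact Pareto draws with P(U > u) = u**(-true_alpha), u >= 1
    data = [(1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(100_000)]
    for k in (100, 1_000, 10_000):
        print(k, round(hill_estimate(data, k), 3))
```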

Traffic Modeling and Generation
Although network traffic is intrinsically complex, parsimonious modeling is still possible: estimating a single parameter α (the intensity of the Noah Effect) is enough.

Performance and Protocol Analysis
The queue length distribution:
- Traditional (Markovian) traffic: decreases exponentially fast
- Self-similar traffic: decreases much more slowly
Protocol design should take into account knowledge about network traffic, such as the presence or absence of the Noah Effect.

Conclusion
The presence of the Noah Effect in measured Ethernet LAN traffic is confirmed. The superposition of many ON/OFF models with the Noah Effect results in aggregate packet streams that are consistent with measured network traffic and exhibit self-similar or fractal properties.

Major Results of CB97 (Crovella and Bestavros, 1997)
- Established that WWW traffic was self-similar
- Modeled a number of different WWW characteristics (focus on the tail)
- Provided an explanation for the self-similarity of WWW traffic based on the underlying file size distribution

An Example: File Size Distribution on a Win2000 Machine

Section 4: Impact of Self-Similarity

Comparison

Impact on Network Engineering
- Queuing delays are much higher in the presence of long-range dependence than for Poisson traffic.
- To avoid dropping packets, buffers have to be huge.
- You have to be very careful predicting future traffic based on past measurements.
- You cannot look at a little bit of video and decide how much buffer it’s going to require.

Thanks!
