Trace-based Network Bandwidth Analysis and Prediction Yi QIAO 06/10/2002.


OUTLINE 1. Introduction 2. Data Collection and Transformation 3. Basic Statistical Analysis of Bandwidth 4. Trace Classification 5. Bandwidth Prediction 6. Conclusion

1. Introduction Fact: Network bandwidth is one of the most important characteristics of both WANs and LANs. We want to know: What does a bandwidth time series look like? Are there any correlations between bandwidth at different times? Do bandwidth series from different traces share any common properties? Is network bandwidth predictable or not? Are there any differences between bandwidth data from long-period traces and data from short-period traces?

Step by step: trace collection and transformation; classification of the traces; bandwidth prediction.

2. Data Collection and Transformation Three data sets: I. NLANR short-period (90 seconds) WAN traces; II. AUCKLAND long-period (1 day) WAN traces; III. BC traces, 2 WAN traces and 2 LAN traces.

Converting a trace file to bandwidth data: (1) Start from the original trace file (time stamp + IP header + TCP header). (2) Reduce each record to time stamp + packet length (from the IP header). (3) Assign packets to bins according to their time stamps and compute the instantaneous bandwidth of each bin. (4) Write the final bandwidth file.
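The binning step can be sketched as follows. This is a minimal illustration, not the tool used for the talk: the function name `packets_to_bandwidth`, the `(timestamp, length)` tuple layout, and the choice of bytes per second as the unit are all assumptions.

```python
from typing import Iterable, List, Tuple


def packets_to_bandwidth(packets: Iterable[Tuple[float, int]],
                         bin_size: float) -> List[float]:
    """Turn (timestamp_seconds, packet_length_bytes) pairs into one
    bandwidth value (bytes per second, an assumed unit) per time bin."""
    packets = list(packets)
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    if not packets:
        return []
    start = min(t for t, _ in packets)
    end = max(t for t, _ in packets)
    n_bins = int((end - start) / bin_size) + 1
    byte_counts = [0] * n_bins
    for t, length in packets:
        # Each packet falls into the bin that covers its time stamp.
        byte_counts[int((t - start) / bin_size)] += length
    # Bytes accumulated in a bin divided by the bin width gives the rate.
    return [count / bin_size for count in byte_counts]


# Hypothetical four-packet trace, binned at 1 second.
trace = [(0.0, 1500), (0.4, 500), (1.2, 1000), (2.9, 40)]
print(packets_to_bandwidth(trace, bin_size=1.0))
```

Empty bins are kept as zeros so that the output stays an evenly spaced time series, which the later ACF and PSD analysis relies on.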

3. Basic Statistical Analysis After some basic statistical analysis of the bandwidth data, such as the mean, maximum, and standard deviation of bandwidth, we get … Correlation Coefficient

Now, what's the effect of bin size on these properties? Correlation coefficient between bin size and each property:
            COV        Max/Mean   Min/Mean
Bin Size    -0.4411    -0.2967    0.6788
[Figure: relationship between mean, min and max bandwidth; correlation coefficient]

[Figures: relationship between bin size and COV; relationship between bin size and Max/Mean]
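How these properties move with bin size can be explored with a sketch like the one below. It assumes a bandwidth series already computed at the finest bin size, takes COV to mean the coefficient of variation (standard deviation divided by mean), and uses made-up numbers; `rebin` and `summary` are hypothetical helper names.

```python
import statistics
from typing import Dict, List


def rebin(bandwidth: List[float], factor: int) -> List[float]:
    """Average each run of `factor` consecutive bins into one coarser bin.
    Leftover bins at the end that do not fill a full run are dropped."""
    usable = len(bandwidth) - len(bandwidth) % factor
    return [sum(bandwidth[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def summary(bandwidth: List[float]) -> Dict[str, float]:
    """COV, Max/Mean and Min/Mean of one series (the mean must be nonzero)."""
    mean = statistics.fmean(bandwidth)
    return {
        "cov": statistics.pstdev(bandwidth) / mean,
        "max_over_mean": max(bandwidth) / mean,
        "min_over_mean": min(bandwidth) / mean,
    }


# Made-up fine-grained series; factor 1, 2, 4 stand for growing bin sizes.
fine = [0.0, 4.0, 2.0, 2.0, 6.0, 0.0, 1.0, 1.0]
for factor in (1, 2, 4):
    print(factor, summary(rebin(fine, factor)))
```

On this toy series, COV and Max/Mean shrink and Min/Mean grows as bins get coarser, which matches the signs of the correlation coefficients reported on the slide.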

4. Trace Classification How to classify? For each trace we ask: What does the time series plot look like? What does the shape of the ACF plot look like? What percentage of ACFs is significant? What best describes the distribution (histogram) of bandwidth? What does the PSD plot look like? Is it decreasing linearly (in a log-log plot) as the frequency increases? Result: 12 classes for NLANR traces, 8 classes for AUCKLAND traces.
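Two of these questions can be turned into numbers, as in the sketch below. It makes assumptions the slides do not state: an ACF value counts as significant when it falls outside the usual 95% white-noise band of ±1.96/√N, and the PSD is estimated with a plain periodogram whose log-log slope is fitted by least squares. The function names and the synthetic test signals are illustrative only.

```python
import cmath
import math
import random
from typing import List


def acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def significant_fraction(x: List[float], max_lag: int) -> float:
    """Fraction of ACF lags outside the assumed +/-1.96/sqrt(N) band."""
    bound = 1.96 / math.sqrt(len(x))
    coeffs = acf(x, max_lag)
    return sum(1 for r in coeffs if abs(r) > bound) / len(coeffs)


def psd_loglog_slope(x: List[float]) -> float:
    """Least-squares slope of log10(power) against log10(frequency)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    log_f, log_p = [], []
    for k in range(1, n // 2 + 1):
        # Direct DFT: slow (O(n^2)) but dependency-free.
        coeff = sum(d * cmath.exp(-2j * math.pi * k * t / n)
                    for t, d in enumerate(dev))
        power = abs(coeff) ** 2 / n
        if power > 0:
            log_f.append(math.log10(k / n))
            log_p.append(math.log10(power))
    mf, mp = sum(log_f) / len(log_f), sum(log_p) / len(log_p)
    return (sum((a - mf) * (b - mp) for a, b in zip(log_f, log_p))
            / sum((a - mf) ** 2 for a in log_f))


# Synthetic stand-ins: white noise versus a correlated AR(1)-style series.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]
smooth = [0.0]
for e in noise[1:]:
    smooth.append(0.9 * smooth[-1] + e)

for name, series in (("noise", noise), ("smooth", smooth)):
    print(name, significant_fraction(series, 20), psd_loglog_slope(series))
```

A flat PSD with few significant ACF lags corresponds to the unpredictable classes; a clearly negative slope with many significant lags corresponds to the predictable ones.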

I. NLANR short-period WAN traces classification: A. Class 1: Not predictable, under-utilized. ACF: small values; a low percentage of ACFs are significant. Bandwidth distribution: heavy-tailed, y = x^(-α). PSD: flat; contains all frequency components, like white noise. Bin size: 0.001 s.

Effect of different bin sizes: 0.01 s, 0.1 s, 1 s. Different bin sizes can all give us some useful information. We should examine all these bin sizes for each trace.

B. Class 2: Little predictability, under-utilized. ACF: small values; a low percentage of ACFs are significant. Bandwidth distribution: multiple heavy-tailed distributions, y = x^(-α). PSD: flat; contains all frequency components, like white noise. Bin size: 0.1 s for ACF; 0.001 s for other plots.

C. Class 2a: No predictability, well-utilized. ACF: small values; a low percentage of ACFs are significant. Bandwidth distribution: left branch is half a normal distribution; right branch is heavy-tailed, y = x^(-α). PSD: flat; contains all frequency components, like white noise. Bin size: 0.1 s for ACF; 0.001 s for other plots.

D. Class 4: Some predictability, under-utilized. ACF: over 50% of ACFs are significant. Bandwidth distribution: multiple heavy-tailed distributions of the form y = x^(-α). PSD: decreases linearly in a log-log plot as frequency increases; low-frequency components are dominant. Bin size: 0.1 s for ACF; 0.001 s for other plots.

E. Class 5: Some predictability, fairly utilized. ACF: over 50% of ACFs are significant, with high-frequency oscillation. Bandwidth distribution: left branch is half a normal distribution; right branch is heavy-tailed, y = x^(-α). PSD: a dominant frequency (frequency band) component. Bin size: 0.01 s for ACF; 0.001 s for other plots.

II. Auckland long-period WAN traces classification: A. Class 1: Good predictability, fairly utilized. ACF: over 90% of ACFs are significant; regular and smooth plot. Bandwidth distribution: two separate parts with two separate peaks, both heavy-tailed. PSD: decreases linearly in a log-log plot as frequency increases; low-frequency components are dominant. Bin size: 1 s for all plots.

B. Class 1a: Good predictability, fairly utilized. ACF: over 85% of ACFs are significant; regular and smooth plot. Bandwidth distribution: two separate parts with two separate peaks, largely overlapping. PSD: decreases linearly in a log-log plot as frequency increases; low-frequency components are dominant. Bin size: 1 s for all plots.

C. Class 2: Some predictability, well-utilized. ACF: over 70% of ACFs are significant, with some high-frequency fluctuation. Bandwidth distribution: left branch is half a normal distribution; right branch is heavy-tailed, y = x^(-α). PSD: decreases linearly in a log-log plot as frequency increases; low-frequency components are dominant. Bin size: 1 s for all plots.

III. Tree-based Classification Why do this? Some classes could be very similar to each other while some are quite different. This can be best described by a tree structure. Tree-based classification enables us to do classification at different granularity.

A. Tree-based Classification for NLANR traces

B. Tree-based Classification for Auckland traces

IV. Summary of Traces Classification Summary for NLANR traces (12 classes)

Summary for AUCKLAND traces (8 classes)

Pie Chart for NLANR traces and AUCKLAND traces

What else can we learn? All the long traces have some predictability. Most of the short traces are not predictable, and even the short traces that are predictable are less predictable than the long traces. Only a small fraction of the short traces make good use of the bandwidth, while all the long traces have good (or fairly good) bandwidth utilization. All predictable traces, both short NLANR traces and long AUCKLAND traces, show some degree of long-range dependence.

5. Bandwidth Prediction What do we want to know? What is the real predictability of each class we identified? Which prediction model is best suited to bandwidth prediction? What is the effect of different bin sizes on bandwidth prediction? Prediction models used (part of the RPS Toolkit): MEAN, LAST, MA, BM, AR, ARMA, ARIMA, ARFIMA

How to evaluate predictability? Three evaluation criteria: I. The ratio of the mean squared error (msqerr) to the variance of the testing sequence, that is, msqerr / variance of the testing sequence (smaller is better). II. How well does the error distribution fit the normal distribution? (=1 ideally) III. What percentage of ACFs of the prediction error is significant? (=0 ideally)
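The first criterion can be illustrated with the sketch below. It is not RPS Toolkit code: the MEAN, LAST and AR predictors here are simplified stand-ins written for this example, the AR coefficients come from a Yule-Walker fit via the Levinson-Durbin recursion, the order 8 is arbitrary, and the series is synthetic.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def fit_ar(train: Sequence[float], order: int) -> Tuple[float, List[float]]:
    """Yule-Walker AR fit using the Levinson-Durbin recursion.
    Returns (mean, coefficients); the series must not be constant."""
    n = len(train)
    mean = sum(train) / n
    dev = [v - mean for v in train]
    r = [sum(dev[i] * dev[i + k] for i in range(n - k)) / n
         for k in range(order + 1)]
    coeffs: List[float] = []
    err = r[0]
    for k in range(1, order + 1):
        acc = r[k] - sum(coeffs[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err
        coeffs = [coeffs[j] - refl * coeffs[k - 2 - j]
                  for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return mean, coeffs


def mse_over_variance(series: Sequence[float], split: int,
                      predict: Callable[[Sequence[float]], float]) -> float:
    """Criterion I: one-step-ahead msqerr on series[split:] divided by the
    variance of that testing sequence."""
    errors = [series[t] - predict(series[:t])
              for t in range(split, len(series))]
    mse = sum(e * e for e in errors) / len(errors)
    return mse / statistics.pvariance(series[split:])


# Synthetic correlated series standing in for a bandwidth trace.
rng = random.Random(1)
series = [0.0]
for _ in range(599):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))
split = 300  # first half fits the model, second half is the testing sequence
train_mean, ar_coeffs = fit_ar(series[:split], order=8)


def predict_mean(history: Sequence[float]) -> float:
    return train_mean


def predict_last(history: Sequence[float]) -> float:
    return history[-1]


def predict_ar(history: Sequence[float]) -> float:
    return train_mean + sum(c * (history[-1 - j] - train_mean)
                            for j, c in enumerate(ar_coeffs))


ratios = {name: mse_over_variance(series, split, fn)
          for name, fn in (("MEAN", predict_mean),
                           ("LAST", predict_last),
                           ("AR", predict_ar))}
print(ratios)
```

A ratio near 1 means the predictor does no better than guessing the mean of the testing sequence; values well below 1 indicate real predictability.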

I. Effectiveness of different predictors A. Bandwidth prediction for NLANR traces [Figure: mean squared error / variance of testing sequence. Bin size: 0.01 s]

[Figures: normal distribution fit; percentage of error ACFs that are significant. Bin size: 0.01 s for both]

B. Bandwidth prediction for AUCKLAND traces [Figure: mean squared error / variance of testing sequence. Bin size: 10 s]

[Figures: normal distribution fit; percentage of error ACFs that are significant. Bin size: 10 s for both]

C. Bandwidth prediction for BC traces [Figure: mean squared error / variance of testing sequence. Bin size: 10 s for the 2 WAN traces, 0.1 s for the 2 LAN traces]

What does bandwidth prediction really look like? [Figures: an AUCKLAND trace at bin sizes 1000 s, 100 s, 10 s and 1 s; an NLANR trace at bin sizes 1 s, 0.1 s, 0.01 s and 0.001 s]

D. Observations For almost all classes of traces, the AR model yields optimal or near-optimal prediction results among the eight predictors tested. For almost all classes and all predictors, the error distribution is very close to a normal distribution. For any class, the sigacffrac value (fraction of significant error ACFs) of the AR model is the lowest, or nearly the lowest, among all predictors. Our expectations of predictability for the different classes have been confirmed by real results: all the long traces are predictable, and a large fraction of them have very good predictability, while only 20% of the short traces have some predictability. BC traces also have some predictability.

II. Influence of bin size on bandwidth prediction A. NLANR traces (AR 32) [Figure: mean squared error / variance of testing sequence at different bin sizes (0.001 s, 0.01 s, 0.1 s and 1 s)]

B. AUCKLAND traces (AR 32) [Figure: mean squared error / variance of testing sequence at different bin sizes (1 s, 10 s, 100 s and 1000 s)]

C. Observations For NLANR traces, a bin size of 0.1 second gives the best prediction among the four bin sizes. For most AUCKLAND traces, a bin size of 100 seconds or 10 seconds gives the best prediction performance among the four bin sizes. For any trace, there probably exists an optimal bin size that gives the best prediction performance.

D. Further Probe For Auckland traces, there seems to be an optimal bin size for bandwidth prediction… [Figure: the optimal bin size appears to be around 20 seconds. Red: a Class 1 trace; green: a Class 1c trace]
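A bin-size sweep of this kind can be sketched as below. To stay short it swaps the AR 32 model used in the talk for a plain AR(1) predictor fitted from the lag-1 autocorrelation, and it runs on a synthetic series (a slow sine plus noise), so the numbers it prints say nothing about the real traces. The helper names and the factor list are assumptions.

```python
import math
import random
import statistics
from typing import Dict, List


def rebin(bandwidth: List[float], factor: int) -> List[float]:
    """Average each run of `factor` fine bins into one coarser bin."""
    usable = len(bandwidth) - len(bandwidth) % factor
    return [sum(bandwidth[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def ar1_ratio(series: List[float]) -> float:
    """msqerr / variance of the testing half, using an AR(1) predictor
    (a simplification of the AR 32 model) fitted on the first half."""
    split = len(series) // 2
    train = series[:split]
    mean = statistics.fmean(train)
    dev = [v - mean for v in train]
    phi = sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)
    errors = [series[t] - (mean + phi * (series[t - 1] - mean))
              for t in range(split, len(series))]
    mse = sum(e * e for e in errors) / len(errors)
    return mse / statistics.pvariance(series[split:])


# Synthetic fine-grained series: slow oscillation buried in noise.
rng = random.Random(2)
fine = [20.0 + math.sin(2 * math.pi * t / 512) + rng.gauss(0.0, 2.0)
        for t in range(4096)]

factors = (1, 4, 16, 64)  # multiples of the finest bin size
ratios: Dict[int, float] = {f: ar1_ratio(rebin(fine, f)) for f in factors}
best = min(ratios, key=ratios.get)
print(ratios)
print("best factor:", best)
```

Very fine bins are dominated by noise and very coarse bins leave few samples and smooth away structure, which is one plausible reason an intermediate bin size can score best.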

6. Conclusion Bandwidth traces can be classified based on their time series plot, ACF plot, bandwidth distribution, and PSD plot. Most long-period WAN traces are predictable, with some degree of long-range dependence. A small fraction of short-period WAN traces have some predictability, also with some degree of long-range dependence. The BC LAN traces are also predictable. The AR model is well suited to prediction because of its accuracy and efficiency. For each trace, there exists an "optimal" bin size that gives the best prediction performance.

Acknowledgement Many Thanks to Peter, Dong, and Jason!
