Hung X. Nguyen and Matthew Roughan The University of Adelaide, Australia SAIL: Statistically Accurate Internet Loss Measurements.

Slides:



Advertisements
Similar presentations
Computational Statistics. Basic ideas  Predict values that are hard to measure irl, by using co-variables (other properties from the same measurement.
Advertisements

Copula Regression By Rahul A. Parsa Drake University &
10-3 Inferences.
Outline input analysis input analyzer of ARENA parameter estimation
Output analyses for single system
PhD Student: Ana Novak Supervisors: Prof Peter Taylor & Dr Darryl Veitch Active Probing Using Packet-Pair Probing to Estimate Packet Size and Packet Arrival.
Chapter 7(7b): Statistical Applications in Traffic Engineering Chapter objectives: By the end of these chapters the student will be able to (We spend 3.
Multiple regression analysis
CMPT 855Module Network Traffic Self-Similarity Carey Williamson Department of Computer Science University of Saskatchewan.
On the Self-Similar Nature of Ethernet Traffic - Leland, et. Al Presented by Sumitra Ganesh.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Modeling TCP Throughput Jeng Lung WebTP Meeting 11/1/99.
Statistics & Modeling By Yan Gao. Terms of measured data Terms used in describing data –For example: “mean of a dataset” –An objectively measurable quantity.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics, USC Research supported by NIH and NSF Grants Based on joint.
On the Constancy of Internet Path Properties Yin Zhang, Nick Duffield AT&T Labs Vern Paxson, Scott Shenker ACIRI Internet Measurement Workshop 2001 Presented.
Building and Running a FTIM n 1. Define the system of interest. Identify the DVs, IRVs, DRVs, and Objective. n 2. Develop an objective function of these.
Descriptive statistics Experiment  Data  Sample Statistics Experiment  Data  Sample Statistics Sample mean Sample mean Sample variance Sample variance.
Probability By Zhichun Li.
Estimating Congestion in TCP Traffic Stephan Bohacek and Boris Rozovskii University of Southern California Objective: Develop stochastic model of TCP Necessary.
Chapter 2 Simple Comparative Experiments
Violations of Assumptions In Least Squares Regression.
Experimental Evaluation
7/3/2015© 2007 Raymond P. Jefferis III1 Queuing Systems.
1 Simulation Modeling and Analysis Output Analysis.
CSE 221: Probabilistic Analysis of Computer Systems Topics covered: Statistical inference.
Standard Error of the Mean
IE 594 : Research Methodology – Discrete Event Simulation David S. Kim Spring 2009.
A Machine Learning-based Approach for Estimating Available Bandwidth Ling-Jyh Chen 1, Cheng-Fu Chou 2 and Bo-Chun Wang 2 1 Academia Sinica 2 National Taiwan.
References for M/G/1 Input Process
Analysis of Simulation Results Andy Wang CIS Computer Systems Performance Analysis.
AM Recitation 2/10/11.
Simulation Output Analysis
 1  Outline  stages and topics in simulation  generation of random variates.
“A non parametric estimate of performance in queueing models with long-range correlation, with applications to telecommunication” Pier Luigi Conti, Università.
Chapter 4 – Modeling Basic Operations and Inputs  Structural modeling: what we’ve done so far ◦ Logical aspects – entities, resources, paths, etc. 
Modeling and Simulation CS 313
Montecarlo Simulation LAB NOV ECON Montecarlo Simulations Monte Carlo simulation is a method of analysis based on artificially recreating.
Renewable Energy Research Laboratory University of Massachusetts Prediction Uncertainties in Measure- Correlate-Predict Analyses Anthony L. Rogers, Ph.D.
1 Statistical Distribution Fitting Dr. Jason Merrick.
Analysis of Simulation Results Chapter 25. Overview  Analysis of Simulation Results  Model Verification Techniques  Model Validation Techniques  Transient.
Measurement and Modeling of Packet Loss in the Internet Maya Yajnik.
Lecture 19 Nov10, 2010 Discrete event simulation (Ross) discrete and continuous distributions computationally generating random variable following various.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Challenges and Opportunities Posed by Power Laws in Network Analysis Bruno Ribeiro UMass Amherst MURI REVIEW MEETING Berkeley, 26 th Oct 2011.
1 draft-ietf-ippm-loss-episode-metrics-00 Loss Episode Metrics for IPPM Nick Duffield, Al Morton, AT&T Joel Sommers, Colgate University IETF 79, Beijing,
Internet Performance Measurements and Measurement Techniques Jim Kurose Department of Computer Science University of Massachusetts/Amherst
Stefano Vezzoli, CROS NT
Detecting the Long-Range Dependence in the Internet Traffic with Packet Trains Péter Hága, Gábor Vattay Department Of Physics of Complex Systems Eötvös.
Multiplicative Wavelet Traffic Model and pathChirp: Efficient Available Bandwidth Estimation Vinay Ribeiro.
Chapter 10 Verification and Validation of Simulation Models
Aron, Aron, & Coups, Statistics for the Behavioral and Social Sciences: A Brief Course (3e), © 2005 Prentice Hall Chapter 6 Hypothesis Tests with Means.
1 draft-duffield-ippm-burst-loss-metrics-01.txt Nick Duffield, Al Morton, AT&T Joel Sommers, Colgate University IETF 76, Hiroshima, Japan 11/10/2009.
Development of a QoE Model Himadeepa Karlapudi 03/07/03.
1 Probability and Statistics Confidence Intervals.
Precision Measurements with the EVERGROW Traffic Observatory Péter Hága István Csabai.
Measurement and Modelling of the Temporal Dependence in Packet loss Maya Yajnik, Sue Moon, Jim Kurose, Don Towsley Department of Computer Science University.
Simulation. Types of simulation Discrete-event simulation – Used for modeling of a system as it evolves over time by a representation in which the state.
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss Pedro Domingos, Michael Pazzani Presented by Lu Ren Oct. 1, 2007.
Queuing Analysis of Tree-Based LRD Traffic Models Vinay J. Ribeiro R. Riedi, M. Crouse, R. Baraniuk.
Nick Duffield, Al Morton, AT&T Joel Sommers, Colgate University
Chapter 10 Verification and Validation of Simulation Models
Pong: Diagnosing Spatio-Temporal Internet Congestion Properties
Additional notes on random variables
Additional notes on random variables
Performance Evaluation of Computer Networks
MECH 3550 : Simulation & Visualization
Model generalization Brief summary of methods
Performance Evaluation of Computer Networks
CPSC 641: Network Traffic Self-Similarity
Presentation transcript:

Hung X. Nguyen and Matthew Roughan The University of Adelaide, Australia SAIL: Statistically Accurate Internet Loss Measurements

Internet Loss Measurement Network operators continuously perform loss measurements o SLA contracts o We need to know that the problem exists before we can fix it Active probing: inject probe packets into the network Many IETF standards (RFC3357, RFC2330) and commercial products (Cisco IOS IP SLA, Agilent's Firehunter PRO) o Poisson Probes – PASTA (Poisson Arrivals See Time Average) N samples, typical loss metrics o loss rate = # of successes/N (RFC2330) o lengths of loss and good runs (RFC3357) SourceDestination Probes good runloss run

Accuracy of Loss Measurements Loss rate Loss run length mean (std)(second) AT&T network, Ciavattone et al. 2003Testbed at Wisconsin, Sommer et al True values2.65%0.136 (0.009) Poisson probes (10Hz) 0.05%0 (0) Poisson probes (20Hz) 0.02%0(0) True values0.93%0.136 (0.009) Poisson probes (10Hz) 0.14%0(0) Poisson probes (20Hz) 0.12%0.022 (0.001) Web-like traffic TCP traffic

Errors in loss estimates PASTA is an asymptotic result (N ∞) We need to compute the statistical errors of the estimations (e.g., variance) o Loss rate:,  I i is the indicator function of probe ith o Variance:,  R( τ ij ) is the auto-covariance function of probes ith and jth Probes miss ON/OFF intervals

The auto-covariance function R( τ ij ) Empirical computation o R( τ ij ) can be computed directly from the samples Assume independent samples (commonly used) But losses are correlated, a model for the underlying loss process that captures sample correlation o Alternating Renewal ON/OFF model: {A i },{B i } are independent o {A i },{B i } are Gamma distributed with parameters (k 0, Θ 0 )and (k 1, Θ 1 )

Inferring model parameters Missing intervals problem o Many short ON (or OFF) periods are not observed o loss run lengths and good run lengths observed by the probes are much larger than the real values Hidden Semi-Markov Model (HSMM) to the rescue

Forward and Backward Algorithm Estimating (k 0, Θ 0 )and (k 1, Θ 1 ) is a statistical inference with missing data problem Direct Maximum Likelihood Estimation is costly o O(2 U ), U is the number of un-observed intervals Forward and Backward algorithm to speed up o Exploiting the renewal properties o Expectation-Maximization algorithm o O(2T 2 ), T is the number of intervals Knowing (k 0, Θ 0 )and (k 1, Θ 1 ), compute R( τ ij ) using inverse Laplace transform o Numerical inversion o Simulation

SAIL Input o Probe sending times {t 1, …, t N } o Probe outcomes {I 1, …, I N } o The length of the discrete time interval Δ T Algorithm o Apply the forward and backward algorithm to compute (k 0, Θ 0 ) and (k 1, Θ 1 ) o Apply the inverse Laplace transform to find R( τ ) o Compute the loss rate and its variance Output o The loss rate and its confidence intervals o The parameters (k 0, Θ 0 ) and (k 1, Θ 1 ) of the loss process

Simulation Alternating ON/OFF renewal process with Gamma intervals, 4 parameters {A i }:(k 0, Θ 0 ) and {B i } : (k 1, Θ 1 ) Poisson probes with rate λ SAIL works when the model assumptions are correct

Simulation- ON/OFF duration SAIL can correct the missing intervals problem and is needed.

Simulation- Loss rate SAIL is more accurate than other methods in computing the statistical errors

Measurements - Datasets UA-EPFL: 1 host at the University of Adelaide and 1 at EPFL, Switzerland PlanetLab: randomly selected source and destination pairs Poisson probes with small packet size (40 bytes) 1 hour traces, in each trace the probing rate is a constant Stationarity tests using heuristics (no big/sudden jump and no gradual trend in the moving average loss rate) UA-EPFLPlanetLab Hours # stationary traces101090

Renewal Properties Autocorrelation function test to verify renewal properties

Cross validation Traces are divided randomly into two sub-segments of equal length. Each sub-segments can be viewed as Poisson samples with rate λ /2.

Empirical Variances It is important to use a correct method to compute the variance (e.g., SAIL) SAILEmpirical

Shape Parameters of the Loss Processes The OFF periods appear to be exponentially distributed ON OFF

Errors in Estimating ON/OFF durations Errors can be quite large because of the missing (short) ON/OFF intervals problem

Prediction SAIL can be used to estimate future loss rate

How Many Probes Increasing sampling rate only yields small improvements in the variance

Summary SAIL: accurately computes errors in loss estimates Better than any existing alternative Future work: o Faster inference algorithm o Non-parametric models for the loss process o On-line o Make SAIL available to network operators/users Code is publicly available, please try Thanks!