1 Reading Report 9 Yin Chen 29 Mar 2004 Reference: Multivariate Resource Performance Forecasting in the Network Weather Service, Martin Swany and Rich.

Slides:



Advertisements
Similar presentations
Decomposition Method.
Advertisements

Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 12 l Multiple Regression: Predicting One Factor from Several Others.
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
Part II – TIME SERIES ANALYSIS C5 ARIMA (Box-Jenkins) Models
The General Linear Model Or, What the Hell’s Going on During Estimation?
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
1 Reading Report 8 Yin Chen 24 Mar 2004 Reference: A Network Performance Tool for Grid Environments, Craig A. Lee, Rich Wolski, Carl Kesselman, Ian Foster,
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Data Sources The most sophisticated forecasting model will fail if it is applied to unreliable data Data should be reliable and accurate Data should be.
Statistics & Modeling By Yan Gao. Terms of measured data Terms used in describing data –For example: “mean of a dataset” –An objectively measurable quantity.
Random Variables Dr. Abdulaziz Almulhem. Almulhem©20022 Preliminary An important design issue of networking is the ability to model and estimate performance.
MAE 552 Heuristic Optimization
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
Evaluating Hypotheses
Lecture 4 Measurement Accuracy and Statistical Variation.
Variance of Aggregated Web Traffic Robert Morris MIT Laboratory for Computer Science IEEE INFOCOM 2000’
BCOR 1020 Business Statistics Lecture 21 – April 8, 2008.
Quantitative Business Analysis for Decision Making Simple Linear Regression.
The Calibration Process
1 Chapter 1 Introduction Ray-Bing Chen Institute of Statistics National University of Kaohsiung.
Business Statistics - QBM117 Statistical inference for regression.
Chapter 14 Inferential Data Analysis
Statistical Methods for long-range forecast By Syunji Takahashi Climate Prediction Division JMA.
Lecture II-2: Probability Review
Relationships Among Variables
Radial Basis Function Networks
1 Doing Statistics for Business Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer Chapter 11 Regression.
Statistical Methods For Engineers ChE 477 (UO Lab) Larry Baxter & Stan Harding Brigham Young University.
Psy B07 Chapter 1Slide 1 ANALYSIS OF VARIANCE. Psy B07 Chapter 1Slide 2 t-test refresher  In chapter 7 we talked about analyses that could be conducted.
ANCOVA Lecture 9 Andrew Ainsworth. What is ANCOVA?
Sampling Theory Determining the distribution of Sample statistics.
1 Ch6. Sampling distribution Dr. Deshi Ye
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
by B. Zadrozny and C. Elkan
Model Building III – Remedial Measures KNNL – Chapter 11.
Statistics Chapter 9. Statistics Statistics, the collection, tabulation, analysis, interpretation, and presentation of numerical data, provide a viable.
Topics: Statistics & Experimental Design The Human Visual System Color Science Light Sources: Radiometry/Photometry Geometric Optics Tone-transfer Function.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
University of Ottawa - Bio 4118 – Applied Biostatistics © Antoine Morin and Scott Findlay 08/10/ :23 PM 1 Some basic statistical concepts, statistics.
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
THE MANAGEMENT AND CONTROL OF QUALITY, 5e, © 2002 South-Western/Thomson Learning TM 1 Chapter 9 Statistical Thinking and Applications.
1 Statistical Distribution Fitting Dr. Jason Merrick.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
Chapter 7: Sample Variability Empirical Distribution of Sample Means.
Maz Jamilah Masnan Institute of Engineering Mathematics Semester I 2015/ Sampling Distribution of Mean and Proportion EQT271 ENGINEERING STATISTICS.
Educational Research Chapter 13 Inferential Statistics Gay, Mills, and Airasian 10 th Edition.
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 11: Bivariate Relationships: t-test for Comparing the Means of Two Groups.
Review of Probability. Important Topics 1 Random Variables and Probability Distributions 2 Expected Values, Mean, and Variance 3 Two Random Variables.
APPLIED DATA ANALYSIS IN CRIMINAL JUSTICE CJ 525 MONMOUTH UNIVERSITY Juan P. Rodriguez.
1 ES Chapter 11: Goals Investigate the variability in sample statistics from sample to sample Find measures of central tendency for sample statistics Find.
BME 353 – BIOMEDICAL MEASUREMENTS AND INSTRUMENTATION MEASUREMENT PRINCIPLES.
Sampling Design and Analysis MTH 494 Lecture-21 Ossam Chohan Assistant Professor CIIT Abbottabad.
Chapter 7 Introduction to Sampling Distributions Business Statistics: QMIS 220, by Dr. M. Zainal.
Lecture 8: Measurement Errors 1. Objectives List some sources of measurement errors. Classify measurement errors into systematic and random errors. Study.
Sampling Theory Determining the distribution of Sample statistics.
Computacion Inteligente Least-Square Methods for System Identification.
Forecast 2 Linear trend Forecast error Seasonal demand.
Central Bank of Egypt Basic statistics. Central Bank of Egypt 2 Index I.Measures of Central Tendency II.Measures of variability of distribution III.Covariance.
Resource Characterization Rich Wolski, Dan Nurmi, and John Brevik Computer Science Department University of California, Santa Barbara VGrADS Site Visit.
Sampling and Sampling Distribution
The Calibration Process
Combining Random Variables
Estimating with PROBE II
Basic Estimation Techniques
Introduction to Instrumentation Engineering
Counting Statistics and Error Prediction
Sampling Distributions
Lecture 7 Sampling and Sampling Distributions
MGS 3100 Business Analysis Regression Feb 18, 2016
Presentation transcript:

1 Reading Report 9 Yin Chen 29 Mar 2004 Reference: Multivariate Resource Performance Forecasting in the Network Weather Service, Martin Swany and Rich Wolski

2 Problems Overview Frequently, monitor data is used as a prediction of future performance. The Common Method  “Forecast “ based on the last value : Assume the performance of a given transfer will be the same as it was the last time it was performed.  i.e., For network bandwidth estimation, many users simply conduct lengthy data transfer between end-hosts, observe the throughput, and use that observation as the prediction. Two Problems  Not clear that the last observation is a good estimation.  Require the resource be “loaded” enough : On high-throughput networks with lengthy round-trip times, enough data must be transferred to match the bandwidth-delay product for the end-to-end route. Costly -- waste resource and lost time while the probe take place.

3 Network Weather Service’s Solutions Using statistical techniques Combine other performance monitoring tools  Combine infrequent, irregularly spaced expensive measurements with regularly spaced, but far less intrusive probes  Combine short NWS bandwidth probes and previous HTTP history transfers  Combine data from two or more measurement streams  Combine heavy-weight operations with lightweight, inexpensive probes of a resource

4 Forecasting Using Correlation The univariate forecaster represent the current and future performance of a single dataset. The multivariate forecaster operate on some subset of a collection measurement series, characterized by units and frequency, with a combination of correlation and forecasting. The correlator serves to map X values onto Y values where the X values are plentiful and “cheap” and the Y values are “rare” and expensive.  Given two correlated variables, knowledge of one at a given point in time provides information as to the value of the other.  Fix the value of one of the variables can make reasonable assumptions about the probability of various value of the other variable, based on their history of correlation.

5 Possible Correlator Methodology Possible Mapping Methods :  Linear regression  Traditional correlation operate on datasets of equal sizes  Assume that the variables in question are related linearly Problems -- measurement data gathered form a variety of sources:  Data sets may be of different sizes  Data items in each set may be gathered with different frequencies or with different regularities  The units of measure may be different between data sets.

6 Correlator Methodology of the Work Rank Correlation Techniques  A rank correlation measure sorts datasets and does linear correlation based on their position in the sorted list (their “rank”).  It is non-parametirc, appropriated because the distribution of the data cannot be certain of. Cumulative Distribution Function (CDF)  Defined as the probability for some set of sample points that a value is less than or equal to some real valued x.  If the probability density function (PDF) is known, CDF is defined as:  The reason of using CDF Practicality of delivering this information to Grid A CDF dataset may be compressed with varying degrees of loss. i.e., a handful of points can describe the CDF with linear interpolation between specified points.

7 Correlator Methodology of the Work (Cont.) Correlator Methodology  Approximate the CDF by :  Where count x is the total elements in the sample set. Which is equivalent to  Rank measurements in datasets of different size by computing the CDF for each of them.  Use the position of the value in X (x target ) to produce a value of Y (y forecast )  This gives the value in Y (in CDF Y ) associated with the position x target  If the CDF position of x target (CDF x (x target )) is between measured Y values, linear interpolation is used to determine the value of y forecast  This mapping is similar to quantile-quantile comparision.  The x target can be chosen in a variety of ways.  Choose x target by using the NWS forecasters to produce the value X target = Forecast(X)

8 Correlator Methodology of the Work (Cont.) Figure 1 shows the NWS(X) values. Figure 2 shows the HTTP(X) values. To compute the forecast for Y use the position in the CDF of x target which yields some real number between 0 and 1 This predictive method forms both the X and Y CDFs. At each prediction cycle, a univariate forecast of X is taken and that X value’s position is determined. That value is used to determine the value in Y at the corresponding position, which is the prediction.

9 Correlator Methodology of the Work (Cont.) Error  Compute the Mean Normalized Error Percentage (MNEP) as the last prediction’s error over the current average value.  This is required due to the forecast-oriented application of the system  The mean used (mean t ) is the current mean and is reevaluated at each timestep.

10 Experimental Methodology Generate default-sized NWS bandwidth measurements (64K bytes), using the TCP/IP end-to-end sensor between a parit of machines every 10 sec. Initiate a 16MB HTTP transfer every min. and record the observed transfer bandwidth. Figure 3 shows the NWS data Figure 4 shows the HTTP data The exact correlative relationship is not clear visually, but the shapes appear similar.

11 Experimental Methodology (Cont.) Goal  Use this data to investigate the accuracy with which the short 64K NWS transfer could be used to predict the long 16MB HTTP transfers given successively longer periods. Why HTTP?  Widely used in the Internet  A key components of SOAP and Web services  Used more and more in high-performance computing  This data represents: The relationship between short and long network experiments. Instrumentation data being fed into to the forecasting service. Compare with 3 prediction methods  Last value method  The univariate NWS forecast applied solely to the 16MB probe history  The new correlative methods Why 16MB?  To limit intrusiveness  Less variability in the observed bandwidth as the amount of data transferred gets larger.  Any transient periods of high congestion are amortized over the longer transfer time.  Not so long as to make things easy, but long enough to represent real applications and real data.

12 Results Figure 5 shows the MAE of the univariate compared to the multivariate forecasters.  Multivariate forecaster performs better.  As Y become less frequent, the univariate forecasters lose accuracy.  Using the new forecasting technique, the NWS can predict 16MB bandwidth using 64K measurements with a mean error of 0.47 megabits/sec. Figure 6 shows MNEP of the MAE.  The MNEP shows the percentage of the current average value that an error represents.  At 500 min. the 64K transfers can predict 16MB transfer bandwidth to within 7% on the average. Figure 5. Comparison of MAE between univariate and multivariate forecasts for different frequencies of HTTP measurements. Figure 6. Comparison of MNEP of MAE between univariate and multivariate forecasts for different frequencies of HTTP measurements.

13 Results (Cont.) Figure 7 shows the square root of the MSE of both types of forecasts.  This value bears resemblance to the standard deviation and tends to indicate the variance of the error.  The new method predict relatively high levels of accuracy. Figure 8 show the MNEP of the MAE for the CDF forecaster and the “Last Value” predictor.  Last Value predictor has over 100% error. END… Figure 7. Comparison of the square root of the MSE between univariate and multivariate forecasts for different frequencies of HTTP measurements. Figure 8. Comparison of MNEP of MAE between “Last Value” and multivariate forecasts.