Validating an Access Cost Model for Wide Area Applications
Louiqa Raschid, University of Maryland
CoopIS 2001
Co-authors: V. Zadorozhny, T. Zhan and L. Bright


L. Raschid — University of Maryland, CoopIS01
Scalable Wide-Area Applications
Problems:
- The wide area environment is dynamic (noisy).
- Latency (end-to-end delay) varies widely.
- Network and server workloads are unknown.
- Time and Day dependencies affect latency.
- The environment is dynamic and must be monitored constantly.
Research objective: use query feedback to monitor and learn source behavior, and to predict access cost distributions that may depend on Time and Day.

Talk Outline
- Architecture for wide area applications
- WebPT: a tool to predict access costs
- WebPT-based access cost catalog
- Grouping of WebSources based on observable WebSource characteristics
- Hypotheses to test the WebPT-based catalog: high versus low prediction accuracy
- Validation based on an experimental case study

Architecture for WebPT based Catalog

Predicting Response Times for Accessing WebSources
Problem: evaluation costs are difficult to determine because
- physical implementation details are unknown, and
- the load on the network and on the WebSource is unknown.
Objectives:
- Use query feedback to learn access costs.
- Exploit time of day, day of week, etc., to predict costs.
- Identify easily observable WebSource characteristics.
- Determine prediction accuracy for WebSources based on those characteristics.

Metrics in the WebPT Access Cost Model
- WebSource and network costs
  - Query processing at the WebSource
  - Downloading data from the WebSource (extraction cost)
- Wrapper statistics
  - Number of pages accessed
  - Cardinality of the result
- Statistics may depend on the value of the query binding.
- WebPT: a tool that learns from query feedback and predicts access cost from parameters such as Day, Time, quantity of data, and Cardinality.
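
The metrics above can be gathered into one feedback record per query. The sketch below is a minimal illustration, not WebPT's actual data model: the class name `QueryFeedback`, its fields, and the assumption that response time is query processing time plus download time are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryFeedback:
    """One query-feedback observation (hypothetical field names)."""
    source: str           # WebSource name, e.g. "GNIS"
    day: int              # day of week, 0 = Monday ... 6 = Sunday
    hour: int             # time of day, 0..23
    pages: int            # wrapper statistic: number of pages accessed
    cardinality: int      # wrapper statistic: number of result tuples
    query_secs: float     # query processing time at the WebSource
    download_secs: float  # time spent downloading / extracting data

    @property
    def response_secs(self) -> float:
        # Assumption: total access cost = query processing + download time.
        return self.query_secs + self.download_secs

    def dimensions(self) -> tuple[int, int, int]:
        # (Day, Time, Cardinality) coordinates a predictor could be keyed on.
        return (self.day, self.hour, self.cardinality)
```

Each record supplies both the value to be predicted (`response_secs`) and the coordinates a predictor would be trained on.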

WebPT Learning

WebPT-based Prediction
WebPT is configured for a hierarchy of dimensions: Quantity, Day, Time, Cardinality.
- WebPT learning algorithm
  - Cell splitting
  - Smoothing
  - Estimate response time and confidence
  - Similar to CART (regression versus heuristics)
  - Cell merging
- Heuristics used in calibrating each cell
  - Dimension: min / max / scale
  - Allowed deviation
  - Confidence window
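
To make the idea of cells, smoothing, and confidence estimates concrete, here is a deliberately simplified sketch. It is not the WebPT algorithm: it swaps in a fixed grid keyed on (day of week, time bucket) with back-off to coarser cells, in place of WebPT's cell splitting, merging, and calibration heuristics. `GridPredictor`, `bucket_hours` and `min_samples` are hypothetical names, and the confidence value is assumed to be the half-width of a normal-approximation 95% interval for the cell mean.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


class GridPredictor:
    """Hypothetical, simplified stand-in for a WebPT-style cell predictor.

    Response times live in cells keyed on (day of week, time bucket).
    A cell with too few samples backs off to the coarser time-bucket-only
    cell and then to all data (a crude form of smoothing).
    """

    def __init__(self, bucket_hours: int = 4, min_samples: int = 3):
        if min_samples < 2:
            raise ValueError("min_samples must be >= 2 to estimate a spread")
        self.bucket_hours = bucket_hours
        self.min_samples = min_samples
        self.fine = defaultdict(list)    # (day, bucket) -> response times
        self.coarse = defaultdict(list)  # bucket -> response times
        self.global_samples = []         # every observed response time

    def learn(self, day: int, hour: int, response_secs: float) -> None:
        """Record one query-feedback observation in every level of the grid."""
        bucket = hour // self.bucket_hours
        self.fine[(day, bucket)].append(response_secs)
        self.coarse[bucket].append(response_secs)
        self.global_samples.append(response_secs)

    def predict(self, day: int, hour: int) -> tuple[float, float]:
        """Return (mean estimate, half-width of an approximate 95% CI)."""
        bucket = hour // self.bucket_hours
        # Try the most specific cell first, then coarser ones.
        for samples in (self.fine.get((day, bucket), []),
                        self.coarse.get(bucket, []),
                        self.global_samples):
            if len(samples) >= self.min_samples:
                half_width = 1.96 * stdev(samples) / math.sqrt(len(samples))
                return mean(samples), half_width
        raise ValueError("not enough feedback to predict")
```

Backing off from a sparse fine cell to a coarser one trades Time and Day specificity for more samples, which is roughly the tension that smoothing and cell merging address.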

Prediction Accuracy of the WebPT-based Cost Model
Prediction accuracy is strongly correlated with the following:
- Observable WebSource characteristics
  - Significance of Time and Day in predicting the workload at the server and on the network
  - Variance (noise) in accessing the server
- Quality of the available cardinality statistics
  - Random bindings: large variance in cardinality
  - Fixed bindings: better estimation of cardinality
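
The two observable characteristics can be quantified in many ways; the sketch below assumes two simple proxies chosen for illustration. The Time and Day effect is measured as eta squared (the share of response-time variance explained by grouping on hour or on day), which is an effect size rather than a formal significance test, and noise is measured as the coefficient of variation. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def eta_squared(groups: dict[int, list[float]]) -> float:
    """Share of total variance explained by the grouping (0 = none, 1 = all)."""
    values = [v for vs in groups.values() for v in vs]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in groups.values())
    return ss_between / ss_total if ss_total else 0.0


def coefficient_of_variation(values: list[float]) -> float:
    """Noise proxy: population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def characterize(observations) -> dict[str, float]:
    """observations: iterable of (day, hour, response_secs) tuples."""
    by_time: dict[int, list[float]] = defaultdict(list)
    by_day: dict[int, list[float]] = defaultdict(list)
    values: list[float] = []
    for day, hour, secs in observations:
        by_time[hour].append(secs)
        by_day[day].append(secs)
        values.append(secs)
    return {
        "time_effect": eta_squared(by_time),
        "day_effect": eta_squared(by_day),
        "noise": coefficient_of_variation(values),
    }
```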

Case Study: Data Gathering and Experiment
- 6 data sources in the public domain
- Data gathered for several weeks in 1999 and 2000
- Queries submitted to the WebSources periodically
- Recorded TTF and TTL
- Query bindings affected result cardinality
  - Random bindings: more than 50 bindings
  - Fixed bindings: 2 bindings each for [S, M, L]
- Mediator queries: from a simple scan to a complex 5-way join over data in 5 WebSources (not reported)
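
Assuming TTF and TTL stand for time to first tuple and time to last tuple, both can be recorded by timing a wrapper's result stream as it is consumed. The function below is a hypothetical sketch; it also returns the result cardinality, which the case study relates to the query bindings.

```python
import time
from typing import Iterable, Optional


def time_to_first_and_last(results: Iterable) -> tuple[float, float, int]:
    """Consume a result stream; return (TTF, TTL, cardinality), times in seconds.

    Assumption: TTF/TTL are time to first tuple and time to last tuple,
    measured from the moment consumption of the stream starts.
    """
    start = time.perf_counter()
    ttf: Optional[float] = None
    count = 0
    for _ in results:
        if ttf is None:
            ttf = time.perf_counter() - start  # first tuple arrived
        count += 1
    ttl = time.perf_counter() - start          # stream exhausted
    if ttf is None:
        ttf = ttl  # empty result: no first tuple, so report completion time
    return ttf, ttl, count
```

Passing a generator that fetches and parses pages lazily means the timings include the network and extraction delays a wrapper would see.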

Characteristics of Access Costs from WebSources

Observable WebSource Characteristics

Grouping of WebSources based on Characteristics
- G1: T and D significant; noise can vary
- G2: noise high
- G3: T and D not significant; noise low (empty)

Hypotheses to Test the WebPT-based Access Cost Catalog
- H1: High prediction accuracy for WebSources where
  - T and D are significant and noise is low
  - i.e., sources in G1 but not in G2
- H2: The catalog will improve prediction accuracy for WebSources where
  - T and D are significant, independent of noise
  - i.e., group G1
- H3: Statistics may depend on the value of the query binding
  - Prediction accuracy improves with learning on fixed bindings
  - Applies to sources in both groups
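
The grouping rules and the expectations of H1 and H2 can be written as small predicates. This is an illustrative sketch: the boolean input for Time and Day significance, the numeric noise measure, and the default `noise_threshold` of 0.5 are assumptions, and the function names are hypothetical. G1 and G2 may overlap, since a source with significant T and D can also be noisy.

```python
def groups_for(time_day_significant: bool, noise: float,
               noise_threshold: float = 0.5) -> set[str]:
    """Assign a WebSource to groups; G1 and G2 may overlap."""
    groups: set[str] = set()
    if time_day_significant:
        groups.add("G1")  # T and D significant; noise can vary
    if noise > noise_threshold:
        groups.add("G2")  # noise high
    if not time_day_significant and noise <= noise_threshold:
        groups.add("G3")  # T, D not significant; noise low
    return groups


def h1_expects_high_accuracy(groups: set[str]) -> bool:
    """H1: high accuracy expected for sources in G1 but not in G2."""
    return "G1" in groups and "G2" not in groups


def h2_expects_catalog_benefit(groups: set[str]) -> bool:
    """H2: the catalog should help any G1 source, independent of noise."""
    return "G1" in groups
```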

Prediction Accuracy for WebSources: WebPT(Lo), Random Bindings

WebSource Characteristics and Correlation With Prediction Accuracy

Groupings of WebSources and Correlation with Prediction Accuracy
- G1: T and D significant
- G2: noise high
- GNIS: high prediction accuracy; in both G1 and G2
- FAA, FishBase: low prediction accuracy while in G1; noisy

Quantile Plots of Relative Error of Prediction for ACM, Aircraft

Quantile Plot of Relative Error of Prediction for FAA, GNIS
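
The quantile plots summarize the distribution of relative prediction error per WebSource. A minimal sketch, assuming relative error is defined as |predicted - actual| / actual and that quantiles are interpolated linearly between order statistics (the slides state neither definition); the function names are hypothetical.

```python
def relative_errors(predicted: list[float], actual: list[float]) -> list[float]:
    """Relative error |predicted - actual| / actual for each query (actual > 0)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [abs(p - a) / a for p, a in zip(predicted, actual)]


def empirical_quantile(values: list[float], q: float) -> float:
    """Quantile q in [0, 1], linearly interpolated between order statistics."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need non-empty values and 0 <= q <= 1")
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)                  # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_points(values: list[float], steps: int = 10) -> list[tuple[float, float]]:
    """(q, quantile) pairs that could be drawn as a quantile plot."""
    return [(k / steps, empirical_quantile(values, k / steps)) for k in range(steps + 1)]
```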

Correlation of Prediction Accuracy and Characteristics of WebSources
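
With only a handful of WebSources, a rank-based measure such as Spearman's coefficient is one reasonable way to correlate a characteristic (for example, a noise measure) with prediction accuracy (for example, median relative error). The slides do not say which correlation measure was used, so this sketch is an assumption; it computes Spearman's coefficient as the Pearson correlation of average ranks, and the function names are hypothetical.

```python
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

A strongly positive coefficient between a noise measure and median relative error would be consistent with noisy sources being harder to predict; with so few sources, it should be read as descriptive rather than conclusive.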

Summary and Impact
- Unique case study: a WebPT-based access cost catalog and cost distributions
- Grouping of WebSources based on observable WebSource characteristics
- High prediction accuracy for some sources in G1 (T and D significant) with low noise
- High prediction accuracy for some sources in both G1 and G2 (high noise)
- Similar results for the mediator cost model and complex N-way joins over multiple WebSources