Published by Gavin Rice. Modified over 9 years ago.
1
Validating an Access Cost Model for Wide Area Applications
Louiqa Raschid, University of Maryland
CoopIS 2001
Co-authors: V. Zadorozhny, T. Zhan and L. Bright
2
L. Raschid, University of Maryland, CoopIS01
Scalable Wide-Area Applications
Problems:
- Wide area environment is dynamic (noisy)
- Wide variability in latency (end-to-end delay)
- Network and server workloads are unknown
- Time and Day dependencies impact latency
- Dynamic environment - constantly monitored
Research Objective: Use query feedback to monitor and learn behavior, and to predict access cost distributions that may be Time and Day dependent
3
Talk Outline
- Architecture for Wide Area Applications
- WebPT: Tool to predict access costs
- WebPT based Access Cost Catalog
- Grouping of WebSources based on observable WebSource characteristics
- Hypotheses to test the WebPT based Catalog: High Prediction Accuracy versus Low Prediction Accuracy
- Validation based on experimental case study
4
Architecture for WebPT based Catalog
5
Predicting Response Times for Accessing WebSources
Problem: Difficulty in determining evaluation costs
- Physical implementation details unknown
- Load on network and WebSource unknown
Objective: Use query feedback to learn access costs
Exploit Time of day, Day of week, etc., to predict costs
Identify easily observable WebSource characteristics
Determine prediction accuracy for WebSources based on WebSource characteristics
6
Metrics in WebPT Access Cost Model
- WebSource and Network Costs
  - Query Processing at WebSource
  - Downloading data from WebSource (extraction cost)
- Wrapper Statistics
  - Number of Pages Accessed
  - Cardinality of Result
- Statistics may be dependent on value of query binding
- WebPT: a tool for learning using query feedback and predicting access cost based on parameters such as Day, Time, Qty of data, Cardinality, etc.
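The metrics on this slide could be captured as one feedback record per query. The following is a minimal sketch, not the WebPT data model: the class name `QueryFeedback`, its field names, and the choice to sum processing and download time into a single access cost are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueryFeedback:
    """One observation fed back after a query to a WebSource (hypothetical layout)."""
    source: str                # name of the WebSource
    issued_at: datetime        # when the query was submitted
    processing_seconds: float  # query processing at the WebSource
    download_seconds: float    # downloading data (extraction cost)
    pages_accessed: int        # wrapper statistic
    cardinality: int           # wrapper statistic: size of the result
    bytes_downloaded: int      # quantity of data

    @property
    def access_cost(self) -> float:
        # Assumption: total cost is the sum of the two cost components.
        return self.processing_seconds + self.download_seconds

    def dimensions(self):
        # (day of week, hour of day, quantity of data, cardinality)
        return (self.issued_at.weekday(), self.issued_at.hour,
                self.bytes_downloaded, self.cardinality)


# Illustrative values only; none of these numbers come from the case study.
fb = QueryFeedback("GNIS", datetime(2000, 3, 6, 14, 30), 1.5, 2.5, 3, 40, 20000)
print(fb.access_cost, fb.dimensions())
```

Keeping the prediction parameters behind `dimensions()` means a learner can be configured for a different ordering of dimensions without changing the record itself.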
7
WebPT Learning
8
WebPT based Prediction
WebPT is configured for some hierarchy of dimensions: Quantity, Day, Time, Cardinality
- WebPT Learning algorithm
  - Cell splitting
  - Smoothing
  - Estimate response time and confidence
  - Similar to CART (regression versus heuristics)
  - Cell merging
- Heuristics used in calibration of each cell
  - Dimension: min / max / scale
  - Allowed deviation
  - Confidence window
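To make the cell-based idea concrete, here is a rough single-dimension sketch of cell splitting with a confidence estimate. It is not the WebPT algorithm: the split rule (coefficient of variation above the allowed deviation), the midpoint split, the confidence formula, and all class and parameter names are assumptions, and smoothing and cell merging are left out.

```python
from statistics import mean, pstdev


class Cell:
    """Covers the half-open interval [lo, hi) of one dimension, e.g. hour of day."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.samples = []  # list of (dimension value, observed response time)

    def spread(self):
        # Relative deviation of the response times seen in this cell.
        times = [t for _, t in self.samples]
        if len(times) < 2:
            return 0.0
        m = mean(times)
        return pstdev(times) / m if m > 0 else 0.0


class CellPredictor:
    def __init__(self, lo, hi, scale, allowed_deviation, window):
        self.cells = [Cell(lo, hi)]                 # dimension min / max
        self.scale = scale                          # smallest cell width
        self.allowed_deviation = allowed_deviation
        self.window = window                        # samples kept per cell (> 0)

    def _find(self, x):
        for cell in self.cells:
            if cell.lo <= x < cell.hi:
                return cell
        raise ValueError(f"{x} is outside the configured dimension range")

    def learn(self, x, response_time):
        cell = self._find(x)
        cell.samples.append((x, response_time))
        del cell.samples[:-self.window]  # keep only the most recent samples
        too_noisy = cell.spread() > self.allowed_deviation
        if too_noisy and (cell.hi - cell.lo) >= 2 * self.scale:
            self._split(cell)

    def _split(self, cell):
        mid = (cell.lo + cell.hi) / 2
        left, right = Cell(cell.lo, mid), Cell(mid, cell.hi)
        for x, t in cell.samples:
            (left if x < mid else right).samples.append((x, t))
        i = self.cells.index(cell)
        self.cells[i:i + 1] = [left, right]

    def predict(self, x):
        """Return (estimated response time, confidence in [0, 1])."""
        times = [t for _, t in self._find(x).samples]
        if not times:
            return None, 0.0
        return mean(times), len(times) / self.window


# Hypothetical observations: fast at night, slow in the afternoon.
p = CellPredictor(lo=0, hi=24, scale=1, allowed_deviation=0.2, window=10)
for hour, seconds in [(2, 1.0), (14, 5.0), (3, 1.0), (15, 5.0)]:
    p.learn(hour, seconds)
print(len(p.cells), p.predict(2), p.predict(14))
```

In this sketch a cell splits only when its observations disagree by more than the allowed deviation, so regions of the dimension with stable response times stay coarse while regions with time-dependent behavior are refined.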
9
Prediction Accuracy of WebPT based Cost Model is strongly correlated with the following:
- Observable WebSource Characteristics
  - Significance of Time and Day in predicting workload at the server and on the network
  - Variance (noise) in accessing server
- Quality of available statistics - cardinality
  - Random bindings - large variance in cardinality
  - Fixed bindings - better estimation of cardinality
10
Case Study: Data gathering and Experiment
- 6 data sources in the public domain
- Data gathered for several weeks in 1999, 2000
- Queries submitted to WebSources periodically
- Recorded TTF and TTL
- Query bindings affected result cardinality
  - Random bindings: >50 bindings
  - Fixed bindings: 2 bindings each for [S,M,L]
- Mediator queries: simple scan to complex 5 way join over data in 5 WebSources (not reported)
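The slide does not expand TTF and TTL. Assuming they denote the time to the first and to the last result item, a measurement harness for periodic queries might look like the sketch below. The function name `measure_ttf_ttl` and the `fetch` callable are hypothetical, and the clock is injectable so the timing logic can be checked without a network.

```python
import time


def measure_ttf_ttl(fetch, clock=time.monotonic):
    """Run fetch() and time its results.

    fetch must return an iterable of result items (tuples, pages, chunks).
    Returns (ttf, ttl, count); ttf is None if no item was produced.
    """
    start = clock()
    ttf = None
    count = 0
    for _ in fetch():
        if ttf is None:
            ttf = clock() - start  # time to first item
        count += 1
    ttl = clock() - start          # time until the last item was consumed
    return ttf, ttl, count


# Stand-in source that yields three items; a real wrapper would query a WebSource.
def fake_source():
    yield from ("row1", "row2", "row3")


print(measure_ttf_ttl(fake_source))
```

Using a monotonic clock avoids negative or distorted intervals if the system clock is adjusted while a query is in flight.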
11
Characteristics of Access Costs from WebSources
12
Observable WebSource Characteristics
13
Grouping of WebSources based on Characteristics
G1: T and D significant; Noise can vary
G2: Noise High
G3: T, D not significant; Noise Low - EMPTY
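Read as a rule, the grouping assigns each WebSource to one or more groups from two observable characteristics. The sketch below assumes both characteristics have already been reduced to booleans; how significance of T and D or a "high" noise level is decided is not stated on the slide, and the function name is hypothetical.

```python
def assign_groups(td_significant: bool, high_noise: bool) -> set:
    """Map two observable characteristics to the groups G1, G2, G3."""
    groups = set()
    if td_significant:
        groups.add("G1")  # T and D significant; noise can vary
    if high_noise:
        groups.add("G2")  # noise high
    if not td_significant and not high_noise:
        groups.add("G3")  # T, D not significant; noise low
    return groups


# A source with significant T and D and high noise falls in both G1 and G2.
print(sorted(assign_groups(td_significant=True, high_noise=True)))
```

Because G1 places no condition on noise, G1 and G2 can overlap, while G3 is disjoint from both.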
14
Hypotheses to test WebPT based Access Cost Catalog
- H1: High prediction accuracy for the following
  - T, D are significant and Low Noise
  - Sources are in G1 but not in G2
- H2: Catalog will improve prediction accuracy for the following WebSources
  - T, D are significant independent of noise
  - Group G1
- H3: Statistics may be dependent on value of query binding
  - Prediction accuracy improves with learning on fixed bindings
  - Sources in both groups
15
Prediction Accuracy for WebSources
WebPT(Lo) - Random bindings
16
WebSource Characteristics and Correlation With Prediction Accuracy
17
Groupings of WebSources and Correlation with Prediction Accuracy
G1: T and D significant
G2: Noise High
GNIS: High Pred Accuracy G1 AND G2 FAA; FishBase: Low Pred Accuracy while in G1; Noisy
18
Quantile Plots of Relative Error of Prediction for ACM, Aircraft
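The plots themselves did not survive in this transcript. As a reminder of what such a plot contains, the sketch below computes the points of a quantile plot of relative error. The definition of relative error used here, |predicted - observed| / observed, and the plotting position (i + 1) / n are assumptions; the slides do not state which definitions were used.

```python
def relative_errors(predicted, observed):
    """Relative error per query; observed values must be non-zero."""
    return [abs(p - o) / o for p, o in zip(predicted, observed)]


def quantile_points(errors):
    """Pair each sorted error with the fraction of errors at or below it."""
    ordered = sorted(errors)
    n = len(ordered)
    return [((i + 1) / n, e) for i, e in enumerate(ordered)]


# Hypothetical response times in seconds, not data from the case study.
predicted = [2.0, 3.0, 5.0, 4.0]
observed = [2.0, 4.0, 4.0, 8.0]
for fraction, error in quantile_points(relative_errors(predicted, observed)):
    print(f"{fraction:.2f} of queries have relative error <= {error:.2f}")
```

A curve that stays low until a high fraction indicates that most predictions are accurate and only a small tail of queries is badly mispredicted.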
19
Quantile Plot of Relative Error of Prediction for FAA, GNIS
20
Correlation of Prediction Accuracy and Characteristics of WebSources
21
Summary + Impact
- Unique Case Study: WebPT based Access Cost Catalog and Cost distributions
- Grouping of WebSources based on observable WebSource characteristics
- High Prediction Accuracy for some sources in G1 (T, D significant) with low noise
- High Prediction Accuracy for some sources in G1 and in G2 (High Noise)
- Similar results for Mediator cost model and complex N-way joins over multiple WebSources