Published by Stuart Blankenship. Modified over 9 years ago.
1
University of Maryland
Scaling Heterogeneous Information Access for Wide Area Environments
Michael Franklin and Louiqa Raschid
2
Wide-Area Data Access
Problems
- Scalability of Wrapper-Mediator Systems
- Publishing and Discovery of Sources
- Dissemination of Relevant Information
Relevant Technologies
- Flexible Architectures
- Adaptive Systems
- Metadata Management
3
The Big Picture
4
The Little Picture
[Architecture diagram; component labels: Predator O-R DBMS, Remote wrapper interface, Planner, Scrambler, MDT, Wrapper interface, Web sources]
5
Querying Web Sources
- Generating wrappers for Web-accessible sources to provide an API for queries and structured answers.
- Obtaining and representing source capability and content descriptions to use in query planning.
- Estimating the response time for cost-based optimization.
6
Web application wrapper toolkit
- Define the capabilities of Web sources
- A wrapper interface to publish source capability
- A wrapper toolkit
  - Translation from query + bindings → URL
  - Declarative language to specify extractors
    - Simple extractors: HTML or XMLData → structured object
    - Complex extractors: customizable crawler utility for extraction of meta-information
- Generator for JDBC-compliant wrappers
- Metadata and query and answer interface
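The two toolkit steps named above (query + bindings → URL, and a simple extractor from HTML to structured objects) could look roughly like the sketch below. It is not the toolkit's actual interface: the source description, the placeholder host `weather.example.org`, the attribute names `city` and `state`, and the table-row pattern are all assumptions made for illustration.

```python
import re
from urllib.parse import urlencode

# Hypothetical source description: a base URL plus the attributes that
# must be bound before the source can be called.
WEATHER_SOURCE = {
    "base_url": "http://weather.example.org/forecast",  # placeholder host
    "required_bindings": ("city", "state"),
}


def build_url(source, bindings):
    """Translate a query's bindings into a request URL for the source."""
    missing = [a for a in source["required_bindings"] if a not in bindings]
    if missing:
        raise ValueError(f"unbound attributes: {missing}")
    params = {a: bindings[a] for a in source["required_bindings"]}
    return source["base_url"] + "?" + urlencode(params)


# A "simple extractor" declared as a single pattern with named groups
# (an assumed page layout: one table row per forecast day).
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<day>[^<]+)</td>"
    r"<td>(?P<high>-?\d+)</td>"
    r"<td>(?P<low>-?\d+)</td></tr>"
)


def extract(html):
    """Turn every matching table row into a structured record."""
    return [
        {"day": m["day"], "high": int(m["high"]), "low": int(m["low"])}
        for m in ROW_PATTERN.finditer(html)
    ]
```

Calling `build_url(WEATHER_SOURCE, {"city": "College Park", "state": "MD"})` yields the request URL, and `extract(page_html)` returns one dictionary per matching row. Keeping the extractor as data (a pattern) rather than code is one way to read "declarative language to specify extractors".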
7
Weather source
8
Results from the Weather source
10
Query Planning for Web sources
Objective: Generate safe optimal plans with possibly replicated sources
- Multiple heterogeneous sources
  - Limited capability (bindings)
  - Possible replication of contents
  - Complete / incomplete sources
- Use meta-information to construct lattices
- Generate safe plans with alternatives
- Mediator algebra and rules for optimization
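One common reading of a "safe" plan under limited binding capability is that every source is called only after its required input attributes have been bound, either by the query or by an earlier source. The sketch below checks that property by brute force. The input and output attributes assigned to `S1` and `S5` are invented for illustration, and the slides' lattice-based planner is not reproduced here.

```python
from itertools import permutations

# Hypothetical capability descriptions: the attributes a source needs
# bound on input, and the attributes it returns.
SOURCES = {
    "S1": {"inputs": {"Date"}, "outputs": {"Title", "Channel"}},
    "S5": {"inputs": {"Title"}, "outputs": {"Review"}},
}


def is_safe(order, bound, sources):
    """Return True if each source's inputs are bound when it is called."""
    available = set(bound)
    for name in order:
        if not sources[name]["inputs"] <= available:
            return False
        # Attributes returned by this source can bind later sources.
        available |= sources[name]["outputs"]
    return True


def safe_orders(names, bound, sources):
    """Enumerate all call orders of the given sources that are safe."""
    return [
        order for order in permutations(names)
        if is_safe(order, bound, sources)
    ]
```

With only `Date` bound by the query, `safe_orders(["S1", "S5"], {"Date"}, SOURCES)` keeps the order that calls `S1` first, because `S5` needs a `Title` that only `S1` can supply. Exhaustive enumeration is used only to keep the sketch short; it does not scale to many sources.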
12
Content and Capability Descriptions
- Domain information
- Capability descriptions:
  - I/O relationships: Time, Date → Channel, Title, Category
  - Content: Date: CurrentYear; Time: {0, …, 23}; Channel: CNW
  - Completeness information: Complete. Source S3 provides a complete answer when Time and Date are bound and Channel=ppv and Category=Movies.
    - Explicitly provided by the source DBA.
    - Augmented by inference.
    - Augmented by learning based on query feedback.
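A capability description like the one above can be held as plain data and consulted by the planner. The following sketch assumes that Time and Date are the inputs and Channel, Title, Category the outputs, and encodes the S3 completeness statement as a rule; the class and field names are hypothetical and are not taken from the system.

```python
from dataclasses import dataclass, field


@dataclass
class CompletenessRule:
    """Conditions under which a source returns a complete answer."""
    must_be_bound: frozenset   # attributes the query has to bind
    must_equal: dict           # attribute -> required constant

    def applies(self, query):
        return self.must_be_bound <= set(query) and all(
            query.get(attr) == value
            for attr, value in self.must_equal.items()
        )


@dataclass
class SourceDescription:
    name: str
    inputs: frozenset          # attributes that must be bound on input
    outputs: frozenset         # attributes the source returns
    complete_when: list = field(default_factory=list)

    def can_answer(self, query):
        """The source is callable only if all of its inputs are bound."""
        return self.inputs <= set(query)

    def is_complete_for(self, query):
        return any(rule.applies(query) for rule in self.complete_when)


# S3 as described on the slide (attribute roles are an assumption).
S3 = SourceDescription(
    name="S3",
    inputs=frozenset({"Time", "Date"}),
    outputs=frozenset({"Channel", "Title", "Category"}),
    complete_when=[
        CompletenessRule(
            must_be_bound=frozenset({"Time", "Date"}),
            must_equal={"Channel": "ppv", "Category": "Movies"},
        )
    ],
)
```

A query is represented here as a dictionary of bound attributes, so a pay-per-view movie query with Time and Date bound satisfies both `S3.can_answer` and `S3.is_complete_for`, while the same query for another channel is answerable but not known to be complete.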
13
Sources in Lattices
14
Display pay-per-view movies shown on August 14th, 1998 at 9:30am.
Using Buckets (S1|S3) in AlternatePartition and (S5 S1) and (S5 S3) in SimilarPartition
15
Web Source Response Time Estimation Tool - MDT
Problem: Difficulty in determining evaluation costs
- Physical implementation details unknown
- Load on network and source unknown
Objective: Tool to estimate response time based on query feedback and estimate confidence. To be used in a combined cost model and to choose between alternate sources.
- MDT is a tool that estimates response time based on Day, Time, Quantity, etc.
16
Configuring and learning in the MDT
MDT is configured for some hierarchy of dimensions
- Calibration of each dimension
  - min / max / scale
  - Allowed deviation
  - Confidence window
- Learning algorithm
  - Cell splitting algorithm
  - Value correction algorithm
  - Estimate response time and confidence
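The slides name the MDT's ingredients but not its algorithms, so the sketch below is only one plausible reading, reduced to a single dimension (for example, hour of day). The update rule, the split-in-half policy, and the definition of confidence as the share of recent observations inside the allowed deviation are all assumptions, as are the class and parameter names.

```python
from collections import deque


class Cell:
    """One interval [lo, hi) of a dimension with a response-time estimate."""

    def __init__(self, lo, hi, estimate=None, window=5):
        self.lo, self.hi = lo, hi
        self.estimate = estimate           # seconds; None until first feedback
        self.hits = deque(maxlen=window)   # 1 = recent observation within tolerance

    def confidence(self):
        return sum(self.hits) / len(self.hits) if self.hits else 0.0


class OneDimensionalMDT:
    def __init__(self, lo, hi, min_width=1, allowed_deviation=0.25,
                 window=5, rate=0.5):
        self.cells = [Cell(lo, hi, window=window)]
        self.min_width = min_width                  # calibration: scale
        self.allowed_deviation = allowed_deviation  # relative tolerance
        self.window = window                        # confidence window
        self.rate = rate                            # correction step size

    def _find(self, x):
        for i, cell in enumerate(self.cells):
            if cell.lo <= x < cell.hi:
                return i
        raise ValueError(f"{x} is outside the calibrated range")

    def estimate(self, x):
        """Return (estimated response time, confidence) for point x."""
        cell = self.cells[self._find(x)]
        return cell.estimate, cell.confidence()

    def feedback(self, x, observed):
        """Learn from one observed response time at point x."""
        i = self._find(x)
        cell = self.cells[i]
        if cell.estimate is None:
            cell.estimate = observed
            cell.hits.append(1)
            return
        tolerance = self.allowed_deviation * cell.estimate
        if abs(observed - cell.estimate) <= tolerance:
            # Value correction: nudge the estimate toward the observation.
            cell.estimate += self.rate * (observed - cell.estimate)
            cell.hits.append(1)
        elif cell.hi - cell.lo >= 2 * self.min_width:
            # Cell splitting: the half containing x starts over from the
            # observation; the other half keeps the old estimate and history.
            mid = (cell.lo + cell.hi) / 2
            left = Cell(cell.lo, mid, cell.estimate, self.window)
            right = Cell(mid, cell.hi, cell.estimate, self.window)
            for half in (left, right):
                half.hits.extend(cell.hits)
            target = left if x < mid else right
            target.estimate = observed
            target.hits.clear()
            target.hits.append(1)
            self.cells[i:i + 1] = [left, right]
        else:
            # Too narrow to split: correct the value and record a miss.
            cell.estimate += self.rate * (observed - cell.estimate)
            cell.hits.append(0)
```

After `mdt = OneDimensionalMDT(0, 24)`, calls such as `mdt.feedback(9, 2.0)` feed observed response times back, and `mdt.estimate(9)` returns the current estimate together with its confidence. A full MDT would apply the same idea across a hierarchy of dimensions (Day, Time, Quantity) rather than one axis.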
17
Correcting the confidence of estimated value
23
Conclusions
- Extend the Predator O-R DBMS with scalable mediator functionality
- Current implementation status
  - Scrambling-enabled optimizer
  - Mediator algebra and logical optimizer
  - Cost-based optimizer based on MDT estimation
- Toolkit for generating wrappers for Web sources
24
Still to come …
- Publishing source metadata
- Discovering sources
- Source selection using metadata
- User profiles
- Dissemination of relevant data