Published by Douglas Dennis. Modified over 8 years ago.
1
Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly
Zhen Zhang, Bin He, and Kevin C. Chang
The Database and Information Systems Lab, University of Illinois at Urbana-Champaign
2
The Context: MetaQuerier @ UIUC
Exploring and integrating the deep Web.
- Explorer (FIND sources): source discovery, source modeling, and source indexing, producing a "db of dbs".
- Integrator (QUERY sources): source selection, schema integration, and query mediation, behind a unified query interface.
Example sources: Amazon.com, Cars.com, 411locate.com, Apartments.com.
3
The Need: Querying alternative sources in the same domain
Sources are proliferating within each domain: a 2004 survey found that 10% of Web sites are "deep", totaling 450,000 databases on the Web.
A query can often find many useful databases, and different queries need different sources.
How can we query across such dynamic sources?
4
The Problem: Query translation on the fly
Challenge: no pre-configured, source-specific translation knowledge.
Requirements:
- Within a domain: source generality.
- Across domains: domain portability.
5
Dynamic query translation: Essential tasks
Reconcile three levels of query heterogeneity:
- Attribute level: schema matching
- Predicate level: predicate mapping
- Query level: query rewriting
6
Demo: Form Assistant to help navigate the deep Web.
7
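Translation objective: The closest among the valid
Input: a source query Q_s on source form S (the example uses "Tom Clancy", title contains "red storm", and a price condition on 12) and a target query form T.
Output: a target query Q_t*, a union (∪) of queries valid on T, followed by a filter σ (title contains "red storm" and the price condition) applied to its results.
Two goals: syntactically valid and semantically close.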
8
What is valid? Each source has a query model:
- Vocabulary: predicate templates {P1, P2, P3, P4, P5}
- Syntax: valid combinations of predicate templates {F1, F2, F3, F4, F5, F6, F7, F8}
[Figure: an example query form annotated with templates P1 to P5, a matrix marking which templates each combination F1 to F8 uses, and combinations F5 and F6 instantiated with "Tom Clancy".]
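To make "valid" concrete, here is a minimal sketch of a query model as a vocabulary plus an explicit list of allowed combinations. The names P1..P5 and F1..F8 follow the slide, but which templates each combination contains is invented for illustration, and treating validity as exact membership in the combination list is an assumption.

```python
from typing import AbstractSet, Dict, FrozenSet, Optional

# Hypothetical query model for one source form. The names P1..P5 and F1..F8
# follow the slide; which templates each combination contains is invented.
VOCABULARY: FrozenSet[str] = frozenset({"P1", "P2", "P3", "P4", "P5"})
SYNTAX: Dict[str, FrozenSet[str]] = {
    "F1": frozenset({"P1"}),
    "F2": frozenset({"P2"}),
    "F3": frozenset({"P3"}),
    "F4": frozenset({"P4"}),
    "F5": frozenset({"P1", "P5"}),
    "F6": frozenset({"P2", "P5"}),
    "F7": frozenset({"P3", "P5"}),
    "F8": frozenset({"P4", "P5"}),
}


def matching_combination(templates: AbstractSet[str]) -> Optional[str]:
    """Return the combination a query instantiates, or None if it has none."""
    used = frozenset(templates)
    if not used <= VOCABULARY:
        return None  # uses a template outside the form's vocabulary
    for name, combo in SYNTAX.items():
        if combo == used:
            return name
    return None


def is_valid(templates: AbstractSet[str]) -> bool:
    """A query is valid if its templates form exactly one allowed combination."""
    return matching_combination(templates) is not None
```

With this made-up model, `is_valid({"P1", "P5"})` holds (it is F5), while `is_valid({"P1", "P2"})` does not, because no combination pairs P1 with P2.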
9
What is close? Defining semantic closeness
Minimal subsuming, C_min:
- No false negatives: miss no answer.
- Minimal false positives: return the fewest extra answers.
Benefits:
- Clear semantics: independent of database content.
- Modular translation: reduces translation complexity.
[Figure: on a numeric axis, target predicates t1 = [0, 25], t2 = [25, 45], t3 = [45, 65] and a source predicate s; candidate unions t1 ∨ t2 = [0, 45] and t1 ∨ t2 ∨ t3 = [0, 65]; which union is C_min?]
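The definition of C_min can be checked directly on range predicates, without looking at any data. The sketch below assumes every union of target predicates is valid on the target form and uses a hypothetical source range s = [35, 50]; it enumerates the subsuming unions (no missed answers) and returns the one contained in all the others.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]  # closed numeric range [low, high]


def merge(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or touching closed intervals into disjoint ones."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def covers(region: Sequence[Interval], query: Interval) -> bool:
    """True if the union of `region` contains the whole interval `query`."""
    lo, hi = query
    return any(a <= lo and hi <= b for a, b in merge(region))


def region_contains(outer: Sequence[Interval], inner: Sequence[Interval]) -> bool:
    """True if every point of the union `inner` also lies in the union `outer`."""
    return all(covers(outer, piece) for piece in merge(inner))


def c_min(source: Interval, targets: Dict[str, Interval]) -> Optional[Tuple[str, ...]]:
    """Return the union of target predicates that subsumes `source` and is
    contained in every other subsuming union, or None if none exists."""
    names = sorted(targets)
    subsuming = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            region = [targets[n] for n in combo]
            if covers(region, source):  # no missed answers
                subsuming.append((combo, region))
    for combo, region in subsuming:
        if all(region_contains(other, region) for _, other in subsuming):
            return combo
    return None
```

For `targets = {"t1": (0, 25), "t2": (25, 45), "t3": (45, 65)}`, `c_min((35, 50), targets)` picks `("t2", "t3")`: t1 ∨ t2 misses part of s, and t1 ∨ t2 ∨ t3 subsumes s but strictly contains t2 ∨ t3.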
10
Translation: Enumerate the valid, search for the closest
Source query → attribute matching → predicate mapping → query rewriting → target query.
Enumerate the valid target queries, then search among them for the closest one, C_min.
What mechanism?
11
System architecture: Modular and lightweight
Input: a source query Q_s and a target query form, which a Form Extractor parses into a query interface (QI).
- Attribute Matcher: syntax-based schema matching
- Predicate Mapper: type-based, search-driven mapping
- Query Rewriter: constraint-based query rewriting
Output: the target query Q_t*.
Modularized mechanism; lightweight domain knowledge (a domain-specific thesaurus and domain-specific type handlers).
Related work: [RahmBernstein-VLDBJ01] [Halevy-VLDBJ01] [ZhangHC-SIGMOD04] [HeChang-SIGMOD03] [WuYDM-SIGMOD04]; predicate mapping is marked "?".
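The modular design can be pictured as small, swappable functions that share a piece of lightweight domain knowledge. In this sketch the thesaurus, the `Predicate` type, and all three stand-in functions are hypothetical; they are far simpler than the syntax-based matcher, type-based mapper, and constraint-based rewriter on the slide, and the rewriter is reduced to splitting the query into what the target form can express and a residual filter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Predicate:
    attribute: str
    operator: str
    value: object


# Hypothetical domain thesaurus: attribute label -> canonical concept.
THESAURUS: Dict[str, str] = {
    "author": "author", "writer": "author",
    "title": "title", "book title": "title",
}


def match_attribute(source_attr: str, target_attrs: List[str]) -> Optional[str]:
    """Attribute Matcher stand-in: match labels that share a thesaurus concept."""
    concept = THESAURUS.get(source_attr.lower())
    if concept is None:
        return None
    for target in target_attrs:
        if THESAURUS.get(target.lower()) == concept:
            return target
    return None


def map_predicate(pred: Predicate, target_attr: str) -> Predicate:
    """Predicate Mapper stand-in: copies operator and value unchanged."""
    return Predicate(target_attr, pred.operator, pred.value)


def translate(source_query: List[Predicate],
              target_attrs: List[str]) -> Tuple[List[Predicate], List[Predicate]]:
    """Rewriter stand-in: split the query into predicates the target form can
    express and a residual filter to apply to the returned results."""
    sent, residual = [], []
    for pred in source_query:
        target = match_attribute(pred.attribute, target_attrs)
        if target is None:
            residual.append(pred)  # not expressible on the target form
        else:
            sent.append(map_predicate(pred, target))
    return sent, residual
```

For example, translating `[Predicate("writer", "=", "Tom Clancy"), Predicate("format", "=", "paperback")]` against a form with fields `["Author", "Title"]` sends the author condition to the form and keeps the format condition as a post-filter. Each stand-in can be replaced independently, which is the point of the modular layout.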
12
The core challenge: Predicate mapping
Input: a source predicate s and the target predicate templates.
Output: t*, a union (∪) of target predicates.
Tasks: choose the operator; fill in the values.
Objective: minimal subsuming.
13
Is source-specific translation applicable?
[Figure: a table of hand-written rules for each pair of sources, for example:
adult = $t → passenger = $t
price < $t → if $t < 25: [price:between:0,25] elseif $t < 45: … …]
14
Can we enable source-generic predicate mapping?
- What is the right scope of translation?
- What is the right mechanism of translation?
15
The right scope?
We surveyed 150 sources to build a correspondence matrix.
Finding: correspondences occur within localities.
16
The right scope? Correspondence locality, hence type-based translation
Because correspondences occur within localities, translation is done by type handlers.
Predicate Mapper: a Type Recognizer examines the source predicate s and the target template P, then dispatches to a handler (Text, Numeric, Datetime, or Domain-specific) that produces the target predicate t*.
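A rough sketch of type-based dispatch: recognize the type of a source predicate, then hand it to the handler registered for that type, with domain-specific handlers taking precedence over generic ones. The recognizer heuristics, the handler bodies, and the ISBN handler are all hypothetical stand-ins, not the system's actual handlers.

```python
from datetime import date
from typing import Callable, Dict, Tuple

TargetPredicate = Tuple[str, str, object]
Handler = Callable[[str, str, str], TargetPredicate]


def text_handler(op: str, value: str, target_attr: str) -> TargetPredicate:
    # Text: fall back to keyword containment on the target field.
    return (target_attr, "contains", value)


def numeric_handler(op: str, value: str, target_attr: str) -> TargetPredicate:
    # Numeric: keep the comparison, parse the value as a number.
    return (target_attr, op, float(value))


def datetime_handler(op: str, value: str, target_attr: str) -> TargetPredicate:
    # Datetime: keep the comparison, parse an ISO date (YYYY-MM-DD).
    return (target_attr, op, date.fromisoformat(value))


def isbn_handler(op: str, value: str, target_attr: str) -> TargetPredicate:
    # Hypothetical domain-specific handler: normalize an ISBN by dropping hyphens.
    return (target_attr, "=", value.replace("-", ""))


GENERIC: Dict[str, Handler] = {
    "text": text_handler, "numeric": numeric_handler, "datetime": datetime_handler,
}
DOMAIN_SPECIFIC: Dict[str, Handler] = {"isbn": isbn_handler}


def recognize_type(attribute: str, value: str) -> str:
    """Hypothetical recognizer: domain attributes first, then value shape."""
    if attribute.lower() in DOMAIN_SPECIFIC:
        return attribute.lower()
    try:
        float(value)
        return "numeric"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)
        return "datetime"
    except ValueError:
        return "text"


def map_predicate(attribute: str, op: str, value: str, target_attr: str) -> TargetPredicate:
    """Dispatch the source predicate to the handler for its recognized type."""
    kind = recognize_type(attribute, value)
    handler = DOMAIN_SPECIFIC.get(kind) or GENERIC[kind]
    return handler(op, value, target_attr)
```

Adding a new type means registering one handler; no existing handler has to know about it, which mirrors the locality argument on the slide.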
17
The right mechanism: Is a pairwise-rule-based mechanism suitable?
[Figure: a rule matrix over templates 1 to n, extended by a new template n+1.]
Adding one template requires adding 2n rules, and writing them requires knowledge of the old templates.
Example rule: attr < $t → if $t < 25: [attr:between:0,25] elseif $t < 45: … …
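The 2n figure follows from a simple count, assuming one rule for each ordered pair of distinct templates (the rule matrix suggests this; the exact rule model is an assumption):

$$
R(n) = n(n-1), \qquad R(n+1) - R(n) = (n+1)n - n(n-1) = 2n
$$

That is, the new template needs n rules translating into it and n rules translating out of it, each written against an existing template.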
18
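A more extensible mechanism? Search-driven.
Treat the values of a type as a virtual "database"; evaluate the source predicate and the target templates of the same type over it; then search the evaluation results for the closest target predicate.
[Figure: an evaluator over a numeric axis from -infinity to +infinity, with target predicates t1 = [0, 25] and t2 = [25, 45], source predicate s, and the candidate t1 ∨ t2.]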
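A minimal sketch of the search-driven idea for the numeric type: materialize a hypothetical virtual database of sample values, evaluate every predicate over it, and search unions of target predicates for one whose result contains all answers of s with the fewest extras. The grid, the infinity sentinels, and the exhaustive search are assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Pred = Callable[[float], bool]

# Hypothetical virtual database for the numeric type: a grid of sample values
# from -10 to 100 in steps of 0.5, plus the two infinities as sentinels.
VALUES = [-math.inf] + [x / 2 for x in range(-20, 201)] + [math.inf]


def evaluate(pred: Pred) -> FrozenSet[float]:
    """Evaluate a predicate over the virtual database; return the matches."""
    return frozenset(v for v in VALUES if pred(v))


def closest_union(source: Pred, templates: Dict[str, Pred]) -> Optional[Tuple[str, ...]]:
    """Search unions of target predicates for one whose evaluation result
    contains every answer of `source` and adds the fewest extra values."""
    wanted = evaluate(source)
    results = {name: evaluate(p) for name, p in templates.items()}
    best: Optional[Tuple[str, ...]] = None
    best_extra = math.inf
    for k in range(1, len(results) + 1):
        for combo in combinations(sorted(results), k):
            answers = frozenset().union(*(results[n] for n in combo))
            if not wanted <= answers:
                continue  # would miss answers, so not subsuming
            extra = len(answers - wanted)
            if extra < best_extra:
                best, best_extra = combo, extra
    return best
```

The evaluator only needs to run predicates, so a new template of the same type plugs in without any pairwise rules.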
19
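Greedy search to construct the C_min mapping
Find the mapping iteratively: in each iteration, greedily choose the target predicate that covers the most of the still-uncovered part of s.
[Figure: target predicates t1 = [0, 25], t2 = [25, 45], t3 = [45, 65] and a source predicate s on a numeric axis.]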
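The iteration above is the classic greedy set-cover step. This sketch materializes predicates over a hypothetical integer grid and uses a hypothetical source range s = [35, 50]; note that greedy set cover is a heuristic and is not guaranteed to find the smallest cover for arbitrary sets.

```python
from typing import Dict, FrozenSet, List, Set

GRID = range(0, 101)  # hypothetical integer values 0..100


def interval(lo: int, hi: int) -> FrozenSet[int]:
    """Materialize the closed range [lo, hi] over the grid."""
    return frozenset(v for v in GRID if lo <= v <= hi)


def greedy_cover(source: FrozenSet[int], targets: Dict[str, FrozenSet[int]]) -> List[str]:
    """Repeatedly pick the target predicate covering the most still-uncovered
    answers of `source` until all are covered."""
    uncovered: Set[int] = set(source)
    chosen: List[str] = []
    while uncovered:
        name, gain = max(
            ((n, len(uncovered & answers)) for n, answers in targets.items()
             if n not in chosen),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if name is None or gain == 0:
            raise ValueError("no union of the target predicates subsumes the source")
        chosen.append(name)
        uncovered -= targets[name]
    return chosen
```

With t1 = [0, 25], t2 = [25, 45], t3 = [45, 65], the first iteration picks t2 (it covers 35 to 45), and the second picks t3 for the remaining 46 to 50, giving t2 ∨ t3.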
20
Experiments
- 120 queries translated in total, between randomly paired sources from 8 domains.
- With the domain thesaurus but no domain-specific type handlers.
- Accuracy: the ratio of correctly translated conditions per query.
- Datasets: Basic (3 domains) and New (5 domains).
[Figure: average accuracy on each dataset, and the error distribution across extraction, matching, and mapping, with slices of 18%, 40%, and 42%.]
21
Conclusion
- System: a form assistant for querying Web databases.
- Problem: dynamic query translation.
- Contributions:
  - Framework: a light-weight, domain-based architecture.
  - Techniques: type-based, search-driven predicate mapping.
  - Insight: holistic integration holds promise.
22
Thank you! For more information: http://metaquerier.cs.uiuc.edu, kcchang@cs.uiuc.edu
23
What is close? Defining semantic closeness
Minimal subsuming, C_min:
- No false negatives: miss no correct answer.
- Minimal false positives: contain the fewest extra answers.
Benefits:
- Clear semantics: independent of database content.
- Modular translation: reduces translation complexity.
[Figure: target predicates t1 = [0, 25], t2 = [25, 45], t3 = [45, 65] and a source predicate s; candidate unions t1 ∨ t2 and t2 ∨ t3 = [25, 65]; which union is C_min?]
24
Experiment: Accuracy distribution
[Figure: accuracy distribution for the Basic dataset and for the New dataset.]
25
Text handler: Search space
Conceptually, the union of all target predicates; practically, a closed-world assumption.
26
Text handler: Closeness estimation
Ideally, by logical reasoning; practically, by evaluation-by-materialization: materialize each query against a "complete" database.
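The two text-handler slides combine into a small sketch: under a closed-world assumption, candidates are the target's text templates instantiated only with the source's own keywords, and closeness is estimated by running each candidate over a stand-in "complete" database. The title list and both target templates ("any of these words", "exact phrase") are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Set

TextPred = Callable[[str], bool]

# Hypothetical stand-in for a "complete" database of book titles.
TITLES = [
    "red storm rising", "the hunt for red october", "storm front",
    "red storm", "rainbow six", "the red badge of courage",
]


def words(title: str) -> Set[str]:
    return set(title.split())


def any_of(keywords) -> TextPred:
    """Target template (assumed): title contains any of the keywords."""
    wanted = set(keywords)
    return lambda title: bool(words(title) & wanted)


def exact_phrase(phrase: str) -> TextPred:
    """Target template (assumed): title equals the phrase."""
    return lambda title: title == phrase


def search_space(keywords: List[str]) -> Dict[str, TextPred]:
    """Closed-world assumption: instantiate target templates only with the
    source's own keywords."""
    space: Dict[str, TextPred] = {}
    for k in range(1, len(keywords) + 1):
        for subset in combinations(keywords, k):
            space["any:" + " ".join(subset)] = any_of(subset)
    phrase = " ".join(keywords)
    space["exact:" + phrase] = exact_phrase(phrase)
    return space


def closest(source: TextPred, keywords: List[str]) -> Optional[str]:
    """Evaluation-by-materialization: run every candidate over TITLES and keep
    the subsuming one with the fewest extra answers."""
    truth = {t for t in TITLES if source(t)}
    best, best_extra = None, None
    for name, pred in search_space(keywords).items():
        answers = {t for t in TITLES if pred(t)}
        if not truth <= answers:
            continue  # misses an answer of the source predicate
        extra = len(answers - truth)
        if best_extra is None or extra < best_extra:
            best, best_extra = name, extra
    return best
```

For a source predicate requiring both "red" and "storm", this toy database selects `"any:storm"`, since it adds only one extra title. A different sample database could pick differently, which is why the quality of the materialized "complete" database matters.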