Published by Maud Skinner. Modified over 8 years ago.
1
Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax
Zhen Zhang, Bin He and Kevin C. Chang
The Database and Information Systems Lab, University of Illinois at Urbana-Champaign
2
MetaQuerier
Goals: exploring and integrating the deep Web
- Explorer (FIND sources): source discovery, source modeling, source indexing
- Integrator (QUERY sources): source selection, schema integration, query mediation
The deep Web: databases on the Web, e.g. Amazon.com, Apartments.com, Cars.com, 411localte.com
3
Problem: source capability extraction, or query interface understanding
Example interfaces: book sources, music sources
4
Form understanding: what are the essential tasks?
Output all the conditions; for each:
- Grouping elements (into query conditions)
- Tagging elements with their “semantic roles”: attribute, operator, value
5
Demo summary: multiple interpretations
Query form
Understanding: form structure
6
Certainly not a trivial task: recall the “butterfly ballot” in the 2000 U.S. election.
Even just grouping can be hard!
7
Baseline approach?
The problem seems to be rather heuristic in nature: there seem to be no clear criteria, only fuzzy heuristics.
- Grouping is hard; it is often n-ary. Heuristic: group two elements if they are “close”. But …
- Tagging is hard; HTML forms carry no semantic labeling. Heuristic: tag the closest text as the “attribute”. But …
We need many such heuristics!
Goal: a principled mechanism to encode and use the various heuristics systematically.
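To illustrate why the "closest text" tagging heuristic is fragile, here is a minimal sketch. It assumes each form element is reduced to a center point; the names `Element` and `closest_label` and the layout itself are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Element:
    kind: str  # "text" (a label) or "field" (an input control)
    name: str
    x: float   # center coordinates in pixels; y grows downward
    y: float


def closest_label(field, elements):
    """Baseline tagging heuristic: pick the text element nearest to `field`."""
    texts = [e for e in elements if e.kind == "text"]
    return min(texts, key=lambda t: hypot(t.x - field.x, t.y - field.y))


# Hypothetical layout: each label sits above its box, and the two
# conditions are packed tightly side by side.
elements = [
    Element("text", "Author", 0, 0),
    Element("field", "author_box", 0, 20),
    Element("text", "Title", 15, 18),
    Element("field", "title_box", 15, 38),
]

for field in (e for e in elements if e.kind == "field"):
    print(field.name, "->", closest_label(field, elements).name)
```

In this layout both boxes are tagged "Title", because the neighboring label happens to be nearer to the author box than its own label. Each such failure would call for another special-case rule, which is the proliferation of heuristics the slide warns about.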
8
Our observation: concerted structures of query interfaces (QI)
- Condition patterns as building blocks
- Convergence of condition patterns
9
Our insight: cope with form complexity through “composition patterns.”
“Lego”-like building blocks:
- Patterns of elements composed into conditions
- Patterns of conditions composed into a form
So, how do we realize this divide-and-conquer idea? Is there a suitable computation paradigm?
Diagram labels: Q-Form, Source, ?, Semantic Structure, “Lego” Building Blocks
10
Query-form creation is guided by hidden syntax
Our hypothesis: existence of a hidden syntax
- Composer: turns the semantic structure (query conditions) into the presentation (query interface), guided by the hidden syntax (grammar)
- Parser: turns the presentation back into the semantic structure
Example condition: Attr: title; Operator: title words, …; Value: string
Parsing is thus a principled mechanism for the inverse of form creation
11
This “language” paradigm enables a principled solution to a seemingly heuristic problem
Essential notions: grammar and parser
Grammar: pattern specification
- Declarative: no need to hard-code heuristics
- Collective: captures both micro and macro patterns
Parser: pattern recognition
- Global: coherently interprets an entire query form
- Systematic: systematically assembles the building blocks
12
However, the hidden-syntax hypothesis itself entails challenges in its realization
Hidden syntax is only hypothetical: we must derive a grammar in its place
- What should be captured in a “derived grammar”?
- 2P grammar: productions + preferences (productions for patterns; preferences for their precedence)
A derived grammar is secondary to any input: it is inherently incomplete and ambiguous
- What should be the machinery of a “soft parser”?
- Best-effort parser: multiple, maximal-partial parse trees
13
Our paradigm: best-effort visual language parsing framework
Input: HTML query form
Components: HTML layout engine, tokenizer, BE-parser, ambiguity resolution, error handling
2P grammar: productions, preferences
Output: semantic structure
14
Grammar: layout based
Traditional grammar (sequential, 1-D)
- Presentation: 3 * 5
- Grammar: E :- E * E, or E :- sequential(E, *, E)
Our grammar (layout based, 2-D)
- Grammar: TextCond :- [ left(TextAttr, TextVal) above(TextAttr, TextVal) ] above(TextVal, TextOp)
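A layout-based production replaces "next token in the sequence" with spatial predicates over bounding boxes. The sketch below shows one way `left` and `above` might be defined. The `Box` type, the `max_gap` threshold, and the reading of the bracketed pair in the TextCond rule as alternatives are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in screen coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def is_left(a, b, max_gap=40):
    """True if `a` lies directly to the left of `b` on roughly the same row."""
    rows_overlap = a.top < b.bottom and b.top < a.bottom
    return rows_overlap and 0 <= b.left - a.right <= max_gap


def is_above(a, b, max_gap=40):
    """True if `a` lies directly above `b` in roughly the same column."""
    cols_overlap = a.left < b.right and b.left < a.right
    return cols_overlap and 0 <= b.top - a.bottom <= max_gap


def attr_val_layout_ok(attr, val):
    """Attribute/value part of TextCond, read here as left(...) OR above(...)."""
    return is_left(attr, val) or is_above(attr, val)


# Hypothetical boxes: one label and three candidate text boxes.
label = Box(0, 0, 50, 20)
textbox_right = Box(60, 0, 200, 20)     # same row, 10 px to the right
textbox_below = Box(0, 25, 140, 45)     # same column, 5 px below
textbox_far = Box(300, 200, 440, 220)   # unrelated corner of the page

for candidate in (textbox_right, textbox_below, textbox_far):
    print(attr_val_layout_ok(label, candidate))
```

The first two candidates satisfy the constraint and the third does not, so only the nearby boxes could pair with the label under this rule.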
15
Parser: logic-programming style
- Traditional parsing: scan input sequentially
- Our parsing: nonlinear input, arbitrary constraints ... fix-point iterative construction
Diagram: tokenization … parse trees (nodes: Form, EnumSel, EnumRB)
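Fix-point construction can be sketched as a loop that keeps applying productions to the set of known instances until nothing new appears. This is a simplified illustration, not the talk's BE-parser: the instance encoding `(symbol, token ids)`, the grid positions in `pos`, and the three toy productions are assumptions.

```python
from itertools import product


def fix_point_parse(tokens, productions):
    """Apply every production repeatedly until no new instance appears.

    An instance is (symbol, frozenset_of_token_ids). A production is
    (head, component_symbols, constraint).
    """
    instances = set(tokens)
    changed = True
    while changed:
        changed = False
        for head, parts, constraint in productions:
            pools = [[i for i in instances if i[0] == p] for p in parts]
            for combo in product(*pools):
                covered = frozenset().union(*(c[1] for c in combo))
                if len(covered) != sum(len(c[1]) for c in combo):
                    continue  # components must not share tokens
                if not constraint(*combo):
                    continue
                new = (head, covered)
                if new not in instances:
                    instances.add(new)
                    changed = True
    return instances


# Toy 2-D input: token id -> (row, column) on a hypothetical grid.
pos = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
tokens = [
    ("text", frozenset({0})), ("textbox", frozenset({1})),
    ("text", frozenset({2})), ("textbox", frozenset({3})),
]


def left_of(a, b):
    """For single-token instances: same row, with b in the next column."""
    row_a, col_a = pos[next(iter(a[1]))]
    row_b, col_b = pos[next(iter(b[1]))]
    return row_a == row_b and col_a + 1 == col_b


def anywhere(*_):
    """Constraint that accepts any combination."""
    return True


productions = [
    ("TextCond", ["text", "textbox"], left_of),
    ("Form", ["TextCond"], anywhere),
    ("Form", ["Form", "TextCond"], anywhere),
]

result = fix_point_parse(tokens, productions)
for symbol, covered in sorted(result, key=lambda i: (i[0], sorted(i[1]))):
    print(symbol, sorted(covered))
```

Because instances are identified by symbol and covered tokens, the set of possible instances is finite and the loop terminates even with the recursive Form production.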
16
That’s not all: complications of hypothetical syntax
Hidden syntax is only hypothetical!
- Grammar is ambiguous: the parser yields multiple parse trees
- Grammar is incomplete: the parser yields partial parse trees
17
Ambiguity
- Grammar: preferences to capture the conventional precedence, e.g. RButton ≥ TextCond
- Parser: just-in-time pruning by preference
Multiple trees possible, e.g. TextCond: Below(Attr, Selection) versus RButton: Left(radio, text)
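A preference such as RButton ≥ TextCond can be sketched as a rule that discards the losing interpretation whenever two instances compete for the same token. The version below prunes a finished list in one pass, whereas the slide describes pruning just in time during parsing. The function name and the token numbering are hypothetical.

```python
def prune_by_preference(instances, preferences):
    """Drop every instance that shares a token with a preferred rival.

    instances: list of (symbol, frozenset_of_token_ids)
    preferences: set of (winner_symbol, loser_symbol) pairs
    """
    losers = set()
    for winner in instances:
        for loser in instances:
            conflict = winner[1] & loser[1]
            if conflict and (winner[0], loser[0]) in preferences:
                losers.add(loser)
    return [i for i in instances if i not in losers]


# Hypothetical tokens: 0 = radio button, 1 = text, 2 = selection list.
# The text token can be read as the radio button's label or as the
# attribute of the selection list below it.
instances = [
    ("RButton", frozenset({0, 1})),   # Left(radio, text)
    ("TextCond", frozenset({1, 2})),  # Below(Attr, Selection)
]
preferences = {("RButton", "TextCond")}  # RButton >= TextCond

kept = prune_by_preference(instances, preferences)
print(kept)
```

Only the RButton reading survives, since both instances claim token 1 and the preference favors RButton.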
18
Incompleteness
- Grammar: cannot capture all patterns
- Parser: cannot interpret entire query interfaces
Interpret as much as possible: greedily choose the maximum parse trees
Reasoning: they look at the big picture and consider more context
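One plausible reading of "maximum parse trees" is: keep every partial tree whose token coverage is not strictly contained in another tree's coverage. The sketch below implements that reading. The representation of a tree as `(root symbol, covered token ids)` is an assumption for illustration.

```python
def maximal_trees(trees):
    """Keep trees whose token coverage is not strictly inside another tree's."""
    return [
        t for t in trees
        if not any(t[1] < other[1] for other in trees)  # `<` is proper subset
    ]


# Hypothetical partial parse trees over tokens 0..5.
trees = [
    ("Form", frozenset({0, 1})),
    ("Form", frozenset({0, 1, 2, 3})),
    ("Form", frozenset({4, 5})),
]

best = maximal_trees(trees)
print(best)
```

The tree covering tokens 0 and 1 is dropped because a larger tree subsumes it, while the disjoint tree over tokens 4 and 5 is kept even though it is small.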
19
Error handling: the “best-effort” parser can output multiple and partial parse trees
- Union all the conditions interpreted by all the parse trees
- Report both conflict errors and missing-element errors
Diagram: parsing, then union, over parse trees (nodes: Form, EnumSel, EnumRB)
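The merging step can be sketched as a set union followed by two checks: a token claimed by more than one condition is a conflict, and a token claimed by none is missing. The data shapes and the name `merge_trees` are assumptions, not the talk's actual interface.

```python
def merge_trees(trees, all_tokens):
    """Union the conditions of all trees, then flag conflicts and missing tokens.

    Each tree is a set of conditions; a condition is
    (label, frozenset_of_token_ids).
    """
    conditions = set().union(*trees)
    claimed = {}
    for cond in sorted(conditions, key=lambda c: (c[0], sorted(c[1]))):
        for token in cond[1]:
            claimed.setdefault(token, []).append(cond[0])
    conflicts = {t: labels for t, labels in claimed.items() if len(labels) > 1}
    missing = set(all_tokens) - set(claimed)
    return conditions, conflicts, missing


# Two hypothetical partial parse trees over a form with tokens 0..7.
tree_a = {("author", frozenset({0, 1})), ("format", frozenset({4, 5}))}
tree_b = {("author", frozenset({0, 1})), ("price", frozenset({5, 6}))}

conditions, conflicts, missing = merge_trees([tree_a, tree_b], range(8))
print(len(conditions), conflicts, sorted(missing))
```

Here the shared "author" condition is counted once, token 5 is reported as a conflict between "format" and "price", and tokens 2, 3 and 7 are reported as missing.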
20
Experiment: how will a “global grammar” do?
Global grammar: derived from Basic; captures 21 patterns with 82 productions, 39 non-terminals, 16 terminals
Datasets:
- Basic: 3 domains (Airfare, Autos, Books); 150 sources
- NewSource: same domains; 30 sources
- NewDomain: 6 new domains (Music, …); 42 sources
- Random: 30 sources (from invisible-web.net)
Correctness judgment: number of correctly identified (grouping and tagging) conditions
21
Conclusion: syntactic parsing for interface understanding
- Query interface understanding by syntactic parsing with hidden grammars
- Insight: exploit how semantics connects to presentation, in a syntactic way
Future work:
- Constructing the grammar automatically
- Developing a more sophisticated preference framework
- Extending the framework to other applications
22
Thank you!
For more information: online demo at the MetaQuerier project Web site, http://metaquerier.cs.uiuc.edu
You are invited to our MetaQuerier demo in the afternoon