The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign Understanding Web Query Interfaces: Best-Efforts Parsing with Hidden Syntax.


1 The Database and Info. Systems Lab., University of Illinois at Urbana-Champaign
Understanding Web Query Interfaces: Best-Efforts Parsing with Hidden Syntax
Zhen Zhang, Bin He and Kevin C. Chang

2 MetaQuerier 2
MetaQuerier Goals: Exploring and integrating the deep Web
Explorer: source discovery, source modeling, source indexing
Integrator: source selection, schema integration, query mediation
FIND sources; QUERY sources
The Deep Web: Databases on the Web (Amazon.com, Apartments.com, Cars.com, 411localte.com)

3 Problem: Source capability extraction, or query interface understanding.
[Figure: example query forms from book sources and music sources]

4 Form understanding: What are the essential tasks?
Output all the conditions, for each:
• Grouping elements (into query conditions)
• Tagging elements with their “semantic roles”: attribute, operator, value
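
The output described on this slide can be pictured as one record per query condition. The following is a minimal sketch, not the authors' data model: the class name `Condition`, its fields, and the element identifiers are assumptions made for illustration. The sample values (attribute "title", operator "title words", value "string") are the ones shown on slide 10.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    """One query condition: a group of form elements tagged with semantic roles."""
    attribute: str                                        # e.g. the label text
    operators: List[str] = field(default_factory=list)    # available operators
    value_domain: str = "string"                          # type of accepted values
    elements: List[str] = field(default_factory=list)     # grouped form elements (hypothetical ids)


# Hypothetical condition for a "title" search box with one operator choice.
title = Condition(
    attribute="title",
    operators=["title words"],
    value_domain="string",
    elements=["label:Title", "radio:title words", "textbox:q"],
)
print(title)
```

Grouping corresponds to filling `elements`; tagging corresponds to deciding which element supplies `attribute`, `operators`, and `value_domain`.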

5 Demo summary: Multiple interpretations
Query form
Understanding: form structure

6 Certainly not a trivial task: recall the “butterfly ballot” in the 2000 U.S. election. Even just grouping can be hard!

7 Baseline approach? The problem seems to be rather heuristic in nature…
There seem to be no clear criteria, but only fuzzy heuristics:
• Grouping is hard; it is often n-ary
  Heuristic: Group two elements if they are “close”. But …
• Tagging is hard; no semantic labeling in HTML forms
  Heuristic: Tag the closest text as the “attribute”. But …
We need many such heuristics!
• Goal: A principled mechanism to encode and use the various heuristics systematically?
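
To make the "tag the closest text" heuristic concrete, here is a small sketch of it together with one layout where it goes wrong. The function name, the coordinates, and the failure case are assumptions chosen for illustration; the slide itself does not say which failure the authors had in mind.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) position of an element, y growing downward


def tag_closest_text(fields: Dict[str, Point], texts: Dict[str, Point]) -> Dict[str, str]:
    """Assign each input field the label of the nearest text by Euclidean distance."""
    result = {}
    for name, (fx, fy) in fields.items():
        result[name] = min(
            texts,
            key=lambda t: math.hypot(texts[t][0] - fx, texts[t][1] - fy),
        )
    return result


# Hypothetical vertical form: each label sits well above its own box,
# but the next label is squeezed in just below the previous box.
texts = {"Author": (0, 0), "Title": (0, 40)}
fields = {"author_box": (0, 30), "title_box": (0, 70)}

tags = tag_closest_text(fields, texts)
print(tags)  # author_box is mis-tagged as "Title"
```

In this layout the author box is 30 units from its own label but only 10 units from the next one, so the purely local heuristic tags both boxes as "Title". Patching such cases one by one is what leads to "many such heuristics".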

8 Our observation: concerted structures of query interfaces (QI)
Condition patterns as building blocks
Convergence of condition patterns

9 Our insight: Cope with form complexity through its “composition patterns.”
“Lego”-like building blocks:
• Pattern of elements composed into conditions
• Pattern of conditions composed into a form
So, how to realize our divide-and-conquer idea? Any computation paradigm?
[Diagram labels: Q-Form Source; ?; Semantic Structure; “Lego” Building Blocks]

10 Query-form creation is guided by hidden syntax
Our Hypothesis: Existence of Hidden Syntax
[Diagram labels: Semantic Structure (Query Conditions); Composer; Hidden Syntax (Grammar); Presentation (Query Interface); Parser]
Example condition: Attr: title; Operator: title words, …; Value: string
Parsing is thus a principled mechanism for the inverse

11 This “language” paradigm enables a principled solution to a seemingly heuristic problem
Essential notions: Grammar and Parser
Grammar: Pattern specification
• Declarative: No need to hard-code heuristics
• Collective: Capture both micro and macro patterns
Parser: Pattern recognition
• Global: Coherently interpret an entire query form
• Systematic: Systematically assemble the building blocks

12 However, the hidden-syntax hypothesis itself entails challenges in its realization
Hidden syntax is only hypothetical
• We must derive a grammar in its place
• What should be captured in a “derived grammar”?
• 2P-Grammar: Production + Preference (productions for patterns; preferences for their precedence)
Derived grammar is secondary to any input
• Inherently incomplete and ambiguous
• What should be the machinery of a “soft parser”?
• Best-effort Parser: multiple, maximal-partial parse trees
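
A 2P grammar, as described here, pairs productions with preferences. One possible in-memory shape is sketched below. All class and field names are assumptions for illustration; the two productions and the preference RButton ≥ TextCond are the examples given on slide 17, with the spatial constraints kept as plain text rather than executable predicates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Production:
    """A pattern: head symbol, component symbols, and a spatial constraint."""
    head: str
    components: Tuple[str, ...]
    constraint: str  # kept as text in this sketch


@dataclass(frozen=True)
class Preference:
    """Precedence between two patterns when their interpretations conflict."""
    winner: str
    loser: str


@dataclass
class TwoPGrammar:
    productions: List[Production]
    preferences: List[Preference]

    def prefers(self, a: str, b: str) -> bool:
        """True if pattern `a` takes precedence over pattern `b`."""
        return any(p.winner == a and p.loser == b for p in self.preferences)


grammar = TwoPGrammar(
    productions=[
        Production("TextCond", ("Attr", "Selection"), "Below(Attr, Selection)"),
        Production("RButton", ("radio", "text"), "Left(radio, text)"),
    ],
    preferences=[Preference("RButton", "TextCond")],
)
print(grammar.prefers("RButton", "TextCond"))
```

Keeping preferences as data separate from productions reflects the "declarative" point of slide 11: precedence rules can be added without touching the parser.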

13 Our Paradigm: Best-Effort Visual Language Parsing Framework
Input: HTML query form
Components: HTML Layout Engine; Tokenizer; BE-Parser; Ambiguity Resolution; Error Handling
2P Grammar: Productions, Preferences
Output: semantic structure

14 Grammar: Layout based
Traditional grammar (sequential, 1-D):
  Presentation: 3 * 5
  Grammar: E :- E * E, or E :- sequential(E, *, E)
Our grammar (layout based, 2-D):
  Grammar: TextCond :- [ left(TextAttr, TextVal) ∨ above(TextAttr, TextVal) ] ∧ above(TextVal, TextOp)
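
The 2-D production on this slide relies on spatial predicates over rendered elements. The sketch below shows one way `left` and `above` could be defined on bounding boxes and combined into the TextCond rule. It assumes the connectives are "or" inside the brackets and "and" outside; the `Box` class, the overlap test, and the `MAX_GAP` threshold are also assumptions, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of a rendered form token; y grows downward as on screen."""
    name: str
    x: float
    y: float
    w: float
    h: float


MAX_GAP = 30.0  # hypothetical adjacency threshold in pixels


def _overlap(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the intervals [a0, a1] and [b0, b1] share some extent."""
    return min(a1, b1) > max(a0, b0)


def left(a: Box, b: Box) -> bool:
    """a lies directly to the left of b: small horizontal gap, shared rows."""
    gap = b.x - (a.x + a.w)
    return 0 <= gap <= MAX_GAP and _overlap(a.y, a.y + a.h, b.y, b.y + b.h)


def above(a: Box, b: Box) -> bool:
    """a lies directly above b: small vertical gap, shared columns."""
    gap = b.y - (a.y + a.h)
    return 0 <= gap <= MAX_GAP and _overlap(a.x, a.x + a.w, b.x, b.x + b.w)


def text_cond(attr: Box, val: Box, op: Box) -> bool:
    """TextCond :- [left(attr, val) or above(attr, val)] and above(val, op)."""
    return (left(attr, val) or above(attr, val)) and above(val, op)


attr = Box("Title", 0, 0, 50, 20)
val = Box("textbox", 60, 0, 100, 20)
op = Box("exact phrase", 60, 25, 100, 20)
print(text_cond(attr, val, op))
```

Unlike a 1-D grammar, no reading order is imposed: the same rule matches whether the label is to the left of the text box or above it.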

15 Parser: Logic programming style
Traditional parsing
• Scan input sequentially
Our parsing
• Nonlinear input
• Arbitrary constraints
...
[Figure: tokenization, then fix-point iterative construction of parse trees with EnumSel, EnumRB and Form nodes]
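
Fix-point construction can be sketched as a loop that keeps applying every production to the instances found so far until a full pass adds nothing new. The code below is a simplified illustration under stated assumptions: instances are identified only by symbol and covered tokens, the `adjacent` check on token ids is a stand-in for real spatial constraints, and the three rules are hypothetical (the symbols EnumSel and Form appear on the slide; TextCond is from slide 14).

```python
from itertools import product
from typing import Callable, FrozenSet, List, NamedTuple, Set, Tuple


class Instance(NamedTuple):
    symbol: str
    tokens: FrozenSet[int]  # ids of the form tokens this instance covers


class Rule(NamedTuple):
    head: str
    body: Tuple[str, ...]
    ok: Callable[..., bool]  # constraint over the component instances


def disjoint(parts: Tuple[Instance, ...]) -> bool:
    """Components of one instance must not reuse the same token."""
    seen: Set[int] = set()
    for p in parts:
        if seen & p.tokens:
            return False
        seen |= p.tokens
    return True


def fixpoint_parse(tokens: List[Instance], rules: List[Rule]) -> Set[Instance]:
    """Apply all rules repeatedly until no new instance can be derived."""
    instances: Set[Instance] = set(tokens)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            pools = [[i for i in instances if i.symbol == s] for s in rule.body]
            for parts in product(*pools):
                if not disjoint(parts) or not rule.ok(*parts):
                    continue
                covered = frozenset().union(*(p.tokens for p in parts))
                new = Instance(rule.head, covered)
                if new not in instances:
                    instances.add(new)
                    changed = True
    return instances


def adjacent(a: Instance, b: Instance) -> bool:
    """Stand-in constraint: b starts right after a in token-id order."""
    return max(a.tokens) + 1 == min(b.tokens)


tokens = [
    Instance("text", frozenset({0})),
    Instance("textbox", frozenset({1})),
    Instance("text", frozenset({2})),
    Instance("select", frozenset({3})),
]
rules = [
    Rule("TextCond", ("text", "textbox"), adjacent),
    Rule("EnumSel", ("text", "select"), adjacent),
    Rule("Form", ("TextCond", "EnumSel"), adjacent),
]
result = fixpoint_parse(tokens, rules)
print(sorted((i.symbol, sorted(i.tokens)) for i in result))
```

Because the set of possible (symbol, token set) pairs is finite, the loop always terminates. The order in which rules fire does not matter, which is what allows nonlinear input and arbitrary constraints.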

16 That’s not all: complications of hypothetical syntax
Hidden syntax is only hypothetical!
• Ambiguous grammar: the parser produces multiple parse trees
• Incomplete grammar: the parser produces partial parse trees

17 Ambiguity
Grammar:
• Preferences to capture the conventional precedence
• e.g., RButton ≥ TextCond
Parser:
• Just-in-time pruning by preference
• Multiple trees possible
Example: TextCond: Below(Attr, Selection); RButton: Left(radio, text)
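
Pruning by preference can be sketched as follows: when two instances compete for the same token and a preference ranks one pattern over the other, the lower-ranked instance is dropped. This is a simplified illustration; the `Instance` type, the token names, and the conflict test (shared tokens) are assumptions. In a parser, a function like this would be called as soon as new instances are created rather than once at the end, which is the "just-in-time" part.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Instance(NamedTuple):
    symbol: str
    tokens: FrozenSet[str]


def prune(instances: List[Instance], preferences: Set[Tuple[str, str]]) -> List[Instance]:
    """Drop every instance that conflicts with a preferred instance.

    `preferences` holds (winner, loser) symbol pairs; two instances conflict
    when they share at least one token.
    """
    losers = set()
    for a in instances:
        for b in instances:
            if (a.symbol, b.symbol) in preferences and a.tokens & b.tokens:
                losers.add(b)
    return [i for i in instances if i not in losers]


# Hypothetical conflict: the text "Hardcover" could label the radio button
# to its left, or be read as the attribute of the selection list below it.
rbutton = Instance("RButton", frozenset({"radio:1", "text:Hardcover"}))
textcond = Instance("TextCond", frozenset({"text:Hardcover", "select:format"}))
other = Instance("TextCond", frozenset({"text:Author", "textbox:author"}))

kept = prune([rbutton, textcond, other], {("RButton", "TextCond")})
print([i.symbol for i in kept])
```

Instances with no applicable preference are all kept, which is why multiple parse trees remain possible.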

18 Incompleteness
Grammar:
• Cannot capture all patterns
Parser:
• Cannot interpret entire query interfaces
• Interpret as much as possible
Greedily choose the maximum parse trees
Reasoning: they look at the big picture and consider more context
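
One way to read "choose the maximum parse trees" is to keep only partial trees that are not contained in a larger one. The sketch below uses that reading, measuring a tree by the set of tokens it covers; this subset-based criterion and the `Tree` type are assumptions, and the authors' actual selection rule may differ.

```python
from typing import FrozenSet, List, NamedTuple


class Tree(NamedTuple):
    root: str
    tokens: FrozenSet[int]  # form tokens covered by this partial parse tree


def maximal_trees(trees: List[Tree]) -> List[Tree]:
    """Keep trees whose coverage is not a proper subset of another tree's."""
    return [t for t in trees if not any(t.tokens < u.tokens for u in trees)]


# Hypothetical partial parses of a form with tokens 0..5.
trees = [
    Tree("TextCond", frozenset({0, 1})),
    Tree("Form", frozenset({0, 1, 2, 3})),
    Tree("EnumRB", frozenset({4, 5})),
]
best = maximal_trees(trees)
print([t.root for t in best])
```

The smaller TextCond tree is dropped because the Form tree already interprets its tokens in a wider context, while the EnumRB tree survives because nothing larger covers it.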

19 Error Handling: “Best-effort” parser can output multiple and partial parse trees
Union all the conditions interpreted by all the parse trees.
Report both conflicts and missing errors.
[Figure: Parsing, then Union, over parse trees with EnumSel, EnumRB and Form nodes]
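
The merging step can be sketched as a union over the conditions of all parse trees, followed by two checks: tokens claimed by more than one condition (conflicts) and tokens claimed by none (missing). The `Cond` type, the function name, and the sample data are assumptions for illustration only.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Cond(NamedTuple):
    attribute: str
    tokens: FrozenSet[int]


def merge(
    trees: List[List[Cond]], all_tokens: Iterable[int]
) -> Tuple[List[Cond], Dict[int, List[str]], List[int]]:
    """Union conditions across trees; report conflicting and uncovered tokens."""
    conditions: List[Cond] = []
    for tree in trees:
        for cond in tree:
            if cond not in conditions:
                conditions.append(cond)

    owners: Dict[int, List[str]] = {}
    for cond in conditions:
        for tok in cond.tokens:
            owners.setdefault(tok, []).append(cond.attribute)

    conflicts = {tok: names for tok, names in owners.items() if len(names) > 1}
    missing = sorted(set(all_tokens) - set(owners))
    return conditions, conflicts, missing


# Two hypothetical partial parse trees over a form with tokens 0..5.
tree1 = [Cond("author", frozenset({0, 1})), Cond("format", frozenset({2, 3}))]
tree2 = [Cond("author", frozenset({0, 1})), Cond("price", frozenset({3, 4}))]

conditions, conflicts, missing = merge([tree1, tree2], range(6))
print([c.attribute for c in conditions], conflicts, missing)
```

Here token 3 is claimed by both "format" and "price" (a conflict) and token 5 is interpreted by no tree (missing), so both would be reported alongside the three merged conditions.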

20 Experiment: How will a “global grammar” do?
Global grammar: Derived from Basic; captures 21 patterns; 82 productions, 39 non-terminals, 16 terminals
Datasets:
• Basic: 3 domains (Airfare, Autos, Books); 150 sources
• NewSource: same domains, 30 sources
• NewDomain: 6 new domains (Music, …), 42 sources
• Random: 30 sources (from invisible-web.net)
Correctness judgment:
• Number of correctly identified (grouping and tagging) conditions

21 Conclusion: Syntactic Parsing for Interface Understanding
Query interface understanding by syntactic parsing with hidden grammars
Insight: Exploit how semantics connects to presentation, in a syntactic way
Future work:
• Constructing grammar automatically
• Developing a more sophisticated preference framework
• Extending the framework to other applications

22 Thank you!
For more information:
• Online demo at the MetaQuerier project Web site: http://metaquerier.cs.uiuc.edu
We invite you to our MetaQuerier demo in the afternoon.

