Identifying Meaningful Return Information for XML Keyword Search Yi Chen Ziyang Liu, Yi Chen Arizona State University.


1 Identifying Meaningful Return Information for XML Keyword Search
Yi Chen
Ziyang Liu, Yi Chen
Arizona State University

2 SIGMOD 2007 Searching XML Data
Find the position of the player with name “Mutombo”
XQuery: for $x in doc("DB.xml")//player, $y in $x/name where $y = "Mutombo" return $x/position
Keyword Search: Mutombo, position
[Figure: sample XML tree. A league contains several team elements. The team named "Rockets" has founded "1967", division "southwest", stadium "Toyota", and a players element. Its player elements each have name, position, and nationality, for example (Mutombo, center, Congo) and (Wells, guard, U.S).]
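To make the structured side of this comparison concrete, here is a minimal Python sketch of what the XQuery on this slide computes. The document literal is a hypothetical reconstruction of the slide's tree, and the helper name positions_of is an illustration, not part of XSeek.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document modeled on the slide's league/team/player tree.
DB_XML = (
    "<league><team>"
    "<name>Rockets</name><founded>1967</founded>"
    "<division>southwest</division><stadium>Toyota</stadium>"
    "<players>"
    "<player><name>Mutombo</name><position>center</position>"
    "<nationality>Congo</nationality></player>"
    "<player><name>Wells</name><position>guard</position>"
    "<nationality>U.S</nationality></player>"
    "</players>"
    "</team></league>"
)


def positions_of(root, player_name):
    """For each player whose name equals player_name, return its position text."""
    return [
        player.findtext("position")
        for player in root.iter("player")
        if player.findtext("name") == player_name
    ]


root = ET.fromstring(DB_XML)
print(positions_of(root, "Mutombo"))
```

The structured query names both the predicate (name = "Mutombo") and the return path (position) explicitly. The keyword query "Mutombo, position" carries neither role, which is the gap the rest of the talk addresses.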

3 Challenges in XML Keyword Search
How to select relevant keyword matches and connect them?
  • Inferring for clauses (with variable bindings) and where clauses in XQuery
  • Has been much studied
    • XRank [Guo et al 03]
    • XSEarch [Cohen et al 03]
    • Meaningful LCA [Li et al 04]
    • Smallest LCA [Xu et al 05]
How to identify meaningful return information?
  • Inferring return clauses in XQuery
  • Limited research has been done
    • Users or system administrators specify [Hristidis et al 03, Li et al 04]
    • Whole document [Carmel et al 02]
    • Subtree Return [Cohen et al 03, Guo et al 03, Xu et al 05]
    • Path Return variants [Hristidis et al 06]
XSeek: automatically and intelligently identifies return information

4 Selecting and Connecting Keyword Matches
Identify relevant matches using variants of LCA concepts [Cohen et al 03, Li et al 04, Xu et al 05]
Q1: Mutombo, position
[Figure: the league/team/player XML tree from slide 2]
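As a rough illustration of the plain LCA idea that these variants build on, the sketch below computes the lowest common ancestor of keyword matches from Dewey-style labels (tuples of child positions). The labeling scheme and the concrete labels are assumptions made for this example. The cited variants (XSEarch, Meaningful LCA, Smallest LCA) add further conditions that are not shown here.

```python
def lca(labels):
    """Return the longest common prefix of Dewey labels, i.e. the label of their LCA."""
    prefix = []
    for parts in zip(*labels):
        if all(p == parts[0] for p in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


# Hypothetical labels: league=(0,), team=(0, 0), players=(0, 0, 4),
# first player=(0, 0, 4, 0), second player=(0, 0, 4, 1).
mutombo_value = (0, 0, 4, 0, 0, 0)   # text "Mutombo" under the first player's name
first_position = (0, 0, 4, 0, 1)     # position element of the first player
second_position = (0, 0, 4, 1, 1)    # position element of the second player

print(lca([mutombo_value, first_position]))   # label of the first player node
print(lca([mutombo_value, second_position]))  # label of the players node
```

The second call shows why plain LCA is not enough on its own: pairing "Mutombo" with another player's position yields the players node, a much less specific connection than the player node from the first call.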

5 Selecting and Connecting Keyword Matches
Q1: Mutombo, position
Given relevant matches, what should be returned?
[Figure: the league/team/player XML tree from slide 2]

6 Example I: Subtree Return
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: the league/team/player XML tree from slide 2]

7 Example I: Path Return
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: the league/team/player XML tree from slide 2]

8 Example I: XSeek
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: the league/team/player XML tree from slide 2]

9 Example II: Subtree Return, Path Return
Q3: Rockets
[Figure: the league/team/player XML tree from slide 2]

10 Example II: XSeek
Q3: Rockets
[Figure: the league/team/player XML tree from slide 2]

11 Contributions
XSeek: automatically infers meaningful return information for XML keyword search
  • No elicitation from users or system administrators is required
  • No schema information is required
Inferring search semantics
  • Analyzing XML data structure
  • Analyzing keyword match pattern
  • Determining search results based on node types and match types
Efficient implementation of the search semantics
Experimental verification on effectiveness and efficiency

12 Roadmap
Motivation
Inferring search semantics
  • Analyzing keyword match patterns
  • Analyzing XML data structure
  • Identifying search results
XSeek architecture
Experiments
Conclusions

13 Analyzing Keyword Match Patterns
Identifying search predicates and return nodes in keywords
Examples of keyword searches
  • Q1: Mutombo, position
  • Q2: Mutombo, center
  • Q3: Rockets
Examples of structured queries
  • SQL: select position from Player where name = "Mutombo"
  • XQuery: for $x in doc("DB.xml")//player where $x/name = "Mutombo" return $x/position
[Figure annotations: parts of these queries are labeled "Return Nodes" and "Search Predicates"]
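One way to picture this analysis is the simplified sketch below. It assumes, as a deliberate simplification that the slide does not state, that a keyword matching an element name is treated as a return node and a keyword matching a text value is treated as a search predicate. The function name analyze_keywords and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of the slide's sample tree.
SAMPLE = (
    "<league><team><name>Rockets</name><players>"
    "<player><name>Mutombo</name><position>center</position></player>"
    "</players></team></league>"
)


def analyze_keywords(root, keywords):
    """Split keywords into (return_nodes, search_predicates) under the assumed rule."""
    tags = {el.tag.lower() for el in root.iter()}
    values = {
        el.text.strip().lower()
        for el in root.iter()
        if el.text and el.text.strip()
    }
    return_nodes, predicates = [], []
    for keyword in keywords:
        key = keyword.lower()
        if key in tags:
            return_nodes.append(keyword)   # matches an element name
        elif key in values:
            predicates.append(keyword)     # matches a text value
    return return_nodes, predicates


root = ET.fromstring(SAMPLE)
for query in (["Mutombo", "position"], ["Mutombo", "center"], ["Rockets"]):
    print(query, analyze_keywords(root, query))
```

Under this assumption Q1 has one explicit return node (position), while Q2 and Q3 consist only of predicates, so their return information has to be inferred.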

14 Analyzing XML Data Structure
Three types of data nodes
  • Entity nodes
  • Attribute nodes
  • Connection nodes
Related work on identifying node types [Xu et al 06]
[Figure: the league/team/player XML tree from slide 2]
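The slide names the three node types without defining them. The sketch below assumes one plausible schema-free rule, stated here as an assumption rather than as XSeek's definition: an element that repeats among its siblings is an entity, a non-repeating element that holds only a text value is an attribute, and anything else is a connection node. The second team ("Spurs") in the sample is invented so that team repeats.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; the second team exists only so that "team" repeats.
SAMPLE = (
    "<league>"
    "<team><name>Rockets</name><players>"
    "<player><name>Mutombo</name><position>center</position></player>"
    "<player><name>Wells</name><position>guard</position></player>"
    "</players></team>"
    "<team><name>Spurs</name></team>"
    "</league>"
)


def classify_nodes(root):
    """Map each element to 'entity', 'attribute', or 'connection' (assumed rule)."""
    types = {}

    def visit(element, repeated):
        has_children = len(element) > 0
        has_value = bool(element.text and element.text.strip())
        if repeated:
            types[element] = "entity"
        elif not has_children and has_value:
            types[element] = "attribute"
        else:
            types[element] = "connection"
        # A child counts as repeated if a sibling shares its tag name.
        counts = Counter(child.tag for child in element)
        for child in element:
            visit(child, counts[child.tag] > 1)

    visit(root, False)
    return types


root = ET.fromstring(SAMPLE)
types = classify_nodes(root)
for element, kind in types.items():
    print(element.tag, kind)
```

On this sample the rule marks team and player as entities, name and position as attributes, and league and players as connection nodes.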

15 Identifying Search Results
Search results consist of
Matches to search predicates
  • This allows users to verify the relevance of search results
Matches to return nodes
  • This is what the user is searching for
  • Matches are output according to node types
    • Attribute node: display name, value
    • Entity node: display name, attributes, optionally entity and connection descendants
    • Connection node: display name, optionally entity and connection descendants
Nodes that connect these matches
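The output rules per node type can be sketched as a small recursive renderer. The node types are supplied here through a hand-written table (NODE_TYPES), which is an assumption for this example. The expand flag stands for the "optionally" in the slide, and the line-based output format is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed node types for the sample; any tag not listed is treated as an attribute.
NODE_TYPES = {
    "league": "connection",
    "team": "entity",
    "players": "connection",
    "player": "entity",
}


def render(element, expand=False, indent=0):
    """Return display lines for a matched node according to its node type."""
    pad = "  " * indent
    kind = NODE_TYPES.get(element.tag, "attribute")
    if kind == "attribute":
        # Attribute node: display name and value.
        return [f"{pad}{element.tag}: {(element.text or '').strip()}"]
    lines = [f"{pad}{element.tag}"]  # entity or connection node: display name
    for child in element:
        child_kind = NODE_TYPES.get(child.tag, "attribute")
        if child_kind == "attribute":
            if kind == "entity":
                # Entity node: also display its attributes.
                lines += render(child, expand, indent + 1)
        elif expand:
            # Optional: entity and connection descendants.
            lines += render(child, expand, indent + 1)
    return lines


team = ET.fromstring(
    "<team><name>Rockets</name><players>"
    "<player><name>Mutombo</name></player>"
    "</players></team>"
)
print("\n".join(render(team)))
print("\n".join(render(team, expand=True)))
```

Without expansion the team is shown with its own attributes only. With expansion the players connection node and the player entity below it are included as well.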

16 A Search Result Example
Q1: Mutombo, position
[Figure: the league/team/player XML tree from slide 2]

17 What if Return Nodes Are Absent?
Explicit return nodes: nodes that are explicitly identified in the input keywords
Infer implicit return nodes if the input keywords contain no explicit return nodes
  • Users may be interested in general information about entities that are relevant to the search
  • Master entity: the lowest ancestor-or-self entity of the LCA node, or the XML tree root if there is none
  • Relevant entities: the entities on the path from a master entity to a relevant keyword match, inclusive
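The two definitions on this slide translate fairly directly into code. The sketch below assumes the entity tags are already known (ENTITY_TAGS is a hand-written stand-in for the node-type analysis) and that the LCA node has already been computed. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

ENTITY_TAGS = {"team", "player"}  # assumed result of the node-type analysis

SAMPLE = (
    "<league><team><name>Rockets</name><players>"
    "<player><name>Mutombo</name><position>center</position></player>"
    "</players></team></league>"
)


def parent_map(root):
    """ElementTree has no parent pointers, so build a child -> parent map."""
    return {child: parent for parent in root.iter() for child in parent}


def master_entity(lca_node, parents, root):
    """Lowest ancestor-or-self entity of the LCA node, or the root if none exists."""
    node = lca_node
    while node is not None:
        if node.tag in ENTITY_TAGS:
            return node
        node = parents.get(node)
    return root


def relevant_entities(master, match, parents):
    """Entities on the path from the master entity down to a match, inclusive."""
    path = []
    node = match
    while node is not None:
        path.append(node)
        if node is master:
            break
        node = parents.get(node)
    return [n for n in reversed(path) if n.tag in ENTITY_TAGS]


root = ET.fromstring(SAMPLE)
parents = parent_map(root)

# Q3 "Rockets": the only match is the team's name node, which is also the LCA.
rockets_name = root.find("team/name")
team = master_entity(rockets_name, parents, root)
print(team.tag)

# A match on a player's name below that master entity.
mutombo_name = root.find("team/players/player/name")
print([e.tag for e in relevant_entities(team, mutombo_name, parents)])
```

For Q3 the master entity is the team, so general information about the Rockets team becomes the implicit return information rather than just the matched name node.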

18 Search with Implicit Return Nodes (I)
Q2: Mutombo, center
[Figure: the league/team/player XML tree from slide 2]

19 Search with Implicit Return Nodes (II)
Q3: Rockets
[Figure: the league/team/player XML tree from slide 2]

20 Roadmap
Motivation
Inferring search semantics
  • Analyzing keyword match patterns
  • Analyzing XML data structure
  • Identifying search results
XSeek architecture
Experiments
Conclusions

21 Architecture of XSeek
[Figure: architecture diagram. Components: Data Analyzer, Index Builder, Keyword Matcher, Match Grouper, Keyword Analyzer, Return Node Recognizer, Result Generator. Data labels: XML, Keywords, Indexes, Entities, Attributes, Connection nodes, Search predicates, Return nodes, Explicit return nodes, Implicit return nodes, Search Result.]

22 Experimental Setup
Compare the performance of
  • XSeek
  • Subtree Return
  • Path Return
Measurements
  • Search quality
  • Speed
  • Scalability
Data sets: Mondial, WSU, XMark benchmark
Query sets: eight queries for each data set

23 Search Quality: Precision
Precision: measures the soundness of search results
XSeek in general has a precision as good as Path Return
[Figure: precision chart; query labels shown include "open auction, person257" and "seller, person179, buyer, price, date"]

24 Search Quality: Recall
Recall: measures the completeness of search results
XSeek in general has a recall as good as Subtree Return

25 Search Quality: F-Measure
F-Measure is a weighted harmonic mean of precision and recall
XSeek has the best F-Measure
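For reference, the three quality measures can be computed as in the sketch below, using the standard set-based definitions of precision and recall and the standard weighted F-measure. The slides do not say which weight was used in the experiments, so the beta parameter (default 1, the balanced case) is an assumption, and the node identifiers in the example are made up.

```python
def precision_recall(returned, relevant):
    """Precision = hits / returned, recall = hits / relevant (set-based)."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Hypothetical result: four nodes returned, of which the two relevant ones are included.
p, r = precision_recall({"n1", "n2", "n3", "n4"}, {"n1", "n2"})
print(p, r, f_measure(p, r))
```

A method that returns whole subtrees tends toward high recall and lower precision, and a method that returns only paths tends toward the opposite, which is why a combined measure is useful for comparing the three approaches.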

26 Speed: Benchmark Data
[Figure: speed chart; query labels shown include "seller, person179, buyer, price, date" and "person257, person133"]

27 Conclusions
The first work that automatically infers meaningful return information for XML keyword search
  • No elicitation from users or system administrators, no schema information is required
Analyzing keyword match patterns
  • Search predicates
  • Return nodes
Analyzing XML node types
  • Entities
  • Attributes
  • Connection nodes
Identifying two types of return information
  • Explicit return nodes
  • Implicit return nodes
Outputting an XML node based on its match type and node type
Experiments verify XSeek’s effectiveness and efficiency

28 Thank You! Questions?
You are welcome to visit the XSeek demo at VLDB 07

