Identifying Meaningful Return Information for XML Keyword Search. Ziyang Liu, Yi Chen. Arizona State University.


SIGMOD 2007
Searching XML Data
Goal: find the position of the player with name "Mutombo".
XQuery:
    for $x in doc("DB.xml")//player, $y in $x/name
    where $y = "Mutombo"
    return $x/position
Keyword search: Mutombo, position
[Figure: sample XML tree. A league contains team elements (name, founded, division, stadium, players); players contains player elements (name, position, nationality). Sample values: Rockets, 1967, southwest, Toyota; Mutombo, center, Congo; Wells, guard, U.S.]
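To make the structured side of this comparison concrete, here is a minimal Python sketch of what the XQuery above computes, using the standard library's `xml.etree.ElementTree`. The sample document and the helper name `positions_of` are assumptions made for illustration; they are not part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely modeled on the league/team/player tree on the slide.
DB_XML = """
<league>
  <team>
    <name>Rockets</name>
    <players>
      <player><name>Mutombo</name><position>center</position></player>
      <player><name>Wells</name><position>guard</position></player>
    </players>
  </team>
</league>
"""

def positions_of(root, player_name):
    """Return the position text of every player whose name equals player_name."""
    return [
        player.findtext("position")
        for player in root.iter("player")      # for $x in //player
        if player.findtext("name") == player_name  # where $x/name = ...
    ]

root = ET.fromstring(DB_XML)
print(positions_of(root, "Mutombo"))  # ['center']
```

The structured query names both the predicate (`name`) and the return path (`position`) explicitly; the keyword query "Mutombo, position" leaves both for the system to infer.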

Challenges in XML Keyword Search
How to select relevant keyword matches and connect them?
  - Corresponds to inferring for clauses (with variable bindings) and where clauses in XQuery
  - Has been studied extensively
      - XRank [Guo et al 03]
      - XSEarch [Cohen et al 03]
      - Meaningful LCA [Li et al 04]
      - Smallest LCA [Xu et al 05]
How to identify meaningful return information?
  - Corresponds to inferring return clauses in XQuery
  - Limited research has been done
      - Users or system administrators specify [Hristidis et al 03, Li et al 04]
      - Whole document [Carmel et al 02]
      - Subtree Return [Cohen et al 03, Guo et al 03, Xu et al 05]
      - Path Return variants [Hristidis et al 06]
XSeek: automatically and intelligently identifies return information

Selecting and Connecting Keyword Matches
Identify relevant matches using variants of LCA concepts [Cohen et al 03, Li et al 04, Xu et al 05]
Q1: Mutombo, position
[Figure: sample league/team/player XML tree]
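As an illustration of the LCA idea, the following Python sketch computes smallest LCAs by brute force over Dewey-style labels (tuples of child indexes from the root). It is a teaching sketch only, not the algorithm from any of the cited papers, which use far more efficient methods; the labels in the example are made up.

```python
from itertools import product

def lca(a, b):
    """Longest common prefix of two Dewey labels, i.e. their lowest common ancestor."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

def is_ancestor(a, b):
    """True if label a is a proper ancestor of label b."""
    return len(a) < len(b) and b[:len(a)] == a

def smallest_lcas(match_lists):
    """Brute force: take the LCA of one match per keyword, then drop any
    candidate that has another candidate as a descendant."""
    if not match_lists:
        return []
    candidates = set()
    for combo in product(*match_lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)
    return sorted(c for c in candidates
                  if not any(is_ancestor(c, d) for d in candidates))

# Hypothetical labels: "Mutombo" matches one name node, "position" matches
# the position node of two different players under the same players node.
mutombo = [(0, 0, 3, 0, 0)]
position = [(0, 0, 3, 0, 1), (0, 0, 3, 1, 1)]
print(smallest_lcas([mutombo, position]))  # [(0, 0, 3, 0)]
```

Pairing "Mutombo" with the other player's position yields the shared `players` node, which is discarded because the first player's own node is a smaller LCA below it.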

Selecting and Connecting Keyword Matches
Q1: Mutombo, position
[Figure: sample league/team/player XML tree]
Given relevant matches, what should be returned?

Example I: Subtree Return
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: sample league/team/player XML tree]

Example I: Path Return
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: sample league/team/player XML tree]

Example I: XSeek
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: sample league/team/player XML tree]

Example II: Subtree Return, Path Return
Q3: Rockets
[Figure: sample league/team/player XML tree]

Example II: XSeek
Q3: Rockets
[Figure: sample league/team/player XML tree]

Contributions
XSeek: automatically infers meaningful return information for XML keyword search
  - No elicitation from users or system administrators is required
  - No schema information is required
Inferring search semantics
  - Analyzing XML data structure
  - Analyzing keyword match patterns
  - Determining search results based on node types and match types
Efficient implementation of the search semantics
Experimental verification of effectiveness and efficiency

Roadmap
  - Motivation
  - Inferring search semantics
      - Analyzing keyword match patterns
      - Analyzing XML data structure
      - Identifying search results
  - XSeek architecture
  - Experiments
  - Conclusions

Analyzing Keyword Match Patterns
Identifying search predicates and return nodes in keywords
Examples of keyword searches:
  - Q1: Mutombo, position
  - Q2: Mutombo, center
  - Q3: Rockets
Examples of structured queries:
  - SQL: select position from Player where name = "Mutombo"
  - XQuery: for $x in doc("DB.xml")//player where $x/name = "Mutombo" return $x/position
In both structured queries, position is the return node and name = "Mutombo" is the search predicate.
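The slide does not spell out the rule for separating the two keyword roles, so the Python sketch below uses a deliberately simplified heuristic as an assumption: a keyword that matches only element names is treated as a return node, and a keyword that matches a text value is treated as a search predicate. The sample document and the function name `classify_keywords` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the slides' running example.
DOC = """
<league>
  <team>
    <name>Rockets</name>
    <players>
      <player>
        <name>Mutombo</name><position>center</position><nationality>Congo</nationality>
      </player>
    </players>
  </team>
</league>
"""

def classify_keywords(root, keywords):
    """Simplified heuristic (an assumption, not the published rule):
    value match -> search predicate; name-only match -> return node."""
    names = {el.tag.lower() for el in root.iter()}
    values = {(el.text or "").strip().lower() for el in root.iter()}
    values.discard("")  # container elements carry only whitespace text
    roles = {}
    for kw in keywords:
        k = kw.lower()
        if k in values:
            roles[kw] = "search predicate"
        elif k in names:
            roles[kw] = "return node"
        else:
            roles[kw] = "no match"
    return roles

root = ET.fromstring(DOC)
print(classify_keywords(root, ["Mutombo", "position"]))
# {'Mutombo': 'search predicate', 'position': 'return node'}
```

Under this heuristic Q2 (Mutombo, center) and Q3 (Rockets) contain only search predicates, which is the case where return nodes have to be inferred.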

Analyzing XML Data Structure
Three types of data nodes:
  - Entity nodes
  - Attribute nodes
  - Connection nodes
Related work on identifying node types [Xu et al 06]
[Figure: sample league/team/player XML tree]
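The slide names the three node types without defining them. The Python sketch below assumes a schema-free heuristic: an element whose tag repeats among siblings is an entity, a leaf element with a text value is an attribute, and anything else is a connection node. These rules, the sample data (including the second team), and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; the second team and player are invented so that
# "team" and "player" repeat among siblings.
DOC = """
<league>
  <team>
    <name>Rockets</name>
    <players>
      <player><name>Mutombo</name><position>center</position></player>
      <player><name>Wells</name><position>guard</position></player>
    </players>
  </team>
  <team>
    <name>TeamB</name>
  </team>
</league>
"""

def star_tags(root):
    """Tags that occur more than once under a single parent somewhere in the tree."""
    stars = set()
    for parent in root.iter():
        counts = Counter(child.tag for child in parent)
        stars.update(tag for tag, n in counts.items() if n > 1)
    return stars

def node_type(el, stars):
    """Assumed rules: repeating tag -> entity; leaf with text -> attribute; else connection."""
    if el.tag in stars:
        return "entity"
    if len(el) == 0 and (el.text or "").strip():
        return "attribute"
    return "connection"

root = ET.fromstring(DOC)
stars = star_tags(root)
types = {el.tag: node_type(el, stars) for el in root.iter()}
print(types)
```

On this sample, `team` and `player` come out as entities, `name` and `position` as attributes, and `league` and `players` as connection nodes.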

Identifying Search Results
Search results consist of:
  - Matches to search predicates
      - This allows users to verify the relevance of search results
  - Matches to return nodes
      - This is what the user is searching for
      - Matches are output according to node type:
          - Attribute node: display name and value
          - Entity node: display name, attributes, optionally entity and connection descendants
          - Connection node: display name, optionally entity and connection descendants
  - Nodes that connect these matches

A Search Result Example
Q1: Mutombo, position
[Figure: sample league/team/player XML tree]

What if Return Nodes Are Absent?
Explicit return nodes: nodes that are explicitly identified in the input keywords
Implicit return nodes are inferred when the input keywords contain no explicit return nodes
  - Users may be interested in general information about entities that are relevant to the search
  - Master entity: the lowest ancestor-or-self entity of the LCA node, or the XML tree root if there is none
  - Relevant entities: the entities on the path from a master entity to a relevant keyword match, inclusive
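The two definitions above can be sketched in Python as a walk up the tree. The parent map, the `is_entity` predicate (here hard-coded to the `team` and `player` tags), and the sample document are assumptions for illustration; a real system would derive entity status from the data analysis step.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the slides' running example.
DOC = """
<league>
  <team>
    <name>Rockets</name>
    <players>
      <player><name>Mutombo</name><position>center</position></player>
    </players>
  </team>
</league>
"""

def master_entity(lca_node, parent_of, is_entity, root):
    """Lowest ancestor-or-self entity of the LCA node, or the root if none exists."""
    node = lca_node
    while node is not None:
        if is_entity(node):
            return node
        node = parent_of.get(node)
    return root

def relevant_entities(master, match, parent_of, is_entity):
    """Entities on the path from the master entity down to a keyword match, inclusive."""
    found = []
    node = match
    while node is not None:
        if is_entity(node):
            found.append(node)
        if node is master:
            break
        node = parent_of.get(node)
    return list(reversed(found))

root = ET.fromstring(DOC)
parent_of = {child: parent for parent in root.iter() for child in parent}
is_entity = lambda el: el.tag in {"team", "player"}  # assumed entity tags

# Q3 "Rockets": the only match (and hence the LCA) is the team's name node.
rockets = next(el for el in root.iter("name") if el.text == "Rockets")
print(master_entity(rockets, parent_of, is_entity, root).tag)  # team
```

For Q3 the master entity is the enclosing `team`, so general information about that team becomes the implicit return information.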

Search with Implicit Return Nodes (I)
Q2: Mutombo, center
[Figure: sample league/team/player XML tree]

Search with Implicit Return Nodes (II)
Q3: Rockets
[Figure: sample league/team/player XML tree]

Roadmap
  - Motivation
  - Inferring search semantics
      - Analyzing keyword match patterns
      - Analyzing XML data structure
      - Identifying search results
  - XSeek architecture
  - Experiments
  - Conclusions

Architecture of XSeek
[Figure: architecture diagram]
Components: Data Analyzer, Index Builder, Keyword Matcher, Match Grouper, Keyword Analyzer, Return Node Recognizer, Result Generator
Inputs and outputs: XML, Keywords, Indexes, Search Result
Other labels: Entities, Attributes, Connection nodes; Search predicates, Return nodes; Explicit return nodes, Implicit return nodes

Experimental Setup
Compare the performance of:
  - XSeek
  - Subtree Return
  - Path Return
Measurements:
  - Search quality
  - Speed
  - Scalability
Data sets: Mondial, WSU, XMark benchmark
Query sets: eight queries for each data set

Search Quality: Precision
Precision measures the soundness of search results
XSeek in general has a precision as good as Path Return
[Chart: precision per query; surviving query labels: open auction, person257 seller, person179, buyer, price, date]

Search Quality: Recall
Recall measures the completeness of search results
XSeek in general has a recall as good as Subtree Return

Search Quality: F-Measure
F-Measure is a weighted harmonic mean of precision and recall
XSeek has the best F-Measure
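For reference, the three quality measures can be written out as a short Python sketch. It uses the standard definitions over sets of returned and relevant items; the slides do not state which weight the experiments used, so the `beta` parameter (with `beta = 1` giving the balanced F1) is an assumption, as are the function names.

```python
def precision_recall(returned, relevant):
    """Precision and recall of a returned set against a relevant (ground-truth) set."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical result: four nodes returned, two of them relevant,
# and the ground truth contains exactly those two relevant nodes.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b"})
print(p, r)  # 0.5 1.0
```

A method that returns large subtrees tends toward the pattern in this example (high recall, lower precision), while a method that returns only paths tends toward the opposite; the harmonic mean penalizes either imbalance.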

Speed: Benchmark Data
[Chart: processing time per query; surviving query labels: seller, person179, buyer, price, date person257, person133]

Conclusions
The first work that automatically infers meaningful return information for XML keyword search
  - No elicitation from users or system administrators and no schema information is required
Analyzing keyword match patterns
  - Search predicates
  - Return nodes
Analyzing XML node types
  - Entities
  - Attributes
  - Connection nodes
Identifying two types of return information
  - Explicit return nodes
  - Implicit return nodes
Outputting an XML node based on its match type and node type
Experiments verify XSeek's effectiveness and efficiency

Thank you! Questions? You are welcome to visit the XSeek demo at VLDB 07.