Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,

Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology, Australia

Outline
- Motivation of Keyword Search in XML
- Brief Review of Related Work
- Existing Problems
- Construct Structured Query Templates
- Ranking Function
- Processing Algorithms
- Conclusions

Motivation of XML Keyword Search
Keyword search is easy to use:
- Users do not need to know the structure of the XML data or a specific query language.
- XML data with different structures can be searched uniformly with the same keyword query, because the query does not specify the structure of the results.

Brief Review of Related Work
We focus on four works that use label:term pairs as the keyword query format:
- [YunyaoLi2004VLDB] Schema-Free XQuery.
- [DanielaFlorescu2002ComputerNetworks] Integrating keyword search into XML query processing.
- [SaraCohen2003VLDB] XSEarch: A semantic search engine for XML.
- [WeidongYang2007CIT] Schema-aware keyword search over XML streams.
Other relevant work can be found in our paper.

Brief Review of Related Work
All four works use label:term pairs as the keyword query format. They differ in strategy: the first three share the same basic approach, which first retrieves the relevant keyword lists and then merges them into results, whereas the last one first generates a large template that covers all kinds of results with respect to the XML schema and then caches the candidate results over XML streams. The template-based strategy can achieve better performance [WeidongYang2007CIT].
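The two strategies can be contrasted with a toy sketch. The in-memory `ENTITIES` data and the functions `list_merge` and `structured_match` are hypothetical illustrations, not code from the cited systems: the first retrieves one list per label:term pair and merges (intersects) them, while the second evaluates all pairs as one conjunctive predicate per entity, which is what evaluating a structured query amounts to. Matching is simplified to case-insensitive substring containment.

```python
# Hypothetical toy data: entity id -> {label: text}; not taken from the paper.
ENTITIES = {
    "book1": {"year": "2006", "title": "XML keyword search", "author": "Philip"},
    "book2": {"year": "2005", "title": "XML streams", "author": "Philip"},
    "article1": {"year": "2006", "title": "Top-k XML retrieval", "author": "Philip"},
}


def matches(children, label, term):
    """Simplified matching: case-insensitive substring test on one labeled child."""
    return term.lower() in children.get(label, "").lower()


def keyword_list(label, term):
    """List-based strategy, step 1: retrieve the ids matching one label:term pair."""
    return {eid for eid, children in ENTITIES.items() if matches(children, label, term)}


def list_merge(query):
    """List-based strategy, step 2: merge (intersect) the per-keyword lists."""
    lists = [keyword_list(label, term) for label, term in query]
    return sorted(set.intersection(*lists)) if lists else []


def structured_match(query):
    """Structured strategy: one conjunctive predicate per entity, checked in one pass."""
    return sorted(
        eid for eid, children in ENTITIES.items()
        if all(matches(children, label, term) for label, term in query)
    )


query = [("year", "2006"), ("title", "xml"), ("author", "philip")]
print(list_merge(query))        # ['article1', 'book1']
print(structured_match(query))  # ['article1', 'book1']
```

Both produce the same answers on this toy data; they differ in evaluation order and in the intermediate lists the list-based version materializes.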

Existing Problems
[WeidongYang2007CIT] was designed for querying XML streams, which is not enough for an XML data repository because of the following challenges:
- One XML data repository may contain data conforming to different templates.
- Users often prefer to see only part of the results, e.g., the top-k results.
- Domain knowledge can help to handle labels that have the same meaning.
Therefore, we study how to apply the template-based keyword search strategy to an XML data repository.
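A minimal sketch of how domain knowledge might be applied to labels: a synonym table maps labels with the same meaning onto one canonical label before query labels are compared with schema labels. The table `SYNONYMS` and all of its entries are hypothetical; the slide does not say what form the domain knowledge takes.

```python
# Hypothetical domain knowledge: synonymous label -> canonical label.
SYNONYMS = {"writer": "author", "creator": "author", "heading": "title", "pubyear": "year"}


def canonical(label):
    """Normalize case, then map a label to its canonical form (itself if unknown)."""
    key = label.strip().lower()
    return SYNONYMS.get(key, key)


def labels_match(query_label, schema_label):
    """Two labels match if they share the same canonical form."""
    return canonical(query_label) == canonical(schema_label)


print(labels_match("author", "Writer"))  # True
print(labels_match("author", "title"))   # False
```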

Construct Structured Query Templates
Example: two data sources conform to schemas t1 and t2, respectively (figure: Schema t1 and Schema t2).
Keyword query: (year:2006, title:xml, author:philip)
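The keyword query in this example is a list of label:term pairs. A minimal parser for the textual form used on these slides (the name `parse_keyword_query` is illustrative, not from the paper) turns it into the pairs that the later steps work on:

```python
def parse_keyword_query(text):
    """Parse '(label:term, label:term, ...)' into a list of (label, term) pairs."""
    body = text.strip().strip("()")
    pairs = []
    for item in body.split(","):
        label, sep, term = item.strip().partition(":")
        if not sep or not label.strip() or not term.strip():
            raise ValueError(f"expected label:term, got {item!r}")
        pairs.append((label.strip(), term.strip()))
    return pairs


print(parse_keyword_query("(year:2006, title:xml, author:philip)"))
# [('year', '2006'), ('title', 'xml'), ('author', 'philip')]
```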

Construct Structured Query Templates
Identifying the context of keywords:
- Determine the master entities using the labels in the keyword query and the XML schema.
- Generate a FOR clause for each master entity.
- Check how many times each label occurs under each master entity:
  - Once: generate the WHERE clauses directly.
  - More than once: first cluster the occurrences, then generate the WHERE clauses.
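These steps can be sketched over a schema tree. Both the tuple encoding of the schema and the rule for master entities are assumptions: here a master entity is taken to be a lowest schema node whose subtree contains every query label, which reproduces the slides' choices (book and article for t1, bib for t2) but may differ from the paper's exact definition. The two schemas are reconstructed from the paths in the example queries, since the slide shows them only as figures.

```python
from collections import Counter

# Schemas reconstructed from the example query paths: (element name, [children]).
SCHEMA_T1 = ("bibliography", [
    ("books", [("book", [("year", []), ("title", []), ("author", [])])]),
    ("articles", [("article", [("year", []), ("title", []), ("author", [])])]),
])
SCHEMA_T2 = ("bibliography", [
    ("bib", [
        ("year", []),
        ("book", [("title", []), ("author", [])]),
        ("article", [("title", []), ("author", [])]),
    ]),
])


def label_counts(node):
    """Count how often each element name occurs below `node` (excluding node itself)."""
    counts = Counter()
    for child in node[1]:
        counts[child[0]] += 1
        counts.update(label_counts(child))
    return counts


def master_entities(node, labels, path=""):
    """Assumed rule: the lowest nodes whose subtree contains every query label."""
    here = f"{path}/{node[0]}" if path else node[0]
    found = []
    for child in node[1]:
        found += master_entities(child, labels, here)
    if found:
        return found
    counts = label_counts(node)
    return [(here, node)] if all(counts[label] > 0 for label in labels) else []


def clause_plan(entity, labels):
    """Label occurs once -> WHERE clause directly; more than once -> cluster first."""
    counts = label_counts(entity)
    return {label: "where" if counts[label] == 1 else "cluster" for label in labels}


labels = ["year", "title", "author"]
for schema in (SCHEMA_T1, SCHEMA_T2):
    for path, entity in master_entities(schema, labels):
        print(path, clause_plan(entity, labels))
```

For t1 this reports book and article with a direct WHERE clause for every label; for t2 it reports bib, with year handled directly and title and author marked for clustering.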

Schema t1
Step 1: determine each master entity and its corresponding label set.
- Q1 = "for $b in bibliography/books/book"
- Q2 = "for $a in bibliography/articles/article"
Step 2: each label occurs only once under each master entity.
- Q1 += " where $b/year = '2006' and contains($b/title, 'xml') and contains($b/author, 'philip')"
- Q2 += " where $a/year = '2006' and contains($a/title, 'xml') and contains($a/author, 'philip')"
Keyword query: (year:2006, title:xml, author:philip)

Schema t2
Step 1: determine the master entity and its corresponding label set.
- Q = "for $bi in bibliography/bib"
Step 2: title and author each occur twice under the master entity, so they are clustered by book and article, respectively.
- Q1 = Q + " for $bo in $bi/book"
- Q2 = Q + " for $a in $bi/article"
Step 3: each label occurs only once in each cluster.
- Q1 += " where $bi/year = '2006' and contains($bo/title, 'xml') and contains($bo/author, 'philip')"
- Q2 …
Keyword query: (year:2006, title:xml, author:philip)

Construct Structured Query Templates
Identifying returned nodes:
- Step 1: If the cardinality of a master entity is "*" and no clustering was performed, the master entity is the return node of the constructed queries.
- Step 2: If the cardinality of a master entity is "*" and clusters were generated, check the root node of each cluster recursively (back to Step 1).
- Step 3: If the cardinality of a master entity is not "*", probe its ancestors one by one until reaching a node whose cardinality is "*" or the root of the XML schema.
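The three rules can be written as a short recursive function. The `SchemaNode` class and its fields are hypothetical: `clusters` is assumed to hold the cluster roots produced while building the WHERE clauses, and the cardinalities used below ("*" for bib, book and article) are assumed, since the schema figures are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class SchemaNode:
    name: str
    cardinality: str = "1"  # "*" marks a repeatable element
    parent: Optional["SchemaNode"] = field(default=None, repr=False)
    clusters: List["SchemaNode"] = field(default_factory=list, repr=False)


def return_nodes(entity):
    """Apply Steps 1-3 to a master entity (or, recursively, to a cluster root)."""
    if entity.cardinality == "*":
        if not entity.clusters:
            return [entity]  # Step 1: the entity itself is returned
        result = []
        for root in entity.clusters:  # Step 2: recurse on every cluster root
            result += return_nodes(root)
        return result
    node = entity  # Step 3: walk up until a "*" node or the schema root
    while node.cardinality != "*" and node.parent is not None:
        node = node.parent
    return [node]


# Schema t2 (cardinalities assumed): bib is the master entity, clustered into book/article.
root = SchemaNode("bibliography")
bib = SchemaNode("bib", "*", parent=root)
bib.clusters = [SchemaNode("book", "*", parent=bib), SchemaNode("article", "*", parent=bib)]
print([n.name for n in return_nodes(bib)])  # ['book', 'article']
```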

Schema t1: the master entities are the returned nodes.
- Q1 += " return $b"
- Q2 += " return $a"
Schema t2: the roots of the clusters are the returned nodes.
- Q1 += " return $bo"
- Q2 += " return $a"
Keyword query: (year:2006, title:xml, author:philip)
The complete constructed queries are given in our paper.
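Putting the FOR, WHERE and return parts together for both schemas gives complete queries. This sketch rests on two assumptions taken from the example rather than from a stated rule: `year` is compared by equality and every other label with XQuery's `contains()`, and terms are inserted without escaping. The function names `where_clause` and `build_queries` are hypothetical.

```python
def where_clause(preds):
    """Render (variable, label, term) triples as one XQuery where clause."""
    parts = []
    for var, label, term in preds:
        if label == "year":  # assumption mirroring the slide: year uses equality
            parts.append(f"{var}/{label} = '{term}'")
        else:  # assumption mirroring the slide: other labels use contains()
            parts.append(f"contains({var}/{label}, '{term}')")
    return "where " + " and ".join(parts)


def build_queries(var, path, query, direct=(), clusters=()):
    """One query per master entity, or one per cluster when clustering occurred.

    direct:   labels occurring once under the master entity (bound to `var`).
    clusters: (variable, child element) pairs; each cluster root is returned.
    """
    head = f"for {var} in {path}"
    if not clusters:
        preds = [(var, label, term) for label, term in query]
        return [f"{head} {where_clause(preds)} return {var}"]
    queries = []
    for cvar, element in clusters:
        preds = [(var if label in direct else cvar, label, term) for label, term in query]
        queries.append(f"{head} for {cvar} in {var}/{element} {where_clause(preds)} return {cvar}")
    return queries


q = [("year", "2006"), ("title", "xml"), ("author", "philip")]
t1 = build_queries("$b", "bibliography/books/book", q)
t2 = build_queries("$bi", "bibliography/bib", q, direct={"year"},
                   clusters=[("$bo", "book"), ("$a", "article")])
print(t1[0])
print(t2[0])
```

The paths are relative, as on the slides; in practice they would be rooted at a document node, for example with XQuery's `doc()` function. A term containing a single quote would need escaping before being spliced into the query string.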

Ranking Function  v m is the master entity nodes;  ω(v i, t i ) is calculated by using tf*idf weight model. Feature of the function: The Score() consists of two parts ContextScore() and tf*idf weight, and the former is the upper bound of the score of the results.

Processing Strategy
- Algorithm 1 generates the structured queries together with their context scores.
- Algorithm 2 schedules the query plan according to:
  - the user's requirements, e.g., the number of results;
  - the context scores of all generated queries;
  - the intermediate results.
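One plausible way to schedule the queries from these three inputs is a threshold-style early stop, sketched here as an assumption rather than as Algorithm 2 itself: evaluate queries in descending order of context score, keep the best k results in a min-heap, and stop once the k-th best score is at least the next query's context score, since ContextScore() bounds every score that query can produce. `evaluate` is a hypothetical callback standing in for the query engine.

```python
import heapq


def top_k(queries, k, evaluate):
    """queries: (context_score, query) pairs; evaluate(query) -> [(score, result), ...].

    Assumes every result score is <= the context score of the query producing it.
    """
    if k <= 0:
        return []
    best = []  # min-heap holding the current top-k (score, result) pairs
    for context_score, query in sorted(queries, key=lambda q: q[0], reverse=True):
        if len(best) == k and best[0][0] >= context_score:
            break  # no remaining query can beat the current k-th result
        for item in evaluate(query):
            if len(best) < k:
                heapq.heappush(best, item)
            elif item[0] > best[0][0]:
                heapq.heapreplace(best, item)
    return sorted(best, reverse=True)


# Hypothetical query results, keyed by query name.
RESULTS = {"Q1": [(0.8, "r1"), (0.6, "r2")], "Q2": [(0.5, "r3")], "Q3": [(0.3, "r4")]}
evaluated = []


def evaluate(query):
    evaluated.append(query)
    return RESULTS[query]


print(top_k([(0.5, "Q2"), (0.9, "Q1"), (0.3, "Q3")], 2, evaluate))
# [(0.8, 'r1'), (0.6, 'r2')]; only Q1 is evaluated
```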

Experiments Dataset:  Sigmod record  three variant of DBLP Keyword Queries:  q1 (author:David, title:XML)  q2 (year:2002, title:XML)

Experimental Results
(figures: results for q1 with k = 10 and for q2 with k = 20)

Conclusions
- XBridge is proposed to process keyword queries over an XML data repository; it efficiently finds the top-k results by evaluating the generated structured queries.
- A precise ranking function is provided to evaluate the relevance of the results.
Limitations of this work:
- We treat XML schemas as tree patterns.
- We do not consider reference relationships in XML data.