CoXML: A Cooperative XML Query Answering System

Presentation transcript:

CoXML: A Cooperative XML Query Answering System Shaorong Liu and Wesley W. Chu APWeb/WAIM 2007 12/6/2019

Motivation
  XML has become the standard format for information representation and data exchange.
  XML schemas are usually very complex.
    E.g., the XML schema for the IEEE Computer Society publications contains about 170 distinct tags and more than 1000 distinct paths.
  It is often unrealistic for users to fully understand a schema before asking queries.
  Exact query answering is inadequate; approximate query answering is more desirable.

Our Contribution: CoXML
  A new paradigm for XML approximate query answering that places users and their demands at the center of the design approach.
  [Figure: a query goes to the Cooperative XML Query Answering layer, which sits on top of the XML database engine and the XML documents and returns approximate answers.]

Roadmap
  Introduction
  Background
  CoXML
  Related Work
  Conclusion

XML Query Relaxation Types [1]
  Value relaxation: enlarging a value condition’s search scope.
  Node relabel: changing the label of a node to a similar or more general label, based on domain knowledge.
  [Figure: a query tree with nodes article, title ("search engine"), year (2003) and section ("spam detection"). Value relaxation widens the year condition from 2003 to 2000-2005; node relabel changes article to document.]
  [1] Tree Pattern Relaxation (S. Amer-Yahia, et al., 2000)

XML Query Relaxation Types (continued)
  Edge generalization: relaxing a ‘/’ edge to a ‘//’ edge.
  Node deletion: dropping a node from a query tree.
  [Figure: the example query tree with nodes article, title ("search engine"), year (2003) and section ("spam detection"), shown before and after relaxation.]
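The four relaxation types can be illustrated with a small sketch. This is not the CoXML implementation: the `QNode` class, the helper names, and the flat shape of the example tree (title, year and section as direct children of article) are assumptions made for illustration. In particular, reattaching a deleted node's children to its parent with ‘//’ edges is one plausible reading of node deletion, not something the slides state.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class QNode:
    """One node of a tree-pattern query (hypothetical representation)."""
    label: str
    value: Optional[str] = None          # content or value condition, if any
    edge: str = "/"                      # edge from the parent: "/" or "//"
    children: Tuple["QNode", ...] = ()


def map_child(root, label, fn):
    """Copy root, replacing each direct child named `label` by fn(child)."""
    new_children = []
    for child in root.children:
        new_children.extend(fn(child) if child.label == label else [child])
    return replace(root, children=tuple(new_children))


def relax_value(new_value):
    """Value relaxation: widen the value condition of a node."""
    return lambda n: [replace(n, value=new_value)]


def generalize_edge():
    """Edge generalization: turn the incoming '/' edge into '//'."""
    return lambda n: [replace(n, edge="//")]


def delete_node():
    """Node deletion: drop the node; its children (if any) move up with
    '//' edges (an assumption of this sketch)."""
    return lambda n: [replace(c, edge="//") for c in n.children]


def relabel(node, new_label):
    """Node relabel: give the node a similar or more general label."""
    return replace(node, label=new_label)


query = QNode("article", children=(
    QNode("title", "search engine"),
    QNode("year", "2003"),
    QNode("section", "spam detection"),
))

widened = map_child(query, "year", relax_value("2000-2005"))
renamed = relabel(query, "document")
loosened = map_child(query, "section", generalize_edge())
dropped = map_child(query, "section", delete_node())
```

Because the nodes are immutable, every relaxed variant is derived from the same untouched `query`, which is convenient when many relaxations of one query have to be compared.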

XML Relaxation Properties
  Definition
    Relaxation operation: an application of a relaxation type to a specific query node or edge.
  Lemma
    Given a query tree with n applicable relaxation operations, there are potentially up to 2^n relaxed trees.
  Possible combinations: each operation is either applied or not, so the number of operation subsets is C(n,0) + C(n,1) + ... + C(n,n) = 2^n.
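The 2^n bound in the lemma is simply the number of subsets of the n operations. A minimal sketch that enumerates them follows; the operation strings borrow the gen/del notation used later in the talk, and the function name is hypothetical.

```python
from itertools import combinations


def relaxation_sets(operations):
    """Yield every subset of the given relaxation operations,
    smallest subsets (least relaxation) first."""
    ops = list(operations)
    for k in range(len(ops) + 1):
        for combo in combinations(ops, k):
            yield frozenset(combo)


ops = ["gen(e$1,$2)", "gen(e$3,$4)", "del($2)"]
all_sets = list(relaxation_sets(ops))
print(len(all_sets))  # 2**3 subsets, counting the empty set (the unrelaxed tree)
```

The bound is "up to" 2^n because some subsets may be invalid or equivalent in practice, for example generalizing an edge whose endpoint has been deleted.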

XML Query Relaxation Challenges
  Query relaxation is often user-specific.
    Different users may have different approximate matching specifications for a given query tree.
    How to provide user-specific approximate query answering?
  A query with n relaxation operations has potentially up to 2^n relaxed queries.
    How to systematically relax a query?
  Query relaxation generates a set of approximate answers.
    How to effectively rank the returned approximate answers?

CoXML System Overview
  [Figure: architecture diagram. CoXML consists of a Relaxation Index Builder, a Relaxation Index, a Relaxation Engine and a Ranking Module, layered over an XML database engine and the XML documents. Labeled flows: query, relaxation language, relaxed query, exact answers, results, ranked results.]

Roadmap
  Introduction
  Background
  CoXML
    Relaxation Language
    Relaxation Index Structure
    Ranking of Approximate Answers
    Experimental Studies
  Related Work
  Conclusion

Relaxation Language
  A relaxation-enabled query is a tuple {T, R, C, S}
    T: tree-pattern query
    R: relaxation constructs
      E.g., delete or relabel a node, generalize an edge
    C: relaxation controls
      E.g., prefer or reject certain relaxation operations, use certain relaxation types, control the relaxation order, etc.
    S: stop condition
      E.g., the minimum number of approximate answers to be returned

Relaxation Language Example
  <inex_topic topic_id="267">
    <castitle> //article//fm//atl[about(., "digital libraries")] </castitle>
    <description> Articles containing "digital libraries" in their title. </description>
    <narrative> I'm interested in articles discussing Digital Libraries as their main subject. Therefore I require that the title of any relevant article mentions "digital library" explicitly. Documents that mention digital libraries only under the bibliography are not relevant, as well as documents that do not have the phrase "digital library" in their title. </narrative>
  </inex_topic>
  [Figure: query tree article ($1), fm ($2), atl ($3) with the content condition “digital libraries” on atl.]
  C = !Rel($3, -) ∧ !Del($3) ∧ Reject($2, bb)
    !Rel($3, -): $3 cannot be relabeled
    !Del($3): $3 cannot be deleted
    Reject($2, bb): $2 cannot be relabeled to bb
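One way to picture how the controls C prune the relaxation space is as a filter over candidate operations. The sketch below is an assumption-laden illustration, not the CoXML language processor: the `Op` and `Controls` classes are hypothetical, and the candidate labels `bdy` and `st` are made-up relabel targets chosen only to have something to filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A candidate relaxation operation (hypothetical representation)."""
    kind: str          # "rel" = relabel, "del" = delete, "gen" = generalize edge
    node: str          # query node variable, e.g. "$3"
    target: str = ""   # new label for a relabel, empty otherwise


@dataclass(frozen=True)
class Controls:
    """Relaxation controls in the spirit of C on the slide."""
    no_relabel: frozenset = frozenset()   # nodes covered by !Rel($x, -)
    no_delete: frozenset = frozenset()    # nodes covered by !Del($x)
    rejected: frozenset = frozenset()     # (node, label) pairs from Reject($x, label)

    def allows(self, op):
        if op.kind == "rel":
            return (op.node not in self.no_relabel
                    and (op.node, op.target) not in self.rejected)
        if op.kind == "del":
            return op.node not in self.no_delete
        return True  # edge generalization is unrestricted in this example


# C = !Rel($3, -) AND !Del($3) AND Reject($2, bb) for topic 267
controls = Controls(
    no_relabel=frozenset({"$3"}),
    no_delete=frozenset({"$3"}),
    rejected=frozenset({("$2", "bb")}),
)

candidates = [
    Op("rel", "$2", "bb"),   # rejected explicitly
    Op("rel", "$2", "bdy"),  # hypothetical label, allowed
    Op("del", "$2"),         # allowed
    Op("del", "$3"),         # forbidden by !Del($3)
    Op("rel", "$3", "st"),   # forbidden by !Rel($3, -)
    Op("gen", "$2"),         # allowed
]
allowed = [op for op in candidates if controls.allows(op)]
```

Filtering before enumeration keeps the title node atl ($3) intact, which matches the narrative's requirement that the phrase appear in the article title.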

How to Relax Queries?
  Naïve approach
    Generate all possible relaxed queries and iteratively select the best relaxed query to derive approximate answers.
    Exhaustive, but not scalable.
  Observation
    Many queries share the same (or similar) tree structures.
  Our approach: relaxation index structure
    Consider the structure of a query tree T as a template.
    Build indexes on the relaxed trees of T.
    Use the index to guide the relaxation of any query whose tree structure is the same as (or similar to) that of T.

Relaxation Index Structure - XTAH
  A hierarchical, multi-level, labeled cluster of relaxed trees for a given query tree.
  Building an XTAH
    Given a query structure template T, generate all possible relaxed trees.
      Each relaxed tree uses a unique set of relaxation operations.
    Cluster the relaxed trees into groups based on relaxation operations and distances (similar to “suffix-tree” clustering).

XTAH Example for Template Structure T
  [Figure: template structure T with nodes article ($1), title ($2), body ($3) and section ($4). The XTAH root "relax" branches into node_relabel, edge_generalization and node_deletion. Groups shown include {gen(e$1,$2)}, {gen(e$3,$4)}, {gen(e$1,$2), gen(e$3,$4)}, {gen(e$3,$4), gen(e$1,$3)}, {del($2)} and {del($2), del($3)}, with relaxed trees T1, T2, T3, T4, T6 and T7 at the leaves.]
  gen(e$u,$v): relaxing the edge between $u and $v
  del($u): deleting the node $u

XTAH Properties
  Each group consists of a set of relaxed trees derived from similar relaxation operations.
    The relaxed trees can be located efficiently based on the type of relaxation operation.
  A higher-level group in the XTAH yields less relaxation than a lower-level group.
    A query can be relaxed to different levels of granularity by traversing up and down the XTAH.
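The grouping idea behind the XTAH can be approximated with a much simpler index: bucket every set of relaxation operations by the relaxation types it uses, and keep each bucket ordered from least to most relaxed. This is a deliberately simplified stand-in under stated assumptions. It is a flat, two-level grouping rather than the authors' multi-level clustering, the function names are hypothetical, and it does not filter out combinations that conflict with each other.

```python
from collections import defaultdict
from itertools import combinations


def op_type(op):
    """'gen(e$1,$2)' -> 'gen', 'del($2)' -> 'del'."""
    return op.split("(", 1)[0]


def build_index(ops):
    """Group every non-empty set of operations by the relaxation types used.

    Sets are generated in order of size, so each group lists its least
    relaxed members first.
    """
    index = defaultdict(list)
    for k in range(1, len(ops) + 1):
        for combo in combinations(ops, k):
            key = frozenset(op_type(o) for o in combo)
            index[key].append(combo)
    return index


ops = ["gen(e$1,$2)", "gen(e$3,$4)", "del($2)", "del($3)"]
index = build_index(ops)

# Look up relaxations that only generalize edges, least relaxed first.
edge_only = index[frozenset({"gen"})]
```

A lookup by relaxation type then mirrors the first XTAH property (locating relaxed trees by operation type), and walking a group from front to back mirrors moving from less to more relaxation.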

Ranking of XML Approximate Answers
  Content similarity: cont_sim(A, Q)
    An extended vector space model [2]
  Structure distance: struct_dist(A, Q)
    Uses tree editing distance to measure structure similarity.
    Proposes a cost model that assigns operation costs based on relaxation semantics.
  Overall relevancy: sim(A, Q)
    A ranking model combining both content similarity and structure distance.
    α is a small constant between 0 and 1.
  [2] Configurable Indexing and Ranking for XML Information Retrieval (S. Liu, et al., 2004)
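The transcript does not preserve the formula for sim(A, Q). The sketch below assumes one plausible form, in which the content score is discounted exponentially by the structure distance, sim = α^struct_dist × cont_sim. That form, the function names, and the sample scores are all assumptions for illustration and should not be read as the authors' exact model.

```python
def struct_dist(operations, cost):
    """Structure distance as the summed cost of the relaxation operations
    that turn the query into the answer's structure.
    `cost` maps an operation to a non-negative number."""
    return sum(cost(op) for op in operations)


def sim(cont_sim, dist, alpha):
    """Assumed combination: discount content similarity by alpha ** dist."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha is expected to lie strictly between 0 and 1")
    return (alpha ** dist) * cont_sim


def uniform_cost(op):
    """Uniform cost model: every relaxation operation costs 1."""
    return 1.0


# (answer id, content similarity, relaxation operations used), made-up values
answers = [
    ("A1", 0.9, ["del($2)", "gen(e$3,$4)"]),
    ("A2", 0.6, []),
    ("A3", 0.7, ["gen(e$1,$2)"]),
]

alpha = 0.5
ranked = sorted(
    answers,
    key=lambda a: sim(a[1], struct_dist(a[2], uniform_cost), alpha),
    reverse=True,
)
ranking = [a[0] for a in ranked]
```

Under this assumed form, an exact structural match with moderate content similarity (A2) outranks a heavily relaxed answer with higher content similarity (A1). A semantic cost model would replace `uniform_cost` with a function that charges less for operations between similar nodes.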

Experimental Studies
  Experiment Setup
    INEX (INitiative for the Evaluation of XML retrieval) 05 test collection
      Document collection
      Query set
      Gold standard
  Evaluation Metrics
    nxCG (normalized extended cumulative gain), the official evaluation metric used in INEX 05.
    Given a number i (i ≥ 1), nxCG@i, similar to precision@i, measures the relative gain users have accumulated up to rank i.
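As a rough guide to what nxCG@i measures, the sketch below follows the general shape of a normalized cumulative gain: the gain accumulated by the system ranking up to rank i, divided by the gain an ideal ranking would accumulate over the same i ranks. It is a simplification. The official INEX 05 metric also deals with quantization of assessments and overlapping elements, which are ignored here, and the gain values are made up.

```python
def nxcg_at(run_gains, ideal_gains, i):
    """Normalized cumulative gain at rank i (simplified).

    run_gains:   gain of the result at each rank of the system's ranking
    ideal_gains: gains of all relevant items, in any order
    """
    if i < 1:
        raise ValueError("i must be at least 1")
    accumulated = sum(run_gains[:i])
    ideal = sum(sorted(ideal_gains, reverse=True)[:i])
    return accumulated / ideal if ideal > 0 else 0.0


run = [1.0, 0.0, 0.5]            # hypothetical gains of a ranked result list
recall_base = [1.0, 1.0, 0.5, 0.5]  # hypothetical gains of all relevant items

scores = [nxcg_at(run, recall_base, i) for i in (1, 2, 3)]
```

A score of 1.0 at rank i means the first i results are as good as any ranking could achieve, which is how the "perfect accuracy" result reported for topic 267 can be read.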

Retrieval performance improvements with semantic cost model
  Query set: all content-and-structure queries in INEX 05
  nxCG@10 by α and cost model:

    Cost model   α = 0.1           α = 0.3           α = 0.5           α = 0.7        α = 0.9
    Uniform      0.2584            0.2616            0.2828            0.2894         0.2916
    Semantic     0.3319 (+28.44%)  0.3190 (+21.94%)  0.3196 (+13.04%)  0.3068 (+6%)   0.2957 (+4.08%)

  Assigning relaxation operations different costs based on the similarity of the nodes being operated on improves retrieval performance.
  nxCG@25 and nxCG@50 yield similar results.

Evaluation of Relaxation Control
  Query: topic 267
  [Figure: query tree article ($1), fm ($2), atl ($3) with the content condition “digital libraries” on atl.]
  C = !Rel($3, -) ∧ !Del($3) ∧ Reject($2, bb)
  Result:

    Method                    nxCG@10                  nxCG@25
    No relaxation control     0.1013                   0.2365
    With relaxation control   1.0 (perfect accuracy)   0.8986

  Relaxation control enables the system to provide answers with greater relevancy.

Related Work
  Relaxation based on schema conversions ([LC01, LMC01], [LMC03])
    Without structure relaxation
  Native XML relaxation
    Proposed structure relaxation types [e.g., KS01, ACS02]
      We use the relaxation types of [ACS02] in our work.
    Investigated efficient algorithms for deriving top-K answers based on relaxation types [e.g., Sch02, ACS02, ALP04, AKM05]
    Without relaxation control

Conclusion
  Cooperative XML (CoXML) query answering
    The relaxation-enabled query language allows users to express relaxed query conditions effectively and to control the relaxation process.
    XTAH provides systematic query relaxation guidance.
    Both content and structure similarity metrics are used to evaluate the relevancy of approximate answers.
    Evaluation studies with the INEX test collections validate the effectiveness of our methodology.