CoXML: A Cooperative XML Query Answering System

Shaorong Liu and Wesley W. Chu APWeb/WAIM 2007 12/6/2019

Motivation
XML has become the standard format for information representation and data exchange.
XML schemas are usually very complex.
  E.g., the XML schema for the IEEE Computer Society publications contains about 170 distinct tags and more than 1000 distinct paths.
It is often unrealistic for users to fully understand a schema before asking queries.
Exact query answering is inadequate; approximate query answering is more desirable.

Our Contribution: CoXML
[Figure labels: Query, Approximate Answers, Cooperative XML Query Answering, XML Database Engine, XML Documents]
A new paradigm for XML approximate query answering that places users and their demands at the center of the design approach.

Roadmap
Introduction
Background
CoXML
Related Work
Conclusion

XML Query Relaxation Types
Value relaxation: enlarging a value condition's search scope.
Node relabel: changing the label of a node to a similar or a more general label, based on domain knowledge [1].
[Figure: query tree with nodes article, title ("search engine"), year (2003) and section ("spam detection"); in the relabeled tree, article becomes document.]
[1] Tree Pattern Relaxation (S. Amer-Yahia, et al., 2000)

XML Query Relaxation Types
Edge generalization: relaxing a ‘/’ edge to a ‘//’ edge.
Node deletion: dropping a node from a query tree.
[Figure: two copies of the query tree with nodes article, title ("search engine"), year (2003) and section ("spam detection"), one per relaxation type.]
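The structural relaxation types can be tried out on a small tree-pattern model. This is a minimal sketch and not the CoXML implementation: the `PatternNode` class, the helper names, the `about(...)` rendering of every condition, and the exact shape of the example tree are assumptions made for illustration. In particular, re-attaching the children of a deleted node through ‘//’ edges is one plausible reading of node deletion.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatternNode:
    """One node of a tree-pattern query (hypothetical model, not CoXML's)."""
    label: str
    axis: str = "/"                 # edge from the parent: "/" or "//"
    keyword: Optional[str] = None   # optional content condition on the node
    children: List["PatternNode"] = field(default_factory=list)


def to_path(node: PatternNode) -> str:
    """Render the pattern as an XPath-like string for inspection."""
    text = node.axis + node.label
    if node.keyword is not None:
        text += f'[about(., "{node.keyword}")]'
    for child in node.children:
        text += "[." + to_path(child) + "]"
    return text


def relabel(node: PatternNode, new_label: str) -> None:
    """Node relabel: replace the label with a similar or more general one."""
    node.label = new_label


def generalize_edge(node: PatternNode) -> None:
    """Edge generalization: relax the '/' edge above `node` to '//'."""
    node.axis = "//"


def delete_node(parent: PatternNode, child: PatternNode) -> None:
    """Node deletion: drop `child` from the tree.

    Assumption: children of the deleted node are re-attached to `parent`
    through '//' edges so they can still match as descendants.
    """
    index = parent.children.index(child)
    for grandchild in child.children:
        grandchild.axis = "//"
    parent.children[index:index + 1] = child.children


def example_query() -> PatternNode:
    """The query tree from the slide figure; its exact shape is assumed."""
    return PatternNode("article", "//", children=[
        PatternNode("title", keyword="search engine"),
        PatternNode("year", keyword="2003"),
        PatternNode("section", keyword="spam detection"),
    ])


query = example_query()
generalize_edge(query.children[2])      # article/section -> article//section
delete_node(query, query.children[1])   # drop the year condition
relabel(query, "document")              # article -> document
print(to_path(query))
```

Each helper corresponds to one relaxation type, so applying one helper to one node or edge is one relaxation operation in the sense defined on the next slide.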

XML Relaxation Properties
Definition
  Relaxation operation: an application of a relaxation type to a specific query node or edge.
Lemma
  Given a query tree with n applicable relaxation operations, there are potentially up to 2^n relaxed trees.
  Possible combinations: each of the n operations is either applied or not.
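The exponential bound in the lemma is easy to check by enumeration. The sketch below treats relaxation operations as opaque strings (the operation names are illustrative) and lists every non-empty subset; it does not prune combinations that would be inconsistent in practice, which is one reason the lemma says "up to".

```python
from itertools import combinations


def relaxation_combinations(operations):
    """Yield every non-empty subset of the applicable relaxation operations."""
    for size in range(1, len(operations) + 1):
        for subset in combinations(operations, size):
            yield frozenset(subset)


# Three illustrative operations on a small query tree.
ops = ["gen(e$1,$2)", "gen(e$3,$4)", "del($2)"]
subsets = list(relaxation_combinations(ops))

# Every subset of the 3 operations except the empty one (the original query).
print(len(subsets))  # 7, i.e. 2**3 - 1
```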

XML Query Relaxation Challenges
Query relaxation is often user-specific.
  Different users may have different approximate matching specifications for a given query tree.
  How to provide user-specific approximate query answering?
A query with n relaxation operations has potentially up to 2^n relaxed queries.
  How to systematically relax a query?
Query relaxation generates a set of approximate answers.
  How to effectively rank the returned approximate answers?

CoXML System Overview
[Figure: CoXML architecture. Components: Relaxation Engine, Ranking Module, Relaxation Index, Relaxation Index Builder, on top of an XML Database Engine and XML Documents. Edge labels: relaxation language, query, relaxed query, results, exact answers, ranked results.]

Roadmap
Introduction
Background
CoXML
  Relaxation Language
  Relaxation Index Structure
  Ranking of Approximate Answers
  Experimental Studies
Related Work
Conclusion

Relaxation Language
A relaxation-enabled query is a tuple {T, R, C, S}:
T: tree-pattern query
R: relaxation constructs
  E.g., delete/re-label a node, generalize an edge
C: relaxation controls
  E.g., prefer/reject certain relaxation operations, use certain relaxation types, control relaxation orders, etc.
S: stop condition
  E.g., the minimum number of approximate answers to be returned
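One way to picture the four components is as a plain record. The sketch below is an assumption-laden illustration, not the CoXML query language itself: the class name, the field names, the string encodings of constructs and controls, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RelaxationQuery:
    """Hypothetical container for a relaxation-enabled query {T, R, C, S}."""
    tree: str                                             # T: tree-pattern query
    constructs: List[str] = field(default_factory=list)   # R: relaxations the user asks for
    controls: List[str] = field(default_factory=list)     # C: limits on how to relax
    min_answers: Optional[int] = None                     # S: stop condition

    def should_stop(self, answers_found: int) -> bool:
        """Stop relaxing once the requested number of answers is reached."""
        if self.min_answers is None:
            return False
        return answers_found >= self.min_answers


query = RelaxationQuery(
    tree="//article/title",
    constructs=["gen(e$1,$2)"],   # allow the article/title edge to become '//'
    controls=["!Del($2)"],        # but never delete the title node
    min_answers=10,
)
print(query.should_stop(4), query.should_stop(10))  # False True
```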

Relaxation Language Example
<inex_topic topic_id="267">
  <castitle>//article//fm//atl[about(., "digital libraries")]</castitle>
  <description>Articles containing "digital libraries" in their title.</description>
  <narrative>I'm interested in articles discussing Digital Libraries as their main subject. Therefore I require that the title of any relevant article mentions "digital library" explicitly. Documents that mention digital libraries only under the bibliography are not relevant, as well as documents that do not have the phrase "digital library" in their title.</narrative>
</inex_topic>
Query tree: article ($1), fm ($2), atl ($3) with keyword “digital libraries”
C = !Rel($3, -) ∧ !Del($3) ∧ Reject($2, bb)
  !Rel($3, -): $3 cannot be re-labeled
  !Del($3): $3 cannot be deleted
  Reject($2, bb): $2 cannot be re-labeled to bb
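The three controls for topic 267 can be read as a filter over candidate relaxation operations. The following sketch assumes a simple tuple encoding of operations, `("rel", node, new_label)`, `("del", node)` and `("gen", parent, child)`, which is not part of the slides. The candidate list, including the label `bdy`, is made up for illustration.

```python
# Controls of topic 267, encoded as plain sets (hypothetical encoding).
NO_RELABEL = {"$3"}              # !Rel($3, -): $3 cannot be re-labeled
NO_DELETE = {"$3"}               # !Del($3): $3 cannot be deleted
REJECTED_LABELS = {("$2", "bb")} # Reject($2, bb): $2 cannot become bb


def is_allowed(operation):
    """Return True if the operation respects all three controls."""
    kind = operation[0]
    if kind == "rel":
        _, node, new_label = operation
        return node not in NO_RELABEL and (node, new_label) not in REJECTED_LABELS
    if kind == "del":
        return operation[1] not in NO_DELETE
    return True  # edge generalization is not restricted by these controls


candidates = [
    ("rel", "$2", "bb"),    # rejected explicitly
    ("rel", "$2", "bdy"),   # allowed: a different target label
    ("rel", "$3", "st"),    # blocked: $3 must keep its label
    ("del", "$3"),          # blocked: $3 must stay in the query
    ("del", "$2"),          # allowed
    ("gen", "$1", "$2"),    # allowed
]
allowed = [op for op in candidates if is_allowed(op)]
print(allowed)
# [('rel', '$2', 'bdy'), ('del', '$2'), ('gen', '$1', '$2')]
```

Filtering before relaxation, rather than after retrieval, keeps the system from ever issuing a relaxed query the user has ruled out.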

How to Relax Queries?
Naïve approach
  Generate all possible relaxed queries and iteratively select the best relaxed query to derive approximate answers.
  Exhaustive, but not scalable.
Observation
  Many queries share the same (or similar) tree structures.
Our approach: relaxation index structure
  Consider the structure of a query tree T as a template.
  Build indexes on the relaxed trees of T.
  Use the index to guide the relaxations of any query with the same (or similar) tree structure as that of T.

Relaxation Index Structure - XTAH
A hierarchical, multi-level, labeled cluster of relaxed trees for a given query tree.
Building an XTAH
  Given a query structure template T, generate all possible relaxed trees.
    Each relaxed tree uses a unique set of relaxation operations.
  Cluster the relaxed trees into groups based on relaxation operations and distances, similar to “suffix-tree” clustering.
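The grouping idea can be approximated with a much simpler structure than a real XTAH. The sketch below is a two-level stand-in under stated assumptions: it identifies each relaxed tree with its set of operations, groups those sets by the relaxation types they use, and ignores distances and the suffix-tree-style clustering mentioned above. It also does not prune inconsistent combinations. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def op_type(operation):
    """Relaxation type of an operation string, e.g. 'gen' for 'gen(e$1,$2)'."""
    return operation.split("(")[0]


def build_groups(operations):
    """Group every non-empty operation set by the relaxation types it uses."""
    groups = defaultdict(list)
    for size in range(1, len(operations) + 1):
        for subset in combinations(operations, size):
            key = tuple(sorted({op_type(op) for op in subset}))
            groups[key].append(subset)
    return dict(groups)


ops = ["gen(e$1,$2)", "gen(e$3,$4)", "del($2)"]
groups = build_groups(ops)
for key, members in sorted(groups.items()):
    print(key, len(members))
# ('del',) 1
# ('del', 'gen') 3
# ('gen',) 3
```

Looking up a group by relaxation type is what lets a relaxation engine jump straight to, say, all edge-generalization-only trees without scanning every relaxed tree.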

XTAH Example for Template Structure T
[Figure: XTAH for the template structure T = article ($1) with children title ($2) and body ($3), where body has the child section ($4). The root "relax" has the groups node_relabel, edge_generalization and node_deletion. Lower-level groups are labeled with operation sets such as {gen(e$1,$2)}, {gen(e$3, $4)}, {gen(e$1, $2), gen(e$3, $4)}, {gen(e$3, $4), gen(e$1,$3)}, {del($2)} and {del($2), del($3)}; the leaves are relaxed trees T1, T2, T3, T4, T6, T7, ...]
Notation:
  gen(e$u, $v): relaxing the edge between $u and $v
  del($u): deleting the node $u

XTAH Properties
Each group consists of a set of relaxed trees derived from similar relaxation operations.
  The relaxed trees can be located efficiently based on the type of relaxation operation.
A higher-level group in the XTAH yields less relaxation than a lower-level group.
  A query can be relaxed to different levels of granularity by traversing up and down the XTAH.

Ranking of XML Approximate Answers
Content similarity – cont_sim(A, Q)
  An extended vector space model [2]
Structure similarity – struct_dist(A, Q)
  Use tree editing distance for measuring structure similarity
  Propose a cost model that assigns operation cost based on relaxation semantics
Overall relevancy – sim(A, Q)
  A ranking model combining both content similarity and structure distance
  α is a small constant between 0 and 1
[2] Configurable Indexing and Ranking for XML Information Retrieval (S. Liu, et al., 2004)
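To see how a content score and a structure distance might interact, the sketch below assumes, purely for illustration, that the overall relevancy has the form `alpha ** struct_dist * cont_sim`, with `alpha` a constant between 0 and 1 and `struct_dist` the summed cost of the relaxation operations that produced the answer. This transcript does not preserve the slide's actual formula, and the operation costs and content scores below are invented.

```python
def struct_dist(operations, cost):
    """Structure distance as the total cost of the applied relaxation operations."""
    return sum(cost[op] for op in operations)


def overall_sim(cont_sim, dist, alpha=0.5):
    """Assumed combination: content score discounted exponentially by distance."""
    return (alpha ** dist) * cont_sim


# Hypothetical semantic costs: generalizing an edge is cheap, deleting a node is not.
cost = {"gen(e$1,$2)": 0.5, "del($2)": 2.0}

# (answer id, content similarity, operations needed to match it)
answers = [
    ("A", 0.6, []),                 # exact structural match
    ("B", 0.9, ["gen(e$1,$2)"]),    # better content, mild relaxation
    ("C", 0.9, ["del($2)"]),        # better content, heavy relaxation
]
ranked = sorted(
    answers,
    key=lambda a: overall_sim(a[1], struct_dist(a[2], cost)),
    reverse=True,
)
print([answer_id for answer_id, _, _ in ranked])  # ['B', 'A', 'C']
```

Under this assumed form, a smaller `alpha` penalizes structural relaxation more heavily, so the choice of operation costs matters more.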

Experimental Studies
Experiment Setup
  INEX (INitiative for the Evaluation of XML Retrieval) 05 test collection
    Document collection
    Query set
    Gold standard
Evaluation Metrics
  nxCG (normalized extended cumulative gain), the official evaluation metric used in INEX 05
  Given a rank i (i ≥ 1), nxCG at rank i measures the relative gain users have accumulated up to rank i
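As a rough illustration of the metric, nxCG at rank i divides the gain accumulated by a system's ranking up to rank i by the gain an ideal ranking would have accumulated at the same rank. The sketch below follows that definition with made-up gain values; the function name and its arguments are hypothetical, and INEX's actual gain quantization is not modeled.

```python
def nxcg(ranked_gains, all_gains, i):
    """nxCG at rank i: cumulative gain of the run over that of the ideal ranking.

    ranked_gains: gain of each returned answer, in the order returned
    all_gains:    gain of every relevant element in the gold standard
    """
    if i < 1:
        raise ValueError("rank i must be at least 1")
    achieved = sum(ranked_gains[:i])
    ideal = sum(sorted(all_gains, reverse=True)[:i])
    return achieved / ideal if ideal > 0 else 0.0


# Hypothetical run: a fully relevant answer, an irrelevant one, a partially relevant one.
run = [1.0, 0.0, 0.5]
gold = [1.0, 1.0, 0.5, 0.5]
print(nxcg(run, gold, 1))  # 1.0
print(nxcg(run, gold, 3))  # 0.6
```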

Retrieval performance improvements with semantic cost model
Query set: all content-and-structure queries in INEX 05
Results by (α, cost model):
α          0.1        0.3        0.5        0.7        0.9
Uniform    0.2584     0.2616     0.2828     0.2894     0.2916
Semantic   (+28.44%)  (+21.94%)  (+13.04%)  (+6%)      (+4.08%)
Assigning relaxation operations different costs based on the similarities of the nodes being operated on improves retrieval performance!
[…] and […] yield similar results.

Evaluation of Relaxation Control
Query: topic 267
Query tree: article ($1), fm ($2), atl ($3) with keyword “digital libraries”
C = !Rel($3, -) ∧ !Del($3) ∧ Reject($2, bb)
Result:
Method                     Evaluation metrics
No relaxation control      0.1013     0.2365
With relaxation control    1.0        0.8986
(1.0: perfect accuracy)
Relaxation control enables the system to provide answers with greater relevancy!

Related Work
Relaxation based on schema conversions ([LC01, LMC01], [LMC03])
  Without structure relaxation
Native XML relaxation
  Proposed structure relaxation types [e.g., KS01, ACS02]
    We used the relaxation types of [ACS02] in our work
  Investigated efficient algorithms for deriving top-K answers based on relaxation types [e.g., Sch02, ACS02, ALP04, AKM05]
    Without relaxation control

Conclusion
Cooperative XML (CoXML) query answering
  A relaxation-enabled query language allows users to effectively express relaxed query conditions and to control the relaxation process.
  XTAH provides systematic query relaxation guidance.
  Both content and structure similarity metrics are used to evaluate the relevancy of approximate answers.
  Evaluation studies with the INEX test collections validate the effectiveness of our methodology.
