
1 Fuzzy Querying of XML Documents: The Minimum Spanning Tree
Abdeslame ALILAOUAR, Florence SEDES
IRIT - CNRS
IRIT: Research Institute for Computer Science of Toulouse (France)

2 Talk Outline
- The XML model
- The problem of querying XML documents
- Proposed techniques
- Our approach
- Implementation details
- Conclusion and future tasks

3 The XML Data Model: Document-centric vs. Data-centric
Document-centric:
- Less regular or irregular structure
- The order of sibling elements is important
- Examples: emails, books, etc.
Data-centric:
- More structured
- The order of sibling elements is often unimportant
- Examples: sales orders, configuration files, etc.

4 The XML Data Model (continued)
- Data are commonly modeled by a tree structure
- Nodes represent objects
- Edges represent relationships between objects
- Atomic values are attached to leaf nodes
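As a small illustration of the tree model on this slide, the sketch below parses a made-up cottage fragment with Python's standard `xml.etree.ElementTree` and lists the parent-child edges and the values attached to leaf nodes. The XML fragment and the helper names `edges` and `leaf_values` are assumptions for illustration, not taken from the presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, loosely modeled on the cottage example used in this talk.
XML = """
<cotglist>
  <cottage>
    <identifier>40</identifier>
    <nbeds>4</nbeds>
    <price>1300</price>
  </cottage>
</cotglist>
"""

def edges(root):
    """Return one (parent_tag, child_tag) pair per edge of the tree."""
    return [(parent.tag, child.tag) for parent in root.iter() for child in parent]

def leaf_values(root):
    """Return (tag, text) for leaf nodes, where the atomic values live."""
    return [(n.tag, n.text.strip()) for n in root.iter() if len(n) == 0 and n.text]

root = ET.fromstring(XML.strip())
print(edges(root))
print(leaf_values(root))
```

Inner nodes (`cotglist`, `cottage`) carry only structure here, while the three leaves carry the values.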

5 The XML Data Model (continued): Variations in Structure
[Figure: a cotglist tree with two cottage subtrees that encode similar information with different structures. Node labels include identifier, character, nbeds, price, room, summer, and winter; values shown include "40", "23", 4, 2, 1100, 1300, and 1700.]

6 The Problem of Querying XML Documents
[Diagram: Query = Content + Structure; XML Document = Content + Structure (unknown, irregular); structure matching and content matching (R.I.) produce the result.]
Irregular structure: in most cases, the queries return an empty or incomplete set of answers.
- Data has structural variations: relationships between objects are represented differently in different parts of the documents.
- Data has ontology variations: different labels are used to describe objects of the same type (e.g. house, cottage).

7 The Problem of Querying XML Documents (continued)
Solution:
- Queries should deal with different data structures.
- Queries should not be rigid (structural) patterns.
- Queries should be handled flexibly, in order to find not only the answers that match exactly but also those with a similar structure and/or content.
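The contrast between a rigid pattern and a more flexible one can be sketched with the standard `xml.etree.ElementTree` module. The two-cottage document below is hypothetical: it only mimics the kind of structural variation described in these slides, with the second cottage nesting its price under a room element. Replacing the child step by a descendant step is just one simple form of relaxation, not the approach proposed in the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second cottage nests nbeds and price under room.
XML = """<cotglist>
  <cottage><identifier>40</identifier><nbeds>4</nbeds><price>1300</price></cottage>
  <cottage><identifier>23</identifier><room><nbeds>4</nbeds><price>1700</price></room></cottage>
</cotglist>"""

root = ET.fromstring(XML)

# Rigid pattern: price must be a direct child of cottage.
rigid = [c.findtext("identifier") for c in root.findall("cottage")
         if c.find("price") is not None]

# Relaxed pattern: price may appear anywhere below cottage.
relaxed = [c.findtext("identifier") for c in root.findall("cottage")
           if c.find(".//price") is not None]

print(rigid)    # ['40']
print(relaxed)  # ['40', '23']
```

The rigid pattern returns an incomplete answer set even though both cottages have a price.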

8 Proposed Techniques
- Query relaxation (S. Amer-Yahia, AT&T, 2002)
- Tree-edit distance (D. Shasha, K. Zhang, 1989)
- Correlation (A. Tversky, 1977)
- Data relaxation (Damiani & Tanca, 2000)

9 Our approach: The minimum spanning tree (MST)
- An optimization problem
- Input: a weighted graph
- Output: the cheapest subset of edges that keeps the graph in one connected component

10 Proposed algorithm
Prim's algorithm (1957): Compute a minimum spanning tree by beginning with any vertex as the current tree. At each step, add a least-weight edge between a vertex in the tree and a vertex not yet in the tree. Continue until all vertices have been added.
Kruskal's algorithm (1956): Maintain a set of partial minimum spanning trees, and repeatedly add the shortest edge in the graph whose endpoints are in different partial minimum spanning trees.
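Both algorithms on this slide can be sketched in a few lines of Python. This is a minimal textbook version on a small made-up graph, assuming vertices numbered 0..n-1 and edges given as `(weight, u, v)` tuples; the function names and the sample graph are illustrative and do not come from the presentation.

```python
import heapq

def kruskal(n, edges):
    """Kruskal: scan edges by increasing weight, joining partial trees."""
    parent = list(range(n))

    def find(x):
        # Representative of the partial tree containing x (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:             # endpoints lie in different partial trees
            parent[ru] = rv      # merge the two partial trees
            tree.append((w, u, v))
    return tree

def prim(n, edges, start=0):
    """Prim: grow one tree from start, always taking the lightest crossing edge."""
    adj = {i: [] for i in range(n)}
    for w, u, v in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    in_tree = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < n:
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue             # both endpoints already in the tree
        in_tree.add(v)
        tree.append((w, u, v))
        for edge in adj[v]:
            if edge[2] not in in_tree:
                heapq.heappush(heap, edge)
    return tree

# Made-up weighted graph on 4 vertices.
EDGES = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 1, 3), (3, 2, 3)]
k, p = kruskal(4, EDGES), prim(4, EDGES)
print(sum(w for w, _, _ in k), sum(w for w, _, _ in p))  # 6 6
```

On this graph both procedures select the edges of weight 1, 2, and 3, so the total cost agrees.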

11 Querying XML documents with MST
- Represent the queries by a weighted tree pattern.
- Replace the criteria by preferences with their importance levels. The importance level determines the priority between the preferences. The satisfaction degree of a preference must be at least equal to its importance level.
- Define a similarity function that is used to estimate the matching degree of the preferences.
- The answer subtrees are built gradually: evaluation starts at the leaf nodes and the most important preferences, then proceeds upward until the answer tree is constructed, in the manner of Kruskal's algorithm.
Example:
[Figure: a weighted tree pattern with root cottage and children nbeds (value 4) and price (value 1400); weights 0.8 and 0.6.]

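12 Example
[Figure: the weighted query pattern (cottage; nbeds 4; price 1400; weights 0.8 and 0.6) matched against the cotglist document tree, whose two cottage subtrees carry the identifiers "140" and "123" and contain the nodes character, nbeds, price, room, summer, and winter. Similarity annotations: Sim(price, price) = 1, Sim(1300, 1400) = 0.9, Sim(1300, 1700) = 0.7. Scores shown on the tree: Sim = 1, Sim = 1.0, Sim = 0.9, Sim = 0.7.]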
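The presentation does not give the formula behind its similarity values, so the following is only a sketch under stated assumptions. It uses a hypothetical linear similarity for numbers, with a scale constant of 1000 picked so that Sim(1300, 1400) comes out at 0.9 as in the example. The assignment of 0.8 to nbeds and 0.6 to price, the flattened cottage records, and the use of `min` to combine degrees are likewise assumptions.

```python
def sim_number(a, b, scale=1000.0):
    """Hypothetical numeric similarity in [0, 1]; scale is an assumed constant."""
    return max(0.0, 1.0 - abs(a - b) / scale)

def satisfied(degree, importance):
    """A preference holds when its satisfaction degree reaches its importance level."""
    return degree >= importance

# Query preferences: (label, wanted value, importance level).
# Which weight belongs to which label is an assumption.
QUERY = [("nbeds", 4, 0.8), ("price", 1400, 0.6)]

# Hypothetical, flattened cottage records.
COTTAGES = {"A": {"nbeds": 4, "price": 1300}, "B": {"nbeds": 4, "price": 1700}}

def evaluate(cottage):
    """Return (all preferences satisfied?, overall degree) for one cottage."""
    degrees = {label: sim_number(cottage[label], wanted) for label, wanted, _ in QUERY}
    ok = all(satisfied(degrees[label], imp) for label, _, imp in QUERY)
    # Combining degrees with min is an assumed (common fuzzy-logic) choice.
    return ok, round(min(degrees.values()), 2)

for name, cottage in COTTAGES.items():
    print(name, evaluate(cottage))
```

Under these assumptions both cottages satisfy the query, and cottage A ranks above cottage B because its price is closer to the wanted value.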

13 Some Implementation Details: The architecture of our querying system
[Diagram: components shown are the XML collection (XML documents), the index builder, the indexed collection (tag index, attribute index, data index, term index), and the query processor, which takes a query and returns an answer list.]
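The slide names the indexes but not their layout, so the sketch below is only one plausible reading: a toy index builder that maps tags and text terms to node identifiers. The function name `build_indexes`, the use of preorder rank as node identifier, and the sample document are all assumptions; the attribute and data indexes are left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_indexes(xml_text):
    """Build a toy tag index and term index over one XML document."""
    root = ET.fromstring(xml_text)
    tag_index = defaultdict(list)    # tag  -> node ids carrying that tag
    term_index = defaultdict(list)   # term -> node ids whose text contains it
    for node_id, node in enumerate(root.iter()):   # node_id = preorder rank
        tag_index[node.tag].append(node_id)
        for term in (node.text or "").split():
            term_index[term.lower()].append(node_id)
    return tag_index, term_index

# Hypothetical document.
DOC = ("<cotglist><cottage><nbeds>4</nbeds><price>1300</price></cottage>"
       "<cottage><price>1700</price></cottage></cotglist>")

tags, terms = build_indexes(DOC)
print(tags["price"])   # [3, 5]
print(terms["1300"])   # [3]
```

A query processor could then intersect such posting lists instead of scanning every document.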

14 Indexing method
- Goal: efficiently determine the ancestors and descendants of any node
- Dietz's method (1982): uses traversal order to determine the ancestor-descendant relationship
- Why Dietz's method: it is a straightforward method. For two given nodes x and y of a tree T, x is an ancestor of y iff x occurs before y in the preorder traversal and after y in the postorder traversal.
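The ancestor test stated on this slide can be sketched as follows. The tree is given as a plain dictionary from each node to its children; the node names and the helpers `number_tree` and `is_ancestor` are hypothetical.

```python
def number_tree(tree, root):
    """Assign preorder and postorder ranks to every node reachable from root."""
    pre, post = {}, {}

    def visit(node):
        pre[node] = len(pre)          # rank on first visit (preorder)
        for child in tree.get(node, []):
            visit(child)
        post[node] = len(post)        # rank once all descendants are done (postorder)

    visit(root)
    return pre, post

def is_ancestor(x, y, pre, post):
    """x is an ancestor of y iff x precedes y in preorder and follows y in postorder."""
    return pre[x] < pre[y] and post[x] > post[y]

# Hypothetical tree: node -> list of children.
TREE = {
    "cotglist": ["cottage1", "cottage2"],
    "cottage1": ["nbeds1", "price1"],
    "cottage2": ["room2"],
    "room2": ["price2"],
}

pre, post = number_tree(TREE, "cotglist")
print(is_ancestor("cotglist", "price2", pre, post))   # True
print(is_ancestor("cottage1", "price2", pre, post))   # False
```

Once the two ranks are stored with each node, the test needs only two integer comparisons and no tree traversal.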

15 Future work
- Experiments within INEX (Initiative for the Evaluation of XML Retrieval)
- Improving the similarity functions (using a thesaurus, etc.)
- Introducing qualitative preferences (cheapest, nearest, small, etc.)

16 Thank You
Questions?

