TOSS: An Extension of TAX with Ontologies and Similarity Queries Edward Hung Yu Deng V.S. Subrahmanian Presentation by: Valentina Bonsi Roberto Gamboni.


1 TOSS: An Extension of TAX with Ontologies and Similarity Queries. Edward Hung, Yu Deng, V.S. Subrahmanian. Presentation by: Valentina Bonsi, Roberto Gamboni, Giuseppe Vitalone. Speaker: Roberto Gamboni.

2 Outline: Abstract; TAX overview; Quality problems; TOSS architecture; TOSS algebra; Experiments; Conclusions & related works.

3 Abstract. Tree Algebra for XML (TAX): an algebra developed for XML databases; 100% precision but low recall; semantics not considered. TAX with Ontologies and Similarity Queries (TOSS): ontologies and similarity enhancement improve recall. Much higher quality!

4 Tree Algebra for XML. Semistructured instance: I = (V, E, t), where G = (V, E) is a set of rooted directed trees, V is a set of nodes and E ⊆ V × V is a set of edges; t assigns to each object o ∈ V a type for its tag and its content, e.g. o.tag of type string and o.content of type int. Pattern tree: P = (T, F), where T = (V, E) is a tree whose nodes are labeled with distinct integers and whose edges are labeled 'pc' (parent-child) or 'ad' (ancestor-descendant), and F is a selection condition applicable to the nodes of T.
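The difference between 'pc' and 'ad' edges can be sketched in Python, assuming a simple in-memory tree; `Node`, `is_pc` and `is_ad` are hypothetical names, not part of TAX. A 'pc' edge of the pattern tree requires a direct parent-child link in the instance, while an 'ad' edge is satisfied by any downward path.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)  # identity equality: distinct objects stay distinct even with equal tags
class Node:
    """An object o of a semistructured instance, with o.tag and o.content."""
    tag: str
    content: Any = None
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach child below self and return it, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child


def is_pc(u: Node, v: Node) -> bool:
    """'pc' edge: u must be the parent of v."""
    return v.parent is u


def is_ad(u: Node, v: Node) -> bool:
    """'ad' edge: u must be a proper ancestor of v."""
    p = v.parent
    while p is not None:
        if p is u:
            return True
        p = p.parent
    return False


# A tiny instance: cars -> car -> price
cars = Node("cars")
car = cars.add(Node("car"))
price = car.add(Node("price", 10000))

print(is_pc(car, price), is_pc(cars, price), is_ad(cars, price))  # True False True
```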

5 TAX selection example. Database DB1 contains three car elements:
car: carModel [Toyota/Yaris], price [10000], year [2002], km [30000], carDealer [RBV], fuelCons [10]
car: carModel [Vw/Polo], price [14000], year [2004], km [40000], carDealer [Pico], fuelCons [12]
car: carModel [Vw/Golf], price [20000], year [2005], km [10000], carDealer [RBV S.p.A.], fuelCons [13]
Pattern tree: node #1 with 'pc' children #2 and #3; selection condition #1.tag = car & #2.tag = price & #3.tag = carModel & #2.content < 15000.
Witness trees: car (price [10000], carModel [Toyota/Yaris]) and car (price [14000], carModel [Vw/Polo]).
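The selection above can be sketched in Python, assuming the instance is stored as nested tuples and the pattern tree is hard-wired to the slide's shape (root #1 with 'pc' children #2 and #3); `select_cars` is a hypothetical helper, not TAX's implementation. Each witness tree keeps only the nodes bound by the pattern.

```python
# DB1 from the slide: each car is (tag, list of (tag, content) children).
DB1 = [
    ("car", [("carModel", "Toyota/Yaris"), ("price", 10000), ("year", 2002),
             ("km", 30000), ("carDealer", "RBV"), ("fuelCons", 10)]),
    ("car", [("carModel", "Vw/Polo"), ("price", 14000), ("year", 2004),
             ("km", 40000), ("carDealer", "Pico"), ("fuelCons", 12)]),
    ("car", [("carModel", "Vw/Golf"), ("price", 20000), ("year", 2005),
             ("km", 10000), ("carDealer", "RBV S.p.A."), ("fuelCons", 13)]),
]


def select_cars(db, max_price):
    """Pattern: #1.tag=car & #2.tag=price & #3.tag=carModel & #2.content < max_price,
    with #2 and #3 'pc' children of #1. Returns one witness tree per embedding."""
    witnesses = []
    for tag1, children in db:              # candidate bindings for #1
        if tag1 != "car":
            continue
        for tag2, c2 in children:          # candidate bindings for #2
            if tag2 != "price" or c2 >= max_price:
                continue
            for tag3, c3 in children:      # candidate bindings for #3
                if tag3 == "carModel":
                    witnesses.append(("car", [("price", c2), ("carModel", c3)]))
    return witnesses


for w in select_cars(DB1, 15000):
    print(w)
```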

6 TAX similarity problems. Database biblio contains two book elements:
book: title [Operating Systems], price [45.50], author [W. Stallings], publisher [MacMillan], year [1992], ISBN [002945671]
book: title [Cryptography], price [42.50], author [William Stallings], publisher [Prentice Hall], year [2003], ISBN [003456783]
Pattern tree: node #1 with 'pc' children #2 and #3; condition #1.tag = book & #2.tag = title & #3.tag = author & #3.content = “W. Stallings”.
Low recall! W. Stallings and William Stallings are probably the same person, but TAX has no notion of similarity between terms, so only the first book is returned. Solution: extend TAX with a similarity measure d_s, e.g. d_s(W. Stallings, William Stallings) = 0.1 (very similar) and d_s(W. Stallings, Shakespeare) = 5 (much less similar).
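The slides do not say which measure d_s is. As a stand-in, the sketch below uses plain Levenshtein edit distance (an assumption; its scale differs from the 0.1 and 5 quoted above). It still ranks the two spellings of Stallings as much closer to each other than to Shakespeare.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


print(levenshtein("W. Stallings", "William Stallings"))  # 6
print(levenshtein("W. Stallings", "Shakespeare"))
```

A real similarity enhancer would more likely normalize names (initials, case) before comparing; raw edit distance is only the simplest illustration.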

7 TAX multi-DB example.
DB1 (cars) contains three car elements:
car: carModel [Toyota/Yaris], price [10000], year [2002], km [30000], carDealer [RBV], fuelCons [10]
car: carModel [Vw/Polo], price [14000], year [2004], km [40000], carDealer [Pico], fuelCons [12]
car: carModel [Vw/Golf], price [20000], year [2005], km [10000], carDealer [RBV], fuelCons [13]
DB2 (automobiles) contains three car elements and a vendor element:
car: make [Volkswagen], model [Fox], year [2005], miles [30000], cost [5000], fuelCons [15]
car: make [AstonMartin], model [Vanquish], year [2004], miles [10000], cost [70000], fuelCons [6]
car: make [Ferrari], model [360], year [2002], miles [15000], cost [80000], fuelCons [6]
vendor: dealerName [RVB], location [Bologna], feedback [5]

8 TAX problems with multiple DBs. Different tags can refer to the same thing, and the same content can be stored differently. Tags like km and miles, or price and cost, may contain values expressed in different units (e.g. EUR or USD).

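9 Inter-term lexical relationships. Ontology (isa edges): Google isa Web Search Company; Web Search Company isa Computer Company; Computer Company isa Company.
Database authors:
author: firstName [Marco], lastName [Pivi], company [Google]
author: firstName [Samuele], lastName [Salti], company [Eclipse Found.]
Query: “Return all authors of papers written by someone in a Web Search Company”, with pattern tree #1 having 'pc' children #2 and #3 and condition #1.tag = author & #2.tag = lastName & #3.tag = company & #3.content = “Web Search Company”.
Google’s authors are never returned: TAX compares the content “Google” literally with “Web Search Company” and ignores that Google is a Web Search Company.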

10 TOSS: architecture bird's-eye view. Diagram components: XML files; Ontology Maker (using WordNet and user-specified rules); Fusion of Ontologies; Similarity Enhancer (parameters: threshold ε and similarity measure), which produces the SEO; Query Executor on top of the Xindice system, which takes user queries and returns results. Goal: extend and enhance TAX to return high-quality answers using ontologies and similarity measures.

11 Ontology maker. XML DB animals:
elephant: name [Fuffi], race [African], age [50]
dog: name [Fido], race [Collie], age [4]
black widow: name [Pito], race [Mactans], age [7]
Derived ontology (isa): elephant isa proboscidean isa mammal; dog isa canine isa carnivore isa mammal; black widow isa spider isa arachnid.

12 Ontology integration. Ontology of DB1: cars, car, carModel, price, year, km, carDealer, fuelCons. Ontology of DB2: automobiles, car, make, model, year, miles, cost, fuelCons, vendor, dealerName, location, feedback. The two ontologies are integrated through interoperation constraints specified by the user.

13 Fusion of ontologies. Terms of the fused ontology: cars, car, carModel, price, year, km, fuelCons, automobiles, vendor, dealerName, location, feedback, miles, cost. make:2 and model:2 (make and model from the second ontology) are both mapped into carModel. miles and cost are not grouped with km and price: since different units might be used in the instances, the administrator has to define a conversion function to compare these values.
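A sketch of how a query layer might apply these rules, under stated assumptions: the renaming of make:2 and model:2 into carModel follows the slide, but joining them as "make/model" and the helpers `to_km`, `fuse_db2_car` and `distance_km` are hypothetical. km and miles stay separate terms and are reconciled only at comparison time. A price/cost conversion would work the same way, but the slides give no exchange rate, so it is left out.

```python
MILES_TO_KM = 1.609344  # one international mile is exactly 1.609344 km


def to_km(value, unit):
    """Hypothetical administrator-defined conversion function for distances."""
    if unit == "km":
        return float(value)
    if unit == "miles":
        return value * MILES_TO_KM
    raise ValueError(f"unknown unit: {unit}")


def fuse_db2_car(car):
    """Rename DB2 terms into the fused vocabulary: make:2 and model:2 -> carModel.
    Joining them as 'make/model' is an assumption that mimics DB1 values."""
    fused = {k: v for k, v in car.items() if k not in ("make", "model")}
    fused["carModel"] = f"{car['make']}/{car['model']}"
    return fused


def distance_km(rec):
    """km and miles stay separate terms; units are reconciled only when comparing."""
    if "km" in rec:
        return to_km(rec["km"], "km")
    return to_km(rec["miles"], "miles")


# Subsets of the slide's DB1 and DB2 (other fields omitted).
DB1 = [{"carModel": "Vw/Polo", "year": 2004, "km": 40000},
       {"carModel": "Vw/Golf", "year": 2005, "km": 10000}]
DB2 = [{"make": "Volkswagen", "model": "Fox", "year": 2005, "miles": 30000},
       {"make": "AstonMartin", "model": "Vanquish", "year": 2004, "miles": 10000}]

fused = DB1 + [fuse_db2_car(c) for c in DB2]
low = [c["carModel"] for c in fused if distance_km(c) < 20000]
print(low)  # ['Vw/Golf', 'AstonMartin/Vanquish']
```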

14 TOSS: architecture bird's-eye view (same diagram as slide 10).

15 Similarity Enhancer. Example terms: airports LAX – CA (Los Angeles), LB – CA (Long Beach), London City Airport, London BAA Heathrow, London Gatwick, Roma Fiumicino; airlines British Airways, American Airlines, Delta Airlines, Alitalia, United Airlines.
Threshold = 2; d(LAX, LB) = 1.5; d(London City, London Heathrow) = 1; d(London City, London Gatwick) = 1.3; d(London Gatwick, London Heathrow) = 1.6; d(London City, Roma Fiumicino) = 3.5; d(Roma Fiumicino, LAX) = 9.
The similarity enhanced ontology: 1. preserves the original partial order; 2. all strings mapped into the same node are similar to each other; 3. two strings are similar iff they are mapped into the same node; 4. contains no redundant nodes (no node is a subset of another).
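One way to obtain nodes satisfying properties 2 and 4 is to take the maximal sets of mutually similar strings, i.e. the maximal cliques of the graph joining strings with d < threshold. The sketch below does this with the Bron–Kerbosch algorithm on the airport distances listed on this slide. Treating unlisted pairs as dissimilar, using a strict <, and choosing Bron–Kerbosch are all assumptions, not necessarily what the TOSS Similarity Enhancer does; property 1 (the partial order) is not modeled.

```python
THRESHOLD = 2
# Distances given on the slide; unlisted pairs are assumed to be dissimilar.
D = {
    ("LAX", "LB"): 1.5,
    ("London City", "London Heathrow"): 1,
    ("London City", "London Gatwick"): 1.3,
    ("London Gatwick", "London Heathrow"): 1.6,
    ("London City", "Roma Fiumicino"): 3.5,
    ("Roma Fiumicino", "LAX"): 9,
}
TERMS = sorted({t for pair in D for t in pair})


def similar(a, b):
    d = D.get((a, b), D.get((b, a)))
    return d is not None and d < THRESHOLD


def maximal_cliques(terms):
    """Bron-Kerbosch without pivoting: every maximal set of pairwise-similar terms."""
    nbrs = {t: {u for u in terms if u != t and similar(t, u)} for t in terms}
    out = []

    def expand(r, p, x):
        if not p and not x:          # r cannot be extended: it is maximal
            out.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(terms), set())
    return out


nodes = maximal_cliques(TERMS)
for node in sorted(nodes, key=sorted):
    print(sorted(node))
```

Note that LAX and LB end up in one node and the three London airports in another, while Roma Fiumicino stays alone.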

16 TOSS: architecture bird's-eye view (same diagram as slide 10).

17 The Query Executor transforms a user query into a query that takes the similarity enhanced (and fused) ontology into account. It implements an ontology-extended algebra that improves on the TAX algebra. In the TOSS algebra, a simple selection condition has the form X op Y, where op ∈ {=, ≠, ≤, ≥, ~, instance_of, is_a, subtype_of, above, below} and X, Y are terms (attributes, types, etc.).

18 TOSS Algebra. A selection condition is either a simple selection condition or a conjunction, disjunction or negation of selection conditions.
C = X ~ Y is true iff there is a node of the SEO containing both X and Y;
C = X instance_of Y is true iff the type of X is a subtype of Y and the value of X ∈ dom(Y);
C = X subtype_of Y is true iff type(X) ≤ type(Y);
C = X below Y is true iff X instance_of Y or X subtype_of Y;
C = X above Y is true iff Y below X.
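These definitions can be sketched as a small evaluator. Everything concrete here is an illustrative assumption: the SEO is a list of sets of strings, types form a tree given as a child-to-parent dict (`SUPERTYPE`), type(X) ≤ type(Y) is read as reflexive reachability in that tree, dom(Y) is an explicit set of values (`DOM`), and plain strings stand for types while the hypothetical `Inst` class stands for typed values.

```python
from dataclasses import dataclass

# Hypothetical SEO, type hierarchy and domains, for illustration only.
SEO = [{"W. Stallings", "William Stallings"}, {"LAX", "LB"}]
SUPERTYPE = {"dog": "canine", "canine": "carnivore", "carnivore": "mammal"}
DOM = {"mammal": {"Fido", "Fuffi"}}


@dataclass(frozen=True)
class Inst:
    """A typed value (X in 'X instance_of Y'); plain strings stand for types."""
    value: str
    type: str


def leq(t1, t2):
    """type(t1) <= type(t2): t2 is t1 or one of its ancestors in SUPERTYPE."""
    while t1 is not None:
        if t1 == t2:
            return True
        t1 = SUPERTYPE.get(t1)
    return False


def similar(x, y):       # X ~ Y
    return any(x in node and y in node for node in SEO)


def subtype_of(x, y):    # X subtype_of Y, with X and Y types
    return isinstance(x, str) and isinstance(y, str) and leq(x, y)


def instance_of(x, y):   # X instance_of Y, with X a typed value and Y a type
    return (isinstance(x, Inst) and isinstance(y, str)
            and leq(x.type, y) and x.value in DOM.get(y, set()))


def below(x, y):
    return instance_of(x, y) or subtype_of(x, y)


def above(x, y):
    return below(y, x)


# Compound conditions are ordinary and / or / not over these predicates.
fido = Inst("Fido", "dog")
print(below(fido, "mammal"), above("mammal", "dog"), similar("W. Stallings", "LAX"))
```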

19 Query example. On the biblio database of slide 6, the pattern tree condition becomes #1.tag = book & #2.tag = title & #3.tag = author & #3.content ~ “W. Stallings”. Since d_s(W. Stallings, William Stallings) < ε, both witness trees are returned: book (author [William Stallings], title [Cryptography]) and book (author [W. Stallings], title [Operating Systems]). Now all correct answers are returned!
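Replacing = with ~ can be sketched as follows, assuming the Similarity Enhancer has already placed both spellings in one SEO node; the node contents and the helpers `sim` and `select_books` are hypothetical. With exact equality one witness tree comes back; with ~ both do.

```python
BIBLIO = [
    {"title": "Operating Systems", "author": "W. Stallings", "year": 1992},
    {"title": "Cryptography", "author": "William Stallings", "year": 2003},
]
# Assumed SEO node produced by the Similarity Enhancer.
SEO = [{"W. Stallings", "William Stallings"}]


def sim(x, y):
    """x ~ y: some SEO node contains both strings (identical strings assumed similar)."""
    return x == y or any(x in node and y in node for node in SEO)


def select_books(db, author, cond):
    """Witness trees: each book's title and author, filtered on #3.content."""
    return [{"title": b["title"], "author": b["author"]}
            for b in db if cond(b["author"], author)]


exact = select_books(BIBLIO, "W. Stallings", lambda a, b: a == b)
with_sim = select_books(BIBLIO, "W. Stallings", sim)
print(len(exact), len(with_sim))  # 1 2
```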

20 Query example (2). Database animals:
elephant: Name [Fuffi], Age [50]
dog: Name [Fido], Age [4]
black widow: Name [Pito], Age [7]
Query: “Return the list of all mammals”. No element is tagged mammal, but the ontology states that elephant IS A mammal and dog IS A mammal, so the answer is elephant (Name [Fuffi], Age [50]) and dog (Name [Fido], Age [4]).
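A sketch of how the ontology answers this query, assuming the isa edges elephant → proboscidean → mammal, dog → canine → carnivore → mammal and black widow → spider → arachnid, read as a simplified derived ontology. `ISA` and `is_a` are hypothetical names; the check walks the isa chain transitively, so intermediate concepts such as canine do not block the match.

```python
# Assumed isa edges (child -> parent) of the derived ontology.
ISA = {
    "elephant": "proboscidean", "proboscidean": "mammal",
    "dog": "canine", "canine": "carnivore", "carnivore": "mammal",
    "black widow": "spider", "spider": "arachnid",
}
ANIMALS = [
    {"tag": "elephant", "Name": "Fuffi", "Age": 50},
    {"tag": "dog", "Name": "Fido", "Age": 4},
    {"tag": "black widow", "Name": "Pito", "Age": 7},
]


def is_a(term, concept):
    """True if term equals concept or reaches it by following isa edges."""
    current = term
    while current is not None:
        if current == concept:
            return True
        current = ISA.get(current)
    return False


tag_only = [a["Name"] for a in ANIMALS if a["tag"] == "mammal"]
with_ontology = [a["Name"] for a in ANIMALS if is_a(a["tag"], "mammal")]
print(tag_only, with_ontology)  # [] ['Fuffi', 'Fido']
```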

21 Implementation and experiments. TOSS is implemented in Java, on top of the Xindice DBMS. Experiments over DBLP measured recall and precision of 12 selection queries on 3 data sets, each containing 100 random papers.

22 Recall and precision. [Plot: TAX vs. TOSS with ε = 2 (×) and TOSS with ε = 3 (+).] TAX always gets 100% precision but low recall, while TOSS keeps its precision close to 1 with much higher recall. For the queries with the lowest TOSS precision, a precision loss of 1/3 corresponds to a threefold increase in recall.
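Precision is |retrieved ∩ relevant| / |retrieved| and recall is |retrieved ∩ relevant| / |relevant|. The sketch below uses made-up answer sets (not the experimental data) to show how a precision loss of 1/3 can coincide with a threefold recall gain; exact fractions avoid rounding noise.

```python
from fractions import Fraction


def precision_recall(retrieved, relevant):
    """Both sets must be non-empty; returns (precision, recall) as exact fractions."""
    hits = len(retrieved & relevant)
    return Fraction(hits, len(retrieved)), Fraction(hits, len(relevant))


relevant = set(range(6))                  # hypothetical: 6 correct answers exist
tax = {0, 1}                              # exact matching finds 2 of them, no errors
toss = set(range(6)) | {100, 101, 102}    # similarity finds all 6 plus 3 wrong ones

print(precision_recall(tax, relevant))    # (Fraction(1, 1), Fraction(1, 3))
print(precision_recall(toss, relevant))   # (Fraction(2, 3), Fraction(1, 1))
```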

23 Recall and precision (2). [Plot: TAX vs. TOSS with ε = 2 (×) and TOSS with ε = 3 (+).] TOSS quality is always better than TAX quality!

24 Recall and precision (3). [Plot: improvement for ε = 2 (×) and ε = 3 (+).] With TOSS, most queries get their normalized recall more than doubled. TOSS results with threshold 3 are not necessarily better than those with threshold 2.

25 Conclusions & related works. Ontologies are used to improve the quality of answers to queries (as in Wiederhold’s group); ontologies are merged under interoperation constraints; semistructured instances with associated ontologies can be queried; the concept of similarity search in semistructured DBs is introduced. Related work: scored pattern trees (TIX).

26 Bibliography
H.V. Jagadish, L.V.S. Lakshmanan, D. Srivastava and K. Thompson. TAX: A tree algebra for XML. In Proc. DBPL Conf., Rome, Italy, 2001.
G.A. Miller et al. WordNet: a lexical database for English. Cognitive Science Laboratory, Princeton University.
G. Wiederhold. Interoperation, mediation and ontologies. In International Symp. on Fifth Generation Computer Systems, Workshop on Heterogeneous Cooperative Knowledge Bases, ICOT, pages 33–48, 1994.
SIGMOD Record in XML. Available at http://www.acm.org/sigmod/record/xml/, Nov 2002.

27 Questions and answers

