TOSS: An Extension of TAX with Ontologies and Similarity Queries. Edward Hung, Yu Deng, V.S. Subrahmanian. Presentation by: Valentina Bonsi, Roberto Gamboni.

Presentation transcript:

TOSS: An Extension of TAX with Ontologies and Similarity Queries. Edward Hung, Yu Deng, V.S. Subrahmanian. Presentation by: Valentina Bonsi, Roberto Gamboni, Giuseppe Vitalone. Speaker: Roberto Gamboni.

Outline
- Abstract
- TAX overview
- Quality problems
- TOSS architecture
- TOSS algebra
- Experiments
- Conclusions & related work

Abstract
Tree Algebra for XML (TAX):
- an algebra developed for XML databases
- 100% precision but low recall
- semantics not considered
TAX with Ontologies and Similarity Queries (TOSS):
- ontology
- similarity enhancement
- improves recall
Result: much higher quality!

Tree Algebra for XML
Semistructured instance: I = (V, E, t)
- G = (V, E) is a set of rooted directed trees, where V is a set of nodes and E ⊆ V × V is a set of edges.
- t assigns to each object o ∈ V a type for its tag and its content, e.g. o.tag = string and o.content = int.
Pattern tree: P = (T, F)
- T = (V, E) is a tree whose objects are labeled with distinct integers and whose edges are labeled 'pc' (parent-child) or 'ad' (ancestor-descendant).
- F is a selection condition applicable to the objects in T.

TAX selection example
DB1:
  car: carModel [Toyota/Yaris], price [10000], year [2002], km [30000], carDealer [RBV], fuelCons [10]
  car: carModel [Vw/Polo], price [14000], year [2004], km [40000], carDealer [Pico], fuelCons [12]
  car: carModel [Vw/Golf], price [20000], year [2005], km [10000], carDealer [RBV S.p.A.], fuelCons [13]
Pattern tree: node #1 with children #2 and #3, connected by pc edges.
Selection condition: #1.tag = car & #2.tag = price & #3.tag = carModel & #2.content < 15000
Witness trees:
  car: price [10000], carModel [Toyota/Yaris]
  car: price [14000], carModel [Vw/Polo]
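The selection above can be sketched in a few lines of Python. This is only an illustration, not the TAX implementation: the dictionary-based node layout and the helper names (`node`, `car`, `select`, `cond`) are assumptions made for the example, and #1 is bound only to the top-level car nodes rather than to arbitrary nodes of the instance.

```python
from itertools import product

def node(tag, content=None, children=()):
    """Hypothetical node layout: a tag, an optional content, and child nodes."""
    return {"tag": tag, "content": content, "children": list(children)}

def car(model, price, year, km, dealer, fuel):
    return node("car", children=[
        node("carModel", model), node("price", price), node("year", year),
        node("km", km), node("carDealer", dealer), node("fuelCons", fuel),
    ])

# The three car records of DB1 from the slide.
DB1 = [
    car("Toyota/Yaris", 10000, 2002, 30000, "RBV", 10),
    car("Vw/Polo", 14000, 2004, 40000, "Pico", 12),
    car("Vw/Golf", 20000, 2005, 10000, "RBV S.p.A.", 13),
]

def cond(n1, n2, n3):
    """#1.tag = car & #2.tag = price & #3.tag = carModel & #2.content < 15000"""
    return (n1["tag"] == "car" and n2["tag"] == "price"
            and n3["tag"] == "carModel" and n2["content"] < 15000)

def select(trees, condition):
    """Bind #1 to each tree root and (#2, #3) to pairs of its children (pc edges).

    Each satisfying binding yields one witness tree that keeps only the
    matched nodes.
    """
    witnesses = []
    for n1 in trees:
        for n2, n3 in product(n1["children"], repeat=2):
            if condition(n1, n2, n3):
                witnesses.append(node(n1["tag"], children=[n2, n3]))
    return witnesses

witnesses = select(DB1, cond)
for w in witnesses:
    print(w["tag"], [(c["tag"], c["content"]) for c in w["children"]])
```

The Vw/Golf record is dropped because its price does not satisfy #2.content < 15000, which matches the two witness trees on the slide.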

TAX similarity problems
biblio:
  book: title [Operating Systems], price [45.50], author [W. Stallings], publisher [MacMillan], year [1992], ISBN [ ]
  book: title [Cryptography], price [42.50], author [William Stallings], publisher [Prentice Hall], year [2003], ISBN [ ]
Pattern tree: node #1 with children #2 and #3, connected by pc edges.
Selection condition: #1.tag = book & #2.tag = title & #3.tag = author & #3.content = "W. Stallings"
Low recall! "W. Stallings" and "William Stallings" are probably the same person, but TAX does not use any notion of similarity between terms.
Solution: extend TAX with a similarity measure d_s:
  d_s(W. Stallings, William Stallings) = 0.1 (very similar)
  d_s(W. Stallings, Shakespeare) = 5 (much less similar)
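To make the idea of a string distance concrete, here is a minimal sketch using the Levenshtein edit distance. The slides do not say which measure produces the values 0.1 and 5, so edit distance is an assumption chosen for illustration, and its values are on a different scale from the slide's.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn a into b (classic dynamic-programming formulation)."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]

close = levenshtein("W. Stallings", "William Stallings")
far = levenshtein("W. Stallings", "Shakespeare")
print(close, far)
```

Whatever the measure, the property the slide relies on is the ordering: the two spellings of Stallings are closer to each other than either is to an unrelated name.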

TAX multi-DB example
DB1 (root: cars):
  car: carModel [Toyota/Yaris], price [10000], year [2002], km [30000], carDealer [RBV], fuelCons [10]
  car: carModel [Vw/Polo], price [14000], year [2004], km [40000], carDealer [Pico], fuelCons [12]
  car: carModel [Vw/Golf], price [20000], year [2005], km [10000], carDealer [RBV], fuelCons [13]
DB2 (root: automobiles):
  vendor: dealerName [RVB], location [Bologna], feedback [5]
    car: make [Volkswagen], model [Fox], year [2005], miles [30000], cost [5000], fuelCons [15]
    car: make [AstonMartin], model [Vanquish], year [2004], miles [10000], cost [70000], fuelCons [6]
    car: make [Ferrari], model [360], year [2002], miles [15000], cost [80000], fuelCons [6]

TAX problems with multi-DB
- Different tags can refer to the same thing.
- The same content can be stored differently.
- Tags like km and miles, or price and cost, may contain values expressed in different units (e.g. EUR or USD).

Inter-term lexical relationships
Ontology (terms linked by isa): Web search Company, Computer Company, Google Company
Query: "Return all authors of papers written by someone in a Web Search Company"
authors:
  author: firstName [Marco], lastName [Pivi], company [Google]
  author: firstName [Samuele], lastName [Salti], company [Eclipse Found.]
Pattern tree: node #1 with children #2 and #3, connected by pc edges.
Selection condition: #1.tag = author & #2.tag = lastName & #3.tag = company & #3.content = "Web Search Company"
Google's authors are never returned!
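The effect of consulting the ontology can be sketched as follows. The `ISA` table is an assumed reading of the flattened diagram (in particular the edge from "Web Search Company" to "Computer Company", and the use of the term "Google" as it appears in the data); `is_a` is a hypothetical helper, not part of TAX or TOSS.

```python
# Assumed isa edges (child -> parents); only the first one is needed by the query.
ISA = {
    "Google": ["Web Search Company"],
    "Web Search Company": ["Computer Company"],
}

def is_a(term, target, isa=ISA):
    """True if target is reachable from term by following isa edges (reflexive)."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(isa.get(current, []))
    return False

authors = [
    {"firstName": "Marco", "lastName": "Pivi", "company": "Google"},
    {"firstName": "Samuele", "lastName": "Salti", "company": "Eclipse Found."},
]

# Exact matching on the content, as in the TAX selection condition.
exact = [a["lastName"] for a in authors if a["company"] == "Web Search Company"]
# Ontology-aware matching: accept any company that is a Web Search Company.
with_ontology = [a["lastName"] for a in authors
                 if is_a(a["company"], "Web Search Company")]

print(exact, with_ontology)
```

Exact matching returns nothing, while the ontology-aware test returns the Google author, which is the gap the slide points out.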

TOSS: Architecture's bird's-eye view
[Diagram components: XML files, Xindice system, Ontology Maker, WordNet, user-specified rules, Fusion of Ontologies, Similarity Enhancer, threshold ε, similarity measure, SEO, Query Executor, user queries, results]
Goal: extend and enhance TAX to return high-quality answers using ontologies and similarity measures.

Ontology maker
XML DB (root: animals):
  elephant: name [Fuffi], race [African], age [50]
  dog: name [Fido], race [Collie], age [4]
  black widow: name [Pito], race [Mactans], age [7]
Derived ontology (terms linked by isa): mammal, proboscidean, carnivore, canine, arachnid, spider

Ontology Integration
Ontology 1: cars > car > carModel, price, year, km, carDealer, fuelCons
Ontology 2: automobiles > vendor > dealerName, location, feedback, car > make, model, year, miles, cost, fuelCons
The two ontologies are linked by interoperation constraints (specified by the user).

Fusion of Ontologies
Fused ontology: cars, car, carModel, price, year, km, fuelCons, automobiles, vendor, dealerName, location, feedback, miles, cost
- make:2 and model:2 are both mapped into carModel.
- miles and cost are not grouped: since different units might be used in the instances, the administrator has to define a conversion function to compare these values.
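A conversion function of the kind mentioned above might look like the following sketch. The function name `distance_km` and the record layout are hypothetical; only the mile-to-kilometre factor is a fixed fact. A price/cost conversion would work the same way but needs an exchange rate supplied by the administrator, so it is left out here.

```python
KM_PER_MILE = 1.609344  # exact by definition of the international mile

def distance_km(record):
    """Return the distance travelled in kilometres, whichever tag the source uses."""
    if "km" in record:
        return float(record["km"])
    if "miles" in record:
        return record["miles"] * KM_PER_MILE
    raise KeyError("record has neither 'km' nor 'miles'")

yaris = {"carModel": "Toyota/Yaris", "km": 30000}           # from DB1
fox = {"make": "Volkswagen", "model": "Fox", "miles": 30000}  # from DB2

# The raw numbers are equal, but the converted values are not.
print(distance_km(yaris), distance_km(fox))
```

Comparing the raw values would wrongly treat the two cars as having the same mileage, which is why the tags are kept apart in the fused ontology.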


Similarity Enhancer
Ontology terms (airports): LAX – CA (Los Angeles), LB – CA (Long Beach), London City Airport, London BAA Heathrow, London Gatwick, Roma Fiumicino
Ontology terms (airlines): British Airways, American Airlines, Delta Airlines, Alitalia, United Airlines
Threshold = 2
  d(LAX, LB) = 1.5
  d(London City, London Heathrow) = 1
  d(London City, London Gatwick) = 1.3
  d(London Gatwick, London Heathrow) = 1.6
  d(London City, Roma Fiumicino) = 3.5
  d(Roma Fiumicino, LAX) = 9
Properties of the similarity enhanced ontology:
1. It preserves the original partial order.
2. All nodes mapped into the same node are similar to each other.
3. Two strings are similar iff they are mapped into the same node.
4. There are no redundant nodes (no node is a subset of another).
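One way to read properties 2 to 4 is that the merged nodes are the maximal groups of pairwise similar terms, that is, the maximal cliques of the graph whose edges connect similar terms. The sketch below computes them with the basic Bron-Kerbosch algorithm. This is an illustration under stated assumptions, not the algorithm used by TOSS: pairs without a listed distance are treated as dissimilar, "similar" is taken to mean a distance strictly below the threshold, and the partial order (property 1) is not modeled.

```python
THRESHOLD = 2.0

# Distances from the slide; unlisted pairs are assumed to be dissimilar.
DIST = {
    frozenset({"LAX", "LB"}): 1.5,
    frozenset({"London City", "London Heathrow"}): 1.0,
    frozenset({"London City", "London Gatwick"}): 1.3,
    frozenset({"London Gatwick", "London Heathrow"}): 1.6,
    frozenset({"London City", "Roma Fiumicino"}): 3.5,
    frozenset({"Roma Fiumicino", "LAX"}): 9.0,
}
TERMS = ["LAX", "LB", "London City", "London Heathrow",
         "London Gatwick", "Roma Fiumicino"]

def similar(a, b):
    return DIST.get(frozenset({a, b}), float("inf")) < THRESHOLD

def maximal_cliques(terms):
    """Bron-Kerbosch without pivoting: every maximal set of pairwise similar terms."""
    adj = {t: {u for u in terms if u != t and similar(t, u)} for t in terms}
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))   # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(terms), set())
    return cliques

seo_nodes = maximal_cliques(TERMS)
for group in sorted(sorted(g) for g in seo_nodes):
    print(group)
```

With the slide's distances this groups LAX with LB and the three London airports together, and leaves Roma Fiumicino on its own.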


Query Executor
- Transforms a user query into a query that takes the similarity enhanced (fused) ontology into account.
- Implements an ontology-extended algebra that improves on the TAX algebra.
- In the TOSS algebra, a simple selection condition is X op Y, where op ∈ {=, ≠, <, ≤, >, ≥, ~, instance_of, is_a, subtype_of, above, below} and X, Y are terms (attributes, types, etc.).

TOSS Algebra
A selection condition is a simple selection condition, or a conjunction, disjunction or negation of selection conditions.
- C = X ~ Y is true iff there exists a node in the SEO containing both X and Y;
- C = X instance_of Y is true iff the type of X is a subtype of Y and its value ∈ dom(Y);
- C = X subtype_of Y is true iff type(X) ≤ type(Y);
- C = X below Y is true iff X instance_of Y or X subtype_of Y;
- C = X above Y is true iff Y below X.
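The definitions above translate almost directly into code. The following sketch is illustrative only: the SEO nodes, the type hierarchy `PARENT`, the domains `DOM` and the `Value` class are all hypothetical stand-ins, and a term is modeled either as a type name (a string) or as a typed value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    value: object
    type: str

# Hypothetical SEO nodes, type hierarchy (child -> parent) and type domains.
SEO_NODES = [frozenset({"W. Stallings", "William Stallings"}),
             frozenset({"Shakespeare"})]
PARENT = {"year": "int", "int": "number"}
DOM = {
    "year": lambda v: isinstance(v, int) and 1000 <= v <= 9999,
    "int": lambda v: isinstance(v, int),
    "number": lambda v: isinstance(v, (int, float)),
}

def type_of(term):
    return term.type if isinstance(term, Value) else term

def leq(t1, t2):
    """t1 <= t2 in the type hierarchy (reflexive, transitive)."""
    while t1 is not None:
        if t1 == t2:
            return True
        t1 = PARENT.get(t1)
    return False

def similar(x, y):                       # X ~ Y
    return any(x in n and y in n for n in SEO_NODES)

def subtype_of(x, y):                    # type(X) <= type(Y)
    return leq(type_of(x), type_of(y))

def instance_of(x, y):                   # type of X below Y and value in dom(Y)
    return (isinstance(x, Value) and isinstance(y, str)
            and leq(x.type, y) and DOM[y](x.value))

def below(x, y):
    return instance_of(x, y) or subtype_of(x, y)

def above(x, y):
    return below(y, x)

print(similar("W. Stallings", "William Stallings"),
      instance_of(Value(2003, "year"), "int"),
      above("number", "year"))
```

Conjunction, disjunction and negation of such conditions then map onto `and`, `or` and `not` over these predicates.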

Query Example
biblio:
  book: title [Operating Systems], price [45.50], author [W. Stallings], publisher [MacMillan], year [1992], ISBN [ ]
  book: title [Cryptography], price [42.50], author [William Stallings], publisher [Prentice Hall], year [2003], ISBN [ ]
Pattern tree: node #1 with children #2 and #3, connected by pc edges.
Selection condition: #1.tag = book & #2.tag = title & #3.tag = author & #3.content ~ "W. Stallings"
Result:
  book: author [William Stallings], title [Cryptography]
  book: author [W. Stallings], title [Operating Systems]
Since d_s(W. Stallings, William Stallings) < ε, all correct answers are now returned!

Query Example (2)
animals:
  elephant: Name [Fuffi], Age [50]
  black widow: Name [Pito], Age [7]
  dog: Name [Fido], Age [4]
Query: "Return the list of all mammals"
The tag "mammal" does not appear in the data, but the ontology states that an elephant IS A mammal and a dog IS A mammal.
Result:
  elephant: Name [Fuffi], Age [50]
  dog: Name [Fido], Age [4]

Implementation and Experiments
TOSS is implemented in Java, built on top of the Xindice DBMS.
Experiments over DBLP:
- Recall and precision
- 12 selection queries on 3 data sets (each containing 100 random papers)

Recall and precision
[Plot legend: TAX; X = TOSS (ε = 2); + = TOSS (ε = 3)]
- TAX always gets 100% precision but low recall.
- TOSS keeps its precision close to 1 with much higher recall.
- For the queries with the lowest TOSS precision, a precision degradation of 1/3 corresponds to a threefold increase in recall.
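For reference, precision and recall can be computed as in the sketch below. The paper identifiers and the two result sets are invented for illustration and are not taken from the DBLP experiments; they only show how an exact-match query can have perfect precision with low recall while a similarity query trades a little precision for more recall.

```python
def precision_recall(returned, relevant):
    """precision = |returned ∩ relevant| / |returned|
    recall    = |returned ∩ relevant| / |relevant|"""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall

relevant = {"p1", "p2", "p3", "p4"}              # hypothetical correct answers
exact_result = {"p1"}                            # hypothetical exact-match answer
similarity_result = {"p1", "p2", "p3", "p9"}     # hypothetical similarity answer

exact_pr = precision_recall(exact_result, relevant)
similarity_pr = precision_recall(similarity_result, relevant)
print(exact_pr, similarity_pr)
```

In this made-up case the exact query scores (1.0, 0.25) and the similarity query (0.75, 0.75): one wrong answer is admitted, but three times as many correct ones are found.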

Recall and precision (2)
[Plot legend: TAX; X = TOSS (ε = 2); + = TOSS (ε = 3)]
TOSS quality is always better than TAX!

Recall and precision (3)
[Plot legend: X = improvement (ε = 2); + = improvement (ε = 3)]
- With TOSS, most queries have their normalized recall more than doubled.
- TOSS results with threshold = 3 are not necessarily better than those with threshold = 2.

Conclusions & Related work
- Ontologies improve the quality of answers to queries (Wiederhold's group);
- Ontologies are merged under interoperation constraints;
- Semistructured instances with associated ontologies can be queried;
- TOSS introduces the concept of similarity search in semistructured DBs.
Related work: scored pattern trees (TIX)

Bibliography
- H.V. Jagadish, L.V.S. Lakshmanan, D. Srivastava and K. Thompson. TAX: A tree algebra for XML. In Proc. DBPL Conf., Rome, Italy.
- G.A. Miller et al. WordNet – a lexical database for English. Cognitive Science Laboratory, Princeton University.
- G. Wiederhold. Interoperation, mediation and ontologies. In International Symp. on Fifth Generation Computer Systems, Workshop on Heterogeneous Cooperative Knowledge Bases, ICOT, pages 33–48.
- SIGMOD Record in XML. Available at Nov

Questions and answers