TOSS: An Extension of TAX with Ontologies and Similarity Queries Edward Hung, Yu Deng, V.S. Subrahmanian Department of Computer Science University of Maryland,

Presentation transcript:

TOSS: An Extension of TAX with Ontologies and Similarity Queries Edward Hung, Yu Deng, V.S. Subrahmanian Department of Computer Science University of Maryland, College Park SIGMOD, Paris, France, June, 2004

Outline: Introduction; Ontologies and Integration; Similarity Enhanced Ontology (SEO); TOSS Algebra; Implementation and Experiments; Related Work

Introduction: TAX [Jagadish et al., TAX: A Tree Algebra for XML, in DBPL, 2001] is one of the best algebras developed for XML databases.

DBLP SIGMOD Problems!

Problems: Lack of lexical semantics in answering queries. Find papers written by “J. Ullman”: J.D. Ullman? Jeffrey Ullman? Find papers with at least one author from “U.S. government”: U.S. Census Bureau? U.S. Army? Result: high precision, poor recall. Quality = (recall × precision)^(1/2)
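The quality measure on this slide can be illustrated with a minimal sketch, assuming it is read as the square root of recall times precision (the geometric mean). The function name `quality` and the sample numbers are hypothetical and are not taken from the talk.

```python
import math


def quality(recall: float, precision: float) -> float:
    """Geometric mean of recall and precision (assumed reading of the slide)."""
    return math.sqrt(recall * precision)


# Hypothetical values, chosen only to show the effect of poor recall.
exact_match = quality(recall=0.25, precision=1.0)   # high precision, poor recall
balanced = quality(recall=0.81, precision=0.81)

print(exact_match, balanced)
```

Because the measure is a geometric mean, perfect precision cannot compensate for low recall, which is the situation the slide describes for exact-match queries.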

Our approach. Goal: extend and enhance the semantics of TAX to return high-quality answers using ontologies and similarity measures. 1. Capture inter-term lexical relationships in an ontology, and integrate the ontologies of different DBs. 2. Use existing similarity measures to enhance the integrated ontology. 3. TOSS: extend the TAX algebra to query with ontology and similarity.

Motivating Examples and TAX: DBLP and SIGMOD bibliographies in XML. TAX operators: selection, projection, product.

DBLP

Pattern tree Selection

Pattern tree Projection

Product: The product of two instances (two sets of trees) contains, for each pair of trees (one from each instance), a tree whose root is a new node (called tax_prod_root).
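The product operation can be sketched as follows. This is an illustration under stated assumptions, not the TOSS or TAX implementation: a tree is modelled as a hypothetical `(tag, children)` tuple, and the two paired trees are assumed to become the children of the new `tax_prod_root` node.

```python
from itertools import product as all_pairs


def tax_product(left_instance, right_instance):
    """Return one tree per pair of input trees, rooted at a new tax_prod_root.

    Assumption: each tree is a (tag, children) tuple and the paired trees
    become the two children of the new root.
    """
    return [
        ("tax_prod_root", [left_tree, right_tree])
        for left_tree, right_tree in all_pairs(left_instance, right_instance)
    ]


# Hypothetical toy instances standing in for DBLP and SIGMOD records.
dblp = [("inproceedings", [("title", [])]), ("article", [("title", [])])]
sigmod = [("article", [("author", [])]), ("article", [("title", [])])]

result = tax_product(dblp, sigmod)
print(len(result))
```

With two trees in each instance the result holds four trees, one for every pair.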

DBLP SIGMOD Problems!

Architecture

Ontology: a set S, e.g. S = {article, author, title}; a partially ordered set (S, ≤_S), here the part_of relation, with ≤_S = {(author, article), (title, article), (article, article), (author, author), (title, title)}; a hierarchy (H, ≤_H) is the Hasse diagram for (S, ≤_S), i.e. a DAG with a minimal set of edges such that there is a path from u to v iff u ≤_S v. Here ≤_H = {(author, article), (title, article)}.
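The step from a partial order to its Hasse diagram can be sketched as below. This is a minimal illustration, assuming the input relation is already reflexive and transitively closed; the function name `hasse_edges` is hypothetical, and the reflexive pair for `title` is included so that the relation is a genuine partial order.

```python
def hasse_edges(order):
    """Return the covering pairs of a partial order given as a set of (u, v).

    A pair (u, v) with u != v is kept only if no third element w lies
    strictly between u and v. Assumes `order` is transitively closed.
    """
    strict = {(u, v) for u, v in order if u != v}
    elements = {x for pair in order for x in pair}
    return {
        (u, v)
        for (u, v) in strict
        if not any((u, w) in strict and (w, v) in strict for w in elements)
    }


# The part_of order from the slide.
part_of = {
    ("author", "article"), ("title", "article"),
    ("article", "article"), ("author", "author"), ("title", "title"),
}
hierarchy = hasse_edges(part_of)
print(sorted(hierarchy))
```

For the slide's example every strict pair is already a covering pair, so only the reflexive pairs are dropped. In a longer chain the transitive shortcut edges would be dropped as well.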

Ontology: Suppose Σ is some finite set of strings and S is some set. An ontology w.r.t. Σ is a partial mapping Θ from Σ to hierarchies for S. Example: Σ = {part_of}, Θ(part_of) = (H, ≤_H).

Ontology: Suppose Σ is some finite set of strings and S is some set. An ontology w.r.t. Σ is a partial mapping Θ from Σ to hierarchies for S. Example: S = {article, author, title}, Σ = {part_of}, ≤_H = {(author, article), (title, article)}, Θ(part_of) = (H, ≤_H).

Ontology Integration: SIGMOD, DBLP

Ontology Integration: SIGMOD, DBLP, IC (interoperation constraints)

Ontology Integration: Hierarchy graph associated with SIGMOD and DBLP

Ontology Integration: Fusion of the ontologies of SIGMOD and DBLP

Architecture

Similarity Enhanced Ontology: A string similarity measure d_S is any function which takes two strings X, Y and returns a non-negative real number such that ∀X, d_S(X, X) = 0 and ∀X, Y, d_S(X, Y) = d_S(Y, X). Any string similarity measure can be used. For example: Levenshtein distance, which assigns a unit cost to every edit operation. d_S(“relation”, “relational”) = 2
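The Levenshtein example can be checked with a short sketch. This is a standard dynamic-programming formulation with unit cost for insertion, deletion, and substitution, written for illustration; it is not taken from the TOSS system.

```python
def levenshtein(x: str, y: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    previous = list(range(len(y) + 1))  # distances from "" to prefixes of y
    for i, cx in enumerate(x, start=1):
        current = [i]  # distance from x[:i] to ""
        for j, cy in enumerate(y, start=1):
            substitution = previous[j - 1] + (0 if cx == cy else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("relation", "relational"))  # 2, matching the slide
```

The function satisfies both properties required of a string similarity measure: the distance from a string to itself is 0, and the distance is symmetric in its arguments.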

Similarity Enhanced Ontology: A similarity measure is any function d which takes nodes A, B as input and returns a non-negative real number such that d(A, B) = min_{X ∈ S, Y ∈ T} d_S(X, Y), where d_S is a string similarity measure and S, T are the sets of strings contained in nodes A, B.
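The node-level measure can be sketched as a minimum over all string pairs. The sketch below assumes a node is represented simply as a set of strings and uses a compact recursive edit distance as the string measure; the names `edit_distance` and `node_distance` and the sample nodes are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance, used here as the string measure d_S."""
    if not x:
        return len(y)
    if not y:
        return len(x)
    return min(
        edit_distance(x[1:], y) + 1,                   # delete from x
        edit_distance(x, y[1:]) + 1,                   # insert into x
        edit_distance(x[1:], y[1:]) + (x[0] != y[0]),  # substitute or match
    )


def node_distance(a_strings, b_strings, d_s=edit_distance):
    """d(A, B): the smallest d_S(X, Y) over X in A's strings, Y in B's strings."""
    return min(d_s(x, y) for x in a_strings for y in b_strings)


# Hypothetical nodes, each holding a set of strings.
node_a = {"relation", "model"}
node_b = {"relational"}
print(node_distance(node_a, node_b))
```

Taking the minimum means two nodes count as close as soon as any one pair of their strings is close, so a threshold test on `node_distance` is a test on the best-matching pair.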

Similarity Enhanced Ontology: Suppose H is an integrated hierarchy, d is a similarity measure and ε ≥ 0. (H’, μ) is a similarity enhancement of H w.r.t. d, ε iff H’ is a hierarchy and μ is a function from H to 2^H’ such that: (1) the original partial orderings in H are preserved, and no unwarranted orderings are included; (2) all nodes mapped into the same node are similar to each other (within the threshold ε); (3) two strings are similar iff they are jointly present in some node in (H’, μ); (4) there is no redundant node whose string set is a subset of that of some other node.

Similarity Enhanced Ontology: An example ontology and its similarity enhancement

Similarity Enhanced Ontology: (H, d, ε) is similarity consistent iff there exists a similarity enhancement of H w.r.t. d, ε. Theorem: If (H, d, ε) is similarity consistent, then all similarity enhancements of H are equivalent.

Architecture

TOSS Algebra: A simple selection condition has the form X op Y, where op ∈ {=, ≠, ≤, ≥, <, >, ~, instance_of, isa, part_of, subtype_of, above, below} and X, Y are terms, i.e., attributes (tag, content), types, or typed values v:τ with v ∈ dom(τ). A selection condition is either a simple selection condition or a conjunction/disjunction of two selection conditions.

TOSS Algebra The pattern tree to find the titles of all papers in DBLP related to Microsoft (independently of the field in which Microsoft appears): #1.tag = inproceedings & #2.tag = title & #3.tag part_of inproceedings & #3.content ~ “Microsoft”

TOSS Algebra: To ensure that an embedding is correct w.r.t. a semistructured DB with an associated similarity enhanced ontology: we define a selection condition to be well-typed if X and Y have a least common supertype τ and there exists a function to convert their types to τ; we define (1) the type and value of a term w.r.t. a mapping h, and (2) the satisfaction of a selection condition. We extend the following algebraic operations: selection, projection, product, union, intersection, difference.

Implementation and Experiments: The TOSS system is implemented in Java and built on top of the Xindice DBMS. Experiments: recall and precision; scalability (selection, join).

Recall and Precision [Plot legend: TAX; × = TOSS (ε = 2); + = TOSS (ε = 3)]

Quality of Answers [Table columns: Query; TAX; TOSS (ε = 2); TOSS (ε = 3)]

Quality of Answers [Plot legend: TAX; × = TOSS (ε = 2); + = TOSS (ε = 3)] Quality = (recall × precision)^(1/2)

Related Work: Wiederhold et al. [ICOT’94, EDBT’00, …] propose an ontology algebra (LISP-style logical statements). IC (interoperation constraints) are not considered. A concept similar to IC is considered in EDBT’00, but their integrated ontologies are not concise. In addition, we deal with XML documents.

Related Work: [Jagadish et al., TAX: A Tree Algebra for XML, in DBPL, 2001] is an algebra to query XML documents; no ontology is used. [Al-Khalifa et al., Querying structured text in an XML database, in SIGMOD 2003] supports IR-style queries that find relevant results, with weighting and ranking done at run time. We use ontologies and similarity measures; we consider the integration of ontologies and precompute the SEO.

Questions and Answers Thank you very much!