Unifying Data and Domain Knowledge Using Virtual Views Lipyeow Lim IBM T.J. Watson Research Ctr Haixun Wang IBM T.J. Watson Research Ctr. Min Wang IBM.

Presentation transcript:

Unifying Data and Domain Knowledge Using Virtual Views Lipyeow Lim IBM T.J. Watson Research Ctr Haixun Wang IBM T.J. Watson Research Ctr. Min Wang IBM T.J. Watson Research Ctr.

Overview Better integration of data management and knowledge management. – Unfortunately, current DBMSs, albeit improved by many extensions over the past years, are not ready to manipulate data in connection with knowledge. Semantic Data Management – Domain Knowledge merged with DBMS framework so that users can query both data and domain knowledge as relational data.

Motivation Queries: Which wine is from the United States? Which one is red?

Benefits of DBMS DBMS provides a wide range of transactional and analytical support that is indispensable in data processing. SQL can insulate the users from the details of data representation and manipulation. SQL offers query optimization.

Goal To find wines that originate from the US, we may naively issue the following SQL query: SELECT W.Id FROM Wine AS W WHERE W.Origin = 'US'; To find red wines, we may naively issue the following SQL query: SELECT W.Id FROM Wine AS W WHERE W.hasColor = 'red';
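A minimal sketch of why these queries are naive, using Python's built-in sqlite3 module. The Wine schema follows the columns named later in the slides; the three rows are made up for illustration, and the assumption that the base table stores sub-regions such as California (and has no hasColor column) is ours.

```python
import sqlite3

# Hypothetical data: the schema follows the slides, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Wine (Id INTEGER, Type TEXT, Origin TEXT, Maker TEXT, Price REAL)"
)
conn.executemany(
    "INSERT INTO Wine VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Zinfandel", "California", "MakerA", 25.0),
        (2, "Riesling", "Germany", "MakerB", 18.0),
        (3, "Zinfandel", "US", "MakerC", 30.0),
    ],
)

# The naive query matches only rows whose Origin is literally 'US';
# wine 1 from California is missed without the locatedIn knowledge.
naive_us = [
    row[0]
    for row in conn.execute("SELECT W.Id FROM Wine AS W WHERE W.Origin = 'US'")
]

# The base table has no hasColor column, so the second naive query fails.
try:
    conn.execute("SELECT W.Id FROM Wine AS W WHERE W.hasColor = 'red'")
    color_query_ok = True
except sqlite3.OperationalError:
    color_query_ok = False
```

Both gaps are what the domain knowledge is meant to fill: the location hierarchy for the first query and the type-to-color implications for the second.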

Challenges Ontology is currently represented as semistructured data, using either OWL or RDF. The relational data model remains ill-suited for storing and processing semistructured data efficiently. XML models come with a large storage and processing overhead. Transitivity is difficult to express and process in an RDBMS. Therefore neither pure XML databases nor relational databases can do the task alone.

Approach Framework

Approach A virtual view is created by specifying how the data in relational tables relate to the domain knowledge encoded as ontologies in the ontology repository. Ontology files are registered with the ontology repository. Class hierarchies and transitive properties are extracted into trees, and implications are extracted into an implication graph. The framework processes queries on the virtual view by rewriting them into queries on both the base table and the ontological information in the ontology repository.

Virtual View A virtual view incorporates both data and domain knowledge. LocatedIn is obtained from the locatedIn property of the ontology. HasColor is obtained from the implications (type = Zinfandel) ⇒ (hasColor = red) and (type = Riesling) ⇒ (hasColor = white).

Querying the Virtual View To find wines that originate from the US, we issue the following SQL query against the virtual view: SELECT W.Id FROM WineView AS W WHERE 'US' IN W.LocatedIn; To find red wines, we issue the following SQL query against the virtual view: SELECT W.Id FROM WineView AS W WHERE W.HasColor = 'red';

Virtual view creation Definition 1 (Create a Virtual View):
    CREATE VIRTUAL VIEW View(Column1, ..., ColumnN) AS
    SELECT head1, ..., headN
    FROM BaseTable AS T, Ontology AS O
    WHERE constructor AND p1 AND ... AND pk AND m1 AND ... AND mj
Integration between data and domain knowledge: 1. Constructor 2. Constraints 3. Mappings

Virtual view creation (cont'd) Constructor: O.type = expr. Instantiates an ontology object of type O.type for each record in the relational table. Constraints: p1, ..., pk. Each pi can be a traditional boolean predicate on the relational table T, for example T.price ≥ 30. Each pi can also be an ontological constraint, which is a triple of the form (Object1, Relation, Object2). Mappings: m1, ..., mj. Each mapping relates the schema of the base table to the properties in the ontology, for example W.origin = O.locatedIn.

Example
    CREATE VIRTUAL VIEW WineView(Id, Type, Origin, Maker, Price, LocatedIn, HasColor) AS
    SELECT W.*, O.locatedIn, O.hasColor
    FROM Wine AS W, WineOntology AS O
    WHERE O.type = W.type            /* constructor */
      AND (O.type isA 'Wine')        /* constraint */
      AND W.origin = O.locatedIn     /* mapping */
View Triples

HYBRID RELATIONAL-XML DBMS Hybrid relational-XML DBMSs provide the physical-level support. The examples in this paper use IBM's DB2. XML is supported as a basic data type, so users can create a table with one or more XML-typed columns: CREATE TABLE ClassHierarchy (id integer, name VARCHAR(27), hierarchy XML) To insert an XML document into a table, it must first be parsed, for instance with the SQL/XML function XMLParse.

Ontology Repository

Ontology The user registers each ontology file under an identifier (ID) via registerOntology(ontid, ontologyFile). When an existing logical ontology in the repository needs to be removed, the stored procedure dropOntology(ontid) is called with the ontology ID. A Horn rule or clause is a logic expression of the form H ← A1 ∧ ... ∧ Am ∧ ∼Am+1 ∧ ... ∧ ∼An.

Ontology Tables XML

Class Hierarchies Correspond to subsumption rules dealing with: (a) the special subClassOf relationship, subClassOf(A,C) ← subClassOf(A,B) ∧ subClassOf(B,C), and (b) the isA relationship, isA(B,X) ← isA(A,X) ∧ subClassOf(A,B).
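The two subsumption rules can be sketched over a class tree stored as a child-to-parent map. The class names and their parent links below are a hypothetical fragment chosen for illustration, not taken from the slides, and the function names are our own.

```python
# Hypothetical fragment of a wine class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Zinfandel": "RedWine",
    "RedWine": "Wine",
    "Riesling": "WhiteWine",
    "WhiteWine": "Wine",
}


def superclasses(cls):
    """Rule (a): follow subClassOf links transitively up to the root."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def is_a(cls, target):
    """Rule (b): an instance of cls is also an instance of every superclass."""
    return cls == target or target in superclasses(cls)
```

For example, `is_a("Zinfandel", "Wine")` holds because the walk from Zinfandel passes through RedWine to Wine, while `is_a("Riesling", "RedWine")` does not.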

OWL Representation

Ontology(Class hierarchy)

Transitive Properties Correspond to subsumption rules dealing with transitive relationships defined in the ontology by the ontology author, for example locatedIn(A,C) ← locatedIn(A,B) ∧ locatedIn(B,C). The facts can be extracted from the ontology into a tree representation to facilitate query rewriting and processing.
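A minimal sketch of the tree representation for a transitive property, assuming each region has at most one parent. US, California and Texas appear in the slides' example; NapaValley and the function names are hypothetical additions for illustration.

```python
# Child region -> parent region. NapaValley is a made-up extra level.
LOCATED_IN = {
    "California": "US",
    "Texas": "US",
    "NapaValley": "California",
}


def enclosing_regions(region):
    """All regions X with locatedIn(region, X), obtained by walking up the tree."""
    chain = []
    while region in LOCATED_IN:
        region = LOCATED_IN[region]
        chain.append(region)
    return chain


def is_located_in(region, target):
    """True if region equals target or lies somewhere below it in the tree."""
    return region == target or target in enclosing_regions(region)
```

With the facts stored this way, a predicate such as "located in the US" becomes a single upward walk rather than a recursive join over the base table.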

OWL Representation... Example:US California Texas

Transitive locatedIn Property

Implication Rules Capture non-recursive rules encoded in the ontology, represented internally as an implication graph. An implication graph G is a directed acyclic graph consisting of two types of vertices and two types of edges. Predicate nodes P(G) are associated with atoms of Horn clauses. Conjunction nodes C(G) represent the conjunction of two or more atoms in the body of a Horn clause.
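One possible in-memory shape for such a graph is sketched below. The class, the edge direction (from a rule head towards the atoms in its body) and the second example rule are assumptions made for illustration; the slides do not spell out the two edge types, so this sketch does not distinguish them.

```python
from dataclasses import dataclass, field


@dataclass
class ImplicationGraph:
    predicate_nodes: set = field(default_factory=set)    # P(G)
    conjunction_nodes: set = field(default_factory=set)  # C(G)
    adj: dict = field(default_factory=dict)              # node -> successors

    def add_rule(self, head, body):
        """Record the Horn clause head <- body[0] AND body[1] AND ..."""
        self.predicate_nodes.add(head)
        self.predicate_nodes.update(body)
        for atom in body:
            self.adj.setdefault(atom, [])
        if len(body) == 1:
            # Single-atom body: link the head directly to that atom.
            self.adj.setdefault(head, []).append(body[0])
        else:
            # Multi-atom body: route through a conjunction node.
            conj = ("AND",) + tuple(body)
            self.conjunction_nodes.add(conj)
            self.adj.setdefault(head, []).append(conj)
            self.adj[conj] = list(body)


g = ImplicationGraph()
red = ("hasColor", "red")
# (type = Zinfandel) => (hasColor = red), as in the slides.
g.add_rule(red, [("type", "Zinfandel")])
# Hypothetical rule with a two-atom body, to show a conjunction node.
g.add_rule(("type", "RedBurgundy"), [("type", "Burgundy"), red])
```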

OWL... Can be converted into a conjunction of Horn rules: [isA(Zinfandel,X) ⇒ hasColor(X,Red)] ∧ [isA(Zinfandel,X) ⇒ hasSugar(X,Dry)].

Ontology Properties (implications)
    (type=RedBurgundy) ⇒ (type=Burgundy) ∧ (type=RedWine)
    (type=RedBurgundy) ⇒ (madeFromGrape=PinotNoirGrape)
    (type=RedBurgundy) ⇒ (madeFromGrape.cardinality=1)
    (type=RedWine) ⇒ (type=Wine) ∧ (hasColor=red)
    (type=Zinfandel) ⇒ (hasColor=red)
    (type=Zinfandel) ⇒ (hasSugar=dry)
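To see what these implications entail, the sketch below chains them forward from a single fact. It assumes each line reads as "antecedent implies the conjunction of the listed consequents". The dictionary layout and function name are our own, and forward chaining is used here only to illustrate the rules; the framework in the slides rewrites queries rather than materializing facts.

```python
from collections import deque

# antecedent (property, value) -> list of consequents
IMPLICATIONS = {
    ("type", "RedBurgundy"): [
        ("type", "Burgundy"),
        ("type", "RedWine"),
        ("madeFromGrape", "PinotNoirGrape"),
        ("madeFromGrape.cardinality", 1),
    ],
    ("type", "RedWine"): [("type", "Wine"), ("hasColor", "red")],
    ("type", "Zinfandel"): [("hasColor", "red"), ("hasSugar", "dry")],
}


def consequences(fact):
    """Return every fact derivable from `fact` by chaining the implications."""
    derived = set()
    queue = deque([fact])
    while queue:
        current = queue.popleft()
        for consequent in IMPLICATIONS.get(current, []):
            if consequent not in derived:
                derived.add(consequent)
                queue.append(consequent)
    return derived
```

Under this reading, a RedBurgundy is red only indirectly: the color follows through the intermediate fact (type=RedWine).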

Implication Graph Example

Query Processing R denotes the set of recursive relations. View triples refer to the mapping between the ontology, the relational database and the virtual view. A view triple (b, r, v) encodes a binary association between any pair of a base table column b, a property or relationship r in the ontology, and a column v in the virtual view.

View Triples
    relational view triples: (W.id, ε, V.Id), (W.type, ε, V.Type), (W.origin, ε, V.Origin), (W.maker, ε, V.Maker), (W.price, ε, V.Price)
    virtual column triples: (ε, O.locatedIn, V.LocatedIn), (ε, O.hasColor, V.HasColor)
    ontology triples: (W.type, O.type, ε), (W.origin, O.locatedIn, ε)
Query
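The three kinds of triples can be held in one small structure, with None standing in for the empty component. The class and helper names below are illustrative assumptions, and the list is a stand-in for whatever catalog tables a real system would use.

```python
from typing import NamedTuple, Optional


class ViewTriple(NamedTuple):
    base: Optional[str]      # base table column b, or None for the empty slot
    relation: Optional[str]  # ontology property r, or None
    view_col: Optional[str]  # virtual view column v, or None


WINE_VIEW_TRIPLES = [
    # relational view triples
    ViewTriple("W.id", None, "V.Id"),
    ViewTriple("W.type", None, "V.Type"),
    ViewTriple("W.origin", None, "V.Origin"),
    ViewTriple("W.maker", None, "V.Maker"),
    ViewTriple("W.price", None, "V.Price"),
    # virtual column triples
    ViewTriple(None, "O.locatedIn", "V.LocatedIn"),
    ViewTriple(None, "O.hasColor", "V.HasColor"),
    # ontology triples
    ViewTriple("W.type", "O.type", None),
    ViewTriple("W.origin", "O.locatedIn", None),
]


def get_view_triple(triples, view_col):
    """Return the triple whose view column matches, or None if there is none."""
    for triple in triples:
        if triple.view_col == view_col:
            return triple
    return None


def is_virtual(triple):
    """A virtual column has an ontology property but no base table column."""
    return triple.base is None and triple.relation is not None
```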

Algorithm 1 REWRITE(Q, G, R, V)
Input: Q is a set of atomic predicates, G is the implication graph, R is the set of recursive implications, V is the view definition
Output: Q′ is the set of expanded predicate expressions
    Let Q = {A1, A2, A3, ...}
    Q′ ← ∅
    for all Ai in Q do
        Let Ai = (vcol, op, value)
        (b, r, vcol) ← getViewTriple(V, vcol)
        if r = ε then
            Q′ ← Q′ ∪ {(b, op, value)}
        else
            a ← findRuleNode(G, R, (r, op, value))
            if a not found then
                Q′ ← Q′ ∪ {false}
            else
                A′ ← EXPAND(a, G, R, V)
                if A′ = null then
                    Q′ ← Q′ ∪ {false}
                else
                    Q′ ← Q′ ∪ {A′}
    return Q′

Algorithm 1 REWRITE(Q, G, R, V) Rewriting algorithm for a WHERE clause Q in a query on a virtual view. It takes as input the set of atoms from the WHERE clause Q, the implication graph G, the set of recursive relationships R, and the virtual view definition V, and outputs a rewritten query expression Q′. The getViewTriple procedure retrieves from the DBMS catalog tables the view triple associated with the column in the atom. If the column is a virtual column, EXPAND is called.
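The control flow of REWRITE can be sketched in a few lines. In this sketch the graph G and the recursive set R are hidden behind two callables, `find_rule_node` and `expand`, which are stubs invented for the example; the view triples are a plain dictionary rather than DBMS catalog tables.

```python
EPS = None  # stands in for the empty component of a view triple


def rewrite(atoms, view_triples, find_rule_node, expand):
    """Rewrite each (vcol, op, value) atom into a predicate on base columns."""
    rewritten = []
    for vcol, op, value in atoms:
        base, relation, _ = view_triples[vcol]
        if relation is EPS:
            # Ordinary column: substitute the base table column directly.
            rewritten.append((base, op, value))
            continue
        # Virtual column: locate the rule node, then expand it.
        node = find_rule_node((relation, op, value))
        expanded = expand(node) if node is not None else None
        rewritten.append(False if expanded is None else expanded)
    return rewritten


# Hypothetical inputs for illustration only.
VIEW_TRIPLES = {
    "Price": ("W.price", EPS, "Price"),
    "HasColor": (EPS, "hasColor", "HasColor"),
}
KNOWN_NODES = {("hasColor", "=", "red")}


def find_rule_node(atom):
    return atom if atom in KNOWN_NODES else None


def expand_stub(node):
    # Stub: pretend the only rule is (type = Zinfandel) => (hasColor = red).
    return ("W.type", "=", "Zinfandel")


result = rewrite(
    [("Price", ">=", 30), ("HasColor", "=", "red")],
    VIEW_TRIPLES, find_rule_node, expand_stub,
)
```

An atom on a virtual column with no matching rule node, such as a color the ontology never mentions, is rewritten to `False`, mirroring the false branches of the pseudocode.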

Algorithm 2 EXPAND(h, G, R, V)
Input: h is the node in implication graph G to be expanded, R is the set of recursive relationships, and V is the virtual view definition
Output: e is the expanded predicate expression
    if h is a ground node then
        (b, r, v) ← getViewTriple(V, pred(h))
        e ← {(b, op(h), val(h))}
        if h is a recursive node then
            e ← e + 'ISSUBSUMED(val(h), pred(h), b)'
    /* R-Expansion */
    if h is a recursive node then
        for all s in subsumedAtoms(R, h) do
            if s in P(G) then
                for all rulebody in dependentExp(s, G) do
                    tmp ← null
                    for all i in rulebody do
                        tmp ← tmp ∧ EXPAND(i, G, R, V)
                    e ← e ∨ tmp
    /* G-Expansion */
    for all rulebody in dependentExp(h, G) do
        tmp ← null
        for all i in rulebody do
            tmp ← tmp ∧ EXPAND(i, G, R, V)
        e ← e ∨ tmp
    return e

Algorithm 2 EXPAND(h, G, R, V) If h is a ground node, h is associated with a base table column and can be checked against that column directly. If h is also recursive, an additional subsumption check needs to be added to the rewritten predicate. When h is not a ground node, the algorithm needs to traverse the implication graph. We also need to expand h with the non-recursive rules contained in the implication graph G, i.e. G-expansions.
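A compact sketch of EXPAND over plain dictionaries follows. The data layout, the nested-tuple expression format and the sample rules are assumptions for illustration. One further assumption: a rule body containing an atom that cannot be expanded contributes nothing to the result, which is how this sketch handles the null case.

```python
def expand(h, rules, ground, recursive, subsumed):
    """Expand atom h = (pred, op, val) into a predicate on base columns.

    rules:     head atom -> list of rule bodies (each a list of atoms)
    ground:    ontology property -> base table column
    recursive: set of properties with a transitive hierarchy
    subsumed:  atom -> atoms it subsumes in the hierarchy
    """
    pred, op, val = h
    alternatives = []  # disjuncts of the result
    if pred in ground:
        col = ground[pred]
        alternatives.append((col, op, val))
        if pred in recursive:
            alternatives.append(("ISSUBSUMED", val, pred, col))
    bodies = []
    if pred in recursive:                  # R-expansion
        for s in subsumed.get(h, []):
            bodies.extend(rules.get(s, []))
    bodies.extend(rules.get(h, []))        # G-expansion
    for body in bodies:
        parts = [expand(i, rules, ground, recursive, subsumed) for i in body]
        if all(p is not None for p in parts):
            alternatives.append(parts[0] if len(parts) == 1 else ("AND", parts))
    if not alternatives:
        return None
    return alternatives[0] if len(alternatives) == 1 else ("OR", alternatives)


# Hypothetical inputs: two rules that both imply hasColor = red.
RULES = {
    ("hasColor", "=", "red"): [
        [("type", "=", "Zinfandel")],
        [("type", "=", "RedWine")],
    ],
}
GROUND = {"type": "W.type", "locatedIn": "W.origin"}
RECURSIVE = {"locatedIn"}

red_expr = expand(("hasColor", "=", "red"), RULES, GROUND, RECURSIVE, {})
us_expr = expand(("locatedIn", "=", "US"), RULES, GROUND, RECURSIVE, {})
```

Here `red_expr` becomes a disjunction of two checks on `W.type`, while `us_expr` pairs the direct comparison on `W.origin` with a subsumption check, since locatedIn is both ground and recursive in this setup.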

Example Consider the following SQL query on the virtual view WineView(id, hasColor): SELECT V.Id FROM WineView AS V WHERE V.hasColor = v1; where the virtual view definition consists of the following triples: {(id, ε, id), (ε, A, hasColor), (type, B, ε), (origin, D, ε)}

Example Algorithm 1 calls the EXPAND procedure to expand the query predicate. Only B=v2 and D=v4 are ground nodes in G, so EXPAND tries to traverse G and the tree for C towards the ground nodes. SELECT W.id FROM Wine AS W WHERE W.type=v2 AND W.origin=v4; Implication Graph

OPTIMIZATION Definition (Live and Dead Nodes) For a given virtual view definition, a node n ∈ V(G) from the implication graph G is a live node if (1) n ∈ P(G) and n is a ground node or a recursive node, or (2) there exists some u ∈ Adj(n) such that u is a live node. Conversely, a node n that is not a live node is called a dead node.
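The definition translates directly into a recursive check over the adjacency lists. The sketch below assumes every node appears as a key in `adj`, that Adj(n) lists the successors of n, and that the graph is acyclic as stated for implication graphs; the node names are placeholders.

```python
def live_nodes(adj, predicate_nodes, ground_or_recursive):
    """Return the set of live nodes according to the two-part definition."""
    memo = {}

    def is_live(n):
        if n not in memo:
            base_case = n in predicate_nodes and n in ground_or_recursive
            memo[n] = base_case or any(is_live(u) for u in adj.get(n, []))
        return memo[n]

    return {n for n in adj if is_live(n)}


# Placeholder graph: A depends on the conjunction of B and C; E depends on F.
ADJ = {
    "A": ["AND1"],
    "AND1": ["B", "C"],
    "B": [],
    "C": [],
    "E": ["F"],
    "F": [],
}
PREDICATES = {"A", "B", "C", "E", "F"}
GROUND_OR_RECURSIVE = {"B"}  # only B maps to a base column in this example

live = live_nodes(ADJ, PREDICATES, GROUND_OR_RECURSIVE)
dead = set(ADJ) - live
```

In this example A and AND1 are live because they reach B, while C, E and F are dead because no path from them leads to a ground or recursive node.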

Optimization Mark nodes in the implication graph G that are dead because there is no path from those nodes to any recursive or ground node. The expansion of the atoms in a rule body can be safely skipped if at least one of the atoms is dead. To prune the number of expansions due to R-expansion, mark nodes in the recursive tree R that are not associated with live nodes. The fourth optimization uses memoization to avoid traversing nodes in the implication graph more than once.
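The effect of memoization is easiest to see on a graph where two rule bodies share a node. The sketch below uses `functools.lru_cache` as one way to memoize; the slides do not say how the original system caches results, and the graph, column name and string output format are placeholders.

```python
from functools import lru_cache

# Placeholder DAG: A follows from B or from C, and both follow from D.
BODIES = {"A": [["B"], ["C"]], "B": [["D"]], "C": [["D"]], "D": []}
GROUND = {"D": "T.d"}  # only D maps to a base table column
visits = {"plain": 0, "memo": 0}


def expand_plain(node):
    """Expand without caching: shared nodes are traversed once per path."""
    visits["plain"] += 1
    if node in GROUND:
        return GROUND[node]
    alts = [" AND ".join(expand_plain(i) for i in body) for body in BODIES[node]]
    return "(" + " OR ".join(alts) + ")"


@lru_cache(maxsize=None)
def expand_memo(node):
    """Same expansion, but each node is traversed at most once."""
    visits["memo"] += 1
    if node in GROUND:
        return GROUND[node]
    alts = [" AND ".join(expand_memo(i) for i in body) for body in BODIES[node]]
    return "(" + " OR ".join(alts) + ")"


plain_result = expand_plain("A")
memo_result = expand_memo("A")
```

Both calls produce the same expression, but the uncached version visits D twice (once through B, once through C), whereas the cached version visits each of the four nodes once.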

Experiments

Experiments (cont’d)