Unifying Data and Domain Knowledge Using Virtual Views. Lipyeow Lim, Haixun Wang, Min Wang (IBM T.J. Watson Research Center), VLDB 2007. Summarized 2008. 1. 4.

Presentation transcript:

Unifying Data and Domain Knowledge Using Virtual Views
Lipyeow Lim, Haixun Wang, Min Wang (IBM T.J. Watson Research Center), VLDB 2007
Summarized by Gong GI Hyun, IDS Lab., Seoul National University

Copyright © 2008 by CEBT. IDS Lab. Seminar, Center for E-Business Technology.

Background
• DBMSs were originally designed for transaction data.
• Current DBMSs are not ready to manipulate data in connection with knowledge.
• An unending quest: database or knowledge base?
  - New applications: the Semantic Web, etc.
• New extensions are required to bridge the gap between data representation and knowledge representation/inferencing.

A Motivating Example
• An RDBMS lets us query wines through the attributes ID, Type, Origin, Maker, and Price.
• Human intelligence operates quite differently: humans can combine data with domain knowledge.

A Motivating Example: Query 1
• Find wines that originate from the US.
  - RDB case:
      SELECT W.ID FROM Wine AS W WHERE W.Origin = 'US';
      Result: no rows.
  - Human's answer: Zinfandel
      Zinfandel's origin, EdnaValley, is located in California, which is in the US.

A Motivating Example: Query 2
• Which wine is a red wine?
  - RDB case:
      SELECT W.ID FROM Wine AS W WHERE W.hasColor = 'red';
      Result: ERROR! hasColor is not in the schema of the Wine table.
  - Human's answer: Zinfandel (Zinfandel is red).
• Both the user and the DBMS must know what hasColor stands for when it appears in a query, and how to derive the value of hasColor for any given wine.
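The two failing queries in the motivating example can be reproduced with a small sketch. It uses Python's standard sqlite3 module as a stand-in for a full RDBMS. The sample rows (makers, prices, the Riesling row and its origin) are invented for illustration; only the Zinfandel/EdnaValley pairing comes from the slides.

```python
import sqlite3

# In-memory stand-in for the Wine table; the rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Wine (ID INTEGER, Type TEXT, Origin TEXT, Maker TEXT, Price REAL)"
)
conn.executemany(
    "INSERT INTO Wine VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Zinfandel", "EdnaValley", "MakerA", 15.0),
        (2, "Riesling", "Mosel", "MakerB", 20.0),
    ],
)

# Query 1: no row stores 'US' literally, so the result is empty.
query1_rows = conn.execute(
    "SELECT W.ID FROM Wine AS W WHERE W.Origin = 'US'"
).fetchall()

# Query 2: hasColor is not a column of Wine, so the statement is rejected.
try:
    conn.execute("SELECT W.ID FROM Wine AS W WHERE W.hasColor = 'red'")
    query2_error = None
except sqlite3.OperationalError as exc:
    query2_error = str(exc)

print(query1_rows)
print(query2_error)
```

Neither outcome is a defect of the engine: the stored data simply does not contain the knowledge needed to answer the question.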

Domain Knowledge from OWL Ontology
• Wine ontology expressed in the Web Ontology Language, OWL (W3C).
• Extract class hierarchies, (transitive) properties, and implications from OWL.

Challenges
• How to incorporate domain knowledge (an ontology) into an RDBMS?
  - The relational data model remains ill-suited for semi-structured data.
• How to integrate relational data with domain knowledge?
• How to query relational data with meaning?
• How to process such queries?

Overview of Our Solution
• Create a relational virtual view on top of the data and the domain knowledge.
  - Data and knowledge can be queried together.
  - New knowledge can be derived.
• The virtual view is an interface through which users can query data, domain knowledge, and derived knowledge in a seamlessly unified manner.
• Queries on the virtual view are rewritten into queries on the base tables.

The Virtuality of the View
• Users create virtual views over the relational data and the ontology.
• Virtual columns/attributes are not in the original data.
• Virtual columns are not materialized; they are inferred from the ontology.

Creating the Virtual View
    CREATE VIEW WineView(Id, Type, Origin, Maker, Price, LocatedIn) AS
      SELECT W.*, R.Regions
      FROM Wine AS W, RegionKnowledge AS R
      WHERE W.Origin = R.region

Queries
• Query 1: find wines that originate from the US.
    SELECT ID FROM WineView WHERE 'US' IN LocatedIn;
• Query 2: which wine is a red wine?
    SELECT ID FROM WineView WHERE hasColor = 'red';
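One way to get a feel for WineView and Query 1 is to emulate them in plain SQL. The sketch below is not the authors' mechanism (the slides describe query rewriting over an ontology repository); it swaps in a recursive common table expression to derive the transitive locatedIn relation. The table LocatedInFact, the column name ancestor, and all sample rows are assumptions. The set-valued LocatedIn column is flattened to one row per enclosing region, so the slide's `'US' IN LocatedIn` becomes `LocatedIn = 'US'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Wine (ID INTEGER, Type TEXT, Origin TEXT, Maker TEXT, Price REAL);
    INSERT INTO Wine VALUES (1, 'Zinfandel', 'EdnaValley', 'MakerA', 15.0);
    INSERT INTO Wine VALUES (2, 'Riesling', 'Mosel', 'MakerB', 20.0);

    -- Direct locatedIn facts, as they might be extracted from an ontology (assumed).
    CREATE TABLE LocatedInFact (region TEXT, parent TEXT);
    INSERT INTO LocatedInFact VALUES ('EdnaValley', 'California');
    INSERT INTO LocatedInFact VALUES ('California', 'US');
    INSERT INTO LocatedInFact VALUES ('Mosel', 'Germany');

    -- Transitive closure of locatedIn, computed at query time (nothing is stored).
    CREATE VIEW RegionKnowledge AS
      WITH RECURSIVE closure(region, ancestor) AS (
        SELECT region, parent FROM LocatedInFact
        UNION
        SELECT c.region, f.parent
        FROM closure AS c JOIN LocatedInFact AS f ON f.region = c.ancestor
      )
      SELECT region, ancestor FROM closure;

    -- One row per (wine, enclosing region).
    CREATE VIEW WineView AS
      SELECT W.ID AS Id, W.Type AS Type, W.Origin AS Origin,
             W.Maker AS Maker, W.Price AS Price, R.ancestor AS LocatedIn
      FROM Wine AS W JOIN RegionKnowledge AS R ON W.Origin = R.region;
    """
)

# Query 1 against the emulated view.
us_wines = conn.execute(
    "SELECT DISTINCT Id FROM WineView WHERE LocatedIn = 'US'"
).fetchall()
print(us_wines)
```

The view is evaluated lazily, which mirrors the slides' point that virtual columns are inferred rather than materialized.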

Physical Storage Layer
• The ontology is modeled as semi-structured data.
  - Traditional RDBMSs cannot handle it directly.
• Hybrid relational-XML DBMS
  - IBM DB2 9 pureXML supports XML.

Ontology Repository
• The ontology repository extracts several types of information from the ontology files, including class hierarchies, implication rules, and transitive properties.
• Class hierarchies: subClassOf (isA)
• Transitive properties: TransitiveProperty
• Implication rules: implication graph
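A minimal sketch of what class hierarchies and transitive properties amount to once extracted: both are edge lists whose transitive closure answers "is X a kind of Y?" or "is X located in Y?". The helper name `reachable` and every fact below are illustrative assumptions, not the repository's actual format.

```python
from collections import defaultdict


def reachable(edges):
    """Map each source node to every node reachable via one or more edges."""
    direct = defaultdict(set)
    for src, dst in edges:
        direct[src].add(dst)
    closure = {}
    for start in list(direct):
        seen, stack = set(), list(direct[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(direct.get(node, ()))
        closure[start] = seen
    return closure


# Hypothetical subClassOf edges (child, parent).
sub_class_of = [
    ("Riesling", "WhiteWine"),
    ("WhiteWine", "Wine"),
    ("Zinfandel", "RedWine"),
    ("RedWine", "Wine"),
]
# Hypothetical facts for a transitive property (region, enclosing region).
located_in = [("EdnaValley", "California"), ("California", "US")]

superclasses = reachable(sub_class_of)
regions = reachable(located_in)
print(sorted(superclasses["Riesling"]))
print(sorted(regions["EdnaValley"]))
```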

Query Expansion
• Query on the virtual view:
    SELECT V.Id FROM WineView AS V WHERE V.hasColor = 'white';
  - Implication rules:
      (Type = WhiteWine) → (hasColor = white)
      (Type = Riesling) → (hasColor = white)
• Rewritten query on the base table:
    SELECT W.Id FROM Wine AS W WHERE W.Type = 'WhiteWine' OR W.Type = 'Riesling';
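The expansion step can be sketched as a small rewriting function: a predicate on a virtual column is replaced by the disjunction of the base-column premises that imply it. This is a simplified, single-step illustration under stated assumptions, not the authors' algorithm; the function name `rewrite`, the rule encoding, and the sample rows are hypothetical.

```python
import sqlite3

# Implication rules from the slide: (base column, value) implies (virtual column, value).
RULES = [
    (("Type", "WhiteWine"), ("hasColor", "white")),
    (("Type", "Riesling"), ("hasColor", "white")),
]
BASE_COLUMNS = {"ID", "Type", "Origin", "Maker", "Price"}


def rewrite(virtual_column, value, rules=RULES):
    """Rewrite `virtual_column = value` into SQL over the Wine base table."""
    premises = [p for p, c in rules if c == (virtual_column, value)]
    if not premises:
        # No rule derives this value, so no wine can qualify.
        return "SELECT W.ID FROM Wine AS W WHERE 0", []
    for column, _ in premises:
        if column not in BASE_COLUMNS:
            raise ValueError(f"premise on non-base column: {column}")
    where = " OR ".join(f"W.{column} = ?" for column, _ in premises)
    return f"SELECT W.ID FROM Wine AS W WHERE {where}", [v for _, v in premises]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Wine (ID INTEGER, Type TEXT, Origin TEXT, Maker TEXT, Price REAL)"
)
conn.executemany(
    "INSERT INTO Wine VALUES (?, ?, ?, ?, ?)",
    [(1, "Zinfandel", "EdnaValley", "MakerA", 15.0),
     (2, "Riesling", "Mosel", "MakerB", 20.0)],
)

sql, params = rewrite("hasColor", "white")
white_wines = conn.execute(sql, params).fetchall()
print(sql, params, white_wines)
```

Values are passed as bound parameters rather than spliced into the SQL text, and column names are checked against a fixed whitelist before being formatted in.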

Experiment
• Investigate the time needed to rewrite queries on virtual views.
• Measurement: rewriting time averaged over 5 randomly generated data sets.
• Some tweaks:
  - Remove dead nodes.
  - Memoization techniques.
  - Pre-computation of predicate rewriting.
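The slides name memoization as a tweak without describing how it is applied. As one hedged illustration, the sketch below caches the set of base-table predicates that imply a given predicate in a multi-step implication graph, so repeated rewrites of the same predicate skip the traversal. The graph contents, the name `base_predicates`, and the acyclicity assumption are all hypothetical.

```python
from functools import lru_cache

# Hypothetical implication graph: each predicate maps to the predicates that imply it.
IMPLIED_BY = {
    ("hasColor", "white"): (("Type", "WhiteWine"),),
    ("Type", "WhiteWine"): (("Type", "Riesling"),),
}
BASE_COLUMNS = frozenset({"Type", "Origin", "Maker", "Price"})


@lru_cache(maxsize=None)
def base_predicates(predicate):
    """Return all base-table predicates that imply `predicate`.

    Assumes the implication graph is acyclic; a cycle would recurse forever.
    """
    found = set()
    if predicate[0] in BASE_COLUMNS:
        found.add(predicate)
    for premise in IMPLIED_BY.get(predicate, ()):
        found |= base_predicates(premise)
    return frozenset(found)


first = base_predicates(("hasColor", "white"))
second = base_predicates(("hasColor", "white"))  # served from the cache
print(sorted(first))
print(base_predicates.cache_info())
```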

Experiment
• Implication graph density
• Size of transitive property trees

Related Work
• Ontology tools
  - OntoEdit: uses a file system to store the ontology.
  - RStar, KAON: allow the ontology data to be stored in an RDB.
• Two limitations of this loosely coupled approach
  - DBMS users cannot reference ontology data directly.
  - Ontology-related query processing cannot leverage the query processing and optimization power of a DBMS.
• A recent advance in ontology management in DBMSs was introduced by Oracle.

Conclusion
• A framework for putting a little semantics into relational SQL systems.
• Users register ontologies in the DBMS and link them with relational data by creating virtual views.
• Virtual columns in the virtual views are not materialized.
• Queries on the virtual columns are rewritten to predicates on base table columns.
• Future work: performance issues.