1 Ontology Enabled Data Discovery and Integration Kai Lin San Diego Supercomputer Center University of California, San Diego A. K. Sinha, Z. Malik, A.


1 Ontology Enabled Data Discovery and Integration Kai Lin San Diego Supercomputer Center University of California, San Diego A. K. Sinha, Z. Malik, A. Rezgui, A. Dalton Virginia Tech

2 Motivations: a better way to discover and understand datasets (use the knowledge in ontologies to find datasets); a better way to query datasets (query through ontologies without knowing the schemas); a better way to integrate multiple datasets (integrate multiple datasets on the fly if they are mapped to ontologies).

3 What Is an Ontology? A formal, explicit specification of a shared conceptualization: an unambiguous definition of all concepts, attributes and relationships; machine-readable; a commonly accepted understanding; a conceptual model of a domain.

4 Why Represent Domain Knowledge as an Ontology? Separate the domain knowledge module from the operational module; make the knowledge module configurable; share and reuse domain knowledge; analyze domain knowledge.

5 What's Inside an Ontology? Concepts: classes plus a class hierarchy, and their instances. Properties (often also called "roles" or "slots"): labeled instance-value pairs. Axioms/relations: –relations between classes (disjoint, covers) –inheritance (multiple? defaults?) –restrictions on slots (type, cardinality) –characteristics of slots (symmetric, transitive, …). Reasoning tasks: –Classification: which classes does an instance belong to? –Subsumption: does one class subsume another? –Consistency checking: is there a contradiction in my axioms/instances?
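The three reasoning tasks can be illustrated with a toy in-memory ontology. This is a minimal sketch, not GEON code and not a real OWL reasoner: it only follows subClassOf edges and checks pairwise disjointness, and the class `ToyOntology` and the rock classes used with it are hypothetical.

```java
import java.util.*;

/** Toy ontology (hypothetical): subsumption, classification and a disjointness-based consistency check. */
final class ToyOntology {
    private final Map<String, Set<String>> superClasses = new HashMap<>();  // direct subClassOf edges
    private final Set<List<String>> disjointPairs = new HashSet<>();
    private final Map<String, Set<String>> assertedTypes = new HashMap<>(); // instance -> asserted classes

    void subClassOf(String sub, String sup) {
        superClasses.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    void disjoint(String a, String b) {
        disjointPairs.add(List.of(a, b));
        disjointPairs.add(List.of(b, a));
    }

    void type(String instance, String cls) {
        assertedTypes.computeIfAbsent(instance, k -> new HashSet<>()).add(cls);
    }

    /** All classes reachable from cls via subClassOf, including cls itself. */
    Set<String> ancestors(String cls) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(cls));
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (seen.add(c)) {
                todo.addAll(superClasses.getOrDefault(c, Set.of()));
            }
        }
        return seen;
    }

    /** Subsumption: does sup subsume sub? */
    boolean subsumes(String sup, String sub) {
        return ancestors(sub).contains(sup);
    }

    /** Classification: every class the instance belongs to, asserted or inferred. */
    Set<String> classify(String instance) {
        Set<String> result = new TreeSet<>();
        for (String c : assertedTypes.getOrDefault(instance, Set.of())) {
            result.addAll(ancestors(c));
        }
        return result;
    }

    /** Consistency: no instance may belong to two disjoint classes. */
    boolean isConsistent() {
        for (String instance : assertedTypes.keySet()) {
            Set<String> types = classify(instance);
            for (String a : types) {
                for (String b : types) {
                    if (disjointPairs.contains(List.of(a, b))) return false;
                }
            }
        }
        return true;
    }
}
```

For example, with Sandstone ⊑ SedimentaryRock ⊑ Rock and SedimentaryRock disjoint from IgneousRock, an instance typed as Sandstone is classified as Sandstone, SedimentaryRock and Rock; typing the same instance also as a subclass of IgneousRock makes the ontology inconsistent.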

6 Resource Description Framework (RDF). XML Schema is not enough for semantics: it only describes grammar, i.e. the syntax of single documents; it cannot express inheritance between concepts; it has no means to express complex integrity constraints unambiguously. Resource Description Framework (RDF): an infrastructure for the encoding, exchange and reuse of structured metadata. Example statement: "The author of 'page.html' is Peter Morris." [Figure: several different XML encodings of this statement] What is the "correct" way of expressing it?

7 RDF Idea. RDF is intended to provide a simple way to make statements about resources. Resources: objects that are uniquely identified by a URI (Uniform Resource Identifier). Anything can have a URI: an entire Web page, a whole collection of pages (e.g. an entire Web site), or an object that is not directly accessible via the Web, such as a printed book. Property: a specific aspect, characteristic, attribute, or relation used to describe a resource; it has a specific meaning and defines its permitted values (e.g. Lives-In, CarColor, WorkFor, HasA, IncludedIn, hasAuthor). Statement: a specific resource together with a named property plus the value of that property for that resource. Each RDF statement can be written as a triple (Subject, Property, Object) or as a graph: Resource --property--> Value (a literal or another resource).
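The triple model can be sketched with a record and a list. This is an illustration only, not an RDF library (a real application would use one such as Apache Jena); the `example.org` URIs and the `TripleStore` class are hypothetical.

```java
import java.util.*;

/** An RDF statement: (subject, property, object). */
record Triple(String subject, String property, String object) {}

/** Minimal in-memory triple store (hypothetical illustration, not an RDF library). */
final class TripleStore {
    private final List<Triple> triples = new ArrayList<>();

    void add(String subject, String property, String object) {
        triples.add(new Triple(subject, property, object));
    }

    /** Objects of all statements with the given subject and property. */
    List<String> objects(String subject, String property) {
        List<String> out = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject().equals(subject) && t.property().equals(property)) {
                out.add(t.object());
            }
        }
        return out;
    }
}
```

The statement "The author of 'page.html' is Peter Morris" then becomes one triple, e.g. `add("http://example.org/page.html", "http://example.org/terms#hasAuthor", "Peter Morris")`, regardless of which XML encoding was used to write it down.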

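8 An RDF Example: an RDF/XML document (an <rdf:RDF> element declaring the rdf and dc namespaces) describing a resource whose author has the name (hasName) Peter Morris, with the creationDate April 1, 2004 and the language English.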

9 A General RDF Format. [Figure: template of an RDF description with a value of property-A, a value of property-B and Value-C] Convention: a capital letter starts a type (class) name; a lowercase letter starts a property name.

10 RDF Schema (RDFS). Core classes: rdfs:Resource, rdfs:Literal, rdf:XMLLiteral, rdfs:Class, rdf:Property, rdfs:Datatype, rdfs:Container. Core properties: rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, rdfs:label, rdfs:comment. RDFS is a simple ontology language. RDF: triples for making assertions about resources. RDFS extends RDF with a "schema vocabulary", e.g.: –Class, Property –type, subClassOf, subPropertyOf –range, domain. It can represent simple assertions, taxonomy and typing.

11 RDFS Example. [Figure: an RDFS graph (legend: Resource, Class, Property) with the classes Vehicle, SeaVehicle, LandVehicle, HoverVehicle, Company and Number, linked by subClassOf, type, producedBy and numberOfEngine]
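Besides subClassOf, RDFS typing comes from rdfs:domain and rdfs:range: any subject of a property belongs to the property's domain class, and any object belongs to its range class. A minimal sketch of those two inference patterns, assuming (for illustration) that producedBy has domain Vehicle and range Company; the instance names are hypothetical.

```java
import java.util.*;

/** Sketch of the rdfs:domain and rdfs:range typing patterns (hypothetical, not a full RDFS reasoner). */
final class RdfsDomainRange {
    record Triple(String s, String p, String o) {}

    /** Returns the rdf:type triples implied by the domain/range declarations. */
    static Set<Triple> inferTypes(List<Triple> data, Map<String, String> domain, Map<String, String> range) {
        Set<Triple> inferred = new LinkedHashSet<>();
        for (Triple t : data) {
            String d = domain.get(t.p());
            if (d != null) inferred.add(new Triple(t.s(), "rdf:type", d)); // subject gets the domain class
            String r = range.get(t.p());
            if (r != null) inferred.add(new Triple(t.o(), "rdf:type", r)); // object gets the range class
        }
        return inferred;
    }
}
```

Because these declarations are global to the property, every subject of producedBy becomes a Vehicle; this is exactly why RDFS cannot express localized domain and range constraints.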

12 Limitations of RDFS. RDFS is too weak to describe resources in sufficient detail: –No localized range and domain constraints: can't say that the range of hasChild is person when applied to persons and elephant when applied to elephants. –No existence/cardinality constraints: can't say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents. –No transitive, inverse or symmetric properties: can't say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf, or that touches is symmetric. –No (in)equality: can't say that a class/instance is the same as some other class/instance, or that some classes/instances are definitely disjoint/different. –No Boolean algebra: can't say that one class is the union, intersection or complement of other classes, etc.

13 OWL Language: Overview. Three species of OWL: –OWL DL stays in the Description Logic fragment –OWL Lite is an "easier to implement" subset of OWL DL –OWL Full is the union of OWL syntax and RDF. OWL DL is based on Description Logic; in fact it is equivalent to the SHOIN(D_n) DL. OWL DL benefits from many years of DL research: –well-defined semantics –well-understood formal properties (complexity, decidability) –known reasoning algorithms –implemented (highly optimized) systems. OWL Full has all that plus all the possibilities of RDF/RDFS, which destroy decidability. [Figure: nested layers Lite ⊂ DL ⊂ Full]

14 OWL Layers (Lite, DL, Full). OWL Full: allows meta-classes, etc. OWL DL adds: negation (disjointWith, complementOf), unionOf, full cardinality, enumerated types (oneOf), hasValue. OWL Lite: (sub)classes, individuals, (sub)properties, domain, range, intersection, (in)equality, cardinality 0/1, datatypes, inverse, transitive and symmetric properties, someValuesFrom, allValuesFrom. [Figure: nested layers RDF Schema ⊂ OWL Lite ⊂ OWL DL ⊂ OWL Full]

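15 Ontology Inconsistency. You may define classes that no individual can satisfy. Reasoning engines can find such definitions even in big ontologies. Example: $\mathit{Cow} \equiv \mathit{Animal} \sqcap \mathit{Vegetarian}$; $\mathit{Sheep} \sqsubseteq \mathit{Animal}$; $\mathit{Vegetarian} \equiv \forall \mathit{eats}.\neg\mathit{Animal}$; $\mathit{MadCow} \equiv \mathit{Cow} \sqcap \exists \mathit{eats}.\mathit{Sheep}$.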
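Why MadCow can have no instances follows in a few steps, assuming the standard set-theoretic semantics of description logics: take any $x$ in MadCow and derive a contradiction.

```latex
\begin{align*}
x \in \mathit{MadCow} &\Rightarrow x \in \mathit{Cow} \ \text{and}\ \exists y:\ \mathit{eats}(x,y) \wedge y \in \mathit{Sheep}\\
y \in \mathit{Sheep} &\Rightarrow y \in \mathit{Animal} && (\mathit{Sheep} \sqsubseteq \mathit{Animal})\\
x \in \mathit{Cow} &\Rightarrow x \in \mathit{Vegetarian} = \forall \mathit{eats}.\neg\mathit{Animal} && (\mathit{Cow} \equiv \mathit{Animal} \sqcap \mathit{Vegetarian})\\
\mathit{eats}(x,y) &\Rightarrow y \in \neg\mathit{Animal} && \text{(contradicts } y \in \mathit{Animal}\text{)}\\
&\Rightarrow \mathit{MadCow} \sqsubseteq \bot
\end{align*}
```

A reasoner reports MadCow as unsatisfiable; asserting any individual to be a MadCow would make the whole ontology inconsistent.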

16 Open/Closed World Assumption. Closed World Assumption: the facts in the ontology completely describe what I know; everything that is not in the ontology is assumed to be false. Open World Assumption (used in OWL): there may be things that are not described by the ontology. Example: an ontology says "There is a train at 14:00" and "There is a train at 15:00". Is there a train at 17:00? Under the Closed World Assumption: no. Under the Open World Assumption: unknown.
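The train example can be written as two answer functions. A minimal sketch with a hypothetical `WorldAssumption` class; under the open world assumption a "no" requires an explicit negative fact, modeled here as a separate set.

```java
import java.util.Set;

/** Hypothetical sketch: answering "is there a train at t?" under the two assumptions. */
final class WorldAssumption {
    enum Answer { YES, NO, UNKNOWN }

    /** Closed world: anything not stated is false. */
    static Answer closedWorld(Set<String> knownTrains, String time) {
        return knownTrains.contains(time) ? Answer.YES : Answer.NO;
    }

    /** Open world: anything not stated, and not explicitly ruled out, is unknown. */
    static Answer openWorld(Set<String> knownTrains, Set<String> knownNoTrain, String time) {
        if (knownTrains.contains(time)) return Answer.YES;
        if (knownNoTrain.contains(time)) return Answer.NO;
        return Answer.UNKNOWN;
    }
}
```

With the facts {14:00, 15:00}, `closedWorld` answers NO for 17:00 while `openWorld` answers UNKNOWN.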

17 Resource Discovery in GEON. A resource registration system for data providers: –register ontologies (domain knowledge) –register datasets with metadata, including data access information –optionally register datasets to ontologies (which is crucial for data integration and smart search). A search engine for data users: –metadata-based search –spatial-coverage-based search –temporal-coverage-based search –concept-based search. Both are available through a public web portal.
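Spatial and temporal coverage search amounts to an overlap test between the query and each dataset's registered coverage. A minimal sketch, assuming (hypothetically) that coverage is stored as a longitude/latitude bounding box plus a date interval; `DatasetMeta`, `CoverageSearch` and the dataset names are illustrative, not the GEON metadata schema.

```java
import java.time.LocalDate;
import java.util.*;

/** Hypothetical dataset metadata with a spatial (bounding box) and temporal coverage. */
record DatasetMeta(String name, double minLon, double minLat, double maxLon, double maxLat,
                   LocalDate start, LocalDate end) {

    /** Two boxes overlap unless one lies entirely to one side of the other. */
    boolean overlapsBox(double qMinLon, double qMinLat, double qMaxLon, double qMaxLat) {
        return minLon <= qMaxLon && qMinLon <= maxLon && minLat <= qMaxLat && qMinLat <= maxLat;
    }

    /** Two closed date intervals overlap unless one ends before the other starts. */
    boolean overlapsPeriod(LocalDate from, LocalDate to) {
        return !start.isAfter(to) && !from.isAfter(end);
    }
}

final class CoverageSearch {
    /** Names of the datasets whose coverage overlaps the query box and period. */
    static List<String> search(List<DatasetMeta> catalog, double qMinLon, double qMinLat,
                               double qMaxLon, double qMaxLat, LocalDate from, LocalDate to) {
        List<String> hits = new ArrayList<>();
        for (DatasetMeta d : catalog) {
            if (d.overlapsBox(qMinLon, qMinLat, qMaxLon, qMaxLat) && d.overlapsPeriod(from, to)) {
                hits.add(d.name());
            }
        }
        return hits;
    }
}
```

This ignores boxes that cross the antimeridian; a production catalog would more likely delegate such tests to a spatial database.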

18 GEON Data Registration System. [Figure: the Resource Registration System creates ADN metadata for each registered resource (Excel, GeoTIFF, Shapefile, files in SRB). Catalog fields include General Information, Ontology Annotations, Access Control, Subjects, Format, Keywords, Spatial Coverage and Temporal Coverage. Other components: Integrated Resources, Log, Resource Metadata, Resource Schemas and GEON Search.]

19 Database Registration. [Figure: the provider selects tables and views of the original database to register; the registered table and view definitions form the published database, which applications access through the GEON JDBC Driver and the GEON Mediator.]

20 Write Protection. The mediator only accepts SELECT statements and rejects any other request (e.g. UPDATE B). [Figure: the mediator in front of a database with tables A, B and C]
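A minimal sketch of the write-protection rule, assuming (hypothetically) that the check is a simple keyword test on a single statement. A real mediator would rely on its SQL parser instead; this version errs on the safe side and also rejects, for example, a SELECT containing a semicolon inside a string literal.

```java
import java.util.Locale;

/** Hypothetical sketch: accept a request only if it is a single SELECT statement. */
final class WriteGuard {
    static boolean isAllowed(String sql) {
        String s = sql.strip();
        if (s.endsWith(";")) s = s.substring(0, s.length() - 1).strip(); // tolerate one trailing semicolon
        if (s.contains(";")) return false;                                // reject multiple statements
        String firstWord = s.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        return firstWord.equals("SELECT");                                // UPDATE, DELETE, DROP, ... are rejected
    }
}
```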

21 Read Protection on Unregistered Tables and Views. An unregistered table or view is invisible to end users: its data can't be viewed with a SELECT statement (e.g. SELECT * FROM A is rejected if A is unregistered), and its schema can't be fetched. [Figure: the mediator exposes only the registered table B of a database with tables A, B and C]
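Read protection can be sketched as a whitelist that filters both metadata requests and queries. This assumes (hypothetically) that the mediator's SQL parser reports which tables a query references and that table names are compared case-insensitively; `RegistrationFilter` is an illustrative name.

```java
import java.util.*;

/** Hypothetical sketch: only registered tables and views are visible through the mediator. */
final class RegistrationFilter {
    private final Set<String> registered = new HashSet<>();

    RegistrationFilter(Collection<String> registeredNames) {
        for (String n : registeredNames) registered.add(n.toLowerCase(Locale.ROOT));
    }

    /** Metadata requests (e.g. listing tables) return only registered names, so other schemas stay hidden. */
    List<String> visibleTables(List<String> allBackendTables) {
        List<String> out = new ArrayList<>();
        for (String t : allBackendTables) {
            if (registered.contains(t.toLowerCase(Locale.ROOT))) out.add(t);
        }
        return out;
    }

    /** A query may run only if every table it references is registered. */
    boolean canQuery(Collection<String> referencedTables) {
        for (String t : referencedTables) {
            if (!registered.contains(t.toLowerCase(Locale.ROOT))) return false;
        }
        return true;
    }
}
```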

22 Item-Level Ontological Data Registration for Discovery. The search engine uses ontologies to find more results. For example, it uses the fact that Polygon is a subclass of GeometricalObject_2D: a search for GeometricalObject_2D also returns the datasets associated with Polygon. [Figure: an ontology fragment with GeometricalObject_2D, Polygon, Rectangle, Circle and Surface; datasets are linked to ontology classes through the properties mentions, uses and has instances]
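Concept-based search can be sketched as query expansion: collect the datasets registered to the requested class and to all of its transitive subclasses. A minimal sketch; `ConceptSearch` and the dataset names are hypothetical, and only the subclass links Polygon ⊑ GeometricalObject_2D and Circle ⊑ GeometricalObject_2D are assumed.

```java
import java.util.*;

/** Hypothetical sketch: concept search that also returns datasets registered to subclasses. */
final class ConceptSearch {
    private final Map<String, Set<String>> subClasses = new HashMap<>();      // class -> direct subclasses
    private final Map<String, Set<String>> datasetsByClass = new HashMap<>(); // class -> registered datasets

    void subClassOf(String sub, String sup) {
        subClasses.computeIfAbsent(sup, k -> new HashSet<>()).add(sub);
    }

    void register(String dataset, String cls) {
        datasetsByClass.computeIfAbsent(cls, k -> new TreeSet<>()).add(dataset);
    }

    /** Datasets registered to the concept or to any of its transitive subclasses. */
    Set<String> search(String concept) {
        Set<String> results = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(concept));
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (!visited.add(c)) continue; // guard against cycles and shared subclasses
            results.addAll(datasetsByClass.getOrDefault(c, Set.of()));
            todo.addAll(subClasses.getOrDefault(c, Set.of()));
        }
        return results;
    }
}
```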

23 Data Integration Challenges: Heterogeneities. Syntactic heterogeneity: heterogeneous data formats, e.g. the same date written in another format vs. 02/04/04. Structural heterogeneity: heterogeneous data models and schemas, e.g. the same value is saved as three columns or as one column. Semantic heterogeneity: fuzzy metadata, terminology, "hidden" semantics, implicit assumptions. GEON's preferred solution: datasets are semantically registered first, and the heterogeneities are resolved by the registration.
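A registration can resolve such differences by mapping every source representation to one canonical value. A minimal sketch with a hypothetical `DateNormalizer`: one source stores a date as a single text column such as "02/04/04", another as three integer columns. Reading "02/04/04" as month/day/year is an assumption; that such a choice is implicit in the data is itself the semantic heterogeneity the registration must make explicit.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch: normalize two differently stored dates to one canonical LocalDate. */
final class DateNormalizer {
    // Assumption: the text column uses month/day/two-digit-year ("yy" resolves to 2000-2099).
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("MM/dd/yy");

    /** Source 1: one text column such as "02/04/04". */
    static LocalDate fromOneColumn(String text) {
        return LocalDate.parse(text, SLASHED);
    }

    /** Source 2: three integer columns (year, month, day). */
    static LocalDate fromThreeColumns(int year, int month, int day) {
        return LocalDate.of(year, month, day);
    }
}
```

Under this assumption both `fromOneColumn("02/04/04")` and `fromThreeColumns(2004, 2, 4)` yield 2004-02-04, so the two sources can be compared directly.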

24 Database Integration. Integration at three levels. Level 1, federation-based integration: users must be knowledgeable about each database. Level 2, view-based integration: the intended users are people who want to do integration for others or make integration results reusable. Level 3, ontology-based integration: the easiest way for end users.

25 Level 1: Federation-Based Integration. Use SQL to query the federated database, e.g. SELECT * FROM A, E WHERE ……; structural and semantic heterogeneity must be resolved by the users themselves. [Figure: the mediator presents the tables A to G of several backend databases as one federated database]

26 Level 2: View-Based Integration. Allows defining views (e.g. V and W) on top of the federated databases and querying them, e.g. SELECT * FROM V, W WHERE ……; allows hiding the original backend schemas; integration results can be shared and reused. [Figure: views V and W defined in the mediator over the backend tables A to G]

27 Level 3: Ontology-Based Integration. Requires ontology annotations for the backend databases; uses a simple ontology query language to query the integrated database; users don't need to know the backend schemas or local semantics. [Figure: ontology-based queries sent to the mediator over the backend tables A to G]

28 Ontology-Enabled Data Integration. Ontology-enabled semantic integration poses challenges for computer scientists and domain scientists: –Computer scientists: build an integration system based on the ontological registration of datasets –Domain scientists: create domain ontologies –Data providers: register datasets to ontologies. [Figure: dataset1 to dataset4 registered to Ontology1, Ontology2 and Ontology3]

29 Ontological Data Registration for Data Integration. Registering a dataset to an ontology for data integration is a procedure that generates a partial model of the ontology from the dataset itself. [Figure: registration maps a dataset to individuals of the ontology] Not all of the constraints in the ontology are satisfied by the generated individuals.

30 Registering Relational Tables to Ontology Classes. Associate one or more columns, under an optional SQL condition, with a selected class in the ontology. Provide a mapping method if the column values should not be used directly as the names of the generated individuals. Example: for a table with the columns Latitude and Longitude registered to the class Location, (23.5, 47.9) is the name of an individual of the class Location; the same name indicates the same location. Example: the GeologicAge column of a RockSample table (values such as Jurassic/Triassic and Precambrian) registered to the class GeologicalAge (with subclasses such as Precambrian, Cenozoic and Paleozoic).
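Generating individual names from two columns can be sketched as follows; the `(lat, lon)` text format matches the example above, while the class and method names are hypothetical. Rows with the same coordinates produce the same name and therefore the same individual.

```java
import java.util.*;

/** Hypothetical sketch: names of Location individuals generated from Latitude/Longitude columns. */
final class LocationNaming {
    /** The individual's name is the coordinate pair itself, e.g. "(23.5, 47.9)". */
    static String individualName(double latitude, double longitude) {
        return "(" + latitude + ", " + longitude + ")";
    }

    /** Distinct Location individuals generated from table rows of the form {latitude, longitude}. */
    static Set<String> individuals(List<double[]> rows) {
        Set<String> names = new LinkedHashSet<>();
        for (double[] row : rows) names.add(individualName(row[0], row[1]));
        return names;
    }
}
```

Deriving names from the text form of a double is the simplest choice but is sensitive to rounding; a mapping method could instead round coordinates to a fixed precision before naming.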

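31 Registering Tables to Ontology Object Properties. Associate two entities that are already registered to the domain class and the range class of a selected object property in the ontology. Example: in a table with the columns RockSampleID and PERIOD, RockSampleID is registered to Rock and PERIOD to GeologicAge, and the pair of columns is registered to the object property hasAge.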
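Each row of such a table then yields one object-property statement. A minimal sketch, assuming rows arrive as {RockSampleID, PERIOD} string pairs; the class name and sample IDs are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: rows with RockSampleID and PERIOD become hasAge statements. */
final class ObjectPropertyRegistration {
    record Statement(String subject, String property, String object) {}

    static List<Statement> hasAge(List<String[]> rows) {
        List<Statement> out = new ArrayList<>();
        for (String[] row : rows) {
            String rock = row[0];   // already registered to the domain class Rock
            String period = row[1]; // already registered to the range class GeologicAge
            out.add(new Statement(rock, "hasAge", period));
        }
        return out;
    }
}
```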

32 ODAL (Ontological Database Annotation Language). Example: <odal:NamedIndividuals odal:id="RockSample" odal:database="VTDatabase"> over the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images and the column ssID. A GUI generates ODAL, which is passed to the ODAL processor. The statement says that the values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images represent instances of RockSample. ODAL creates a partial model of ontologies from a database; it is independent of any GUI and of any concrete implementation, and it is reusable.

33 ODAL: Import Ontologies. The ontologies used to annotate a database can be imported as follows: <odal:ODAL xmlns:rdf = “ xmlns:owl=" xmlns:odal = “ > ……

34 ODAL: Database Connection Declaration. The target databases to be annotated are declared as follows: <odal:ODAL xmlns:rdf = “ xmlns:owl=" xmlns:odal = “ > …… declaring an Oracle database on host oracle.sdsc.edu, port 3456, named Publications ……

35 ODAL: Simple Named Individuals. Example: <odal:NamedIndividuals odal:id="BookInTableBookPrice" odal:database="PublicationDatabase" > over the schema Collections, the table book-price and the column ISBN. Suppose the book ontology contains a class Book and the schema Collections contains a table book-price with a column ISBN. odal:id gives the declaration a name and represents the set of individuals generated by the statement. The statement says that each value in the column ISBN represents a Book individual.

36 ODAL: Named Individuals from Multiple Columns. Example over California, the table Rock-Sample and the columns Latitude and Longitude. Suppose an ontology contains a class Location and a database table Rock-Sample has two columns, Latitude and Longitude. The statement says that a pair of latitude and longitude values gives a location.

37 ODAL: Named Individuals with Conditions. Example: two declarations over the table employee and the column EmployeeId, each with an SQL condition in a CDATA section. A condition in an odal:Condition element must be a Boolean expression that is valid in the WHERE clause of an SQL query.
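One way an ODAL processor could evaluate a NamedIndividuals declaration is to turn it into a SELECT over the registered columns, appending the odal:Condition text as the WHERE clause. This is a hypothetical sketch, not the GEON ODAL processor: `NamedIndividualsDecl` does not model the ODAL XML elements, and the condition used below is made up. Identifiers are double-quoted so names like book-price stay valid SQL; note that quoted identifiers are case-sensitive in some databases.

```java
import java.util.*;

/** Hypothetical model of a NamedIndividuals declaration and the SQL that fetches its individuals. */
record NamedIndividualsDecl(String schema, String table, List<String> columns, String condition) {

    String toSql() {
        StringJoiner cols = new StringJoiner(", ");
        for (String c : columns) cols.add(quote(c));
        StringBuilder sql = new StringBuilder("SELECT DISTINCT ").append(cols).append(" FROM ");
        if (schema != null) sql.append(quote(schema)).append('.');
        sql.append(quote(table));
        if (condition != null && !condition.isBlank()) {
            sql.append(" WHERE ").append(condition); // any Boolean expression valid in a WHERE clause
        }
        return sql.toString();
    }

    /** Double-quote an identifier, escaping embedded quotes. */
    private static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }
}
```

For the book example, `new NamedIndividualsDecl("Collections", "book-price", List.of("ISBN"), null).toSql()` yields `SELECT DISTINCT "ISBN" FROM "Collections"."book-price"`, and each returned value names one Book individual.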

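38 ODAL: Datatype Property Declaration. Example: a table person with the columns SSN and age; the values of SSN identify Person individuals, and the age column gives the value (of type double) of the datatype property hasAge of each Person.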

39 Conditions for Joining Data from Different Resources. Usually we don't join individuals across different resources. A set of datatype properties can be declared as a key for a class in the ontology, and joins across multiple resources are based on keys. E.g. {hasLatitude, hasLongitude} can be declared as a key of Location: two locations from different resources are the same if they have the same latitude and longitude. Example: one resource identifies Rock individuals by RockSampleID, another by RockID. We don't know whether a given ID value represents the same rock in the two resources; by default, we assume it does not.
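A key-based join can be sketched as a hash join on the declared key: individuals from two resources match only when all key properties are equal, and their local IDs are never compared. `KeyJoin` and the sample IDs are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: join Location individuals from two resources on the key {hasLatitude, hasLongitude}. */
final class KeyJoin {
    record Location(String source, String localId, double hasLatitude, double hasLongitude) {
        List<Double> key() { return List.of(hasLatitude, hasLongitude); }
    }

    /** Pairs {left localId, right localId} whose key values are equal. */
    static List<String[]> join(List<Location> left, List<Location> right) {
        Map<List<Double>, List<Location>> index = new HashMap<>(); // hash index on the right side
        for (Location r : right) index.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        List<String[]> pairs = new ArrayList<>();
        for (Location l : left) {
            for (Location r : index.getOrDefault(l.key(), List.of())) {
                pairs.add(new String[] {l.localId(), r.localId()});
            }
        }
        return pairs;
    }
}
```

Classes without a declared key, such as Rock in the example above, would simply not be joined, which corresponds to the default assumption that IDs from different resources denote different individuals.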

40 SOQL (Simple Ontology Query Language). Query single or integrated resources via ontologies (i.e. high-level logical views), independent of any physical representation (i.e. schemas). [Figure: RockSample has a location (Location, with lat and long of type float) and hasSiO2 (ValueWithUnit, with value of type float and unit of type string)] Example: SELECT X.location.* FROM RockSample X WHERE X.location.lat > 60 AND X.location.long > 100 AND X.hasSiO2.value < 30 AND X.hasSiO2.unit = 'weightPercentage'. A GUI generates SOQL, which is passed to the SOQL processor.

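41 The Architecture of the GEON Semantic Mediator. [Figure: a portal or application reaches the mediator through the Mediator JDBC Driver or a GUI. The SOQL Processor (SOQL Parser, Semantic Query Rewriter, Ontology Reasoner, using OWL ontologies and ODAL annotations loaded by the ODAL Processor) turns SOQL into spatial SQL against the federated schemas; the SQL Parser, Query Planning, Query Optimization and Query Execution components run it against an internal database and backends such as Oracle, DB2, MySQL, SQL Server, PostgreSQL and PostGIS.]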

42 Example Query. Question: find all seismic stations within 1 mile of railroads. The SOQL query from the GEON SOQL GUI: SELECT X.code, X.location.* FROM SeismicStation X, Railroad Y WHERE distance(X.location, Y.geometry) < 1. The SOQL processor rewrites it into SQL against the federated schemas: SELECT X2.stationcode, X2.lat, X2.lon FROM railroads_of_the_united_states X1, stationdatatable X2 WHERE distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1. The mediator then sends SELECT X1.the_geom FROM railroads X1 to the railroad shapefile and SELECT X2.stationcode, X2.lat, X2.lon FROM stationdatatable X2 WHERE bounding box condition to the seismic stations database, and evaluates distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1 itself.
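The two-phase plan (a cheap bounding-box filter at the source, then the exact distance test in the mediator) can be sketched as follows. This is a hypothetical illustration, not the mediator's code: it assumes planar coordinates already in miles and a single railroad polyline, whereas real data would need projected or geodesic distances and many geometries.

```java
import java.util.*;

/** Hypothetical sketch: bounding-box prefilter followed by an exact point-to-polyline distance test. */
final class NearRailroad {
    record Point(double x, double y) {}
    record Station(String code, Point location) {}

    /** Euclidean distance from p to the segment a-b. */
    static double distanceToSegment(Point p, Point a, Point b) {
        double dx = b.x() - a.x(), dy = b.y() - a.y();
        double len2 = dx * dx + dy * dy;
        double t = len2 == 0 ? 0 : ((p.x() - a.x()) * dx + (p.y() - a.y()) * dy) / len2;
        t = Math.max(0, Math.min(1, t)); // clamp the projection onto the segment
        return Math.hypot(p.x() - (a.x() + t * dx), p.y() - (a.y() + t * dy));
    }

    static double distanceToPolyline(Point p, List<Point> line) {
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < line.size(); i++) {
            best = Math.min(best, distanceToSegment(p, line.get(i), line.get(i + 1)));
        }
        return best;
    }

    static List<String> stationsWithin(List<Station> stations, List<Point> railroad, double maxDist) {
        // Phase 1: railroad bounding box expanded by maxDist (cheap; may keep false positives)
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (Point q : railroad) {
            minX = Math.min(minX, q.x()); maxX = Math.max(maxX, q.x());
            minY = Math.min(minY, q.y()); maxY = Math.max(maxY, q.y());
        }
        List<String> result = new ArrayList<>();
        for (Station s : stations) {
            Point p = s.location();
            boolean inBox = p.x() >= minX - maxDist && p.x() <= maxX + maxDist
                    && p.y() >= minY - maxDist && p.y() <= maxY + maxDist;
            // Phase 2: exact distance test, only for the candidates that passed the box filter
            if (inBox && distanceToPolyline(p, railroad) < maxDist) result.add(s.code());
        }
        return result;
    }
}
```

The box filter never discards a station that is truly within the distance, so it can safely be pushed down to the station database as a WHERE condition, while the exact test removes the remaining false positives.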

43 Questions?

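44 How to Connect to GEON Databases. Download the GEON JDBC driver, then use the following code to create a connection. The URL consists of the GEON JDBC protocol (jdbc:geon:), the host name and port number of the GEON Mediator, and the GEON ID.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class GeonConnect {
        public static void main(String[] args) throws Exception {
            Class.forName("org.geongrid.jdbc.driver.Driver"); // load the GEON JDBC driver
            // GEON JDBC protocol, host name and port of the GEON Mediator, then the GEON ID
            String url = "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c d9-a69f";
            Connection conn = DriverManager.getConnection(url, "geonuser", "geongrid"); // open the connection
            conn.close();
        }
    }

Note: the original account information of the backend databases is invisible to end users.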