CYBERINFRASTRUCTURE FOR THE GEOSCIENCES (www.geongrid.org): Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences


Slide 1: Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences
Kai Lin, Chaitan Baru
San Diego Supercomputer Center, University of California, San Diego

Slide 2: Data Integration Goal
Query heterogeneous data sources as a single resource
- Query: no need to write a program ("ad hoc, non-procedural query languages")
- Heterogeneous: each local resource controls the definition of its data
- Single resource: removes the burden of accessing each data source individually

Slide 3: Data Integration Challenges: Heterogeneities
- Syntactic heterogeneity: heterogeneous data formats, e.g. different notations for the same date, such as 02/04/04
- Structural heterogeneity: heterogeneous data models and schemas, e.g. the same value is saved as three columns or as one column
- Semantic heterogeneity: fuzzy metadata, terminology, "hidden" semantics, implicit assumptions
GEON solution: data should first be semantically registered to GEON; heterogeneities are resolved by registration.

Slide 4: Levels of Registration
- Metadata-level registration
  - Register metadata associated with a resource
  - Submit the required metadata. Predefined semantics.
- "Item"-level registration
  - Register the "schema" of a resource, e.g. relational database, shapefiles, …
  - Record the semantics of schema elements, e.g. table name, column name
- "Item-Detail"-level registration
  - Register individual values in a dataset
  - Record the semantics of each item in a record/column

Slide 5: Registering Structured Data
- Relational databases
- Shapefiles -> database tables
- Excel spreadsheets -> database tables
- Delimited ASCII files -> database tables
- Headers of scientific data files, e.g. netCDF

Slide 6: Item Level Database Registration and Access
[Diagram labels: Original Database (Table, View); "select tables and views to register"; Published Database (Table Def, View Def); GEON Mediator; GEON JDBC Driver; Application]

Slide 7: How to Connect to GEON Databases
- Download the GEON JDBC driver
- Use the following code to create a connection

    import java.sql.Connection;
    import java.sql.DriverManager;

    // load driver
    Class.forName("org.geongrid.jdbc.driver.Driver");
    // set the mediator URL
    String url = "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c d9-a69f";
    // open the connection
    Connection conn = DriverManager.getConnection(url, "geonuser", "geongrid");

The URL consists of the GEON JDBC protocol (jdbc:geon), the host name and port number of the GEON Mediator (geon01.sdsc.edu:2532), and the GEON ID.
Note: the original account information is not accessible to end users.
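The URL structure described above can be sketched as a small helper, together with a query wrapper that closes its JDBC resources. This is only an illustration: the class and method names (`GeonUrlSketch`, `mediatorUrl`, `countRows`) and the placeholder ID `GEON-EXAMPLE-ID` are assumptions, not part of the GEON driver, and `countRows` would need the GEON JDBC driver on the classpath and a reachable mediator.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class GeonUrlSketch {
    // Assemble a mediator URL from the three parts named on the slide:
    // protocol prefix, mediator host and port, and the GEON ID of the resource.
    static String mediatorUrl(String host, int port, String geonId) {
        return "jdbc:geon://" + host + ":" + port + "/" + geonId;
    }

    // Run a SELECT and count the returned rows. try-with-resources closes the
    // result set, statement, and connection even if the query fails.
    static int countRows(String url, String user, String password, String sql)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }
}
```

A call such as `GeonUrlSketch.mediatorUrl("geon01.sdsc.edu", 2532, "GEON-EXAMPLE-ID")` yields a URL of the same shape as the one on the slide, with the hypothetical ID in place of a real one.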

Slide 8: GEON Mediator Enables Write Protection
- The mediator only accepts SELECT statements
- It rejects any request other than SELECT
[Diagram: an UPDATE B request arrives at the Mediator in front of the Database; table labels A, B, C, B]
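A minimal sketch of such a SELECT-only gate is shown below. It is not the mediator's actual implementation (the architecture slide lists a full SQL parser); the class name `SelectOnlyGate` and the simple keyword check are assumptions chosen to illustrate the rule. The check is deliberately conservative: it also rejects a SELECT whose string literal contains a semicolon.

```java
class SelectOnlyGate {
    // Accept a request only if it is a single statement that starts with SELECT.
    static boolean isAllowed(String sql) {
        if (sql == null) {
            return false;
        }
        String s = sql.trim();
        // Tolerate one trailing semicolon, then refuse any other semicolon so
        // that stacked statements such as "SELECT 1; DROP TABLE B" are rejected.
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        if (s.contains(";")) {
            return false;
        }
        // Case-insensitive: the keyword SELECT, whitespace, then the rest.
        return s.matches("(?is)select\\s.+");
    }
}
```

With this gate, `SELECT * FROM B` passes, while `UPDATE B SET x = 1` is refused before it reaches the backend database.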

Slide 9: Read Protection for Unregistered Tables and Views
- An unregistered table or view is invisible to an end user
  - The data in the table can't be viewed with a SELECT statement
  - The schema can't be fetched
[Diagram: SELECT * FROM A arrives at the Mediator in front of the Database; table labels A, B, C, B]
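One way to picture this rule is a catalog that knows only the registered names and filters everything else out. The sketch below assumes that the backend holds tables A, B, and C and that only B was registered; the class `RegisteredCatalog` and its methods are hypothetical and stand in for whatever bookkeeping the mediator really does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RegisteredCatalog {
    // Registered table and view names, compared case-insensitively.
    private final Set<String> registered = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

    RegisteredCatalog(String... registeredNames) {
        for (String name : registeredNames) {
            registered.add(name);
        }
    }

    // A table or view can be queried only if it was registered.
    boolean isVisible(String tableOrView) {
        return registered.contains(tableOrView);
    }

    // Schema listings expose registered names only, so the existence of
    // unregistered tables is not revealed either.
    List<String> visibleSchema(List<String> backendNames) {
        List<String> visible = new ArrayList<>();
        for (String name : backendNames) {
            if (isVisible(name)) {
                visible.add(name);
            }
        }
        return visible;
    }
}
```

With `new RegisteredCatalog("B")`, a query against A is refused and a schema listing over A, B, and C shows only B.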

Slide 10: GEON Database Integration
The GEON Mediator supports integration at three levels:
- Level 1: Federation-Based Integration. End users need to be knowledgeable about each database.
- Level 2: View-Based Integration. End users see "integrated views". An intermediary designs these views.
- Level 3: Ontology-Based Integration. End users can query using familiar concepts. Requires middleware and a formal representation of domain knowledge.

Slide 11: Level 1: Federation-Based Integration
- Use SQL to query the federated database, e.g. SELECT * FROM A, E WHERE ……
- Structural and semantic heterogeneity must be resolved by the users themselves
[Diagram: backend tables A through G federated behind the GEON Mediator]

Slide 12: Level 2: View-Based Integration
- Views can be defined on top of the federated databases and queried, e.g. SELECT * FROM V, W WHERE ……
- Views can hide the original backend schemas
- Integration results can be shared and reused
[Diagram: views V and W defined over backend tables A through G behind the GEON Mediator]

Slide 13: Level 3: Ontology-Based Integration
- Requires ontology annotations for the backend databases
- Uses a simple ontology query language to query the integrated database
- End users do not need to know the backend schemas and local semantics
[Diagram: an ontology-based query sent to the GEON Mediator over backend tables A through G]

Slide 14: GEON Ontology Based Data Integration
- Ontology-enabled semantic integration
- Challenges for computer scientists and domain scientists
  - Computer scientists: build an integration system based on the ontological registration of datasets
  - Domain scientists: create domain ontologies
  - Data providers: register datasets to ontologies
[Diagram labels: Ontology1, Ontology2, Ontology3; dataset1, dataset2, dataset3, dataset4]

Slide 15: Ontological Data Registration for Data Integration
- Registering a dataset to an ontology for data integration is a procedure that generates a partial model of the ontology from the dataset itself
- Not all the constraints in the ontology are satisfied by the generated individuals
[Diagram labels: dataset, registration, individuals, ontology]

Slide 16: Registering Relational Tables to Ontology Classes
- Associate one or more columns, under an optional SQL condition, with a selected class in the ontology
- Provide a mapping method if no explicit names of individuals should be generated
Example 1: a table with columns Latitude and Longitude is registered to the class Location. "Location (23.5, 47.9)" is the name of an individual of the class Location. The same name indicates the same location.
Example 2: [Table: columns RockSample and GeologicAge; GeologicAge values include Jurassic/Triassic and Precambrian] [Ontology fragment: GeologicalAge with Precambrian, Cenozoic, Paleozoic]

Slide 17: Registering Relational Tables to Ontology Object Properties
- Associate two entities that are already registered to the domain class and the range class of a selected object property in the ontology
[Diagram: table columns RockSampleID and PERIOD; classes Rock and GeologicAge; object property hasAge]

Slide 18: ODAL and SOQL
- ODAL (Ontological Database Annotation Language): register item/item-detail to an ontology
- SOQL (Simple Ontology Query Language): user query

Slide 19: ODAL (Ontological Database Annotation Language)

    <odal:NamedIndividuals odal:id="RockSample" odal:database="VTDatabase">
      …… Samples RockTexture RockGeoChemistry ModalData MineralChemistry Images ssID ……
    </odal:NamedIndividuals>

- The values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images represent instances of RockSample
- Creates a partial model of ontologies from databases
- Independent of the end interface
- Independent of specific database implementations
- The ODAL mapping is itself a "first-class" object
[Diagram: GUI generates ODAL, which goes to the ODAL processor]

Slide 20: ODAL: Import Ontologies
The ontologies used for annotating a database can be imported as follows:

    <odal:ODAL xmlns:rdf="……" xmlns:owl="……" xmlns:odal="……">
      ……

Slide 21: ODAL: Database Connection Declaration
The target databases for annotation are declared as follows:

    <odal:ODAL xmlns:rdf="……" xmlns:owl="……" xmlns:odal="……">
      ……
      Oracle  oracle.sdsc.edu  3456  Publications
      ……

Slide 22: ODAL: Simple Named Individuals

    <odal:NamedIndividuals odal:id="BookInTableBookPrice" odal:database="PublicationDatabase">
      …… Collections  book-price  ISBN ……
    </odal:NamedIndividuals>

Suppose the Book ontology contains a class Book and the schema Collections contains a table Book-Price with a column ISBN. odal:id gives a name to the declaration and represents the set of individuals generated by the statement. The statement says that each value in the column ISBN represents a Book individual.

Slide 23: ODAL: Named Individuals from Multiple Columns

    ……
      California  Rock-Sample  Latitude  Longitude
    </odal:NamedIndividuals>

Suppose an ontology contains a class Location and a database contains a table Rock-Sample with two columns, Latitude and Longitude. The statement says that a pair of latitude and longitude values gives a location.
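To illustrate how several column values can name one individual, the sketch below builds a name of the form "Location (23.5, 47.9)", the form used on the class-registration slide. The class `IndividualNames` and the exact formatting are assumptions; the ODAL processor may generate names differently.

```java
import java.util.StringJoiner;

class IndividualNames {
    // Build the name of an individual from its class and the values of the
    // registered columns, e.g. nameOf("Location", 23.5, 47.9).
    static String nameOf(String className, Object... columnValues) {
        StringJoiner joined = new StringJoiner(", ", className + " (", ")");
        for (Object value : columnValues) {
            joined.add(String.valueOf(value));
        }
        return joined.toString();
    }
}
```

Because the name is derived only from the column values, two rows with the same latitude and longitude produce the same name and therefore the same Location individual. A real implementation would also have to normalize number formatting so that, say, values stored as text "23.50" and "23.5" do not yield different names.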

Slide 24: ODAL: Named Individuals with Conditions

    …… employee  EmployeeId  …… ]]

A condition in an odal:Condition element should be a boolean expression that is valid in the WHERE clause of an SQL query.

Slide 25: ODAL: Data Type Property Declaration
[Example labels: table person with columns SSN and age; class Person; datatype property hasAge; type double]

Slide 26: Conditions for Joining Individuals from Different Resources
- To join data across independent resources, we need to know the correspondence between entities.
- For example, does "10001" represent the same rock in the two resources? By default, we assume it does not.
- A set of datatype properties can be declared as a key for a class in the ontology. Joins across multiple resources are based on keys.
- e.g. {hasLatitude, hasLongitude} can be declared as a key of Location. Two locations from different resources are the same if they have the same latitude and longitude.
[Diagram: class Rock; one resource has a column RockSampleID, another has a column RockID]
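The key-based join can be sketched as a hash join on the key values. The code below assumes that each resource delivers its Location rows as {latitude, longitude} pairs and that {hasLatitude, hasLongitude} was declared as the key; the class `KeyJoinSketch` is hypothetical and not part of the GEON mediator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyJoinSketch {
    // The key of a Location individual: its hasLatitude and hasLongitude values.
    static List<Double> key(double lat, double lon) {
        return List.of(lat, lon);
    }

    // Each row is {lat, lon}. Returns index pairs {i, j} such that row i of
    // resource A and row j of resource B have equal keys, i.e. denote the
    // same Location.
    static List<int[]> join(double[][] resourceA, double[][] resourceB) {
        // Build phase: index resource A by key.
        Map<List<Double>, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < resourceA.length; i++) {
            index.computeIfAbsent(key(resourceA[i][0], resourceA[i][1]),
                    k -> new ArrayList<>()).add(i);
        }
        // Probe phase: look up each row of resource B.
        List<int[]> matches = new ArrayList<>();
        for (int j = 0; j < resourceB.length; j++) {
            List<Integer> hits = index.getOrDefault(
                    key(resourceB[j][0], resourceB[j][1]), List.of());
            for (int i : hits) {
                matches.add(new int[] {i, j});
            }
        }
        return matches;
    }
}
```

Rows without a matching key stay unjoined, which mirrors the default assumption that individuals from different resources are different. Exact equality on floating-point coordinates is a simplification; real data would likely need rounding or a tolerance.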

Slide 27: SOQL (Simple Ontology Query Language)
Query single or integrated resources via ontologies (i.e., high-level logical views), independent of the schema-level representation.

    SELECT X.location.*
    FROM RockSample X
    WHERE X.location.lat > 60 AND X.location.long > 100
      AND X.hasSiO2.value < 30 AND X.hasSiO2.unit = 'weightPercetage'

[Ontology fragment labels: classes RockSample, Location, ValueWithUnit; properties location, hasSiO2, lat, long, value, unit; datatypes float, string]
[Diagram: GUI generates SOQL, which goes to the SOQL processor]

Slide 28: The Architecture of GEON Semantic Mediator
[Diagram labels: Portal or Application; Mediator JDBC Driver; GUI; SOQL; SOQL Processor; SOQL Parser; Semantic Query Rewriter; Ontology Reasoner; OWL; ODAL; ODAL Processor; Spatial SQL against federated schemas; SQL Parser; Query Planning; Query Optimization; Query Execution; Internal Database; backends: Oracle, DB2, MySQL, SQL Server, PostgreSQL, PostGIS]

Slide 29: Question: find all seismic stations within 1 mile of railroads
SOQL query entered in the GEON SOQL GUI:

    SELECT X.code, X.location.*
    FROM SeismicStation X, Railroad Y
    WHERE distance(X.location, Y.geometry) < 1

SQL produced by the SOQL processor:

    SELECT X2.stationcode, X2.lat, X2.lon
    FROM railroads_of_the_united_states X1, stationdatatable X2
    WHERE distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1

Queries sent by the mediator to the sources:

    Railroad shapefile:       SELECT X1.the_geom FROM railroads X1
    Seismic stations schema:  SELECT X2.stationcode, X2.lat, X2.lon FROM stationdatatable X2 WHERE bounding box condition

The mediator then evaluates distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1 on the returned rows.
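The two-step evaluation (a cheap bounding-box condition first, the exact distance test second) can be sketched as follows. This is an illustration under simplifying assumptions: coordinates are treated as planar and in the same unit as the threshold, a railroad is a polyline with at least two vertices, and the class `SpatialFilterSketch` is hypothetical. It does not reproduce the spatial functions of the backend database.

```java
class SpatialFilterSketch {
    // Distance from point (px, py) to the segment (ax, ay)-(bx, by) in the plane.
    static double distToSegment(double px, double py,
                                double ax, double ay, double bx, double by) {
        double dx = bx - ax;
        double dy = by - ay;
        double len2 = dx * dx + dy * dy;
        // Projection parameter of the point onto the segment, clamped to [0, 1].
        double t = len2 == 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / len2;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
    }

    // True if point p = {x, y} lies closer than d to the polyline `line`,
    // given as an array of {x, y} vertices.
    static boolean within(double[] p, double[][] line, double d) {
        // Step 1: bounding box of the line, expanded by d. Points outside it
        // cannot be within distance d, so they are discarded cheaply.
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] v : line) {
            minX = Math.min(minX, v[0]);
            maxX = Math.max(maxX, v[0]);
            minY = Math.min(minY, v[1]);
            maxY = Math.max(maxY, v[1]);
        }
        if (p[0] < minX - d || p[0] > maxX + d || p[1] < minY - d || p[1] > maxY + d) {
            return false;
        }
        // Step 2: exact distance test against each segment of the line.
        for (int i = 0; i + 1 < line.length; i++) {
            if (distToSegment(p[0], p[1], line[i][0], line[i][1],
                    line[i + 1][0], line[i + 1][1]) < d) {
                return true;
            }
        }
        return false;
    }
}
```

Pushing the bounding-box condition down to the station source keeps the number of transferred rows small, and the exact test then runs only on the survivors. A point such as (10.8, 0.8) near the end of the line from (0, 0) to (10, 0) passes the box test for d = 1 but fails the exact test, which is why both steps are needed.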