Published by Edwin Russell. Modified over 9 years ago.
1
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
www.geongrid.org
Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences
Kai Lin, Chaitan Baru
San Diego Supercomputer Center, University of California, San Diego
2
Data Integration Goal
Query heterogeneous data sources as a single resource
– Query: pose ad hoc queries in a non-procedural query language instead of writing a program
– Heterogeneous: each local resource controls the definition of its data
– Single resource: remove the burden of accessing each data source individually
3
Data Integration Challenges: Heterogeneities
– Syntactic heterogeneity: heterogeneous data formats, e.g. 02-04-2004 vs. 02/04/04
– Structural heterogeneity: heterogeneous data models and schemas, e.g. 02-04-2004 is stored as three columns or as one column
– Semantic heterogeneity: fuzzy metadata, terminology, "hidden" semantics, implicit assumptions
GEON solution: data is first semantically registered to GEON, and the heterogeneities are resolved through this registration.
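The date example can be made concrete. The following minimal sketch resolves the syntactic case (two string formats) and the structural case (three columns instead of one) to a single canonical value. It assumes both sources use month-day-year order and that "yy" means 20yy; the class and method names are hypothetical and not part of GEON.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch: resolving the slide's date example to one canonical value. */
class DateNormalizer {
    // Assumption: both sources write month-day-year; "yy" is read as 20yy.
    private static final DateTimeFormatter DASHED = DateTimeFormatter.ofPattern("MM-dd-yyyy");
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("MM/dd/yy");

    /** Syntactic heterogeneity: the same date written in two string formats. */
    static LocalDate fromString(String s) {
        return s.contains("/") ? LocalDate.parse(s, SLASHED) : LocalDate.parse(s, DASHED);
    }

    /** Structural heterogeneity: the date stored as three separate columns. */
    static LocalDate fromColumns(int year, int month, int day) {
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        LocalDate a = fromString("02-04-2004");
        LocalDate b = fromString("02/04/04");
        LocalDate c = fromColumns(2004, 2, 4);
        System.out.println(a.equals(b) && b.equals(c)); // true
    }
}
```

The month-day-year assumption is exactly the kind of implicit semantics that registration must make explicit. Under a day-month-year reading, the same strings would denote 2 April 2004.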
4
Levels of Registration
Metadata-level registration
– Register the metadata associated with a resource
– Submit the required metadata; the semantics are predefined
"Item"-level registration
– Register the "schema" of a resource, e.g. relational databases, shapefiles, …
– Record the semantics of schema elements, e.g. table names and column names
"Item-detail"-level registration
– Register individual values in a dataset
– Record the semantics of each item in a record/column
5
Registering Structured Data
– Relational databases
– Shapefiles → database tables
– Excel spreadsheets → database tables
– Delimited ASCII files → database tables
– Headers of scientific data files, e.g. netCDF
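One conversion in the list above turns delimited ASCII files into database tables. The following minimal sketch emits the SQL for such a table from the file's text. It assumes the first line is a header, that every column can be typed VARCHAR(255), and that the caller supplies the table name. GEON's actual loader is not described in the slides, so the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: SQL that loads a delimited ASCII file into a database table. */
class DelimitedToSql {
    static List<String> toSql(String table, String text, String delimiter) {
        String[] lines = text.trim().split("\\R");          // split on any line break
        String sep = Pattern.quote(delimiter);               // treat the delimiter literally
        String[] header = lines[0].split(sep, -1);
        List<String> sql = new ArrayList<>();
        StringBuilder create = new StringBuilder("CREATE TABLE " + table + " (");
        for (int i = 0; i < header.length; i++) {
            // Simplification: every column is VARCHAR; a real loader would infer types.
            create.append(i > 0 ? ", " : "").append(header[i].trim()).append(" VARCHAR(255)");
        }
        sql.add(create.append(")").toString());
        for (int r = 1; r < lines.length; r++) {
            String[] cells = lines[r].split(sep, -1);
            StringBuilder insert = new StringBuilder("INSERT INTO " + table + " VALUES (");
            for (int i = 0; i < cells.length; i++) {
                // Quote each value as an SQL string literal, doubling embedded quotes.
                insert.append(i > 0 ? ", " : "").append("'")
                      .append(cells[i].trim().replace("'", "''")).append("'");
            }
            sql.add(insert.append(")").toString());
        }
        return sql;
    }

    public static void main(String[] args) {
        String csv = "Latitude,Longitude\n23.5,47.9\n";
        toSql("RockSample", csv, ",").forEach(System.out::println);
    }
}
```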
6
Item Level Database Registration and Access
(Diagram: original database with tables and views → select tables and views to register → published database with table and view definitions → GEON Mediator → GEON JDBC Driver → application.)
7
How to Connect to GEON Databases
Download the GEON JDBC Driver, then use the following code to create a connection:

    import java.sql.Connection;
    import java.sql.DriverManager;

    // load the driver
    Class.forName("org.geongrid.jdbc.driver.Driver");
    // set the mediator URL
    String url = "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f";
    // open the connection
    Connection conn = DriverManager.getConnection(url, "geonuser", "geongrid");

In the URL, jdbc:geon is the GEON JDBC protocol, geon01.sdsc.edu:2532 gives the host name and port number of the GEON Mediator, and GEON-63cb404c-6038-11d9-a69f is the GEON ID.
Note: the original database account information is not accessible to end users.
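A GEON URL has three parts: protocol, mediator host and port, and GEON ID. The following sketch splits such a URL into those parts using only java.net.URI. The GeonUrl class is hypothetical and is not part of the GEON JDBC Driver. It assumes the URL always has the form jdbc:geon://host:port/GEON-ID shown above.

```java
import java.net.URI;

/** Hypothetical helper: split a GEON JDBC URL into the parts named on the slide. */
class GeonUrl {
    final String host;
    final int port;
    final String geonId;

    GeonUrl(String jdbcUrl) {
        if (!jdbcUrl.startsWith("jdbc:geon://")) {
            throw new IllegalArgumentException("not a GEON JDBC URL: " + jdbcUrl);
        }
        // Drop "jdbc:" so java.net.URI sees a hierarchical "geon://host:port/id" URI.
        URI uri = URI.create(jdbcUrl.substring("jdbc:".length()));
        this.host = uri.getHost();
        this.port = uri.getPort();
        this.geonId = uri.getPath().substring(1); // strip the leading "/"
    }

    public static void main(String[] args) {
        GeonUrl u = new GeonUrl("jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f");
        System.out.println(u.host + " " + u.port + " " + u.geonId);
    }
}
```

The URL identifies a registered resource through the mediator rather than a backend host. This is consistent with the note that end users never see the original account information.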
8
GEON Mediator Enables Write Protection
(Diagram: the mediator sits in front of a database with tables A, B and C; a request such as UPDATE B is rejected.)
– The mediator only accepts SELECT statements
– It rejects any request other than SELECT
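A minimal sketch of the "SELECT only" rule follows. It assumes a simple keyword check: accept a single statement whose first keyword is SELECT. A production mediator would use a real SQL parser instead, and the architecture slide does list an SQL Parser. The class name is hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch of a mediator-side write guard: only a single SELECT statement passes. */
class ReadOnlyGuard {
    static boolean accepts(String sql) {
        String s = sql.trim();
        // Allow one trailing semicolon, but reject multiple statements such as "SELECT 1; DROP TABLE B".
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        if (s.contains(";")) {
            return false;
        }
        // Accept only statements whose first keyword is SELECT.
        String first = s.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        return first.equals("SELECT");
    }

    public static void main(String[] args) {
        System.out.println(accepts("SELECT * FROM B"));   // true
        System.out.println(accepts("UPDATE B SET x = 1")); // false
    }
}
```

The check is deliberately conservative. A semicolon inside a string literal causes a rejection, which errs on the safe side for a write-protection rule.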
9
Read Protection for Unregistered Tables and Views
(Diagram: a query such as SELECT * FROM A against the unregistered table A is refused by the mediator.)
An unregistered table or view is invisible to end users:
– the data in the table cannot be viewed with a SELECT statement
– its schema cannot be fetched
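The two read-protection rules can be sketched as one registry. Schema requests see only registered names, and a query may touch only registered tables. This sketch assumes the referenced table names have already been extracted from the query, which is the SQL parser's job. It also assumes that only table B of the diagram is registered. RegisteredCatalog is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical catalog: only registered tables and views are visible through the mediator. */
class RegisteredCatalog {
    // Case-insensitive matching, as SQL identifiers are usually compared that way (assumption).
    private final Set<String> registered = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

    RegisteredCatalog(List<String> registeredNames) {
        registered.addAll(registeredNames);
    }

    /** Schema requests: the backend table list is filtered down to registered names. */
    List<String> visibleTables(List<String> backendTables) {
        return backendTables.stream().filter(registered::contains).collect(Collectors.toList());
    }

    /** Data requests: every table the query references must be registered. */
    boolean mayQuery(List<String> referencedTables) {
        return registered.containsAll(referencedTables);
    }

    public static void main(String[] args) {
        RegisteredCatalog catalog = new RegisteredCatalog(Arrays.asList("B"));
        System.out.println(catalog.visibleTables(Arrays.asList("A", "B", "C"))); // [B]
        System.out.println(catalog.mayQuery(Arrays.asList("A")));                 // false
    }
}
```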
10
GEON Database Integration
The GEON Mediator supports integration at three levels:
Level 1: Federation-based integration. End users need to be knowledgeable about each database.
Level 2: View-based integration. End users see "integrated views", which are designed by an intermediary.
Level 3: Ontology-based integration. End users can query using familiar concepts; this requires middleware and a formal representation of domain knowledge.
11
Level 1: Federation-Based Integration
(Diagram: backend databases holding tables A to G are exposed by the GEON Mediator as a single federated database.)
– Use SQL to query the federated database, e.g. SELECT * FROM A, E WHERE ……
– Structural and semantic heterogeneity must be resolved by the users themselves
12
Level 2: View-Based Integration
(Diagram: views V and W are defined in the GEON Mediator on top of backend tables A to G.)
– Views can be defined on top of the federated databases and queried, e.g. SELECT * FROM V, W WHERE ……
– The original backend schemas can be hidden
– Integration results can be shared and reused
13
Level 3: Ontology-Based Integration
– Requires ontology annotations for the backend databases
– A simple ontology query language is used to query the integrated database
– End users do not need to know the backend schemas or local semantics
(Diagram: ontology-based queries are sent to the GEON Mediator, which answers them over backend tables A to G.)
14
GEON Ontology-Based Data Integration
Ontology-enabled semantic integration
Challenges for Computer Scientists and Domain Scientists
– Computer scientists: build an integration system based on the ontological registration of datasets
– Domain scientists: create domain ontologies
– Data providers: register datasets to ontologies
(Diagram: datasets 1 to 4 registered to ontologies 1 to 3.)
15
Ontological Data Registration for Data Integration
Registering a dataset to an ontology for data integration is a procedure that generates a partial model of the ontology from the dataset itself.
(Diagram: registration maps a dataset to individuals of the ontology.)
Because the model is partial, the generated individuals do not satisfy all of the constraints in the ontology.
16
Registering Relational Tables to Ontology Classes
– Associate one or more columns, under an optional SQL condition, with a selected class in the ontology
– Provide a mapping method if no explicit names of individuals should be generated
Example 1: in a table with columns …, Latitude, …, Longitude, …, a row holds 23.5 and 47.9. "(23.5, 47.9)" is the name of an individual of the class Location; the same name indicates the same location.
Example 2: the GeologicAge column of the table RockSample holds values such as Jurassic/Triassic and Precambrian, which correspond to the GeologicalAge class hierarchy (Precambrian, Cenozoic, Paleozoic, …).
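The Location example can be sketched as follows. The two registered columns form the individual's name, and rows that produce the same name collapse into one individual. The "(lat, long)" naming follows the slide. The class and method names are hypothetical, and the sketch assumes the column values arrive as doubles.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: naming Location individuals from a Latitude and a Longitude column. */
class LocationIndividuals {
    /** Builds the individual's name from the column values, in the slide's "(lat, long)" form. */
    static String name(double latitude, double longitude) {
        return "(" + latitude + ", " + longitude + ")";
    }

    /** Rows that yield the same name collapse into one individual: same name, same location. */
    static Set<String> individuals(List<double[]> rows) {
        Set<String> names = new LinkedHashSet<>();
        for (double[] row : rows) {
            names.add(name(row[0], row[1]));
        }
        return names;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
                new double[] {23.5, 47.9},
                new double[] {23.5, 47.9},   // a second sample taken at the same place
                new double[] {10.0, 20.0});
        System.out.println(individuals(rows)); // [(23.5, 47.9), (10.0, 20.0)]
    }
}
```

Formatting the values as numbers rather than copying the raw column strings means that "23.50" and "23.5" yield the same name. This is one small example of a mapping method.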
17
Registering Relational Tables to Ontology Object Properties
Associate two entities that are already registered to the domain class and the range class of a selected object property in the ontology.
Example: in a table with columns …, RockSampleID, …, PERIOD, …, RockSampleID is registered to the class Rock and PERIOD to the class GeologicAge; each row then links a Rock to its GeologicAge through the object property hasAge.
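The following sketch shows what registering the hasAge object property yields: one property assertion per row, linking two already-registered individuals. The string form "subject hasAge object" and the sample values are illustrative assumptions, not ODAL output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: turning (RockSampleID, PERIOD) rows into hasAge assertions. */
class HasAgeAssertions {
    static List<String> assertions(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            String rock = row[0];   // RockSampleID: already registered to Rock (domain of hasAge)
            String period = row[1]; // PERIOD: already registered to GeologicAge (range of hasAge)
            if (rock == null || period == null) {
                continue;           // no link can be asserted if either end is missing
            }
            out.add(rock + " hasAge " + period);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"10001", "Precambrian"},
                new String[] {"10002", null});
        System.out.println(assertions(rows)); // [10001 hasAge Precambrian]
    }
}
```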
18
ODAL and SOQL
– Register items/item details to an ontology: ODAL (Ontological Database Annotation Language)
– User queries: SOQL (Simple Ontology Query Language)
19
ODAL (Ontological Database Annotation Language)
<odal:NamedIndividuals odal:id="RockSample" odal:database="VTDatabase">
  Samples RockTexture RockGeoChemistry ModalData MineralChemistry Images
  ssID
A GUI generates the ODAL annotation, which is passed to the ODAL processor.
The values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images represent instances of RockSample.
– Creates a partial model of ontologies from databases
– Independent of the end-user interface
– Independent of specific database implementations
– The ODAL mapping is itself a "first-class" object
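A processor can compute the set of RockSample individuals declared over six tables with a single SQL query. The following minimal sketch builds that query. It assumes the ODAL processor translates a NamedIndividuals declaration into SQL. The slides do not show the processor's internals, so the class and its output are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: SQL enumerating the individuals declared by one NamedIndividuals statement. */
class NamedIndividualsQuery {
    static String unionQuery(String column, List<String> tables) {
        // UNION (rather than UNION ALL) drops duplicates, so an ssID that appears
        // in several tables yields a single RockSample individual.
        return tables.stream()
                .map(table -> "SELECT " + column + " FROM " + table)
                .collect(Collectors.joining(" UNION "));
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("Samples", "RockTexture", "RockGeoChemistry",
                "ModalData", "MineralChemistry", "Images");
        System.out.println(unionQuery("ssID", tables));
    }
}
```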
20
ODAL: Import Ontologies
The ontologies used to annotate a database are imported as follows:
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:odal="http://www.sdsc.edu/odal#">
  ……
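Because ODAL elements live in the namespace http://www.sdsc.edu/odal#, a processor should recognize them by namespace URI rather than by the literal prefix "odal:". The following sketch checks this with the JDK's namespace-aware DOM parser. The namespace URI is taken from the slide. The OdalRootCheck class is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical sketch: is a document's root element odal:ODAL in the ODAL namespace? */
class OdalRootCheck {
    static final String ODAL_NS = "http://www.sdsc.edu/odal#";

    static boolean isOdalDocument(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // resolve prefixes such as "odal:" to namespace URIs
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            // Compare the namespace URI, not the prefix: any prefix bound to ODAL_NS is acceptable.
            return ODAL_NS.equals(root.getNamespaceURI()) && "ODAL".equals(root.getLocalName());
        } catch (Exception e) {
            return false; // malformed XML is not a valid ODAL document
        }
    }

    public static void main(String[] args) {
        String xml = "<odal:ODAL"
                + " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:owl=\"http://www.w3.org/2002/07/owl#\""
                + " xmlns:odal=\"http://www.sdsc.edu/odal#\"/>";
        System.out.println(isOdalDocument(xml)); // true
    }
}
```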
21
ODAL: Database Connection Declaration
The target databases to be annotated are declared as follows:
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:odal="http://www.sdsc.edu/odal#">
  ……
  Oracle 9.1.21 oracle.sdsc.edu 3456 Publications
  ……
Here Oracle 9.1.21 is the database product and version, oracle.sdsc.edu and 3456 are the host and port, and Publications is the database to be annotated.
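A mediator needs a JDBC connection URL to reach each declared database. The following sketch builds one from a declaration's product, host, port, and database name. The URL formats are the standard ones for the Oracle thin, MySQL, and PostgreSQL drivers. Treating "Publications" as an Oracle service name is an assumption, and the DeclaredDatabase class is hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch: building a JDBC URL from a database connection declaration. */
class DeclaredDatabase {
    static String jdbcUrl(String product, String host, int port, String database) {
        switch (product.toLowerCase(Locale.ROOT)) {
            case "oracle":     // Oracle thin driver, service-name form (assumes database is a service name)
                return "jdbc:oracle:thin:@//" + host + ":" + port + "/" + database;
            case "mysql":
                return "jdbc:mysql://" + host + ":" + port + "/" + database;
            case "postgresql":
                return "jdbc:postgresql://" + host + ":" + port + "/" + database;
            default:
                throw new IllegalArgumentException("unsupported product: " + product);
        }
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("Oracle", "oracle.sdsc.edu", 3456, "Publications"));
    }
}
```

Keeping these vendor-specific details in the declaration, rather than in the annotations, is what allows ODAL mappings to stay independent of specific database implementations.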
22
ODAL: Simple Named Individuals
<odal:NamedIndividuals odal:id="BookInTableBookPrice" odal:database="PublicationDatabase">
  Collections
  book-price
  ISBN
</odal:NamedIndividuals>
Suppose the Book ontology contains a class Book, and the schema Collections contains a table book-price with a column ISBN. odal:id gives the declaration a name and denotes the set of individuals generated by the statement. The statement says that each value in the column ISBN represents a Book individual.
23
ODAL: Named Individuals from Multiple Columns
  California
  Rock-Sample
  Latitude
  Longitude
</odal:NamedIndividuals>
Suppose an ontology contains a class Location, and a database contains a table Rock-Sample with two columns, Latitude and Longitude. The statement says that a pair of latitude and longitude values identifies a location.
24
ODAL: Named Individuals with Conditions
  employee
  EmployeeId
A condition in an odal:Condition element must be a Boolean expression that is valid in the WHERE clause of an SQL query.
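A condition must be a valid WHERE-clause expression because it ends up inside one. The following sketch shows how a condition could restrict the generated individuals by splicing it into the query that enumerates them. The condition status = 'ACTIVE' is made up for illustration, and the ConditionalIndividuals class is hypothetical.

```java
/** Hypothetical sketch: SQL for named individuals restricted by an odal:Condition. */
class ConditionalIndividuals {
    static String query(String table, String column, String condition) {
        String base = "SELECT DISTINCT " + column + " FROM " + table;
        if (condition == null || condition.trim().isEmpty()) {
            return base; // no condition: every value in the column is an individual
        }
        // The condition is copied verbatim into the WHERE clause, which is why it must be
        // a valid SQL Boolean expression; the parentheses protect its operator precedence.
        return base + " WHERE (" + condition + ")";
    }

    public static void main(String[] args) {
        // "status = 'ACTIVE'" is a made-up condition for illustration.
        System.out.println(query("employee", "EmployeeId", "status = 'ACTIVE'"));
    }
}
```

Verbatim splicing is acceptable here only because the data provider, not the end user, authors ODAL annotations.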
25
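ODAL: Data Type Property Declaration
Declaration fragment: Person, ssn, person
Table person:
  … | age | … | SSN          | …
  … | 8   | … | 1234-56-7890 | …
Ontology: class Person with the datatype property hasAge, whose range is double.
Individuals of Person are identified by the SSN values of the table person, and the age column supplies their hasAge values.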
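A datatype property differs from an object property because its value is a typed literal. The following sketch takes the age column value and converts it to the declared range of hasAge, double. The HasAgeValue class is hypothetical, and the sketch assumes the column value arrives as text.

```java
/** Hypothetical sketch: a datatype property value taken from a column and typed by the property's range. */
class HasAgeValue {
    final String person; // individual of Person, named by the ssn column
    final double age;    // value of hasAge; its declared range is double

    HasAgeValue(String ssnColumnValue, String ageColumnValue) {
        this.person = ssnColumnValue.trim();
        // Throws NumberFormatException if the column holds non-numeric data.
        this.age = Double.parseDouble(ageColumnValue.trim());
    }

    @Override
    public String toString() {
        return person + " hasAge " + age;
    }

    public static void main(String[] args) {
        System.out.println(new HasAgeValue("1234-56-7890", "8")); // 1234-56-7890 hasAge 8.0
    }
}
```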
26
Conditions for Joining Individuals from Different Resources
To join data across independent resources, we need to know the correspondence between entities. For example, does "10001" represent the same rock in both resources (column RockSampleID in one, RockID in the other)? By default, we assume it does not.
A set of datatype properties can be declared as a key for a class in the ontology, and joins across multiple resources are based on these keys.
e.g. {hasLatitude, hasLongitude} can be declared as a key of Location: two locations from different resources are the same if they have the same latitude and longitude.
(Diagram: class Rock; one resource stores 10001 in column RockSampleID, the other stores 10001 in column RockID.)
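The following sketch shows the key rule as a hash join. Two Location individuals from different resources are matched only when their declared key properties, {hasLatitude, hasLongitude}, agree. Their local identifiers are ignored, so equal IDs such as "10001" do not imply identity. The KeyJoin class and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: matching Location individuals across resources by the key {hasLatitude, hasLongitude}. */
class KeyJoin {
    static final class Loc {
        final String name; // local identifier; NOT used for matching
        final double latitude, longitude;

        Loc(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }

        /** Identity key: only the declared key properties take part. */
        List<Double> key() {
            return Arrays.asList(latitude, longitude);
        }
    }

    /** Hash join on the key; returns "left = right" for every pair denoting the same location. */
    static List<String> sameIndividuals(List<Loc> left, List<Loc> right) {
        Map<List<Double>, List<Loc>> index = new HashMap<>();
        for (Loc r : right) {
            index.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        List<String> matches = new ArrayList<>();
        for (Loc l : left) {
            for (Loc r : index.getOrDefault(l.key(), Collections.emptyList())) {
                matches.add(l.name + " = " + r.name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Loc> a = Arrays.asList(new Loc("10001", 23.5, 47.9));
        List<Loc> b = Arrays.asList(new Loc("10001", 10.0, 20.0), new Loc("s-7", 23.5, 47.9));
        System.out.println(sameIndividuals(a, b)); // [10001 = s-7]
    }
}
```

The key values are compared exactly. Coordinates recorded at different precisions would have to be normalized to a common precision before such a join.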
27
SOQL (Simple Ontology Query Language)
Query single or integrated resources via ontologies (i.e., high-level logical views), independently of the schema-level representation.
Ontology fragment: RockSample has the property location (class Location, with lat and long: float) and the property hasSiO2 (class ValueWithUnit, with value: float and unit: string).
    SELECT X.location.*
    FROM RockSample X
    WHERE X.location.lat > 60 AND X.location.long > 100
      AND X.hasSiO2.value < 30 AND X.hasSiO2.unit = 'weightPercentage'
A GUI generates the SOQL query, which is passed to the SOQL processor.
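The SOQL query above can be read as navigation over objects. The following sketch spells out its meaning on a hypothetical in-memory model of the ontology fragment, where each path expression such as X.hasSiO2.value becomes a field chain. It illustrates the query's semantics only. The real SOQL processor rewrites queries into SQL over the registered databases rather than evaluating them this way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical in-memory model of the ontology fragment, used to spell out what the SOQL query means. */
class SoqlSemantics {
    static final class Location {
        final float lat, lon; // "long" is a Java keyword, so the property long is stored as lon
        Location(float lat, float lon) { this.lat = lat; this.lon = lon; }
    }

    static final class ValueWithUnit {
        final float value;
        final String unit;
        ValueWithUnit(float value, String unit) { this.value = value; this.unit = unit; }
    }

    static final class RockSample {
        final Location location;
        final ValueWithUnit hasSiO2;
        RockSample(Location location, ValueWithUnit hasSiO2) { this.location = location; this.hasSiO2 = hasSiO2; }
    }

    /** Each path expression X.a.b in the WHERE clause becomes a field chain x.a.b. */
    static List<Location> query(List<RockSample> samples) {
        return samples.stream()
                .filter(x -> x.location.lat > 60 && x.location.lon > 100)
                .filter(x -> x.hasSiO2.value < 30 && "weightPercentage".equals(x.hasSiO2.unit))
                .map(x -> x.location) // SELECT X.location.*
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RockSample> samples = Arrays.asList(
                new RockSample(new Location(65f, 120f), new ValueWithUnit(25f, "weightPercentage")),
                new RockSample(new Location(65f, 120f), new ValueWithUnit(45f, "weightPercentage")));
        System.out.println(query(samples).size()); // 1
    }
}
```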
28
The Architecture of GEON Semantic Mediator
(Diagram of the mediator's components:)
– Portal or application, connected through the Mediator JDBC Driver or a GUI
– SOQL Processor: SOQL Parser, Semantic Query Rewriter and Ontology Reasoner, working from OWL ontologies and the ODAL annotations produced by the ODAL Processor
– Spatial SQL against federated schemas: SQL Parser, Query Planning, Query Optimization and Query Execution, with an Internal Database
– Backend databases: Oracle, DB2, MySQL, SQL Server, PostgreSQL, PostGIS
29
Example: finding all seismic stations within 1 mile of railroads
SOQL query, generated by the GEON SOQL GUI:
    SELECT X.code, X.location.*
    FROM SeismicStation X, Railroad Y
    WHERE distance(X.location, Y.geometry) < 1
The SOQL processor rewrites it against the railroad shapefile and the seismic stations schema:
    SELECT X2.stationcode, X2.lat, X2.lon
    FROM railroads_of_the_united_states X1, stationdatatable X2
    WHERE distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
The mediator then executes:
– on the railroad shapefile: SELECT X1.the_geom FROM railroads X1
– on the seismic stations: SELECT X2.stationcode, X2.lat, X2.lon FROM stationdatatable X2 WHERE bounding box condition
– in the mediator: distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
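The plan above works in two phases. A cheap bounding-box condition is pushed to the station source, and the exact distance test runs afterwards. The following sketch shows the same idea in memory. It assumes planar coordinates, with the distance threshold in the same units as the coordinates, and it represents railroads as polylines of vertices. The slides do not specify units or geometry encoding, so both are assumptions, and the NearRailroads class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the two-phase plan: bounding-box filter at the source, exact distance at the mediator. */
class NearRailroads {
    /** Planar distance from point (px, py) to the segment from (ax, ay) to (bx, by). */
    static double pointToSegment(double px, double py, double ax, double ay, double bx, double by) {
        double dx = bx - ax, dy = by - ay;
        double len2 = dx * dx + dy * dy;
        double t = len2 == 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / len2;
        t = Math.max(0, Math.min(1, t)); // clamp the projection onto the segment
        return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
    }

    /** Stations {x, y} within distance d of any railroad polyline {x0, y0, x1, y1, ...}. */
    static List<double[]> near(List<double[]> stations, List<double[]> railroads, double d) {
        // Phase 1: bounding box of all railroad vertices, grown by d (the "bounding box condition").
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] r : railroads) {
            for (int i = 0; i + 1 < r.length; i += 2) {
                minX = Math.min(minX, r[i]);
                maxX = Math.max(maxX, r[i]);
                minY = Math.min(minY, r[i + 1]);
                maxY = Math.max(maxY, r[i + 1]);
            }
        }
        List<double[]> result = new ArrayList<>();
        for (double[] s : stations) {
            if (s[0] < minX - d || s[0] > maxX + d || s[1] < minY - d || s[1] > maxY + d) {
                continue; // would already be filtered out by the station source
            }
            // Phase 2: exact distance(...) < d test against every railroad segment.
            search:
            for (double[] r : railroads) {
                for (int i = 0; i + 3 < r.length; i += 2) {
                    if (pointToSegment(s[0], s[1], r[i], r[i + 1], r[i + 2], r[i + 3]) < d) {
                        result.add(s);
                        break search;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> railroads = Collections.singletonList(new double[] {0, 0, 10, 0, 10, 10});
        List<double[]> stations = Arrays.asList(
                new double[] {5, 0.5}, new double[] {2, 8}, new double[] {12, 0}, new double[] {10.5, 5});
        for (double[] s : near(stations, railroads, 1.0)) {
            System.out.println(Arrays.toString(s));
        }
    }
}
```

The station at (2, 8) passes the bounding-box phase but fails the exact test. This shows why the bounding box can only pre-filter the stations, and the mediator still has to evaluate distance.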