GEON AHM, April 16-18, SDSC CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
Towards Semantic Mediation for GEON: Facilitating Scientific Data Integration using Knowledge Representation
Bertram Ludäscher, Data and Knowledge Systems, San Diego Supercomputer Center, U.C. San Diego
Acknowledgements
“Smart” Geologic Map Prototype: Kai Lin, Data and Knowledge Systems, San Diego Supercomputer Center
Geo-Knowledge-Engineer: Boyan Brodaric, Natural Resources Canada
... and many GEONites: Dogan, Krishna, ..., State Geologic Surveys, Chaitan, Ilya, Michalis, Ashraf, ... (upcoming demo)
GEON Metamorphism Equation: Geoscientists + Computer Scientists => Igneous Geoinformaticists +/- Energy
GEON and “Semantic” Data Integration
Rocky Mountains; Midatlantic Region
What is Knowledge Representation?
Relating Theory to the World via Formal Models
Source: John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations
“All models are wrong, but some are useful!”
What is (an) “Ontology” ???
(... what CS graduate students need to know...)
1. Ontology as a philosophical discipline
2. Ontology as an informal conceptual system
3. Ontology as a formal semantic account
4. Ontology as a specification of a “conceptualization”
5. Ontology as a representation of a conceptual system via a logical theory
5.1 characterized by specific formal properties
5.2 characterized only by its specific purposes
6. Ontology as the vocabulary used by a logical theory
7. Ontology as a (meta-level) specification of a logical theory
[Guarino’95]
What is an Ontology? (CSE-291 cont’d ;-)
Given a logical language L...
–... a conceptualization is a set of models of L which describes the admittable (intended) interpretations of its non-logical symbols (the vocabulary)
–... an ontology is a (possibly incomplete) axiomatization of a conceptualization.
Diagram labels: conceptualization C(L); ontology; set of all models M(L); logic theories
[Guarino96]
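One way to read these definitions in set notation is sketched below. It assumes that every intended model satisfies the ontology's axioms, which is how "possibly incomplete axiomatization" is interpreted here. The symbols O (the ontology's axioms) and Mod(O) (their models) are introduced for illustration and do not appear on the slide.

```latex
C(L) \;\subseteq\; \mathrm{Mod}(O) \;\subseteq\; M(L),
\qquad
\mathrm{Mod}(O) = \{\, m \in M(L) \mid m \models O \,\}
```

Under this reading, the axiomatization would be complete when the first inclusion is an equality, that is, when the axioms admit exactly the intended models and nothing else.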
Problem: Scientific Data Integration... from Questions to Queries...
Questions: What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
Sources for information integration: Geologic Map (Virginia); GeoChemical; GeoPhysical (gravity contours); GeoChronologic (Concordia); Foliation Map (structure DB)
“Complex Multiple-Worlds” Mediation
From raw data to domain knowledge: Database mediation; Data modeling; Knowledge Representation: ontologies, concept spaces
Got Glue? Which one? What for?
XML (common syntax)
– flexible (semistructured) data model
– used at all levels: data/metadata exchange, message exchange (SOAP), schemas & data types (XML Schema), Semantic Web & web ontologies (RDF(S), OWL), ...
Grid infrastructure (system interoperation)
– distributed computing and data management
– web services
Controlled Vocabularies (“joins”)
– data level: joins across different data sets
– but metadata and ontologies (concept names, relationship names, ...) are also data!
Integrated View Definitions (mediated views/virtual databases)
– declarative specification of “integration logic”: XQuery, Datalog, ...
Thesauri (translator for retrieving related information)
– synonyms, broader/narrower terms, e.g., UMLS (meta-thesaurus, “ontology”)
Taxonomies (classification)
– shared vocabulary, concept hierarchy (is-a)
Ontologies (classification + additional semantics)
– formal specification of a conceptualization, shared meaning
– facilitates “smart querying”, semantic mediation
Information Integration Challenges
Reconciling S4 heterogeneities (Syntax, Structure, Semantics, System aspects): “gluing” together multiple data sources, bridging information and knowledge gaps computationally
System aspects: “Grid” Middleware
– distributed data & computing
– Web Services, WSDL/SOAP, OGSA, …
– sources = functions, files, data sets, …
Syntax & Structure: (XML-Based) Data Mediators
– wrapping, restructuring
– (XML) queries and views
– sources = (XML) databases
Semantics: Model-Based/Semantic Mediators
– conceptual models and declarative views
– Knowledge Representation: ontologies, description logics (RDF(S), OWL, ...)
– sources = knowledge bases (DB+CMs+ICs)
Standard (XML-Based) Mediator Architecture
USER/Client: poses Query Q(G(S1, ..., Sk)) and receives results
MEDIATOR: exchanges (XML) Queries & Results with the client; holds the Integrated Global (XML) View G and the Integrated View Definition relating G(..) to S1(..) … Sk(..)
Sources S1, S2, …, Sk: each sits behind a Wrapper that exports an (XML) View
Wrappers are implemented as web services
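A minimal Python sketch of the architecture above, with plain dictionaries standing in for XML. The function names (`wrap_s1`, `wrap_s2`, `global_view`, `query`), the unified field names, and the assignment of sample rows to sources are all hypothetical. The integrated view G is assumed here to be a simple union of the wrapped views; a real view definition would typically do more restructuring.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]
Wrapper = Callable[[], Iterator[Record]]


def wrap_s1() -> Iterator[Record]:
    """Hypothetical wrapper for source S1: exports its rows in the unified shape."""
    for formation, period in [("Tkgm", "Tertiary"), ("Q", "Quaternary")]:
        yield {"formation": formation, "age": period, "source": "S1"}


def wrap_s2() -> Iterator[Record]:
    """Hypothetical wrapper for source S2."""
    for formation, period in [("Qg", "Quaternary"), ("Twp", "Tertiary")]:
        yield {"formation": formation, "age": period, "source": "S2"}


def global_view(wrappers: List[Wrapper]) -> Iterator[Record]:
    """Integrated view G(S1, ..., Sk): assumed here to be the union of all wrapped views."""
    for wrapper in wrappers:
        yield from wrapper()


def query(predicate: Callable[[Record], bool], wrappers: List[Wrapper]) -> List[Record]:
    """Client query Q(G(S1, ..., Sk)): a selection evaluated over the global view."""
    return [record for record in global_view(wrappers) if predicate(record)]


tertiary = query(lambda r: r["age"] == "Tertiary", [wrap_s1, wrap_s2])
print([r["formation"] for r in tertiary])  # ['Tkgm', 'Twp']
```

The client only ever sees the unified record shape; adding a source means adding a wrapper, not changing the query.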
XML-Based vs. Semantic Mediation
Raw data (sample record): , ,2,140,29,Tertiary,Trc,CHINLE FORMATION,59,57
XML-based mediation
– (XML) Objects: XML Elements, XML Models
– Structural Constraints (DTDs), e.g. A = (B*|C),D B =...; Parent, Child, Sibling, ...
– No Semantics / Domain Constraints
– Integrated-DTD := XQuery(Src1-DTD,...)
Semantic mediation
– Conceptual Models: Classes (C1, C2, C3), Relations (R), is-a, has-a, ...
– Semantics, Constraints in Logic (IF ... THEN ...)
– “Glue Maps”: ontologies, concept spaces
– Integrated-CM := CM-QL(Src1-CM,...)
– CM ~ {Descr.Logic, ER, UML, RDF(S), …}
– CM-QL ~ {F-Logic, …}
GEON Framework for Interoperability in the Geosciences
Systems level: GEON Grid...
– enable sharing of data and tools via grid services
– based on Open Grid Services Architecture (OGSA)
– acquisition of cluster endpoints and initial deployment at some sites underway, including SDSC, UTEP, VT, ...
Syntactic and schema level: data integration via (meta)data standards (often XML-based)
– database mediators create integrated virtual databases => dynamic creation and automatic update of data warehouses
Semantic level: data integration via “semantic” mediation
– situating 4-D data in context: spatio-temporal, thematic, and process contexts can be represented as “concept spaces”
– specifically: use of ontologies and logic-based knowledge representation
– development guided/driven by specific scientific data integration problems
Towards Shared Conceptualizations: High-level Domain Ontology & Standard Data Model
Source: NADAM Team (Boyan Brodaric et al.)
Adoption of a standard (meta)data model => wrap data sets into unified virtual views
Towards Shared Conceptualizations: Data Contextualization via Concept Spaces
Towards Knowledge Sharing: Rock-type “Ontology”
Composition; Genesis; Fabric; Texture
Getting Formal: Source Contextualization & Ontology Refinement in Logic
Biomedical Informatics Research Network
Nevada: querying with and without domain knowledge
Show formations where AGE = ‘Paleozoic’ (without age ontology)
Show formations where AGE = ‘Paleozoic’ (with age ontology)
Domain knowledge, via knowledge representation: AGE ONTOLOGY
Querying with Multiple Classifications/Ontologies: Age, Composition, Texture, Fabric, Genesis
What to do with the “KR Glue”?
Conceptual-level information, concept spaces, ontologies, and other KR techniques for...
–... smart data discovery
–... browsing and querying by themes, disciplines, ...
–... defining virtual/mediated databases at the conceptual level
–... supporting the “plugging together” of “data and information experiments” into Scientific Workflows (a.k.a. Analytical Pipelines in the SEEK ITR)
–... smarter user interfaces: is “find felsic sedimentary rocks” a meaningful (satisfiable) query?
–...
Some enabling operations on “ontology data” (Composition)
Concept expansion: what else to look for when asking for ‘Mafic’
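Concept expansion can be sketched as collecting a concept together with everything below it in an is-a hierarchy. The hierarchy fragment below is a made-up placeholder, not the composition ontology shown on the slide, and `expand` is a hypothetical helper name.

```python
from typing import Dict, List, Set

# Illustrative placeholder hierarchy (concept -> direct subconcepts).
# The actual composition ontology on the slide is not reproduced here.
COMPOSITION: Dict[str, List[str]] = {
    "Composition": ["Mafic", "Felsic"],
    "Mafic": ["Basalt", "Gabbro"],
    "Felsic": ["Granite", "Rhyolite"],
}


def expand(concept: str, hierarchy: Dict[str, List[str]]) -> Set[str]:
    """Return the concept plus all of its direct and indirect subconcepts."""
    expanded = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in hierarchy.get(current, []):
            if child not in expanded:  # guard against revisiting shared subconcepts
                expanded.add(child)
                stack.append(child)
    return expanded


print(sorted(expand("Mafic", COMPOSITION)))  # ['Basalt', 'Gabbro', 'Mafic']
```

A query for ‘Mafic’ would then also match records labeled with any term in the expanded set.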
Some enabling operations on “ontology data” (Composition)
Generalization: finding data that is “like” X and Y
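One possible reading of generalization is to find the most specific concept that subsumes both X and Y, then search under it. The sketch below assumes a tree-shaped hierarchy in which each concept has at most one parent; the `PARENT` table is an illustrative placeholder and `generalize` is a hypothetical helper name.

```python
from typing import Dict, List, Optional

# Illustrative placeholder hierarchy (concept -> its single parent).
PARENT: Dict[str, str] = {
    "Basalt": "Mafic",
    "Gabbro": "Mafic",
    "Granite": "Felsic",
    "Mafic": "Composition",
    "Felsic": "Composition",
}


def ancestors(concept: str, parent: Dict[str, str]) -> List[str]:
    """Return the chain from the concept itself up to the root."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def generalize(x: str, y: str, parent: Dict[str, str]) -> Optional[str]:
    """Return the most specific concept that subsumes both x and y, if any."""
    above_x = set(ancestors(x, parent))
    for candidate in ancestors(y, parent):  # walks upward from y, so the first hit is lowest
        if candidate in above_x:
            return candidate
    return None


print(generalize("Basalt", "Gabbro", PARENT))   # Mafic
print(generalize("Basalt", "Granite", PARENT))  # Composition
```

Data that is “like” X and Y can then be retrieved by expanding the returned concept to all of its subconcepts.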
Towards Knowledge Sharing: Rock-type Ontology
Composition; Genesis; Fabric; Texture
DEMO... do NOT click this...
Architecture of Integrated Geologic Map Prototype System
Components: HTTP Server (Java Server Page); MapServer (Minnesota); Mediator (Java application); Database (Arizona); Database (Montana)
Map Definition: local layer, remote layer, local layer
Global Ontology Definitions: Rock classification, Geologic age
Interaction: request / response
Data Source Wrapping and Integration
Sources: Arizona, Colorado, Utah, Nevada, Wyoming, New Mexico, Montana East, Idaho, Montana West
Wrapped views: seven of the form (…, Formation, …, Age, …); two of the form (…, Formation, …, Age, …, Composition, …, Fabric, …, Texture)
Source column names vary: ABBREV, PERIOD, NAME, TYPE, TIME_UNIT, FMATN, FORMATION, LITHOLOGY, AGE
Example values: andesitic sandstone; Livingston formation; Tertiary-Cretaceous
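The wrapping step can be sketched as a per-source column mapping into one unified record shape. The slide does not say which source uses which column names, so the source labels (`source_a`, `source_b`, `source_c`), the mappings, and the pairing of the example values with columns are all assumptions made for illustration.

```python
from typing import Dict

# Assumed per-source column mappings (source column -> unified column).
# Which real data set uses which column names is not stated in the source.
COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"FMATN": "formation", "TIME_UNIT": "age"},
    "source_b": {"NAME": "formation", "PERIOD": "age"},
    "source_c": {"FORMATION": "formation", "AGE": "age", "LITHOLOGY": "lithology"},
}


def wrap(source: str, row: Dict[str, str]) -> Dict[str, str]:
    """Rename a source row's columns to the unified names, dropping unmapped columns."""
    mapping = COLUMN_MAPS[source]
    return {unified: row[column] for column, unified in mapping.items() if column in row}


row = {
    "FORMATION": "Livingston formation",
    "AGE": "Tertiary-Cretaceous",
    "LITHOLOGY": "andesitic sandstone",
    "INTERNAL_ID": "17",  # hypothetical extra column, not exported by the wrapper
}
print(wrap("source_c", row))
```

Keeping the mappings as data rather than code means a new source can be wrapped by adding one table entry.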
Ontology-Enabled Query Processing
User: “Show formations from Cenozoic!”
Age Ontology: Cenozoic, with Quaternary and Tertiary beneath it
Query Rewriting: select FORMATION where AGE=“Tertiary” or AGE=“Quaternary”
Source tables (Arizona, Montana West), with columns such as ABBREV, PERIOD, FORMATION, LITHOLOGY; sample period/formation pairs: Tertiary/Tkgm, Quaternary/Q, Quaternary/Qg, Tertiary/Twp, Tertiary/Twl, …
Color Definition (by PERIOD) and Map Rendering for the matching formations: Tkgm, Q, Qg, Twp, Twl, …
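The rewriting step can be sketched as replacing the user's concept by the terms beneath it in the age ontology and emitting a disjunctive selection. The helper names `leaf_terms` and `rewrite` are hypothetical, and the query string only imitates the SQL-like form shown on the slide; it is not tied to any particular query engine.

```python
from typing import Dict, List

# The fragment of the age ontology shown on the slide (concept -> subconcepts).
AGE_ONTOLOGY: Dict[str, List[str]] = {
    "Cenozoic": ["Tertiary", "Quaternary"],
}


def leaf_terms(concept: str, ontology: Dict[str, List[str]]) -> List[str]:
    """Return the most specific terms under a concept (the concept itself if it has none)."""
    children = ontology.get(concept, [])
    if not children:
        return [concept]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaf_terms(child, ontology))
    return leaves


def rewrite(concept: str, ontology: Dict[str, List[str]]) -> str:
    """Rewrite 'formations from <concept>' into a selection over source-level age terms."""
    conditions = " or ".join(f'AGE="{term}"' for term in leaf_terms(concept, ontology))
    return f"select FORMATION where {conditions}"


print(rewrite("Cenozoic", AGE_ONTOLOGY))
# select FORMATION where AGE="Tertiary" or AGE="Quaternary"
```

This version assumes the sources record only the most specific terms; if a source could also store ‘Cenozoic’ directly, the rewritten condition would need to include the intermediate terms as well.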
Integration Challenges
MANY!
Non-available or non-interoperable data
“Dirty data”, no controlled vocabularies
Many different controlled vocabularies! (“clean data”)
What is entailed by a vocabulary?
Formal Ontologies
Extensible Ontologies
What’s next?
YOU!
GEON-SCI:
– Science questions waiting to be turned into queries!
GEON-KR Working Group activities
– guided (if not driven) by geoscientists
– marry KR technologies to standards (W3C, Semantic Web: RDF, OWL, ...)
– collect GEON-able KR resources (data models, controlled vocabularies, ontologies, ...)
GEON-DEV:
– generalize and merge the current KR/semantic mediation architecture with the standard Grid architecture
– building systems