Published by Alexia Newton. Modified over 9 years ago.
1
San Diego Supercomputer Center, EDBT'02, Prague
EDBT Panel, March 2002, Prague: Scientific Data Integration for Complex Multiple-Worlds Scenarios: Databases Meets Knowledge Representation
Bertram Ludäscher, Data and Knowledge System, San Diego Supercomputer Center, U.C. San Diego
2
A Home Buyer’s Information Integration Problem
What houses for sale under $500k have at least 2 bathrooms and 2 bedrooms, a nearby school ranked in the upper third, and a neighborhood with a below-average crime rate and a diverse population?
Sources: Realtor, Demographics, School Rankings, Crime Stats
“Simple Multiple-Worlds” Mediation Problem => XML-Based Mediator
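For the “simple multiple-worlds” case, the sources can be joined on a shared, uncontroversial key. The sketch below assumes each source exports XML and that a neighborhood identifier (`hood`) links all four; the element names, attributes, values, and the 67th-percentile cutoff standing in for “upper third” are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML exports of the four sources; all values are made up.
REALTOR = """<listings>
  <house id="h1" hood="north" price="450000" baths="2" beds="3"/>
  <house id="h2" hood="south" price="520000" baths="3" beds="4"/>
  <house id="h3" hood="south" price="390000" baths="1" beds="2"/>
</listings>"""
SCHOOLS = """<schools>
  <school hood="north" percentile="80"/>
  <school hood="south" percentile="90"/>
</schools>"""
CRIME = """<crime>
  <area hood="north" rate="0.8"/>
  <area hood="south" rate="1.3"/>
</crime>"""  # assumption: rate is relative to the average (1.0 = average)
DEMOGRAPHICS = """<demographics>
  <area hood="north" diverse="yes"/>
  <area hood="south" diverse="no"/>
</demographics>"""

def by_hood(xml_text, tag):
    """Index one source's elements by the shared 'hood' key."""
    return {e.get("hood"): e for e in ET.fromstring(xml_text).iter(tag)}

def mediate():
    """Evaluate the home buyer's query as equality joins plus range filters."""
    schools = by_hood(SCHOOLS, "school")
    crime = by_hood(CRIME, "area")
    demo = by_hood(DEMOGRAPHICS, "area")
    answers = []
    for h in ET.fromstring(REALTOR).iter("house"):
        hood = h.get("hood")
        if (int(h.get("price")) < 500_000
                and int(h.get("baths")) >= 2
                and int(h.get("beds")) >= 2
                and int(schools[hood].get("percentile")) >= 67  # "upper third"
                and float(crime[hood].get("rate")) < 1.0         # below average
                and demo[hood].get("diverse") == "yes"):
            answers.append(h.get("id"))
    return answers

print(mediate())  # ['h1']
```

Here the integration is purely structural: once the XML is parsed, the mediator only evaluates equality joins and range predicates, which is why an XML-based mediator suffices for this kind of question.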
3
A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
Sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
“Complex Multiple-Worlds” Mediation Problem => Model-Based Mediator
4
A Geoscientist’s Information Integration Problem
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
Sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)
“Complex Multiple-Worlds” Mediation
5
Scientific Data Integration Challenges: Heterogeneities in the 4S’s...
System Aspects
– platforms, devices, physical distribution, transport protocols, access APIs, impedance mismatch, user interfaces, application integration...
Syntaxes
– heterogeneous data formats (one for each tool...)
Structures
– heterogeneous schemas (one for each DB...)
– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs)
Semantics
– unclear semantics: e.g., incoherent terminology, multiple taxonomies,...
6
Data Integration: Approaches / Solutions
Syntax / Structure / Semantics / System aspects
(Data-)Grid / Middleware
– system: distributed data & computing (SDSC SRB, Globus, web services, WSDL)
– source = file or DB
XML-Based Mediators
– structure: XML queries and views
– source = XML-DB
Model-Based/Semantic Mediators
– semantics: conceptual models and declarative views
– source = Knowledge Base (DB+CMs+ICs)
Semantic Web Formalisms
– semantics: ontologies, description logics (RDF(S), DAML+OIL,...)
Knowledge/Semantic-Grid
– combination
7
What’s in a Link?
Syntactic Joins
– (X,Y) := X.SSN = Y.SSN (equality)
– (X,Y) := X.UMLS-ID = Y.UID
“Speciality” Joins
– (X,Y,Score) := BLAST(X,Y,Score) (similarity)
Semantic/Rule-Based Joins
– (X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S>0.8 (homology, lub)
– (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y. (rule-based)
  e.g., X = -secretase, B = beta amyloid, Y = Alzheimer’s disease
Challenge:
– compile semantic joins into efficient syntactic ones
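The three kinds of links can be sketched in Python. This is an illustration under stated assumptions: the records and the is-a hierarchy are hypothetical, and `difflib.SequenceMatcher` is only a stand-in similarity score, not BLAST. The semantic join keeps pairs whose score exceeds 0.8 and reports their least upper bound (lub) in the is-a hierarchy.

```python
from difflib import SequenceMatcher

# Hypothetical toy records; SequenceMatcher is a stand-in for BLAST scores.
left = [{"id": "p1", "ssn": "111", "seq": "MGKSNSKLKPEV"}]
right = [{"id": "q1", "ssn": "111", "seq": "MGKSNSKLKPEL"},
         {"id": "q2", "ssn": "222", "seq": "AAAAAAAAAAAA"}]

# Hypothetical tree-shaped is-a hierarchy: child -> parent.
ISA = {"p1": "ncs_family", "q1": "ncs_family", "q2": "other_family",
       "ncs_family": "protein", "other_family": "protein"}

def ancestors(x):
    """x followed by all of its is-a ancestors, nearest first."""
    chain = [x]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain

def lub(x, y):
    """Least upper bound: nearest class that both x and y are (transitively) isa."""
    ys = set(ancestors(y))
    return next((c for c in ancestors(x)[1:] if c in ys), None)

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def syntactic_join(xs, ys):
    """Equality join on a shared key (here: SSN)."""
    return [(x["id"], y["id"]) for x in xs for y in ys if x["ssn"] == y["ssn"]]

def specialty_join(xs, ys):
    """Similarity join: every pair with its score."""
    return [(x["id"], y["id"], round(similarity(x["seq"], y["seq"]), 2))
            for x in xs for y in ys]

def semantic_join(xs, ys, threshold=0.8):
    """Pairs with score > threshold, annotated with their lub class C."""
    out = []
    for x in xs:
        for y in ys:
            c = lub(x["id"], y["id"])
            if similarity(x["seq"], y["seq"]) > threshold and c is not None:
                out.append((x["id"], y["id"], c))
    return out

print(syntactic_join(left, right))  # [('p1', 'q1')]
print(semantic_join(left, right))   # [('p1', 'q1', 'ncs_family')]
```

One possible angle on the stated challenge, offered only as a sketch, is to precompute the lub class (or a similarity cluster id) per record, so that part of the semantic join becomes an ordinary equality join a database can evaluate efficiently.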
8
XML-Based vs. Model-Based Mediation
[Figure: two layers built on Raw Data]
Model-based: (XML) Objects, Conceptual Models
– Classes, Relations, Ontologies (is-a, has-a,...)
– Logical Domain Constraints (IF ... THEN ...)
– “Glue” Maps: Domain Maps, Process Maps
– Integrated-CM := CM-QL(Src1-CM,...)
XML-based: XML Elements, XML Models
– Structural Constraints (DTDs): Parent, Child, Sibling,... e.g., A = (B*|C),D; B = ...
– No Domain Constraints
– Integrated-DTD := XQuery(Src1-DTD,...)
CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, DAML+OIL, …}
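To illustrate what structural constraints (DTDs) do and do not capture, the content model A = (B*|C),D can be checked as a regular expression over an element's child tags. This is a minimal sketch assuming single-letter tag names so that each child maps to one character; it is not a DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Content model A = (B*|C),D rewritten as a regex over the sequence of child tags.
# Assumption: single-letter tag names, so joining tags yields one char per child.
CONTENT_MODEL_A = re.compile(r"(B*|C)D")

def conforms(xml_text):
    """True if the root is <A> and its children match (B*|C),D."""
    root = ET.fromstring(xml_text)
    if root.tag != "A":
        return False
    children = "".join(child.tag for child in root)
    return CONTENT_MODEL_A.fullmatch(children) is not None

print(conforms("<A><B/><B/><D/></A>"))  # True
print(conforms("<A><B/><C/><D/></A>"))  # False
```

A document can satisfy this content model while saying nothing meaningful about the domain, which is the gap the slide labels “No Domain Constraints” and which the logical domain constraints on the model-based side are meant to fill.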
9
NCMIR ANATOM Domain Map
[Figure: domain map with concepts, relations, and logic rules]
10
Semantics-Aware Browsing and Querying
[Figure: “has a” hierarchy with data from Source 1, Source 2, and Source 3 attached to concepts; concepts shown: Cerebellum, Cerebellar Cortex, Granule Cell Layer, Purkinje Cell Layer, Molecular Layer, Purkinje Neuron, Purkinje Cell Dendrite, Dendritic spines, Dendritic shaft, Endoplasmic reticulum]
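Semantics-aware querying can be sketched as graph traversal: a query at one concept also reaches data registered at its parts. The has-a edges below are only one plausible reading of the figure's labels, and the source-to-concept anchors are hypothetical.

```python
# Hypothetical has-a edges (parent -> parts), one reading of the figure's labels.
HAS_A = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["granule_cell_layer", "purkinje_cell_layer", "molecular_layer"],
    "purkinje_cell_layer": ["purkinje_neuron"],
    "purkinje_neuron": ["purkinje_cell_dendrite"],
    "purkinje_cell_dendrite": ["dendritic_spines", "dendritic_shaft"],
    "dendritic_spines": ["endoplasmic_reticulum"],
}

# Hypothetical anchors: source -> concepts its data are registered at.
ANCHORS = {
    "source1": ["purkinje_cell_layer"],
    "source2": ["dendritic_spines"],
    "source3": ["granule_cell_layer"],
}

def parts_of(concept):
    """All concepts reachable from `concept` via has-a, including itself."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(HAS_A.get(c, []))
    return seen

def relevant_sources(concept):
    """Sources with data anchored anywhere inside the concept's has-a closure."""
    scope = parts_of(concept)
    return sorted(s for s, cs in ANCHORS.items() if scope & set(cs))

print(relevant_sources("purkinje_neuron"))  # ['source2']
print(relevant_sources("cerebellum"))       # ['source1', 'source2', 'source3']
```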
11
Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR
Domain Map (DM) = labeled graph with concepts ("classes") and roles ("associations"); additional semantics are expressed as logic rules (F-logic).
Domain Expert Knowledge:
Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Figure: DM in Description Logic]
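The domain expert knowledge above can be written as (subject, role, object) triples, and a rule engine can derive facts that no single statement asserts. The sketch below uses naive forward chaining in Python as a stand-in for F-logic; the triple encoding, the role names, and the two rules are assumptions made for illustration.

```python
# Hypothetical triple encoding of the domain expert statements.
FACTS = {
    ("purkinje_cell", "has", "dendrite"),
    ("pyramidal_cell", "has", "dendrite"),
    ("dendrite", "has", "higher_order_branch"),
    ("higher_order_branch", "contains", "spine"),
    ("spine", "isa", "ion_regulating_component"),
    ("spine", "has", "ion_binding_protein"),
    ("neurotransmission", "involves", "ionic_activity"),
    ("ion_binding_protein", "controls", "ionic_activity"),
    ("ion_regulating_component", "affects", "ionic_activity"),
}

def infer(triples):
    """Naive forward chaining to a fixpoint with two illustrative rules:
    R1 (role inheritance): X isa C, C r Y  =>  X r Y
    R2 (isa transitivity): X isa C, C isa Y => X isa Y
    Both are the same pattern: follow one isa edge, then any edge."""
    kb = set(triples)
    while True:
        new = set()
        for (x, r1, c) in kb:
            if r1 != "isa":
                continue
            for (c2, r, y) in kb:
                if c2 == c:
                    new.add((x, r, y))
        if new <= kb:  # nothing new: fixpoint reached
            return kb
        kb |= new

kb = infer(FACTS)
print(("spine", "affects", "ionic_activity") in kb)  # True
```

From “spines are ion-regulating components” and “ion-regulating components affect ionic activity”, the fixpoint contains `("spine", "affects", "ionic_activity")`, the kind of derived link that can connect sources describing spines with sources describing ionic activity.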
12
Source Registration/Data Contextualization
A source registers its data with an existing ontology; using description logics, it may also refine the mediator’s domain map [ICDE’01]: sources can register new concepts at the mediator...
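Registration can be sketched as two operations on the mediator's domain map: anchoring a source's data at an existing concept, and refining the map by adding a new subconcept. The class, method, and concept names are hypothetical, and a real system would use description-logic reasoning to place a new concept rather than taking its parent as given.

```python
class DomainMap:
    """Hypothetical minimal domain map: is-a edges plus source anchors."""

    def __init__(self, isa):
        self.isa = dict(isa)   # concept -> parent concept
        self.anchors = {}      # source -> set of concepts it registered at

    def concepts(self):
        return set(self.isa) | set(self.isa.values())

    def register_concept(self, new, parent):
        """Refine the map: add `new` as a subconcept of an existing `parent`."""
        if parent not in self.concepts():
            raise KeyError(f"unknown parent concept: {parent}")
        self.isa[new] = parent

    def register_source(self, source, concept):
        """Contextualize a source's data by anchoring it at an existing concept."""
        if concept not in self.concepts():
            raise KeyError(f"unknown concept: {concept}")
        self.anchors.setdefault(source, set()).add(concept)

# Usage with made-up names: refine the map, then anchor a source at the new concept.
dm = DomainMap({"purkinje_cell": "neuron", "neuron": "cell"})
dm.register_concept("spiny_purkinje_cell", "purkinje_cell")
dm.register_source("src_a", "spiny_purkinje_cell")
```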
13
Source Registration: Semantic Annotations
14
Multiple Ways of Querying Data
[Figure: data reached via Spatial Representation (Atlases), via Ontologies (Brain, Cerebellum, Purkinje Cell Layer, Purkinje cell, neuron, linked by “has a” and “is a”), and via Transformations]
15
Model-Based Mediator Architecture
[Figure: USER/Client exchanges CM queries and results (in XML) with the Mediator Engine, which maintains the CM (Integrated View) given by the Integrated View Definition (IVD) and comprises an FL rule processor, an LP rule processor, and a graph processor on the XSB engine. The mediator holds “glue” maps (GMs): domain maps (DMs) and process maps (PMs). Sources S1, S2, S3 are accessed through XML wrappers and CM wrappers, each exporting GCM/CM(S) together with its semantic context CON(S).]
Conceptual Model = Object Model + Knowledge Base + Contextualization
First Results & Demos: [SSDBM’00] [VLDB’00] [ICDE’01] [HBP’01] [EDBT’02] [BNCOD’02]
16
Model-Based Mediation Methodology...
Lift Sources to export CMs:
  CM(S) = OM(S) + KB(S) + CON(S)
Object Model OM(S):
– complex objects (frames), class hierarchy, OO constraints
Knowledge Base KB(S):
– explicit representation of (“hidden”) source semantics
– logic rules over OM(S)
Contextualization CON(S):
– situate OM(S) data using “glue maps” (GMs):
  domain maps DMs (ontology) = terminological knowledge: concepts + roles
  process maps PMs = “procedural knowledge”: states + transitions
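The decomposition CM(S) = OM(S) + KB(S) + CON(S) maps naturally onto a small data structure. In this hypothetical sketch, OM(S) holds frames per class, KB(S) is a list of Python functions standing in for logic rules, and CON(S) maps source classes to domain-map concepts; the example rule (a positive spine density implies that a dendrite has spines) is invented to show how “hidden” source semantics can be made explicit.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptualModel:
    """Hypothetical container for CM(S) = OM(S) + KB(S) + CON(S)."""
    om: dict                                  # OM(S): class -> list of frames (dicts)
    kb: list = field(default_factory=list)    # KB(S): rules, each OM -> set of facts
    con: dict = field(default_factory=dict)   # CON(S): class -> domain-map concept

    def derived_facts(self):
        """Make hidden source semantics explicit by evaluating KB rules over OM."""
        facts = set()
        for rule in self.kb:
            facts |= rule(self.om)
        return facts

    def contextualize(self, cls):
        """Domain-map concept at which a source class is situated (or None)."""
        return self.con.get(cls)

def has_spines(om):
    """Hypothetical hidden-semantics rule: density > 0 means the dendrite has spines."""
    return {(d["id"], "has", "spine")
            for d in om.get("dendrite", []) if d["spine_density"] > 0}

cm = ConceptualModel(
    om={"dendrite": [{"id": "d1", "spine_density": 1.2},
                     {"id": "d2", "spine_density": 0.0}]},
    kb=[has_spines],
    con={"dendrite": "purkinje_cell_dendrite"},
)
print(cm.derived_facts())  # {('d1', 'has', 'spine')}
```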
17
... Model-Based Mediation Methodology
Integrated View Definition (IVD)
– declarative (logic) rules with object-oriented features
– defined over CM(S), domain maps, process maps
– needs “mediation engineers” = domain + KRDB experts
Knowledge-Based Querying and Browsing (runtime):
– mediator composes the user query Q with the IVD...
  rewrites (Q ∘ IVD), sends subqueries to sources...
  post-processes returned results (e.g., situate in context)
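Composing Q with the IVD lets the mediator push work to the sources instead of materializing the whole integrated view. The sketch below assumes two hypothetical source relations and an IVD that joins them on a protein identifier; it compares evaluating Q over the materialized view with a rewritten plan that pushes Q's selection into the subquery for the first source and passes the matching identifiers to the second.

```python
# Hypothetical source relations (all tuples made up).
S1_LOCALIZATION = [("p1", "purkinje_cell_layer"), ("p2", "granule_cell_layer"),
                   ("p3", "purkinje_cell_layer")]
S2_ANNOTATION = [("p1", "calcium_binding"), ("p2", "calcium_binding"),
                 ("p3", "kinase")]

shipped = {"rows": 0}  # counts tuples returned by the sources

def source_query(relation, where=lambda t: True):
    """Simulate a subquery sent to a source; the source applies `where` locally."""
    result = [t for t in relation if where(t)]
    shipped["rows"] += len(result)
    return result

def ivd(loc_filter=lambda t: True):
    """IVD: view(P, Loc, Func) :- S1(P, Loc), S2(P, Func)."""
    s1 = source_query(S1_LOCALIZATION, loc_filter)
    wanted = {p for p, _ in s1}
    s2 = source_query(S2_ANNOTATION, lambda t: t[0] in wanted)
    return [(p, loc, f) for p, loc in s1 for q, f in s2 if p == q]

def q_naive():
    """Q evaluated over the fully materialized view."""
    return [t for t in ivd() if t[1] == "purkinje_cell_layer"]

def q_rewritten():
    """Q ∘ IVD: the selection on Loc is pushed into the subquery sent to S1."""
    return ivd(lambda t: t[1] == "purkinje_cell_layer")
```

Both plans return the same answers, but the rewritten one ships fewer tuples (4 instead of 6 here). Real rewriting would operate on the rule-based IVD itself rather than on Python closures.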
18
Mediation Scenarios & Techniques
| Federated Databases | XML-Based Mediation | Model-Based Mediation |
|---|---|---|
| One-World | One-/Multiple-Worlds | Complex Multiple-Worlds |
| Common Schema | Mediated Schema | Common Glue Maps |
| SQL, rules | XML query languages | DOOD query languages |
| Schema Transformations | Syntax-Aware Mappings | Semantics-Aware Mappings |
| Syntactic Joins | Syntactic Joins | “Semantic” Joins via Glue Maps |
| DB expert | DB expert | KRDB + domain experts |
19
Some Observations
Scientific Data Integration is different
– e.g., complex and hidden semantics,...
Co-Education (CS=>DS, DS=>CS) takes time
– NIH Biomedical Informatics Research Network (BIRN): Neuroscientists
– DOE Scientific Data Management Center (SDM)
– Starting with Ecologists, Geoscientists,...
A good thing about standards: there are so many to choose from:
– SQL, http, HTML, XML, XQuery, XSLT, XML Schema, RDF(S), DAML+OIL, DAML-S, UMLS, GO, XMI, SOAP, WSDL,...
Syntax is overrated (and its impact underestimated?)
– nobody likes LISP any more, but everybody likes XML...
2nd Marriage of Knowledge Representation & Databases:
– Semantic Web
– (child from 1st marriage: Deductive Databases; aren’t they cute siblings? ;)
=> model-based/semantic mediators
20
The Road Ahead: Scientific Data Integration with the Semantic Web!?
[Figure: Scientific Data, Data-Grid, Integrated Data Views, Ivory Tower; terms shown: Internet2, SOAP, OIL, RDF, DOOD rules, WSDL, XQuery, DAML-S, XML, XMLDB, subsumption, DAML, Logic, description logics, RDB, inference, ORDB, ontologies]
21
Some Related References: Mediation of Neuroscience Data
– Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, April 2001.
– Navigating Virtual Information Sources with Know-ME, X. Qian, B. Ludäscher, M. E. Martone, A. Gupta, demonstration track, Intl. Conference on Extending Database Technology (EDBT), Prague, Czech Republic, March 2002.
– Model-Based Information Integration in a Neuroscience Mediator System, B. Ludäscher, A. Gupta, M. E. Martone, demonstration track, 26th Intl. Conference on Very Large Data Bases (VLDB), Cairo, Egypt, September 2000.
– Knowledge-Based Integration of Neuroscience Data Sources, A. Gupta, B. Ludäscher, M. E. Martone, 12th Intl. Conference on Scientific and Statistical Database Management (SSDBM), Berlin, Germany, IEEE Computer Society, July 2000.
– A Cell-Centered Database for Electron Tomographic Data, M. E. Martone, A. Gupta, M. Wong, X. Qian, G. Sosinsky, S. Lamont, B. Ludäscher, and M. H. Ellisman, Journal of Structural Biology, 2002, to appear.