Published by Phillip Shelton. Modified over 9 years ago.
1
Scientific Data Integration with Model-Based Mediation: Databases Meets* Knowledge Representation
Bertram Ludäscher, LUDAESCH@SDSC.EDU
Knowledge-Based Integration Lab, Data and Knowledge Systems, San Diego Supercomputer Center, U.C. San Diego
* or rather rediscovers
2
Integration Example from the Database Community
User: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
? Information Integration
Mediator: addall.com (“One-World” Mediation)
Sources: amazon.com, A1books.com, half.com, barnes&noble.com
3
Another Well-Known Data Integration Example
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?
? Information Integration
Sources: Realtor, Demographics, School Rankings, Crime Stats
“Multiple-Worlds” Mediation
4
National Partnership of Advanced Computational Infrastructure San Diego Supercomputer Center
Information Integration from a DB Perspective
Information Integration Challenge
–Given: data sources S_1,..., S_k (DBMS, web sites,...) and user questions Q_1,...,Q_n that can be answered using the S_i
–Find: the answers to Q_1,..., Q_n
The Database Perspective: source = “database”
–S_i has a schema (relational, XML, OO,...)
–S_i can be queried
–define virtual (or materialized) integrated views V over S_1,...,S_k using database query languages
–questions become queries Q_i against V(S_1,...,S_k)
Why a Database Perspective?
–scalability, efficiency, reusability (declarative queries),...
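The idea of a virtual integrated view V over sources, with a question posed as a query against V, can be sketched roughly as follows. This is only an illustration: the two sources, their schemas, the prices, and the function names are all hypothetical and do not come from the slides.

```python
from typing import Dict, Iterator, List

Record = Dict[str, object]

# Hypothetical wrapped sources S_1 and S_2, each with its own schema.
def source_s1() -> List[Record]:
    return [{"title": "Tractatus", "price_usd": 12.0, "shipping_usd": 4.0}]

def source_s2() -> List[Record]:
    return [{"name": "Tractatus", "total": 15.5}]

def integrated_view() -> Iterator[Record]:
    """Virtual view V(S_1, S_2): map both schemas to (title, total, source)."""
    for r in source_s1():
        yield {"title": r["title"],
               "total": r["price_usd"] + r["shipping_usd"],
               "source": "S_1"}
    for r in source_s2():
        yield {"title": r["name"], "total": r["total"], "source": "S_2"}

def cheapest(title: str) -> Record:
    """Query Q against V: the cheapest offer (including shipping) for a title."""
    offers = (r for r in integrated_view() if r["title"] == title)
    return min(offers, key=lambda r: r["total"])
```

Because the view is computed on demand from the sources, it is virtual in the slide's sense; a materialized variant would store the output of `integrated_view()` instead.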
5
Abstract (XML-Based) Mediator Architecture
USER/Client: exchanges XML Queries & Results with the MEDIATOR
MEDIATOR: Integrated XML View V, given by the Integrated View Definition IVD(S_1,...,S_k); evaluates Query Q o V (S_1,...,S_k)
Sources S_1, S_2, ..., S_k: each behind a Wrapper that exports an XML View
6
XMAS: XML Matching And Structuring language
Integrated View Definition: “Find publications from amazon.com and DBLP, join on author, group by authors and title”
CONSTRUCT $a1 $t $p { $p } { $a1, $t }
WHERE $a1 : $t : IN WRAP(“amazon.com”)
AND $a2 : $p : IN WRAP(“www...DBLP…”)
AND value( $a1 ) = value( $a2 )
XMAS, XMAS Algebra
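The join-and-group logic of this view definition can be approximated in Python as below. This is a sketch of the stated intent ("join on author, group by authors and title"), not of XMAS itself: the tuples stand in for wrapper output, and every author, title, and publication name is a placeholder.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical wrapper outputs; real wrappers would export XML views.
amazon_books: List[Tuple[str, str]] = [      # (author, title)
    ("Author A", "Book T"),
]
dblp_pubs: List[Tuple[str, str]] = [         # (author, publication)
    ("Author A", "pub-1"),
    ("Author A", "pub-2"),
    ("Author B", "pub-3"),
]

def integrated_view() -> Dict[Tuple[str, str], List[str]]:
    """Join on author value, then group publications by (author, title)."""
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for a1, title in amazon_books:
        for a2, pub in dblp_pubs:
            if a1 == a2:                     # value($a1) = value($a2)
                grouped[(a1, title)].append(pub)
    return dict(grouped)
```

The nested loop is the simplest possible join strategy; a mediator would normally push the author condition down to the sources rather than fetch everything.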
7
Information Integration & Mediation for Scientific Data... a different set of problems (reality) came our way...
8
A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
? Information Integration
Sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
“Complex Multiple-Worlds” Mediation
9
A Geoscientist’s Information Integration Problem
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
? Information Integration
Sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)
“Complex Multiple-Worlds” Mediation
10
Information Integration Landscape
Axis labels: conceptual distance; one-world, multiple-worlds; conceptual complexity/depth; low, high
Techniques: DB mediation techniques, Ontologies, KR formalisms, Model-Based Mediation
Systems and scenarios shown: addall, book-buyer, BLAST, EcoCyc, Cyc, WordNet, GO, home-buyer, 24x7 consumer, NCBI, UMLS, MIA, Entrez, RiboWeb, Tambis
Application areas: Bioinformatics, Geoinformatics
11
What’s the Problem with XML & Complex Multiple-Worlds?
XML is Syntax
–DTDs talk about element nesting
–XML Schema schemas give you data types
–need anything else? => write comments!
Domain Semantics is complex:
–implicit assumptions, hidden semantics
–sources seem unrelated to the non-expert
Need Structure and Semantics beyond XML trees!
–employ richer OO models (UML, EER,...)
–make domain semantics and “glue knowledge” explicit
–use ontologies to fix terminology and conceptualization
–avoid ambiguities by using formal semantics
12
XML-Based vs. Model-Based Mediation
XML-Based Mediation:
–Raw Data, XML Elements, XML Models
–Structural Constraints (DTDs), Parent, Child, Sibling,... e.g., A = (B*|C),D B =...
–No Domain Constraints
–Integrated-DTD := XML-QL(Src1-DTD,...)
Model-Based Mediation:
–Raw Data, (XML) Objects, Conceptual Models
–Classes, Relations, is-a, has-a,... (C1, C2, C3, R)
–Logical Domain Constraints (IF THEN)
–Glue Maps: DMs, PMs
–Integrated-CM := CM-QL(Src1-CM,...)
CM ~ {Descr.Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, DAML+OIL, …}
13
What’s the Glue? What’s in a Link?
Syntactic Joins
– (X,Y) := X.SSN = Y.SSN (equality)
– (X,Y) := X.UMLS-ID = Y.UID
“Speciality” Joins
– (X,Y,Score) := BLAST(X,Y,Score) (similarity)
Semantic/Rule-Based Joins
– (X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S>0.8 (homology, lub)
– (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y. (rule-based)
  e.g., X= - secretase, B=beta amyloid, Y=Alzheimer’s disease
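The difference between a syntactic join and a rule-based join can be illustrated with a small Python sketch. The function names and the fact sets are hypothetical stand-ins for relations exported by two sources; the similarity ("speciality") join is left out because it would need a real scoring tool such as BLAST.

```python
from typing import Dict, List, Set, Tuple

Rec = Dict[str, str]

def syntactic_join(xs: List[Rec], ys: List[Rec],
                   kx: str, ky: str) -> List[Tuple[Rec, Rec]]:
    """link(X, Y) := X.kx = Y.ky (plain equality on an identifier)."""
    return [(x, y) for x in xs for y in ys if x[kx] == y[ky]]

# Hypothetical facts, loosely following the slide's example.
PRODUCES: Set[Tuple[str, str]] = {("secretase", "beta amyloid")}
INCREASED_IN: Set[Tuple[str, str]] = {("beta amyloid", "Alzheimer's disease")}

def rule_based_join() -> List[Tuple[str, str, List[str]]]:
    """link(X, Y, [produces, B, increased_in]) := X produces B, B increased_in Y."""
    return [
        (x, y, ["produces", b, "increased_in"])
        for (x, b) in sorted(PRODUCES)
        for (b2, y) in sorted(INCREASED_IN)
        if b == b2
    ]
```

The third component of each rule-based link records how X and Y are connected, so the link carries its own justification rather than a bare pair.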
14
Model-Based Mediation Methodology...
Lift Sources to export Conceptual Models (CMs):
CM(S) = OM(S) + KB(S) + CON(S)
Object Model OM(S):
–complex objects (frames), class hierarchy, OO constraints
Knowledge Base KB(S):
–explicit representation of (“hidden”) source semantics
–logic rules over OM(S)
Contextualization CON(S):
–situate OM(S) data using “glue maps” (GMs):
–domain maps DMs (ontology) = terminological knowledge: concepts + roles
–process maps PMs = “procedural knowledge”: states + transitions
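One way to picture CM(S) = OM(S) + KB(S) + CON(S) is as a record with three parts. The sketch below is an assumption-laden illustration: the class `ConceptualModel`, its fields, the `spine_density` rule, and the sample data are all invented here, and the KB is reduced to plain Python functions rather than logic rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Obj = Dict[str, object]

@dataclass
class ConceptualModel:
    """CM(S) = OM(S) + KB(S) + CON(S) for one source S (illustrative only)."""
    object_model: Dict[str, List[Obj]]                    # OM(S): class -> objects
    knowledge_base: List[Callable[[Obj], Obj]] = field(default_factory=list)  # KB(S)
    context: Dict[str, str] = field(default_factory=dict)  # CON(S): class -> DM concept

    def export(self, cls: str) -> List[Obj]:
        """Objects of a class with KB rules applied and the DM anchor attached."""
        out = []
        for obj in self.object_model.get(cls, []):
            enriched = dict(obj)
            for rule in self.knowledge_base:
                enriched = rule(enriched)
            enriched["dm_concept"] = self.context.get(cls)
            out.append(enriched)
        return out

def spine_density(o: Obj) -> Obj:
    """Hypothetical rule making a derived ("hidden") attribute explicit."""
    if "spine_count" in o and "length_um" in o:
        return {**o, "spine_density": o["spine_count"] / o["length_um"]}
    return o

cm = ConceptualModel(
    object_model={"DendriteSegment": [{"spine_count": 20, "length_um": 10.0}]},
    knowledge_base=[spine_density],
    context={"DendriteSegment": "Dendrite"},
)
```

Calling `cm.export("DendriteSegment")` returns the source objects enriched with the derived attribute and tagged with the domain-map concept they hang off.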
15
... Model-Based Mediation Methodology
Integrated View Definition (IVD)
–declarative (logic) rules with object-oriented features
–defined over CM(S), domain maps, process maps
–needs “mediation engineers” = domain + KRDB experts
Knowledge-Based Querying and Browsing (runtime):
–mediator composes the user query Q with the IVD
–... rewrites (Q o IVD), sends subqueries to sources
–... post-processes returned results (e.g., situate in context)
16
Model-Based Mediator Architecture
USER/Client: exchanges CM Queries & Results (in XML) with the mediator
Mediator: CM (Integrated View), given by the Integrated View Definition IVD; Mediator Engine with FL rule proc., LP rule proc., Graph proc., DDB engine
“Glue” Maps GMs: Domain Maps DMs, Process Maps PMs (semantic context CON(S))
Sources S1, S2, S3: each behind an (XML-Wrapper) and CM-Wrapper, exporting CM(S) = OM(S)+KB(S)+CON(S) as GCM CM S1, GCM CM S2, GCM CM S3
First results: KIND prototype, formal DM semantics, PMs [SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] [EDBT02],... BIRN-CC,...
17
Domain Maps & Ontologies as “Glue Knowledge Sources”
Domain Map Ontology
–conceptualization of relevant entities and relationships
–formal representation of terminological knowledge
Use in Model-Based Mediation
–(derived) concepts as “drop points”, “anchor points”, “context” for source classes
–compile-time use: view definition, subsumption, classification,...
–runtime use: querying/deduction, path queries,...
KR Formalisms:
–Semantic nets, Thesauri, Frame-Logic, Description Logics,...
18
Domain Experts’ “Glue Knowledge”
Sources: Source 1, Source 2, Source 3
Concepts linked by “has a”: Cerebellum; Cerebellar Cortex; Granule Cell Layer; Purkinje Cell layer; Molecular Layer; Purkinje Cell; Dendrite; Dendritic spines; Dendritic shaft; Endoplasmic reticulum; Purkinje Neuron
19
NCMIR ANATOM Domain Map: concepts, relations, logic rules
20
Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR
Domain Map = labeled graph with concepts ("classes") and roles ("associations"); additional semantics: expressed as logic rules (F-logic)
Domain Expert Knowledge:
–Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines.
–Dendritic spines are ion (calcium) regulating components.
–Spines have ion binding proteins.
–Neurotransmission involves ionic activity (release).
–Ion-binding proteins control ion activity (propagation) in a cell.
–Ion-regulating components of cells affect ionic activity (release).
Domain Map (DM)
DM in Description Logic
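A domain map as a labeled graph, plus one rule over it, might look like the following in Python. The edges paraphrase the expert statements on this slide; the choice to treat both "has" and "contains" as part-of roles, and all function names, are assumptions made for this sketch (the slides express such rules in F-logic, not Python).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source concept, role, target concept)

EDGES: List[Edge] = [
    ("Purkinje cell", "has", "dendrite"),
    ("Pyramidal cell", "has", "dendrite"),
    ("dendrite", "has", "higher-order branch"),
    ("higher-order branch", "contains", "spine"),
    ("spine", "isa", "ion-regulating component"),
    ("spine", "has", "ion-binding protein"),
]

PART_ROLES = {"has", "contains"}  # assumption: both roles read as part-of

def parts_of(concept: str) -> Set[str]:
    """All concepts reachable from `concept` via part roles (transitive closure)."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    for src, role, dst in EDGES:
        if role in PART_ROLES:
            succ[src].add(dst)
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def has_component_of_kind(concept: str, kind: str) -> bool:
    """True if some (transitive) part of `concept` is declared isa `kind`."""
    isa = {(s, d) for s, r, d in EDGES if r == "isa"}
    return any((p, kind) in isa for p in parts_of(concept))
```

A query such as `has_component_of_kind("Purkinje cell", "ion-regulating component")` then follows the path through dendrite, branch, and spine, which is the kind of graph query the mediator runs over a domain map.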
21
Source Contextualization & DM Refinement
In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map... sources can register new concepts at the mediator...
22
Query Processing “Demo”
Query results in context
Contextualization CON(Result) wrt. ANATOM.
provided by the domain expert and mediation engineer
deductive OO language (here: F-logic)
23
Process Maps with Abstractions and Elaborations: => From Terminological to “Procedural Glue”
nodes ~ states
edges ~ processes, transitions
blue/red edges: processes in Src1/Src2
general form of edges:
how about these?
24
What’s in an Answer? (What’s in a Link? revisited)
Semantic/Rule-Based Joins
– (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y. (rule-based)
  e.g., X= - secretase, B=beta amyloid, Y=Alzheimer’s disease
What is the Erdős number of person P?
–3
Really? Why?
–authority based: said so
–faith based: don’t know but believe firmly
–query statement Q =... derived it from DB
–query Q =... derived it from DB and KB using derivation D
logic-based systems often “come with explanations”
ultimate goal: “computations as proofs”, “explanation-based computing”
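An answer that comes with its own explanation can be sketched with a breadth-first search that returns the witnessing co-author chain rather than just a number. The co-authorship graph and all names in it are placeholders invented for this example.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical co-authorship graph (names are placeholders).
COAUTHORS: Dict[str, Set[str]] = {
    "Erdos": {"A"},
    "A": {"Erdos", "B"},
    "B": {"A", "P"},
    "P": {"B"},
    "Q": set(),
}

def erdos_chain(person: str, root: str = "Erdos") -> Optional[List[str]]:
    """Return a shortest co-author chain from root to person, or None."""
    parent: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        cur = queue.popleft()
        if cur == person:
            chain: List[str] = []
            node: Optional[str] = cur
            while node is not None:      # walk parents back to the root
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for nxt in sorted(COAUTHORS.get(cur, ())):
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None

def erdos_number(person: str) -> Optional[int]:
    """The number is the chain length minus one; the chain is the derivation."""
    chain = erdos_chain(person)
    return None if chain is None else len(chain) - 1
```

Here the chain plays the role of the derivation D: anyone who doubts the number can check each co-authorship step against the underlying data.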
25
Summary: Mediation Scenarios & Techniques
Federated Databases: One-World; Common Schema; SQL, rules; Schema Transformations; Syntactic Joins; DB expert
XML-Based Mediation: One-/Multiple-Worlds; Mediated Schema; XML query languages; Syntax-Aware Mappings; Syntactic Joins; DB expert
Model-Based Mediation: Complex Multiple-Worlds; Common Glue Maps; DOOD query languages; Semantics-Aware Mappings; “Semantic” Joins via Glue Maps; KRDB + domain expert
26
Technical Issues and Challenges
Integration Method and Architecture
–federated DBs, warehouse/wrapper-mediator approach, GAV/LAV, Grid infrastructure,...
Suitable KRDB Formalisms and Frameworks
–XML, DTDs, XML Schema, XPath, XQuery,...
–RDF(S), Ontologies, Description Logics, DAML+OIL,...
–querying, deduction, subsumption, classification,...
Algorithms and Implementation
–query composition, rewriting, reasoning, source capabilities,...
Information Integration Scenario and Scope
–simple/complex, single/multiple worlds,...
27
The Larger Infrastructure / Interoperability Picture
The Grids
–Data-Grid (SRB,...), Computational-Grid (Globus,...), “Knowledge-Grid”,...
The Webs
–W3C: HTML, XML, Semantic Web (RDF(S), DAML+OIL,...)
Service & Protocol-Oriented Architectures
–WSDL, SOAP, CORBA, EJB,...
The Application Level
–applications (computations + KRDB mediation) are chained together to form... => analytical “Knowledge” Pipelines: NIH BIRN: LONI, NSF GriPhyN, DOE SciDAC, PDB, ASC, AVIRIS,...
=> Data => Computations => Analysis => Knowledge =>
28
Thank You! Questions? Queries?
29
Models and Formal Approaches: Relating Theory to the World
©2000 by John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Pacific Grove, CA. http://www.jfsowa.com/krbook/
All models are wrong, but some are useful!
30
Ontologies
So what is an Ontology?
–definition of things that are relevant to your application
–representation of terminological knowledge (“TBox”)
–explicit specification of a conceptualization
–concept hierarchy (“is-a”)
–further semantic relationships between concepts
–abstractions of relational schemas, (E)ER, UML classes, XML Schemas
Examples:
–NCMIR ANATOM
–GO (Gene Ontology)
–UMLS (Unified Medical Language System)
–CYC
31
Description Logics
Terminological Knowledge (TBox)
–Concept Definition (naming of concepts):
–Axiom (constraining of concepts):
=> a mediator’s “glue knowledge source”
Assertional Knowledge (ABox)
–the marked neuron in image 27
=> the concrete instances/individuals of the concepts/classes that your sources export
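The TBox/ABox split can be illustrated with a toy Python model. This is a deliberately simplified sketch, not a description logic reasoner: the TBox holds only is-a axioms, and the concept names, the individual, and the function names are all hypothetical.

```python
from typing import Dict, Set

# TBox (terminological): hypothetical is-a axioms between concepts.
TBOX_ISA: Dict[str, Set[str]] = {
    "Purkinje cell": {"Neuron"},
    "Neuron": {"Cell"},
}

# ABox (assertional): hypothetical individuals and their asserted concepts.
ABOX: Dict[str, Set[str]] = {
    "marked_neuron_in_image_27": {"Purkinje cell"},
}

def superconcepts(concept: str) -> Set[str]:
    """The concept itself plus every concept that subsumes it per the TBox."""
    result: Set[str] = {concept}
    stack = [concept]
    while stack:
        for parent in TBOX_ISA.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def is_instance(individual: str, concept: str) -> bool:
    """Instance check: asserted membership combined with TBox subsumption."""
    return any(concept in superconcepts(c) for c in ABOX.get(individual, set()))
```

The individual is only asserted to be a Purkinje cell; that it is also a Cell follows from the terminological axioms, which is the division of labor the slide describes.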
32
Description Logic
DL definition of “Happy Father” (Example from Ian Horrocks, U Manchester, UK)
33
Some Open Database & Knowledge Representation Issues
Mix of Query Processing and Reasoning
–FaCT description logic reasoner for DMs?
–or reconciliation of DMs via argumentation frameworks (“games”) using well-founded and stable models of logic programs [ICDT97,PODS97,TCS00]
Modeling “Process Knowledge” => Process Maps
–formal semantics? (dynamic/temporal/Kripke models?)
–executable semantics? (Statelog?)
Graph Queries over DMs and PMs
–expressible in F-logic [InfSystem98]
–scalability? (UMLS Domain Map has millions of entries)
...