From Data Integration To Semantic Mediation: Addressing Heterogeneities in Data. Bertram Ludäscher, Knowledge-Based Information.

Presentation transcript:

From Data Integration To Semantic Mediation: Addressing Heterogeneities in Data
Bertram Ludäscher
Knowledge-Based Information Systems Lab, San Diego Supercomputer Center, and Department of Computer Science & Engineering, University of California, San Diego

2 Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
3. Model-Based / Semantic Mediation
4. Discussion

An Online Shopper’s Information Integration Problem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
? Information Integration ?
Web sites: addall.com, amazon.com, A1books.com, half.com, barnes&noble.com
“One-World” Scenario: XML-based mediator
Mediator (virtual DB) (vs. data warehouse)

A Home Buyer’s Information Integration Problem
Which houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?
? Information Integration ?
Sources: Realtor, Demographics, School Rankings, Crime Stats
“Multiple-Worlds” Scenario: XML-based mediator

A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
? Information Integration ?
Sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
“Complex Multiple-Worlds” Scenario: Model-based mediator

A Geoscientist’s Information Integration Problem
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
? Information Integration ?
Sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)
“Complex Multiple-Worlds” Scenario: Model-based mediator

7 Information Integration Challenges: Heterogeneities = S⁴ ...
System Aspects
– platforms, devices, distribution, APIs, protocols, …
Syntaxes
– heterogeneous data formats (one for each tool ...)
Structures
– heterogeneous schemas (one for each DB ...)
– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …)
Semantics
– unclear & “hidden” semantics: e.g., incoherent terminology, multiple / informal taxonomies, implicit assumptions, ...

8 Information Integration Challenges
System aspects: “Grid” middleware
– distributed data & computing
– Web services, WSDL/SOAP, …
– sources = functions, files, databases, …
Syntax & Structure: (XML-Based) Mediators
– wrapping, restructuring
– (XML) queries and views
– sources = (XML) databases
Semantics: Model-Based/Semantic Mediators
– conceptual models and declarative views
– Semantic Web: ontologies, description logics, RDF(S), DAML+OIL, OWL, ...
– sources = knowledge bases (DB+CMs+ICs)
Layers (diagram): Syntax, Structure, Semantics, System aspects
=> reconciling S⁴ heterogeneities
=> “gluing” together multiple data sources
=> bridging information and knowledge gaps computationally

9 Information Integration from a DB Perspective
Information Integration Problem
– Given: data sources S_1, ..., S_k (DBMS, web sites, ...) and user questions Q_1, ..., Q_n that can be answered using the S_i
– Find: the answers to Q_1, ..., Q_n
The Database Perspective: source = “database”
=> S_i has a schema (relational, XML, OO, ...)
=> S_i can be queried
=> define virtual (or materialized) integrated views V over S_1, ..., S_k using database query languages (SQL, XQuery, ...)
=> questions become queries Q_i against V(S_1, ..., S_k)
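The database perspective above can be sketched with a virtual SQL view over two sources. This is a minimal illustration only: the table names, columns, and rows are hypothetical, and both "sources" live in one in-memory SQLite database rather than in separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sources S1 (realtor listings) and S2 (school rankings).
    CREATE TABLE s1_listings (house_id TEXT, price INTEGER, bedrooms INTEGER,
                              bathrooms INTEGER, district TEXT);
    CREATE TABLE s2_schools (district TEXT, rank_percentile INTEGER);

    INSERT INTO s1_listings VALUES
        ('h1', 450000, 3, 2, 'north'),
        ('h2', 480000, 2, 2, 'south'),
        ('h3', 650000, 4, 3, 'north');
    INSERT INTO s2_schools VALUES ('north', 85), ('south', 40);

    -- Virtual integrated view V(S1, S2): nothing is copied, the join runs per query.
    CREATE VIEW v_homes AS
        SELECT l.house_id, l.price, l.bedrooms, l.bathrooms, s.rank_percentile
        FROM s1_listings AS l JOIN s2_schools AS s ON l.district = s.district;
""")

def candidate_homes(max_price):
    """Query Q against V: affordable homes near an upper-third school."""
    rows = conn.execute(
        "SELECT house_id FROM v_homes "
        "WHERE price < ? AND bedrooms >= 2 AND bathrooms >= 2 "
        "AND rank_percentile >= 67 ORDER BY house_id",
        (max_price,),
    ).fetchall()
    return [r[0] for r in rows]

print(candidate_homes(500000))
```

The user question becomes a query against the view only; the sources' own schemas stay hidden behind V. The cutoff of 67 for "upper third" is an assumption of this sketch.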

10 Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
3. Model-Based / Semantic Mediation
4. Discussion

11 Extensible Markup Language (XML)
(meta)language for marking up text & data with user-definable tags
– (X)HTML, XSLT, XML Schema, ...
– MathML, BioML, GeoML, NeuroML, ...
– XML-RPC, SOAP, WSDL, OWL, ...
semistructured tree data model
– flexible: marked-up text, web-pages, databases, ...
container model:
– “boxes within boxes”
Example (marked-up text): ... in their wonderful book called SemWeb Tractat by B. Schatz and T.B. Lee, the authors show how ...
Example (tree): book with title “SemWeb Tractat”, author “B. Schatz”, author “T.B. Lee”
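The book example can be written out as XML and read back as a tree. The sketch below uses Python's standard `xml.etree.ElementTree`; the tag names `book`, `title`, and `author` follow the slide, while the helper `outline` is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

doc = """<book>
  <title>SemWeb Tractat</title>
  <author>B. Schatz</author>
  <author>T.B. Lee</author>
</book>"""

book = ET.fromstring(doc)

def outline(elem, depth=0):
    """Return the tree as indented 'tag: text' lines ("boxes within boxes")."""
    text = (elem.text or "").strip()
    lines = ["  " * depth + elem.tag + (": " + text if text else "")]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

authors = [a.text for a in book.findall("author")]
print("\n".join(outline(book)))
```

Note that the tree is ordered and allows repeated children (two `author` elements), which is what makes the model semistructured rather than a fixed record.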

12 XML-Based Mediator Architecture
USER/Client: Query Q(G(S_1, ..., S_k))
XML Queries & Results
MEDIATOR: Integrated Global XML View G; Integrated View Definition G(..) := S_1(..) … S_k(..)
Wrappers: each source S_1, S_2, ..., S_k is wrapped and exported as an XML View
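A rough sketch of this architecture, under simplifying assumptions: two hypothetical sources (one CSV, one returning Python dicts) are wrapped into XML views, the global view G is just their union, and the client query runs against G. All function, element, and field names are illustrative and not taken from any mediator system.

```python
import csv
import io
import xml.etree.ElementTree as ET

def wrap_csv(text, source):
    """Wrapper for a hypothetical CSV source: export rows as an XML view."""
    view = ET.Element("offers", source=source)
    for row in csv.DictReader(io.StringIO(text)):
        offer = ET.SubElement(view, "offer")
        ET.SubElement(offer, "title").text = row["title"]
        ET.SubElement(offer, "price").text = row["price"]
    return view

def wrap_records(records, source):
    """Wrapper for a hypothetical source that returns Python dicts."""
    view = ET.Element("offers", source=source)
    for rec in records:
        offer = ET.SubElement(view, "offer")
        ET.SubElement(offer, "title").text = rec["name"]
        ET.SubElement(offer, "price").text = str(rec["usd"])
    return view

def global_view(views):
    """Integrated view G(S1..Sk): here simply the union of all wrapped offers."""
    g = ET.Element("catalog")
    for view in views:
        for offer in view:
            merged = ET.SubElement(g, "offer", source=view.get("source"))
            merged.extend(list(offer))
    return g

def query_cheapest(g, title):
    """Client query Q(G): the cheapest (source, price) pair for a title."""
    hits = [(float(o.findtext("price")), o.get("source"))
            for o in g.findall("offer") if o.findtext("title") == title]
    price, source = min(hits)
    return source, price

s1 = wrap_csv("title,price\nTractatus,15.50\n", "s1")
s2 = wrap_records([{"name": "Tractatus", "usd": 14.25}], "s2")
print(query_cheapest(global_view([s1, s2]), "Tractatus"))
```

Unlike a real mediator, this sketch materializes G before querying it; a virtual mediator would instead rewrite the query and ship subqueries to the wrappers.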

13 Some Challenges in XML-Based Integration ...
XML Query/Transformation Languages
– DB community: QLs for semistructured data, e.g., TSIMMIS/MSL, Lorel, Yatl, ..., Florid/F-logic [InfSystems98]
– CSE/SDSC: XMAS [SSD99, SIGMOD99, WebDB99, EDBT00]
– W3C: XPath, XSLT, XQuery (Working Draft, June 2001)
XML Schema Languages
– DTDs, RELAX NG, XML Schema, ... [XMLDM02]
DB Theoreticians:
– Expressiveness/Complexity Trade-Off
– querying: FO, (WF/S-)Datalog, FO(LFP), FO(PFP), ..., all
– reasoning: query satisfiability, containment, equivalence
...

14 XMAS: XML Matching And Structuring language
Integrated View Definition: “Find books from amazon.com and DBLP, join on author, group by authors and title”
CONSTRUCT $a1 $t $p { $p } { $a1, $t }
WHERE $a1 : $t : IN "amazon.com"
AND $a2 : $p : IN " "
AND value( $a1 ) = value( $a2 )
XMAS, XMAS Algebra [QL98, SIGMOD99] [EDBT00]
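The intent of this view (join two sources on author, then group by author and title) can be mimicked in plain Python. This is not XMAS and not its algebra, only a sketch of the same join-and-group logic; the two small documents and all of their element names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-ins for the two sources; element names are assumptions.
amazon = ET.fromstring("""<books>
  <book><author>A. Author</author><title>Book One</title></book>
  <book><author>B. Writer</author><title>Book Two</title></book>
</books>""")
dblp = ET.fromstring("""<pubs>
  <pub><author>A. Author</author><name>Paper X</name></pub>
  <pub><author>A. Author</author><name>Paper Y</name></pub>
  <pub><author>C. Other</author><name>Paper Z</name></pub>
</pubs>""")

def integrated_view(books, pubs):
    """Join on author value, then group publications by (author, title)."""
    pubs_by_author = defaultdict(list)
    for pub in pubs.findall("pub"):
        pubs_by_author[pub.findtext("author")].append(pub.findtext("name"))

    groups = {}
    for book in books.findall("book"):
        author, title = book.findtext("author"), book.findtext("title")
        if author in pubs_by_author:          # value($a1) = value($a2)
            groups[(author, title)] = pubs_by_author[author]
    return groups

print(integrated_view(amazon, dblp))
```

Authors that appear in only one source drop out, as in an inner join.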

15 XML (XMAS) Query Processing
Diagram labels: XML Query Q; XML Global View Definition G(S); Composition Q(G); Translator; algebraic plans; composed plan; Rewriter/Optimizer: Q’(S); optimized plan; Plan Execution
Phases: Compile-time; Run-time: query evaluation

16 … New Challenges in (XML-Based) Mediation
Global-As-View (GAV)
– user query Q over global relations G: Q(G)
– global relations G defined over source relations S: G(S)
– challenge: compute answers Q(G(V(S))) without computing all of V and G
=> query rewriting (with limited source capabilities): Q’(S) = Q(G)
Local-As-View (LAV)
– user query Q over global relations G: Q(G)
– source relations S defined over global relations G: S(G)
– challenge: “reverse/rewrite rules” from S(G) to some G’(S)
=> answering queries using views: equivalent rewritings may not exist
=> find maximally contained ones: Q’(G’(S)) ⊆ Q(G)
Inter(CS)disciplinary research needed: DB, FP, LP
– GAV/LAV ~ view (un)folding ~ Clark’s completion, resolution, factoring
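For the GAV case, view unfolding can be illustrated with a toy example: the global relation is defined as a union of source relations, and the rewritten query Q’(S) pushes the selection to each source instead of materializing G first. The relation names and tuples are hypothetical; the harder LAV direction (answering queries using views) is not shown.

```python
# Hypothetical source relations S (names and tuples are illustrative).
SOURCES = {
    "amazon": [("Tractatus", 15.5), ("Investigations", 23.0)],
    "half": [("Tractatus", 14.25)],
}

# GAV: the global relation is *defined* as a view over the sources, G(S).
GAV = {"offer": ["amazon", "half"]}   # offer := amazon UNION half

def materialize(global_rel):
    """Naive evaluation: compute all of G(S) first."""
    return [t for src in GAV[global_rel] for t in SOURCES[src]]

def q_naive(title):
    """Q(G): filter the fully materialized global relation."""
    return sorted(t for t in materialize("offer") if t[0] == title)

def source_query(src, title):
    """Stand-in for a subquery shipped to one source (selection pushed down)."""
    return [t for t in SOURCES[src] if t[0] == title]

def q_unfolded(title):
    """Q'(S): unfold the view definition and query each source directly."""
    return sorted(t for src in GAV["offer"] for t in source_query(src, title))

assert q_naive("Tractatus") == q_unfolded("Tractatus")
print(q_unfolded("Tractatus"))
```

Both evaluations return the same answers; the unfolded form only transfers matching tuples from each source, which matters once sources are remote or have limited query capabilities.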

17 Querying XML Streams: A New Frontier
New applications for stream-based XML processing:
– Continuous, real-time data streams (wireless sensor networks, …)
– Data / message transformation in Web services (SOAP, RMI, processing …)
– Extract-transform-load applications (Tera/Peta-byte archival migration, …)
… leading to a new XML querying & transformation paradigm:
– how to execute (some) XML queries & transformations on very large (infinite) data streams using only limited memory
– XML stream machine (XSM): extended XML transducers with buffers
XQuery => XSM network
XSMs clearly outperform tree-based approaches on streamable queries (100x over Xalan)
[A Transducer-Based XML Query Processor, Ludäscher, Mukhopadhyay, Papakonstantinou, VLDB’02]
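The limited-memory idea can be illustrated with Python's incremental `xml.etree.ElementTree.XMLPullParser`. This is not an XSM and makes no claim about the transducer design; it only shows a streamable query (a running maximum) evaluated while discarding elements that have already been consumed. The stream generator and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def readings(n):
    """Hypothetical sensor stream: yields chunks of one XML document."""
    yield b"<stream>"
    for i in range(n):
        yield b"<reading><sensor>s%d</sensor><value>%d</value></reading>" % (i % 3, i)
    yield b"</stream>"

def max_value(chunks):
    """Evaluate max(//reading/value) incrementally with bounded memory."""
    parser = ET.XMLPullParser(events=("start", "end"))
    best, root = None, None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start" and root is None:
                root = elem
            elif event == "end" and elem.tag == "reading":
                value = int(elem.findtext("value"))
                best = value if best is None else max(best, value)
                root.clear()          # drop processed readings from the partial tree
    return best

print(max_value(readings(1000)))
```

A tree-based processor would hold all readings in memory before answering; here at most one `reading` subtree is alive at a time.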

18 Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
3. Model-Based / Semantic Mediation
4. Discussion

A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
? Information Integration ?
Sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
“Complex Multiple-Worlds” Mediation

A Geoscientist’s Information Integration Problem
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
? Information Integration ?
Sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)
“Complex Multiple-Worlds” Mediation

21 What’s the Problem with XML & Complex Multiple-Worlds?
XML is Syntax
– ... for labeled ordered trees
– ... all semantics lies outside of XML
XML DTDs => tags + nesting
XML Schema => DTDs + data modeling
need anything else? => write comments!
Domain Semantics is Complex:
– implicit assumptions, hidden semantics
=> sources seem unrelated to the non-expert
Need Structure and Semantics beyond trees!
=> employ richer OO models
=> make domain semantics and “glue knowledge” explicit
=> use ontologies to fix terminology and conceptualization
=> avoid ambiguities by using KR and formal semantics

22 Information Integration Landscape
Axes: conceptual distance (one-world to multiple-worlds); conceptual complexity/depth (low to high)
Techniques: DB mediation techniques; Ontologies, KR formalisms; Model-Based Mediation
Examples plotted: addall, book-buyer, home-buyer, 24x7 consumer, BLAST, EcoCyc, Cyc, WordNet, GO, UMLS, MIA, Entrez, RiboWeb, Tambis
Domains: Bioinformatics; Geo-, Ecoinformatics

XML-Based vs. Model-Based Mediation
XML-Based Mediation:
– Raw Data => XML Elements => XML Models
– Integrated-DTD := XML-QL(Src1-DTD, ...)
– Structural Constraints (DTDs): A = (B*|C),D; B = ...; Parent, Child, Sibling, ...
– No Domain Constraints
Model-Based Mediation:
– Raw Data => (XML) Objects => Conceptual Models
– Integrated-CM := CM-QL(Src1-CM, ...)
– Classes, Relations, is-a, has-a, ... (C1, C2, C3, R)
– Logical Domain Constraints: IF ... THEN ...
– “Glue Maps” = Domain & Process Maps (ontologies)
CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, DAML+OIL, …}

24 What’s the Glue? What’s in a Link?
Syntactic Joins
– ⋈(X,Y) := X.SSN = Y.SSN (equality)
– ⋈(X,Y) := X.UMLS-ID = Y.UID
“Speciality” Joins
– ⋈(X,Y,Score) := BLAST(X,Y,Score) (similarity)
Semantic/Rule-Based Joins
– ⋈(X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S>0.8 (homology, lub)
– ⋈(X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y. (rule-based)
  e.g., X = secretase, B = beta amyloid, Y = Alzheimer’s disease
CS Challenge:
– compile semantic joins into efficient syntactic ones
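The three kinds of joins can be contrasted in a small sketch. Everything here is hypothetical: the is-a facts, the similarity table standing in for BLAST scores, and the function names. The semantic join returns the least common superclass of two concepts when their similarity exceeds the threshold; the rule-based join chains `produces` and `increased_in`.

```python
# Hypothetical facts; concept names and records are illustrative assumptions.
ISA = {"rat_NCS1": "calcium_sensor", "human_NCS1": "calcium_sensor",
       "calcium_sensor": "protein", "actin": "protein"}
SIMILARITY = {("rat_NCS1", "human_NCS1"): 0.98,     # stand-in for BLAST scores
              ("rat_NCS1", "actin"): 0.10}
PRODUCES = {("secretase", "beta_amyloid")}
INCREASED_IN = {("beta_amyloid", "alzheimers_disease")}

def syntactic_join(xs, ys, key_x, key_y):
    """Equality join: pair records whose key fields match."""
    return [(x, y) for x in xs for y in ys if x[key_x] == y[key_y]]

def ancestors(concept):
    """The concept itself and its superclasses, most specific first."""
    chain = [concept]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain

def semantic_join(x, y, threshold=0.8):
    """Least common superclass C (X isa C, Y isa C) if similarity > threshold."""
    if SIMILARITY.get((x, y), 0.0) <= threshold:
        return None
    above_y = set(ancestors(y))
    return next((c for c in ancestors(x) if c in above_y), None)

def rule_join():
    """Rule-based join: X produces B and B increased_in Y gives (X, Y, B)."""
    return [(x, y, b) for (x, b) in PRODUCES for (b2, y) in INCREASED_IN if b == b2]

print(semantic_join("rat_NCS1", "human_NCS1"), rule_join())
```

The semantic variants need the concept hierarchy and rules at query time, which is why compiling them down to plain equality joins is listed as the open challenge.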

25 Semantic Mediation: SOURCES
Lift Sources to export CMs: CM(S) = OM(S) + KB(S) + CON(S)
Object Model OM(S):
– complex objects (frames), class hierarchy, OO constraints
Knowledge Base KB(S):
– explicit representation of (“hidden”) source semantics
– logic rules over OM(S)
Contextualization CON(S):
– situate OM(S) data using “glue maps” (ontologies):
  => domain maps DMs = terminological knowledge: concepts + roles
  => process maps PMs = “procedural knowledge”: states + transitions

26 Semantic Mediation: MEDIATOR
Integrated View Definition (IVD)
– declarative (logic) rules with object-oriented features
– defined over CM(S), domain maps, process maps
– needs “mediation engineers” = domain + KRDB experts
Knowledge-Based Querying and Browsing (runtime):
– mediator composes the user query Q with the IVD
  ... rewrites (Q o IVD), sends subqueries to sources
  ... post-processes returned results (e.g., situate in context)

27 Model-Based Mediator Architecture
USER/Client: CM Queries & Results (exchanged in XML)
Mediator Engine: CM (Integrated View), Integrated View Definition IVD; FL rule proc., LP rule proc., Graph proc., XSB Engine
“Glue” Maps GMs: Domain Maps DMs, Process Maps PMs; semantic context CON(S)
Sources S1, S2, S3: each exports CM(S) = OM(S)+KB(S)+CON(S) through a CM-Wrapper (over an XML-Wrapper) as GCM CM S1, GCM CM S2, GCM CM S3
First results & Demos: KIND prototype, formal DM semantics, PMs [SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] [BNCOD02] [ER02] [EDBT02] [BioInf02]

28 Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR
Domain Map (DM) = labeled graph with concepts ("classes") and roles ("associations"); additional semantics: expressed as logic rules (F-logic)
Domain Expert Knowledge:
– Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines.
– Dendritic spines are ion (calcium) regulating components.
– Spines have ion binding proteins.
– Neurotransmission involves ionic activity (release).
– Ion-binding proteins control ion activity (propagation) in a cell.
– Ion-regulating components of cells affect ionic activity (release).
DM in Description Logic

29 Source Contextualization & DM Refinement
In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map ...
=> sources can register new concepts at the mediator ...
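Registration and refinement can be pictured with a toy domain map that stores is-a edges and the data objects hung off each concept. The class `DomainMap`, its methods, and the concept and object identifiers are all hypothetical; a real domain map would also carry roles and logic rules.

```python
from collections import defaultdict

class DomainMap:
    """Toy domain map: concepts with is-a edges plus data registered at concepts."""

    def __init__(self):
        self.parent = {}                    # concept -> more general concept
        self.data = defaultdict(list)       # concept -> registered source objects

    def add_concept(self, concept, isa=None):
        """Register a concept; a source may call this to refine the map."""
        self.parent[concept] = isa

    def register(self, concept, obj):
        """Hang a source object off an existing concept."""
        if concept not in self.parent:
            raise KeyError(f"unknown concept: {concept}")
        self.data[concept].append(obj)

    def is_a(self, concept, ancestor):
        """True if concept equals ancestor or specializes it via is-a edges."""
        while concept is not None:
            if concept == ancestor:
                return True
            concept = self.parent[concept]
        return False

    def objects_under(self, concept):
        """All objects registered at the concept or any of its specializations."""
        return sorted(o for c, objs in self.data.items()
                      if self.is_a(c, concept) for o in objs)

dm = DomainMap()
dm.add_concept("neuron")
dm.add_concept("purkinje_cell", isa="neuron")
dm.register("purkinje_cell", "ncmir:image-17")
dm.add_concept("spiny_neuron", isa="neuron")      # refinement by a source
dm.register("spiny_neuron", "synapse:recon-4")
print(dm.objects_under("neuron"))
```

Once both sources have registered their data, a query at the general concept `neuron` reaches objects from either source, even though neither source knows about the other.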

Example: ANATOM Domain Map

31 Browsing Registered Data with Domain Maps

Query Processing Demo
Query results in context: Contextualization CON(Result) wrt. ANATOM.
Mediator View Definition (deductive OO language, here: F-logic; provided by the domain expert and mediation engineer):
DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
WHERE I: protein_label_image[ proteins ->> {Protein}; organism -> Organism;
        anatomical_structures ->> {AS: anatomical_structure[ name->Anatom ] } ],   % from PROLAB
      NAE: neuro_anatomic_entity[ name->Anatom;   % from ANATOM
        located_in->>{Brain_region} ],
      AS..segments..features[ name->Feature_name; value->Value ].

Example: Inside Query Evaluation
Query: “How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?”
X1 := select targets of “output from parallel fiber”;   (push ...)
X2 := “find and situate” X1 in ANATOM Domain Map;   (determine source ...)
X3 := subregion-closure(X2);   (compute region of interest, here: downward ...)
X4 := select PROT-data(X3, Ryanodine Receptors);   (push ...)
X5 := compute aggregate(X4);   (compute protein ...)
display X5 in context (ANATOM)   (display in ...)
=> DEMONSTRATION
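The plan steps X1 to X5 can be traced on toy data. The region hierarchy, the projection table, and the measurements below are invented for illustration, and the aggregate is assumed to be a per-region mean since the slide does not say which aggregate is computed.

```python
# Hypothetical fragment of an anatomical domain map (has-subregion edges) and data.
SUBREGIONS = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_layer"],
}
PROJECTS_TO = {"parallel_fiber": ["purkinje_layer"]}          # stand-in for source 1
PROTEIN_DATA = [                                              # stand-in for source 2
    ("purkinje_layer", "ryanodine_receptor", 4.0),
    ("purkinje_layer", "ryanodine_receptor", 6.0),
    ("molecular_layer", "ryanodine_receptor", 1.0),
    ("purkinje_layer", "calbindin", 9.0),
]

def all_regions():
    """Every region mentioned in the domain map fragment."""
    return set(SUBREGIONS) | {c for cs in SUBREGIONS.values() for c in cs}

def subregion_closure(regions):
    """Downward closure: the regions plus everything nested inside them."""
    seen, stack = set(), list(regions)
    while stack:
        region = stack.pop()
        if region not in seen:
            seen.add(region)
            stack.extend(SUBREGIONS.get(region, []))
    return seen

def evaluate(fiber, protein):
    x1 = PROJECTS_TO[fiber]                                   # targets of the output
    x2 = [r for r in x1 if r in all_regions()]                # situate in the domain map
    x3 = subregion_closure(x2)                                # region of interest
    x4 = [(r, v) for (r, p, v) in PROTEIN_DATA if r in x3 and p == protein]
    x5 = {}                                                   # aggregate: mean per region
    for region, value in x4:
        x5.setdefault(region, []).append(value)
    return {region: sum(vs) / len(vs) for region, vs in x5.items()}

print(evaluate("parallel_fiber", "ryanodine_receptor"))
```

The domain map is what connects the two sources: the first only knows about fiber targets, the second only about protein measurements per region.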

34 Open Database & Knowledge Representation Issues
Mix of Query Processing and Reasoning
– GAV & LAV with semantic query optimization (NIH BIRN, NSF GEON)
– description logic reasoner for DMs (FaCT)?
– reconciliation of conflicting DMs via argumentation frameworks (“games”) using well-founded and stable models of logic programs [ICDT97, PODS97, TCS00, TODS02]
Modeling “Process Knowledge” => Process Maps
– formal semantics? (dynamic/temporal/Kripke models/Petri nets?)
– executable semantics? (Statelog?)
Graph Queries over DMs and PMs
– expressible in F-logic [InfSystems98]
– scalability? (UMLS Domain Map has millions of entries)
How to incorporate “procedural features”?
– Bioinformatics, Ecoinformatics, … => sources = DBs + analytical tools + …
=> scientific workflow planning and management (“promoter identification workflow” for DOE SciDAC, NSF/ITR SEEK)

35 Process Maps with Abstractions and Elaborations: From Terminological to Procedural Glue
– nodes ~ states
– edges ~ processes, transitions
– blue/red edges: processes in Src1/Src2
– general form of edges:
– related formalisms

36 A Scientific Workflow: Promoter Identification (Matthew Coleman, LLNL, 2002)
Workflow steps and data (diagram):
– cDNA gi#, Gene name; gi#’s from clusfavor
– blast, blast human, blast other species: Genomic gi#, Chr #, Gene location
– TRANSFAC: TAF’s, Location on Genomic gi#’s, Probabilities of match, Probabilities of random match
– GRAIL: GC Island location, Exon/intron location, Repeats location, Promoter location
– CLUSTAL: Consensus sequences
– Data Consolidation: Validates polII promoter location, Shared TAF’s across cluster, Common consensus sequence
Questions:
– Are chr#’s in common? Are chr#’s locations in common? Are there conserved upstream sequences? Are gene locations conserved across species?
– RNA POLII promoter? GpC Island present? Are there common TAF’s across genomic gi#?
– Are there other common genes?

37 SDM Demo & Architecture
Translation Approach: Abstract Workflow (AWF) => Executable Workflow (EWF)

38 Analytical Pipelines: An Open Source Tool

39 A Commercial Tool for Analytical Pipelines

40 Summary: Mediation Scenarios & Techniques
Federated Databases | XML-Based Mediation | Model-Based Mediation
One-World | One-/Multiple-Worlds | Complex Multiple-Worlds
Common Schema | Mediated Schema | Common Glue Maps
SQL, rules | XML query languages | DOOD query languages
Schema Transformations | Syntax-Aware Mappings | Semantics-Aware Mappings
Syntactic Joins | Syntactic Joins | “Semantic” Joins via Glue Maps
DB expert | DB expert | KRDB + domain experts
Glue?

41 GEON vs. SEEK

42 Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
3. Model-Based / Semantic Mediation
4. Discussion

43 Thank you! Questions? Queries?

44 Some References
Model-Based Mediation:
– A Model-Based Mediator System for Scientific Data Management, B. Ludäscher, A. Gupta, M. Martone, in Bioinformatics: Managing Scientific Data, Lacroix, Critchlow (eds), Morgan Kaufmann, to appear, 2003
– Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE’01), Heidelberg, Germany, IEEE Computer Society
– Managing Semistructured Data with FLORID: A Deductive Object-Oriented Perspective, B. Ludäscher, R. Himmeröder, G. Lausen, W. May, C. Schlepphorst, Information Systems, 23(8), Special Issue on Semistructured Data
XML-Based Mediation:
– VXD/Lazy Mediators: Navigation-Driven Evaluation of Virtual Mediated Views, B. Ludäscher, Y. Papakonstantinou, P. Velikhov, Intl. Conference on Extending Database Technology (EDBT’00), Konstanz, Germany, LNCS 1777, Springer
– XML Streams: A Transducer-Based XML Query Processor, B. Ludäscher, P. Mukhopadhyay, Y. Papakonstantinou, Intl. Conference on Very Large Databases (VLDB’02), Hong Kong, 2002

45 Knowledge Representation: Relating Theory to the World via Formal Models
John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations
“All models are wrong, but some are useful!”