Model-Based Mediation with Domain Maps Bertram Ludäscher * Amarnath Gupta * Maryann E. Martone + * San Diego Supercomputer Center (SDSC) + National Center.


Model-Based Mediation with Domain Maps Bertram Ludäscher * Amarnath Gupta * Maryann E. Martone + * San Diego Supercomputer Center (SDSC) + National Center for Microscopy and Imaging Research (NCMIR) University of California, San Diego (UCSD)

Overview
Motivation
– Problem with current Mediator Architecture
– Complex Scientific Multiple-World Scenarios
Model-Based Mediation Architecture
– Lifting from XML to level of Conceptual Models (CMs)
Formal Framework
– Domain Maps (DMs)
– Generic Conceptual Model (GCM)
– Integrated View Definition
Example Query Evaluation
Open Issues

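A Standard Mediator Architecture (MIX: Mediation of Information using XML, SDSC/UCSD)
[Figure: a USER-Query is posed against the INTEGRATED VIEW of the MIX MEDIATOR, whose Integrated View Definition is written in XMAS/XQuery. The mediator exchanges XML queries and answers (XML Q/A) with a Wrapper layer over the Data Sources (DB, Files, WWW) of Lab1, Lab2, and Lab3.]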

The Problem: Complex Multiple-World Scenarios
Current Integration Issues
– Structural/Schema Conflicts: common semistructured data model (XML); schema transformations/integration (XML queries & transforms)
– Limited Query Capabilities: capability-based rewriting (e.g., TSIMMIS)
– ...
BUT scenarios are “one-world” (amazon.com vs. bn.com) or simple multiple-world (home buyer)
Problem: No Support for Semantic Mediation
– “complex multiple-world” scenarios (Neuroscience, Geoscience): complex, disjoint, seemingly unrelated data; “hidden semantics” in complex, indirect relationships

A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Figure: three wrapped sources, protein localization (NCMIR), neurotransmission (SENSELAB), and morphometry (SYNAPSE), sit beneath a Mediator. The Mediator, the Integrated View, and the Integrated View Definition are all marked “???”.]

Hidden Semantics: Protein Localization (NCMIR)
[Figure: NCMIR data excerpt for RyR (“spine 0 branchlet 30”), with the labels “Molecular layer of Cerebellar Cortex”, “Purkinje Cell layer of Cerebellar Cortex”, and “Fragment of dendrite”.]

Hidden Semantics: Morphometry (SYNAPSE)
– Branch level beyond 4 is a branchlet
– Must be dendritic because Purkinje cells don’t have somatic spines

Approach: Model-Based Mediation
Complex Multiple Worlds Integration Problem
– terms not directly joinable
– complex, indirect associations
– unstated, “hidden” semantics (not just schema conflicts)
Missing “Semantic Link”
=> how to define complex, indirect semantic links?
=> lift mediation to the level of conceptual models (CMs)
=> domain expert’s knowledge formalized as rules over CMs
=> Model-Based Mediation

XML-Based vs. Model-Based Mediation
XML-Based Mediation
– Raw Data, XML Elements, XML Models
– Integrated-DTD := XQuery(Src1-DTD,...)
– Structural Constraints (DTDs): A = (B*|C),D  B =...; Parent, Child, Sibling,...
– No Domain Constraints
Model-Based Mediation
– (XML) Objects, Conceptual Models
– Integrated-CM := CM-QL(Src1-CM,...)
– Classes, Relations, is-a, has-a,... (classes C1, C2, C3 and relation R in the figure)
– Logical Domain Constraints (IF … THEN …)
– DOMAIN MAP

Extended Mediator Architecture
Wrappers export Conceptual Models (CMs)
– facts & rules for classes, relationships, ICs,...
– source data is “put into context” (“aboutness” index) by linking to domain maps (DMs)
Mediator employs CMs and DMs
– ... to define complex semantic relationships on the formalized domain knowledge
Generic Conceptual Model (GCM)
– as a common target CM
– minimal requirements/core expressions:
    instance(O,C), subclass(C1,C2)
    method_type(C,M,C’), method_value(O,M,R)
    relation_type(R,A1/C1,...,An/Cn)
    relation_value(R,a1,...,an)
Expressiveness, Extensibility
– allow inductive properties (inheritance, closures,...)
– employ a declarative rule language (e.g. F-Logic)
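The GCM core expressions on this slide can be illustrated with a minimal Python sketch. The facts (`pc42`, `purkinje_cell`, and so on) and the function names are hypothetical and are not taken from the prototype; the sketch only shows how an inductive property such as inheritance along `subclass` could be derived from `instance` and `subclass` facts.

```python
# Hypothetical GCM-style facts; all names are illustrative only.
subclass = {("purkinje_cell", "neuron"), ("neuron", "cell")}   # subclass(C1, C2)
instance = {("pc42", "purkinje_cell")}                          # instance(O, C)


def subclass_closure(pairs):
    """Return the transitive closure of a set of subclass(C1, C2) pairs."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def inherited_instances(instance_facts, subclass_facts):
    """Derive instance(O, C2) whenever instance(O, C1) and subclass*(C1, C2)."""
    closure = subclass_closure(subclass_facts)
    derived = set(instance_facts)
    for obj, cls in instance_facts:
        derived |= {(obj, sup) for (sub, sup) in closure if sub == cls}
    return derived


if __name__ == "__main__":
    for fact in sorted(inherited_instances(instance, subclass)):
        print("instance", fact)
```

In a declarative rule language such as F-Logic the same inheritance would be stated as rules rather than loops; the fixpoint iteration here only stands in for that evaluation.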

Model-Based Mediator Architecture
[Figure: sources S1, S2, and S3 are each wrapped by an XML-Wrapper and a CM-Wrapper and export GCM conceptual models CM(S1), CM(S2), CM(S3) through a Logic API (capabilities). The Mediator Engine (XSB Engine with FL rule processing, LP rule processing, and graph processing, plus a CM Plug-In) combines them with the Domain Map (DM) and the Integrated View Definition (IVD) into the CM of the integrated view. USER/Client components exchange CM queries and results with the mediator in XML.]

Formalizing Domain Knowledge: Domain Map for SYNAPSE and NCMIR
A domain map comprises
– Description Logic facts: concepts ("classes") and roles ("associations")
– derived properties expressed as logic rules (e.g. F-logic)
Domain expert knowledge:
– Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines.
– Dendritic spines are ion (calcium) regulating components.
– Spines have ion binding proteins.
– Neurotransmission involves ionic activity (release).
– Ion-binding proteins control ion activity (propagation) in a cell.
– Ion-regulating components of cells affect ionic activity (release).
[Figure: domain map showing the equivalent Description Logic facts]

Domain Map Refinement
In addition to registering (“hanging off”) data, a source may also refine the mediator’s domain map: the source can register new concepts at the mediator...

Definition of Integrated Views (Deja Vu?)...
XML/CM-2-FL Translators
<!ELEMENT Study (study_id, … animal, experiments, experimenters)>
<!ELEMENT experiment (description, instrument, parameters)>
studyDB[studies =>> study].
study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
…
Specification of Domain Knowledge
Subclasses:
mushroom_spine :: spine
Data Classification:
DERIVE S:mushroom_spine FROM S:spine[head -> _; neck -> _].
Integrity Constraints:
ic1(S):ALERT[type -> “invalid spine”; object -> S] IF S:spine[undef ->> {head, neck}].
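The classification rule and the integrity constraint on this slide can be mimicked over plain records, as a rough sketch. The spine records and function names below are hypothetical, and the constraint is read here as "both head and neck are undefined", which is an assumption about the intended meaning of `undef ->> {head, neck}`.

```python
# Hypothetical spine records; None marks an undefined attribute.
spines = [
    {"id": "s1", "head": 0.6, "neck": 0.2},
    {"id": "s2", "head": 0.4, "neck": None},
    {"id": "s3", "head": None, "neck": None},
]


def mushroom_spines(records):
    """Classify as mushroom_spine every spine with both head and neck defined."""
    return {r["id"] for r in records
            if r["head"] is not None and r["neck"] is not None}


def ic1_alerts(records):
    """Raise an 'invalid spine' alert for spines lacking both head and neck."""
    return [{"type": "invalid spine", "object": r["id"]}
            for r in records
            if r["head"] is None and r["neck"] is None]


if __name__ == "__main__":
    print(mushroom_spines(spines))
    print(ic1_alerts(spines))
```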

... Definition of Integrated Views (Multiple Sources)
Schema Reasoning & Dynamic Classes
TAXON DB Schema:
taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].
TAXON Rank Hierarchy:
subspecies::species::genus:: … kingdom::superkingdom
Create Classes from TAXON data:
DERIVE T:TR, TR::TR1 FROM T: ‘TAXON’.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
Integrated View Definition:
DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value) FROM
  I:protein_label_image[proteins ->> {Protein}; organism -> Organism; anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}], % from PROLAB
  AS..segments..features[name->Feature_name; value->Value],
  NAE:neuro_anatomic_entity[name->Anatom; located_in->>{Brain_region}]. % from ANATOM
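The "create classes from TAXON data" rule can be read as turning rank values into classes. The following Python sketch shows one such reading; the rank list is abbreviated, and the rows, identifiers, and function name are hypothetical rather than taken from the TAXON database.

```python
from itertools import combinations

# Abbreviated rank hierarchy, most specific first (hypothetical subset).
RANKS = ["species", "genus", "kingdom"]

# Hypothetical TAXON rows.
taxa = [
    {"id": "t1", "species": "Rattus norvegicus", "genus": "Rattus", "kingdom": "Animalia"},
    {"id": "t2", "species": "Mus musculus", "genus": "Mus", "kingdom": "Animalia"},
]


def derive_classes(rows, ranks):
    """Derive instance (T:TR) and subclass (TR::TR1) facts from rank values."""
    instance_of, subclass_of = set(), set()
    for row in rows:
        present = [row[rank] for rank in ranks if row.get(rank)]
        # T:TR for every rank value recorded for the taxon.
        instance_of |= {(row["id"], value) for value in present}
        # TR::TR1 whenever TR's rank lies below TR1's rank in the hierarchy.
        subclass_of |= set(combinations(present, 2))
    return instance_of, subclass_of


if __name__ == "__main__":
    inst, sub = derive_classes(taxa, RANKS)
    print(sorted(sub))
```

Because the classes are computed from the data, adding a row for a new genus extends the class hierarchy without any change to the view definition.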

Query Evaluation Example
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
Query plan (slide annotations in brackets):
X1 := select output from parallel fiber;  [push]
X2 := “hang off” X1 from Domain Map;  [determine source]
X3 := subregion-closure(X2);  [compute region of interest (here: downward …)]
X4 := select PROT-data(X3, Ryanodine Receptors);  [push]
X5 := compute aggregate(X4);  [compute protein]

ANATOM Domain Map with Registered Data ANATOM DATA

Deductive Closure of “has_a” with “tc(is_a)”: (YES -- Real Recursive Views!! ;-) ANATOM CLOSURE
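A minimal Python sketch of such a recursive view follows. It assumes the closure is read as "parts are inherited along is_a, then has_a is closed transitively"; the concept names are hypothetical examples and not the actual ANATOM content.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (source, target) pairs."""
    closure = set(edges)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def has_a_star(is_a, has_a):
    """Close has_a transitively after inheriting parts along tc(is_a)."""
    is_a_tc = transitive_closure(is_a)
    # X is_a* Y and Y has_a Z  =>  X has_a Z
    inherited = set(has_a) | {(x, z) for (x, y) in is_a_tc
                              for (y2, z) in has_a if y == y2}
    return transitive_closure(inherited)


# Hypothetical domain-map fragment.
IS_A = {("purkinje_cell", "neuron")}
HAS_A = {("neuron", "dendrite"), ("dendrite", "branchlet"), ("branchlet", "spine")}

if __name__ == "__main__":
    print(sorted(has_a_star(IS_A, HAS_A)))
```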

Interactive Queries KIND01

Resulting Sub DOMAIN MAP “Browser” PROTLOC

Computed Protein Localization Data PROTLOC

Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky) PROTLOC-AxioMap

Comparison & Summary: Model-Based Mediation

Conclusions and Outlook
Model-based Mediation Architecture
– for complex multiple worlds scenarios (Neuroscience,...)
– sources export CMs (data “lifted” to conceptual level)
– mediator employs DMs (“semantic road map”)
Simple Prototype based on XSB/FLORA
– source and result data situated in DM context
– domain scientists are excited...
Some Open Issues
– striking the right balance between complexity and expressiveness of DMs (e.g. subsumption and satisfiability of DMs should be decidable)
– query processing/optimization
– modeling query capabilities
– semantic annotation tools for “dumb” sources
– re-implement... *sigh*...
– ...

ADDITIONAL MATERIAL STARTS HERE

ANATOM Domain Map ANATOM

Model-Based Mediation with DOMAIN MAPS (DMs)
Integrated-CM(Z1,...) :=
  get X1,... from Src1;
  get X2,... from Src2;
  LINK (Xi, Yj);
  Zj = CM-QL(X1,...,Y1,...)
LINK(X,Y):
  X.zip = Y.zip
  X.addr in Y.zip
  X.zip overlaps Y.county
  ...
“Semantic Road Maps” for situating source data
=> navigational aid (browsing source classes at the conceptual level)
=> basis for integrated views across multiple worlds
=> link points (concepts) and labeled arcs (roles)
=> formal semantics (in FL and/or DLs)
Example: ANATOM DM = anatomical entities (concepts) + is_a, has_a, overlaps,... (roles)
=> from syntactic equality to semantic joins
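The step from syntactic equality to semantic joins can be sketched by making the LINK predicate a parameter of the join. Everything below (the arcs, records, and function names) is a hypothetical illustration, not the mediator's actual interface.

```python
# Hypothetical domain-map arcs, read as "source is part of target".
DM_ARCS = {("spine", "branchlet"), ("branchlet", "dendrite"),
           ("dendrite", "purkinje_cell")}


def linked(arcs, start, goal):
    """True if goal is reachable from start along the arcs (or equals it)."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True
        for src, dst in arcs:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return False


def join(xs, ys, link):
    """Pair up records from two sources whenever link(x, y) holds."""
    return [(x["id"], y["id"]) for x in xs for y in ys if link(x, y)]


# Hypothetical records from two sources that use different terms.
ncmir = [{"id": "n1", "term": "spine"}]
synapse = [{"id": "m1", "term": "dendrite"}, {"id": "m2", "term": "axon"}]

syntactic = join(ncmir, synapse, lambda x, y: x["term"] == y["term"])
semantic = join(ncmir, synapse, lambda x, y: linked(DM_ARCS, x["term"], y["term"]))

if __name__ == "__main__":
    print("syntactic:", syntactic)
    print("semantic:", semantic)
```

With plain equality the two sources share no joinable term; routing the comparison through the domain map lets the spine record meet the dendrite record.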

Example Query Evaluation (I)
Example: protein_distribution
– given: organism, protein, brain_region
– ANATOM DM: recursively traverse the has_a_star paths under brain_region; collect all anatomical_entities
– Source PROLAB: join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
– Mediator: aggregate over all parents up to brain_region; report distribution
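The evaluation steps above can be sketched in Python as a recursive roll-up. The has_a arcs, the PROLAB rows, and the amounts are invented for illustration, the has_a structure is assumed to be a tree, and summation stands in for whatever aggregate the mediator actually applies.

```python
# Hypothetical ANATOM has_a arcs (assumed to form a tree).
HAS_A = {
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
    "purkinje_cell_layer": ["purkinje_cell"],
}

# Hypothetical PROLAB rows; amounts are made-up numbers.
PROLAB = [
    {"organism": "rat", "protein": "RyR", "anatom": "molecular_layer", "amount": 30.0},
    {"organism": "rat", "protein": "RyR", "anatom": "purkinje_cell", "amount": 12.0},
    {"organism": "mouse", "protein": "RyR", "anatom": "molecular_layer", "amount": 5.0},
]


def protein_distribution(has_a, rows, organism, protein, brain_region):
    """Aggregate protein amounts bottom-up over the has_a tree under brain_region."""
    distribution = {}

    def total(entity):
        # Source step: select the rows registered at this anatomical entity.
        own = sum(r["amount"] for r in rows
                  if r["anatom"] == entity
                  and r["organism"] == organism
                  and r["protein"] == protein)
        # Mediator step: add the totals of all parts, recursively.
        subtotal = own + sum(total(child) for child in has_a.get(entity, []))
        distribution[entity] = subtotal
        return subtotal

    total(brain_region)
    return distribution


if __name__ == "__main__":
    print(protein_distribution(HAS_A, PROLAB, "rat", "RyR", "cerebellar_cortex"))
```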

Interactive Queries KIND

Summary & Outlook: Federation of Brain Data
[Figure: MODEL-BASED Mediation over federated brain data sources: CCB, Montana SU; Surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk. ANATOM and PROTLOC appear as mediator components, with results delivered as VML and as XML/XSLT.]