Model-Based Information Integration in a Neuroscience Mediator System. Bertram Ludaescher, Amarnath Gupta, Maryann E. Martone, University of California San Diego.

Presentation transcript:

1 Model-Based Information Integration in a Neuroscience Mediator System Bertram Ludaescher Amarnath Gupta Maryann E. Martone University of California San Diego

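VLDB2000, Cairo 2 A Standard Mediator Architecture (MIX: Mediation of Information using XML)
[Architecture diagram: a USER query is posed against the INTEGRATED VIEW of the MIX MEDIATOR, which is configured by an XML Integrated View Definition. The mediator exchanges XML queries and answers (XML Q/A) with Wrappers placed over the Data Sources (DB, Files, WWW) of Lab1, Lab2, and Lab3.]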

3 Integration Issues
SEMANTIC Integration: ???
SYNTACTIC/STRUCTURAL Integration (MIX: Mediation of Information using XML)
- Integrated Views (Src-XML => Intgr-XML)
- Schema Integration (DTD => DTD)
- Wrapping, Data Extraction (Text => XML)
SYSTEM Integration
- SRB/MCAT, TCP/IP, HTTP, CORBA
- storage, query capabilities
- protocols & services
- Distributed Query Processing

4 Integration Issues: Mediating across Multiple-Worlds
Structural Integration
=> common semistructured data model (XML)
=> XML queries & transformations to resolve schema conflicts
Limited Query Capabilities
=> mediator is aware of QCs exported by wrappers ...
Semantic Integration
- most work deals with issues for "one-world" scenarios (e.g., amazon.com vs. bn.com)
- what if data comes from a "multiple-world" scenario (like Neuroscience), where data objects from different sources are not even similar, and only the hidden semantics (known to the domain expert) provides the "semantic link"?

5 A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Diagram: sources for protein localization, morphometry, and neurotransmission, plus Web sources (CaBP, Expasy), each behind a Wrapper. The Mediator, the Integrated View, and the Integrated View Definition are all marked "???".]

6 Hidden Semantics: Protein Localization
[Source data fragment shown on the slide; surviving values: RyR ….; spine 0; branchlet 30; Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite]

7 Hidden Semantics: Morphometry
… …
- Branch level beyond 4 is a branchlet
- Must be dendritic because Purkinje cells don't have somatic spines
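The two annotations on this slide are knowledge that a domain expert applies silently when reading the morphometry data. A minimal Python sketch of writing them down as explicit rules follows. The function names, the parameter names, and the "branch" label for levels up to 4 are hypothetical and are not taken from the source schema.

```python
from typing import Optional


def dendrite_part(branch_level: int) -> str:
    """Slide rule: a branch level beyond 4 is a branchlet.

    The label "branch" for lower levels is an assumption made for this sketch.
    """
    return "branchlet" if branch_level > 4 else "branch"


def spine_compartment(cell_type: str) -> Optional[str]:
    """Slide rule: Purkinje cells have no somatic spines, so a spine
    recorded for a Purkinje cell must be dendritic.

    For other cell types the rule says nothing, so None is returned.
    """
    if cell_type == "Purkinje cell":
        return "dendritic"
    return None
```

Once such rules are explicit, a mediator can apply them to data that never states "branchlet" or "dendritic" directly.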

8 The Problem
Multiple Worlds Integration
- compatible terms not directly joinable
- complex, indirect associations among schema elements
- unstated integrity constraints
Why not just use Ontologies?
- typical ontologies associate terms along a limited number of dimensions
What's needed?
- a "theory" under which non-identical terms can be "semantically joined"
=> lift mediation to the level of conceptual models (CMs)
=> domain knowledge, ICs become rules over CMs
=> Model-Based Mediation

9 XML-Based vs. Model-Based Mediation
Both approaches start from Raw Data.
XML-Based Mediation
- XML Elements, XML Models
- Integrated-DTD := XML-QL(Src1-DTD,...)
- Structural Constraints (DTDs): A = (B*|C),D  B =...
- Parent, Child, Sibling,...
- No Domain Constraints
Model-Based Mediation
- (XML) Objects, Conceptual Models
- Integrated-CM := CM-QL(Src1-CM,...)
- Classes, Relations, is-a, has-a,... (DOMAIN MAP with nodes C1, C2, C3 and relation R)
- Logical Domain Constraints: IF ... THEN ...

10 Extended Mediator Architecture
=> Wrappers export Conceptual Models (CMs), i.e., facts + rules for classes, relationships, ICs, ...
=> Mediator imports CMs (from sources, auxiliary knowledge bases, and domain maps (DMs))
=> a generic conceptual model (GCM, a subset of F-logic), extensible via rules = common target CM language
=> new CMs can be plugged in by specifying them in GCM + F-logic rules
=> prototype implementation in FLORA:
- global-as-view approach
- compiler: F-logic => XSB-Prolog
- top-down evaluation => virtual (demand-driven) views
- external interfaces (XML, RDBs, DM visualization, ...)
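The combination of a global-as-view definition with virtual, demand-driven views can be illustrated outside F-logic. The Python sketch below is an analogy, not the FLORA/XSB implementation: sources are plugged in as functions that export facts, and the integrated view is a generator, so no source is contacted until a client actually pulls results. The class `Mediator`, the fact layout, and the sources `lab1` and `lab2` are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Fact = Tuple[str, str, str]  # (object, relation, value); layout assumed for this sketch


class Mediator:
    """Toy mediator: the integrated view is defined over the plugged-in sources."""

    def __init__(self) -> None:
        self.sources: Dict[str, Callable[[], Iterable[Fact]]] = {}

    def plug_in(self, name: str, export: Callable[[], Iterable[Fact]]) -> None:
        """Register a source under a name; export() yields its facts on request."""
        self.sources[name] = export

    def integrated_view(self) -> Iterator[Fact]:
        """Global-as-view: the view is the union of all source facts,
        with object names qualified by their source. Being a generator,
        it is virtual: sources are queried only when results are consumed."""
        for name, export in self.sources.items():
            for obj, relation, value in export():
                yield (f"{name}.{obj}", relation, value)


log: List[str] = []  # records which sources were actually contacted


def lab1() -> Iterator[Fact]:
    log.append("lab1 queried")
    yield ("spine1", "is_a", "spine")


def lab2() -> Iterator[Fact]:
    log.append("lab2 queried")
    yield ("image7", "shows", "RyR")


mediator = Mediator()
mediator.plug_in("lab1", lab1)
mediator.plug_in("lab2", lab2)
```

Calling `mediator.integrated_view()` contacts no source; only iterating over the returned view does, one source at a time.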

11 Model-Based Mediator Architecture
[Architecture diagram: sources S1, S2, S3 each sit behind an XML-Wrapper and a CM-Wrapper, which exports the source's conceptual model in GCM form (CM S1, CM S2, CM S3) through the CM Plug-In. The Mediator Engine (FL rule proc., LP rule proc., Graph proc., XSB Engine) combines these with the Domain Map (DM) and the Integrated View Definition (IVD) into the CM (Integrated View). USER/Client programs use the Logic API (capabilities); CM Queries & Results are exchanged in XML.]

12 Definition of Integrated Views ...
XML-2-FL and CM-2-FL Translators
    <!ELEMENT Study (study_id, … animal, experiments, experimenters)>
    <!ELEMENT experiment (description, instrument, parameters)>
    studyDB[studies =>> study].
    study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string]. …
Specification of Domain Knowledge
  Subclasses:
    mushroom_spine :: spine
  Rules:
    S:mushroom_spine IF S:spine[head -> _; neck -> _].
  Integrity Constraints:
    ic1(S):alert[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].
Integrated View Definition
    protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value) IF
      I:protein_label_image[proteins ->> {Protein}; organism -> Organism; anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}],
      NAE:neuro_anatomic_entity[name->Anatom; located_in->>{Brain_region}],
      AS..segments..features[name->Feature_name; value->Value].
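For readers unfamiliar with F-logic, the subclass rule and the integrity constraint on this slide can be paraphrased in Python. This is a rough sketch under one reading of the rules: a spine is a mushroom spine when both `head` and `neck` are defined, and `ic1` raises an alert when both are undefined. Representing a spine as a dictionary, and the function names, are assumptions of the sketch.

```python
from typing import Dict, Optional

Spine = Dict[str, Optional[float]]  # attribute name -> measured value (assumed layout)


def is_mushroom_spine(spine: Spine) -> bool:
    """Paraphrase of: S:mushroom_spine IF S:spine[head -> _; neck -> _]."""
    return spine.get("head") is not None and spine.get("neck") is not None


def ic1(spine_id: str, spine: Spine) -> Optional[Dict[str, str]]:
    """Paraphrase of the integrity constraint: alert when head and neck
    are both undefined. Returns the alert object, or None if the spine passes."""
    undefined = {attr for attr in ("head", "neck") if spine.get(attr) is None}
    if undefined == {"head", "neck"}:
        return {"type": "invalid spine", "object": spine_id}
    return None
```

Note that the constraint does not reject a spine; it derives an alert object that a client can query like any other data.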

13 Definition of Integrated Views (Multiple Sources)
Creating Mediated Classes
  union over all classes:
    animal[M => R] IF S:source, S.animal[M => R].
  association rule:
    X[taxon -> T] IF X:'PROLAB'.animal[name -> N], words(N,[W1,W2|_]), T:'TAXON'.taxon[genus -> W1; species -> W2].
Reasoning with Schema
  Schema:
    taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].
  At Mediator:
    subspecies::species::genus:: … kingdom::superkingdom
  Class creation by schema reasoning:
    T:TR, TR::TR1 IF T:'TAXON'.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
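The association rule links an animal in one source to a taxon in another by splitting the animal's name into words and matching the first two against genus and species. A small Python sketch of that idea follows. The `TAXON` table contents and the function name `link_taxon` are illustrative assumptions, not data from the PROLAB or TAXON sources.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the TAXON source.
TAXON: List[Dict[str, str]] = [
    {"genus": "Rattus", "species": "norvegicus", "order": "Rodentia"},
    {"genus": "Mus", "species": "musculus", "order": "Rodentia"},
]


def link_taxon(animal_name: str) -> Optional[Dict[str, str]]:
    """Mirror of words(N,[W1,W2|_]): take the first two words of the name
    and look for a taxon whose genus and species match them."""
    words = animal_name.split()
    if len(words) < 2:
        return None  # the rule needs at least two words to fire
    genus, species = words[0], words[1]
    for taxon in TAXON:
        if taxon["genus"] == genus and taxon["species"] == species:
            return taxon
    return None
```

For example, `link_taxon("Rattus norvegicus albino")` ignores the trailing word, just as the `|_` tail in the rule does.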

14 Model-Based Mediation with DOMAIN MAPS (DMs)
Integrated-CM(Z1,...) := get X1,... from Src1; get X2,... from Src2; LINK(Xi, Yj); Zj = CM-QL(X1,...,Y1,...)
LINK(X,Y):
- X.zip = Y.zip
- X.addr in Y.zip
- X.zip overlaps Y.county
- ...
"Semantic Road Maps" for situating source data
=> navigational aid (browsing source classes at the conceptual level)
=> basis for integrated views across multiple worlds
=> link points (concepts) and labeled arcs (roles)
=> formal semantics (in FL and/or DLs)
Example: ANATOM DM = anatomical entities (concepts) + is_a, has_a, overlaps, ... (roles)
=> from syntactic equality to semantic joins
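The step from syntactic equality to semantic joins amounts to making the join condition a pluggable LINK predicate that may consult a domain map. The Python sketch below illustrates this with the zip/county example from the slide. The lookup table `ZIP_IN_COUNTY` plays the role of a tiny domain map; it, the sample records, and all function names are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical domain knowledge: which county a zip code lies in.
ZIP_IN_COUNTY: Dict[str, str] = {
    "92093": "San Diego",
    "92037": "San Diego",
    "90095": "Los Angeles",
}


def link_same_zip(x: Record, y: Record) -> bool:
    """Syntactic join condition: X.zip = Y.zip."""
    return "zip" in x and x.get("zip") == y.get("zip")


def link_zip_in_county(x: Record, y: Record) -> bool:
    """Semantic join condition: X.zip lies in Y.county, per the domain map."""
    county = ZIP_IN_COUNTY.get(x.get("zip", ""))
    return county is not None and county == y.get("county")


def semantic_join(
    xs: List[Record], ys: List[Record], link: Callable[[Record, Record], bool]
) -> List[Tuple[Record, Record]]:
    """Nested-loop join of xs and ys under an arbitrary LINK predicate."""
    return [(x, y) for x in xs for y in ys if link(x, y)]


labs = [{"name": "lab_a", "zip": "92093"}, {"name": "lab_b", "zip": "90095"}]
regions = [{"county": "San Diego"}]
```

With `link_same_zip`, `labs` and `regions` share no joinable attribute and the join is empty; with `link_zip_in_county`, `lab_a` joins with the San Diego region even though the two records have no field in common.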

15 ANATOM Domain Map [figure: ANATOM]

16 ANATOM Domain Map with Registered Data [figure: ANATOM DATA]

17 Deductive Closure of "has_a" with "tc(is_a)": (YES -- Real Recursive Views!! ;-) [figure: ANATOM CLOSURE]
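A recursive view of this kind can be computed as a fixpoint. The Python sketch below assumes one plausible reading of the closure: has_a is transitive, and a concept inherits the parts of everything it is_a (directly or transitively). The slide does not spell out the exact rules, so this reading, the function names, and the sample edges are assumptions.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """tc(R): repeatedly compose R with itself until nothing new appears."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def has_a_star(is_a: Set[Edge], has_a: Set[Edge]) -> Set[Edge]:
    """Deductive closure of has_a with tc(is_a), under the assumed rules:
    (1) x is_a* y and y has_a* z  =>  x has_a* z   (inheritance of parts)
    (2) x has_a* y and y has_a* z =>  x has_a* z   (transitivity)
    """
    isa_tc = transitive_closure(is_a)
    closure = set(has_a)
    while True:
        inherited = {(x, z) for (x, y) in isa_tc for (y2, z) in closure if y == y2}
        chained = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        new = (inherited | chained) - closure
        if not new:
            return closure
        closure |= new


# Hypothetical fragment of an anatomical domain map.
IS_A = {("purkinje_cell", "neuron")}
HAS_A = {("neuron", "dendrite"), ("dendrite", "spine")}
```

On this fragment, `has_a_star(IS_A, HAS_A)` derives that a Purkinje cell has spines, a fact stated in neither input relation.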

18 Example Query Evaluation (I)
Example: protein_distribution
- given: organism, protein, brain_region
- ANATOM DM: recursively traverse the has_a_star paths under brain_region; collect all anatomical_entities
- Source PROLAB: join with anatomical structures and collect the value of attribute "image.segments.features.feature.protein_amount" where "image.segments.features.feature.protein_name" = protein and "study_db.study.animal.name" = organism
- Mediator: aggregate over all parents up to brain_region; report distribution
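The three evaluation steps above (traverse the domain map, select matching source data, aggregate upward) can be sketched in Python as follows. Everything concrete here is made up for illustration: the `HAS_A` fragment, the flattened `PROLAB` rows, the amounts, and the choice of a sum as the aggregate. The sketch also assumes the has_a structure is a tree, so no amount is counted twice.

```python
from typing import Dict, List

# Hypothetical has_a fragment of the ANATOM domain map (parent -> parts).
HAS_A: Dict[str, List[str]] = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
}

# Hypothetical, flattened PROLAB rows.
PROLAB = [
    {"anatom": "molecular_layer", "protein": "RyR", "organism": "rat", "amount": 30.0},
    {"anatom": "purkinje_cell_layer", "protein": "RyR", "organism": "rat", "amount": 12.0},
    {"anatom": "molecular_layer", "protein": "RyR", "organism": "mouse", "amount": 7.0},
]


def parts_below(region: str) -> List[str]:
    """Step 1: recursively follow has_a from region, collecting all entities."""
    found = [region]
    for part in HAS_A.get(region, []):
        found.extend(parts_below(part))
    return found


def protein_distribution(organism: str, protein: str, brain_region: str) -> Dict[str, float]:
    entities = parts_below(brain_region)
    # Step 2: select source rows for the given protein and organism,
    # joined with the anatomical entities found in step 1.
    direct = {entity: 0.0 for entity in entities}
    for row in PROLAB:
        if (row["anatom"] in direct and row["protein"] == protein
                and row["organism"] == organism):
            direct[row["anatom"]] += row["amount"]
    # Step 3: aggregate each entity's subtree, up to brain_region.
    return {entity: sum(direct[p] for p in parts_below(entity)) for entity in entities}
```

Calling `protein_distribution("rat", "RyR", "cerebellum")` reports a total for every entity between the individual layers and the requested brain region.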

19 Interactive Queries (I) [figure: KIND]

20 Example Query Evaluation
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
X1 := select output from parallel fiber
X2 := "hang off" X1 from Domain
X3 :=
X4 := select PROT-data(X3, Ryanodine
X5 := compute aggregate(X4);

21 Interactive Queries (II) [figure: KIND01]

22 Resulting Sub DOMAIN MAP "Browser" [figure: PROTLOC]

23 Computed Protein Localization Data [figure: PROTLOC]

24 Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky) [figure: PROTLOC-AxioMap]

25 Summary & Outlook: Federation of Brain Data
- CCB, Montana SU
- Surface atlas, Van Essen Lab
- NCMIR, UCSD
- stereotaxic atlas, LONI
- MCell, CNL, Salk
- ANATOM, PROTLOC
- Result (VML), Result (XML/XSLT)
=> MODEL-BASED Mediation