Knowledge-Based Integration of Neuroscience Data Sources Amarnath Gupta Bertram Ludäscher Maryann Martone University of California San Diego.


A Standard Information Mediation Framework
[Figure: architecture diagram. Labels: client query; integrated XML view; mediator with view definition; XML views; wrapper; data sources, one of them an XML data source.]

A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Is there any structure specificity? How about other rodents?
[Figure: mediator with integrated view and view definition. Source labels: protein localization; morphometry; neurotransmission; WWW (CaBP, Expasy) behind a wrapper.]

Integration Issues
Structural heterogeneity
– Resolved by converting to a common semistructured data model
Heterogeneity in query capabilities
– Resolved by writing wrappers with binding patterns and other capability-definition languages
Semantic heterogeneity
– Schema conflicts: partially resolved by mapping rules in the mediator
– Hidden semantics?

Hidden Semantics: Protein Localization
[Figure: sample record. Labels: RyR ….; spine 0; branchlet 30; Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite.]

Hidden Semantics: Morphometry
… …
– A branch level beyond 4 is a branchlet
– Must be dendritic, because Purkinje cells don’t have somatic spines
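
The two unstated rules on this slide can be written down explicitly. The following is a minimal Python sketch, not the F-Logic used in KIND; the function names `classify_branch` and `spine_compartment` are hypothetical and only illustrate how such hidden semantics might be made machine-readable.

```python
from typing import Optional


def classify_branch(level: int) -> str:
    """Slide rule: a branch level beyond 4 is a branchlet."""
    return "branchlet" if level > 4 else "branch"


def spine_compartment(cell_type: str) -> Optional[str]:
    """Slide rule: Purkinje cells have no somatic spines,
    so a spine recorded on a Purkinje cell must be dendritic."""
    if cell_type == "Purkinje":
        return "dendrite"
    return None  # the slide states no rule for other cell types
```

Encoding the rules this way lets a mediator join a morphometry record that only mentions a branch level with a localization record that mentions a branchlet.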

The Problem
Multiple-worlds integration
– Compatible terms are not directly joinable
– Complex, indirect associations among schema elements
– Unstated integrity constraints
Why not use ontologies?
– Typical ontologies associate terms along a limited number of dimensions
What’s needed
– A “theory” under which non-identical terms can be “semantically” joined

Our Approach
Modify the standard mediation architecture
– Wrapper: extend to encode an object version of the structure schema
– Mediator: redesign to incorporate auxiliary knowledge sources to
    correlate the object schemas of the sources
    define additional objects not specified in, but derivable from, the sources
At the mediator, use a logic engine to
– Encode the mapping rules between sources
– Define integrated views using a combination of exported objects from the sources and the auxiliary knowledge sources
– Perform query decomposition
We still use the Global-as-View form of mediation

The KIND Architecture
[Figure: architecture diagram. Labels: View Definition Rules; Logic Engine; Integration Logic; Schema of Registered Sources; Integrated User View; Auxiliary Knowledge Source 1; Auxiliary Knowledge Source 2; Object Wrapper and Structure Wrapper for each of Src 1 and Src 2; Materialized Views.]

The Knowledge-Base
Situate every data object in its anatomical context
– An illustration
– New data is registered with the knowledge-base
– Insertion of new data reconciles the current knowledge-base with the new information by:
    indexing the data with the source as part of registration
    extending the knowledge-base
    creating new views with complex rules to encode additional domain knowledge

F-Logic for the Mediation Engine
Why F-Logic?
– Provides the power of Datalog (with negation) and object creation through Skolem IDs
– The right amount of “notational sugar” and rules to provide object-oriented abstraction
– Schema-level reasoning
– Expressing variable arity
F-Logic in KIND
– Source schema wrapped into F-Logic schema
– Knowledge sources programmed in F-Logic
– Definition of integrated views

Wrapping into Logic Objects
Automated part
    <!ELEMENT Study (study_id, … animal, experiments, experimenters)>
    <!ELEMENT experiment (description, instrument, parameters)>
    studyDB[studies =>> study].
    study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
    …
Non-automated part: subclasses, rules, integrity constraints
    mushroom_spine::spine.
    S:mushroom_spine IF S:spine[head -> _; neck -> _].
    ic1(S):alert[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].
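
To make the non-automated part concrete, here is a minimal Python sketch of the subclass rule and the integrity constraint. It is an analogue, not the F-Logic that KIND runs. It assumes that a spine is a dictionary whose `head` and `neck` entries are `None` when undefined, and it reads the constraint as firing when both are undefined; the function names and the sample records are hypothetical.

```python
def is_mushroom_spine(spine):
    """Subclass rule: a spine with both a head and a neck is a mushroom spine."""
    return spine.get("head") is not None and spine.get("neck") is not None


def invalid_spine_alerts(spines):
    """Integrity constraint ic1: alert on spines whose head and neck are both undefined."""
    alerts = []
    for spine in spines:
        if spine.get("head") is None and spine.get("neck") is None:
            alerts.append({"type": "invalid spine", "object": spine["id"]})
    return alerts


# Made-up sample records, for illustration only.
spines = [
    {"id": "s1", "head": 0.4, "neck": 0.1},
    {"id": "s2", "head": None, "neck": None},
    {"id": "s3", "head": 0.3, "neck": None},
]
mushroom_ids = [s["id"] for s in spines if is_mushroom_spine(s)]
alerts = invalid_spine_alerts(spines)
```

Here `mushroom_ids` is `["s1"]` and `alerts` holds a single alert for `s2`; `s3` is neither a mushroom spine nor invalid.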

Computing with Auxiliary Sources
Creating mediated classes
    animal[M => R] IF S:source, S.animal[M => R].    (union view)
    animal[taxon => 'TAXON'.taxon].
    X[taxon -> T] IF X:'PROLAB'.animal[name -> N], words(N,[W1,W2|_]), T:'TAXON'.taxon[genus -> W1; species -> W2].    (association rule)
Reasoning with schema
    Schema:
    taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].
    At mediator:
    subspecies::species::genus:: … kingdom::superkingdom
    T:TR, TR::TR1 IF T:'TAXON'.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.    (class creation by schema reasoning)
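
The association rule links a PROLAB animal to a TAXON entry by taking the first two words of the animal's name as genus and species. A minimal Python sketch of that join follows; it is an analogue of the rule, not KIND's implementation, and the function name `associate_taxon`, the record layout, and the sample data are assumptions made for illustration.

```python
def associate_taxon(animal_name, taxa):
    """Return the taxon whose genus and species match the first two words of the name."""
    words = animal_name.split()
    if len(words) < 2:
        return None  # the rule needs at least two words: genus and species
    genus, species = words[0], words[1]
    for taxon in taxa:
        if taxon["genus"] == genus and taxon["species"] == species:
            return taxon
    return None


# Made-up taxonomy entries, for illustration only.
taxa = [
    {"id": "t1", "genus": "Rattus", "species": "norvegicus"},
    {"id": "t2", "genus": "Mus", "species": "musculus"},
]
match = associate_taxon("Rattus norvegicus Wistar", taxa)
```

Any words after the first two (a strain name, for instance) are ignored, mirroring the `[W1,W2|_]` pattern in the rule.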

Integrated View Definition
Views are defined between sources and the knowledge base
Example: protein_distribution
– Given: organism, protein, brain_region
– KB Anatom: recursively traverse the has_a paths under brain_region and collect all anatomical_entities
– Source PROLAB: join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
– Mediator: aggregate over all parents up to brain_region and report the distribution
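
The three steps of the protein_distribution view (traverse, join, aggregate) can be sketched in Python as follows. This is an illustrative analogue rather than the F-Logic view itself. It assumes that the has_a relation is a tree given as a dictionary from entity to parts, that source records are flat dictionaries, and that the aggregate is a sum; the slide specifies none of these details, and all sample values are made up.

```python
def descendants(has_a, region):
    """Collect region and every anatomical entity reachable through has_a."""
    found, stack = [], [region]
    while stack:
        node = stack.pop()
        if node not in found:
            found.append(node)
            stack.extend(has_a.get(node, []))
    return found


def protein_distribution(has_a, measurements, organism, protein, brain_region):
    """Sum protein amounts per entity, rolling parts up into their parents."""
    entities = descendants(has_a, brain_region)
    direct = {entity: 0 for entity in entities}
    # Join: keep only measurements for this organism and protein
    # that were recorded on a structure under brain_region.
    for m in measurements:
        if (m["organism"] == organism and m["protein"] == protein
                and m["structure"] in direct):
            direct[m["structure"]] += m["amount"]

    # Aggregate: an entity's total is its own amount plus its parts' totals.
    def total(entity):
        return direct[entity] + sum(total(part) for part in has_a.get(entity, []))

    return {entity: total(entity) for entity in entities}


# Made-up knowledge base and measurements, for illustration only.
has_a = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
}
measurements = [
    {"organism": "rat", "protein": "RyR", "structure": "molecular_layer", "amount": 30},
    {"organism": "rat", "protein": "RyR", "structure": "purkinje_cell_layer", "amount": 10},
    {"organism": "mouse", "protein": "RyR", "structure": "molecular_layer", "amount": 99},
]
dist = protein_distribution(has_a, measurements, "rat", "RyR", "cerebellum")
```

With this sample data the rat amounts roll up to 40 at both `cerebellar_cortex` and `cerebellum`, and the mouse record is ignored.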

Query Evaluation
Example: protein distribution of human NCS-1 homologues (a second integrated view)
– From the wrapped CaBP website: get the amino acid sequence for human NCS-1
– From the wrapped Expasy website: submit the amino acid sequence, get ranked homologues
– At the mediator: select homologues H found in rat with homology > 0.70
– At the mediator: for each h in H
    from the previous view: protein_distribution(rat, h, cerebellum, distribution)
– Construct the result
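
The evaluation plan above is a pipeline over two wrapped web sources and one previously defined view. The Python sketch below shows the control flow only. The three source calls are passed in as functions because the real wrapper interfaces are not given in the slides; every name, signature, and sample value here is hypothetical.

```python
def ncs1_homologue_distribution(get_sequence, find_homologues, protein_distribution,
                                organism="rat", region="cerebellum", threshold=0.70):
    """Run the plan: fetch sequence, rank homologues, filter, query the view."""
    sequence = get_sequence("NCS-1", "human")   # wrapped CaBP site
    ranked = find_homologues(sequence)          # wrapped Expasy site
    # Mediator-side selection: right organism and homology above the threshold.
    selected = [h for h in ranked
                if h["organism"] == organism and h["homology"] > threshold]
    # One call to the previously defined view per selected homologue.
    return {h["protein"]: protein_distribution(organism, h["protein"], region)
            for h in selected}


# Stub sources with made-up data, standing in for the real wrappers.
def fake_get_sequence(protein, organism):
    return "PLACEHOLDERSEQ"


def fake_find_homologues(sequence):
    return [
        {"protein": "hom_a", "organism": "rat", "homology": 0.95},
        {"protein": "hom_b", "organism": "rat", "homology": 0.60},
        {"protein": "hom_c", "organism": "mouse", "homology": 0.90},
    ]


def fake_protein_distribution(organism, protein, region):
    return {region: 1}


result = ncs1_homologue_distribution(
    fake_get_sequence, fake_find_homologues, fake_protein_distribution)
```

Only `hom_a` survives the filter in this stubbed run, so `result` has a single entry. Passing the sources in as functions keeps the plan testable without network access.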

Implementation
System
– Flora as the F-Logic engine
– Communicates with ODBC databases through the underlying XSB Prolog
– XML wrapping and Web querying through XMAS, our XML query language, and custom-built wrappers
Data
– Human Brain Project sites
– NPACI Neuroscience Thrust sites

Work in Progress
Architecture
– Plug-in architecture for:
    domain knowledge sources
    conceptual models from data sources
Functionality
– Better handling of large data
– Operations:
    expressive query language
    operators for domain knowledge manipulation
– Query evaluation:
    query optimization using domain knowledge
Demonstration
– At VLDB 2000