Knowledge-Based Integration of Neuroscience Data Sources Amarnath Gupta Bertram Ludäscher Maryann Martone University of California San Diego.

1 Knowledge-Based Integration of Neuroscience Data Sources
Amarnath Gupta, Bertram Ludäscher, Maryann Martone
University of California San Diego

2 A Standard Information Mediation Framework
[Diagram components: Client Query; Integrated XML View; Mediator with View Definition; XML Views exported by Wrappers over each Data Source]
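The framework can be illustrated with a small Python sketch. It is a minimal sketch under stated assumptions, not the KIND code: the two toy sources, their field names, and the functions `wrap_relational`, `wrap_records`, `mediator_view`, and `client_query` are hypothetical, and the "view definition" here is simply a union of the wrapped XML views.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy sources; names and contents are illustrative only.
RELATIONAL_ROWS = [("RyR", "Purkinje cell layer")]             # source 1: tuples
RECORDS = [{"prot": "NCS-1", "location": "molecular layer"}]   # source 2: dicts


def wrap_relational(rows):
    """Wrapper for source 1: export tuples as an XML view."""
    root = ET.Element("source", name="src1")
    for protein, region in rows:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "protein").text = protein
        ET.SubElement(item, "region").text = region
    return root


def wrap_records(records):
    """Wrapper for source 2: map differently named fields onto the same XML elements."""
    root = ET.Element("source", name="src2")
    for rec in records:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "protein").text = rec["prot"]
        ET.SubElement(item, "region").text = rec["location"]
    return root


def mediator_view(*source_views):
    """Mediator: here the view definition is just a union of the wrapped views,
    tagging each item with the source it came from."""
    integrated = ET.Element("integrated")
    for view in source_views:
        for item in view.findall("item"):
            merged = ET.SubElement(integrated, "item", source=view.get("name"))
            for child in item:
                ET.SubElement(merged, child.tag).text = child.text
    return integrated


def client_query(view, protein):
    """Client query over the integrated XML view: regions where `protein` is reported."""
    return [item.findtext("region") for item in view.findall("item")
            if item.findtext("protein") == protein]


view = mediator_view(wrap_relational(RELATIONAL_ROWS), wrap_records(RECORDS))
print(client_query(view, "NCS-1"))  # ['molecular layer']
```

The client never sees the two source formats; it only queries the integrated view, which is the point of placing wrappers below the mediator.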

3 A Neuroscience Question
Cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Diagram components: Integrated View; Mediator with View Definition; Wrappers over sources for protein localization, morphometry, and neurotransmission, and over WWW sources (CaBP, Expasy)]

4 Integration Issues
- Structural Heterogeneity
  - Resolved by converting to a common semistructured data model
- Heterogeneity in Query Capabilities
  - Resolved by writing wrappers with binding patterns and other capability-definition languages
- Semantic Heterogeneity
  - Schema conflicts: partially resolved by mapping rules in the mediator
  - Hidden semantics?
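Binding patterns record which arguments a wrapped source needs bound before it can answer, for example a homology search that accepts a sequence but cannot enumerate all proteins. A minimal sketch, assuming the common b (bound) / f (free) adornment notation and a hypothetical `CAPABILITIES` table; the mediator checks a subquery against the declared patterns before sending it to the wrapper.

```python
# Hypothetical capability declarations. Each source relation lists the binding
# patterns it accepts, one letter per argument: 'b' = must be bound, 'f' = may be free.
CAPABILITIES = {
    "homologues": ["bf"],    # e.g. a search form: needs a sequence, returns homologues
    "localization": ["ff"],  # e.g. a local database that can be scanned freely
}


def adornment(args):
    """Adornment of a subquery: 'b' for a bound argument, 'f' for an unbound one (None)."""
    return "".join("f" if a is None else "b" for a in args)


def accepts(pattern, adorn):
    """A pattern accepts a subquery if every position it requires bound is bound."""
    return len(pattern) == len(adorn) and all(
        p == "f" or a == "b" for p, a in zip(pattern, adorn))


def can_answer(relation, args):
    """True if some declared binding pattern of `relation` accepts these arguments."""
    adorn = adornment(args)
    return any(accepts(p, adorn) for p in CAPABILITIES.get(relation, []))


print(can_answer("homologues", ["SEQ", None]))  # True: the sequence is bound
print(can_answer("homologues", [None, None]))   # False: the wrapper must refuse
```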

5 Hidden Semantics: Protein Localization
[Figure: a source record for RyR (…; spine 0; branchlet 30), annotated with its unstated context: Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite]

6 Hidden Semantics: Morphometry
[Figure: raw morphometry values (… 12.348 1.93 4.47 9.884 7.930 4.47 1.79 …), annotated with knowledge the data leaves implicit:]
- Branch level beyond 4 is a branchlet
- Must be dendritic, because Purkinje cells don't have somatic spines
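The two annotations are the kind of knowledge that must be stated explicitly before morphometry data can be joined with localization data. A minimal Python sketch, assuming a hypothetical record layout (`cell_type`, `branch_level`, `structure`); only the two rules themselves come from the slide.

```python
from dataclasses import dataclass

BRANCHLET_MIN_LEVEL = 5  # slide: "branch level beyond 4 is a branchlet"


@dataclass
class Measurement:
    cell_type: str     # hypothetical field, e.g. "Purkinje cell"
    branch_level: int  # dendritic branch order reported by the source
    structure: str     # what was measured, e.g. "spine"


def derived_facts(m):
    """Make the hidden semantics of one morphometry record explicit."""
    facts = set()
    if m.branch_level >= BRANCHLET_MIN_LEVEL:
        facts.add("branchlet")
    # Purkinje cells have no somatic spines, so a spine measured on one is dendritic.
    if m.structure == "spine" and m.cell_type == "Purkinje cell":
        facts.add("dendritic spine")
    return facts


print(sorted(derived_facts(Measurement("Purkinje cell", 6, "spine"))))
# ['branchlet', 'dendritic spine']
```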

7 The Problem
- Multiple Worlds Integration
  - compatible terms are not directly joinable
  - complex, indirect associations among schema elements
  - unstated integrity constraints
- Why not use ontologies?
  - typical ontologies associate terms along a limited number of dimensions
- What's needed
  - a "theory" under which non-identical terms can be "semantically" joined

8 Our Approach
- Modify the standard mediation architecture
  - Wrapper: extend to encode an object version of the structure schema
  - Mediator: redesign to incorporate auxiliary knowledge sources in order to
    - correlate the object schemas of the sources
    - define additional objects that are not specified in, but derivable from, the sources
- At the mediator, use a logic engine to
  - encode the mapping rules between sources
  - define integrated views using a combination of objects exported from the sources and the auxiliary knowledge sources
  - perform query decomposition
- We still use the Global-as-View (GAV) form of mediation
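In Global-as-View mediation each mediated relation is defined as a query over the sources, so query decomposition amounts to unfolding those definitions. A minimal Python sketch of that unfolding, assuming a hypothetical mapping `GAV` in which each global relation is just a union of same-arity source relations (joins inside view definitions are left out); in KIND this work is done by the F-Logic engine, not by code like this.

```python
from itertools import product

# Hypothetical GAV mapping: each mediated (global) relation is defined as a
# union of source relations with the same arity.
GAV = {
    "animal": ["src1.animal", "src2.animal"],
    "protein_amount": ["src1.protein_amount"],
}


def unfold(query):
    """Rewrite a conjunctive query over global relations into a union of
    conjunctive queries over source relations (GAV query unfolding).

    `query` is a list of atoms (relation, args); the result has one
    source-level conjunction per combination of view alternatives."""
    alternatives = [[(src, args) for src in GAV[rel]] for rel, args in query]
    return [list(combo) for combo in product(*alternatives)]


q = [("animal", ("A", "Name")), ("protein_amount", ("A", "P", "Amt"))]
for cq in unfold(q):
    print(cq)
# [('src1.animal', ('A', 'Name')), ('src1.protein_amount', ('A', 'P', 'Amt'))]
# [('src2.animal', ('A', 'Name')), ('src1.protein_amount', ('A', 'P', 'Amt'))]
```

Because GAV definitions point from the global schema to the sources, unfolding is a simple substitution; the trade-off is that adding a source means editing the view definitions.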

9 The KIND Architecture
[Diagram components: Integrated User View; Logic Engine with Integration Logic and View Definition Rules; Schema of Registered Sources; Auxiliary Knowledge Source 1; Auxiliary Knowledge Source 2; Materialized Views; Src 1 and Src 2, each behind a Structure Wrapper and an Object Wrapper]

10 The Knowledge Base
- Situate every data object in its anatomical context
  - An illustration
  - New data is registered with the knowledge base
  - Insertion of new data reconciles the current knowledge base with the new information by:
    - indexing the data with the source as part of registration
    - extending the knowledge base
    - creating new views with complex rules to encode additional domain knowledge
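Registration can be sketched in Python under simplifying assumptions: a hypothetical `KnowledgeBase` class whose anatomy is a plain tree of "within" edges, with invented object IDs and a made-up source name `MORPH` (`PROLAB` is the source named in the integrated-view example). Registering an object indexes it by source and concept, and an unknown concept extends the knowledge base under a given parent.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy anatomical knowledge base: a tree of 'within' edges between concepts
    plus an index from each concept to the data objects registered there."""

    def __init__(self, within):
        self.within = dict(within)      # concept -> enclosing concept
        self.index = defaultdict(list)  # concept -> [(source, object_id), ...]

    def knows(self, concept):
        return concept in self.within or concept in self.within.values()

    def register(self, source, object_id, concept, parent=None):
        """Register a data object at `concept`, indexed by its source.
        An unknown concept extends the knowledge base under `parent`."""
        if not self.knows(concept):
            if parent is None or not self.knows(parent):
                raise ValueError(f"cannot situate unknown concept {concept!r}")
            self.within[concept] = parent
        self.index[concept].append((source, object_id))

    def context(self, concept):
        """Anatomical context: the chain of enclosing structures, innermost first."""
        chain = []
        while concept in self.within:
            concept = self.within[concept]
            chain.append(concept)
        return chain


kb = KnowledgeBase({"molecular layer": "cerebellar cortex",
                    "cerebellar cortex": "cerebellum"})
kb.register("PROLAB", "img42", "molecular layer")
kb.register("MORPH", "m7", "branchlet", parent="molecular layer")
print(kb.context("branchlet"))  # ['molecular layer', 'cerebellar cortex', 'cerebellum']
```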

11 F-Logic for the Mediation Engine
- Why F-Logic?
  - Provides the power of Datalog (with negation) plus object creation through Skolem IDs
  - The right amount of "notational sugar" and rules to provide object-oriented abstraction
  - Schema-level reasoning
  - Expressing variable arity
- F-Logic in KIND
  - Source schemas wrapped into F-Logic schemas
  - Knowledge sources programmed in F-Logic
  - Definition of integrated views
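Skolem IDs give rule-created objects a deterministic identity: the object is named by a function term over the rule's variables, as with the `ic1(S)` alert objects in the integrity constraint on slide 12. A minimal Python sketch of the idea, not of how Flora implements it; `SkolemFactory` and `show` are hypothetical names.

```python
class SkolemFactory:
    """Creates objects whose identity is a Skolem term f(arg1, ..., argn):
    deriving the same term twice yields the same object, not a duplicate."""

    def __init__(self):
        self.objects = {}

    def new(self, functor, *args):
        oid = (functor, *args)  # the Skolem term itself is the object id
        return self.objects.setdefault(oid, {"oid": oid})


def show(oid):
    """Render a Skolem term in F-Logic style, e.g. ('ic1', 's1') -> 'ic1(s1)'."""
    functor, *args = oid
    return f"{functor}({', '.join(map(str, args))})"


sk = SkolemFactory()
# Hypothetical: spine s1 violates a constraint and is reported by two rule firings.
alerts = [sk.new("ic1", s) for s in ["s1", "s2", "s1"]]
print(len(sk.objects))         # 2
print(alerts[0] is alerts[2])  # True
print(show(alerts[0]["oid"]))  # ic1(s1)
```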

12 Wrapping into Logic Objects
Automated part:
  <!ELEMENT Study (study_id, … animal, experiments, experimenters)>
  <!ELEMENT experiment (description, instrument, parameters)>
  studyDB[studies =>> study].
  study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
  …
Non-automated part:
  Subclasses: mushroom_spine::spine.
  Rules: S:mushroom_spine IF S:spine[head -> _; neck -> _].
  Integrity constraints: ic1(S):alert[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].
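The non-automated part can be mirrored in Python to show what the subclass rule and the integrity constraint compute. A minimal sketch with a hypothetical `Spine` record and made-up attribute values. Reading `undef ->> {head, neck}` as "both head and neck are undefined" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Spine:
    oid: str
    attrs: dict = field(default_factory=dict)  # defined attributes, e.g. {"head": ..., "neck": ...}


def classes_of(spine):
    """Rule  S:mushroom_spine IF S:spine[head -> _; neck -> _].
    A spine with both a head and a neck is also a mushroom_spine (mushroom_spine::spine)."""
    classes = {"spine"}
    if "head" in spine.attrs and "neck" in spine.attrs:
        classes.add("mushroom_spine")
    return classes


def integrity_alerts(spines):
    """Constraint ic1: one alert object ic1(S) per spine whose head and neck are
    both undefined (this reading of `undef ->> {head, neck}` is an assumption)."""
    return [{"oid": ("ic1", s.oid), "type": "invalid spine", "object": s.oid}
            for s in spines
            if "head" not in s.attrs and "neck" not in s.attrs]


# Hypothetical wrapped objects; the attribute values are made up.
spines = [Spine("s1", {"head": 0.6, "neck": 0.2}), Spine("s2", {"head": 0.5}), Spine("s3")]
print([sorted(classes_of(s)) for s in spines])
# [['mushroom_spine', 'spine'], ['spine'], ['spine']]
print(integrity_alerts(spines))
# [{'oid': ('ic1', 's3'), 'type': 'invalid spine', 'object': 's3'}]
```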

13 Computing with Auxiliary Sources
Creating mediated classes:
  animal[M => R] IF S:source, S.animal[M => R].   (union view)
  animal[taxon => 'TAXON'.taxon].
  X[taxon -> T] IF X:'PROLAB'.animal[name -> N], words(N, [W1,W2|_]), T:'TAXON'.taxon[genus -> W1; species -> W2].   (association rule)
Reasoning with schema:
  Schema:
    taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].
    subspecies::species::genus:: … kingdom::superkingdom
  At mediator (class creation by schema reasoning):
    T:TR, TR::TR1 IF T:'TAXON'.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
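The association rule and the class-creation rule can be traced with a small Python sketch. It assumes a hypothetical dict representation of TAXON records; the rank list holds only the ranks the slide names (those it elides with "…" are left out), and the example animal name and record are illustrative.

```python
# Rank order from the TAXON schema, most specific first. The slide elides the
# ranks between genus and phylum with "…", so this sketch uses only those it names.
RANKS = ["subspecies", "species", "genus", "phylum", "kingdom", "superkingdom"]


def associate(animal_name, taxa):
    """Association rule: X[taxon -> T] if the first two words of the PROLAB
    animal name equal T's genus and species."""
    words = animal_name.split()
    if len(words) < 2:
        return None
    w1, w2 = words[0], words[1]
    return next((t for t in taxa
                 if t.get("genus") == w1 and t.get("species") == w2), None)


def schema_reasoning(taxon):
    """Class creation: every rank value becomes a class that the taxon is an
    instance of (T:TR); consecutive ranks give subclass edges (TR::TR1).
    Edges between non-adjacent ranks follow by transitivity of ::."""
    present = [taxon[r] for r in RANKS if r in taxon]
    instance_of = set(present)
    subclass_of = set(zip(present, present[1:]))
    return instance_of, subclass_of


# Illustrative TAXON record (a dict stand-in for an F-Logic object).
taxa = [{"species": "norvegicus", "genus": "Rattus", "phylum": "Chordata",
         "kingdom": "Metazoa", "superkingdom": "Eukaryota"}]
t = associate("Rattus norvegicus Sprague-Dawley", taxa)
instance_of, subclass_of = schema_reasoning(t)
print(("norvegicus", "Rattus") in subclass_of)  # True
```

The schema rule is what lets the mediator treat rank values such as "Rattus" as classes, so a query about rodents or a genus can range over every animal whose taxon falls below it.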

14 Integrated View Definition
- Views are defined between sources and the knowledge base
- Example: protein_distribution
  - given: organism, protein, brain_region
  - KB Anatom: recursively traverse the has_a paths under brain_region and collect all anatomical_entities
  - Source PROLAB: join with the anatomical structures and collect the value of attribute "image.segments.features.feature.protein_amount" where "image.segments.features.feature.protein_name" = protein and "study_db.study.animal.name" = organism
  - Mediator: aggregate over all parents up to brain_region and report the distribution
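A minimal Python sketch of protein_distribution, assuming toy in-memory stand-ins: `HAS_A` for the Anatom knowledge base, and `PROLAB` rows whose `animal`, `protein`, `structure`, and `amount` fields flatten the attribute paths named on the slide. The has_a edges are simplified, the amounts are invented, and summation is assumed as the aggregate.

```python
from collections import defaultdict

# Hypothetical KB "Anatom": has_a edges (whole -> parts).
HAS_A = {
    "cerebellum": ["cerebellar cortex"],
    "cerebellar cortex": ["molecular layer", "Purkinje cell layer", "granule cell layer"],
}

# Hypothetical flattened PROLAB rows (study_db.study.animal.name,
# image.segments.features.feature.protein_name / protein_amount); amounts invented.
PROLAB = [
    {"animal": "rat", "protein": "NCS-1", "structure": "molecular layer", "amount": 3.0},
    {"animal": "rat", "protein": "NCS-1", "structure": "Purkinje cell layer", "amount": 1.5},
    {"animal": "mouse", "protein": "NCS-1", "structure": "molecular layer", "amount": 9.9},
]


def entities_under(region):
    """Recursively traverse has_a paths below `region`, collecting every entity."""
    found = [region]
    for part in HAS_A.get(region, []):
        found.extend(entities_under(part))
    return found


def protein_distribution(organism, protein, brain_region):
    entities = set(entities_under(brain_region))
    # Join source rows with the anatomical entities, filtered by protein and organism.
    local = defaultdict(float)
    for row in PROLAB:
        if (row["animal"] == organism and row["protein"] == protein
                and row["structure"] in entities):
            local[row["structure"]] += row["amount"]

    # Aggregate bottom-up: each entity's total includes all of its parts.
    def total(entity):
        return local[entity] + sum(total(p) for p in HAS_A.get(entity, []))

    return {e: total(e) for e in entities_under(brain_region)}


print(protein_distribution("rat", "NCS-1", "cerebellum")["cerebellum"])  # 4.5
```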

15 Query Evaluation Example (a second integrated view)
Protein distribution of the human NCS-1 homologues:
- from the wrapped CaBP website: get the amino acid sequence for human NCS-1
- from the wrapped Expasy website: submit the amino acid sequence, get ranked homologues
- at the mediator: select the homologues H found in rat with homology > 0.70
- at the mediator: for each h in H, evaluate the previous view:
  protein_distribution(rat, h, cerebellum, distribution)
- construct the result
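The evaluation steps can be sketched as one mediator function that calls the two wrapped Web sources and then reuses the protein_distribution view. All three callees are hypothetical stubs here (`cabp_sequence`, `expasy_homologues`, and a stand-in `protein_distribution` that returns a constant), and the homologue IDs and scores are invented; only the order of the steps and the 0.70 threshold come from the slide.

```python
def cabp_sequence(protein, organism):
    """Hypothetical stub for the wrapped CaBP site: returns a placeholder sequence."""
    return f"<sequence of {organism} {protein}>"


def expasy_homologues(sequence):
    """Hypothetical stub for the wrapped Expasy site: ranked
    (homologue_id, organism, homology) triples with invented values."""
    return [("h1", "rat", 0.95), ("h2", "mouse", 0.93), ("h3", "rat", 0.65)]


def protein_distribution(organism, protein, region):
    """Stand-in for the previously defined integrated view."""
    return {region: 1.0}


def homologue_distribution(protein="NCS-1", source_organism="human",
                           target="rat", region="cerebellum", threshold=0.70):
    sequence = cabp_sequence(protein, source_organism)   # 1. CaBP wrapper
    ranked = expasy_homologues(sequence)                 # 2. Expasy wrapper
    homologues = [h for h, org, score in ranked          # 3. select at mediator
                  if org == target and score > threshold]
    return {h: protein_distribution(target, h, region)   # 4. reuse the view
            for h in homologues}                         # 5. construct result


print(homologue_distribution())  # {'h1': {'cerebellum': 1.0}}
```

Changing `target` answers the slide 3 follow-up ("How about other rodents?") without touching the wrappers.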

16 Implementation
- System
  - Flora as the F-Logic engine
  - Communication with ODBC databases through the underlying XSB Prolog
  - XML wrapping and Web querying through XMAS (our XML query language) and custom-built wrappers
- Data
  - Human Brain Project sites
  - NPACI Neuroscience Thrust sites

17 Work in Progress
- Architecture
  - plug-in architecture for domain knowledge sources
  - conceptual models from data sources
- Functionality
  - better handling of large data
  - operations: expressive query language operators for domain knowledge manipulation
  - query evaluation: query optimization using domain knowledge
- Demonstration
  - at VLDB 2000

