1
Model-Based Information Integration in a Neuroscience Mediator System
Bertram Ludaescher, Amarnath Gupta, Maryann E. Martone
University of California San Diego
2
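VLDB2000, Cairo
A Standard Mediator Architecture (MIX -- Mediation of Information using XML)
[Diagram: a USER-Query goes to the MIX MEDIATOR, which offers an INTEGRATED VIEW (XML) specified by an Integrated View Definition. The mediator exchanges XML queries and answers (XML Q/A) with Wrappers over the Data Sources (DB, Files, WWW; Lab1, Lab2, Lab3).]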
3
Integration Issues
SEMANTIC Integration: ???
SYNTACTIC/STRUCTURAL Integration (MIX: Mediation of Information using XML)
– Integrated Views (Src-XML => Intgr-XML)
– Schema Integration (DTD => DTD)
– Wrapping, Data Extraction (Text => XML)
SYSTEM Integration
– SRB/MCAT, TCP/IP, HTTP, CORBA
– storage, query capabilities, protocols & services
– Distributed Query Processing
4
Integration Issues: Mediating across Multiple Worlds
Structural Integration
=> common semistructured data model (XML)
=> XML queries & transformations to resolve schema conflicts
Limited Query Capabilities
=> mediator is aware of the query capabilities (QCs) exported by wrappers...
Semantic Integration
– most work deals with issues for “one-world” scenarios (e.g., amazon.com vs. bn.com)
– what if data comes from a “multiple-world” scenario (like Neuroscience), where data objects from different sources are not even similar, and only the hidden semantics (known to the domain expert) provides the “semantic link”?
5
A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Diagram: sources for protein localization, morphometry, and neurotransmission, plus Web sources (CaBP, Expasy). The Wrappers, the Mediator, the Integrated View, and the Integrated View Definition are all marked "???".]
6
Hidden Semantics: Protein Localization
[Figure: sample protein localization record for RyR …, with entries "spine 0" and "branchlet 30" and the anatomical annotations "Molecular layer of Cerebellar Cortex", "Purkinje Cell layer of Cerebellar Cortex", and "Fragment of dendrite".]
7
Hidden Semantics: Morphometry
[Figure: sample morphometry measurements: … 12.348 1.93 4.47 9.884 7.930 4.47 1.79 …]
– Branch level beyond 4 is a branchlet
– Must be dendritic because Purkinje cells don’t have somatic spines
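The two annotations on this slide are domain knowledge that the raw measurements do not state. A minimal Python sketch of how such knowledge could be written down as explicit rules follows; the function name, the argument names, and the returned keys are hypothetical and do not come from the presentation.

```python
def annotate(cell_type, branch_level):
    """Apply the two hidden-semantics rules from the slide to one measurement."""
    return {
        # Rule 1: a branch level beyond 4 is a branchlet.
        "is_branchlet": branch_level > 4,
        # Rule 2: Purkinje cells have no somatic spines, so a spine measured on
        # a Purkinje cell must sit on a dendrite. Other cell types stay unknown.
        "compartment": "dendrite" if cell_type == "Purkinje cell" else None,
    }


example = annotate("Purkinje cell", 5)
```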
8
The Problem
Multiple Worlds Integration
– compatible terms not directly joinable
– complex, indirect associations among schema elements
– unstated integrity constraints
Why not just use Ontologies?
– typical ontologies associate terms along a limited number of dimensions
What’s needed?
– a “theory” under which non-identical terms can be “semantically joined”
=> lift mediation to the level of conceptual models (CMs)
=> domain knowledge and integrity constraints (ICs) become rules over CMs
=> Model-Based Mediation
9
XML-Based vs. Model-Based Mediation
XML-Based Mediation
– levels: Raw Data, XML Elements, XML Models
– Integrated-DTD := XML-QL(Src1-DTD,...)
– Structural Constraints (DTDs): A = (B*|C),D  B =...
– Parent, Child, Sibling,...
– No Domain Constraints
Model-Based Mediation
– levels: Raw Data, (XML) Objects, Conceptual Models
– Integrated-CM := CM-QL(Src1-CM,...)
– Logical Domain Constraints: IF ... THEN ...
– Classes, Relations, is-a, has-a,...
– DOMAIN MAP (diagram labels: C1, C2, C3, R)
10
Extended Mediator Architecture
=> Wrappers export Conceptual Models (CMs), i.e., facts + rules for classes, relationships, ICs,...
=> Mediator imports CMs from sources, auxiliary knowledge bases, and domain maps (DMs)
=> a generic conceptual model (GCM, a subset of F-logic), extensible via rules = common target CM language
=> new CMs can be plugged in by specifying them in GCM + F-logic rules
=> prototype implementation in FLORA:
– global-as-view approach
– compiler: F-logic => XSB-Prolog
– top-down evaluation => virtual (demand-driven) views
– external interfaces (XML, RDBs, DM visualization,...)
11
Model-Based Mediator Architecture
[Diagram: sources S1, S2, S3 each sit behind an XML-Wrapper and a CM-Wrapper; each source's conceptual model is expressed in GCM (CM S1, CM S2, CM S3) and attached via the CM Plug-In. The Mediator Engine (FL rule proc., LP rule proc., Graph proc., XSB Engine) uses the Domain Map (DM) and the Integrated View Definition (IVD) to provide the CM (Integrated View). USER/Clients access it through a Logic API (capabilities); CM Queries & Results are exchanged in XML.]
12
Definition of Integrated Views...
XML-2-FL and CM-2-FL Translators
    <!ELEMENT Study (study_id, … animal, experiments, experimenters)>
    <!ELEMENT experiment (description, instrument, parameters)>
    studyDB[studies =>> study].
    study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
    …
Specification of Domain Knowledge
– Subclasses:
    mushroom_spine :: spine
– Rules:
    S:mushroom_spine IF S:spine[head _; neck _].
– Integrity Constraints:
    ic1(S):alert[type “invalid spine”; object S] IF S:spine[undef ->> {head, neck}].
Integrated View Definition
    protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value) IF
        I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
            anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}],
        NAE:neuro_anatomic_entity[name->Anatom; located_in->>{Brain_region}],
        AS..segments..features[name->Feature_name; value->Value].
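As a rough illustration of what the subclass rule and the integrity constraint derive, here is a minimal Python sketch over in-memory records. It is not the FLORA/F-logic implementation; the dictionary layout, the `derive` function, and the sample spines are assumptions made for the example. A spine with both a head and a neck is classified as a mushroom spine, and a spine for which both head and neck are undefined raises an "invalid spine" alert.

```python
def derive(spines):
    """Return (mushroom_spines, alerts) for a dict of spine_id -> attributes."""
    mushroom_spines = []
    alerts = []
    for spine_id, attrs in spines.items():
        # An attribute counts as defined if it is present with a non-None value.
        defined = {name for name, value in attrs.items() if value is not None}
        # Rule: S is a mushroom_spine if S is a spine with some head and some neck.
        if {"head", "neck"} <= defined:
            mushroom_spines.append(spine_id)
        # Integrity constraint: alert if head and neck are both undefined.
        if not ({"head", "neck"} & defined):
            alerts.append({"type": "invalid spine", "object": spine_id})
    return mushroom_spines, alerts


# Hypothetical sample data, invented for this sketch.
spines = {
    "s1": {"head": 0.4, "neck": 0.1},
    "s2": {"head": 0.3, "neck": None},
    "s3": {},
}
mushroom_spines, alerts = derive(spines)
```

In the mediator these derivations are virtual views evaluated on demand; the sketch computes them eagerly only to keep the example short.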
13
VLDB2000, Cairo 13... Definition of Integrated Views (Multiple Sources) Creating Mediated Classes Reasoning with Schema animal[M R] IF S:source, S.animal [M R]. X[taxon T] IF X: ‘PROLAB’.animal[name N], words(N,[W1,W2|_]), T: ‘TAXON’.taxon[genus W1;species W2]. union over all classes association rule taxon[subspecies string; species string; genus string; … phylum string; kingdom string; superkingdom string]. Schema subspecies::species::genus:: … kingdom::superkingdom At Mediator T:TR, TR::TR1 IF T: ‘TAXON’.taxon[Taxon_Rank TR, Taxon_Rank1 TR1], Taxon_Rank::Taxon_Rank1. Class creation by schema reasoning
14
Model-Based Mediation with DOMAIN MAPS (DMs)
    Integrated-CM(Z1,...) :=
        get X1,... from Src1;
        get X2,... from Src2;
        LINK (Xi, Yj);
        Zj = CM-QL(X1,...,Y1,...)
    LINK(X,Y):
        X.zip = Y.zip
        X.addr in Y.zip
        X.zip overlaps Y.county
        ...
    => from syntactic equality to semantic joins
“Semantic Road Maps” for situating source data
=> navigational aid (browsing source classes at the conceptual level)
=> basis for integrated views across multiple worlds
=> link points (concepts) and labeled arcs (roles)
=> formal semantics (in FL and/or DLs)
Example: ANATOM DM = anatomical entities (concepts) + is_a, has_a, overlaps,... (roles)
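The step from syntactic equality to semantic joins can be sketched in Python by making the LINK condition a pluggable predicate. Everything below is a hypothetical illustration: `semantic_join`, the two predicates, the zip-to-county table, and the placeholder values are not part of the presentation.

```python
def semantic_join(xs, ys, link):
    """Return the (x id, y id) pairs for which the LINK predicate holds."""
    return [(x["id"], y["id"]) for x in xs for y in ys if link(x, y)]


def same_zip(x, y):
    # Syntactic join: X.zip = Y.zip
    return x["zip"] == y["zip"]


def zip_overlaps_county(zip_to_counties):
    # Semantic join: X.zip overlaps Y.county, decided by background knowledge
    # (here a lookup table standing in for a domain map).
    def link(x, y):
        return y["county"] in zip_to_counties.get(x["zip"], set())
    return link


# Placeholder data, invented for this sketch.
xs = [{"id": "x1", "zip": "Z1"}, {"id": "x2", "zip": "Z2"}]
ys = [{"id": "y1", "zip": "Z9", "county": "County A"}]
zip_to_counties = {"Z1": {"County A"}, "Z2": {"County B"}}

equal_pairs = semantic_join(xs, ys, same_zip)
overlap_pairs = semantic_join(xs, ys, zip_overlaps_county(zip_to_counties))
```

The equality join finds nothing because no zip values coincide, while the knowledge-backed predicate still links x1 to y1.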
15
ANATOM Domain Map
[Figure: ANATOM]
16
ANATOM Domain Map with Registered Data
[Figure: ANATOM DATA]
17
Deductive Closure of “has_a” with “tc(is_a)”: (YES -- Real Recursive Views!! ;-)
[Figure: ANATOM CLOSURE]
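One plausible reading of this closure, sketched in Python: first compute the transitive closure of is_a, then close has_a under inheritance along is_a and under composition with itself. The exact rules used in the prototype are not given on the slide, so the rule set, the function names, and the toy domain map below are assumptions.

```python
def transitive_closure(edges):
    """Naive fixpoint computation of the transitive closure of a set of pairs."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def has_a_star(has_a, is_a):
    """Close has_a under is_a inheritance and under composition (assumed rules)."""
    is_a_tc = transitive_closure(is_a)
    closure = set(has_a)
    while True:
        # Inheritance: X is_a* Z and Z has_a* Y imply X has_a* Y.
        derived = {(x, y) for (x, z) in is_a_tc for (z2, y) in closure if z == z2}
        # Composition: X has_a* Z and Z has_a* Y imply X has_a* Y.
        derived |= {(x, y) for (x, z) in closure for (z2, y) in closure if z == z2}
        new = derived - closure
        if not new:
            return closure
        closure |= new


# Toy domain map, invented for this sketch.
has_a = {
    ("cerebellar_cortex", "purkinje_cell_layer"),
    ("purkinje_cell_layer", "purkinje_cell"),
    ("neuron", "dendrite"),
    ("dendrite", "spine"),
}
is_a = {("purkinje_cell", "neuron")}
closure = has_a_star(has_a, is_a)
```

Because the view is recursive, it needs a fixpoint (or, as in the prototype, top-down evaluation in a logic engine) rather than a fixed number of joins.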
18
Example Query Evaluation (I)
Example: protein_distribution
– given: organism, protein, brain_region
– ANATOM DM: recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
– Source PROLAB: join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
– Mediator: aggregate over all parents up to brain_region and report the distribution
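The three steps above can be sketched in Python as a traversal, a join, and an aggregation. This is only an illustration under stated assumptions: the flat record layout, the function names, the use of a sum as the aggregate, and all sample values are invented and do not reflect the PROLAB schema or the mediator's actual plan.

```python
from collections import defaultdict


def regions_under(has_a, root):
    """Return root plus every entity reachable from it along has_a edges."""
    children = defaultdict(list)
    for parent, child in has_a:
        children[parent].append(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def protein_distribution(has_a, records, organism, protein, brain_region):
    """Sum protein amounts per region, rolling values up to every ancestor."""
    totals = defaultdict(float)
    for region in regions_under(has_a, brain_region):   # domain map traversal
        scope = regions_under(has_a, region)
        for r in records:                                # join with source data
            if (r["organism"] == organism and r["protein"] == protein
                    and r["anatom"] in scope):
                totals[region] += r["amount"]            # aggregate (assumed: sum)
    return dict(totals)


# Toy domain map and source records, invented for this sketch.
has_a = [
    ("cerebellum", "cerebellar_cortex"),
    ("cerebellar_cortex", "molecular_layer"),
    ("cerebellar_cortex", "purkinje_cell_layer"),
]
records = [
    {"organism": "rat", "protein": "RyR", "anatom": "molecular_layer", "amount": 3.0},
    {"organism": "rat", "protein": "RyR", "anatom": "purkinje_cell_layer", "amount": 1.0},
    {"organism": "mouse", "protein": "RyR", "anatom": "molecular_layer", "amount": 5.0},
]
distribution = protein_distribution(has_a, records, "rat", "RyR", "cerebellum")
```

Each region reports the total over itself and everything beneath it, so the value at brain_region covers the whole subtree.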
19
Interactive Queries (I)
[Figure: KIND]
20
Example Query Evaluation (II)
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
    @SENSELAB: X1 := select output from parallel fiber;
    @MEDIATOR: X2 := “hang off” X1 from Domain Map;
    @MEDIATOR: X3 := subregion-closure(X2);
    @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
    @MEDIATOR: X5 := compute aggregate(X4);
21
Interactive Queries (II)
[Figure: KIND01]
22
Resulting Sub DOMAIN MAP “Browser”
[Figure: PROTLOC]
23
Computed Protein Localization Data
[Figure: PROTLOC]
24
Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky)
[Figure: PROTLOC-AxioMap]
25
Summary & Outlook: Federation of Brain Data
[Diagram: MODEL-BASED Mediation over federated brain data sources: CCB, Montana SU; Surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk. Also shown: ANATOM, PROTLOC, Result (VML), Result (XML/XSLT).]