Brain Data & Knowledge Grid (or: Towards Services for Knowledge-Based Mediation of Neuroscience Information Sources)

Presentation transcript:

Brain Data & Knowledge Grid (or: Towards Services for Knowledge-Based Mediation of Neuroscience Information Sources)
National Center for Microscopy and Imaging Research (NCMIR): Mark Ellisman, Maryann Martone, Steve Peltier, Steve Lamont, ...
Data-Intensive Computing Environments, San Diego Supercomputer Center (SDSC): Reagan Moore, Chaitan Baru, Amarnath Gupta, Bertram Ludäscher, Richard Marciano, Arcot Rajasekar, Ilya Zaslavsky, ...
University of California, San Diego

Infrastructure for Sharing Neuroscience Data
Figure labels: CCB, Montana SU; Surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk
SOURCES:
–NCMIR, U.C. San Diego
–Caltech Neuroimaging
–Center for Imaging Science, Johns Hopkins
–Center for Computational Biology, Montana State
–Laboratory of Neuro Imaging (LONI), UCLA
–Computational Neurobiology Laboratory, Salk Inst.
–Van Essen Laboratory, Washington University
–…
Data Management Infrastructure (DICE/NPACI): MIX (Mediation in XML), MCAT (information discovery), SRB (data handling), HPSS (storage), ...
Knowledge-based GRID infrastructure ???
Data Management Infrastructure (“Data Grid”): GTOMO, Telemicroscopy, Globus, SRB/MCAT, HPSS

Sharing Resources on the Brain Data Grid
Scientific groups...
–create data products (e.g., text data, images, simulation data, …)
–put them in collections
–add metadata (who created it, what the data is about, …)
–make them available for sharing (on the web, in data caches, in HPSS, …)
Technical challenges...
–size & packaging of data
–heterogeneity: data types, storage technologies, transport mechanisms, authentication, ...
–access levels: collection, object, fragment; data-specific functions (“data blades”)
Data Grid technologies can help...
–distributed data management, e.g., Storage Resource Broker/Metadata Catalog (SRB/MCAT), computing (Globus), ...
–focus is on resource sharing (data, networks, cycles)

Integration Issue: Semantic Integration/Mediation
??? SEMANTIC INTEGRATION ???
SYNTACTIC/STRUCTURAL Integration (MIX: Mediation of Information using XML)
–Integrated Views (Src-XML => Intgr-XML)
–Schema Integration (DTD => DTD)
–Wrapping, Data Extraction (Text => XML)
SYSTEM INTEGRATION (storage, query capabilities, protocols & services)
–SRB/MCAT, TCP/IP, grid-ftp, HTTP, Distributed Query Processing, Globus, JDBC, DOM, CORBA

Standard Mediator/Wrapper Architecture
–Client/User-Query
–INTEGRATED VIEW: integration logic, domain semantics ??? (GRID federation services ???)
–XML Q/A between mediator and wrappers: SRB/MCAT, DOM, X(ML)Query
–Wrappers: protocol translation; structure, transport, syntax, storage
–(Neuro)Science (Re)Sources: Lab1, Lab2, Lab3 (DB, Files, WWW)

The Need for Semantic Integration
Sources: protein localization, morphometry, neurotransmission; Web: CaBP, Expasy (each behind a wrapper)
Example query: "What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?"
??? Mediator ???, ??? Integrated View ???, ??? Integrated View Definition ???
–Data, relationships, constraints are modeled (CMs)
–Cross-source relationships are modeled
–Semantic (knowledge-based) mediation services
–Cross-source queries

Hidden Semantics: Protein Localization
Figure labels: RyR; …; spine 0; branchlet 30; Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite

Hidden Semantics: Morphometry
–Branch level beyond 4 is a branchlet
–Must be dendritic because Purkinje cells don’t have somatic spines

Knowledge-Based (Semantic) Mediation
Multiple Worlds Integration Problem:
–compatible terms not directly joinable
–complex, indirect associations among attributes
–unstated integrity constraints
Approach:
–a “theory” under which terms can be “semantically joined”
=> lift mediation to the level of conceptual models (CMs)
=> formalize domain knowledge, ICs become rules over CMs
=> Knowledge-Based/Model-Based (Semantic) Mediation
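The idea of a “semantic join” can be illustrated with a small sketch. It is not the mediator's actual mechanism (the slides describe rules over conceptual models); it only shows how two sources whose terms never match literally can still be joined once a domain map relates the terms. The domain map edges, the row values, and the function names `ancestors` and `semantic_join` are all assumptions made for illustration.

```python
# Hypothetical domain map: child concept -> parent concept.
# The edges below are invented for illustration, not taken from ANATOM.
DOMAIN_MAP = {
    "spiny_branchlet": "dendrite",
    "main_branches": "dendrite",
    "dendrite": "purkinje_cell",
}


def ancestors(term, domain_map):
    """Return the term followed by every concept reachable via parent edges."""
    seen = []
    while term is not None and term not in seen:
        seen.append(term)
        term = domain_map.get(term)
    return seen


def semantic_join(left_rows, right_rows, domain_map):
    """Pair rows whose terms are equal or related through the domain map."""
    joined = []
    for l_term, l_val in left_rows:
        for r_term, r_val in right_rows:
            related = (r_term in ancestors(l_term, domain_map)
                       or l_term in ancestors(r_term, domain_map))
            if related:
                joined.append((l_term, l_val, r_term, r_val))
    return joined


# Invented rows: one source speaks of branchlets, the other of dendrites.
morphometry = [("spiny_branchlet", 30)]
localization = [("dendrite", "RyR")]

# A plain equality join finds nothing because the terms differ.
plain = [(l, r) for l in morphometry for r in localization if l[0] == r[0]]
# The semantic join succeeds because spiny_branchlet sits under dendrite.
bridged = semantic_join(morphometry, localization, DOMAIN_MAP)
```

The design point is that the join condition moves out of the data and into the domain map, so adding a new source only requires linking its terms to existing concepts.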

XML-Based vs. Model-Based Mediation
XML-based mediation (Raw Data => XML Elements => XML Models):
–Integrated-DTD := XML-QL(Src1-DTD, ...)
–Structural Constraints (DTDs), e.g. A = (B*|C),D  B = ...; Parent, Child, Sibling, ...
–No Domain Constraints
Model-based mediation (Raw Data => (XML) Objects => Conceptual Models):
–Integrated-CM := CM-QL(Src1-CM, ...)
–Classes, Relations, is-a, has-a, ... (DOMAIN MAP over classes C1, C2, C3 and relation R)
–Logical Domain Constraints: IF … THEN …
CM ~ {Descr.Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, OIL, DAML, …}

Knowledge-Based Mediator Prototype
–USER/Client: CM Queries & Results (exchanged in XML)
–Mediator Engine: FL rule proc., LP rule proc., Graph proc., XSB Engine; CM (Integrated View); Domain Map (DM); Integrated View Definition (IVD)
–Logic API (capabilities); CM Plug-In
–GCM: CM S1, CM S2, CM S3
–Per source (S1, S2, S3): XML-Wrapper and CM-Wrapper

Mediation Services: Source Registration (System Issues)
–Source Data Type: table, tree, file
–Access Protocol: SRB, HTTP, JDBC
–Query Capability: Selections, SPJ, SQL, XML QL, DOOD, ARC
–Result Delivery: Tuple-at-a-time, Set-at-a-time, Stream, Binary for Viewer
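One way to picture the system-level registration record is as a small typed structure with the four dimensions listed above. This is a minimal sketch under stated assumptions: the class names `SourceRegistration` and `Registry`, the field names, and the example sources `lab1` and `lab2` are hypothetical and do not come from the prototype.

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    """Result delivery modes named on the slide."""
    TUPLE_AT_A_TIME = "tuple-at-a-time"
    SET_AT_A_TIME = "set-at-a-time"
    STREAM = "stream"
    BINARY_FOR_VIEWER = "binary for viewer"


@dataclass(frozen=True)
class SourceRegistration:
    """Hypothetical record describing what a source offers the mediator."""
    name: str
    data_type: str         # e.g. "table", "tree", "file"
    access_protocol: str   # e.g. "SRB", "HTTP", "JDBC"
    query_capability: str  # e.g. "selections", "SPJ", "SQL"
    delivery: Delivery


class Registry:
    """In-memory registry; a real mediator would persist this information."""

    def __init__(self):
        self._sources = {}

    def register(self, source):
        """Add a source, rejecting a second registration under the same name."""
        if source.name in self._sources:
            raise ValueError(f"source already registered: {source.name}")
        self._sources[source.name] = source

    def with_capability(self, capability):
        """Return the names of sources that offer the given query capability."""
        return sorted(name for name, src in self._sources.items()
                      if src.query_capability == capability)


registry = Registry()
registry.register(SourceRegistration("lab1", "table", "JDBC", "SQL",
                                     Delivery.SET_AT_A_TIME))
registry.register(SourceRegistration("lab2", "file", "SRB", "selections",
                                     Delivery.STREAM))
```

A query planner could then call `registry.with_capability("SQL")` to decide which sources can take a pushed-down query and which must be filtered at the mediator.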

Mediation Services: Source Registration (Semantics Issues)
Domain Map Registration
–provide concept space/ontology
  … as a private object (“myANATOM”)
  … merge with others (give “semantic bridges”)
  … and check for conflicts
Conceptual Model Registration
–schema: classes, associations, attributes
–domain constraints
–“put data into context” (linking data to the domain map)
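The merge-and-check step can be sketched as follows. The slides do not say what counts as a conflict, so this sketch assumes one simple definition: two concepts end up with is-a edges in both directions after the bridge is applied. The function name `merge_domain_maps`, the edge sets, and the bridge entries are all hypothetical.

```python
def merge_domain_maps(mine, theirs, bridges):
    """Merge two sets of (child, parent) is-a edges.

    `bridges` maps a term of the other map to the equivalent term in mine.
    Returns the merged edge set and the list of conflicting pairs, where a
    conflict is (by assumption) an is-a edge present in both directions.
    """
    def rename(term):
        return bridges.get(term, term)

    translated = {(rename(child), rename(parent)) for child, parent in theirs}
    merged = set(mine) | translated
    # Report each conflicting pair once, in alphabetical order.
    conflicts = sorted((c, p) for c, p in merged
                       if (p, c) in merged and c < p)
    return merged, conflicts


# Invented example: my map says a spiny neuron is a neuron.
my_map = {("spiny_neuron", "neuron")}
# A compatible foreign map adds a new subclass under a bridged term.
compatible = {("MediumSpiny", "SpNeuron")}
# A conflicting foreign map states the is-a edge the other way round.
conflicting = {("Neuron", "SpNeuron")}
bridge = {"Neuron": "neuron", "SpNeuron": "spiny_neuron"}

merged_ok, conflicts_ok = merge_domain_maps(my_map, compatible, bridge)
merged_bad, conflicts_bad = merge_domain_maps(my_map, conflicting, bridge)
```

Keeping the bridge as a separate argument mirrors the slide's notion of a private domain map: the owner's edges are never rewritten, only the incoming terms are translated.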

ANATOM Domain Map

Senselab (Yale) and NCMIR (UCSD) “Semantic Bridge”

anatom_dom(X) :-
    ( ucsd_has_a(X,_) ; ucsd_has_a(_,X) ; ucsd_isa(X,_) ; ucsd_isa(_,X) ).
senselab_dom(X) :-
    ( sl_has_a(X,_) ; sl_has_a(_,X) ; sl_isa(X,_) ; sl_isa(_,X) ).

% map Senselab anatom terms to equivalent UCSD ANATOM
sl2ucsd(X,X) :- senselab_dom(X), anatom_dom(X).
sl2ucsd('A',axon).
sl2ucsd('AH',axon).
sl2ucsd('Dad',spiny_branchlet).   % should map to a PATH not just the end of the path
sl2ucsd('Dam',main_branches).     % some of the main_branches based on the branch level
sl2ucsd('Dap',main_branches).
sl2ucsd('Dbd',spiny_branchlet).
sl2ucsd('Dbm',main_branches).
sl2ucsd('Dbp',main_branches).
sl2ucsd('Ded',spiny_branchlet).
sl2ucsd('Dem',main_branches).
sl2ucsd('Dep',main_branches).
sl2ucsd('T',axon).

% keep has_a edge if at least one node is known from UCSD
has_a(X,Y) :- sl2ucsd(_,X), ucsd_has_a(X,Y).
has_a(X,Y) :- sl2ucsd(_,Y), ucsd_has_a(X,Y).

% keep all and only UCSD is_a rels
isa(X,Y) :- ucsd_isa(X,Y).
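For readers less familiar with logic programming, the bridge rules can be paraphrased with Python sets. This is an illustrative sketch only: the three explicit mappings are copied from the slide, but the UCSD `has_a` edges and the Senselab term set are invented here, since the slide does not list those facts.

```python
# Hypothetical facts standing in for ucsd_has_a/2 and ucsd_isa/2.
UCSD_HAS_A = {
    ("purkinje_cell", "dendrite"),
    ("dendrite", "spiny_branchlet"),
    ("neuron", "soma"),
}
UCSD_ISA = set()

# Hypothetical Senselab domain; "dendrite" is assumed to occur in both maps.
SENSELAB_TERMS = {"A", "Dad", "Dam", "dendrite"}

# A subset of the explicit sl2ucsd facts from the slide.
EXPLICIT_BRIDGE = {"A": "axon", "Dad": "spiny_branchlet", "Dam": "main_branches"}


def anatom_dom(has_a, isa):
    """All terms appearing at either end of a UCSD has_a or isa edge."""
    return {term for edge in has_a | isa for term in edge}


def sl2ucsd(senselab_terms, ucsd_terms, explicit):
    """Explicit mappings plus the identity mapping for shared terms."""
    pairs = set(explicit.items())
    pairs |= {(t, t) for t in senselab_terms & ucsd_terms}
    return pairs


def bridged_has_a(has_a, bridge_pairs):
    """Keep a UCSD has_a edge if at least one endpoint is a bridge target."""
    known = {ucsd_term for _, ucsd_term in bridge_pairs}
    return {(x, y) for x, y in has_a if x in known or y in known}


bridge = sl2ucsd(SENSELAB_TERMS, anatom_dom(UCSD_HAS_A, UCSD_ISA), EXPLICIT_BRIDGE)
kept_edges = bridged_has_a(UCSD_HAS_A, bridge)
```

With these invented facts, the edge `("neuron", "soma")` is dropped because neither endpoint is reachable from a Senselab term, which is the filtering effect of the two `has_a` rules.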

Refinement of a Domain Map (Ontology): Putting Data in Context via Registration of new Classes & Relationships
Figure labels: Neuron; Spiny Neuron; Medium Spiny Neuron; MyNeuron; Compartment; Axon; Soma; Dendrite; MyDendrite; Neurotransmitter; GABA; Dopamine R; Substance P; Neostriatum; Substantia Nigra Pc; Substantia Nigra Pr; Globus Pallidus Int.; Globus Pallidus Ext.
Edge legend: OR; AND; ALL:has; = exp

Mediation Services: Integrated View Definition

DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
FROM
    I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
        anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}],   % from PROLAB
    NAE:neuro_anatomic_entity[name->Anatom; located_in->>{Brain_region}],     % from ANATOM
    AS..segments..features[name->Feature_name; value->Value].

–provided by the domain expert and mediation engineer
–declarative language (here: Frame-logic)

Example Query Evaluation (I)
Example: protein_distribution
–given: organism, protein, brain_region
–Use DOMAIN-KNOWLEDGE-BASE: recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
–Source PROLAB: join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
–Mediator: aggregate over all parents up to brain_region and report the distribution
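The three evaluation steps above can be sketched as a small in-memory computation. The `has_a` hierarchy, the rows standing in for PROLAB, and the function names are assumptions made for illustration; the actual mediator evaluates the view with its rule engines rather than with code like this.

```python
# Hypothetical has_a hierarchy: parent -> list of direct parts.
HAS_A = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
}

# Invented rows standing in for the PROLAB source.
PROLAB_ROWS = [
    {"organism": "rat", "protein": "RyR", "anatom": "molecular_layer", "amount": 2.0},
    {"organism": "rat", "protein": "RyR", "anatom": "purkinje_cell_layer", "amount": 3.0},
    {"organism": "mouse", "protein": "RyR", "anatom": "molecular_layer", "amount": 9.0},
    {"organism": "rat", "protein": "NCS-1", "anatom": "molecular_layer", "amount": 1.0},
    {"organism": "rat", "protein": "RyR", "anatom": "hippocampus", "amount": 5.0},
]


def has_a_star(region, has_a):
    """Step 1: the region plus everything reachable through has_a edges."""
    found, stack = [], [region]
    while stack:
        node = stack.pop()
        if node not in found:
            found.append(node)
            stack.extend(has_a.get(node, []))
    return found


def protein_distribution(organism, protein, brain_region, has_a, rows):
    """Steps 2 and 3: join with the source, then aggregate up to the region."""
    entities = has_a_star(brain_region, has_a)
    # Step 2: amounts measured directly at each anatomical entity.
    direct = {}
    for row in rows:
        if (row["organism"] == organism and row["protein"] == protein
                and row["anatom"] in entities):
            direct[row["anatom"]] = direct.get(row["anatom"], 0.0) + row["amount"]
    # Step 3: each entity's total is its own amount plus that of its parts.
    return {entity: sum(direct.get(part, 0.0) for part in has_a_star(entity, has_a))
            for entity in entities}


distribution = protein_distribution("rat", "RyR", "cerebellum", HAS_A, PROLAB_ROWS)
```

The hippocampus row is excluded in step 2 because it does not lie under the requested brain region, which shows how the domain knowledge base scopes the source query before any aggregation happens.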

Example Query Evaluation
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
X1 := select output from parallel fiber
X2 := “hang off” X1 from Domain
X3 :=
X4 := select PROT-data(X3, Ryanodine
X5 := compute aggregate(X4);

Mediation Services: Client Registration
Figure labels: Client; Update Client; Query Client; Fat Result Viewer; Thin Result Viewer; Check Data; Merge Before Insert; Derive Before Insert; Client-side Buffer; Client-side Processing; Server-side Buffer; Send Full Data; Navigate/Ad-hoc Query Capability; Query on Schema; Context Sensitive; Server-Push/Client-Pull

Example Client: Query Formulation and Result Display
–combination of ad hoc and navigational queries
–client-side visualization (left)
–results are shown in semantic context (right)

Mediation Services: Semantic Annotation Tools
line drawing ==annotation==> (spatial) database for mediation

Mediator Architecture Blueprint
Mediation Services (Mediator Layer):
–Source registration: domain knowledge, model & schema, query & computation capabilities
–Query processing: view unfolding, semantic optimization, capability-based rewriting
–Source model lifting: domain knowledge reconciliation, model transformation
–Query formulation: user query, integrated view definition
–Components: Optimizer, Model Reasoner, Deductive Engine
Wrapper Layer:
–Query interface (down API): SDLIP, SOAP, ...; (subsets of) SQL, X(ML)-Query, CPL, ...; DOM; SRB-based access
–Result delivery interface (up API): SDLIP, SOAP, ...; pull (tuple/set-at-a-time, DOM) vs. push (stream); synchronous/asynchronous; direct data/data reference
Sources: XML Sources, RDB Sources, File Sources, HTML Sources, Digital Libraries (Collections), Spatial Sources
Sites and protocols shown: Boston Univ., NCMIR UCSD, Yale Univ., Montana Univ.; SDLIP, ARC, IMS

Coming up: Knowledge-Based/Semantic Mediation of Brain Data
Figure labels: CCB, Montana SU; Surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk
Knowledge-Based Mediation: ANATOM, PROTLOC; Result (VML/SVG); Result (XML/XSLT)

Some Open Issues
Data/Knowledge Modeling
–Extensibility: how to handle a source with new data types and operations?
  Temporal Data: instrument readings, video microscopy
  Spatial Data: integrating with spatial database systems
  Image database systems
–Conflict Management
  Grades of certainty
  Alternate hypotheses
Integrating Services
–Registration and warping of my image slice to a reference
Integrating into Larger Applications
–M-Cell simulation
–Telemicroscopy
–Visualization

References
–Model-Based Mediation with Domain Maps, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Data Engineering (ICDE), Heidelberg, 2001
–Knowledge-Based Mediation of Heterogeneous Neuroscience Information Sources, Amarnath Gupta, Bertram Ludäscher, Maryann Martone, Intl. Conference on Scientific and Statistical Databases (SSDBM), Berlin
–Model-Based Information Integration in a Neuroscience Mediator System, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Very Large Data Bases (VLDB), Cairo