
Data R&D Issues for GTL Bertram Ludäscher Data and Knowledge Systems




1 Data R&D Issues for GTL
Bertram Ludäscher, Data and Knowledge Systems, San Diego Supercomputer Center, University of California, San Diego

2 Data R&D Issues for GTL
- GTL data management infrastructure
  - Service-oriented data grids for seamless data sharing (volume, distribution, access restrictions, …)
  - Capabilities for data integration (mediators/warehouses), digital library functions, knowledge-based ("semantic") extensions (e.g., ontologies), and archival capabilities
- Data analysis and knowledge-enabling infrastructure
  - Analytical pipelines ("scientific workflows"): rapid design and prototyping; handling of complex data and task semantics; large volumes; the scientific workflow as a first-class product; validation, execution, monitoring, sharing, archiving
  - How to go from a scientist's abstract (conceptual) workflow to a data grid execution plan?
- New model management and knowledge representation technologies
  - Closing the gap between data management (DBMSs, data grids), knowledge-based systems (desktop-oriented, rule-based systems), and analysis and modeling systems
  - Mapping between numerous formalisms at the syntactic, structural, and semantic levels (terminological, process semantics, …)
  - "Gluing" together models and formalisms across different levels: from genes to proteins to molecular machines to microbial communities … (compare: pnp transistors, Boolean circuits, assembly language, high-level programming languages, declarative query languages, …)
    → abstraction & elaboration mechanisms
    → data exploration and hypothesis generation tools (KNOW-ME, SKIDL, SEEK AMS, …)
- Computational facilities
  - Use of high-end networked facilities à la TeraGrid
- Opportunities (and challenges!) in leveraging related efforts: NIH BIRN, …, NSF Cyberinfrastructure (ITRs GEON, GriPhyN, SCEC, SEEK, …), UK e-Science, …
- Standardization: OGSA; KR/Semantic Web technologies, e.g., ontology languages (OWL) and inference mechanisms; scientific workflow standards; interoperable, open-source tools
  - One size (or one standard) fits all? Probably not: data-intensive vs. computation-intensive vs. "semantics-intensive" (capturing implicit domain knowledge, hidden assumptions, …)
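The question of going from a scientist's abstract workflow to a data grid execution plan can be made concrete with a small sketch. It assumes the abstract workflow is a dependency graph of conceptual steps and that a catalog binds each step to a concrete resource; the step names, resource names, and the `make_execution_plan` helper are all hypothetical, and real planners would also weigh data location, access restrictions, and cost.

```python
from graphlib import TopologicalSorter


def make_execution_plan(workflow, catalog):
    """Bind abstract workflow steps to concrete resources, in dependency order.

    workflow: maps each abstract step to the set of steps it depends on.
    catalog:  maps each abstract step to a concrete resource (hypothetical names).
    Returns a list of (step, resource) pairs; raises KeyError for unbound steps.
    """
    # Every step that appears, either as a key or as a dependency.
    steps = set(workflow).union(*workflow.values())
    unbound = sorted(step for step in steps if step not in catalog)
    if unbound:
        raise KeyError(f"no concrete resource for: {unbound}")
    # static_order() yields each step only after all of its dependencies
    # (and raises graphlib.CycleError if the workflow contains a cycle).
    order = TopologicalSorter(workflow).static_order()
    return [(step, catalog[step]) for step in order]


# Hypothetical abstract (conceptual) workflow, as a scientist might draw it.
abstract_workflow = {
    "fetch_genomes": set(),
    "predict_genes": {"fetch_genomes"},
    "annotate_proteins": {"predict_genes"},
    "compare_communities": {"annotate_proteins", "fetch_genomes"},
}
# Hypothetical grid resources chosen for each step.
catalog = {
    "fetch_genomes": "storage-site-A",
    "predict_genes": "compute-cluster-B",
    "annotate_proteins": "compute-cluster-B",
    "compare_communities": "analysis-server-C",
}

plan = make_execution_plan(abstract_workflow, catalog)
for step, resource in plan:
    print(f"{step} -> {resource}")
```

Separating the conceptual graph from the resource catalog keeps the scientist's workflow unchanged when the same steps are re-bound to different grid resources.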

3 Bonus material (beyond the 1-slide limit ;-) starts here …

4 Up & Down: Abstraction & Elaboration Mechanisms
How to punch through the technology barriers? Data grids vs. digital libraries vs. DBMSs vs. knowledge-based analysis & modeling systems.
[Figure: layered stack: Knowledge Management / Information Management / Data Management]

5 Biomedical Informatics Research Network

6 Getting Formal: Source Contextualization & Ontology Refinement in Logic

7 Scientific Data Integration ... Questions to Queries ...
GeoSciences Network example questions: What are the distribution and U/Pb zircon ages of A-type plutons in Virginia? How about their 3-D geometry? How does it relate to host rock structures?
[Figure: integration layers, from domain knowledge and knowledge representation (ontologies, concept spaces) through "complex multiple-worlds" mediation and information integration (database mediation, data modeling) down to raw data]
Biomedical sources: protloc; neurotrans (stimulation, then electrical responses, probes): RDB, SENSELAB, Yale; CaBP (chemical structure, PDB links, function of CaBP, found-in …): Web, Vanderbilt U; Expasy (protein info as sequence data): Web, Europe
Geoscience sources: Geologic Map (Virginia), GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB), GeoChemical
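Turning a question into a query through database mediation can be sketched as a toy global-as-view mediator. This is not the mediator behind the slide: the two sources, their field names, the records (made-up values), and the small rock-type ontology are all hypothetical. The point is that the question is posed in shared terms ("A-type pluton", "U/Pb zircon") while each source keeps its own schema and vocabulary.

```python
# Made-up toy records standing in for two autonomous sources with different schemas.
GEOLOGIC_MAP = [
    {"unit_id": "u1", "rock_class": "A-type granite", "state": "VA", "location": (1.0, 2.0)},
    {"unit_id": "u2", "rock_class": "A-type granitoid", "state": "VA", "location": (3.0, 4.0)},
    {"unit_id": "u3", "rock_class": "I-type granite", "state": "VA", "location": (5.0, 6.0)},
    {"unit_id": "u4", "rock_class": "A-type granite", "state": "NC", "location": (7.0, 8.0)},
]
GEOCHRON = [
    {"sample_of": "u1", "method": "U/Pb zircon", "age_ma": 111.0},
    {"sample_of": "u2", "method": "U/Pb zircon", "age_ma": 222.0},
    {"sample_of": "u2", "method": "Ar/Ar", "age_ma": 333.0},
    {"sample_of": "u3", "method": "U/Pb zircon", "age_ma": 444.0},
]
# Hypothetical ontology fragment mapping source vocabulary to shared concepts.
ROCK_ONTOLOGY = {
    "A-type granite": "A-type pluton",
    "A-type granitoid": "A-type pluton",
    "I-type granite": "I-type pluton",
}


def integrated_view():
    """Global-as-view mediator: one virtual relation defined over both sources."""
    ages = {}
    for m in GEOCHRON:
        ages.setdefault(m["sample_of"], []).append((m["method"], m["age_ma"]))
    for unit in GEOLOGIC_MAP:
        yield {
            "unit": unit["unit_id"],
            "concept": ROCK_ONTOLOGY.get(unit["rock_class"]),
            "state": unit["state"],
            "location": unit["location"],
            "ages": ages.get(unit["unit_id"], []),
        }


def query(concept, state, method):
    """Answer a question posed in shared terms against the mediated view."""
    return [
        (row["unit"], row["location"], [age for m, age in row["ages"] if m == method])
        for row in integrated_view()
        if row["concept"] == concept and row["state"] == state
    ]


print(query("A-type pluton", "VA", "U/Pb zircon"))
# [('u1', (1.0, 2.0), [111.0]), ('u2', (3.0, 4.0), [222.0])]
```

Unit u3 is excluded by the ontology mapping (I-type), and u4 by location, even though neither source knows the term "A-type pluton".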

8 Geologic Map Integration: Geo & IT/CS meet
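[Figure: domain knowledge captured as a knowledge representation (AGE ONTOLOGY) and applied to a geologic map of Nevada, annotated "+/- a few hundred million years"; rock-type labels (Igneous, Metamorphism); and a GEON "equation": Geoscientists + Computer Scientists +/- Energy → Geoinformaticists]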
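One way an age ontology can help integrate geologic maps is by letting units labelled at different granularities be queried with one concept. The sketch below assumes a simple part-of hierarchy over a small fragment of the geologic time scale; the map units and their labels are hypothetical, and the helper names are illustrative only.

```python
# Fragment of the geologic time scale as a part-of hierarchy (child -> parent).
PART_OF = {
    "Cambrian": "Paleozoic", "Ordovician": "Paleozoic", "Silurian": "Paleozoic",
    "Devonian": "Paleozoic", "Carboniferous": "Paleozoic", "Permian": "Paleozoic",
    "Triassic": "Mesozoic", "Jurassic": "Mesozoic", "Cretaceous": "Mesozoic",
    "Paleozoic": "Phanerozoic", "Mesozoic": "Phanerozoic", "Cenozoic": "Phanerozoic",
}


def ancestors(term):
    """Yield every broader interval that contains `term`."""
    while term in PART_OF:
        term = PART_OF[term]
        yield term


def within(term, concept):
    """True if interval `term` is `concept` or lies inside it."""
    return term == concept or concept in ancestors(term)


# Hypothetical map units from different maps, labelled at different granularity.
MAP_UNITS = {"unit_a": "Devonian", "unit_b": "Paleozoic", "unit_c": "Jurassic", "unit_d": "Cenozoic"}


def units_within(concept):
    return sorted(u for u, age in MAP_UNITS.items() if within(age, concept))


print(units_within("Paleozoic"))  # ['unit_a', 'unit_b']
```

A coarse label is not assumed to fall inside a finer interval: querying "Devonian" returns only unit_a, because unit_b is known only to be somewhere in the Paleozoic.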

9 Analytical Pipeline (AP)
SEEK Project Overview: a large collaborative NSF/ITR project (UNM, UCSB, UCSD (SDSC), UKansas, …) building an "Analysis & Modeling System" to design, execute, and reproduce/refine scientific workflows in the ecology and biodiversity domains.
- EcoGrid provides unified access to distributed data stores, parameter ontologies, and stored analyses, plus runtime capabilities via the Execution Environment.
- The Semantic Mediation System (SMS) and the Analysis and Modeling System (AM) use EcoGrid web services, enabling analytically driven data discovery and integration.
- SEEK is the combination of EcoGrid data resources and information services, coupled with advanced semantic and modeling capabilities.
[Figure: SEEK architecture. An analytical pipeline (AP) of steps ASr, ASx, ASy, ASz, TS1, TS2; a Semantic Mediation Engine (data binding, query processing, logic rules, ECO2 and ECO2-CL ontologies); parameters with semantics linked to parameter ontologies; WSDL-described data stores (SRB, KNB, MC, Species, Wrp, Dar, …) whose raw data sets are wrapped for integration with EML etc.; TaxOn; an Execution Environment (SAS, MATLAB, FORTRAN, etc.); a library of analysis steps, pipelines & results. Example "AP0": invasive species over time.]
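The idea of "parameters with semantics" linked to parameter ontologies can be illustrated by checking that each step's output concept is compatible with the next step's input concept before a pipeline runs. This is a sketch under assumptions, not SEEK's actual mechanism: the is-a ontology, the concept names, the steps of this "AP0"-like pipeline, and the `check_pipeline` helper are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical parameter ontology: concept -> parent concept (is-a).
IS_A = {
    "InvasiveSpeciesOccurrence": "SpeciesOccurrence",
    "SpeciesOccurrence": "Observation",
    "YearlyCount": "TimeSeries",
}


def subsumed_by(concept, other):
    """True if `concept` equals `other` or is a (transitive) specialization of it."""
    while concept is not None:
        if concept == other:
            return True
        concept = IS_A.get(concept)
    return False


@dataclass(frozen=True)
class Step:
    name: str
    input_type: str
    output_type: str


def check_pipeline(steps):
    """Return mismatch messages; an empty list means every link is compatible."""
    errors = []
    for producer, consumer in zip(steps, steps[1:]):
        if not subsumed_by(producer.output_type, consumer.input_type):
            errors.append(
                f"{producer.name} -> {consumer.name}: "
                f"{producer.output_type} is not a {consumer.input_type}"
            )
    return errors


# Hypothetical pipeline in the spirit of "invasive species over time".
ap0 = [
    Step("select_invasive", "Observation", "InvasiveSpeciesOccurrence"),
    Step("count_per_year", "SpeciesOccurrence", "YearlyCount"),
    Step("plot_trend", "TimeSeries", "Figure"),
]
print(check_pipeline(ap0))  # []
```

Because the check uses subsumption rather than string equality, a step that accepts any SpeciesOccurrence can consume the more specific InvasiveSpeciesOccurrence without an explicit conversion.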

