Data R&D Issues for GTL
Data and Knowledge Systems
San Diego Supercomputer Center
University of California, San Diego
Bertram Ludäscher
Data R&D Issues for GTL

- GTL data management infrastructure
  - Service-oriented Data Grids for
    - Seamless data sharing (volume, distribution, access restrictions, …)
    - Capabilities for data integration (mediators/warehouses), digital library functions, knowledge-based (“semantic”) extensions (e.g. ontologies), and archival capabilities
- Data analysis and knowledge-enabling infrastructure
  - Analytical Pipelines (“Scientific Workflows”)
    - Rapid design and prototyping, handling of complex data & task semantics, large volume, scientific workflow as a first-class product, validation, execution, monitoring, sharing, archiving
    - How to go from a scientist’s abstract (conceptual) workflow to a data grid execution plan?
  - New Model Management and Knowledge Representation Technologies:
    - Closing the gap between data management (DBMS’s, data grids) and knowledge-based systems (desktop-oriented, rule-based systems) and analysis and modeling systems
    - Mapping between numerous formalisms at the syntactic, structural, and semantic level (terminological, process-semantics, …)
    - “Gluing” together models and formalisms across different levels: from genes to proteins to molecular machines to microbial communities … (compare: pnp transistors, boolean circuits, assembly language, high-level PLs, declarative QLs, …): abstraction & elaboration mechanisms
    - Data exploration and hypothesis generation tools (KNOW-ME, SKIDL, SEEK AMS, …)
- Computational facilities
  - Use of high-end networked facilities a la TeraGrid
- Opportunities (and challenges!) in leveraging related efforts:
  - NIH BIRN, …, NSF Cyberinfrastructure (ITRs GEON, GriPhyN, SCEC, SEEK, …), UK e-Science, …
  - Standardization (OGSA, KR/Semantic Web technologies, e.g., ontology languages (OWL), inference mechanisms, …), scientific workflow standards, … interoperable, open source tools
  - One size/standards fits all? Probably not: data-intensive vs computation-intensive vs “semantics-intensive” (capturing implicit domain knowledge, hidden assumptions, …)
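The slide leaves open how a scientist's abstract workflow becomes a data grid execution plan. As a minimal illustration only, and not a method proposed in the slides, one simple reading is: order the abstract steps by their dependencies, then bind each step to a site. The step names and site names below are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical abstract workflow: each step maps to the steps it depends on.
ABSTRACT_WORKFLOW = {
    "fetch_sequences": set(),
    "align": {"fetch_sequences"},
    "build_model": {"align"},
    "archive_results": {"build_model"},
}

# Hypothetical binding of abstract steps to execution sites (assumed names).
SITE_BINDINGS = {
    "fetch_sequences": "data-grid-node",
    "align": "compute-cluster",
    "build_model": "compute-cluster",
    "archive_results": "archive-node",
}


def plan(workflow, bindings):
    """Return (step, site) pairs in an order that respects dependencies."""
    order = TopologicalSorter(workflow).static_order()
    return [(step, bindings[step]) for step in order]


execution_plan = plan(ABSTRACT_WORKFLOW, SITE_BINDINGS)
for step, site in execution_plan:
    print(f"{step} -> {site}")
```

A real planner would also have to address the other items on the slide (data volume, access restrictions, validation, monitoring), which this toy ordering ignores.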
Bonus Material (beyond 1 slide limit ;-) starts here …
Up & Down: Abstraction & Elaboration Mechanisms
Layers: Knowledge Mgmt, Information Mgmt, Data Management
How to punch through the technology barriers?
Data Grids vs Digital Libraries vs DBMS’s vs Knowledge-Based Analysis & Modeling Systems
Biomedical Informatics Research Network
Getting Formal: Source Contextualization & Ontology Refinement in Logic
Scientific Data Integration... Questions to Queries... (GeoSciences Network)
Questions: What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
Information Integration across sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB)
“Complex Multiple-Worlds” Mediation
Figure labels: raw data; data modeling; database mediation; Knowledge Representation: ontologies, concept spaces; domain knowledge
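To make "questions to queries" concrete, the following toy sketch shows the basic mediator pattern: each source keeps its own schema, a wrapper maps it to a shared vocabulary, and the question is answered over the merged view. All record values, field names, and function names are hypothetical placeholders, not GEON data or GEON software.

```python
# Hypothetical source 1: geologic map records (local schema: unit, rock_class).
GEOLOGIC_MAP = [
    {"unit": "P1", "rock_class": "A-type pluton"},
    {"unit": "P2", "rock_class": "S-type pluton"},
    {"unit": "P3", "rock_class": "A-type pluton"},
]

# Hypothetical source 2: geochronology records (local schema: sample_of, age_ma).
# The ages are made-up placeholder numbers, not measurements.
GEOCHRONOLOGY = [
    {"sample_of": "P1", "age_ma": 700},
    {"sample_of": "P2", "age_ma": 450},
]


def wrap_map(record):
    """Translate a geologic-map record into the shared (mediated) schema."""
    return {"pluton_id": record["unit"], "type": record["rock_class"]}


def wrap_geochron(record):
    """Translate a geochronology record into the shared (mediated) schema."""
    return {"pluton_id": record["sample_of"], "age_ma": record["age_ma"]}


def mediated_view():
    """Merge wrapped records from both sources, keyed on the shared pluton_id."""
    merged = {}
    for rec in map(wrap_map, GEOLOGIC_MAP):
        merged.setdefault(rec["pluton_id"], {}).update(rec)
    for rec in map(wrap_geochron, GEOCHRONOLOGY):
        merged.setdefault(rec["pluton_id"], {}).update(rec)
    return list(merged.values())


def a_type_pluton_ages():
    """Answer 'ages of A-type plutons'; None marks a pluton with no age record."""
    return {
        r["pluton_id"]: r.get("age_ma")
        for r in mediated_view()
        if r.get("type") == "A-type pluton"
    }


print(a_type_pluton_ages())
```

The sketch joins sources on a shared identifier; the ontology-based mediation the slide points to is needed precisely when no such shared key or vocabulary exists.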
Geologic Map Integration: Geo & IT/CS meet
Figure labels: domain knowledge; knowledge representation; AGE ONTOLOGY; Nevada
GEON Metamorphism Equation: Geoscientists + Computer Scientists +/- Energy → Igneous Geoinformaticists (+/- a few hundred million years)
SEEK Project Overview
- Large collaborative NSF/ITR project: UNM, UCSB, UCSD (SDSC), UKansas, ..
- “Analysis & Modeling System” to design, execute, reproduce/refine scientific workflows in the ecology and biodiversity domains.