Data R&D Issues for GTL Bertram Ludäscher Data and Knowledge Systems


Data R&D Issues for GTL
Bertram Ludäscher (ludaesch@sdsc.edu)
Data and Knowledge Systems, San Diego Supercomputer Center
University of California, San Diego

Data R&D Issues for GTL

GTL data management infrastructure
- Service-oriented data grids for seamless data sharing (volume, distribution, access restrictions, …)
- Capabilities for data integration (mediators/warehouses), digital library functions, knowledge-based (“semantic”) extensions (e.g., ontologies), and archival capabilities

Data analysis and knowledge-enabling infrastructure
- Analytical pipelines (“scientific workflows”): rapid design and prototyping, handling of complex data and task semantics, large volumes, the scientific workflow as a first-class product, validation, execution, monitoring, sharing, archiving
- How do we go from a scientist’s abstract (conceptual) workflow to a data grid execution plan?

New model management and knowledge representation technologies
- Closing the gap between data management (DBMSs, data grids), knowledge-based systems (desktop-oriented, rule-based systems), and analysis and modeling systems
- Mapping between numerous formalisms at the syntactic, structural, and semantic levels (terminological, process semantics, …)
- “Gluing” together models and formalisms across different levels: from genes to proteins to molecular machines to microbial communities … (compare: pnp transistors, Boolean circuits, assembly language, high-level PLs, declarative QLs, …) → abstraction & elaboration mechanisms
- Data exploration and hypothesis generation tools (KNOW-ME, SKIDL, SEEK AMS, …)

Computational facilities
- Use of high-end networked facilities à la TeraGrid

Opportunities (and challenges!) in leveraging related efforts
- NIH BIRN, …, NSF Cyberinfrastructure (ITRs GEON, GriPhyN, SCEC, SEEK, …), UK e-Science, …
- Standardization (OGSA, KR/Semantic Web technologies, e.g., ontology languages (OWL), inference mechanisms, …), scientific workflow standards, …; interoperable, open-source tools
- Does one size (or one standard) fit all? Probably not: data-intensive vs. computation-intensive vs. “semantics-intensive” (capturing implicit domain knowledge, hidden assumptions, …)
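The slide asks how to get from a scientist’s abstract (conceptual) workflow to a data grid execution plan. One small piece of that step is ordering: each step must run after the steps whose outputs it consumes. The Python sketch below is a minimal illustration under stated assumptions: the workflow is a DAG given as a dict from hypothetical step names to their upstream steps, and resource selection, data placement, and grid scheduling are left out entirely. It uses Kahn’s algorithm to produce a valid order and rejects cyclic workflows.

```python
from collections import deque


def execution_plan(steps: dict[str, list[str]]) -> list[str]:
    """Order workflow steps so every step runs after the steps it depends on.

    `steps` maps each (hypothetical) step name to the names of the steps
    whose outputs it consumes. Kahn's algorithm; raises ValueError on an
    unknown dependency or a cycle.
    """
    # Count unmet dependencies per step and record who consumes each output.
    pending = {name: len(deps) for name, deps in steps.items()}
    consumers: dict[str, list[str]] = {name: [] for name in steps}
    for name, deps in steps.items():
        for dep in deps:
            if dep not in steps:
                raise ValueError(f"unknown dependency {dep!r} of {name!r}")
            consumers[dep].append(name)

    # Start with steps that need no upstream data; sorted for a deterministic plan.
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    plan: list[str] = []
    while ready:
        step = ready.popleft()
        plan.append(step)
        for nxt in consumers[step]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    # Any step never released must sit on a cycle.
    if len(plan) != len(steps):
        raise ValueError("workflow has a cyclic dependency")
    return plan


if __name__ == "__main__":
    # Hypothetical abstract workflow: fetch two datasets, integrate, then analyze.
    workflow = {
        "fetch_a": [],
        "fetch_b": [],
        "integrate": ["fetch_a", "fetch_b"],
        "analyze": ["integrate"],
    }
    print(execution_plan(workflow))  # ['fetch_a', 'fetch_b', 'integrate', 'analyze']
```

Steps that become ready together (here `fetch_a` and `fetch_b`) have no mutual dependency, so a real planner could dispatch them in parallel to different grid resources; this sketch only fixes a legal sequence.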

Bonus material (beyond the 1-slide limit ;-) starts here …

Up & Down: Abstraction & Elaboration Mechanisms
How to punch through the technology barriers? Data grids vs. digital libraries vs. DBMSs vs. knowledge-based analysis & modeling systems
[Figure: layers labeled Knowledge Mgmt, Information Mgmt, Data Management]

Biomedical Informatics Research Network

Getting Formal: Source Contextualization & Ontology Refinement in Logic
Biomedical Informatics Research Network, http://nbirn.net
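To make “source contextualization in logic” concrete, here is a minimal sketch, not the BIRN implementation: source items are registered under ontology concepts, and two Datalog-style rules derive everything that follows from the concept hierarchy. The rules, the naive bottom-up evaluation strategy, and the concept and item names are all assumptions chosen for illustration.

```python
Pair = tuple[str, str]


def saturate(subclass: set[Pair], member: set[Pair]) -> tuple[set[Pair], set[Pair]]:
    """Naive bottom-up evaluation of two rules until no new facts appear:

        subclass(X, Z) :- subclass(X, Y), subclass(Y, Z).
        member(I, D)   :- member(I, C), subclass(C, D).

    Inputs are not modified; saturated copies are returned.
    """
    subclass, member = set(subclass), set(member)
    while True:
        # Re-join all known facts each round (naive, not semi-naive, evaluation).
        new_sub = {(x, z) for (x, y1) in subclass for (y2, z) in subclass if y1 == y2}
        new_mem = {(i, d) for (i, c1) in member for (c2, d) in subclass if c1 == c2}
        if new_sub <= subclass and new_mem <= member:
            return subclass, member  # fixpoint reached
        subclass |= new_sub
        member |= new_mem


if __name__ == "__main__":
    # Illustrative ontology fragment and one contextualized source item (hypothetical IDs).
    isa = {("purkinje_cell", "neuron"), ("neuron", "cell")}
    registered = {("src1:item42", "purkinje_cell")}
    _, members = saturate(isa, registered)
    print(sorted(members))
```

Because both rules only add facts over a finite set of names, the loop must terminate; semi-naive evaluation, which joins only the facts new in the previous round, is the usual optimization for larger ontologies.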

Scientific Data Integration ... Questions to Queries ...
GeoSciences Network: What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures?
[Figure: integration layers, from domain knowledge (Knowledge Representation: ontologies, concept spaces) through “Complex Multiple-Worlds” Mediation? and Information Integration (database mediation, data modeling) down to raw data]
Example sources:
- protloc = neurotrans (stimulate then electrical responses, probes) = RDB, SENSELAB, Yale
- CaBP (chemical structure, PDB links, function of CaBP, found-in...) = Web, Vanderbilt U
- Expasy (Protein-info as Sequence data) = Web, Europe
- Geologic Map (Virginia), GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB), GeoChemical
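The database-mediation layer on this slide turns one question into queries against sources with different schemas. A minimal sketch in the spirit of a global-as-view mediator, assuming in-memory wrappers, hypothetical global attribute names (`unit`, `rock_type`), and placeholder records: each source is registered with a mapping from global attributes to its local field names, and a global query is answered by translating, filtering, and unioning the source records. It does not attempt the ontology-based layer that the slide places above it.

```python
from typing import Callable, Iterable

Record = dict[str, object]
Source = tuple[str, Callable[[], Iterable[Record]], dict[str, str]]


class Mediator:
    """Toy global-as-view mediator: each source carries a mapping from
    global attribute names to that source's local field names."""

    def __init__(self) -> None:
        self._sources: list[Source] = []

    def register(self, name: str, fetch: Callable[[], Iterable[Record]],
                 mapping: dict[str, str]) -> None:
        self._sources.append((name, fetch, mapping))

    def query(self, attrs: list[str],
              where: Callable[[Record], bool] = lambda row: True) -> list[Record]:
        """Translate each source's records into the requested global attributes,
        keep rows satisfying `where`, and return their union tagged by source."""
        results: list[Record] = []
        for name, fetch, mapping in self._sources:
            # A source that cannot supply every requested attribute is skipped.
            if not all(a in mapping for a in attrs):
                continue
            for local in fetch():
                row = {a: local[mapping[a]] for a in attrs}
                if where(row):
                    results.append({"source": name, **row})
        return results


if __name__ == "__main__":
    # Placeholder records standing in for two wrapped sources with different schemas.
    geo_map = [{"unit_id": "u1", "rock": "granite"}, {"unit_id": "u2", "rock": "shale"}]
    geochem = [{"sample_unit": "u3", "rock_class": "granite"}]
    m = Mediator()
    m.register("geo_map", lambda: geo_map, {"unit": "unit_id", "rock_type": "rock"})
    m.register("geochem", lambda: geochem, {"unit": "sample_unit", "rock_type": "rock_class"})
    print(m.query(["unit", "rock_type"], where=lambda r: r["rock_type"] == "granite"))
```

Keeping the mappings outside the sources means a new source joins the integrated view by supplying one dictionary, without changing existing queries.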

Geologic Map Integration: Geo & IT/CS meet
[Figure labels: domain knowledge; Knowledge representation; AGE ONTOLOGY; Nevada; “+/- a few hundred million years”; Igneous]
GEON Metamorphism Equation: Geoscientists + Computer Scientists +/- Energy → Geoinformaticists
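An age ontology helps when integrated maps label units at different temporal resolutions. A minimal sketch, assuming the ontology is a simple tree in which each time unit points to the coarser unit containing it, and using only a small, partial fragment of the geologic time scale (not the GEON AGE ONTOLOGY itself): two age labels are reconciled by finding the most specific unit that contains both.

```python
from typing import Optional

# Partial, illustrative fragment of the geologic time scale:
# period -> era -> eon (each key points to the unit that contains it).
PARENT = {
    "Cambrian": "Paleozoic",
    "Ordovician": "Paleozoic",
    "Triassic": "Mesozoic",
    "Jurassic": "Mesozoic",
    "Cretaceous": "Mesozoic",
    "Paleozoic": "Phanerozoic",
    "Mesozoic": "Phanerozoic",
}


def ancestors(age: str) -> list[str]:
    """The unit itself followed by every coarser unit containing it."""
    chain = [age]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_age(a: str, b: str) -> Optional[str]:
    """Most specific unit containing both ages (lowest common ancestor),
    or None if the fragment links them to no shared unit."""
    up_b = set(ancestors(b))
    for unit in ancestors(a):
        if unit in up_b:
            return unit
    return None


if __name__ == "__main__":
    # One map says "Jurassic", another only "Mesozoic": they agree at era level.
    print(common_age("Jurassic", "Mesozoic"))
```

Walking up from the first label and stopping at the first unit shared with the second label returns the finest age at which the two maps are consistent, which is the level at which their units can be compared.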

SEEK Project Overview
- Large collaborative NSF/ITR project: UNM, UCSB, UCSD (SDSC), UKansas, …
- “Analysis & Modeling System” to design, execute, and reproduce/refine scientific workflows in the ecology and biodiversity domains.
- EcoGrid provides unified access to Distributed Data Stores, Parameter Ontologies, and Stored Analyses, and runtime capabilities via the Execution Environment.
- The Semantic Mediation System and the Analysis and Modeling System use EcoGrid web services, enabling analytically driven data discovery and integration.
- SEEK is the combination of EcoGrid data resources and information services, coupled with advanced semantic and modeling capabilities.
[Figure: SEEK architecture. AM: Analysis and Modeling System (Analytical Pipeline (AP) built from analysis steps ASx, ASy, ASz, ASr and TS1, TS2; parameters w/ semantics). SMS: Semantic Mediation System (Semantic Mediation Engine, data binding, query processing, logic rules, ECO2, ECO2-CL). EcoGrid (WSDL interfaces; SRB, KNB, MC, Species, Wrp, Dar, …; raw data sets wrapped for integration w/ EML, etc.; TaxOn; parameter ontologies). Execution Environment (SAS, MATLAB, FORTRAN, etc.). Library of analysis steps, pipelines & results. Example of “AP0”: invasive species over time.]
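The architecture attaches semantics to analysis-step parameters via parameter ontologies. One way such semantics can be used is to check, before execution, that each step’s output is acceptable as the next step’s input. A minimal sketch, assuming a hypothetical is-a ontology, hypothetical type and step names loosely modeled on the “invasive species over time” example, and a linear pipeline; this is not SEEK’s actual type system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical parameter ontology: concept -> its direct parent (is-a).
IS_A = {
    "SpeciesOccurrence": "Observation",
    "Observation": "Data",
    "AbundanceTimeSeries": "TimeSeries",
    "TimeSeries": "Data",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` equals `general` or is a descendant of it."""
    node: Optional[str] = specific
    while node is not None:
        if node == general:
            return True
        node = IS_A.get(node)
    return False


@dataclass(frozen=True)
class Step:
    name: str
    input_type: str
    output_type: str


def check_pipeline(steps: list[Step]) -> list[str]:
    """Return one message per link whose output type does not fit the next input."""
    problems = []
    for up, down in zip(steps, steps[1:]):
        if not subsumes(down.input_type, up.output_type):
            problems.append(
                f"{up.name} -> {down.name}: {up.output_type} is not a {down.input_type}"
            )
    return problems


if __name__ == "__main__":
    pipeline = [
        Step("load_occurrences", "Data", "SpeciesOccurrence"),
        Step("count_by_year", "Observation", "AbundanceTimeSeries"),
        Step("trend", "TimeSeries", "Data"),
    ]
    print(check_pipeline(pipeline))  # [] means every link is semantically compatible
```

Checking subsumption rather than exact type equality lets a step that accepts a general concept (here `Observation`) consume any more specific one, which is what makes an ontology more useful than plain type names.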