Data R&D Issues for GTL
Bertram Ludäscher, Data and Knowledge Systems, San Diego Supercomputer Center, University of California, San Diego

Presentation transcript:

Data R&D Issues for GTL

GTL data management infrastructure
  - Service-oriented Data Grids for
      - Seamless data sharing (volume, distribution, access restrictions, …)
      - Capabilities for data integration (mediators/warehouses), digital library functions, knowledge-based (“semantic”) extensions (e.g. ontologies), and archival capabilities

Data analysis and knowledge-enabling infrastructure
  - Analytical Pipelines (“Scientific Workflows”)
      - Rapid design and prototyping, handling of complex data & task semantics, large volume, sci. workflow as a first-class product, validation, execution, monitoring, sharing, archiving
      - How to go from a scientist’s abstract (conceptual) workflow to a data grid execution plan?
  - New Model Management and Knowledge Representation Technologies:
      - Closing the gap between data management (DBMS’s, data grids) and knowledge-based systems (desktop-oriented, rule-based systems) and analysis and modeling systems
      - Mapping between numerous formalisms at the syntactic, structural, and semantic level (terminological, process-semantics, …)
      - “Gluing” together models and formalisms across different levels: from genes to proteins to molecular machines to microbial communities… (compare: pnp transistors, boolean circuits, assembly language, high-level PLs, declarative QLs, …) → abstraction & elaboration mechanisms
      - → Data exploration and hypothesis generation tools (KNOW-ME, SKIDL, SEEK AMS, …)

Computational facilities
  - Use of high-end networked facilities a la TeraGrid

Opportunities (and challenges!) in leveraging related efforts:
  - NIH BIRN, …, NSF Cyberinfrastructure (ITRs GEON, GriPhyN, SCEC, SEEK, …), UK e-Science, …
  - Standardization (OGSA, KR/Semantic Web technologies, e.g., ontology languages (OWL), inference mechanisms, …), scientific workflow standards, … → interoperable, open source tools
  - One size/standards fits all? Probably not: data-intensive vs computation-intensive vs “semantics-intensive” (capturing implicit domain knowledge, hidden assumptions, …)

Bonus Material (beyond 1 slide limit ;-) starts here …

Up & Down: Abstraction & Elaboration Mechanisms
  - Knowledge Mgmt
  - Information Mgmt
  - Data Management
How to punch through the technology barriers?
Data Grids vs Digital Libraries vs DBMS’s vs Knowledge-Based Analysis & Modeling Systems

Biomedical Informatics Research Network

Biomedical Informatics Research Network
Getting Formal: Source Contextualization & Ontology Refinement in Logic

Scientific Data Integration... Questions to Queries... (GeoSciences Network)

Questions:
  - What are the distribution and U/Pb zircon ages of A-type plutons in VA?
  - How about their 3-D geometry?
  - How does it relate to host rock structures?

Information Integration across sources:
  - Geologic Map (Virginia)
  - GeoChemical
  - GeoPhysical (gravity contours)
  - GeoChronologic (Concordia)
  - Foliation Map (structure DB)

“Complex Multiple-Worlds” Mediation:
  - domain knowledge
  - Database mediation
  - Data modeling
  - Knowledge Representation: ontologies, concept spaces
  - raw data
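The "Questions to Queries" idea can be illustrated with a toy mediator. This is only a minimal sketch, not the GEON mediator: the ontology fragment, the source names, the per-source vocabularies, and all sample rows are hypothetical and made up for illustration. It assumes each source registers a mapping from shared ontology concepts to its own local terms, so that one concept-level question is expanded through the ontology and rewritten into one filter per source.

```python
from dataclasses import dataclass

# Hypothetical ontology fragment: concept -> narrower concepts.
ONTOLOGY = {
    "pluton": ["A-type pluton", "I-type pluton"],
    "A-type pluton": [],
    "I-type pluton": [],
}


@dataclass(frozen=True)
class Source:
    """A data source with its own vocabulary (all values are illustrative)."""
    name: str
    concept_to_local: dict  # shared concept -> source-specific term
    rows: list              # each row: {"id": ..., "rock_type": ...}


def expand(concept, ontology):
    """Return the concept together with all of its narrower concepts."""
    seen, stack = [], [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(ontology.get(current, []))
    return seen


def mediate(concept, sources, ontology):
    """Rewrite one concept-level question into per-source filters.

    Returns (source name, row id) pairs, in source order then row order.
    """
    concepts = expand(concept, ontology)
    results = []
    for src in sources:
        # Translate shared concepts into this source's local terms.
        local_terms = {
            src.concept_to_local[c] for c in concepts if c in src.concept_to_local
        }
        for row in src.rows:
            if row["rock_type"] in local_terms:
                results.append((src.name, row["id"]))
    return results


# Made-up sample sources, standing in for e.g. geochemical and
# geochronologic databases that label the same rocks differently.
SOURCES = [
    Source(
        "geochem",
        {"A-type pluton": "A-granite"},
        [{"id": "s1", "rock_type": "A-granite"},
         {"id": "s2", "rock_type": "basalt"}],
    ),
    Source(
        "geochron",
        {"A-type pluton": "anorogenic", "I-type pluton": "igneous-I"},
        [{"id": "z7", "rock_type": "igneous-I"}],
    ),
]

if __name__ == "__main__":
    print(mediate("A-type pluton", SOURCES, ONTOLOGY))
    print(mediate("pluton", SOURCES, ONTOLOGY))
```

Asking for the broader concept "pluton" also picks up rows that a source labels only with a narrower, source-specific term, which is the kind of gap between domain knowledge and raw data that the mediation layers on the slide are meant to close.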

Geologic Map Integration: Geo & IT/CS meet
[Figure labels: domain knowledge; Knowledge representation; AGE ONTOLOGY; Nevada]
GEON Metamorphism Equation (figure): Geoscientists + Computer Scientists; +/- Energy; +/- a few hundred million years; Igneous Geoinformaticists

SEEK Project Overview
  - Large collaborative NSF/ITR project: UNM, UCSB, UCSD (SDSC), UKansas, ..
  - “Analysis & Modeling System” to design, execute, reproduce/refine scientific workflows in the ecology and biodiversity domains.