Applications and Requirements for Scientific Workflow May 1 2006 NSF Geoffrey Fox Indiana University.


Team Members
Geoffrey Fox, Indiana University (lead)
Mark Ellisman, UCSD
Constantinos Evangelinos, Massachusetts Institute of Technology
Alexander Gray, Georgia Tech
Walt Scacchi, University of California, Irvine
Ashish Sharma, Ohio State University
Alex Szalay, Johns Hopkins University

The Application Drivers
Phrase as transformative research: does the term "scientific workflow" conjure up an innovative future, or perhaps a bureaucratic past?
–A distributed, interdisciplinary, data-deluged scientific methodology, treated as an end (instrument, conjecture) to end (paper, Nobel Prize) process, is a transformative approach
–Provide CS support for this scientific revolution
Ground these in scenarios or in application descriptions that lead to these requirements
–Spans all NSF directorates
–Astronomy (multi-wavelength VO), Biology (genomics/proteomics), Chemistry (drug discovery), Environmental Science (multi-sensor monitors as in NEON), Engineering (NEES, multi-disciplinary design), Geoscience (ocean/weather/earthquake data assimilation), Medicine (multi-modal/instrument imaging), Physics (LHC, materials design), Social Science (critical infrastructure simulations for DHS), etc.

Complex scientific methodology produces modern scientific results

Chandra X-ray Observatory data processing workflow streams

What has changed?
Exponential growth in Compute (18), Sensors (18?), Data storage (12), Network (8) (doubling time in months for each); performance is variable in practice (e.g. the "last mile" for networks)
Data deluge (largely ignored in the grand challenges of the HPCC era)
Comparable additional improvements from algorithms (simulation, data analysis)
Growth of science is not on the same exponential (remains linear? more interdisciplinary?)
Distributed scientists and distributed shared data (not uniform across fields)
Together these establish a distributed, data-deluged scientific methodology
Community practices must improve, as they have not kept pace with the changed methodology (e.g. the DICOM standard in medical imaging supports only partial provenance; standalone applications don't support distribution)
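The doubling times quoted on this slide imply very different improvement factors over a decade, which is the point of the "exponential growth" claim. A quick sketch of the arithmetic (the category names and the 18-month figure for sensors are taken from the slide, including its "?"):

```python
# Months per doubling, as quoted on the slide.
doubling_months = {"Compute": 18, "Sensors": 18, "Data storage": 12, "Network": 8}

def growth_factor(months_per_doubling, years=10):
    """Factor of improvement after `years` at the given doubling time."""
    return 2 ** (years * 12 / months_per_doubling)

for name, m in doubling_months.items():
    print(f"{name:12s} ~{growth_factor(m):,.0f}x over 10 years")
# Compute/Sensors ~100x, Data storage ~1,000x, Network ~33,000x
```

The mismatch between these curves (and the roughly linear growth in the number of scientists) is what drives the requirement below for methodologies that hide complexity.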

Application Requirements I
Reproducibility is core to the scientific method and requires rich provenance; interoperable, persistent repositories linking open data and publications; and distributed simulations, data analysis, and new algorithms
Distributed Science Methodology captures and publishes all steps (a rich cloud of resources including e-mails and Wikis as new electronic log books, as well as databases, compiler options, …) of the scientific process (data analysis) in a fashion that allows the process to be reproduced; steps in the process need to be electronically referenceable
–Traditional workflow languages like BPEL describe only a small part of this
Multiple collaborative, heterogeneous, interdisciplinary approaches to all aspects of the distributed science methodology are inevitable; research is needed on integrating this diversity
Multiple "ibilities" (security, reliability, usability, scalability)
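The requirement that every step be "electronically referenceable" can be illustrated with a minimal provenance-log sketch. This is not from the slides: the function name `record_step` and the record fields are hypothetical, chosen only to show how a content-derived identifier lets a paper cite a specific step (inputs, parameters such as compiler options, outputs) in a reproducible way:

```python
import hashlib
import json
import time

def record_step(log, actor, action, inputs, outputs, params=None):
    """Append one step of the methodology to a provenance log and
    return a stable identifier so the step can be cited electronically."""
    step = {
        "actor": actor,
        "action": action,
        "inputs": inputs,        # e.g. dataset URIs
        "outputs": outputs,
        "params": params or {},  # e.g. compiler options
        "time": time.time(),
    }
    # Content-derived ID: the same step description always yields
    # the same reference (timestamp deliberately excluded).
    content = {k: step[k] for k in ("actor", "action", "inputs", "outputs", "params")}
    step_id = hashlib.sha256(
        json.dumps(content, sort_keys=True).encode()
    ).hexdigest()[:12]
    step["id"] = step_id
    log.append(step)
    return step_id

log = []
ref = record_step(log, "fox", "calibrate", ["raw.fits"], ["cal.fits"], {"flag": "-O2"})
```

A real system would of course persist the log in an interoperable repository rather than a Python list; the point is only that each step becomes a citable object.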

Application Requirements II
Cope with the inevitable inconsistencies and inadequacies of metadata standards; support curation, data validation, and "scrubbing" in algorithms and provenance; reputation and trust systems for data providers
As the size and richness of data and algorithms scale up, we need a scalable methodology that hides complexity (compatible with the number of scientists increasing only slowly); it must be simple and validatable
Automate efficient provisioning, deployment, and provenance generation for complex simulations and data analysis; support deployment and interoperable specification of the user's abstract workflow; support the interactive user
Support automated and innovative individual contributions to core "black boxes" (produced by the "marine corps" for the "common case") and general user actions such as choice and annotation
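The "abstract workflow" mentioned above names tasks and their data dependencies without binding them to resources; a planner later maps each task onto concrete machines. A minimal sketch (the task names are hypothetical, loosely echoing the Chandra processing example earlier):

```python
from graphlib import TopologicalSorter

# Hypothetical abstract workflow: tasks and dependencies only,
# no mention of where or how each task runs.
abstract_workflow = {
    "calibrate": set(),
    "mosaic": {"calibrate"},
    "detect_sources": {"mosaic"},
    "publish": {"detect_sources"},
}

# A planner/enactor would walk this order, provisioning resources
# and recording provenance for each task as it runs.
order = list(TopologicalSorter(abstract_workflow).static_order())
print(order)  # dependencies always precede dependents
```

Keeping the specification at this abstract level is what makes it interoperable: the same DAG can be handed to different enactment engines or provisioned onto different infrastructures.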