Working Group on Simulation-Driven Applications
Astrophysics, Biology, Climate, Combustion, Fusion, Nanoscience
DOE Data Management Workshop, Chicago Meeting, 5/17/2004
10 CS, 10 Sim, 1 VR

Slide 2: Workflows

Critical need: enable (and automate) scientific workflows:
– Data storage
– Data transfer
– Data analysis
– Visualization
An order of magnitude more time can be spent manually managing these workflows than on performing the simulation science itself.
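To make "enable and automate" concrete: a minimal sketch, in Python, of a driver that chains the stages above and stops on the first failure, so nobody has to shepherd data by hand. The stage executables (./simulate, ./analyze, ./visualize) and the directory layout are hypothetical placeholders, not tools from the talk.

```python
import shutil
import subprocess
from pathlib import Path

def run_stage(cmd, log):
    """Run one pipeline stage, capturing its output to a log file."""
    with open(log, "w") as f:
        subprocess.run(cmd, stdout=f, stderr=subprocess.STDOUT, check=True)

def workflow(run_id):
    work = Path("runs") / run_id
    work.mkdir(parents=True, exist_ok=True)
    data = work / "fields.h5"
    # check=True aborts the pipeline on the first failing stage,
    # instead of silently producing bad downstream products.
    run_stage(["./simulate", "--out", str(data)], work / "sim.log")
    run_stage(["./analyze", str(data)], work / "analysis.log")
    run_stage(["./visualize", str(data)], work / "viz.log")
    # Archive only after every downstream stage has succeeded.
    shutil.make_archive(str(work), "gztar", root_dir=work)

if __name__ == "__main__":
    workflow("demo-001")
```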

Slide 3: Simulations

Simulations run in batch mode; the rest of the workflow is interactive or "on demand." Simulations and analyses are performed by distributed teams of research scientists:
– Need to access remote and distributed data and resources.
– Need for distributed collaborative environments.
Some solutions will be team dependent. Examples: remote visualization vs. local visualization, parallel HDF5 vs. parallel netCDF, …
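The batch/interactive split can itself be scripted: submit the simulation to the batch system, wait for it to drain from the queue, then hand off to the interactive analysis. A minimal sketch, assuming a PBS-style qsub/qstat command line; the job script name is hypothetical:

```python
import subprocess
import time

def submit(script):
    """Submit a batch job and return the scheduler-assigned job ID."""
    out = subprocess.run(["qsub", script], capture_output=True, text=True, check=True)
    return out.stdout.strip()

def wait_for(job_id, poll_seconds=60):
    """Poll the queue until the job is no longer listed."""
    while True:
        q = subprocess.run(["qstat", job_id], capture_output=True, text=True)
        if q.returncode != 0:  # qstat fails once the job leaves the queue
            return
        time.sleep(poll_seconds)

job = submit("run_simulation.pbs")
wait_for(job)
# The remaining, interactive part of the workflow (analysis,
# visualization) starts here, on demand.
```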

Slide 4: Let Thought Be the Bottleneck

Simulation scientists generally have scripts to semi-automate this process. To expedite it, they need to:
– fully automate the workflow,
– remove the bottlenecks.
Better visualization and better data analysis routines will let users spend less time interpreting results. Better routines to "find the needle in the haystack" will shorten the thought process. Faster turnaround for the simulations themselves will come from:
– better numerical algorithms,
– more scalable algorithms,
– faster processors, faster networking, faster I/O,
– better batch systems,
– more HPC systems.

Slide 5: Data Management (2)

To expedite this process, they need to:
– Have a common data model to move data from simulation to analysis to visualization.
– Capture metadata, annotation, and provenance. The metadata includes:
  – code versions,
  – simulation parameters,
  – model parameters,
  – information on simulation inputs (e.g., from experiments and/or other simulations),
  – machine configuration,
  – compiler information.
– Have tools to record provenance in databases. Additional provenance (beyond the metadata above) is needed to describe:
  – the reliability of the data,
  – how the data arrived in the form in which it was accessed,
  – data ownership.
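A minimal sketch of recording this metadata in a database, using SQLite and a git call to capture the code version automatically. The schema and field names are illustrative assumptions, not a standard proposed in the talk:

```python
import json
import platform
import sqlite3
import subprocess
from datetime import datetime, timezone

def record_provenance(db_path, params, inputs):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS runs (
        run_time TEXT, code_version TEXT, machine TEXT,
        parameters TEXT, inputs TEXT)""")
    # Take the code version straight from version control,
    # rather than relying on hand-entered notes.
    version = subprocess.run(["git", "rev-parse", "HEAD"],
                             capture_output=True, text=True).stdout.strip()
    conn.execute("INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
                 (datetime.now(timezone.utc).isoformat(), version,
                  platform.platform(), json.dumps(params), json.dumps(inputs)))
    conn.commit()
    conn.close()

record_provenance("provenance.db",
                  params={"resolution": 512, "nu": 1e-4},
                  inputs=["ic_experiment_042.h5"])
```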

Slide 6: A Unified Data Model

It is critical to develop a unified data model. Can we build analysis routines that can be used for multiple codes? For multiple disciplines? Can we agree on standards? The data model must allow flexibility:
– We commonly add and remove variables used in the simulations and analysis routines.
– It must handle AMR (adaptive mesh refinement) calculations.
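One way to get that flexibility is a self-describing layout in which readers discover variables and AMR levels instead of hard-coding them. A minimal sketch using HDF5 via h5py; the group and attribute names are illustrative assumptions, not a published standard:

```python
import h5py
import numpy as np

with h5py.File("snapshot.h5", "w") as f:
    f.attrs["data_model_version"] = "0.1"
    # Each AMR level is a group; each variable is a named dataset.
    # Adding or dropping a variable does not break readers, because
    # they enumerate what is present rather than assume it.
    for level in range(2):
        g = f.create_group(f"amr/level_{level}")
        g.attrs["refinement_ratio"] = 2 ** level
        for var in ("density", "pressure"):
            g.create_dataset(var, data=np.random.rand(16, 16, 16))

with h5py.File("snapshot.h5", "r") as f:
    for level, g in f["amr"].items():
        print(level, list(g))  # readers discover the variables
```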

Slide 7: Biggest Bottleneck: Interpretation of Results

This is the biggest bottleneck because:
– Babysitting: scientists spend their "real time" babysitting computational experiments (trying to interpret results, move data, and orchestrate the computational pipeline), and deciding whether the analysis routines are working properly with the "new" data.
– Non-scalable data analysis routines: looking for the "needle in the haystack." Better analysis routines could mean less time spent in the thought process and in interpreting the results.
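A minimal sketch of "needle in the haystack" screening that removes some of the babysitting: flag output steps whose global statistics leave the band set by recent history, and only then call for a human. The 5-sigma rule and the peak-temperature series are illustrative assumptions:

```python
import numpy as np

def flag_anomalies(series, window=50, nsigma=5.0):
    """Return indices where a value leaves the trailing window's n-sigma band."""
    flags = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mu, sigma = hist.mean(), hist.std()
        if sigma > 0 and abs(series[i] - mu) > nsigma * sigma:
            flags.append(i)
    return flags

# e.g. peak temperature per output step from a combustion run
peak_temp = np.random.normal(2000.0, 15.0, size=500)
peak_temp[321] = 2400.0             # an event worth a scientist's attention
print(flag_anomalies(peak_temp))    # flags index 321
```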

Slide 8: Important Component: Parallel I/O

– Significant developments in parallel I/O are needed:
  – a portable, efficient industry standard,
  – interoperability between parallel and non-parallel I/O.
– The degree of parallelism varies across the workflow. Parallel I/O matters at multiple stages of many workflows, from the output of simulation data to the I/O for parallel rendering in end-product scientific visualization.
– Large data sets must be cached, archived, replicated, subset, and distributed:
  – Archival storage is required to store data that takes months to produce.
  – Data will be post-processed as it is produced, requiring that it be cached/staged.
  – Replication, subsetting, and distribution serve multiple purposes (e.g., data staging for visualization).
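A minimal sketch of collective parallel output through the parallel HDF5 interface mentioned on slide 3, via h5py with mpi4py. This assumes an MPI-enabled h5py build; the file name, dataset name, and sizes are illustrative:

```python
import h5py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

local_n = 1024  # each rank owns one contiguous slab of the global array
with h5py.File("fields.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("density", shape=(nprocs * local_n,), dtype="f8")
    # Each rank writes only its own slab; HDF5 drives the underlying
    # MPI-IO, so the resulting file is identical to a serial write.
    dset[rank * local_n:(rank + 1) * local_n] = np.full(local_n, rank, dtype="f8")

# Run with, e.g.: mpiexec -n 4 python write_fields.py
```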

Slide 9: Needed Technologies

Rankings by application area (parenthesized values as given):

            Auto      Data Storage  Data      Data      Metadata  DB Access  Data
            Workflow  and Access    Movement  Analysis            and Query  Visualization
Astro       5 (1)     6 (1)         7 (1)     3 (1/2)   2         1          4 (1/2)
Fusion      6 (3/2)   5 (1/2)       7 (1/2)   4 (1)     2         1          3 (1/2)
Combustion  3         6 (1/2)       7 (1/2)   5 (2)     2         1          4 (1)
Climate     3 (2)     (2)           –         –         –         –          –
Nano        7 (1/2)   4 (1/2)       2         6 (1)     3 (1)     1 (1/2)    5 (1/2)
Biology     2         7 (2)         3         4         6 (1)     1          5 (1)