SDM workshop Strawman report History and Progress and Goal.


History
  Original plan
    - Identify Scientific Applications Data Management needs
    - Focus on different application types: simulations, experiments/observations
    - Identify Data Management technologies
    - Identify other relevant Computer Science technologies
    - Identify Gaps, Cost, Priorities
  In Extended EOC we came up with a draft report
    - Based on extensive discussions of application needs
    - Identified the scientific investigation process (workflow)
    - Identified technologies needed
    - Assigned writing to individuals

Section 2: Application sciences motivation and needs
  - Astrophysics
  - Biology
  - Climate Modeling
  - Combustion
  - Fusion Energy Science
  - High Energy and Nuclear Physics
  - Nanotechnology

Section 3: The scientific investigation process
  - Distributed Scientific Workflows
  - Scientific Data Management Phases
      - Data Generation
      - Data Analysis
      - Data Visualization
  - Foundation of scientific data management technology
      - Workflow, dataflow, data transformation
      - Storage, data movement, grid, networks
      - Metadata management and cataloging
      - Efficient access and query, data integration
      - Integrated analysis environment, visualization
  - Requirements of supportive technologies
      - Networking
      - Visualization

Scientific Workflow Cycle
  [Figure: cycle diagram. Labels: Data Generation, Data Analysis, Data Visualization, Scientific Data Management; the label "workflow" appears three times.]

Section 4: Data Management Technologies and Gap Analysis
  1) Workflow, dataflow, data transformation
      - Workflow specification
      - Workflow execution in distributed systems
      - Monitoring of long-running workflows
      - Adapting components to the framework
  Workflow layers
      - Control-flow layer
      - Application and Software Tools layer
      - I/O System layer
      - Storage and Network Resource layer

Astrophysical Simulation Workflow Cycle
  Application Layer
      - Start New Simulation?
      - Run Simulation batch job on capability system
      - Continue Simulation?
      - Simulation generates checkpoint files
      - Archive checkpoint files to HPSS
      - Migrate subset of checkpoint files to local cluster
      - Vis & Analysis on local Beowulf cluster
  Parallel I/O Layer
      - Parallel HDF5
  Storage Layer
      - HPSS
      - GPFS
      - PVFS or LUSTRE
      - MSS, Disks, & OS
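The application-layer steps of this cycle can be sketched as a small loop. This is only an illustrative sketch, not something taken from the report: the function names, the checkpoint file names, the four-checkpoints-per-job count, and the "every second file" migration rule are all assumptions. In-memory lists stand in for the HPSS archive and the local cluster.

```python
from dataclasses import dataclass, field


@dataclass
class CycleState:
    archived: list = field(default_factory=list)  # stands in for the HPSS archive
    local: list = field(default_factory=list)     # stands in for the local cluster
    log: list = field(default_factory=list)       # record of the analysis steps


def run_batch_job(step):
    """Hypothetical stand-in for one simulation batch job.

    Returns the names of the checkpoint files the job would generate.
    """
    return [f"checkpoint_{step:03d}_{part}.h5" for part in range(4)]


def run_cycle(num_jobs, keep_every=2):
    """Run the cycle for num_jobs batch jobs (assumed stopping rule)."""
    state = CycleState()
    step = 0
    while step < num_jobs:                      # "Continue Simulation?"
        checkpoints = run_batch_job(step)       # simulation generates checkpoints
        state.archived.extend(checkpoints)      # archive all checkpoint files
        subset = checkpoints[::keep_every]      # choose a subset to migrate
        state.local.extend(subset)              # migrate subset to local cluster
        state.log.append(                       # placeholder for vis & analysis
            f"job {step}: analyzed {len(subset)} of {len(checkpoints)} checkpoints"
        )
        step += 1
    return state
```

Calling run_cycle(3) archives every checkpoint from three jobs and keeps half of them locally for analysis. A real workflow would replace the lists with calls to the batch system, the archive, and a data mover.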

Section 4: Data Management Technologies and Gap Analysis
  2) Storage, data movement, grid, networks
      - Dynamic data storage and caching
      - Robust terabyte-scale data movers
      - Dataflow automation between components
      - Multi-resolution data movement
  3) Metadata management and cataloging
      - Unified data models and APIs
      - Annotation, ontologies and provenance
      - Metadata requirements for workflows

Section 4: Data Management Technologies and Gap Analysis
  4) Efficient access and query, data integration
      - Parallel and random I/O
      - Large-scale feature-based indexing
      - Query processing over files
      - Data integration
  5) Integrated analysis environment, visualization
      - A single environment for packaged tools and user software
      - A single environment for a variety of tools: statistical software, cluster analysis, …
      - Coupling with visualization tools
      - Work with parallel I/O

Section 5: Prioritization, Cost, and Management
  - Prioritization process
      - Reasons based on current barriers and needs
      - Reasons based on long term projections
  - Practical budgeting considerations
      - Research and development
      - Hardening and packaging
      - Deployment and maintenance
  - Recommendations and program planning
      - Prioritization
      - Cost
      - Management Structure

Gap & Cost Matrix
  Columns:
      - Research and Development
      - Hardening and Packaging
      - Deployment and maintenance
  Rows:
      - Workflow, dataflow, data transformation
      - Storage, data movement, grid, networks
      - Metadata management and cataloging
      - Efficient access and query, data integration
      - Integrated analysis environment, visualization

Discussion items
  - Research and Development
  - Hardening and Packaging
  - Deployment and maintenance
  Control flow tier
      - Granularity of tasks, sub-workflows
      - Task invocation mechanisms: Web Services, CORBA, wrappers, callbacks
      - Human tasks: notifications and alerts, steering
      - Dataflow streaming granularity
  Work Tier
      - Workflow engine for scientific applications
      - Dataflow management
      - Effect of dataflow on the control flow
      - Failure detection and recovery
      - Performance and bottleneck issues

The End