
Workflow automation for processing plasma fusion simulation data
Norbert Podhorszki, Bertram Ludäscher, Scott A. Klasky
Scientific Computing Group, Oak Ridge National Laboratory; University of California, Davis

6/25/07, Works’07, Monterey, CA
Center for Plasma Edge Simulation (CPES)
– Focus on the edge of the plasma in the tokamak
– Multi-scale, multi-physics simulation
Figures: edge turbulence in NSTX (100,000 frames/s); diverted magnetic field

Images plasma physicists adore
Figures: electric potential; parallel flow and particle positions

Monitoring the simulation means…

Multi-physics → many codes

XGC simulation output
Desired size of simulation (to be run on the petascale machine):
– 100K time steps
– 100 billion particles
– 10 attributes (double precision) per particle = 8 TB of data per time step
– Save (and process) 1K–10K time steps
– About 5 days of running on the petascale machine
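As a worked check of the per-step figure, assuming decimal units (1 TB = 10^12 bytes) and 8 bytes per double-precision value; the second line additionally assumes that every saved step is kept at full size, which the slide does not state:

```latex
\underbrace{10^{11}}_{\text{particles}} \times \underbrace{10}_{\text{attributes}} \times \underbrace{8\ \text{B}}_{\text{double}} = 8 \times 10^{12}\ \text{B} = 8\ \text{TB per time step}

8\ \text{TB} \times \left(10^{3}\ \text{to}\ 10^{4}\ \text{saved steps}\right) = 8\ \text{to}\ 80\ \text{PB}
```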

XGC simulation output
– Proprietary binary files (BP): 3D variables, a separate file for each timestep
– NetCDF files: 2D variables, all timesteps in one file
– M3D coupling data:
  – to compute a new equilibrium with an external code (loose coupling)
  – to check the linear stability of XGC externally

What to do with that output?
Proprietary binary files (BP)
– Transfer to the end-to-end system using bbcp
– Convert to HDF5 format (with a C program)
– Generate images using AVS/Express (running as a service)
– Archive HDF5 files to HPSS in large chunks
NetCDF files
– Transfer to the end-to-end system (updating as new timesteps are written into the files)
– Generate images using the Grace library
– Archive NetCDF files at the end of the simulation
M3D coupling data
– Transfer to the end-to-end system
– Execute M3D: compute the new equilibrium
– Transfer the new equilibrium back to XGC
– Execute ELITE: compute the growth rate, test linear stability
– Execute M3D-MPP: study unstable states (ELM crash)
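One way to read this slide is as a table from output type to an ordered list of tasks. The Python sketch below writes that table down as data. The keys, step labels and the `plan` helper are hypothetical illustrations, not names from the CPES workflow or from Kepler.

```python
# Hypothetical table: output type -> ordered processing steps (labels only).
PIPELINES: dict[str, list[str]] = {
    "bp": ["transfer (bbcp)", "convert to HDF5", "render images (AVS/Express)",
           "archive HDF5 to HPSS"],
    "netcdf": ["transfer (incremental)", "render images (Grace)",
               "archive NetCDF at end of run"],
    "m3d": ["transfer", "run M3D (new equilibrium)", "send equilibrium back to XGC",
            "run ELITE (growth rate, linear stability)", "run M3D-MPP (ELM crash)"],
}


def plan(kind: str, filename: str) -> list[str]:
    """Return the ordered tasks for one output file of the given kind."""
    if kind not in PIPELINES:
        raise ValueError(f"unknown output kind: {kind!r}")
    return [f"{step}: {filename}" for step in PIPELINES[kind]]
```

Keeping the pipelines as data makes the three branches easy to compare. In the real workflow each branch is a Kepler sub-workflow rather than a list entry.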

Schematic view of components
Figure labels: ORNL; Cray XT4; Opteron cluster; 40 GB/s; HPSS; command & control site; NERSC (pull data)

Kepler workflow
– to accomplish all these tasks
– 1239 (Java) actors
– 4 levels of hierarchy
– many instances of the ProcessFile and FileWatcher composite actors (“workflow templates”)
Figure: sub-workflow sizes: 43 actors, 3 levels; 196 actors, 4 levels; 30 actors; 206 actors, 4 levels; 137 actors; 33 actors; … actors; 66 actors; 12 actors; 243 actors, 4 levels

Workflow – Java – remote script – remote program
Figure labels: ls -l, bp2h5, bbcp

Kepler actors for CPES
– Permanent SSH connection to perform tasks on a remote machine
– Generalized actors (sub-workflows) for specific tasks:
  – watch a remote directory for simulation timesteps
  – execute an external command on a remote machine
  – tar and archive data in large chunks to HPSS
  – transfer a remote image file and display it on screen
  – control a running SCIRun server remotely
  – submit and control jobs on various resource managers
– The above actors do logging/checkpointing, so the final workflow can be stopped and restarted

What Kepler features are used in CPES?
– Different models of computation:
  – PN (Process Networks) for parallelism and pipeline processing
  – DDF (Dynamic Dataflow) for sequential workflows with if-then-else and while-loop structures
  – SDF (Synchronous Dataflow) for efficient (statically scheduled) sequential execution of simple sub-workflows
– Stateful actors in stream processing of files
– SSH for remote operations (keeps the connection alive)
– Command-line execution of the workflow:
  – from a script at deployment (no GUI)
  – reading workflow parameters from a file
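To illustrate the PN model mentioned above: each actor runs in its own thread and communicates only through FIFO channels, blocking when its input is empty. While one stage works on timestep i, the previous stage can already work on timestep i+1. The Python sketch below stands in threads and `queue.Queue` for Kepler actors and channels. The stage names are hypothetical, and the number of tokens is assumed to be known in advance, which the CPES workflow cannot assume (see the stop-file slides later in the talk).

```python
import queue
import threading
from typing import Callable


def actor(fn: Callable[[str], str], inbox: queue.Queue, outbox: queue.Queue,
          n: int) -> threading.Thread:
    """A process-network actor: a thread that reads n tokens, transforms each one."""
    def run() -> None:
        for _ in range(n):
            # Blocking read; the queue is unbounded, so the write never blocks.
            outbox.put(fn(inbox.get()))
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t


def stage(name: str) -> Callable[[str], str]:
    """Hypothetical stage that just records that it processed the token."""
    return lambda token: f"{name}({token})"


def run_pipeline(steps: list[int]) -> list[str]:
    n = len(steps)
    # FIFO channels: source -> transfer -> convert -> archive -> sink
    chans = [queue.Queue() for _ in range(4)]
    threads = [actor(stage(name), chans[i], chans[i + 1], n)
               for i, name in enumerate(["transfer", "convert", "archive"])]
    for s in steps:
        chans[0].put(str(s))  # the source emits one token per timestep
    results = [chans[3].get() for _ in range(n)]
    for t in threads:
        t.join()
    return results
```

Because each actor is a single thread reading one FIFO, tokens leave the pipeline in the order they entered. For example, `run_pipeline([1, 2])` yields `archive(convert(transfer(1)))` followed by `archive(convert(transfer(2)))`.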

FileWatcher: a data-dependent loop
● The SSH Directory Listing Java actor outputs the new files in a directory (each file once)
● This is a do-while loop whose termination condition is whether the list contains a specific element (which indicates the end of the simulation)
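The FileWatcher loop can be sketched as a Python generator. The assumptions are that `list_dir` stands in for the SSH Directory Listing actor, and that `stop_name` is a hypothetical name for the element that marks the end of the simulation. The body (list, then emit new files) always runs before the termination test, which is what makes it a do-while loop.

```python
import time
from typing import Callable, Iterable, Iterator


def file_watcher(list_dir: Callable[[], Iterable[str]], stop_name: str,
                 poll_seconds: float = 0.0) -> Iterator[str]:
    """Emit each new file exactly once; stop when the listing contains stop_name."""
    seen: set[str] = set()
    while True:
        listing = sorted(list_dir())     # body: one directory listing per iteration
        for name in listing:
            if name != stop_name and name not in seen:
                seen.add(name)
                yield name               # every new file becomes one token
        if stop_name in listing:         # condition checked after the body
            return
        time.sleep(poll_seconds)         # a real watcher would wait between polls
```

Files that appear in the same listing as the stop marker are still emitted before the loop ends, so the last timesteps are not lost.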

Modeling problem: stopping and finishing
You finally create working pipelines. Fine.
– How do you stop them?
– How do you let intermediate actors know that they will not receive any more tokens?
– How do you perform something “after” the processing?
We use a special token flowing through the pipelines
– always the last item in the pipeline
– actors are implemented (extra work) to skip this token
Stop file created by the simulation
– to stop the “task generator” actors in the workflow (FileWatchers)
– to notify (stateful) actors in the pipeline that they should finalize (Archiver, Stop_AVS/Express)
– to synchronize two independent pipelines (NetCDF + HDF5 → archive images at the end)
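A minimal sketch of the special last token, using Python generators as stand-ins for actors. `STOP`, `convert` and the chunk size are hypothetical, and `archiver` is only loosely modeled on the slide's Archiver. A stateless actor just forwards the marker. A stateful actor uses it to finalize, here by archiving the partial last chunk, and then forwards it so that downstream actors can finalize too.

```python
from typing import Iterable, Iterator


class Stop:
    """Type of the special end-of-stream token (always the last item)."""


STOP = Stop()


def convert(tokens: Iterable[object]) -> Iterator[object]:
    """Stateless actor: converts file names and simply forwards STOP."""
    for tok in tokens:
        if isinstance(tok, Stop):
            yield tok
            return
        yield str(tok).replace(".bp", ".h5")


def archiver(tokens: Iterable[object], chunk: int) -> Iterator[object]:
    """Stateful actor: archives files in chunks; STOP triggers finalization."""
    batch: list[str] = []
    for tok in tokens:
        if isinstance(tok, Stop):
            if batch:
                yield tuple(batch)  # finalize: archive the partial last chunk
            yield tok               # forward STOP so later actors can finalize too
            return
        batch.append(str(tok))
        if len(batch) == chunk:
            yield tuple(batch)      # a full chunk is ready for archiving
            batch = []
```

With `chunk=2` and input files `1.bp`, `2.bp`, `3.bp` followed by `STOP`, the archiver emits `("1.h5", "2.h5")`, then `("3.h5",)` on finalization, then `STOP`.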

Role of stop file
Figure labels: Stop; Finalize; Wait for stop on both pipelines; Extra work after the end
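The “wait for stop on both pipelines” step is a join point. A sketch in Python, assuming one `threading.Event` per pipeline as the hypothetical “has seen its stop token” signal. The image and archive labels are illustrative only. The extra work runs only after both events are set, regardless of which pipeline finishes first.

```python
import threading


def run_two_pipelines(netcdf_steps: list[int], hdf5_steps: list[int]) -> list[str]:
    """Run two independent pipelines; do the final work only after both stop."""
    log: list[str] = []
    lock = threading.Lock()
    stopped = {"netcdf": threading.Event(), "hdf5": threading.Event()}

    def pipeline(name: str, steps: list[int]) -> None:
        for step in steps:
            with lock:
                log.append(f"{name}: image {step}")
        stopped[name].set()                  # this pipeline has seen its stop token

    workers = [threading.Thread(target=pipeline, args=("netcdf", netcdf_steps)),
               threading.Thread(target=pipeline, args=("hdf5", hdf5_steps))]
    for w in workers:
        w.start()
    for event in stopped.values():
        event.wait()                         # wait for stop on both pipelines
    with lock:
        log.append("archive all images")     # extra work after the end
    for w in workers:
        w.join()
    return log
```

The two pipelines' entries may interleave in any order. Only the position of the final archive step is guaranteed.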

Problem: how to restart this workflow?
Kepler has no system-level checkpoint/restart mechanism (yet?)
– it seems to be difficult for large Java applications
– not to mention the state of external (and remote) things
Pipeline execution
– each actor is processing a different step simultaneously

Our solution: user-level logging/restart
We record
– the successful operations at each (“heavy”) actor
Those actors
– are implemented to check, before doing something, whether it has already been done
When the workflow is restarted
– it starts from the very beginning, but the actors simply skip operations (files, tokens) that have already been done
We do not worry about repeating small (control-related) actions within the workflow
– it is the external operations that matter here

ProcessFile core: check-perform-record
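The check-perform-record pattern can be sketched for a single actor as follows. The assumptions are a JSON-lines log file on local disk, and the class name, record fields and `demo` helper are all hypothetical; the actual CPES actors are Java components in Kepler. Only successful operations are recorded, so after a restart a previously failed item is attempted again while completed items are skipped.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class ProcessFile:
    """Check-perform-record around one external operation (sketch)."""

    def __init__(self, name: str, log_path: Path, operation: Callable[[str], bool]):
        self.name = name
        self.log_path = log_path
        self.operation = operation
        self.done: set[str] = set()
        if log_path.exists():  # restart: reload what earlier runs recorded
            for line in log_path.read_text().splitlines():
                rec = json.loads(line)
                if rec["actor"] == name:
                    self.done.add(rec["item"])

    def fire(self, item: str) -> str:
        if item in self.done:                  # check
            return f"skip {item}"
        if not self.operation(item):           # perform
            return f"failed {item}"
        with self.log_path.open("a") as f:     # record (successes only)
            f.write(json.dumps({"actor": self.name, "item": item}) + "\n")
        self.done.add(item)
        return f"{self.name} {item}"


def demo() -> tuple[list[str], list[str]]:
    """First run fails on step 2; a restarted actor reuses the same log."""
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "transfer.log"
        first = ProcessFile("transfer", log, lambda item: item != "2")
        run1 = [first.fire(s) for s in ["1", "2", "3"]]
        restarted = ProcessFile("transfer", log, lambda item: True)
        run2 = [restarted.fire(s) for s in ["1", "2", "3"]]
    return run1, run2
```

`demo()` returns `["transfer 1", "failed 2", "transfer 3"]` for the first run and `["skip 1", "transfer 2", "skip 3"]` after the restart, which matches the transfer row of the “After a restart…” slide.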

Problem: failed operations
What if an operation fails, e.g. one timestep cannot be transferred?
Options:
a) trust that the downstream actors “fail” silently on the missing data
b) notify everybody downstream in the pipeline (to skip): mark the token as “failed”
c) avoid giving them tasks for the erroneous step
Retrying later and processing that step is important, but …
… keeping up with the simulation on the next steps is even more important

Our approach for failed operations
ProcessFile, and thus the workflow, handles failures by discarding the tokens related to failed operations from the stream
Advantage:
– actors need not care about failures: an incoming token is a task to be done
Disadvantage:
– the rate of token production varies, which can upset Kepler’s model of computation

Discarding tokens on failure
Figure: transfer 1, 2 (failed), 3; convert 1, 3; arch 1, 3
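The discarding strategy can be sketched with chained Python generators, where each stage forwards a token only if its operation succeeded. The stage names follow the figure, and the `process_file`/`run` helpers are hypothetical. Unlike Kepler's PN pipeline, generators execute one token at a time, so the trace interleaves differently from the figure, but the same tokens reach each stage.

```python
from typing import Callable, Iterable, Iterator


def process_file(name: str, op: Callable[[int], bool], tokens: Iterable[int],
                 trace: list[str]) -> Iterator[int]:
    """Perform op on each step; forward the token only on success."""
    for step in tokens:
        if op(step):
            trace.append(f"{name} {step}")
            yield step
        else:
            trace.append(f"{name} {step} failed")  # token is discarded here


def run(steps: list[int], transfer_fails: set[int]) -> list[str]:
    trace: list[str] = []
    transferred = process_file("transfer", lambda s: s not in transfer_fails, steps, trace)
    converted = process_file("convert", lambda s: True, transferred, trace)
    archived = process_file("arch", lambda s: True, converted, trace)
    for _ in archived:  # drain the pipeline
        pass
    return trace
```

Downstream stages never see step 2. This is the advantage claimed on the previous slide, and it is also why the number of tokens each stage produces can differ from the number it consumes.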

After a restart…
Figure labels: skip 1, transfer 2; skip 1, convert 2; skip 1, arch 2; skip 3

Future plans
Provenance management
– one main reason to use a scientific workflow system, e.g. in bioinformatics workflows
– needed for debugging runs, interpreting results, repeating experiments, generating documentation, comparing runs, etc.
– the CPES workflow has been selected as one use case for the ongoing Kepler provenance work
New actors in CPES for controlling asynchronous I/O from the petascale computer to the processing cluster

Thank you! Questions?