Fusion-SDM (1)
Problem description
– Each future run: ¼ trillion particles, 10 variables, 8 bytes per value
– Each time step, generated every 60 sec, is (250×10^9) × 10 × 8 bytes = 2×10^13 bytes (20 TB)
– Required raw I/O rate ≈ 300 GB/s (not possible)
– Put analysis into the simulation: need to embed analysis into the computation
Approach
– Reduce data by summarizing into bins
– 55 GB per time step at 100×10 binning, broken into 64 files (probably)
– 2000 time steps × 64 files × 4 runs = 512,000 files
– The reduced data must be written out every 60 sec
Assumption
– A sustained 20 GB/s is available; data must be reduced accordingly
– Per run: 55 GB × 2000 = 110 TB (see the sizing sketch below)
Archival: move data to HPSS
– 110 TB / 300 MB/s ≈ 4.2 days per simulation, × 4 runs
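The arithmetic above is easy to misread in slide form, so here is a minimal sizing sketch (plain Python, using only the numbers quoted on this slide; decimal units assumed) that reproduces the per-step volume, the required raw I/O rate, the file count, the per-run volume, and the HPSS transfer time:

# Back-of-the-envelope sizing from the slide's numbers (assumption: decimal TB/GB/MB).
particles   = 250e9                    # 1/4 trillion particles
step_bytes  = particles * 10 * 8       # 10 variables x 8 bytes = 2e13 bytes = 20 TB per step
io_rate     = step_bytes / 60          # ~333 GB/s needed to write raw output every 60 s
total_files = 2000 * 64 * 4            # time steps x files x runs = 512,000 files
per_run_tb  = 55 * 2000 / 1000         # 55 GB of binned data per step -> 110 TB per run
hpss_days   = per_run_tb * 1e12 / 300e6 / 86400   # at 300 MB/s -> ~4.2 days per run
print(step_bytes, io_rate / 1e9, total_files, per_run_tb, hpss_days)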

Fusion-SDM (2)
Tasks
– Task 1: help verify that 20 GB/s can be sustained (see the write-bandwidth sketch below)
– Task 2: integrate a workflow into the process for generating images etc. for monitoring progress
– Task 3: move data to HPSS with HSI (a workflow task)
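For Task 1, a minimal sketch of a sustained-write check (hypothetical Python, not part of the original slides; the path, file size, and chunk size are placeholders, and a single-writer test like this only bounds what one process can achieve, so a real check would aggregate many concurrent writers):

import os, time

def sustained_write_rate(path, total_gb=64, chunk_mb=64):
    """Write total_gb of zeros in chunk_mb chunks; return the sustained rate in GB/s."""
    chunk = b"\0" * (chunk_mb * 1024**2)
    nchunks = (total_gb * 1024**3) // len(chunk)
    start = time.time()
    with open(path, "wb", buffering=0) as f:
        for _ in range(nchunks):
            f.write(chunk)
        os.fsync(f.fileno())        # ensure data reaches storage, not just the page cache
    return total_gb / (time.time() - start)

# Example (run on the target scratch file system, compare against the 20 GB/s goal):
# print(sustained_write_rate("/scratch/bw_test.bin"))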

Fusion-SDM (3)
Analysis scenario
– Goal: find coherent structures
– Need to generate coarser-binned "reduced" data, or sampled data
– How to run parallel analysis on the entire dataset?
– Approach: incremental progress
Task: Chandrika
– Finding coherent structures
– Analyzing particle data on a 5D mesh
– Reduce the 55 GB mesh down to a 440 MB mesh (4× spatially, 2× in one velocity dimension); do this for the last 1,000 time steps ≈ 430 GB (see the coarsening sketch below)
– Pick a few time steps: toroidal/poloidal data
– Then increase granularity
– Then take more time steps
– Then run the whole analysis in parallel, etc.
– Then apply to XGC-1 data (in CPES), etc.
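A minimal coarsening sketch (Python/NumPy; the mesh shape and axis ordering are assumptions, and the factors are chosen so that 4× per spatial axis and 2× in one velocity axis give the ~128× reduction from 55 GB to ~440 MB quoted above):

import numpy as np

def coarsen(mesh, factors):
    """Block-average an N-D mesh by per-axis factors (each axis size must divide evenly)."""
    assert all(n % f == 0 for n, f in zip(mesh.shape, factors))
    # Split each axis into (coarse_index, within_block) pairs, then average the block axes.
    shape = [d for n, f in zip(mesh.shape, factors) for d in (n // f, f)]
    return mesh.reshape(shape).mean(axis=tuple(range(1, 2 * mesh.ndim, 2)))

# Hypothetical (x, y, z, v_parallel, v_perp) binned particle data:
fine = np.random.rand(32, 32, 32, 16, 8)
coarse = coarsen(fine, (4, 4, 4, 2, 1))   # -> shape (8, 8, 8, 8, 8), 128x fewer cells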

Provenance
– Currently: weak coupling (10 MB every few seconds); static/dynamic libraries
– Future: first-principles codes plus models (fast, much longer time simulations): stronger coupling
– Need multiple codes to be coupled strongly
– Need an electronic notebook (find a new name for that)
– Keep track of what is sent where: perhaps profiles (1D volume-averaged slices?)
– Provenance: which machine, what parameters, etc.; what code (must save the code itself) and what libraries
Task: work with Seung-Hoe / Julian Cummings on capturing metadata (XGC) (see the metadata-capture sketch below)
– From makefiles / build environment?
– Get information from job submission
– Get information from PAPI
Task: changes in the dashboard
– Generate movies on the fly (more real-time image generation)
– More analysis (IDL scripts, VisIt, FastBit, run scripts from the dashboard, parallel coordinates, correlation functions, …)
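For the metadata-capture task, a minimal sketch of the kind of per-run provenance record this implies (hypothetical Python; PBS_JOBID assumes a PBS batch system, LOADEDMODULES assumes an environment-modules setup, XGC_INPUT is a made-up pointer to the input deck, and the git call assumes the code lives in a git checkout; PAPI counters would be collected separately from the running code):

import json, os, platform, socket, subprocess, time

def capture_provenance(code_dir, outfile="provenance.json"):
    """Record which machine, which code version, and which job produced a run."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "hostname": socket.gethostname(),
        "platform": platform.platform(),
        "job_id": os.environ.get("PBS_JOBID"),        # batch-system job id (assumption: PBS)
        "input_deck": os.environ.get("XGC_INPUT"),    # hypothetical env var naming the input file
        "code_version": subprocess.run(["git", "-C", code_dir, "rev-parse", "HEAD"],
                                       capture_output=True, text=True).stdout.strip(),
        "loaded_modules": os.environ.get("LOADEDMODULES", "").split(":"),
    }
    with open(outfile, "w") as f:
        json.dump(record, f, indent=2)
    return record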