
1 SDM Center End-to-end data management capabilities in the GPSC & CPES SciDACs: Achievements and Plans
SDM AHM, December 11, 2006
Scott A. Klasky, End-to-End Task Lead, Scientific Computing Group, ORNL

2 SDM Center Outline
- Overview of GPSC activities
  - The GTC and GEM codes
  - On the path to petascale computing
  - Data management challenges for GTC
- Overview of CPES activities
  - The XGC and M3D codes
  - Code coupling
  - Workflow solutions
- ORNL's end-to-end activities
  - Asynchronous I/O
  - Dashboard efforts

3 SDM Center It's all about the enabling technologies…
[Diagram after D. Keyes: applications drive CS and math; enabling technologies respond.]
It's all about the data. It's all about the features, which lead us to scientific discovery!

4 SDM Center GPSC gyrokinetic PIC codes used for studying microturbulence in the plasma core
- GTC (Z. Lin et al., Science 281, p. 1835, 1998)
  - Intrinsically global 3D nonlinear gyrokinetic PIC code
  - All calculations done in real space
  - Scales to > 30,000 processors
  - Delta-f method
  - Recently upgraded to fully electromagnetic
- GEM (Y. Chen & S. Parker, JCP, in press 2006)
  - Fully electromagnetic nonlinear delta-f code
  - Split-weight scheme implementation of kinetic electrons
  - Multi-species
  - Uses Fourier decomposition of the fields in the toroidal and poloidal directions (wedge code)
- What about PIC noise? "It is now generally agreed that these ITG simulations are not being influenced by particle noise. Noise effects on ETG turbulence remain under study but are beginning to seem of diminishing relevance." (PSACI-PAC)

5 SDM Center GTC code performance
- Increased output because of:
  - Asynchronous, metadata-rich I/O
  - Workflow automation
  - More analysis services in the workflow
[Chart: historical prediction of GTC data production across the Cray T3E, IBM SP3, Cray X1E, Cray XT3, and Cray Baker.]

6 SDM Center GTC: Towards a Predictive Capability for ITER Plasmas (Petascale Science)
Investigate important physics problems for ITER plasmas, namely the effect of size and isotope scaling on core turbulence and transport (heat, particle, and momentum). These studies will focus on the principal causes of turbulent transport in tokamaks, for example electron and ion temperature gradient (ETG and ITG) drift instabilities and collisionless and collisional (dissipative) trapped electron modes (CTEM and DTEM), and ways to mitigate these phenomena.

7 SDM Center Impact: How does turbulence cause heat, particles, and momentum to escape from plasmas?
- Investigation of ITER confinement properties requires a dramatic step from 10 MW for 1 second to the projected 500 MW for 400 seconds. The race is on to improve predictive capability before ITER comes on line (projected 2015).
- A more realistic assessment of ignition margins requires more accurate calculations of steady-state temperature and density profiles for ions, electrons, and helium ash.
- The success of ITER depends in part on its ability to operate in a gyroBohm scaling regime, which must be demonstrated computationally.
- Key for ITER is a fundamental understanding of the effect of deuterium-tritium isotope presence (isotope scaling) on turbulence.

8 SDM Center Calculation Details
- Turbulent transport studies will be carried out using the present GTC code, which uses a grid at the scale of the ion gyroradius. The electron particle transport physics requires incorporating the electron skin depth in the code for the TEM physics, which can be an order of magnitude smaller than the ion gyroradius.
- A 10,000 x 10,000 x 100 grid and 1 trillion particles (100 particles/cell) are estimated to be needed (700 TB per scalar field; 25 TB of particle data per time step).
- For the 250 TF machine, a 2D domain decomposition (DD) is necessary for electrostatic simulation of an ITER-size machine (a/rho > 1000) with kinetic electrons. (W. Lee)

9 SDM Center GTC Data Management Issues
Problem: move data from NERSC to ORNL and then to PPPL as the data is being generated.
- Transfer from NERSC to ORNL: 3000 timesteps, 800 GB, within the simulation run (34 hours).
- Convert each file to an HDF5 file.
- Archive files in 4 GB chunks to HPSS at ORNL.
- Move a portion of the HDF5 files to PPPL.
Solution (Norbert Podhorszki): a watch / transfer / convert / archive pipeline, sketched below.
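
Below is a minimal sketch, not the production workflow, of the watch / transfer / convert part of that pipeline. The directory paths, host names, and the convert_to_hdf5 helper invoked over ssh are illustrative assumptions.

```python
import glob
import os
import subprocess
import time

WATCH_DIR = "/scratch/gtc/run042"                   # simulation output directory (assumed)
ORNL_DEST = "user@ewok.ccs.ornl.gov:/tmp/work/gtc"  # staging area at ORNL (assumed)

seen = set()
while True:                                         # "watch": poll for new timestep files
    for path in sorted(glob.glob(os.path.join(WATCH_DIR, "*.bin"))):
        if path in seen:
            continue
        # transfer the newly closed file from NERSC to ORNL
        subprocess.run(["scp", path, ORNL_DEST], check=True)
        # trigger conversion on the ORNL side (convert_to_hdf5 is a placeholder script)
        subprocess.run(["ssh", "ewok.ccs.ornl.gov",
                        "./convert_to_hdf5 " + os.path.basename(path)], check=True)
        seen.add(path)
    time.sleep(30)                                  # poll interval between checks
```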

10 SDM Center GTC Data Management Achievements
- In the process of removing ASCII, HDF5, and NetCDF output from the code and replacing them with binary (parallel) I/O carrying metadata tags.
- Conversion to HDF5 during the simulation on a 'cheaper' resource (see the sketch below).
- One XML file describes all files output by GTC, so analysis only has to work with one file from the entire simulation.
- Large buffered writes; asynchronous I/O when it becomes available.
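
The conversion step could look roughly like the following sketch, which reads a hypothetical XML description of the binary output and writes an HDF5 file with the metadata carried along as attributes; the XML layout shown is an assumption, not the actual GTC schema.

```python
import xml.etree.ElementTree as ET

import h5py
import numpy as np

def convert(xml_path, h5_path):
    """Convert tagged binary output, described by an XML file, into one HDF5 file."""
    root = ET.parse(xml_path).getroot()
    with h5py.File(h5_path, "w") as out:
        # assumed layout: <variable name=... dtype=... shape=... file=... units=.../>
        for var in root.iter("variable"):
            shape = tuple(int(s) for s in var.get("shape").split(","))
            data = np.fromfile(var.get("file"), dtype=var.get("dtype")).reshape(shape)
            dset = out.create_dataset(var.get("name"), data=data)
            dset.attrs["units"] = var.get("units", "unknown")  # carry metadata tags along
```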

11 SDM Center The data-in-transit problem
- Particle data needs to be examined occasionally: 1 trillion particles = 25 TB/hour (demand < 2% I/O overhead). Need 356 GB/sec to handle the burst (7 GB/sec aggregate).
- We can't store all of this data: (2.3 PB/simulation) x 12 simulations/year ≈ 25 PB. Need to analyze on the fly, on another system, rather than saving all of the data to permanent storage.
- Scalar data needs to be analyzed during the simulation. Computational experiments are too costly to let a simulation run and ignore it (estimated cost = $500K/simulation on a Pflop machine; GTC is already at 0.5M CPU hours/simulation and approaching 3M CPU hours on a 250 Tflop system).
- Need to compare new simulations with older simulations and experimental data; metadata needs to be stored in databases.
The arithmetic behind these rates is sketched below.
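
A quick back-of-the-envelope check of the quoted rates (assuming binary units, 1 TB = 1024 GB):

```python
# particle dump rate: 1 trillion particles ~ 25 TB/hour
tb_per_hour = 25
aggregate_gbps = tb_per_hour * 1024 / 3600
print(f"aggregate rate ~ {aggregate_gbps:.1f} GB/s")   # ~7 GB/s

# demand: I/O takes < 2% of wall-clock time, so bursts must be ~50x faster
io_overhead = 0.02
burst_gbps = aggregate_gbps / io_overhead
print(f"burst rate     ~ {burst_gbps:.0f} GB/s")       # ~356 GB/s

# 2.3 PB/simulation x 12 simulations/year
pb_per_year = 2.3 * 12
print(f"yearly volume  ~ {pb_per_year:.1f} PB")        # same order as the ~25 PB quoted above
```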

12 SDM Center Workflow
- Simulation monitoring: images generated from the workflow. The user sets the viewing angles and the min/max, and the workflow produces the images (see the sketch below).
- Still need to put this into our everyday use.
- Really need to identify features as the simulation is running, and trace features back to earlier timesteps once they are known (where are they born?).
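
A hedged sketch of the image-generation step: render one 2D slice per timestep with the user-supplied color limits (the "min/max" above) and write an image file the dashboard can pick up. The field layout, file names, and use of matplotlib are assumptions, not the actual workflow actor.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")             # headless rendering on the analysis cluster
import matplotlib.pyplot as plt

def render_timestep(field, step, vmin, vmax, out_dir="images"):
    """Render one 2D scalar field (a numpy array) with fixed color limits."""
    fig, ax = plt.subplots()
    im = ax.imshow(field, vmin=vmin, vmax=vmax, origin="lower", cmap="RdBu_r")
    fig.colorbar(im, ax=ax, label="potential")
    ax.set_title(f"timestep {step}")
    # PNG here for portability; the dashboard slides mention JPEG output
    fig.savefig(f"{out_dir}/potential_{step:05d}.png", dpi=100)
    plt.close(fig)

render_timestep(np.random.rand(256, 256), step=42, vmin=0.0, vmax=1.0)
```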

13 SDM Center 5D Data Analysis - 1
- It is common in fusion to look at 2D puncture plots. To glean insight, we need to be able to detect 'features'.
- We need a temporal perspective, involving the grouping of similar items, to possibly identify interesting new plasma structures (within this 5D phase space) at different stages of the simulations.
[Figure: 2D phase space.]

14 SDM Center 5D Data Analysis - 2
- Our turbulence covers the global volume, as opposed to some isolated (local) regions.
- The spectral representation of the turbulence evolves in time by moving to longer wavelengths.
- Understanding the key nonlinear dynamics here involves extracting relevant information about particle behavior from the data sets.
- The trajectories of these particles are followed self-consistently in phase space: tracking of the spatial coordinates and the velocities.
- The self-consistent interaction between the fields and the particles is most important when viewed in velocity space, because particles of specific velocities will resonate with waves in the plasma to transfer energy.
- Structures in velocity space could potentially be used in the future development of multi-resolution compression methods. (W. Tang)

15 SDM Center Data Management Challenge
- Decomposition shows transient wave components in time.
- A new discovery was made by Z. Lin in large ETG calculations: we were able to see radial flow across individual eddies.
- The challenge: track the flow across the individual eddies and give statistical measurements of the velocity of the flow, using local eddy-motion density (PCA) to examine the data. A hard problem, for lots of reasons! (Ostrouchov, ORNL)
A PCA decomposition of this kind is sketched below.
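
For illustration only, a minimal PCA decomposition (via SVD) of a stack of field snapshots, of the kind such an eddy-tracking analysis might start from; the snapshot-matrix layout is an assumption.

```python
import numpy as np

def pca_modes(snapshots, n_modes=5):
    """snapshots: (n_timesteps, n_gridpoints) array of the flattened 2D field.

    Returns the leading spatial modes, their temporal amplitudes, and the mean field.
    """
    mean = snapshots.mean(axis=0)
    centered = snapshots - mean
    # rows of vt are spatial modes; s**2 gives the variance captured by each mode
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    amplitudes = u[:, :n_modes] * s[:n_modes]   # temporal coefficients per mode
    return vt[:n_modes], amplitudes, mean

modes, amps, mean = pca_modes(np.random.rand(200, 64 * 64))
```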

16 SDM Center Physics in the tokamak plasma edge
- Plasma turbulence
- Turbulence suppression (H-mode)
- Edge localized modes and the ELM cycle
- Density and temperature pedestal
- Divertor and separatrix geometry
- Plasma rotation
- Neutral collisions
[Figures: edge turbulence in NSTX (at 100,000 frames/s); diverted magnetic field.]

17 SDM Center XGC code
XGC-0 self-consistently includes:
- 5D ion neoclassical dynamics, realistic magnetic geometry and wall shape
- Conserving plasma collisions (Monte Carlo)
- 4D Monte Carlo neutral atoms with a recycling coefficient
- Conserving MC collisions, ion orbit loss, self-consistent E_r
- Neutral beam source, magnetic ripple, heat flux from the core
XGC-1 includes:
- Particle source from neutral ionization
- Full-f ions, electrons, and neutrals
- Gyrokinetic Poisson equation for the neoclassical and turbulent electric field
- Full-f electron kinetics for neoclassical physics
- Adiabatic electrons for electrostatic turbulence
- General 2D field solver in a dynamically evolving 3D B field

18 SDM Center Neoclassical potential and flow of edge plasma from XGC-1
[Figures: electric potential; parallel flow and particle positions.]

19 SDM Center XGC-MHD coupling plan (blue: developed; red: to be developed)
- Phase 0, simple coupling (with M3D and NIMROD): XGC-0 grows the pedestal along the neoclassical root; MHD checks instability and crashes the pedestal. The same with XGC-1 and XGC-2.
- Phase 1, kinetic coupling: MHD performs the crash; XGC supplies closure information to MHD during the crash.
- Phase 2, advanced coupling: XGC performs the crash; M3D supplies the B-field crash information to XGC during the crash.
Needs: real-time visualization to help monitor/debug these simulations; better integration with interactive debugging sessions; the ability to look at derived quantities from raw data.

20 SDM Center XGC-M3D code coupling: end-to-end system
[Diagram: code coupling framework with Kepler. XGC runs on the Cray XT3 on 160 processors; M3D runs on 64 processors, with monitoring routines attached. Data flows at 40 Gb/s through data replication, post-processing, and data archiving, with user monitoring and ubiquitous, transparent data access via logistical networking.]

21 SDM Center Code Coupling Framework
[Diagram: XGC1, R2D0, M3DOMP, and M3DMPP exchanging data through Lustre; transfers use bbcp first, then Portals with sockets.]
Necessary steps for initial completion:
- R2D0 and M3DOMP become a service.
- M3DMPP is launched from Kepler once M3DOMP returns a failure condition.
- XGC1 stops when M3DMPP is launched.
- Get this incorporated into Kepler.
A sketch of this control logic appears below.
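
A hypothetical sketch of that control logic, outside Kepler: poll a flag from the M3DOMP stability check and, on a failure condition, stop XGC1 and launch M3DMPP. The flag file, executable names, and the aprun process counts (160/64, matching the previous slide) are placeholders, not the actual coupling scripts.

```python
import pathlib
import subprocess
import time

# flag file written by the M3DOMP stability service when it reports a failure (assumed)
UNSTABLE_FLAG = pathlib.Path("/lustre/coupling/m3domp_unstable")

def run_coupling(xgc1_proc):
    while xgc1_proc.poll() is None:       # XGC1 still running
        if UNSTABLE_FLAG.exists():        # M3DOMP returned a failure condition
            xgc1_proc.terminate()         # XGC1 stops when M3DMPP is launched
            subprocess.run(["aprun", "-n", "64", "./m3dmpp"], check=True)
            break
        time.sleep(60)                    # polling interval (placeholder)

xgc1 = subprocess.Popen(["aprun", "-n", "160", "./xgc1"])
run_coupling(xgc1)
```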

22 SDM Center Kepler workflow framework
- Kepler is developed by the SDM Center; it is an adaptation of the UC Berkeley tool Ptolemy.
- Workflows can be composed of sub-workflows.
- Uses an event-based "director" and "actors" methodology.
- Features in Kepler relevant to CPES: launching components (ssh, command line); an execution logger to keep track of runs; data movement via SABUL, GridFTP, logistical networking (future), and data streaming (future).

23 SDM Center Original view of the CPES workflow (a typical scenario). What's wrong with this picture?
[Diagram: the simulation program (MPI) on Seaborg at NERSC iterates over timesteps, writing each timestep to a disk cache. The Kepler workflow engine drives an SRM DataMover to move the files to HPSS at ORNL and to a disk cache on ewok at ORNL, where an analysis program processes each timestep and the CPES vis tool visualizes the analyzed data. Software components are shown layered over the hardware and OS.]

24 SDM Center What's wrong with this picture?
- Scientists running simulations will NOT use Kepler to schedule jobs on supercomputers (concern about dependency on another system), but Kepler still needs to track when files are generated so it can move them: need a "FileWatcher" actor in Kepler.
- ORNL permits only One-Time-Password (OTP) logins: need an OTP login actor in Kepler.
- Only SSH can be used to invoke jobs, including data copying; GridFTP cannot be used (it requires GSI security support at all sites): need an ssh-based DataMover actor in Kepler (scp, bbcp, …).
- HPSS does not like a large number of small files: need an actor in Kepler to TAR files before archiving.
Minimal sketches of the mover and TAR-before-archive steps follow.
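
Minimal sketches of an scp-based mover and a tar-before-HPSS step; the host names and file names are illustrative, the 4 GB bundle size is taken from the GTC slides, and the hsi client is assumed to be available on the archiving node.

```python
import os
import subprocess

def scp_move(files, dest="user@ewok.ccs.ornl.gov:/stage/cpes"):
    """ssh-based data mover: copy each generated file to the ORNL staging area."""
    for f in files:
        subprocess.run(["scp", f, dest], check=True)

def tar_and_archive(files, limit=4 * 1024**3):
    """Bundle many small files into ~4 GB tarballs before pushing them to HPSS."""
    batch, size, idx = [], 0, 0
    for i, f in enumerate(files):
        batch.append(f)
        size += os.path.getsize(f)
        if size >= limit or i == len(files) - 1:
            bundle = f"cpes_{idx:04d}.tar"
            subprocess.run(["tar", "cf", bundle] + batch, check=True)
            subprocess.run(["hsi", "put", bundle], check=True)  # archive the bundle, not the small files
            batch, size, idx = [], 0, idx + 1
```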

25 SDM Center New actors in the CPES workflow to overcome these problems
[Diagram: the simulation program (MPI) on Seaborg at NERSC and the Kepler workflow engine start as two independent processes. Kepler's new actors: an OTP login actor (login at ORNL), a FileWatcher actor (detect when files are generated in the NERSC disk cache), an scp file-copier actor (move files to the ewok disk cache at ORNL), a tar'ing actor, and a local archiving actor (archive files to HPSS at ORNL).]

26 SDM Center Future SDM work in CPES
- Workflow automation of the coupling problem: critical for code debugging; necessary to track provenance so coupling experiments can be 'replayed'. Q: do we stream data or write files?
- Dashboard for monitoring simulations.
- Fast SRM movement of data from NERSC to ORNL.

27 SDM Center Asynchronous petascale I/O for data in transit
- High-performance I/O: asynchronous, with managed buffers, respecting firewall constraints.
- Enable dynamic control with flexible MxN operations.
- Transform using the shared-space framework (Seine).
[Diagram: user applications and other programming paradigms sit on the Seine coupling framework interface, built from shared-space management, load balancing, directory and storage layers, and a communication layer (buffer management) over the operating system.]
A conceptual sketch of the buffered, asynchronous write path follows.
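
A conceptual sketch, not the RDMA implementation, of buffered asynchronous output: the compute loop hands each timestep to a managed, bounded buffer and a background writer drains it, so the simulation rarely blocks on I/O. File names and array sizes are placeholders.

```python
import queue
import threading

import numpy as np

buffers = queue.Queue(maxsize=4)              # bounded: back-pressure if the writer falls behind

def writer():
    while True:
        step, data = buffers.get()
        if data is None:                      # sentinel: all timesteps flushed, stop the writer
            break
        data.tofile(f"field_{step:05d}.bin")  # stand-in for the RDMA/async transport

t = threading.Thread(target=writer)
t.start()

for step in range(100):                       # the compute loop keeps running while I/O drains
    field = np.random.rand(512, 512)          # placeholder for one timestep of output
    buffers.put((step, field.copy()))         # hand off a copy to the managed buffer

buffers.put((None, None))                     # signal the writer to finish
t.join()
```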

28 SDM Center Current Status: Asynchronous I/O
- Currently working on the XT3 development machine (rizzo.ccs.ornl.gov).
- Current implementation is based on an RDMA approach.
- Current benchmarks indicate 0.1% overhead when writing 14 TB/hour on jaguar.ccs.ornl.gov.
- Looking at changes in the ORNL infrastructure to deal with these issues: roughly 10% of the machine will be carved off for real-time analysis (100 Tflops for real-time analysis, with TB/sec bandwidth).

29 SDM Center SDM/ORNL Dashboard: Current Status
- Step 1: monitor the ORNL and NERSC machines.
- Log in at https://ewok-web.ccs.ornl.gov/dev/rbarreto/SDMP/WebContent/SdmpApp/rosehome.php (uses OTP).
- Working to pull out users' jobs.
- The workflow will need to move data (JPEG images and XML metadata files) to the ewok web disk.

30 SDM Center Dashboard: future
- Current and old simulations will be accessible on the webpage.
- The schema for a simulation will be determined by an XML file the simulation produces (a parsing sketch follows).
- Pictures and simple metadata (min/max, …) are displayed on the webpage.
- Later we will allow users to 'control' their simulations.
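
A sketch of how the dashboard might parse such a simulation-produced XML file into the rows it displays next to the images; the tag and attribute names are assumptions, not the final schema.

```python
import json
import xml.etree.ElementTree as ET

def metadata_rows(xml_path):
    """Pull variable name, timestep, min/max, and image path from the simulation's XML."""
    rows = []
    # assumed layout: <variable name=... step=... min=... max=... image=.../>
    for var in ET.parse(xml_path).getroot().iter("variable"):
        rows.append({
            "name": var.get("name"),
            "step": int(var.get("step", 0)),
            "min": float(var.get("min")),
            "max": float(var.get("max")),
            "image": var.get("image"),   # JPEG produced by the workflow
        })
    return rows

# e.g. dump to JSON for the web page on ewok-web to render
print(json.dumps(metadata_rows("gtc_run042.xml"), indent=2))
```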

31 SDM Center The End-to-End Framework
[Diagram: framework components: SRM, logistical networking (LN), asynchronous NxM streaming, workflow automation, applied math, applications, data monitoring, CCA, and viz/dashboard, all tied together by metadata-rich output from components.]

32 SDM Center Plans
- Incorporate workflow automation into everyday work.
- Incorporate visualization services into the workflow.
- Incorporate asynchronous I/O (data streaming) techniques.
- Unify schemas across the fusion SciDAC PIC codes.
- Further develop workflow automation for code coupling: will need dual-channel Kepler actors to understand data streams, and certificates to deal with OTP in workflow systems.
- Autonomics in workflow automation; easy to use for non-developers!
- Dashboard: simulation monitoring (via a push method) available end of Q2 2007; then simulation control!

