I/O Optimization for ENZO Cosmology Simulation Using MPI-IO
Jianwei Li, 12/06/2001

Outline of Today's Talk
– Introduction to ENZO Cosmology Simulation
– I/O Analysis for ENZO Cosmology Simulation
– Introduction to MPI-IO
– MPI-IO Approaches for ENZO
– Conclusion and Future Work

ENZO Cosmology Simulation
[Figures: galaxy cluster; dark matter particles (rotating view)]
ENZO is a parallel, 3D cosmology hydrodynamics code with structured adaptive mesh refinement (AMR), using a grid hierarchy and its evolution to represent the simulated space. Parallelism is achieved by domain decomposition of the root grid of the AMR grid hierarchy and distribution of the grids among many processors. The Cosmology Simulation is one of the problem types for the ENZO code: it simulates the formation of cosmic structure by evolving the initial grid and taking "snapshots" during each evolution cycle until some end state is reached.

Cosmology Simulation Adaptive Mesh Refinement

Application Data Structure

I/O Analysis
– I/O is based on the Grid – a C++ object consisting of many large datasets
– Datasets and their partition & distribution:
  Baryon Fields – 3D, (Block, Block, Block)
  Particle Datasets – 1D, irregular

Original I/O Approach
– Serial Read
  Processor 0 reads in all grids
  Processor 0 partitions the grids and distributes their datasets
    Partition grids: based on their boundaries (LeftEdges, RightEdges)
    Distribute Baryon Fields: 3D, (Block, Block, Block)
    Distribute Particle Datasets: 1D, based on the particle positions falling within each partitioned grid's boundary
– Write
  Combine the partitioned topgrids on each processor into a single TopGrid on Processor 0
    Combine Baryon Fields: 3D, (Block, Block, Block)
    Combine Particle Datasets: 1D, based on the order of ParticleNumber
  Processor 0 writes the TopGrid to a single file
  Each processor writes its own subgrids out to separate files

Introduction to MPI-IO
– Part of MPI-2; implemented by ROMIO (in MPICH, HP MPI, SGI MPI)
– High-level interface to a parallel I/O system
– Supports partitioning of file data among processes
– Data partition and access patterns expressed with derived datatypes
– Data sieving – for strided/noncontiguous accesses
– Collective interface – for distributed array accesses
– Supports asynchronous I/O
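A minimal sketch of the interface (not taken from the ENZO source; the file name and buffer size are placeholders), showing a shared-file open and an asynchronous read that can overlap with computation:

#include <mpi.h>

int main(int argc, char **argv)
{
    float buf[1024];
    MPI_File fh;
    MPI_Request req;

    MPI_Init(&argc, &argv);

    /* All processes open the same file collectively. */
    MPI_File_open(MPI_COMM_WORLD, "grid_data", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* Asynchronous read: returns immediately; completion via MPI_Wait. */
    MPI_File_iread(fh, buf, 1024, MPI_FLOAT, &req);

    /* ... computation can overlap the outstanding I/O here ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}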

Data Sieving and Collective I/O
Both reduce the number of I/O requests, and thus reduce the effect of high I/O latency.
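In ROMIO these two optimizations can be tuned through MPI-IO hints at file-open time. The sketch below is illustrative only: the hint keys are standard ROMIO hints, but support and defaults depend on the MPI implementation, and the file name and buffer size are placeholders.

#include <mpi.h>

/* Open a file with data sieving and collective buffering enabled. */
MPI_File open_with_io_hints(MPI_Comm comm, const char *fname)
{
    MPI_Info info;
    MPI_File fh;

    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_ds_read",  "enable");   /* data sieving for reads      */
    MPI_Info_set(info, "romio_cb_read",  "enable");   /* collective buffering, reads */
    MPI_Info_set(info, "cb_buffer_size", "4194304");  /* 4 MB aggregation buffer     */

    MPI_File_open(comm, fname, MPI_MODE_RDONLY, info, &fh);
    MPI_Info_free(&info);   /* the file handle keeps its own copy of the hints */
    return fh;
}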

MPI-IO Approaches for ENZO
– For regular partitions: use derived datatypes and take advantage of MPI-IO's built-in data sieving and collective interface.
– For irregular partitions: use simple block-wise MPI-IO and implement data sieving and two-phase I/O at the application level.

Implementation (1)
1. New Simulation: Read Grid (Datasets)
– Baryon Fields: 3D, (Block, Block, Block)
  View the file data as a 3D array and the requested part of the dataset as its subarray, defined by the offsets and sizes in each dimension; then use an MPI collective read (see the sketch below).
– Particle Datasets: 1D, irregular (distributed by ParticlePosition)
  Partition the datasets block-wise and perform contiguous MPI reads; check ParticlePosition against the grid edges to determine which processor each particle belongs to, then communicate to send each particle to its right processor, i.e., the processor whose grid boundary the particle falls within.
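The collective subarray read for the baryon fields can be sketched as follows (assuming float data and placeholder function/argument names; this is not the actual ENZO routine):

#include <mpi.h>

/* Each process reads its (Block, Block, Block) portion of a 3D dataset
   using a subarray file view and a collective read. */
void read_baryon_field(MPI_Comm comm, const char *fname,
                       int gsizes[3],   /* global grid dimensions        */
                       int lsizes[3],   /* local block dimensions        */
                       int starts[3],   /* global offsets of local block */
                       float *local_data)
{
    MPI_Datatype filetype;
    MPI_File fh;

    /* Derived datatype describing this process's subarray of the file data. */
    MPI_Type_create_subarray(3, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_FLOAT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(comm, fname, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    /* File view: only this process's subarray is visible to it. */
    MPI_File_set_view(fh, 0, MPI_FLOAT, filetype, "native", MPI_INFO_NULL);

    /* Collective read: ROMIO can merge and optimize requests across processes. */
    MPI_File_read_all(fh, local_data, lsizes[0] * lsizes[1] * lsizes[2],
                      MPI_FLOAT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}

The same view-plus-collective pattern applies to the collective write of the TopGrid baryon fields, with MPI_File_write_all in place of MPI_File_read_all.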

Implementation (2)
2. Write TopGrid (Datasets)
– Baryon Fields: 3D, (Block, Block, Block)
  View the file data as a 3D array and the requested part of the dataset as its subarray, defined by the offsets and sizes in each dimension; then use an MPI collective write.
– Particle Datasets: 1D, irregular (combined in the order of ParticleNumber)
  Perform a parallel sort by ParticleNumber and redistribute the particle data, sending each particle to the processor whose ParticleNumber bounds it falls between; then perform a block-wise MPI write (see the sketch below).
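The redistribution and block-wise write can be sketched as below. This is a simplified, single-attribute version with hypothetical names, not the ENZO code; dest[i] is assumed to already hold the destination rank of particle i (e.g. determined from its ParticleNumber range).

#include <mpi.h>
#include <stdlib.h>

void redistribute_and_write(MPI_Comm comm, const char *fname,
                            double *attr, int nlocal, int *dest)
{
    int nprocs, rank;
    MPI_Comm_size(comm, &nprocs);
    MPI_Comm_rank(comm, &rank);

    int *sendcnt = calloc(nprocs, sizeof(int));
    int *recvcnt = malloc(nprocs * sizeof(int));
    int *sdispl  = malloc(nprocs * sizeof(int));
    int *rdispl  = malloc(nprocs * sizeof(int));
    int *fill    = calloc(nprocs, sizeof(int));

    /* Count how many particles go to each destination rank. */
    for (int i = 0; i < nlocal; i++) sendcnt[dest[i]]++;
    MPI_Alltoall(sendcnt, 1, MPI_INT, recvcnt, 1, MPI_INT, comm);

    sdispl[0] = rdispl[0] = 0;
    for (int p = 1; p < nprocs; p++) {
        sdispl[p] = sdispl[p-1] + sendcnt[p-1];
        rdispl[p] = rdispl[p-1] + recvcnt[p-1];
    }
    int nrecv = rdispl[nprocs-1] + recvcnt[nprocs-1];

    /* Bucket the particle attribute by destination rank, then exchange. */
    double *sendbuf = malloc(nlocal * sizeof(double));
    double *recvbuf = malloc(nrecv  * sizeof(double));
    for (int i = 0; i < nlocal; i++)
        sendbuf[sdispl[dest[i]] + fill[dest[i]]++] = attr[i];

    MPI_Alltoallv(sendbuf, sendcnt, sdispl, MPI_DOUBLE,
                  recvbuf, recvcnt, rdispl, MPI_DOUBLE, comm);

    /* Block-wise collective write: each rank's particles form a
       contiguous range of the 1D dataset in the shared file. */
    long long mine = nrecv, offset = 0;
    MPI_Exscan(&mine, &offset, 1, MPI_LONG_LONG, MPI_SUM, comm);
    if (rank == 0) offset = 0;   /* MPI_Exscan leaves rank 0's result undefined */

    MPI_File fh;
    MPI_File_open(comm, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, (MPI_Offset)offset * (MPI_Offset)sizeof(double),
                          recvbuf, nrecv, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(sendcnt); free(recvcnt); free(sdispl); free(rdispl);
    free(fill); free(sendbuf); free(recvbuf);
}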

Implementation (3)
3. Write Subgrids (Datasets)
– All processors write their own subgrids into the same single file.
4. Restart: Read Grids (Datasets)
– The TopGrid is read in the same way as in the new-simulation read.
– Subgrids are read by processors in a round-robin manner (see the sketch below).
– Since all subgrids are written in a single file, further improvement is possible with I/O balance, pre-fetching, …
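One possible form of the round-robin restart read (placeholder names; the per-subgrid offsets and element counts would come from the file's metadata):

#include <mpi.h>

/* Subgrid i is read by rank i % nprocs, using independent
   explicit-offset reads on the shared subgrid file. */
void read_subgrids_round_robin(MPI_Comm comm, const char *fname,
                               int nsubgrids,
                               MPI_Offset *offset, /* byte offset of each subgrid  */
                               int *count,         /* number of floats per subgrid */
                               float **buffers)    /* buffers[i] sized for subgrid i */
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    MPI_File fh;
    MPI_File_open(comm, fname, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    for (int i = rank; i < nsubgrids; i += nprocs)
        MPI_File_read_at(fh, offset[i], buffers[i], count[i],
                         MPI_FLOAT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
}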

Conclusion and Future Work
ENZO I/O can be effectively improved by using MPI-IO.
Choice of approach by dataset dimension and distribution pattern:
– 1-Dimensional, regular: strided or contiguous derived datatypes to define the data partition.
– 1-Dimensional, irregular: block-wise partition plus communication and/or data sieving.
– Multi-Dimensional, regular: subarray or other more complicated derived datatypes may be used.
– Multi-Dimensional, irregular: depending on the pattern, choose a simple I/O pattern plus communication.
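As an illustration of the 1-D regular case above, a strided derived datatype can describe a block-cyclic partition and be installed as a file view. The sketch assumes float data and illustrative names, and is not taken from the ENZO code:

#include <mpi.h>

/* Give each process a view of nblocks blocks of blocklen floats,
   laid out cyclically across nprocs processes in the file. */
void set_cyclic_view(MPI_File fh, int nprocs, int rank,
                     int nblocks, int blocklen)
{
    MPI_Datatype filetype;

    /* nblocks blocks of blocklen floats, separated by a stride of
       nprocs*blocklen floats (one block per process per cycle). */
    MPI_Type_vector(nblocks, blocklen, nprocs * blocklen,
                    MPI_FLOAT, &filetype);
    MPI_Type_commit(&filetype);

    /* Rank r starts blocklen*r floats into the dataset. */
    MPI_File_set_view(fh, (MPI_Offset)rank * blocklen * sizeof(float),
                      MPI_FLOAT, filetype, "native", MPI_INFO_NULL);
    MPI_Type_free(&filetype);  /* the file view keeps its own reference */
}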

Future Work
– Using MDMS on the ENZO application
– Useful metadata:
  Dataset rank and dimensions
  Dataset partition & access patterns
  Offsets and sizes for each access of the datasets
  Order / loop access of the datasets