Using HDF5 in WRF. Part of MEAD, an Alliance expedition

Contents
- Introduction to WRF
- Introduction to parallelism of WRF
- Introduction to WRF IO APIs
- WRF-HDF5 file structure
- WRF-HDF5 sequential mode
- WRF-HDF5 parallel design and potential problems

WRF (Weather Research and Forecasting Model)
- A limited-area weather model for research and prediction
- Combines the NCAR/PSU MM5 (research model, dynamics) with the NCEP ETA (operational model, physics)
- Expected to become the most popular limited-area weather model in the country and worldwide
- Many potential users
- Most code is in Fortran 77 and Fortran 90

(Figure from the WRF tutorial.)

WRF software design schematic (adapted from the WRF software design document).

WRF multi-layer domain decomposition. Model domains are decomposed for parallelism at two levels:
- Patch: section of the model domain allocated to a distributed-memory node.
- Tile: section of a patch allocated to a shared-memory processor within a node; this is also the scope of a model-layer subroutine.
Distributed-memory parallelism is over patches; shared-memory parallelism is over tiles within patches (a minimal loop sketch follows below). A single version of the code runs efficiently on:
- Distributed memory
- Shared memory
- Clusters of SMPs
- Vector machines and microprocessors
(Figure: logical domain; one patch divided into multiple tiles; inter-processor communication. From the WRF tutorial.)
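The two-level structure can be pictured as one MPI task per patch with an OpenMP loop over the tiles inside it. The following is a minimal sketch under that assumption; the tile count and the solve_tile routine are hypothetical illustrations, not WRF code.

PROGRAM two_level_decomp
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: num_tiles = 4              ! tiles per patch (assumed)
  INTEGER :: ierr, my_patch, num_patches, it

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, my_patch, ierr)     ! one patch per MPI task
  CALL MPI_Comm_size(MPI_COMM_WORLD, num_patches, ierr)

  !$OMP PARALLEL DO PRIVATE(it)
  DO it = 1, num_tiles                             ! shared-memory parallelism over tiles
     CALL solve_tile(my_patch, it)                 ! hypothetical model-layer subroutine
  END DO
  !$OMP END PARALLEL DO

  CALL MPI_Finalize(ierr)

CONTAINS

  SUBROUTINE solve_tile(p, t)
    INTEGER, INTENT(IN) :: p, t
    ! dynamics/physics for tile t of patch p would go here
  END SUBROUTINE solve_tile

END PROGRAM two_level_decomp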

(Figure from the WRF tutorial.)

Two-level decomposition schematic: the logical domain is divided into 12 patches (Level 1, distributed memory, MPI etc.), and each patch into 4 tiles (Level 2, shared memory). We are interested in the distributed-memory level.

Goal of the I/O API
- Provide I/O infrastructure for both popular operational and research formats
  - Operational: GRIB, BUFR
  - Research: NetCDF, HDF
- "Portable I/O": treat each I/O package as an external package that is easy to turn on or off

WRF I/O Schematic

WRF data characteristics
- Many HDF5-equivalent datasets: the model currently outputs >100 datasets, each under 1 MB
- Typical dataset: a 4-D real array whose time dimension is extensible
- Many output files, but each file is not big
- Weak dimensional-scale requirement: no real dimensional-scale data
- Datasets may need to be extensible
- WRF has history, restart, initial, and boundary dataset types; each WRF dataset type maps to one HDF5 group

First design of the schematic HDF5 file structure for WRF output:
- The root group "/" contains an HDF5 group per WRF dataset (e.g. WRF_DATA) plus a Dim_group.
- The WRF dataset group holds one HDF5 dataset per WRF field (TKE, RHO_U, etc.).
- Dim_group holds the dimension datasets (Time, Top_bot, Lat, Lon).
- Solid lines in the schematic: HDF5 datasets or sub-groups that are members of the parent HDF5 group.
- Dashed lines: the association of one HDF5 object with another; in HDF5 terms, an object reference.
This may change depending on the implementation of dimension scales.

An example of WRF-HDF5 output (h5dump):

HDF5 "wrfrst_01_000002" {
GROUP "/" {
   GROUP "DATASET=RESTART" {
      DATASET "ACSNOM" {
         DATATYPE  H5T_IEEE_F32BE
         DATASPACE SIMPLE { ( 1, 80, 40 ) / ( H5S_UNLIMITED, 80, 40 ) }
      }
      ...
   }
}
}
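To show how such an extensible field could be produced, here is a minimal sketch with the HDF5 Fortran API; the file name, group layout omission, and chunk choice are illustrative assumptions, not the actual WRF module.

PROGRAM create_extensible_field
  USE hdf5
  IMPLICIT NONE
  INTEGER(HID_T)   :: file_id, space_id, dcpl_id, dset_id
  INTEGER          :: hdferr
  ! Dims in Fortran order; h5dump prints the reversed (C) order, i.e. ( 1, 80, 40 )
  INTEGER(HSIZE_T) :: dims(3)  = (/ 40, 80, 1 /)
  INTEGER(HSIZE_T) :: chunk(3) = (/ 40, 80, 1 /)     ! one time level per chunk (assumed)
  INTEGER(HSIZE_T) :: maxdims(3)

  CALL h5open_f(hdferr)
  maxdims = (/ dims(1), dims(2), H5S_UNLIMITED_F /)  ! unlimited time dimension

  CALL h5fcreate_f("wrfrst_example.h5", H5F_ACC_TRUNC_F, file_id, hdferr)
  CALL h5screate_simple_f(3, dims, space_id, hdferr, maxdims)

  ! Extensible datasets require chunked storage
  CALL h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, hdferr)
  CALL h5pset_chunk_f(dcpl_id, 3, chunk, hdferr)

  CALL h5dcreate_f(file_id, "ACSNOM", H5T_IEEE_F32BE, space_id, dset_id, &
                   hdferr, dcpl_id)

  CALL h5dclose_f(dset_id, hdferr)
  CALL h5pclose_f(dcpl_id, hdferr)
  CALL h5sclose_f(space_id, hdferr)
  CALL h5fclose_f(file_id, hdferr)
  CALL h5close_f(hdferr)
END PROGRAM create_extensible_field

Each new timestep would then extend the time dimension with h5dset_extent_f (h5dextend_f in older releases) before writing.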

Must the dataset be extensible? Extensibility seems to fit the description of the model best (WRF places no limit on the number of timestamps). However, according to J.M., the WRF developer, WRF imposes no requirements at all on how the data are stored in the datasets.

WRF I/O APIs
- Common APIs. Example:
  wrf_open_for_read ( DatasetName, Comm, IOComm_io, SysDepInfo, DataHandle, Status )
- I/O packages are treated as external packages. Example:
  ext_open_hdf5_for_read ( DatasetName, Comm, IOComm_io, SysDepInfo, DataHandle, Status )
- May become common I/O APIs for other atmospheric/ocean models such as ROMS

A piece of example code:

SUBROUTINE wrf_open_for_read ( FileName, Comm_compute, Comm_io, SysDepInfo, &
                               DataHandle, Status )
  USE module_state_description
  ......
  SELECT CASE ( use_package( io_form ) )
#ifdef NETCDF
    CASE ( IO_NETCDF )
      CALL ext_ncd_open_for_read
#endif
#ifdef HDF5
    CASE ( IO_HDF5 )
      CALL ext_hdf5_open_for_read
#endif
    ......
  END SELECT

I/O quilt server
- Does not use MPI-IO
- Uses separate I/O nodes to generate output asynchronously while computation continues on the computing nodes

I/O quilt server schematic. In the illustrated figure there are 17 processors in total:
- 12 processors for computing tasks, organized into 4 computing groups
- 5 processors for I/O tasks, forming 1 I/O server group
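One way to picture this grouping is to split MPI_COMM_WORLD into a computing communicator and an I/O server communicator. The sketch below assumes that layout (the last ranks become I/O servers); it is not the WRF quilting code, and the names and rank assignment are illustrative.

PROGRAM split_compute_io
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: n_io_tasks = 5             ! I/O server group size (assumed)
  INTEGER :: ierr, world_rank, world_size, color, local_comm, local_rank

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
  CALL MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)

  ! color 0 = computing task, color 1 = I/O server task (last n_io_tasks ranks)
  color = MERGE(1, 0, world_rank >= world_size - n_io_tasks)
  CALL MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, local_comm, ierr)
  CALL MPI_Comm_rank(local_comm, local_rank, ierr)

  IF (color == 0) THEN
     ! ... compute, then send output fields to the I/O server group ...
  ELSE
     ! ... receive fields and write them to disk asynchronously ...
  END IF

  CALL MPI_Comm_free(local_comm, ierr)
  CALL MPI_Finalize(ierr)
END PROGRAM split_compute_io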

I/O quilt server schematic, continued: computing tasks send their output to the I/O server group (secondary I/O tasks gathering to a primary I/O task), which writes it asynchronously to disk.

Evaluation of the I/O quilt server
- Improves performance
- Wastes some processors that could otherwise be computing
- If I/O is too slow, the computing nodes must wait until the I/O of the previous timestamp has finished

Why use HDF5?
- Among the candidate formats, HDF5 is the only one that supports MPI-IO
- Potentially large file sizes in the future
- A potential user at NCSA (BW)

Milestone highlights
By March 31st, 2003:
- Complete the initial prototype implementation of WRF-HDF5, including parallel I/O support.
By June 30th, 2003:
- Possibly re-design the HDF5 structures of the WRF output to take advantage of parallel HDF5 I/O.
- Begin the initial implementation of serial and parallel versions of the HDF5 module in the WRF model.

Strategy of this project
1. Start from the implementation of the sequential HDF5 writer module
2. Learn parallel HDF5 on the NCAR IBM SP machine
3. Design the parallel HDF5-WRF module
4. Implement the HDF5 parallel writer module
5. Test with a real case and do a performance comparison study against sequential HDF5 and NetCDF

Existing WRF I/O modules
- NetCDF: sequential I/O that can be used in a parallel run; internally, a monitor process collects all the data from the computing nodes and writes it out to disk
- I/O quilt server
- Internal MM5 data format

Challenges for the sequential mode
- Understand how and why WRF transposes data from different memory orders when writing to disk
- The current NetCDF implementation can only handle a fixed maximum number of timesteps

Solutions to the challenges: transposing the data
- In memory, fields may need to be in different orders (XYZ, XZY, YZX, etc.) to achieve high computing performance.
- On disk, we could store each field in the same order as it is stored in memory, but that would force tool developers to do extra work. For this reason, when data fields are written to disk they are always transposed to one common order, and when they are read back the model transposes each field to its correct memory order (a minimal sketch follows below).
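A minimal sketch of this idea, assuming a field kept as (X,Z,Y) in memory and written as (X,Y,Z) on disk; the routine name and the chosen disk order are illustrative assumptions, not the WRF I/O code itself.

SUBROUTINE transpose_for_output(nx, ny, nz, field_xzy, field_xyz)
  IMPLICIT NONE
  INTEGER, INTENT(IN)  :: nx, ny, nz
  REAL,    INTENT(IN)  :: field_xzy(nx, nz, ny)   ! memory order (X,Z,Y)
  REAL,    INTENT(OUT) :: field_xyz(nx, ny, nz)   ! common disk order (X,Y,Z)
  INTEGER :: i, j, k
  DO k = 1, nz
     DO j = 1, ny
        DO i = 1, nx
           field_xyz(i, j, k) = field_xzy(i, k, j)
        END DO
     END DO
  END DO
END SUBROUTINE transpose_for_output

The reverse mapping would be applied after reading, so each field returns to its own memory order.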

An incomplete WRF-HDF5 writer
- Can write raw data in HDF5 format
- Does not implement attributes yet
- Does not implement dimensional scales yet (waiting for the HDF5 dimension-scale implementation)
- Has been tested on the O2K, Linux PCs, and the NCAR IBM SP

Parallel WRF-HDF5 design. Recall the two-level decomposition: the logical domain is decomposed at Level 1 into distributed-memory patches (MPI etc.).

Parallel WRF-HDF5 design, solution 1: each computing node is also an I/O node; all nodes participate in writing data to the output file on a parallel file system.

Parallel WRF-HDF5 design, solution 1 (schematic): computing tasks p0 through p11 each write their own patch through parallel HDF5 directly to the parallel file system.
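A minimal sketch of solution 1 with the HDF5 Fortran API: every MPI task opens the file with the MPI-IO driver and writes its patch of a 2-D field into the shared dataset through a hyperslab selection. The routine name, argument list, and the collective transfer choice are illustrative assumptions, not the actual module.

SUBROUTINE write_patch_phdf5(filename, fieldname, patch, offset, count, &
                             nx_global, ny_global)
  USE mpi
  USE hdf5
  IMPLICIT NONE
  CHARACTER(LEN=*), INTENT(IN) :: filename, fieldname
  REAL,             INTENT(IN) :: patch(:,:)              ! this task's patch
  INTEGER(HSIZE_T), INTENT(IN) :: offset(2), count(2)     ! patch location and size
  INTEGER(HSIZE_T), INTENT(IN) :: nx_global, ny_global

  INTEGER(HID_T)   :: fapl, xfer, file_id, filespace, memspace, dset_id
  INTEGER(HSIZE_T) :: gdims(2)
  INTEGER          :: hdferr

  CALL h5open_f(hdferr)

  ! File access property list: use the MPI-IO virtual file driver
  CALL h5pcreate_f(H5P_FILE_ACCESS_F, fapl, hdferr)
  CALL h5pset_fapl_mpio_f(fapl, MPI_COMM_WORLD, MPI_INFO_NULL, hdferr)
  CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, hdferr, access_prp=fapl)

  ! Dataset covering the whole logical domain (created collectively by all tasks)
  gdims = (/ nx_global, ny_global /)
  CALL h5screate_simple_f(2, gdims, filespace, hdferr)
  CALL h5dcreate_f(file_id, fieldname, H5T_IEEE_F32BE, filespace, dset_id, hdferr)

  ! Each task selects the hyperslab corresponding to its patch
  CALL h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, offset, count, hdferr)
  CALL h5screate_simple_f(2, count, memspace, hdferr)

  ! Collective write of all patches into the shared dataset
  CALL h5pcreate_f(H5P_DATASET_XFER_F, xfer, hdferr)
  CALL h5pset_dxpl_mpio_f(xfer, H5FD_MPIO_COLLECTIVE_F, hdferr)
  CALL h5dwrite_f(dset_id, H5T_NATIVE_REAL, patch, count, hdferr, &
                  mem_space_id=memspace, file_space_id=filespace, xfer_prp=xfer)

  CALL h5pclose_f(xfer, hdferr); CALL h5pclose_f(fapl, hdferr)
  CALL h5sclose_f(memspace, hdferr); CALL h5sclose_f(filespace, hdferr)
  CALL h5dclose_f(dset_id, hdferr); CALL h5fclose_f(file_id, hdferr)
  CALL h5close_f(hdferr)
END SUBROUTINE write_patch_phdf5

Each task would call something like this once per field and timestep, which is where the many-small-writes overhead noted on the next slide comes from.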

Parallel WRF-HDF5 design 1
Pros: 1) Relatively easy to implement.
Cons: 1) With many processors, the overhead (many small writes) becomes expensive.

I/O quilt server schematic, continued: computing tasks send their output to the I/O server group (secondary I/O tasks gathering to a primary I/O task), which writes it asynchronously to disk.

Parallel WRF-HDF5 design, solution 2: like the I/O quilt server, divide the nodes into computing nodes and I/O nodes. WRF data fields are written to disk through the I/O nodes via parallel HDF5.

Parallel WRF-HDF5 design, solution 2 (schematic): computing tasks forward their data to secondary I/O tasks, which write to the parallel file system via parallel HDF5.

Parallel WRF-HDF5 design 2
Pros: 1) Overhead is small; is it scalable?
Cons: 1) Extra work is needed to transfer data from the computing nodes to the I/O nodes. 2) For compute-bound runs, the I/O nodes are largely wasted.

Parallel WRF-HDF5 design, solution 3: similar to solution 2, but all nodes are used for computing and some of them also serve as I/O nodes. WRF data fields are written to disk through the I/O nodes via parallel HDF5.

Parallel WRF-HDF5 design, solution 3 (schematic): all nodes compute; an I/O group among them writes to the parallel file system via parallel HDF5.

Parallel WRF-HDF5 design 3
Pros: 1) Overhead is small; is it scalable? 2) For compute-bound runs, performance is gained.
Cons: 1) Extra work is needed to transfer data from the computing nodes to the I/O nodes. 2) For I/O-bound runs, performance may suffer.

An extra problem for all solutions: the HDF5 dataset has to be extensible in time, which requires chunked storage.
Challenge for the chunk size: the dimensional sizes of the patches are not necessarily the same, so a chunk size cannot be assigned easily (a toy illustration follows below). Two potential solutions to the chunk-size problem:
- Use a common factor of the patch sizes as the chunk size; this may cause very bad performance.
- Transfer partial data from the computing nodes to the I/O nodes and use synchronization schemes (semaphores etc.) to avoid inconsistencies when different I/O nodes write to the same place in a dataset; this is hard to implement and cannot assure good performance.
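To make the chunk-size difficulty concrete, here is a toy illustration under an assumed decomposition rule (not WRF's actual one): an 80-point dimension split over 3 patches yields sizes 26, 27, and 27, so no single chunk edge lines up with every patch boundary.

PROGRAM patch_sizes
  IMPLICIT NONE
  INTEGER, PARAMETER :: npoints = 80, npatches = 3    ! toy dimensions (assumed)
  INTEGER :: p, lo, hi
  DO p = 0, npatches - 1
     lo = (npoints * p)       / npatches + 1          ! first index of patch p
     hi = (npoints * (p + 1)) / npatches              ! last index of patch p
     PRINT '(A,I0,A,I0)', 'patch ', p, ' size = ', hi - lo + 1
  END DO
END PROGRAM patch_sizes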

Another thought on handling this problem: does WRF really need its fields to be extensible? Answer: not required, according to the WRF developers. Good news or bad news? Then how do we store the raw data?

Some facts related to the WRF-HDF5 module. The module must cover three configurations:
- Sequential computing, sequential I/O: sequential HDF5
- Parallel computing, sequential I/O: sequential HDF5
- Parallel computing, parallel I/O: parallel HDF5