Marlo Maddox Code 587 Advanced Data Management & Analysis Branch HDF/HDF-EOS Workshop VII - Silver Spring, MD September 23 – 25, 2003 An Evaluation of.

Slides:



Advertisements
Similar presentations
Computer Architecture
Advertisements

Memory.
Symbol Table.
Chapter 12 File Management Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,
DETAILED DESIGN, IMPLEMENTATIONA AND TESTING Instructor: Dr. Hany H. Ammar Dept. of Computer Science and Electrical Engineering, WVU.
C6 Databases.
Preparing CMOR for CMIP6 and other WCRP Projects
File Management Chapter 12. File Management File management system is considered part of the operating system Input to applications is by means of a file.
File Management Chapter 12. File Management A file is a named entity used to save results from a program or provide data to a program. Access control.
NetCDF An Effective Way to Store and Retrieve Scientific Datasets Jianwei Li 02/11/2002.
1 File Management Chapter File Management File management system consists of system utility programs that run as privileged applications Input to.
Physical design. Stage 6 - Physical Design Retrieve the target physical environment Create physical data design Create function component implementation.
DESIGN OF LARGE SCALE DATA ARCHIVAL AND RETRIEVAL SYSTEM FOR TRANSPORTATION SENSOR (WRITE-ONCE-READ-MANY TYPE) DATA. by Nirish Dhruv Department of Computer.
File Management.
Chapter 9 Database Design
1 Angel: Interactive Computer Graphics 4E © Addison-Wesley 2005 Models and Architectures Ed Angel Professor of Computer Science, Electrical and Computer.
File Management Chapter 12.
Rebecca Boger Earth and Environmental Sciences Brooklyn College.
© 2011 Autodesk Freely licensed for use by educational institutions. Reuse and changes require a note indicating that content has been modified from the.
File Management Chapter 12. File Management File management system is considered part of the operating system Input to applications is by means of a file.
Databases C HAPTER Chapter 10: Databases2 Databases and Structured Fields  A database is a collection of information –Typically stored as computer.
Metadata Creation with the Earth System Modeling Framework Ryan O’Kuinghttons – NESII/CIRES/NOAA Kathy Saint – NESII/CSG July 22, 2014.
Beyond the Visualization Pipeline Werner Benger 1, Marcel Ritter, Georg Ritter, Wolfram Schoor 1 Scientific Visualization Group Center for Computation.
Copyright © 2012 Pearson Education, Inc. Chapter 8 Two Dimensional Arrays.
HDF5 A new file format & software for high performance scientific data management.
Ohio State University Department of Computer Science and Engineering Automatic Data Virtualization - Supporting XML based abstractions on HDF5 Datasets.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Development of ORBIT Data Generation and Exploration Routines G. Shelburne K. Indireshkumar E. Feibush.
CS 390- Unix Programming Environment CS 390 Unix Programming Environment Topics to be covered: Distributed Computing Fundamentals.
Chapter 3 Digital Representation of Geographic Data.
The netCDF-4 data model and format Russ Rew, UCAR Unidata NetCDF Workshop 25 October 2012.
File Management Chapter 12. File Management File management system is considered part of the operating system Input to applications is by means of a file.
1 File Management Chapter File Management n File management system consists of system utility programs that run as privileged applications n Concerned.
Environment Change Information Request Change Definition has subtype of Business Case based upon ConceptPopulation Gives context for Statistical Program.
The Community Coordinated Modeling Center: A Brief Overview NASA Goddard Space Flight Center Lika Guhathakurta
1 Memory Management (b). 2 Paging  Logical address space of a process can be noncontiguous; process is allocated physical memory whenever the latter.
- 1 - HDF5, HDF-EOS and Geospatial Data Archives HDF and HDF-EOS Workshop VII September 24, 2003.
Hank Childs, University of Oregon Lecture #3 Fields, Meshes, and Interpolation (Part 2)
CCGrid, 2012 Supporting User Defined Subsetting and Aggregation over Parallel NetCDF Datasets Yu Su and Gagan Agrawal Department of Computer Science and.
The HDF Group Introduction to netCDF-4 Elena Pourmal The HDF Group 110/17/2015.
Connections to Other Packages The Cactus Team Albert Einstein Institute
Jay Lofstead Input/Output APIs and Data Organization for High Performance Scientific Computing November.
HDF and HDF-EOS Workshop VII September 24, 2003 HDF5, HDF-EOS and Geospatial Data Archives Don Keefer Illinois State Geological Survey Mike Folk Univ.
Chapter 1: Overview of SAS System Basic Concepts of SAS System.
Parallel I/O Performance Study and Optimizations with HDF5, A Scientific Data Package MuQun Yang, Christian Chilan, Albert Cheng, Quincey Koziol, Mike.
Global MHD Simulation with BATSRUS from CCMC ESS 265 UCLA1 (Yasong Ge, Megan Cartwright, Jared Leisner, and Xianzhe Jia)
Outline of the Objects used in Analysis Base Classes (module) Compound Classes (module) Utilities/Helpers (module) MrsData (mrs_data) Isa: MrsPack Container.
Silberschatz, Galvin and Gagne ©2011 Operating System Concepts Essentials – 8 th Edition Chapter 2: The Linux System Part 2.
Two Dimensional Arrays. Students will be able to: code algorithms to solve two- dimensional array problems. use 2-D arrays in programs. pass two-use 2-D.
PIDX PIDX - a parallel API to capture the data models used by HPC application and write it out in an IDX format. PIDX enables simulations to write out.
Hello world !!! ASCII representation of hello.c.
METADATA IN.NET Presented By Sukumar Manduva. INTRODUCTION  What is Metadata ? Metadata is a binary information which contains the complete description.
Coupling and Cohesion Pfleeger, S., Software Engineering Theory and Practice. Prentice Hall, 2001.
Copyright © 2010 The HDF Group. All Rights Reserved1 Data Storage and I/O in HDF5.
Creating Database Objects
reduction data treatment for ARCS
Chapter 12 File Management
SOFTWARE DESIGN AND ARCHITECTURE
Lecture 2 The Relational Model
CSCE 990: Advanced Distributed Systems
Databases.
RELATIONAL DATABASE MODEL
Paging and Segmentation
GENERAL VIEW OF KRATOS MULTIPHYSICS
Introduction of Week 13 Return assignment 11-1 and 3-1-5
Lecture 2 Components of GIS
Database Design Hacettepe University
Creating Database Objects
Integrated Statistical Production System WITH GSBPM
Presentation transcript:

Marlo Maddox Code 587 Advanced Data Management & Analysis Branch HDF/HDF-EOS Workshop VII - Silver Spring, MD September 23 – 25, 2003 An Evaluation of Science Data Formats and Their Use at the Community Coordinated Modeling Center

The Community Coordinated Modeling Center What the CCMC provides: Scientific validation Model coupling Metrics implementations Advanced visualization Model runs on request

Covering the Entire Domain

Space Weather Models patch-panel architecture

Challenges No rules for standard model interfaces Each new model has unique output format Developer/user needs to become familiar with internal structure of each output file Custom read routines to access model data Data is not self describing Reduces portability and reuse of –Data output itself –Tools created to analyze data

Every Models Output Is Unique Specialized I/O routines required for every interface Unsuitable for use in flexible model chain No commonality between data passing through interfaces m Advanced Visualization Tools Storage each models output in different formats model 1 model 2 model 3 model n n input modules n x m interfaces required Environment Without Standard

Every Models Output Is Unique m Advanced Visualization Tools Storage each models output in one standard format model 1 model 2 model 3 model n data format converter one input module n + m interfaces required Original output can be preserved Standard format for storage, coupling, & visualization Model developers continue to have freedom of choice Ensures compatibility between models for coupling Ground work for which standard, reusable interfaces and tools can be developed Standardized Environment

Model Selected for Testing Block-adaptive-tree-Solarwind-roe-upwind-scheme ( BATSRUS ) global magnetosphere MHD model –Developed by CSEM at university of Michigan –Uses MPI and Fortran 90 standard –Executes on massively parallel computer systems –Adaptive grid of blocks arranged in varying degrees of spatial refinement levels –Solves 3D MHD equations in finite volume form using numerical methods related to roe’s approximate Riemann solver –Attached to an ionospheric potential solver that provides electric potentials and conductances in the ionosphere

Understanding the BATSRUS Models Output magnetospheric plasma parameters –Atomic mass unit density –Pressure –Velocity –Magnetic field –Electric currents ionospheric parameters –Electric potential –Hall and Pedersen conductances General Scientific Output

BATSRUS.OUT File variable values grid information data variables names special parameters dimension sizes time step information units bytevalue number of bytes n for next record 5 n n+1 n+2 n+3 n+4 number of bytes n for previous record n bytes containing units for variables R amu/cm3 km/s nT nPa J/m3 uA/m2

BATSRUS.OUT File variable values grid information data variables names special parameters dimension sizes time step information units general information static non-variant data

BATSRUS.OUT File variable values grid information data variables names special parameters dimension sizes time step information units 4 byte record buffer all x positions values all y positions values all z positions values

BATSRUS.OUT File variable values grid information data variables names special parameters dimension sizes time step information units jz jy jx e p b1z b1y b1x bz by bx uz uy ux rho 4 byte record buffer all b1x values

Designing the CDF CDF files have two main components –Attributes – metadata describing contents of CDF Global – describe CDF as a whole Variable – describe specific characteristics of the variables –Records – collections of variables Scalar Vector N-dimensional arrays ( where n <= 10 ) Identify potential metadata ( or any static data ) from original output file Include this data in the global attributes portion of the CDF

CDF Variables CDFs contain two types of variables –rVariables – all have the same dimensionality –zVariables – can each have different dimensionalities CDF Dimensionality –a variable with one dimension is like an array –number of elements in array correspond to the dimension size

CCMC CDF Variables jz jy jx e p b1z b1y b1x bz by bx uz uy ux rho x y z BATSRUS model contains 18 dynamic variables –3 position variables –15 plot variables 18 CDF rVariables –one record per variable –one dimensional variables –dimension size = number of cells in grid –18 records vs million in previous scheme

BATRUS.OUT to CDF 1:[1] = :[2] = :[3] = :[4] = :[5] = :[6] = :[7] = :[8] = first column indicates current record number column two references the current records element index – each element of the record stores a value for the current variable 1:[9] = :[10] = :[11] = :[12] = :[13] = :[14] = :[15] = :[16] = :[17] = :[18] = :[19] = :[20] = :[21] = :[22] = :[23] = :[24] = :[ ] = :[ ] = :[ ] = :[ ] = :[ ] = :[ ] = :[ ] = :[ ] =

CDF Attributes ! Skeleton table for the "bats_2_cdf_OUTPUT.cdf" CDF. ! Generated: Monday, 22-Sep :06:08 ! CDF created/modified by CDF V2.7.1 ! Skeleton table created by CDF V2.7.1 #header CDF NAME: bats_2_cdf_OUTPUT.cdf DATA ENCODING: NETWORK MAJORITY: ROW FORMAT: SINGLE ! Variables G.Attributes V.Attributes Records Dims Sizes ! / /z #GLOBALattributes ! Attribute Entry Data ! Name Number Type Value ! "Project" 1: CDF_CHAR { "CCMC" }. "Disclamer" 1: CDF_CHAR { "INSERT TERMS OF USAGE HERE" }. "Generated_By" 1: CDF_CHAR { "Marlo Maddox" }. "Generation_Date" 1: CDF_CHAR { "3/27/2003" }. "Simulation_Model" 1: CDF_CHAR { "BATSRUS" }.

"Elapsed_Time_In_Seconds" 1: CDF_FLOAT { }. "Number_Of_Dimensions" 1: CDF_INT4 { -3 }. "Number_Of_Special_Parameters" 1: CDF_INT4 { 10 }. "Special_Parameters" 1: CDF_FLOAT { } 2: CDF_FLOAT { } 3: CDF_FLOAT { } 4: CDF_FLOAT { 3.0 } 5: CDF_FLOAT { 1.0 } 6: CDF_FLOAT { 1.0 } 7: CDF_FLOAT { 3.0 } 8: CDF_FLOAT { 6.0 } 9: CDF_FLOAT { 6.0 } 10: CDF_FLOAT { 6.0 }. "Number_Of_Plot_Variables" 1: CDF_INT4 { 15 }. "X_Dimension_Size" 1: CDF_INT4 { }. "Y_Dimension_Size" 1: CDF_INT4 { 1 }. "Z_Dimension_Size" 1: CDF_INT4 { 1 }. "Current_Iteration_Step" 1: CDF_INT4 { }. CDF Attributes

#variables ! Variable Data Number Record Dimension ! Name Type Elements Variance Variances ! "x" CDF_FLOAT 1 T T ! Attribute Data ! Name Type Value ! "Description" CDF_CHAR { "X position for center of cell in grid..." } "Dictionary_Key" CDF_CHAR { "CCMC/SWMF Data Dictionary Entry" } "Valid_Min" CDF_FLOAT { } "Valid_Max" CDF_FLOAT { }. ! RV values were not requested. CDF Variables

Compression Performance Tests

Performance Score (original_file_size) (cdf_file_size) 1 t *

Performance Results Optimal CDF storage format –Single one-record rVariables –Dimension size equal to number of cells in grid Uncompressed CDF creation time of 1.5 seconds CDF file size virtually the same as original BATSRUS output file size Method could be applied to additional models in similar fashion

Conclusion BATRUS.Out to CDF conversion results promising –1.5 second uncompressed CDF creation time –Resulting file size virtually unchanged OpenDx successfully imported CDF data using standard input module (only had to specify input file name) –Requires minimal initial development to correctly categorize imported data Closer to establishing a data format standard within the CCMC

Future Work Research HDF 5 data standard Test BATRUS output conversion performance with HDF 5 Compare CDF vs. HDF 5 performance Propose use of either or both Develop standard naming conventions for variables ( similar to ISTP program )

generic attributes list (.h) generic/default variable attributes list (.h) model specific attributes list (.h) model specific variable attributes list (.h) Registered Variables List CCMC_name native/alias x x_pos, xp y y_pos, yp Model Variable List MAP global/file attributes variable attributes variable names main conversion routine assembled standard model components main read driver main write driver read model a routine read model b routine read model n routine convert to cdf convert to hdf5 standard data file with common attributes and variable names for each registered model Conversion Software Architecture