@AU_EarthObs
SPD and KEA: HDF5-based file formats for Earth Observation
Pete Bunting (1), John Armston (2), Sam Gillingham (3), Neil Flood (4)
1. Aberystwyth University, UK
2. University of Maryland, USA
3. Landcare Research, NZ
4. Science Division, Queensland Government, Australia

Contents
Sorted Pulse Data (SPD) Format
– For storing laser scanning data
KEA Image File Format
– Implementation of the GDAL raster data model.

SPD: Little History…
The first version of ‘SPDLib’ was written in 2008.
– ‘Sorted Point Data’ simply stored a 2D grid-based index alongside the points file.
I was using an ENVI image file to store the header information (as a 2-band image).
Having multiple files per dataset wasn’t ideal, and LAS was missing fields (e.g., height) that I wanted for processing.
– A colleague suggested looking at HDF.
John Armston visited Aberystwyth with a set of full-waveform acquisitions for use in his PhD.
– ‘Sorted Pulse Data’ was born.

Why a Pulse?
[Video with panels labelled ‘Transmitted’ and ‘Received’, created by John Armston using the SPDLib Python binding.]

SPD File Format

Sorted…
Indexing makes processing faster
– Cartesian
– Spherical
– Polar
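A minimal sketch of why sorting plus an index speeds things up, assuming a Cartesian 2D grid only. The function names, the (x, y) tuple layout and the offset/count arrays are illustrative choices, not the actual SPD schema, and pulses are assumed to fall inside the grid. Once pulses are sorted by grid cell, every cell is one contiguous run that can be read with a single slice.

```python
def build_grid_index(pulses, x_min, y_min, bin_size, n_cols, n_rows):
    """Sort pulses by grid cell and record where each cell's run starts."""
    def cell_of(pulse):
        col = int((pulse[0] - x_min) // bin_size)
        row = int((pulse[1] - y_min) // bin_size)
        return row * n_cols + col  # row-major cell number

    ordered = sorted(pulses, key=cell_of)
    offsets = [0] * (n_cols * n_rows)  # first position of each cell's run
    counts = [0] * (n_cols * n_rows)   # number of pulses in each cell
    for position, pulse in enumerate(ordered):
        cell = cell_of(pulse)
        if counts[cell] == 0:
            offsets[cell] = position
        counts[cell] += 1
    return ordered, offsets, counts


def pulses_in_cell(ordered, offsets, counts, n_cols, col, row):
    """Fetch one cell's pulses with a single contiguous slice."""
    cell = row * n_cols + col
    return ordered[offsets[cell]:offsets[cell] + counts[cell]]


# Toy data: (x, y) positions on a 2 x 2 grid of 1 m bins.
pulses = [(0.5, 0.5), (1.5, 0.2), (0.1, 1.9), (1.7, 1.1), (0.9, 0.4)]
ordered, offsets, counts = build_grid_index(pulses, 0.0, 0.0, 1.0, 2, 2)
cell_pulses = pulses_in_cell(ordered, offsets, counts, 2, 0, 0)
```

Without the sort, finding the pulses in one cell means scanning the whole file. Spherical or polar indexing would swap the cell function for one based on angles, which this sketch does not attempt.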

SPD & HDF5

Why HDF5?
Another file format…
– Not just another block of binary that you cannot do anything with unless you have a format definition.
Fields can be logically named, and data types are defined in, and read from, the file.
– Self-describing.
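As a rough illustration of ‘self-describing’, the sketch below uses the h5py and NumPy packages (an assumption; the slides do not say which bindings to use). The dataset name and field names are hypothetical and are not the SPD schema. The point is that a reader recovers field names and types from the file itself.

```python
import numpy as np
import h5py

# Hypothetical field names for illustration only.
point_dtype = np.dtype(
    [("x", "f8"), ("y", "f8"), ("z", "f4"), ("classification", "u1")]
)
points = np.zeros(3, dtype=point_dtype)
points["z"] = [1.5, 2.0, 0.25]

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("points", data=points)
    f.attrs["description"] = "toy point table"

    # No external format definition is needed: names and types come from the file.
    stored = f["points"]
    field_names = stored.dtype.names
    field_types = {name: stored.dtype[name].name for name in field_names}
    description = f.attrs["description"]
```

Generic HDF5 viewers rely on the same information, which is why such a file can be inspected without any format-specific code.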

Compression
zlib compression is used by default
– Provided by the HDF5 library
– Compression block size can be varied using SPD header parameters
File sizes are on average slightly smaller than an uncompressed LAS file but larger than LAZ.
– More complex data structures
– Two pieces of information: pulse and point(s)
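HDF5 applies its deflate (zlib) filter to each chunk of a dataset independently, so block size trades compression ratio against how much must be decompressed to read a few values. The sketch below imitates that with Python's standard zlib module rather than HDF5 itself; `block_size` is an illustrative parameter, not the name of an SPD header field.

```python
import struct
import zlib


def compress_in_blocks(values, block_size):
    """Compress a list of floats as independent zlib blocks of block_size values."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        raw = struct.pack(f"<{len(chunk)}d", *chunk)  # little-endian doubles
        blocks.append(zlib.compress(raw))
    return blocks


def read_block(blocks, block_index):
    """Decompress only the block that is needed."""
    raw = zlib.decompress(blocks[block_index])
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


heights = [float(i % 50) for i in range(10_000)]  # repetitive toy data
raw_bytes = len(heights) * 8

small = compress_in_blocks(heights, 100)    # many small blocks
large = compress_in_blocks(heights, 5_000)  # a few large blocks
small_bytes = sum(len(b) for b in small)
large_bytes = sum(len(b) for b in large)
```

On this toy data the large blocks compress better, but reading values 300 to 399 from `small` touches one 100-value block, whereas from `large` it means decompressing 5,000 values.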

KEA: Little History…
Created in 2012 and funded by Landcare Research, NZ.
The problem: “How to have large attribute tables of data alongside raster data?”
The Erdas Imagine format (HFA, *.img) supports attribute tables, but compression is only supported for 32-bit file sizes (i.e., < 2 GB).
– Attribute tables are also uncompressed.
BigTIFF supports large raster imagery but not attribute tables.
The initial implementation used an HDF5 file for the attribute table with a separate image file (e.g., TIFF).
– This was untidy, and having to keep track of multiple files is not desirable.
“Why not just put the image in the HDF5 file with a GDAL driver?”
– Result: the KEA HDF5 schema.

Raster Storage: KEA file format
HDF5-based image file format
GDAL driver
– Therefore the format can be used in any GDAL-compatible software (e.g., ArcMap)
Support for large raster attribute tables
zlib-based compression
– Small file sizes
– 10 m SPOT mosaic of New Zealand ~5 GB per island (Each approx , pixels)
Bunting and Gillingham 2013

KEA File Structure
This structure is essentially the GDAL raster data model.
GDAL is the de facto standard for EO raster data I/O.
Used in open source and commercial software (e.g., ESRI).
We added a few additions for our own needs.
The attribute table has a concept of ‘neighbours’ to allow traversal of a set of clumps (e.g., object-oriented image classification).
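A small sketch of what storing ‘neighbours’ makes possible, assuming the table can be read as one list of adjacent clump ids per clump. The function name, the class column and the data are invented for illustration and do not reflect how KEA lays this out on disk.

```python
from collections import deque


def connected_clumps(neighbours, start, keep):
    """Breadth-first walk from `start` over neighbouring clumps that pass `keep`."""
    seen = {start}
    queue = deque([start])
    while queue:
        clump = queue.popleft()
        for other in neighbours[clump]:
            if other not in seen and keep(other):
                seen.add(other)
                queue.append(other)
    return sorted(seen)


# Row i lists the clump ids adjacent to clump i (a ragged table).
neighbours = [[1, 2], [0, 3], [0], [1, 4], [3]]
class_of = ["forest", "forest", "water", "forest", "urban"]

# Grow a region of touching forest clumps, starting from clump 0.
region = connected_clumps(neighbours, 0, lambda c: class_of[c] == "forest")
```

Here `region` is `[0, 1, 3]`: the walk reaches clump 3 through clump 1 and stops at the water and urban clumps. Without stored neighbours, adjacency would have to be recomputed from the raster each time.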

KEA Size and Speed

Is HDF5 a good base?
Yes. We’ve found it excellent.
– Coding is quick and relatively easy
– No worrying about endianness etc. Originally SPD was developed on a PowerPC Mac.
– If used correctly, compression is good, with little overhead from the HDF5 structures
– Possible to make complex and flexible data structures.
However, it is the data structures in the file, rather than the ‘file format’, that are the important thing.
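To make the endianness point concrete, here is a short sketch using Python's standard struct module. It shows the failure that a raw binary format invites when bytes written on a big-endian machine (such as a PowerPC Mac) are read with little-endian assumptions. HDF5 stores the byte order as part of each datatype and converts on read, so user code does not deal with this.

```python
import struct

value = 1025  # 0x00000401

big = struct.pack(">I", value)     # bytes as a big-endian machine would write them
little = struct.pack("<I", value)  # bytes as a little-endian machine would write them

# Reading big-endian bytes as if they were little-endian gives the wrong number.
misread = struct.unpack("<I", big)[0]

# Knowing the stored byte order lets the reader convert correctly.
correct = struct.unpack(">I", big)[0]
```

Here `misread` is 17039360 rather than 1025, a silent error that a plain block of binary gives no way to detect.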

However,
Compound data types can reduce flexibility
– Not possible to dynamically add new fields (C struct)
Use tables instead (as implemented in KEA attribute tables)
– i.e., a single data type per table
No boolean data type (C data types)
– Store as int8, wasted space?
No compression on ‘ragged’ data structures
HDF5 files can become fragmented
– Many changes (i.e., data added) happening within the file.
Cannot remove data from the file
– Deleting does not reduce file size.
Split data into suitable compression blocks and use / process data in those blocks.
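The ‘single data type per table’ idea can be sketched in memory with NumPy. The class below is hypothetical and is not the KEA implementation: it keeps one 2D array per data type plus a name lookup, so adding a field means appending a column rather than redefining a compound record.

```python
import numpy as np


class ColumnTable:
    """Rows x columns, one 2D array per data type, with names kept separately."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.data = {
            "int": np.zeros((n_rows, 0), dtype=np.int64),
            "float": np.zeros((n_rows, 0), dtype=np.float64),
        }
        self.columns = {}  # field name -> (type key, column index)

    def add_column(self, name, kind, values=None):
        """Append a new field to the table holding its data type."""
        column = np.zeros(self.n_rows, dtype=self.data[kind].dtype)
        if values is not None:
            column[:] = values
        self.columns[name] = (kind, self.data[kind].shape[1])
        self.data[kind] = np.column_stack([self.data[kind], column])

    def get(self, name):
        kind, index = self.columns[name]
        return self.data[kind][:, index]


table = ColumnTable(3)
table.add_column("histogram", "int", [10, 20, 30])
table.add_column("mean_ndvi", "float", [0.1, 0.5, 0.9])
# A field added later only widens the int table; no record type is redefined.
table.add_column("class_id", "int", [1, 2, 2])
```

With a compound type the same change would mean defining a new record type and rewriting every row. In a file, each typed table could be a resizable 2D dataset, though that part is not shown here.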

SPD v4
Updated version of SPD (v3 has been the version widely used)
Learning lessons from SPD and KEA
– Removed compound data types
– Uses tables of a single data type rather than compound data types.
– Made as much optional as possible.
– Multiple waveforms per pulse.
Implemented in pyLiDAR
– Pulses are very useful
– But sometimes points are all you need
Multiple methods of spatially indexing the data are useful
– 2D grid useful for many but not all applications.
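One way to picture pulses and points held as single-type tables is a start index and a count per pulse pointing into a shared point column. The column names and values below are hypothetical and are not taken from the SPD v4 or pyLiDAR definitions.

```python
# Each list stands in for one single-type column; all names are hypothetical.
pulse_pts_start = [0, 2, 2, 5]   # index of each pulse's first point
pulse_n_points = [2, 0, 3, 1]    # number of returns recorded for each pulse
point_height = [12.1, 0.3, 8.4, 4.0, 0.1, 0.2]


def points_of(pulse_index):
    """Return the heights of the points belonging to one pulse."""
    start = pulse_pts_start[pulse_index]
    return point_height[start:start + pulse_n_points[pulse_index]]


# Pulse-level view: first return of every pulse that produced a point.
first_returns = [
    points_of(i)[0] for i in range(len(pulse_n_points)) if pulse_n_points[i] > 0
]

# Point-level view: when points are all you need, use the column directly.
max_height = max(point_height)
```

A pulse with no returns (the second one here) still has a row, which a points-only format cannot represent.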

Questions