Lossy compression of structured scientific data sets

Presentation transcript:

Lossy compression of structured scientific data sets
Shreya Mittapalli, New Jersey Institute of Technology
Friday, Jul 31, 2015
NCAR Supervisors: John Clyne and Alan Norton
HSS, CISL-NCAR

Problem we are trying to solve: As technology advances, supercomputers, satellites, and similar sources produce very large data sets. Big Data raises two problems:
1) The hard disk that stores the data may not have enough space.
2) The speed at which the data can be read may be much lower than the required speed.

To tackle this problem, we compress the data. One way to compress data is to use wavelets. Because of their multi-resolution and information compaction properties, wavelets are widely used for lossy compression in numerous consumer multimedia applications (e.g. images, music, and video). For example:

The parrot is compressed at a ratio of 1:35 and the rose at 1:18 using wavelets. Source: http://arxiv.org/ftp/arxiv/papers/1004/1004.3276.pdf

What is Lossy Compression? Lossy Compression is the class of data encoding methods that uses inexact approximations (or partial data discarding) to represent the content. These techniques are used to reduce data size for storage, handling, and transmitting content. Source: Wikipedia

In lossy compression:
Advantage: The compressed data can be stored on the hard disk, and it also saves a lot of computation time.
Disadvantage: When the data is reconstructed, some of it is permanently lost.
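The trade-off above can be made concrete with a small sketch of wavelet-based lossy compression: transform the data, keep only the largest coefficients, and reconstruct an approximation. This is an illustration, not the framework used in the project. It swaps in a one-dimensional Haar wavelet (the simplest wavelet) for the Bior wavelets discussed later, and the function names `haar_forward`, `haar_inverse`, and `lossy_compress` are made up for this example.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:n] = averages + details  # coarse part first, details after it
        n = half                         # recurse on the coarse part only
    return coeffs

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out

def lossy_compress(signal, ratio):
    """Zero all but the len(signal) // ratio largest-magnitude coefficients."""
    coeffs = haar_forward(signal)
    keep = max(1, len(coeffs) // ratio)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

# Hypothetical test signal: one period of a sine wave, 64 samples.
signal = [math.sin(2 * math.pi * i / 64) for i in range(64)]
approx = haar_inverse(lossy_compress(signal, 8))  # keeps 8 of 64 coefficients
```

The discarded coefficients are the part that is permanently lost: `approx` has the same length as `signal` but only approximates it.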

Project Goal
To determine compression parameters that:
1) minimize distortion for a desired output file size;
2) reduce the computation time and come with the best possible outcome.

Experiments done: To achieve the project goal, we attempted to determine experimentally the optimal parameter choices for compressing numerical simulation data using wavelets. We experimented on three large data sets: two WRF hurricane data sets, Katrina and Sandy, and one turbulence data set, Taylor-Green (TG).

Sandy
Grid resolution: 5320 x 5000 x 149 (= 16 Gigabytes / 3D variable)
Number of 3D variables: 15
Time steps: ~100
Total data set size: ~24 Terabytes

Katrina
Grid resolution: 316 x 310 x 35 (= 10 Megabytes / 3D variable)
Number of 3D variables: 12
Time steps: ~60
Total data set size: ~9 Gigabytes

TG
Grid resolution: 1024^3 (= 4 Gigabytes / 3D variable)
Number of 3D variables: 6
Total data set size: ~2.5 Terabytes
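The per-variable sizes can be checked with a little arithmetic. The sketch below assumes each grid value is a 4-byte single-precision float. The slides do not state this, but it is consistent with the quoted 4 Gigabytes for a 1024^3 grid. The function name is hypothetical.

```python
def bytes_per_variable(nx, ny, nz, bytes_per_value=4):
    """Raw size of one 3D variable; 4 bytes per value is an assumption."""
    return nx * ny * nz * bytes_per_value

tg_bytes = bytes_per_variable(1024, 1024, 1024)   # 4 * 2**30 bytes
sandy_bytes = bytes_per_variable(5320, 5000, 149)

# Whole Sandy data set: 15 variables at roughly 100 time steps.
sandy_total_tb = 15 * 100 * sandy_bytes / 1e12

print(f"TG:    {tg_bytes / 2**30:.1f} GiB per variable")
print(f"Sandy: {sandy_bytes / 1e9:.1f} GB per variable, ~{sandy_total_tb:.0f} TB in total")
```

Under this assumption the Sandy figures come out at about 16 GB per variable and about 24 TB overall, matching the numbers above.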

Images of Hurricane Katrina which occurred on 29th August, 2005.

Images of Hurricane Sandy which occurred on October 25, 2012

Image of vortex iso-surfaces in a viscous flow starting from Taylor-Green initial conditions. Source: http://www.galcit.caltech.edu/research/highlights
We constructed a Python framework that allowed us to change compression parameters such as wavelet type and block size for each run.
Measurements: lmax, rmse, time
Compression ratios: 1, 2, 4, 16, 32, 64, 128, 256
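The two error measurements can be sketched as follows. The slides do not define lmax. This example assumes it is the maximum absolute pointwise difference between the original and reconstructed data (the L-infinity norm of the error), and takes rmse to be the usual root-mean-square error. Plain Python lists are used so the example runs without dependencies. For large grids an array library would be the natural choice.

```python
import math

def lmax_error(original, reconstructed):
    """Largest absolute pointwise difference (assumed meaning of 'lmax')."""
    return max(abs(o - r) for o, r in zip(original, reconstructed))

def rmse(original, reconstructed):
    """Root-mean-square error over all points."""
    squared = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return math.sqrt(squared / len(original))

# Hypothetical values standing in for raw and reconstructed data.
original = [1.0, 2.0, 3.0, 4.0]
reconstructed = [1.0, 2.5, 3.0, 3.0]

worst = lmax_error(original, reconstructed)
typical = rmse(original, reconstructed)
```

lmax reports the single worst point, while rmse averages over the whole field, so the two can rank compression settings differently.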

Compression parameters we wanted to explore:
1) Compare wavelet types Bior3.3 and Bior4.4.
The Bior4.4 wavelet, also called the CDF 9/7 wavelet, is widely used in digital signal processing and image compression. The Bior3.3 wavelet is the one traditionally used in the VAPOR software.
Goal: Determine whether Bior4.4 is better than Bior3.3.

Compression parameters we wanted to explore:
2) Compare block size 64x64x64 with other block sizes.
a) Determine whether smaller blocks are better than larger blocks. There are two competing considerations: smaller blocks are more computationally efficient than larger blocks, while larger blocks introduce fewer artefacts than smaller blocks.

b) If the grid dimensions are not integral multiples of the block size (here 64), extra data is introduced to fill the gap. This is called padding. The problem with padding is that it adds data when the aim is to compress it. For the TG data there is no padding, but for the Katrina and Sandy data we have 50% and 30% padding respectively.
Goal: Determine whether the aligned data has errors comparable to those of the padded data.
Example to illustrate padding:

[Figure: diagram illustrating padding. Surviving labels: 64, 50, 64, 196, 149, 150, 50, 64, 50.]
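Padding can also be illustrated numerically. The sketch below rounds each grid dimension up to the next multiple of the block size and reports what fraction of the padded volume is padding. The slides do not say how the quoted percentages were computed, so this definition is an assumption and need not reproduce them exactly. The function names are hypothetical.

```python
import math

def padded_shape(shape, block):
    """Round each grid dimension up to a multiple of the block dimension."""
    return tuple(-(-n // b) * b for n, b in zip(shape, block))

def padding_fraction(shape, block):
    """Fraction of the padded volume that is padding (assumed definition)."""
    padded = math.prod(padded_shape(shape, block))
    return 1 - math.prod(shape) / padded

# TG is already aligned to 64, so nothing is added.
tg_fraction = padding_fraction((1024, 1024, 1024), (64, 64, 64))

# Katrina with cubic blocks: 316 x 310 x 35 is padded to 320 x 320 x 64.
katrina_padded = padded_shape((316, 310, 35), (64, 64, 64))
katrina_fraction = padding_fraction((316, 310, 35), (64, 64, 64))

# A block whose third dimension matches the grid avoids padding on that axis.
katrina_aligned_z = padded_shape((316, 310, 35), (64, 64, 35))
```

With cubic 64x64x64 blocks, roughly 48% of the padded Katrina volume is padding under this definition, most of it from the short vertical axis.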

We did the following three experiments:
1) We compared the wavelet types Bior3.3 and Bior4.4 for all three data sets.
2) We compared padded data with aligned data. a) For Katrina: 64x64x64 vs 64x64x35 b) For Sandy: 64x64x64 vs 64x64x50
3) We compared larger blocks with smaller blocks. For TG: 64x64x64 vs 128x128x128 vs 256x256x256

The plots for the Katrina data illustrating Experiment 1: Bior3.3 vs Bior4.4

The plot for the Sandy data illustrating Experiment 2: aligned data vs padded data

The plots for the TG data illustrating Experiment 3: bigger blocks vs smaller blocks

Lmax error for the wx variable of the TG data set for block sizes 64x64x64, 128x128x128 and 256x256x256

RMSE for the wx variable of the TG data set for block sizes 64x64x64, 128x128x128 and 256x256x256

When using a larger block size (256^3 vs 64^3) for the vx component of the TG data set (the data is compressed 512:1), we see improved compression quality, as illustrated above. Source: Pablo Mininni, U. of Buenos Aires.

Time taken for the wx variable of the TG data set to construct the raw data for block sizes 64x64x64, 128x128x128 and 256x256x256

Conclusion:
1) Bior4.4 is in some cases better than Bior3.3.
2) Surprisingly, a larger block (say 256x256x256) is better than 64x64x64 in terms of both computation time and error.
3) The errors of the aligned data and the padded data are comparable.

Acknowledgements: My supervisors John Clyne and Alan Norton for their continued support. Dongliang Chu, Samuel Li and Kim Zhang. Delilah Gail Rutledge. NCAR.