Data Compression for PDS4


Data Compression for PDS4
Lisa Gaddis, Sue LaVoie, Jeff Anderson, Elizabeth Rye
PDS Imaging Node
March 26, 2010

Syntax
- Data Compression
  - Encodes information using fewer bits
  - Reduces consumption of expensive resources
    - Data storage and/or transmission bandwidth
  - Requires decompression
  - Trade-offs
    - degree of compression
    - amount of ‘distortion’ introduced
    - computational resources required for decompression
- Image Compression
  - Application of data compression to digital images
  - Reduces redundancy in images to improve efficiency of storage and transmission
  - Lossless and lossy methods
  - Preserve image quality at a given bit- or compression-rate
- File Compression
  - Reduces redundancy at the file level
  - Many available tools
    - ZIP
    - GZIP
    - BZIP2
Imaging Node Data Compression

Why image compression?
- Image compression for data providers and archivists
  - NASA missions deliver significant numbers of large image files
  - Need to support and/or reduce storage costs and data transmission times of images
  - Promotes exchange between different users and systems
- Although falling in cost, storage is expensive for many TB of data and multiple copies
  - FY10: ~$750/TB for RAID storage with network infrastructure

Image Compression
- Lossless compression
  - Exploits data redundancy
  - Image can be recovered exactly
  - ‘Run-length encoding’ makes use of redundant patterns or ‘runs’
  - ‘LZW (Lempel Ziv Welch) encoding’ also addresses strings of characters; it builds up a table of strings and their corresponding codes
  - ‘Huffman coding’ uses a binary encoding tree to represent commonly occurring values in few bits and less frequently occurring values in more bits
  - Best for documents, computer programs, line drawings, etc.
  - JPEG2000 has a lossless option, approved for use by PDS
- Lossy compression
  - Exploits data redundancy and ‘irrelevant’ data
  - Image data are not recovered exactly
  - JPEG
  - JPEG2000 (lossy)
  - Best for digital images, audio, video
  - Not approved for PDS archive data
    - Exceptions: Browse and some EDR images (e.g., Clementine UVVIS and NIR) are lossy JPEG images (5.5 average compression rate)
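The run-length encoding idea named on this slide can be sketched in a few lines. This is only an illustration of the general technique, not the encoder used by any PDS product; the function names `rle_encode` and `rle_decode` and the sample pixel row are assumptions made for the example.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical image row with long runs of identical pixel values.
row = [0, 0, 0, 0, 255, 255, 17, 0, 0, 0]
encoded = rle_encode(row)
decoded = rle_decode(encoded)

# Lossless: decoding recovers the row exactly.
assert decoded == row
print(encoded)
```

Rows with long runs shrink to a handful of pairs, while rows with no repeated neighbours can grow, which is one reason run-length encoding suits line drawings better than noisy imagery.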

MRO and LRO images
- Not your typical images
- MESSENGER MDIS, Viking Orbiter, Galileo SSI, etc.
  - Framing cameras
  - 800 samples x 800 lines to 1024 samples x 1024 lines
  - Roughly one megabyte (MB) per observation
  - PDS Imaging Node combined archive requirement for all missions other than LRO and MRO is <25 TB
- MRO/HiRISE, LRO/LROC
  - Line-scan cameras
  - 10,000-20,000 samples x 50,000-100,000 lines
  - Roughly 500 to 2,000 MB per observation
  - Combined expected archive total for MRO and LRO is 500 TB
    - 20X larger than sum total of all other Imaging Node holdings

Image Compression for HiRISE RDRs
- Why image compression was needed
  - Enormous volume of HiRISE archive, 1 yr
    - EDR – 12,100 Gb (~1.5 TB)
    - RDR – 92,500 Gb (11.3 TB)
  - Very large Standard Data Products
    - EDR (2048 x 64,000, 16-bit) = 262 MB
    - RDR (40,000 x 64,000, after reprojection, 16-bit) = ~500 to 1000 MB
- Advantages for delivery of RDR data in JPEG2000 format
  - Losslessly recompressed format
  - Wavelet compression greatly improves speed of web access
  - Fast browse, zoom, pan capabilities for handling large files
- Volume projections
  - EDR DVD volumes: 321 (losslessly recompressed) vs 482 (uncompressed) (1.5 compression ratio)
  - RDR DVD volumes: 2400 (losslessly compressed) vs 7300 (uncompressed) (assuming 3.0 compression ratio)
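The EDR size and the DVD volume ratios quoted on this slide follow from simple arithmetic, which the sketch below reproduces. It assumes a single-band image and decimal megabytes (10^6 bytes); the helper name `raw_image_bytes` is an illustration, not part of any PDS tool.

```python
def raw_image_bytes(samples, lines, bits_per_pixel):
    """Uncompressed size in bytes of a single-band image (assumed layout)."""
    return samples * lines * bits_per_pixel // 8

# EDR dimensions from the slide: 2048 samples x 64,000 lines, 16-bit pixels.
edr_bytes = raw_image_bytes(2048, 64_000, 16)
edr_megabytes = edr_bytes / 1e6  # decimal megabytes

# Compression ratios implied by the DVD volume counts on the slide.
edr_ratio = 482 / 321    # uncompressed volumes / recompressed volumes
rdr_ratio = 7300 / 2400  # uncompressed volumes / compressed volumes

print(f"EDR raw size: {edr_megabytes:.0f} MB")
print(f"EDR ratio: {edr_ratio:.1f}, RDR ratio: {rdr_ratio:.1f}")
```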

HiRISE Example
- JPEG2000 image compression applied to map-projected RDR images only
  - lots of null pixels
  - Nulls are highly compressed as a result of the lossless compression using JPEG2000
- Projected ~3:1 compression ratios
- Achieved 15:1 in recent tests
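Why null pixels push the ratio up can be seen with any lossless coder. The sketch below uses Python's `zlib` (DEFLATE) as a stand-in, which is an assumption for illustration only: it is not JPEG2000 and will not reproduce the 3:1 or 15:1 figures. All-zero bytes stand in for null pixels and random bytes for hard-to-compress image content.

```python
import os
import zlib

def compression_ratio(raw: bytes) -> float:
    """Original size divided by DEFLATE-compressed size."""
    return len(raw) / len(zlib.compress(raw))

n_bytes = 1_000_000

null_only = bytes(2 * n_bytes)                # every pixel is null
data_only = os.urandom(2 * n_bytes)           # no nulls, essentially incompressible
mixed = bytes(n_bytes) + os.urandom(n_bytes)  # half null, half data

for label, raw in [("null only", null_only), ("data only", data_only), ("mixed", mixed)]:
    print(f"{label}: {compression_ratio(raw):.2f}:1")
```

In this toy setup the null region shrinks to almost nothing, so the overall ratio is driven by the fraction of the image that holds real data.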

Past Experience
- Problems with compression
  - Voyager, Viking, and MGS-MOC PDS archives contain losslessly compressed data
  - Decompression algorithms (e.g., in ISIS) break due to
    - New compilers
    - New operating systems
    - Changes in hardware architecture (32-bit vs 64-bit)
- JPEG2000 compressed HiRISE RDR images are supported by ISIS3
  - But, when JPEG2000 format reaches end-of-life, software maintenance to read this format will be much more difficult than the existing Voyager/Viking/MGS-MOC algorithms
- A proliferation of image compression formats in PDS would be a problem for long-term archiving and usability of the images

Data Storage Costs: MRO & LRO
- Expected PDS storage requirements for the MRO nominal mission are 75 TB
- High capacity RAID storage & network infrastructure costs ~$750 per TB
- The hardware cost to store a single copy of the MRO data is ~$56K
  - Only one copy of the three required by PDS
  - Does not include data from an extended mission
  - Archive includes JPEG2000 compressed images
- LRO archive volume is projected to be ~400 TB
  - Hardware cost for one copy is ~$300K
  - Same caveats as above apply
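The cost figures on this slide can be checked with a short calculation. The per-TB cost, archive volumes, and three-copy requirement come from the slide; the three-copy totals are derived here by simple multiplication and assume the same unit cost applies to every copy, which the slide does not state.

```python
COST_PER_TB_USD = 750   # FY10 RAID storage plus network infrastructure (from the slide)
COPIES_REQUIRED = 3     # copies required by PDS (from the slide)

def hardware_cost(archive_tb, copies=1):
    """Hardware cost in USD for the given number of copies of an archive."""
    return archive_tb * COST_PER_TB_USD * copies

mro_one_copy = hardware_cost(75)    # MRO nominal mission
lro_one_copy = hardware_cost(400)   # LRO projected archive

# Assumption: all three copies cost the same per TB.
mro_all_copies = hardware_cost(75, COPIES_REQUIRED)
lro_all_copies = hardware_cost(400, COPIES_REQUIRED)

print(f"MRO: ${mro_one_copy:,} per copy, ${mro_all_copies:,} for three copies")
print(f"LRO: ${lro_one_copy:,} per copy, ${lro_all_copies:,} for three copies")
```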

PDS3 Compressed Image Formats
- Clem-JPEG (not in PDS Standards Reference)
- Huffman First Difference (“)
- JPEG2000
  - Improved compression efficiency (vs. JPEG)
  - Highly scalable embedded data streams
  - Progressive lossy to lossless compression within a single data stream
  - Arbitrarily crop images in the compressed domain
  - Selectively enhance quality of spatial “regions of interest”
  - Support for very large images
  - Used for HiRISE & LROC RDRs
- Previous Pixel (“)
- Run Length (“)
- Zip, gzip = GNU zip
  - Widely used open-source tool
  - Runs on a variety of common computer platforms
  - Available since 1992

Possible Solution for PDS4
- Allow File Compression
  - Use standard, non-patented algorithms (e.g., Lempel-Ziv 77, Huffman coding)
  - Use stable, open-source, well-maintained software (e.g., gzip)
- Tests using gzip, HiRISE data
  - RDRs
    - HiRISE RDR, JPEG2000 = 454 MB
    - Uncompressed, converted to raw format = 6.6 GB (15x larger)
    - Compressed using gzip = 1.1 GB (2.5x larger)
  - EDRs
    - Not compressed, typical file size = 250 MB
    - gzipped versions = 100 MB (2.5x smaller)
  - Overall the HiRISE archive would be 5% smaller
    - gzip EDRs
    - Convert RDRs to raw, then gzip
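A file-level gzip round trip of the kind proposed on this slide might look like the sketch below, written with Python's standard `gzip` module. The file name `example_image.raw` and the synthetic payload are hypothetical stand-ins; no real HiRISE product is read, so the ratio printed here says nothing about the 2.5x figures above.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def gzip_file(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return its path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst

def gunzip_bytes(path: Path) -> bytes:
    """Read a gzip file and return the decompressed bytes."""
    with gzip.open(path, "rb") as fin:
        return fin.read()

with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "example_image.raw"  # hypothetical file name
    payload = bytes(range(256)) * 4096          # synthetic, highly repetitive data
    raw_path.write_bytes(payload)

    gz_path = gzip_file(raw_path)
    restored = gunzip_bytes(gz_path)
    ratio = raw_path.stat().st_size / gz_path.stat().st_size

# File compression is lossless: the bytes come back unchanged.
assert restored == payload
print(f"compression ratio: {ratio:.1f}:1")
```

Because the compression is applied to the whole file, the image format inside stays a simple raw array, which is the simplification this slide argues for.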

Recommendation
- Allow file-based compression (such as gzip, bzip2) in PDS4
  - Stable, free, widely used open-source software tool
  - Works on a variety of common computer platforms
    - Macs, PCs, Solaris, MSDOS, VAX, etc.
  - Maintained by open-source community
  - Consistent with PDS3 history, PDS4 plans for simplification
  - Reduces storage costs
  - Improves data transfer rates over internet
  - Supports management and delivery of high-volume data sets for providers and users

Policy Questions
- Do we permit compression at all in the PDS4 archive?
- If so:
  - Do we want a mixture of compressed and uncompressed data?
    - One copy is uncompressed, two are compressed
  - Do we distinguish between EDRs and RDRs and other derived products?
  - Do we distinguish between frequently accessed data and those offline and/or in ‘deep archive’ storage?
    - Store deep archive data in uncompressed form or use one approved compression format (e.g., gzip)
    - Permit nodes to use and maintain other compression methods as needed for one or more copies
- Whatever we decide, do we require older, compressed data to be ‘restored’ to meet requirements of the new compression policy?