Data Compression for PDS4 Lisa Gaddis, Sue LaVoie, Jeff Anderson, Elizabeth Rye PDS Imaging Node March 26, 2010.



Syntax
Data Compression
  – Encodes information using fewer bits
  – Reduces consumption of expensive resources
      Data storage and/or transmission bandwidth
  – Requires decompression
  – Trade-offs
      Degree of compression
      Amount of ‘distortion’ introduced
      Computational resources required for decompression
Image Compression
  – Application of data compression to digital images
  – Reduces redundancy in images to improve efficiency of storage and transmission
  – Lossless and lossy methods
  – Preserve image quality at a given bit- or compression-rate
File Compression
  – Reduces redundancy at the file level
  – Many available tools
      ZIP
      GZIP
      BZIP2

Why image compression?
Image compression for data providers and archivists
  – NASA missions deliver significant numbers of large image files
  – Need to support and/or reduce storage costs and data transmission times of images
  – Promotes exchange between different users and systems
  – Although falling in cost, storage is expensive for many TB of data and multiple copies
      FY10: ~$750/TB for RAID storage with network infrastructure

Image Compression
Lossless compression
  – Exploits data redundancy
  – Image can be recovered exactly
      ‘Run-length encoding’ makes use of redundant patterns or ‘runs’
      ‘LZW (Lempel Ziv Welch) encoding’ also addresses strings of characters; builds up a table of strings and their corresponding codes
      ‘Huffman coding’ uses a binary encoding tree to represent commonly occurring values in few bits and less frequently occurring values in more bits
  – Best for documents, computer programs, line drawings, etc.
  – JPEG2000 has a lossless option, approved for use by PDS
Lossy compression
  – Exploits data redundancy and ‘irrelevant’ data
  – Image data are not recovered exactly
      JPEG
      JPEG2000 (lossy)
  – Best for digital images, audio, video
  – Not approved for PDS archive data
      Exceptions: Browse and some EDR images (e.g., Clementine UVVIS and NIR) are lossy JPEG images (average compression ratio of 5.5)
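The run-length idea named on the lossless compression slide can be sketched in a few lines of Python. This is a minimal illustration, not the encoder used in any PDS product: the function names and the sample scan line are hypothetical, and real run-length formats pack the pairs into bytes instead of keeping a Python list.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(data: bytes) -> List[Tuple[int, int]]:
    """Collapse each run of identical bytes into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def rle_decode(pairs: List[Tuple[int, int]]) -> bytes:
    """Expand (value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in pairs)


# Hypothetical scan line: six null pixels, then two short runs of data.
row = bytes([0] * 6 + [17] * 3 + [255] * 2)
pairs = rle_encode(row)
restored = rle_decode(pairs)

print(pairs)            # [(0, 6), (17, 3), (255, 2)]
print(restored == row)  # True: the image row is recovered exactly
```

Because decoding reproduces the input byte for byte, the scheme is lossless; it only saves space when the data actually contain long runs.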

MRO and LRO images
Not your typical images
  – MESSENGER MDIS, Viking Orbiter, Galileo SSI, etc.
      Framing cameras
      800 samples x 800 lines to 1024 samples x 1024 lines
      Roughly one megabyte (MB) per observation
      PDS Imaging Node combined archive requirements for all missions other than LRO and MRO are <25 TB
  – MRO/HiRISE, LRO/LROC
      Line-scan cameras
      10,000-20,000 samples x 50, ,000 lines
      Roughly 500 to 2,000 MB per observation
      Combined expected archive total for MRO and LRO is 500 TB
      20X larger than sum total of all other Imaging Node holdings

Image Compression for HiRISE RDRs
Why image compression was needed
  – Enormous volume of HiRISE archive, 1 yr
      EDR – 12,100 Gb (~1.5 TB)
      RDR – 92,500 Gb (11.3 TB)
  – Very large Standard Data Products
      EDR (2048 x 64,000, 16-bit) = 262 MB
      RDR (40,000 x 64,000, after reprojection, 16-bit) = ~500 to 1000 MB
  – Advantages for delivery of RDR data in JPEG2000 format
      Losslessly recompressed format
      Wavelet compression greatly improves speed of web access
      Fast browse, zoom, pan capabilities for handling large files
  – Volume projections
      EDR DVD volumes: 321 (losslessly recompressed) vs 482 (uncompressed) (1.5 compression ratio)
      RDR DVD volumes: 2400 (losslessly compressed) vs 7300 (uncompressed) (assuming 3.0 compression ratio)
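Several of the HiRISE figures on this slide can be checked with simple arithmetic. The sketch below assumes that "Gb" means gigabits, that 1 TB is counted as 1024 GB, and that 1 MB is 10^6 bytes; those unit conventions are assumptions, not stated on the slide, and the function name is hypothetical.

```python
BYTES_PER_PIXEL = 2  # 16-bit samples


def raw_size_mb(samples: int, lines: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Uncompressed image size in megabytes (1 MB = 10**6 bytes, an assumption)."""
    return samples * lines * bytes_per_pixel / 1e6


# EDR product: 2048 samples x 64,000 lines at 16 bits per pixel.
edr_mb = raw_size_mb(2048, 64_000)

# Compression ratios implied by the DVD volume projections.
edr_ratio = 482 / 321     # uncompressed volumes / recompressed volumes
rdr_ratio = 7300 / 2400

# One-year EDR archive: gigabits -> gigabytes -> terabytes (1 TB = 1024 GB assumed).
edr_archive_tb = 12_100 / 8 / 1024

print(f"EDR raw size:     {edr_mb:.0f} MB")
print(f"EDR DVD ratio:    {edr_ratio:.1f}")
print(f"RDR DVD ratio:    {rdr_ratio:.1f}")
print(f"EDR archive size: {edr_archive_tb:.1f} TB")
```

Under those assumptions the results agree with the slide's 262 MB, 1.5, 3.0, and ~1.5 TB.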

HiRISE Example
JPEG2000 image compression applied to map-projected RDR images only
  – Lots of null pixels
  – Nulls are highly compressed as a result of the lossless compression using JPEG2000
Projected ~3:1 compression ratios
Achieved 15:1 in recent tests
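The effect of null padding on lossless compression can be illustrated with a synthetic image. The sketch below uses Python's `zlib` (DEFLATE), not JPEG2000, so the numbers are not comparable to the HiRISE results; the image size, the null value of 0, and the random "valid" pixels are all assumptions made for the illustration.

```python
import random
import zlib

random.seed(0)            # fixed seed so the synthetic image is repeatable
WIDTH, HEIGHT = 512, 512  # hypothetical 8-bit image

rows = []
for y in range(HEIGHT):
    if HEIGHT // 4 <= y < 3 * HEIGHT // 4:
        # Central band: null margins around incompressible random "valid" pixels.
        valid = bytes(random.randrange(1, 256) for _ in range(WIDTH // 2))
        rows.append(bytes(WIDTH // 4) + valid + bytes(WIDTH // 4))
    else:
        # Rows outside the projected footprint are entirely null (zero).
        rows.append(bytes(WIDTH))
image = b"".join(rows)

compressed = zlib.compress(image, 9)
ratio = len(image) / len(compressed)

print(f"raw {len(image)} bytes, compressed {len(compressed)} bytes, ratio {ratio:.1f}:1")
print(zlib.decompress(compressed) == image)  # lossless round trip
```

Only a quarter of the pixels carry data here, and the null regions shrink to almost nothing, so the overall ratio is driven mostly by how much of the frame is null.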

Past Experience
Problems with compression
  – Voyager, Viking, and MGS-MOC PDS archives contain losslessly compressed data
  – Decompression algorithms (e.g., in ISIS) break due to
      New compilers
      New operating systems
      Changes in hardware architecture (32-bit vs 64-bit)
  – JPEG2000 compressed HiRISE RDR images are supported by ISIS3
      But, when JPEG2000 format reaches end-of-life, software maintenance to read this format will be much more difficult than the existing Voyager/Viking/MGS-MOC algorithms
      A proliferation of image compression formats in PDS would be a problem for long-term archiving and usability of the images

Data Storage Costs: MRO & LRO
Expected PDS storage requirements for the MRO nominal mission are 75 TB
  – High capacity RAID storage & network infrastructure costs ~$750 per TB
  – The hardware cost to store a single copy of the MRO data is ~$56K
      Only one copy of the three required by PDS
      Does not include data from an extended mission
      Archive includes JPEG2000 compressed images
LRO archive volume is projected to be ~400 TB
  – Hardware cost for one copy is ~$300K
  – Same caveats as above apply
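The storage cost estimates follow directly from the quoted ~$750 per TB. A small sketch of the arithmetic, with a hypothetical helper function and the three-copy total added only as an illustration of the "three copies required by PDS" caveat:

```python
COST_PER_TB_USD = 750  # FY10 RAID storage plus network infrastructure, per the slides


def hardware_cost_usd(volume_tb: float, copies: int = 1) -> float:
    """Hardware cost for storing `copies` copies of an archive of `volume_tb` terabytes."""
    return volume_tb * COST_PER_TB_USD * copies


mro_one_copy = hardware_cost_usd(75)        # MRO nominal mission
lro_one_copy = hardware_cost_usd(400)       # LRO projected archive
mro_three_copies = hardware_cost_usd(75, copies=3)

print(f"MRO, one copy:     ${mro_one_copy:,.0f}")      # $56,250
print(f"LRO, one copy:     ${lro_one_copy:,.0f}")      # $300,000
print(f"MRO, three copies: ${mro_three_copies:,.0f}")  # $168,750
```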

PDS3 Compressed Image Formats
Clem-JPEG (not in PDS Standards Reference)
Huffman First Difference (“)
JPEG2000
  – Improved compression efficiency (vs. JPEG)
  – Highly scalable embedded data streams
  – Progressive lossy to lossless compression within a single data stream
  – Arbitrarily crop images in the compressed domain
  – Selectively enhance quality of spatial “regions of interest”
  – Support for very large images
      Used for HiRISE & LROC RDRs
Previous Pixel (“)
Run Length (“)
Zip, gzip = GNU zip
  – Widely used open-source tool
  – Runs on a variety of common computer platforms
  – Available since 1992

Possible Solution for PDS4
Allow File Compression
  – Use standard, non-patented algorithms (e.g., Lempel-Ziv 77, Huffman coding)
  – Use stable, open-source, well-maintained software (e.g., gzip)
Tests using gzip, HiRISE data
  – RDRs
      HiRISE RDR, JPEG2000 = 454 MB
      Uncompressed, converted to raw format = 6.6 GB (15x larger than the JPEG2000 file)
      Compressed using gzip = 1.1 GB (2.5x larger than the JPEG2000 file)
  – EDRs
      Not compressed, typical file size = 250 MB
      gzipped versions = 100 MB (2.5x smaller)
  – Overall the HiRISE archive would be 5% smaller
      gzip EDRs
      Convert RDRs to raw, then gzip
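File-level compression of the kind proposed here can be sketched with Python's standard `gzip` module, which reads and writes the same format as the gzip tool. This is an illustration under stated assumptions: the file name `example_product.img` and its synthetic contents are hypothetical, and the checksum step is one possible way to confirm a lossless round trip, not a PDS procedure.

```python
import gzip
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, opener=open) -> str:
    """Checksum a file's contents; pass opener=gzip.open to hash the decompressed data."""
    digest = hashlib.sha256()
    with opener(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_file(src: Path) -> Path:
    """Write a gzip-compressed copy of `src` next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical raw product filled with a repeating byte pattern.
    raw_path = Path(tmp) / "example_product.img"
    raw_path.write_bytes(bytes(range(256)) * 4096)

    gz_path = gzip_file(raw_path)
    raw_size = raw_path.stat().st_size
    gz_size = gz_path.stat().st_size
    original_digest = sha256_of(raw_path)
    restored_digest = sha256_of(gz_path, opener=gzip.open)

print(f"raw {raw_size} bytes, gzipped {gz_size} bytes")
print(restored_digest == original_digest)  # True when decompression is lossless
```

The synthetic pattern compresses far better than real image data would, so the size reduction shown says nothing about the ratios reported in the HiRISE tests.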

Recommendation
Allow file-based compression (such as gzip, bzip2) in PDS4
  – Stable, free, widely used open-source software tools
  – Work on a variety of common computer platforms
      Macs, PCs, Solaris, MSDOS, VAX, etc.
  – Maintained by the open-source community
Consistent with PDS3 history, PDS4 plans for simplification
Reduces storage costs
Improves data transfer rates over the internet
Supports management and delivery of high-volume data sets for providers and users

Policy Questions
Do we permit compression at all in the PDS4 archive?
If so:
  – Do we want a mixture of compressed and uncompressed data?
      One copy is uncompressed, two are compressed
  – Do we distinguish between EDRs and RDRs and other derived products?
  – Do we distinguish between frequently accessed data and those offline and/or in ‘deep archive’ storage?
      Store deep archive data in uncompressed form or use one approved compression format (e.g., gzip)
      Permit nodes to use and maintain other compression methods as needed for one or more copies
Whatever we decide, do we require older, compressed data to be ‘restored’ to meet requirements of the new compression policy?