JPEG Compression Algorithm in CUDA
Group Members: Pranit Patel, Manisha Tatikonda, Jeff Wong, Jarek Marczewski
Date: April 14, 2009


Outline
- Motivation
- JPEG Algorithm
- Design Approach in CUDA
- Benchmark
- Conclusion

Motivation
- Growth of digital imaging applications.
- Need for an effective algorithm for video compression applications.
- Loss of data information must be minimal.
- JPEG is a lossy compression algorithm that reduces file size without noticeably affecting image quality.
- We perceive small changes in brightness more readily than we do small changes in color.

JPEG Algorithm
Step 1: Divide the sample image into 8x8 blocks.
Step 2: Apply the DCT.
- The DCT is applied to each block.
- It replaces the block's pixel values with frequency coefficients computed over the entire block.
- This step does not compress the file.
In general, a simple color space model stores [R, G, B] per pixel; JPEG uses the [Y, Cb, Cr] model:
- Y = brightness
- Cb = color blueness
- Cr = color redness
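The color-model change above can be sketched in plain Python. The coefficients below are the standard JFIF conversion equations, assumed here since the slides do not give them:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to the [Y, Cb, Cr] model (JFIF equations)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr
```

A neutral gray pixel maps to Y equal to its intensity with both chroma channels at the 128 midpoint, which is why chroma can be subsampled aggressively.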

JPEG Algorithm
Step 3: Quantization.
- The first compression step.
- Each DCT coefficient is divided by its corresponding constant in the quantization table and rounded off to the nearest integer.
- The result of quantizing the DCT coefficients is that smaller, unimportant coefficients are replaced by zeros and larger coefficients lose precision. It is this rounding-off that causes the loss in image quality.
Step 4: Apply Huffman encoding.
- Huffman encoding is applied to the quantized DCT coefficients to reduce the image size further.
Step 5: Decoder.
The JPEG decoder consists of:
- Huffman decoding
- De-quantization
- IDCT

DCT and IDCT

Discrete Cosine Transform
Separable transform algorithm (1D applied twice to obtain the 2D result): the 2D DCT is performed in a two-pass approach, one pass for the horizontal direction and one for the vertical direction.
[Figure: DCT 1st pass followed by 2nd pass]

Discrete Cosine Transform
- Translate the DCT into a matrix multiplication.
- Pre-calculated cosine values are stored as a constant array.
- The inverse DCT is calculated in the same way, only with the cosine matrix transposed.
[Figure: the 8x8 pixel matrix P (P00..P77) multiplied by the 8x8 cosine matrix C (C00..C77)]
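The matrix formulation above can be sketched in plain Python. The cosine matrix C and the transposed-matrix inverse are the standard orthonormal DCT-II construction, assumed here since the slides show only the matrix layout:

```python
import math

N = 8

# Pre-calculated cosine matrix: C[u][x] = c(u) * cos((2x+1) * u * pi / 16),
# with c(0) = sqrt(1/8) and c(u) = sqrt(2/8) otherwise (orthonormal DCT-II).
C = [[(math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def dct2(P):
    """2D DCT as two matrix products: F = C * P * C^T."""
    return matmul(matmul(C, P), transpose(C))

def idct2(F):
    """Inverse DCT uses the transposed cosine matrix: P = C^T * F * C."""
    return matmul(matmul(transpose(C), F), C)
```

Each of the two matrix products corresponds to one pass of the separable transform: C * P transforms the columns, and multiplying by C^T transforms the rows.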

DCT CUDA Implementation
- Each thread within each block performs the same number of calculations.
- Each thread multiplies and accumulates eight elements.
[Figure: the P x C matrix product, highlighting the row and column read by the thread at thread.x = 2, thread.y = 3]
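As a rough illustration (not the project's actual kernel), the per-thread work can be emulated in plain Python: the thread at (thread.y, thread.x) multiply-accumulates eight elements of row thread.y of P against column thread.x of the cosine matrix C.

```python
def thread_output(P, C, ty, tx):
    # One thread's work: eight multiply-accumulates producing out[ty][tx].
    acc = 0.0
    for k in range(8):
        acc += P[ty][k] * C[k][tx]
    return acc

def run_block(P, C):
    # Emulate one 64-thread block (8x8 threads), one output element per thread.
    return [[thread_output(P, C, ty, tx) for tx in range(8)] for ty in range(8)]
```

Because every thread performs the same eight-element loop, the warp stays free of divergence, which is what the "same number of calculations" bullet is describing.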

DCT Grid and Block
Two methods were tried:
- Each thread block processes 1 macroblock (64 threads).
- Each thread block processes 8 macroblocks (512 threads).

DCT and IDCT GPU Results
[Chart: GPU timings at 512x512, 1024x768, and 2048x2048]

DCT Results

IDCT Results

Quantization

- Break the image into 8x8 blocks.
- An 8x8 quantization matrix is applied to each block.
- Each coefficient is divided by its quantization value and rounded to the nearest integer; de-quantization multiplies it back by the same value.
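A minimal sketch of the divide-round-multiply step above, assuming the standard luminance quantization table from Annex K of the JPEG specification (the slides do not show which table the project used):

```python
# Standard luminance quantization table (assumed; Annex K of the JPEG spec).
Q = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def quantize(F):
    # Divide each DCT coefficient by its table entry, round to nearest integer.
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

def dequantize(Fq):
    # De-quantization multiplies back; the rounding loss is not recovered.
    return [[Fq[u][v] * Q[u][v] for v in range(8)] for u in range(8)]
```

The large divisors in the bottom-right of Q are what zero out the small high-frequency coefficients mentioned in the quantization slide.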

Quantization CUDA Programming
- Method 1: exact implementation as on the CPU.
- Method 2: use shared memory to copy the 8x8 image block.
- Method 3: load pre-divided values into shared memory.

Quantization CUDA Results

Quantization CPU vs GPU Results

Tabulated Results for Quantization
- Method 2 and Method 3 have similar performance on small image sizes.
- Method 3 might perform better on images bigger than 2048x2048.
- Quantization is ~70x faster than the CPU for Method 1, and faster still as resolution increases.
- Quantization is ~180x faster than the CPU for Methods 2 and 3, and faster still as resolution increases.
[Table: Method 1, Method 2, Method 3, and CPU timings with per-method speedups; numeric values lost in the transcript]

Huffman Encode/Decode

Huffman Encoding Basics
- Utilizes the frequency of each symbol.
- Lossless compression.
- Uses a VARIABLE-length code for each symbol.
[Figure: example image]
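Building such a variable-length code from symbol frequencies can be sketched with the classic heap-based construction (an illustrative helper, not the project's code):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a variable-length prefix code from symbol frequencies."""
    freq = Counter(data)
    # Heap entries: (frequency, unique tiebreak, tree); a tree is either
    # a bare symbol or a (left, right) pair. The tiebreak keeps the heap
    # from ever comparing two trees directly.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {heap[0][2]: "0"}
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # two least-frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

The frequent symbols end up near the root and get short codes; this is exactly the variable-length property that makes parallel decoding hard, as the next slides discuss.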

Challenges
- Encoding is a very, very serial process.
- The variable length of symbols is a problem.
- Encoding: we don't know where a symbol should be written until all preceding symbols are encoded.
- Decoding: we don't know where symbols start.

ENCODING

DECODING
- Decoding: we don't know where symbols start, so redundant calculation is needed.
- Uses a decoding table rather than a tree.
- Decode a symbol, then shift by n bits.
STEP 1: Divide the bitstream into overlapping 65-byte segments. Run 8 threads on each segment, each with a different starting position.
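The table-then-shift scheme above can be sketched as follows, assuming (as the slides do) a maximum code length of 8 bits. This is an illustrative reconstruction, not the project's kernel; the bitstream is modeled as a string of '0'/'1' characters for clarity:

```python
def build_decode_table(codes, max_len=8):
    """Decoding table: every 8-bit pattern maps to (symbol, code length)."""
    table = {}
    for sym, code in codes.items():
        assert len(code) <= max_len
        pad = max_len - len(code)
        for tail in range(1 << pad):  # enumerate all completions of the code
            key = code + format(tail, "0{}b".format(pad)) if pad else code
            table[key] = (sym, len(code))
    return table

def decode(bits, table, max_len=8):
    """Look up the next 8 bits, emit the symbol, shift by its length."""
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + max_len].ljust(max_len, "0")
        sym, n = table[window]
        out.append(sym)
        pos += n
    return out
```

Because every 8-bit window resolves to a symbol in one lookup, a thread never walks a tree; it only needs its starting bit position to be correct, which is what the overlapping-segment trick provides.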

DECODING
STEP 2: Determine which threads decoded from a valid symbol boundary; throw away the others.

DECODING - Challenges
- Each segment consumes a fixed number of encoded bits but produces variable-length decoded output: 64 bits can result in 64 bytes of output (memory explosion).
- Memory addresses for the input do not advance in a fixed pattern relative to the output addresses (memory collisions).
- The decoding table doesn't fit into one address line.
- Combining thread results is serial.
NOTE: To simplify the algorithm, the maximum symbol length was assumed to be 8 bits (it didn't help much).

Huffman Results
Encoding:
- Step one is very fast: ~100x speedup.
- Step two: the algorithm is wrong, so no results.
Decoding:
- 3 times slower than the classic CPU method.
- Using shared memory for the code table resolved only some conflicts (5x slower -> 4x slower); conflicts remain on either the input bitstream or the output data.
- Moving 65-byte chunks to shared memory and sharing each between 8 threads didn't help much (4x slower -> 3x slower).
ENCODING should be left to the CPU.

Conclusion & Results

Results
[Table: CPU times, GPU times, and performance gains for DCT, Quantization, and IDCT at 512x512, 1024x768, and 2048x2048; numeric values lost in the transcript]

Performance Gain
- DCT and IDCT are the major consumers of the computation time.
- Computation increases with resolution.
- Total GPU processing time for the 2K image is 5.224 ms, a speedup of 36x over the CPU.

GPU Performance
- DCT and IDCT still take up the major computation cycles, but reduced by roughly 100x.
- 2K-resolution processing time is 7 ms on the GPU, compared to ~900 ms on the CPU.

Conclusion
- The CUDA implementation of the transform and quantization is much faster than the CPU (36x faster).
- The Huffman algorithm does not parallelize well; final results are 3x slower than the CPU.
- The GPU architecture is well optimized for image- and video-related processing.
- High-performance applications: inter-frame coding, HD-resolution/real-time video compression and decompression.

Conclusion - Image Quality
Resolution: 1024x768
[Side-by-side CPU and GPU output images]

Conclusion - Image Quality
Resolution: 2048x2048
[CPU and GPU output images]

Conclusion - Image Quality
Resolution: 512x512
[Side-by-side CPU and GPU output images]