Page 1 CS Department Parallel Design of JPEG2000 Image Compression Xiuzhen Huang CS Department UC Santa Barbara April 30th, 2003.



Page 2 Outline
Introduction to image compression
JPEG2000 compression scheme
Parallel implementation of JPEG2000
  – On distributed-memory multiprocessors
  – On shared-memory multiprocessors
Conclusion

Page 3 Introduction to Image Compression
Why do we need image compression?
File size of a small digital photo without compression: 1280 × 800 pixels × 3 bytes (RGB) = 3,072,000 bytes, about 3 MB.
To speed up image transmission over the Internet and reduce image storage space, we need compression.

Page 4 Introduction to Image Compression
Original picture: 3 MB. After JPEG2000 compression: 19 KB.
Compression ratio: more than 150 times, with no noticeable difference in picture quality.
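The arithmetic on the last two slides can be checked with a short C sketch. The function names are hypothetical, and "19 K bytes" is read here as 19,000 bytes, which is an assumption; reading it as 19 × 1024 bytes also gives a ratio above 150.

```c
#include <assert.h>
#include <stddef.h>

/* Uncompressed size in bytes of a width x height image with the given
   number of bytes per pixel (3 for 8-bit RGB). */
size_t raw_image_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Compression ratio = original size / compressed size. */
double compression_ratio(size_t raw_bytes, size_t compressed_bytes)
{
    assert(compressed_bytes > 0);
    return (double)raw_bytes / (double)compressed_bytes;
}
```

For the photo on the slide, raw_image_bytes(1280, 800, 3) gives 3,072,000 bytes, and dividing by 19,000 bytes gives a ratio a little above 160.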

Page 5 JPEG2000 International Standard
JPEG2000, the new international standard for image compression, is much more efficient than the older JPEG standard. For the same compression ratio (bit rate, file size), the JPEG2000 picture has much better quality.
[Figure: the original picture next to its JPEG and JPEG2000 versions at a compression ratio of 50:1. Label: strong blockiness.]

Page 6 JPEG2000 International Standard
JPEG2000 has a much higher computational complexity than JPEG, especially for larger pictures. A parallel implementation is needed to reduce compression time.

Page 7 JPEG2000 Compression Scheme
Major steps of JPEG2000 image compression:
Input → Wavelet transform → Blockwise partition → Coding of each block → Binary compressed data
The wavelet transform takes most of the image compression time (>80%), so the parallel implementation should focus on the wavelet transform.

Page 8 JPEG2000 Compression Scheme
Brief introduction to the wavelet transform
Step 1: Horizontal wavelet transform of the image
for each row do
    1-D wavelet transform;
end
What is a 1-D wavelet transform?

Page 9 JPEG2000 Compression Scheme
Horizontal wavelet transform of each row
A simple example: the 1-D Haar wavelet transform
One array of image data passes through two filters, each followed by down-sampling by 2:
Low-pass filter [1, 1]: average of neighboring pixels. Produces the low-frequency coefficients, the first half of the output.
High-pass filter [1, -1]: difference of neighboring pixels. Produces the high-frequency coefficients, the second half of the output.
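The Haar example on this slide can be written as a minimal C sketch. The function name haar_1d is hypothetical, and dividing both outputs by 2 is an assumption about the filter normalization; the slide only says "average" and "difference". The input and output arrays must not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* One level of a 1-D Haar transform on n samples (n must be even).
   out[0 .. n/2-1] : low-frequency coefficients (pairwise averages)
   out[n/2 .. n-1] : high-frequency coefficients (pairwise half-differences)
   Taking each pair (2i, 2i+1) once is the "down-sample by 2" step.
   The 1/2 scaling of both outputs is an assumption. */
void haar_1d(const double *in, double *out, size_t n)
{
    assert(n % 2 == 0);
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++) {
        double a = in[2 * i];
        double b = in[2 * i + 1];
        out[i]        = (a + b) / 2.0;  /* low-pass filter [1, 1] */
        out[half + i] = (a - b) / 2.0;  /* high-pass filter [1, -1] */
    }
}
```

For the input {4, 2, 5, 9}, this produces {3, 7, 1, -2}: the averages 3 and 7 first, then the half-differences 1 and -2.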

Page 10 JPEG2000 Compression Scheme
Wavelet transform
Step 2: Vertical wavelet transform of the image
for each column of the new image do
    1-D wavelet transform;
end

Page 11 JPEG2000 Compression Scheme
[Figure: the horizontal wavelet transform of each row splits the image into Low and High halves; the vertical wavelet transform of each column then splits each half into Low and High again.]
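Steps 1 and 2 together give one level of the 2-D transform. The sketch below assumes a row-major array of doubles, the Haar filter from the earlier example with a 1/2 scaling, and hypothetical function names; it is an illustration, not the JPEG2000 reference code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One-level Haar transform, in place, of n samples spaced `stride`
   elements apart. tmp must hold at least n doubles. */
static void haar_strided(double *data, size_t n, size_t stride, double *tmp)
{
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++) {
        double a = data[(2 * i) * stride];
        double b = data[(2 * i + 1) * stride];
        tmp[i]        = (a + b) / 2.0;  /* low */
        tmp[half + i] = (a - b) / 2.0;  /* high */
    }
    for (size_t i = 0; i < n; i++)
        data[i * stride] = tmp[i];
}

/* One level of the 2-D transform on a row-major width x height image.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int haar_2d(double *img, size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    size_t longest = width > height ? width : height;
    double *tmp = malloc(longest * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t r = 0; r < height; r++)   /* Step 1: each row */
        haar_strided(img + r * width, width, 1, tmp);
    for (size_t c = 0; c < width; c++)    /* Step 2: each column */
        haar_strided(img + c, height, width, tmp);
    free(tmp);
    return 0;
}
```

After the call, the top-left quadrant holds the Low/Low coefficients and the other three quadrants hold the mixed and High/High coefficients.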

Page 12 Parallel Design of JPEG2000 Compression
Two parallel computing architectures
Shared-memory multiprocessors
  – Have a single address space
  – Processors communicate through variables stored in the shared address space
  – Programming tool: OpenMP
Distributed-memory multiprocessors
  – Each processor has its own memory module
  – Processors communicate with each other over a high-speed network
  – Programming tool: MPI (Message Passing Interface)

Page 13 Parallel Implementation of JPEG2000 Compression on Distributed-Memory Multiprocessors

Page 14 Parallel Design of JPEG2000 Compression-DMP
Traditional approach
The image is first divided by rows into n regions, and each processor performs the 1-D horizontal wavelet transform on its region. The new image is then divided by columns into n regions, and each processor performs the 1-D vertical wavelet transform on its region.
This approach requires intensive data transmission among processors and therefore has a very high network communication cost.
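The division into n regions can be sketched as a small helper that gives each processor a contiguous range of rows (or, in the second step, columns). The name region_bounds and the policy of handing leftover lines to the first regions are assumptions; the slide does not say how uneven splits are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [*begin, *end) of rows (or columns) owned by processor p
   when `count` lines are split into n nearly equal contiguous regions. */
void region_bounds(size_t count, size_t n, size_t p, size_t *begin, size_t *end)
{
    assert(n > 0 && p < n);
    size_t base = count / n;
    size_t extra = count % n;  /* the first `extra` regions get one more line */
    *begin = p * base + (p < extra ? p : extra);
    *end = *begin + base + (p < extra ? 1 : 0);
}
```

Because a processor owns full rows in the first step and full columns in the second, it needs pieces of every other processor's output between the two steps, which is where the communication cost comes from.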

Page 15 Parallel Design of JPEG2000 Compression-DMP
Tiling approach
The JPEG2000 international standard supports tile-based image compression. A large image is divided into several tiles, and each image tile is compressed independently.
[Figure: an image divided into nine tiles, labeled P1 to P9.]
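A simplified sketch of the tile grid follows. It assumes square tiles, a grid anchored at the top-left corner of the image, and raster-order tile indices; the type and function names are hypothetical, and the tiling parameters of the actual standard are more general than this.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open pixel bounds of one tile: columns [x0, x1), rows [y0, y1). */
typedef struct { size_t x0, y0, x1, y1; } tile_rect;

/* Number of tiles needed to cover `extent` pixels (ceiling division). */
size_t tiles_across(size_t extent, size_t tile_size)
{
    assert(tile_size > 0);
    return (extent + tile_size - 1) / tile_size;
}

/* Bounds of tile `index`, counting tiles left to right, top to bottom.
   Tiles on the right and bottom edges are clipped to the image. */
tile_rect tile_bounds(size_t width, size_t height, size_t tile_size, size_t index)
{
    size_t cols = tiles_across(width, tile_size);
    size_t rows = tiles_across(height, tile_size);
    assert(index < cols * rows);
    size_t tx = index % cols;
    size_t ty = index / cols;
    tile_rect t;
    t.x0 = tx * tile_size;
    t.y0 = ty * tile_size;
    t.x1 = t.x0 + tile_size < width ? t.x0 + tile_size : width;
    t.y1 = t.y0 + tile_size < height ? t.y0 + tile_size : height;
    return t;
}
```

Each tile's bounds depend only on its index, so a processor can locate and compress its tile without data from any other tile.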

Page 16 Parallel Design of JPEG2000 Compression-DMP
We choose MPI for the parallel implementation of JPEG2000 because the JPEG2000 software is written in C, which MPI supports. The basic framework is:

Page 17 Parallel Design of JPEG2000 Compression-DMP
[Figure: compression time (sec) versus number of processors for a 512x512 image, with curves for tile sizes 32 and 256.]
The plot shows the compression time for different tile sizes. For each tile size, the compression time decreases as the number of processors increases. Smaller tiles incur a larger computation overhead.

Page 18 Parallel Design of JPEG2000 Compression-DMP
Note
There is a jump between one process and two processes. With only one process, JPEG2000 compression is sequential. With two or more processes, Process 1 is responsible for collecting data, while the other processes compress different tiles and send the processed data back to Process 1.
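The collector and worker roles described in this note can be sketched as a mapping from tiles to processes. Everything specific here is an assumption: the slides do not say how tiles are assigned, so the sketch uses a static round-robin, and it numbers processes from 0 as MPI ranks are, so the slide's "Process 1" appears as rank 0.

```c
#include <assert.h>

#define COLLECTOR_RANK 0  /* the collecting process ("Process 1" on the slide) */

/* Rank of the process that compresses a given tile (hypothetical policy). */
int rank_for_tile(int tile_index, int num_procs)
{
    assert(tile_index >= 0 && num_procs >= 1);
    if (num_procs == 1)
        return COLLECTOR_RANK;  /* sequential: one process does everything */
    /* Workers are ranks 1 .. num_procs-1; tiles are dealt out in turn. */
    return 1 + tile_index % (num_procs - 1);
}
```

Under this policy, two processes still leave only one process compressing tiles, and that process also has to send its results to the collector, which may be what the jump between one and two processes reflects.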

Page 19 Parallel Implementation of JPEG2000 Compression on Shared-Memory Multiprocessors

Page 20 Parallel Design of JPEG2000 Compression-SMP
A problem with the tile-based approach
[Figure: images compressed by JPEG, by JPEG2000, and by JPEG2000 with relatively small tiles.]
Each tile is compressed independently, which causes discontinuities across tile edges, also called blockiness.

Page 21 Parallel Design of JPEG2000 Compression-SMP
Another parallel architecture is the shared-memory multiprocessor (SMP). The excellent price-performance ratio of Intel-based SMPs makes such systems very popular in many data processing applications. Many programming tools are also available for shared-memory multiprocessors, such as OpenMP and Java Threads.

Page 22 Parallel Design of JPEG2000 Compression-SMP
On an SMP, we do not need to worry about data communication over a network, because the data is in shared memory, so there is no need for tile partitioning. We can therefore use the traditional data partitioning approach for the horizontal and vertical wavelet transforms.
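On shared memory, the row partitioning can be left to an OpenMP loop directive. The sketch below is an illustration under assumptions, not the implementation from the talk: it uses the Haar filter with a 1/2 scaling, a row-major array of doubles, separate source and destination buffers, and hypothetical function names. Built without OpenMP support, the pragma is ignored and the loop runs sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* One-level Haar transform of a single row of n samples (n even). */
static void haar_row(const double *in, double *out, size_t n)
{
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++) {
        out[i]        = (in[2 * i] + in[2 * i + 1]) / 2.0;
        out[half + i] = (in[2 * i] - in[2 * i + 1]) / 2.0;
    }
}

/* Horizontal transform of a row-major width x height image.
   Rows are independent, so the iterations can be shared among threads;
   every thread reads src and writes its own rows of dst in shared memory. */
void horizontal_transform(const double *src, double *dst,
                          size_t width, size_t height)
{
    assert(width % 2 == 0);
    #pragma omp parallel for schedule(static)
    for (long r = 0; r < (long)height; r++)
        haar_row(src + (size_t)r * width, dst + (size_t)r * width, width);
}
```

No tile boundaries are introduced: the threads only split the loop over rows, and the transform itself is still computed over the whole image.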

Page 23 Parallel Design of JPEG2000 Compression-SMP
JPEG2000 image compression is implemented on a 4-processor SMP system using OpenMP directly. The speedup of the wavelet transform is only about 1.6 times, although it should be close to 4 times. Why?

Page 24 Parallel Design of JPEG2000 Compression-SMP
It is found that the vertical wavelet transform takes more than 10 times as long as the horizontal transform, even though both transforms have the same number of operations.
[Figure labels: vertical, horizontal.]

Page 25 Parallel Design of JPEG2000 Compression-SMP
Cache miss problem
In computer memory, the image data is stored line by line in raster-scan order (from left to right, from top to bottom). Each contiguous block of image data is brought from memory into the cache for the wavelet transform.
In the horizontal wavelet transform, as the filter window moves along a row, the data for the next step is usually already in the cache, so there are few cache misses.

Page 26 Parallel Design of JPEG2000 Compression-SMP
Cache miss problem
In the vertical wavelet transform, the filtering is done in the vertical direction, but the data is brought into the cache horizontally, so cache misses are very frequent.
[Figure labels: data, filtering.]
Solution
Do the vertical transform of several columns at the same time, instead of column by column, to make full use of the data already in the cache. This significantly reduces cache misses.
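One way to transform several columns at the same time is to walk down the image two rows at a time and update a group of adjacent columns from each pair of rows. The sketch below assumes the Haar filter with a 1/2 scaling, a row-major array of doubles, and separate source and destination buffers; the function name and the group width COLS_PER_BLOCK are hypothetical, and the talk's actual code is not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>

#define COLS_PER_BLOCK 16  /* hypothetical group width; tune for the cache */

/* Vertical one-level Haar transform of a row-major width x height image,
   processing COLS_PER_BLOCK adjacent columns together. */
void vertical_transform_blocked(const double *src, double *dst,
                                size_t width, size_t height)
{
    assert(height % 2 == 0);
    size_t half = height / 2;
    for (size_t c0 = 0; c0 < width; c0 += COLS_PER_BLOCK) {
        size_t c1 = c0 + COLS_PER_BLOCK < width ? c0 + COLS_PER_BLOCK : width;
        for (size_t i = 0; i < half; i++) {
            const double *top = src + (2 * i) * width;
            const double *bot = src + (2 * i + 1) * width;
            double *low  = dst + i * width;
            double *high = dst + (half + i) * width;
            /* The inner loop moves along the row, so the neighbors of each
               element fetched into the cache are used before moving on. */
            for (size_t c = c0; c < c1; c++) {
                low[c]  = (top[c] + bot[c]) / 2.0;
                high[c] = (top[c] - bot[c]) / 2.0;
            }
        }
    }
}
```

The arithmetic is the same as in the column-by-column version; only the order in which memory is visited changes.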

Page 27 Parallel Design of JPEG2000 Compression-SMP
[Figure: original vertical transform compared with the improved vertical transform.]
The vertical transform is sped up by about 10 times.

Page 28 Parallel Design of JPEG2000 Compression-SMP
Using the improved vertical wavelet transform, the overall speedup of the wavelet transform is now close to the number of processors.

Page 29 Conclusion
Gave a brief review of JPEG2000 image compression.
Discussed two approaches for the parallel implementation of JPEG2000 image compression: on distributed-memory multiprocessors and on shared-memory multiprocessors.
Questions?