Binary Image Compression via Monochromatic Pattern Substitution: A Sequential Speed-Up Luigi Cinque and Sergio De Agostino Computer Science Department.

Presentation transcript:

Binary Image Compression via Monochromatic Pattern Substitution: A Sequential Speed-Up. Luigi Cinque and Sergio De Agostino, Computer Science Department, Sapienza University of Rome, Italy. Luca Lombardi, Computer Science Department, University of Pavia, Italy.

The Binary Image Compression Scheme The image is read by a raster scan. If the 4 x 4 subarray at position (i,j) is monochromatic, then the largest monochromatic rectangle at that position is compressed; otherwise the 4 x 4 subarray is left uncompressed. The positions covered by the detected monochromatic rectangles and by the non-monochromatic 4 x 4 subarrays are skipped in the linear scan of the image.
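The parsing step described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `largest_mono_rect` and `parse`, the item tuples, the choice of "largest" as largest area, and the clipping of 4 x 4 blocks at the image border are all assumptions made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]  # 0 = white pixel, 1 = black pixel


def largest_mono_rect(img: Image, i: int, j: int) -> Tuple[int, int]:
    """Return (height, width) of the largest-area monochromatic rectangle
    whose upper-left corner is (i, j). Hypothetical helper."""
    rows, cols = len(img), len(img[0])
    color = img[i][j]
    best_h, best_w = 0, 0
    max_w = cols - j  # running minimum of the row widths seen so far
    for r in range(i, rows):
        w = 0
        while w < max_w and img[r][j + w] == color:
            w += 1
        if w == 0:
            break  # this row starts with the other color
        max_w = w
        h = r - i + 1
        if h * max_w > best_h * best_w:
            best_h, best_w = h, max_w
    return best_h, best_w


def parse(img: Image, k: int = 4) -> list:
    """Raster-scan the image and emit rectangle or raw items (assumed format)."""
    rows, cols = len(img), len(img[0])
    covered = [[False] * cols for _ in range(rows)]
    items = []
    for i in range(rows):
        for j in range(cols):
            if covered[i][j]:
                continue  # position already covered by an earlier item
            r_end, c_end = min(i + k, rows), min(j + k, cols)  # assumed border clipping
            block = [img[r][c] for r in range(i, r_end) for c in range(j, c_end)]
            if len(set(block)) == 1:
                h, w = largest_mono_rect(img, i, j)
                items.append(("rect", i, j, h, w, img[i][j]))
            else:
                h, w = r_end - i, c_end - j
                items.append(("raw", i, j, block))
            for r in range(i, i + h):
                for c in range(j, j + w):
                    covered[r][c] = True
    return items
```

An item may overlap pixels that an earlier item already covered; only the starting positions are skipped, which is why one pixel can be covered by more than one rectangle.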

The Binary Image Compression Scheme The encoding scheme starts each coding item with a flag field indicating whether the item is a monochromatic rectangle (0 for white, 10 for black) or raw data (11). If the flag field is 11, then 16 uncompressed bits follow; otherwise the width and the length of the monochromatic rectangle must be encoded. A variable-length coding technique is used, which has the same compression effectiveness as the block matching method (Storer and Helfgott [97], The Computer Journal) and is suitable for implementation on large-scale parallel systems.

The Coding Technique 12 bits are a practical upper bound to encode the width or the length of a rectangle. Either 12 or 8 or 4 bits are used to encode the width and the length, defining 9 classes of rectangles. Therefore, the flag fields 0 and 10 are followed by a second flag field indicating one of the nine classes. We can partition an image into a thousand blocks and apply compression via monochromatic pattern substitution to each block independently with no relevant loss of effectiveness.
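The nine classes arise from choosing one of three field sizes independently for the width and for the length. The sketch below illustrates that count; the names `FIELD_BITS`, `field_bits` and `rect_class` are hypothetical, and it assumes each dimension is stored as a plain binary number in the smallest field that can hold it, which the slides do not specify.

```python
from typing import Tuple

FIELD_BITS = (4, 8, 12)  # field sizes named on the slide

# Every (width field, length field) combination: 3 x 3 = 9 classes.
CLASSES = [(w, l) for w in FIELD_BITS for l in FIELD_BITS]


def field_bits(value: int) -> int:
    """Smallest field size able to hold `value` as plain binary (assumption)."""
    for bits in FIELD_BITS:
        if value < (1 << bits):
            return bits
    raise ValueError("value does not fit in 12 bits")


def rect_class(width: int, length: int) -> Tuple[int, int]:
    """Class of a rectangle, expressed as its pair of field sizes."""
    return field_bits(width), field_bits(length)
```

For instance, under these assumptions a 10 x 300 rectangle falls in the class with a 4-bit width field and a 12-bit length field.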

Experimental Results on the Speed-Up In order to implement decompression on an array of processors with distributed memory and no interconnections, we indicate the end of the encoding of a block with 111, changing the raw-data flag field from 11 to 110. We obtained the expected speed-up of the compression and decompression times, achieving parallel running times about twenty times faster than the sequential ones with up to 32 processors of a machine with 256 Intel Xeon 3.06 GHz processors (avogadro.cilea.it) on a test set of large (4096 x 4096 pixels) binary images.
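With the end-of-block marker added, the flag fields are 0, 10, 110 and 111, and no flag is a prefix of another, so a decoder can recognize them by reading bits left to right. A minimal sketch, assuming the encoding is held as a string of '0' and '1' characters (the `FLAGS` table labels and the `read_flag` function are illustrative names, not from the slides):

```python
from typing import Tuple

# Flag fields after the change described above.
FLAGS = {"0": "white", "10": "black", "110": "raw", "111": "end"}


def read_flag(bits: str, pos: int = 0) -> Tuple[str, int]:
    """Read one flag field starting at `pos`; return (meaning, next position)."""
    for n in (1, 2, 3):
        code = bits[pos:pos + n]
        if len(code) == n and code in FLAGS:
            return FLAGS[code], pos + n
    raise ValueError("truncated or invalid flag field")


# Usage: a block holding one raw item (flag 110 plus 16 raw bits), then the end marker.
stream = "110" + "0000010000000000" + "111"
kind, pos = read_flag(stream)            # raw item
kind_end, _ = read_flag(stream, pos + 16)  # end-of-block marker
```

Because each processor can detect 111 locally, it knows where its block ends without consulting any other processor.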

Compression Efficiency The monochromatic pattern substitution technique requires O(M log M) time to compute a rectangle of size M, and the worst-case sequential time is Ω(n log M) for an image of size n. The technique has the same complexity as the block matching method, but it is twice as fast in practice. In conclusion, the monochromatic pattern substitution method is more scalable and more efficient.

Worst Case Running time ≈ M(1 + 1/2 + 1/3 + … + 1/M) = Θ(M log M)
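The bound follows from the standard estimate of the harmonic sum. One plausible reading of where the sum comes from, stated here as an assumption rather than as the authors' derivation, is that while scanning the k-th row of a candidate rectangle the running width is at most M/k, since a monochromatic k-row rectangle of that width has size at most M.

```latex
\[
  M\left(1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{M}\right)
  = M \sum_{k=1}^{M} \frac{1}{k} = M\,H_M ,
  \qquad
  \ln(M+1) \le H_M \le 1 + \ln M ,
\]
\[
  \text{hence}\quad M\,H_M = \Theta(M \log M).
\]
```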

Waste Factor The waste factor is the average number of rectangles covering one pixel in the parsing of the image. On realistic data, it is conjectured that the waste factor is always less than two. It follows that the sequential time of the monochromatic pattern substitution technique is O(n log M). In practice, the time is linear, as for the square block matching technique. The waste factor decreases to 1 when the parallel procedure is applied to an image partitioned into up to 256 blocks.
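As a small illustration of the definition, the waste factor of a parsing can be computed as the total area of its items divided by the number of pixels, which equals the average number of items covering a pixel when every item lies inside the image. The function name and the (row, column, height, width) tuple layout are assumptions for this sketch, as is counting raw 4 x 4 blocks together with the rectangles.

```python
from typing import Iterable, Tuple

Rect = Tuple[int, int, int, int]  # (row, column, height, width), assumed layout


def waste_factor(rects: Iterable[Rect], rows: int, cols: int) -> float:
    """Average number of parsed items covering one pixel of a rows x cols image."""
    total_area = sum(h * w for (_, _, h, w) in rects)
    return total_area / (rows * cols)
```

A parsing that tiles the image without overlap gives exactly 1; overlapping rectangles push the value above 1.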

Waste Factor Same behavior on the CCITT image test set (1728 x 2376 pixels) and on the set of 4096 x 4096 pixel images. [Table: Processors, Waste Factor; values not preserved in the transcript]

Sequential Speed-up [Two tables with columns Number of Blocks, Coding Time, Decoding Time: average running times on the 4096 x 4096 pixel images (cs.) and average running times on the CCITT set (cs.); values not preserved in the transcript]

Speeding up Parallel Computation The CCITT times were obtained with a single core of a quad-core (CPU Intel Core 2 Quad Q GHz). Since the sequential speed-up is also obtained with smaller partitions, it can be applied to parallel computation as well. With the full power of the quad-core and partitions of 64 blocks, the compression and decompression times are lowered to 1.5 and 0.6 cs., respectively.

Speeding up Parallel Computation The times for the 4096 x 4096 pixel images were obtained with a single core of the 256-processor machine avogadro.cilea.it. Using 16 processors and partitions of 16 blocks, we lowered the compression and decompression times to 3 and 1 cs., respectively (5 and 3 cs. are the times obtained for compression and decompression with no sequential speed-up). Generally speaking, such a sequential speed-up can be applied to small-scale parallel systems.

Future work As future work, we wish to implement compression and decompression via monochromatic pattern substitution on a graphics processing unit (GPU). With such a device, we will have more cores available for experiments. If monochromatic pattern substitution can be realized on a general-purpose GPU, it will be straightforward to observe the effects of the sequential speed-up presented here.

Thank You