Compressing Bi-Level Images by Block Matching on a Tree Architecture Sergio De Agostino Computer Science Department Sapienza University of Rome ITALY
Lossless image compression by block matching is an extension of the LZ1 method to bi-level images. A square greedy matching heuristic using a simple hashing scheme provides a linear-time implementation (Storer [1996], Proc. IEEE Data Compression Conf.). A slower rectangle greedy matching technique requires O(M log M) time to compute a match, where M is the size of the match, and the worst-case sequential time is Ω(n log M) for an image of size n (Storer and Helfgott [1997], The Computer Journal). The image is scanned in some linear order and the window is unrestricted.
The Compression Scheme The image is read by a raster scan and the matching algorithm works with a perfect hash table that has one position for each possible 4 x 4 subarray. All-zero and all-one rectangles are handled differently. The encoding scheme starts each pointer with a flag field indicating whether it encodes a monochromatic rectangle (0 for white, 10 for black), a match (110) or raw data (111).
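A minimal sketch of how such a table could be indexed, assuming the image is a list of rows of 0/1 pixels. The names FLAGS, block_key and table are hypothetical and not taken from the slides; packing the 16 pixels into a 16-bit integer is one way to get one table position per possible 4 x 4 pattern.

```python
# Hypothetical names; the slides do not prescribe an implementation.
# Flag fields as listed above.
FLAGS = {"white": "0", "black": "10", "match": "110", "raw": "111"}

def block_key(image, i, j):
    """Pack the 4 x 4 subarray with upper left pixel (i, j) into a 16-bit integer."""
    key = 0
    for r in range(4):
        for c in range(4):
            key = (key << 1) | image[i + r][j + c]
    return key

# One slot per possible 4 x 4 bi-level pattern (2**16 of them), so keys never collide.
table = [None] * (1 << 16)

image = [[0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0]]
key = block_key(image, 0, 0)
table[key] = (0, 0)  # remember the position where this pattern was seen
```

Since every pattern has its own slot, a lookup takes constant time and no collision handling is needed.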
If the 4 x 4 subarray in position (i,j) is monochromatic, then we compute the largest monochromatic rectangle in that position and encode the width and the length. Otherwise we compute the largest rectangular match in the position provided by the hash table, encode its position, width and length, and update the table with the current position. If the hash table has no entry for the subarray, then the subarray is left uncompressed and added to the hash table with its current position. The positions covered by matches are skipped in the linear scan of the image.
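A sketch of the monochromatic case, under two assumptions that the slides do not state: "largest" is read as largest area, and the rectangle is grown row by row from its upper left corner. The function name largest_mono_rectangle is hypothetical.

```python
def largest_mono_rectangle(image, i, j):
    """Return (width, length) of the largest-area monochromatic rectangle
    whose upper left corner is pixel (i, j); length counts rows."""
    color = image[i][j]
    rows, cols = len(image), len(image[0])
    best_w, best_len = 0, 0
    width = cols - j                      # widest run still allowed
    for r in range(i, rows):
        # Shrink the allowed width to the run of `color` found in this row.
        w = 0
        while w < width and image[r][j + w] == color:
            w += 1
        if w == 0:
            break                         # the rectangle cannot extend further down
        width = w
        length = r - i + 1
        if width * length > best_w * best_len:
            best_w, best_len = width, length
    return best_w, best_len

image = [[0, 0, 0, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [1, 0, 1, 1]]
result = largest_mono_rectangle(image, 0, 0)
```

In this example the rectangle anchored at (0, 0) is 2 pixels wide and 3 rows long, which beats the 3 x 1 strip in the first row.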
Worst-case running time ≈ M(1 + 1/2 + 1/3 + … + 1/M) = Θ(M log M)
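The sum in parentheses is the M-th harmonic number, which grows like ln M. A small numeric check, with the hypothetical helper harmonic_cost, shows the ratio to M ln M approaching 1 as M grows:

```python
import math

def harmonic_cost(M):
    """M * (1 + 1/2 + ... + 1/M), the worst-case expression above."""
    return M * sum(1.0 / k for k in range(1, M + 1))

for M in (16, 256, 4096):
    ratio = harmonic_cost(M) / (M * math.log(M))
    print(M, round(ratio, 3))
```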
Parallel Block Matching A work-optimal parallel block matching algorithm requiring O(log M log n) time can be implemented on a shared memory machine. An m x m' image is partitioned into w x l rectangular areas A_{i,j} for 1 ≤ i ≤ ⌈m/w⌉ and 1 ≤ j ≤ ⌈m'/l⌉, where w and l are Θ(log^{1/2}(m · m')), and the sequential block matching algorithm is applied to each area. If w and l are Θ(α^{1/2}) and α is Ω(log n), the algorithm can be implemented in O(α log M) time with O(n/α) processors.
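A rough sketch of the partition sizes, assuming every hidden constant in the Θ and Ω bounds is 1 and assuming one processor per area. Both are illustrative choices, not statements from the slides, and the function name partition is hypothetical.

```python
import math

def partition(m, m_prime, alpha):
    """Area dimensions, grid size and area count, with all Θ constants taken as 1."""
    w = l = max(1, math.isqrt(alpha))     # w, l in Θ(α^(1/2))
    rows = math.ceil(m / w)               # 1 ≤ i ≤ ⌈m/w⌉
    cols = math.ceil(m_prime / l)         # 1 ≤ j ≤ ⌈m'/l⌉
    return w, l, rows, cols, rows * cols  # last value: number of areas

n = 1024 * 1024
alpha = math.ceil(math.log2(n))           # α = log n, the smallest choice with constant 1
print(partition(1024, 1024, alpha))
```

With a 1024 x 1024 image this gives 4 x 4 areas on a 256 x 256 grid, so the number of areas is within a constant factor of n/α.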
Merging phases Larger monochromatic rectangles are computed by merging adjacent monochromatic areas.
The Tree Architecture We extend an m x m' bi-level image with dummy rows and dummy columns so that the image can be partitioned into l x w rectangular areas B_{i,j} for 1 ≤ i, j ≤ 2^h, where h = min { d : 2^d ≥ max(m/l, m'/w) }. The tree is full and binary. Its 2^{2h} leaves are labeled from 1 to 2^{2h} from left to right, and its height is 2h.
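The padding can be worked out directly from the definition of h. A short sketch, with the hypothetical function name tree_parameters:

```python
import math

def tree_parameters(m, m_prime, l, w):
    """Return (h, padded_rows, padded_cols, leaves) for l x w areas."""
    # 2**d is an integer, so 2**d ≥ m/l holds exactly when 2**d ≥ ⌈m/l⌉.
    need = max(math.ceil(m / l), math.ceil(m_prime / w))
    h = 0
    while 2 ** h < need:
        h += 1
    side = 2 ** h                         # areas per side after padding
    return h, side * l, side * w, side * side

params = tree_parameters(1000, 600, 4, 4)
```

For a 1000 x 600 image with 4 x 4 areas, h = 8, the padded image is 1024 x 1024, and the tree has 2^16 = 65536 leaves.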
Storing the Image
STORE(image I, integers μ, i, j)
  if μ > 1
    STORE(I, μ/2, i, j)
    STORE(I, μ/2, i + μ/2, j)
    STORE(I, μ/2, i, j + μ/2)
    STORE(I, μ/2, i + μ/2, j + μ/2)
  else
    store B_{i,j} into leaf k; k = k + 1
k is a global integer variable initially set to 1. STORE(I, 2^h, 1, 1) stores the rectangular areas into the leaves of the tree.
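A runnable rendering of STORE, as a sketch: the global counter k is replaced by appending to a list, so leaf k corresponds to leaves[k - 1]. The dictionary blocks, which maps 1-based indices (i, j) to areas, is a hypothetical stand-in for the image.

```python
def store(blocks, mu, i, j, leaves):
    """Append the areas of the mu x mu sub-grid with corner (i, j) to `leaves`
    in the order used by STORE; indices are 1-based as in the slides."""
    if mu > 1:
        half = mu // 2
        store(blocks, half, i, j, leaves)
        store(blocks, half, i + half, j, leaves)
        store(blocks, half, i, j + half, leaves)
        store(blocks, half, i + half, j + half, leaves)
    else:
        leaves.append(blocks[(i, j)])     # leaf k is leaves[k - 1]

h = 1
blocks = {(i, j): f"B{i},{j}"
          for i in range(1, 2 ** h + 1) for j in range(1, 2 ** h + 1)}
leaves = []
store(blocks, 2 ** h, 1, 1, leaves)
print(leaves)
```

For h = 1 the four areas land in the leaves in the order B1,1, B2,1, B1,2, B2,2; larger h applies the same order recursively to each quadrant.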
Example
Merging phases For each area of a larger monochromatic rectangle, the indices of the areas at the upper left and lower right corners are stored in the corresponding leaf.
Vertical merging phases The k-th vertical merging phase is computed by broadcasting the information through processors up to level 2h - 2k + 1.
Horizontal merging phases The k-th horizontal merging phase is computed by broadcasting the information through processors up to level 2h - 2k.
Two assumptions are needed to obtain an O(α log M) time encoder/decoder with O(n/α) processors on a full binary tree architecture:
● the number of monochromatic matches with length or width ≥ 2^k ⌈log^{1/2} n⌉ is O(n/(2^{2k} log n)) for 1 ≤ k ≤ h - 1;
● each pixel is covered by a small constant number of monochromatic matches.
It follows that the amount of information each processor at level k must broadcast is constant for 1 ≤ k ≤ h - 1.
Encoding on the Tree The leaf processors produce the pointers. If the leaf processor stores a non-monochromatic area, the sequence of pointers is produced by a raster scan. The end of the sequence of pointers is indicated with the flag field 1111 (the flag field 111 is changed to 1110). A pointer encoding a monochromatic rectangle obtained by merging is produced by the leaf processor storing the upper left corner.
The pointers are output in the order of the leaves. A pointer encoding a monochromatic rectangle obtained by merging is followed by the flag field 1111 and the index of the next leaf storing some pointer. This index is computed by parallel suffix computation: for each leaf, a variable is set to 1 if the leaf stores at least one pointer and to 0 otherwise. The encoding is realized in O(α) time with O(n/α) processors on a full binary tree architecture.
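What the suffix computation produces can be illustrated with a sequential right-to-left scan. This sketch only shows the result for each leaf; it does not model the parallel schedule on the tree, and the function name next_pointer_leaf is hypothetical.

```python
def next_pointer_leaf(has_pointer):
    """For each leaf (1-based), return the index of the next leaf to its right
    that stores at least one pointer, or None if there is none."""
    n = len(has_pointer)
    nxt = [None] * n
    nearest = None                        # nearest flagged leaf seen so far, scanning right to left
    for leaf in range(n, 0, -1):
        nxt[leaf - 1] = nearest
        if has_pointer[leaf - 1]:
            nearest = leaf
    return nxt

flags = [1, 0, 0, 1, 0, 1, 0, 0]          # 1 if the leaf stores at least one pointer
result = next_pointer_leaf(flags)
```

Here leaf 1 is followed by leaf 4, leaf 4 by leaf 6, and no leaf after leaf 6 stores a pointer.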
The Parallel Decoder For each area A_{i,j}, the input phase of the parallel decoder identifies the corresponding sequence of pointers, if any, and stores it into the corresponding leaf by reading the encoding binary string from left to right and detecting the flag fields that mark the end of each sequence of pointers. Once the input phase is completed, the parallel decoding is divided into two phases and requires O(α log M) time with O(n/α) processors. The parallel decoder is not work-optimal since the sequential decoder requires linear time.
Phase 1 For each area A_{i,j}, one processor decodes the corresponding sequence of pointers, if any. For each monochromatic rectangle, the upper left portion corresponding to a given area A_{i,j} is decoded, where i or j is odd.
Phase 2 Step 1: The upper left monochromatic area A_{2i-1,j} is copied onto A_{2i,j} if it is monochromatic and has the same color (information provided by the pointer); the same is done horizontally.
Step k: Areas A_{(i-1)2^{k-1}+1,j} … A_{i2^{k-1},j}, with i odd, are copied respectively onto A_{i2^{k-1}+1,j} … A_{(i+1)2^{k-1},j} if monochromatic with the same color (similarly on the vertical boundaries).
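The index pattern of step k can be listed explicitly. The sketch below enumerates only the source and target indices along one dimension; the monochromatic and same-color test is left out, and the function name copy_pairs and the assumption that the number of areas is a multiple of 2^k are illustrative.

```python
def copy_pairs(k, count):
    """Source -> target area indices (1-based) for step k along one dimension.
    `count` areas are assumed, with count a multiple of 2**k."""
    block = 2 ** (k - 1)
    pairs = []
    for i in range(1, count // block + 1, 2):      # i odd
        for offset in range(1, block + 1):
            src = (i - 1) * block + offset
            pairs.append((src, src + block))       # copied onto the area `block` positions later
    return pairs

print(copy_pairs(1, 8))
print(copy_pairs(2, 8))
```

Each step doubles the length of the blocks being copied, so a run of 2^k areas is covered after k steps.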
Parallel Complexity Phase 1 requires O(α) time with O(n/α) processors. Phase 2 requires O(α log M) time with O(n/α) processors under the same assumptions made for the parallel block matching algorithm. After phase 2, the image is stored into the leaves as with the STORE procedure. In conclusion, the computation of the block matching encoder/decoder takes O(α log M) time on a full binary tree architecture with O(n/α) processors, where α is an integer parameter in Ω(log n).
Future Work
● To make block matching suitable for large-scale array architectures by improving its compression locally.
● To study how scalability and compression effectiveness relate to each other.
● To determine experimentally an upper bound on the integer parameter α for which the tree architecture is effective.
● To determine whether there are values of the integer parameter α which it is better to discriminate.