Download presentation
Presentation is loading. Please wait.
1
IR IL Compression
2
code for integer encoding x > 0 and Length = log 2 x +1 e.g., 9 represented as. code for x takes 2 log 2 x +1 bits (ie. factor of 2 from optimal) Length-1 Optimal for Pr(x) = 1/2x 2, and i.i.d integers
3
It is a prefix-free encoding… Given the following sequence of coded integers, reconstruct the original sequence: 0001000001100110000011101100111 8 63 597
4
code for integer encoding Use -coding to reduce the length of the first field Useful for medium-sized integers e.g., 19 represented as. coding x takes about log 2 x + 2 log 2 ( log 2 x ) + 2 bits. Optimal for Pr(x) = 1/2x(log x) 2, and i.i.d integers
5
Variable-byte codes [10.2 bits per TREC12] Wish to get very fast (de)compress byte-align Given a binary representation of an integer Append 0s to front, to get a multiple-of-7 number of bits Form groups of 7-bits each Append to the last group the bit 0, and to the other groups the bit 1 (tagging) e.g., v=2 14 +1 binary(v) = 100000000000001 10000001 10000000 00000001 Note: We waste 1 bit per byte, and avg 4 for the first byte. But it is a prefix code, and encodes also the value 0 !!
6
Rice code (simplification of Golomb code) It is a parametric code: depends on k Quotient q= (v-1)/k , and the rest is r= v – k * q – 1 Useful when integers concentrated around k How do we choose k ? Usually k 0.69 * mean(v) [Bernoulli model] Optimal for Pr(v) = p (1-p) v-1, where mean(v)=1/p, and i.i.d ints [q times 0s] 1 Log k bits
7
Interpolative coding = 1 1 1 2 2 2 2 4 3 1 1 1 M = 1 2 3 5 7 9 11 15 18 19 20 21 Recursive coding preorder traversal of a balanced binary tree At every step we know (initially, they are encoded): num = |M| = 12, Lidx=1, low = 1, Ridx=12, hi = 21 Take the middle element: h= (Lidx+Ridx)/2=6 M[6]=9, num_left= h – Lidx = 5, num_right= Ridx-h = 6 low + left_size =1+5 = 6 ≤ M[h] ≤ hi – right_size = (21 – 6) = 15 We can encode 9-6=3 in log 2 (15-6+1) = 4 bits lo=1, hi=9-1=8, num=5 lo=9+1=10, hi=21, num=6
8
PForDelta coding 1011 …01 11 0142231110 233…11332313422 a block of 128 numbers Use b (e.g. 2) bits to encode 128 numbers or create exceptions Encode exceptions: ESC or pointers Choose b to encode 90% values, or trade-off: b waste more bits, b more exceptions Translate data: [base, base + 2 b -1] [0,2 b -1]
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.