Published by Kerry Morton. Modified over 9 years ago.
2
Divide the encoded file into blocks of size b and use an auxiliary bit vector to indicate the beginning of each block. Access time: O(b). The choice of b trades time against memory.
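A minimal sketch of this block-based scheme, assuming the codewords are Fibonacci codewords (introduced later in the talk) and that the auxiliary index records the bit offset of every b-th codeword in a plain list rather than in a bit vector. `SampledFibonacciFile`, `fib_encode` and `fib_decode` are hypothetical names. An access jumps to the nearest sample and skips at most b - 1 codewords, which is where the O(b) time comes from; a larger b means fewer samples (less memory) but longer scans.

```python
def fib_encode(n):
    """Fibonacci codeword of n >= 1: Zeckendorf digits over 1, 2, 3, 5, ... (lowest first) plus a closing 1."""
    basis = [1, 2]
    while basis[-1] + basis[-2] <= n:
        basis.append(basis[-1] + basis[-2])
    digits = [0] * len(basis)
    for k in range(len(basis) - 1, -1, -1):  # greedy, largest basis element first
        if basis[k] <= n:
            digits[k] = 1
            n -= basis[k]
    while digits[-1] == 0:  # strip unused high positions
        digits.pop()
    return "".join(map(str, digits)) + "1"


def fib_decode(codeword):
    """Inverse of fib_encode for one codeword (terminating 1 included)."""
    value, a, b = 0, 1, 2
    for bit in codeword[:-1]:  # the last bit is the extra terminating 1
        if bit == "1":
            value += a
        a, b = b, a + b
    return value


class SampledFibonacciFile:
    """Encoded file plus the bit offset of every b-th codeword (hypothetical layout)."""

    def __init__(self, values, b):
        self.b = b
        codewords = [fib_encode(v) for v in values]
        self.bits = "".join(codewords)
        self.samples = []
        offset = 0
        for j, cw in enumerate(codewords):
            if j % b == 0:
                self.samples.append(offset)
            offset += len(cw)

    def _next_start(self, pos):
        # A codeword ends at its first "11"; the next codeword starts right after it.
        return self.bits.find("11", pos) + 2

    def access(self, i):
        """Value of the i-th codeword (0-indexed), decoding at most b codewords."""
        pos = self.samples[i // self.b]
        for _ in range(i % self.b):
            pos = self._next_start(pos)
        return fib_decode(self.bits[pos:self._next_start(pos)])
```

For example, with `b = 4`, `access(10)` starts at the sample for codeword 8 and skips two codewords before decoding.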
3
Grossi, Gupta and Vitter (2003): wavelet trees. [Figure: example wavelet tree with a bit vector stored at each node.]
4
Grossi and Ottaviano: wavelet trees based on Patricia tries
Brisaboa, Ladra, Navarro (IPM 2013): wavelet trees for byte codes
Kulekci (DCC 2014): Elias and Rice codes
P. Prochazka, J. Holub (DCC 2014): compression of similar biological sequences
5
Outline:
Fibonacci codes
Rank and select
Random access using an auxiliary index
Random access using wavelet trees
Improved wavelet trees for random access
Experimental results
7
Fibonacci numbers 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, … as basis elements of a numeration system
8
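Binary basis elements 1, 2, 4, 8, 16, 32, 64, 128: 73 = 64 + 8 + 1, i.e. digits 1 0 0 1 0 0 1 0 (lowest basis element first).
Fibonacci basis elements 1, 2, 3, 5, 8, 13, 21, 34, 55: 73 = 55 + 13 + 5, i.e. digits 0 0 0 1 0 1 0 0 1 (lowest basis element first). No adjacent 1s.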
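The greedy rule behind the Fibonacci representation can be sketched as follows; `zeckendorf_digits` and `binary_digits` are hypothetical helpers, and the basis 1, 2, 3, 5, 8, ... (each Fibonacci number used once) is assumed, matching the 73 example. Taking the largest basis element that still fits never produces two adjacent 1s.

```python
def zeckendorf_digits(n):
    """Digits of n >= 1 over the basis 1, 2, 3, 5, 8, ..., lowest basis element first."""
    basis = [1, 2]
    while basis[-1] + basis[-2] <= n:
        basis.append(basis[-1] + basis[-2])
    digits = [0] * len(basis)
    for k in range(len(basis) - 1, -1, -1):
        # Greedy: take the largest basis element that still fits.
        if basis[k] <= n:
            digits[k] = 1
            n -= basis[k]
    while digits[-1] == 0:  # strip unused high positions
        digits.pop()
    return digits


def binary_digits(n):
    """Digits of n over the basis 1, 2, 4, 8, ..., lowest first."""
    return [int(c) for c in reversed(bin(n)[2:])]


print(zeckendorf_digits(73))  # [0, 0, 0, 1, 0, 1, 0, 0, 1]  (5 + 13 + 55)
print(binary_digits(73))      # [1, 0, 0, 1, 0, 0, 1]        (1 + 8 + 64)
```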
9
Example: 19 = 101001 (13 + 5 + 1).
Problem: the representation is not instantaneous; a decoder cannot tell where a codeword ends.
Solution: reverse the codeword and append a 1, so that every codeword ends in 11.
Example: 19 → 100101 → 1001011.
10
Set of strings ending in 11 with no other adjacent 1s: {11, 011, 0011, 1011, 00011, 10011, 01011, 000011, 100011, 010011, 001011, 101011, 0000011, …}
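A sketch of the resulting encoder and of stream decoding, assuming codewords are written lowest basis element first and closed by an extra 1, as in the 19 → 1001011 example; `fib_encode`, `fib_decode` and `decode_stream` are hypothetical names. Because the only 11 in a codeword is its last two bits, the decoder can cut a concatenated stream at each 11 without lookahead.

```python
def fib_encode(n):
    """Fibonacci codeword of n >= 1 (lowest basis element first, closing 1 appended)."""
    basis = [1, 2]
    while basis[-1] + basis[-2] <= n:
        basis.append(basis[-1] + basis[-2])
    digits = [0] * len(basis)
    for k in range(len(basis) - 1, -1, -1):  # greedy Zeckendorf representation
        if basis[k] <= n:
            digits[k] = 1
            n -= basis[k]
    while digits[-1] == 0:
        digits.pop()
    return "".join(map(str, digits)) + "1"


def fib_decode(codeword):
    """Value of a single codeword; its last bit is the terminating 1."""
    value, a, b = 0, 1, 2
    for bit in codeword[:-1]:
        if bit == "1":
            value += a
        a, b = b, a + b
    return value


def decode_stream(bits):
    """Split a concatenation of codewords at each terminating 11 and decode them."""
    values, start = [], 0
    while start < len(bits):
        stop = bits.find("11", start)
        if stop < 0:
            raise ValueError("truncated codeword")
        values.append(fib_decode(bits[start:stop + 2]))
        start = stop + 2
    return values


print(fib_encode(19))                            # 1001011
print([fib_encode(n) for n in range(1, 8)])      # ['11', '011', '0011', '1011', '00011', '10011', '01011']
print(decode_stream("1001011" + "11" + "011"))   # [19, 1, 2]
```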
12
Given a bit vector $B$ of length $n$:
$\mathrm{rank}_1(B,i)$ (resp. $\mathrm{rank}_0(B,i)$) is the number of 1s (resp. 0s) in $B$ up to and including position $i$;
$\mathrm{select}_1(B,i)$ (resp. $\mathrm{select}_0(B,i)$) returns the index of the $i$-th 1 (resp. 0) in $B$.
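A direct transcription of these definitions that scans B on every query; it is only meant to pin down the 1-indexed conventions. `rank` and `select` are hypothetical helpers, B is assumed to be a Python list of 0/1 integers, and the sample vector is the 20-bit one from the rank example.

```python
def rank(B, bit, i):
    """rank_bit(B, i): occurrences of `bit` in B[1..i] (1-indexed, inclusive)."""
    return B[:i].count(bit)


def select(B, bit, i):
    """select_bit(B, i): 1-indexed position of the i-th occurrence of `bit`, or None."""
    seen = 0
    for pos, b in enumerate(B, start=1):
        if b == bit:
            seen += 1
            if seen == i:
                return pos
    return None


B = [0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
print(rank(B, 1, 9), rank(B, 0, 9))      # 4 5
print(select(B, 1, 3), select(B, 0, 3))  # 8 4
```

Note that $\mathrm{rank}_b(B, \mathrm{select}_b(B, k)) = k$ whenever the $k$-th occurrence exists.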
13
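$\mathrm{rank}_1(B,i) = i - \mathrm{rank}_0(B,i)$, so only $\mathrm{rank}_1(B,i)$ needs to be computed.
Naive solution: store all $n$ rank answers ($O(n \lg n)$ bits). Example:
i        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
B[i]     0  1  0  0  0  1  0  1  1  0  0  0  0  1  1  1  1  0  0  1
rank_1   0  1  1  1  1  2  2  3  4  4  4  4  4  5  6  7  8  8  8  9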
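The naive table is just the running sum of B. A minimal sketch reproducing the example row, with `R`, `rank1` and `rank0` as hypothetical names; only the $\mathrm{rank}_1$ answers are stored and $\mathrm{rank}_0$ is derived from them.

```python
from itertools import accumulate

B = [0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]

# Naive solution: R[i - 1] = rank_1(B, i) for every i (n answers of about lg n bits each).
R = list(accumulate(B))


def rank1(i):
    return R[i - 1]


def rank0(i):
    # rank_0(B, i) = i - rank_1(B, i), so no second table is needed.
    return i - rank1(i)


print(R)  # [0, 1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 8, 8, 9]
```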
14
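Store the rank answer every $\lg^2 n$ bits of $B$, using $\lg n$ bits per answer.
Divide each such chunk into sub-chunks of $(\lg n)/2$ bits and store, every $(\lg n)/2$ bits, the rank relative to the last sample, using $2\lg\lg n$ bits per sub-sample (relative values are below $\lg^2 n$).
Bottom level: use a simple lookup table indexed by the $(\lg n)/2$-bit patterns.
Space complexity: $\frac{n}{\lg^2 n}\lg n + \frac{n}{(\lg n)/2}\,2\lg\lg n + O(\sqrt{n}\,\lg n\,\lg\lg n) = O\!\left(\frac{n\lg\lg n}{\lg n}\right) = o(n)$ bits in addition to $B$.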
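A sketch of the two-level scheme, assuming small fixed constants in place of $(\lg n)/2$ (the sub-chunk size `block`) and $2\lg n$ (sub-chunks per chunk, `blocks_per_super`), so that the structure is visible on short inputs. `TwoLevelRank` is a hypothetical name; a query adds the chunk sample, the relative sub-sample, and one lookup-table entry.

```python
class TwoLevelRank:
    """Sampled rank_1 over a 0/1 list; block sizes are illustrative constants."""

    def __init__(self, bits, block=4, blocks_per_super=4):
        self.block = block
        self.super_size = block * blocks_per_super
        padded = list(bits) + [0] * (-len(bits) % self.super_size)
        # Bottom level: table[pattern][k] = ones among the first k + 1 bits of pattern.
        self.table = []
        for pattern in range(1 << block):
            row, count = [], 0
            for k in range(block):
                count += (pattern >> (block - 1 - k)) & 1
                row.append(count)
            self.table.append(row)
        self.super_rank = []  # absolute rank before each chunk
        self.block_rank = []  # rank before each sub-chunk, relative to its chunk
        self.patterns = []    # each sub-chunk's bits packed into an integer
        total = 0
        for start in range(0, len(padded), block):
            if start % self.super_size == 0:
                self.super_rank.append(total)
            self.block_rank.append(total - self.super_rank[-1])
            pattern = 0
            for bit in padded[start:start + block]:
                pattern = (pattern << 1) | bit
            self.patterns.append(pattern)
            total += bin(pattern).count("1")

    def rank1(self, i):
        """Number of 1s in B[1..i] for 1 <= i <= len(B)."""
        p = i - 1
        t = p // self.block
        return (self.super_rank[p // self.super_size] + self.block_rank[t]
                + self.table[self.patterns[t]][p % self.block])
```

With the 20-bit example and the defaults, `rank1(20)` is 7 (chunk sample) + 0 (sub-sample) + 2 (table entry for 1001) = 9.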
15
[Figure: rank query example. Output = 7041 + 613 + the lookup-table entry for the bits of the last sub-chunk; the table lists every bit pattern from 000…000 to 1111…1.]
17
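1. E(T) ← compress T.
2. Generate a bit vector B of size |E(T)| such that B[i] = 1 iff E(T)[i] is the first bit of a codeword.
3. Construct a rank/select data structure for B; the i-th codeword then starts at position $\mathrm{select}_1(B,i)$.
Space complexity: |E(T)| + o(|E(T)|) bits in addition to E(T).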
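A sketch of these three steps, assuming T is a sequence of positive integers (for instance symbol ranks) compressed with the Fibonacci code. `build_index`, `select1` and `access` are hypothetical names, and `select1` is a linear scan standing in for a real o(n)-space select structure.

```python
def fib_encode(n):
    """Fibonacci codeword of n >= 1 (lowest basis element first, closing 1 appended)."""
    basis = [1, 2]
    while basis[-1] + basis[-2] <= n:
        basis.append(basis[-1] + basis[-2])
    digits = [0] * len(basis)
    for k in range(len(basis) - 1, -1, -1):
        if basis[k] <= n:
            digits[k] = 1
            n -= basis[k]
    while digits[-1] == 0:
        digits.pop()
    return "".join(map(str, digits)) + "1"


def fib_decode(codeword):
    value, a, b = 0, 1, 2
    for bit in codeword[:-1]:
        if bit == "1":
            value += a
        a, b = b, a + b
    return value


def build_index(values):
    """Steps 1 and 2: E(T), and B with B[i] = 1 iff E(T)[i] starts a codeword."""
    codewords = [fib_encode(v) for v in values]
    B = []
    for cw in codewords:
        B.extend([1] + [0] * (len(cw) - 1))
    return "".join(codewords), B


def select1(B, i):
    """1-indexed position of the i-th 1 (linear scan as a stand-in for step 3)."""
    seen = 0
    for pos, bit in enumerate(B, start=1):
        seen += bit
        if bit and seen == i:
            return pos
    raise IndexError(i)


def access(encoded, B, i):
    """Decode the i-th codeword (1-indexed) of E(T)."""
    start = select1(B, i) - 1            # 0-indexed start of codeword i
    end = encoded.find("11", start) + 2  # a codeword ends at its first 11
    return fib_decode(encoded[start:end])


encoded, B = build_index([5, 1, 19])
print(access(encoded, B, 3))  # 19
```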
19
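T = COMPRESSORS, Σ = {C, M, P, E, O, R, S}, Occ = {1, 1, 1, 1, 2, 2, 3}
Codewords: S = 11, R = 011, O = 0011, E = 1011, P = 00011, M = 10011, C = 01011
E(T) = 01011 0011 10011 00011 011 1011 11 11 0011 011 11
[Figure: the wavelet tree of E(T). The root stores the first bit of every codeword, 00100111001; its children store the second bits of the codewords starting with 0 (100101) and with 1 (00111); the third level stores 101, 011 and 01; the deeper nodes store the remaining bits.]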
20
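extract(v_root, i) {
    code ← empty string
    v ← v_root
    while v is not a leaf
        if B_v[i] = 0
            i ← rank_0(B_v, i);  code ← code·0;  v ← left(v)
        else
            i ← rank_1(B_v, i);  code ← code·1;  v ← right(v)
    return D(code)
}
(D maps a codeword to its symbol; i is updated with the rank in B_v before moving to the child.)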
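A runnable sketch of `extract` on the COMPRESSORS example, assuming a node is identified by the codeword prefix that leads to it and that rank is computed by scanning the node's bit vector (a real tree would attach a rank structure to each B_v). `CODE`, `DECODE`, `build_fwt` and `extract` are hypothetical names; the codeword assignment is the one implied by E(T).

```python
from collections import defaultdict

# Codeword assignment read off E(T) (S is the most frequent symbol).
CODE = {"S": "11", "R": "011", "O": "0011", "E": "1011",
        "P": "00011", "M": "10011", "C": "01011"}
DECODE = {cw: sym for sym, cw in CODE.items()}


def build_fwt(text):
    """Wavelet tree over the codewords: node (its prefix) -> bit vector of next bits."""
    nodes = defaultdict(list)
    for sym in text:
        cw = CODE[sym]
        for depth in range(len(cw)):
            nodes[cw[:depth]].append(int(cw[depth]))
    return dict(nodes)


def extract(nodes, i):
    """Symbol at position i (1-indexed): walk down, mapping i through rank."""
    code = ""                   # the root is the empty prefix
    while code not in DECODE:   # leaves are exactly the complete codewords
        bv = nodes[code]
        bit = bv[i - 1]
        i = bv[:i].count(bit)   # rank_bit(B_v, i), taken before moving down
        code += str(bit)
    return DECODE[code]


tree = build_fwt("COMPRESSORS")
print("".join(map(str, tree[""])))                      # 00100111001
print("".join(extract(tree, i) for i in range(1, 12)))  # COMPRESSORS
```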
21
select_x(T, i) {
    w ← leaf corresponding to f(x)
    while w ≠ v_root
        v ← father(w)
        if w is a left child of v
            i ← index of the i-th 0 in B_v
        else
            i ← index of the i-th 1 in B_v
        w ← v
    return i
}
(f(x) is the codeword of x; the loop also processes the root's bit vector.)
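A matching sketch of `select_x`, under the same assumptions as the extract sketch (nodes named by prefixes, select computed by scanning). Walking from the leaf of x to the root, each step maps i to the position of the i-th 0 or 1 in the father's bit vector. `CODE`, `build_fwt`, `select_bit` and `select` are hypothetical names.

```python
from collections import defaultdict

CODE = {"S": "11", "R": "011", "O": "0011", "E": "1011",
        "P": "00011", "M": "10011", "C": "01011"}


def build_fwt(text):
    """Node (its prefix) -> bit vector of the next codeword bits, in text order."""
    nodes = defaultdict(list)
    for sym in text:
        cw = CODE[sym]
        for depth in range(len(cw)):
            nodes[cw[:depth]].append(int(cw[depth]))
    return dict(nodes)


def select_bit(bv, bit, i):
    """1-indexed position of the i-th occurrence of `bit` in bv."""
    seen = 0
    for pos, b in enumerate(bv, start=1):
        if b == bit:
            seen += 1
            if seen == i:
                return pos
    raise ValueError("fewer than i occurrences")


def select(nodes, x, i):
    """Position in T of the i-th occurrence of symbol x (1 <= i <= Occ(x))."""
    w = CODE[x]                       # the leaf of x is its codeword f(x)
    while w != "":                    # stop after the root has been processed
        v, bit = w[:-1], int(w[-1])   # father of w, and whether w is its 0- or 1-child
        i = select_bit(nodes[v], bit, i)
        w = v
    return i


tree = build_fwt("COMPRESSORS")
print(select(tree, "S", 2))  # 8
```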
22
Nodes with a single child store redundant information; prune them, similar to the collapsing strategy used in suffix trees.
23
[Figure: the wavelet tree of E(T) = 01011 0011 10011 00011 011 1011 11 11 0011 011 11 before and after pruning. Both trees store 00100111001 at the root, 100101 and 00111 at the second level, and 101, 011 and 01 at the third level; the pruned tree drops the deeper single-child nodes.]
24
On reaching a leaf of the pruned tree, complete the codeword:
if code ends with 0: code ← code·11
if code does not end with 11: code ← code·1
return D(code)
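A sketch of pruning plus the completion rule on the COMPRESSORS example. The slides do not spell out the exact pruning criterion, so this assumes a node is cut off as soon as a single codeword passes through it and `complete` recovers that codeword. `complete`, `build_pruned_fwt` and `extract_pruned` are hypothetical names; the resulting bit vectors agree with the pruned tree in the figure.

```python
from collections import defaultdict

CODE = {"S": "11", "R": "011", "O": "0011", "E": "1011",
        "P": "00011", "M": "10011", "C": "01011"}
DECODE = {cw: sym for sym, cw in CODE.items()}


def complete(code):
    """Completion rule: append the shortest tail that ends a codeword."""
    if code.endswith("0"):
        return code + "11"
    if not code.endswith("11"):
        return code + "1"
    return code


def build_pruned_fwt(text):
    """Stop descending once the prefix is shared by one codeword that `complete` recovers."""
    distinct = {CODE[s] for s in set(text)}
    nodes, leaves = defaultdict(list), set()
    for sym in text:
        cw = CODE[sym]
        for depth in range(len(cw)):
            prefix = cw[:depth]
            through = {c for c in distinct if c.startswith(prefix)}
            if depth > 0 and through == {complete(prefix)}:
                leaves.add(prefix)   # pruned leaf: the rest of cw is implied
                break
            nodes[prefix].append(int(cw[depth]))
        else:
            leaves.add(cw)           # the full codeword is itself a leaf
    return dict(nodes), leaves


def extract_pruned(nodes, leaves, i):
    code = ""
    while code not in leaves:
        bv = nodes[code]
        bit = bv[i - 1]
        i = bv[:i].count(bit)
        code += str(bit)
    return DECODE[complete(code)]


nodes, leaves = build_pruned_fwt("COMPRESSORS")
print({p: "".join(map(str, bv)) for p, bv in sorted(nodes.items())})
# {'': '00100111001', '0': '100101', '00': '101', '01': '011', '1': '00111', '10': '01'}
```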
25
Recursive definition of an FWT of depth $h+1$. Assumption: if the tree has depth $h+1$, then all the $F_h$ codewords of length $h+1$ are in the alphabet.
26
$N_{h+1} = N_h + N_{h-1} + 3$
[Figure: $T_{h+1}$ built from the subtrees $T_h$ and $T_{h-1}$.]
27
$N_{h+1} = N_h + 3F_h$, so $N_{h+1} = 3F_{h+2} - 3$
$P_{h-1} = 2F_{h+2} - 3$
$\frac{P_{h-1}}{N_{h+1}} = \frac{2F_{h+2} - 3}{3F_{h+2} - 3} \to \frac{2}{3}$ as $h \to \infty$
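Substituting the closed form into both recurrences is a quick consistency check. It assumes $N_k = 3F_{k+1} - 3$ already holds at the smaller depths (the base cases are not given on the slides) and uses $F_{k+2} = F_{k+1} + F_k$.

```latex
\begin{aligned}
N_h + N_{h-1} + 3 &= (3F_{h+1} - 3) + (3F_h - 3) + 3 = 3(F_{h+1} + F_h) - 3 = 3F_{h+2} - 3,\\
N_h + 3F_h &= (3F_{h+1} - 3) + 3F_h = 3F_{h+2} - 3,\\
\frac{P_{h-1}}{N_{h+1}} &= \frac{2F_{h+2} - 3}{3F_{h+2} - 3}
  = \frac{2 - 3/F_{h+2}}{3 - 3/F_{h+2}} \xrightarrow{\,h \to \infty\,} \frac{2}{3}.
\end{aligned}
```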
29
English: Heaps, distribution of 26 characters and 371 bigrams
Finnish: Pesonen, 29 letters
French: Trésor de la Langue Française, 26 letters
German: Bauer & Goos, 30 letters
Hebrew and Aramaic: The Responsa Retrieval Project, 30 letters, 735 bigrams
Italian: 26 letters
Spanish: 26 letters
Portuguese: 26 letters
30
File          n     Height   FWT    Pruned   Huffman
English       26    8        4.90   4.43     4.19
Finnish       29    8        4.76   4.44     4.04
French        26    8        4.53   4.14     4.00
German        30    8        4.70   4.37     4.15
Hebrew        30    8        4.82   4.42     4.29
Italian       26    8        4.70   4.32     4.00
Portuguese    26    8        4.67   4.28     4.01
Spanish       26    8        4.71   4.30     4.05
Russian       32    8        5.13   4.76     4.47
English-2     378   14       8.78   8.56     7.44
Hebrew-2      743   15       9.13   8.97     8.04
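The Height column is consistent with the length of the longest Fibonacci codeword needed to give each of the n symbols its own codeword. A brute-force check, assuming that is what Height measures (the slide does not say); `is_fib_codeword`, `codewords_of_length` and `fwt_height` are hypothetical helpers that follow the definition "strings ending in 11 with no other adjacent 1s".

```python
from itertools import product


def is_fib_codeword(s):
    """Ends in 11 and contains no other pair of adjacent 1s."""
    return s.endswith("11") and "11" not in s[:-1]


def codewords_of_length(k):
    return sum(is_fib_codeword("".join(p)) for p in product("01", repeat=k))


def fwt_height(n):
    """Smallest L such that at least n codewords have length <= L."""
    length, available = 1, 0
    while available < n:
        length += 1
        available += codewords_of_length(length)
    return length


print([codewords_of_length(k) for k in range(2, 9)])        # [1, 1, 2, 3, 5, 8, 13]
print([fwt_height(n) for n in (26, 29, 30, 32, 378, 743)])  # [8, 8, 8, 8, 14, 15]
```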