Quasi-Distinct Parsing and Optimal Compression Methods


1 Quasi-Distinct Parsing and Optimal Compression Methods
Avivit Levy, Shenkar College and CRI, Haifa University. Joint work with: Amihood Amir, Yonatan Aumann, Yuri Roshko

2 Motivation Analysis of compression schemes is a complex task.
We cannot yet compress well everything that is theoretically compressible. Conceptually different schemes are needed, along with theoretical tools capable of analyzing such schemes.

3 Our Work Provide a generalization of the LZ78 optimality proof: a new tool for proving optimality of compression schemes. Present a new code: the Arithmetic Progressions Tree (APT). Apply the new theorem to analyze APT as a compression scheme.

4 Parsing of Strings Parsing divides the given string into phrases, possibly building a dictionary of them. Example: LZ78 parsing.

5 Distinct Parsing Distinct parsing: all phrases are distinct.
It allows no repetitions in the created dictionary. Example 1: Consider the parsing in which the i-th phrase is the next i bits. This is a distinct parsing, since no two phrases have the same length.
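Example 1 can be sketched in a few lines. This is a minimal illustration, assuming the input is a Python string of bits; the function name `length_i_parse` is hypothetical, and letting the final phrase be shorter when the input runs out is an assumption the slide does not address.

```python
def length_i_parse(bits):
    """Split bits so that the i-th phrase is the next i bits (i = 1, 2, 3, ...)."""
    phrases = []
    pos, i = 0, 1
    while pos < len(bits):
        # The final phrase may be shorter than i if the input runs out (assumption).
        phrases.append(bits[pos:pos + i])
        pos += i
        i += 1
    return phrases


phrases = length_i_parse("1011010100")
# Phrases of lengths 1, 2, 3, 4 are pairwise distinct because their lengths differ.
```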

6 Distinct Parsing Example 2
Consider LZ78 parsing, which yields phrases such as 1,0,11,01,111,011,00,010,0110,10. This is a distinct parsing.
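The LZ78 parsing in Example 2 can be reproduced with a short sketch. The function name `lz78_parse` is hypothetical, and the input bit string below is an assumption: it is obtained by concatenating the phrases listed on the slide.

```python
def lz78_parse(bits):
    """LZ78 parsing: each phrase is the shortest prefix of the rest not seen before."""
    seen = set()
    phrases = []
    current = ""
    for b in bits:
        current += b
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # A trailing incomplete phrase may repeat an earlier one.
        phrases.append(current)
    return phrases


# Bit string assumed to be the concatenation of the slide's phrases.
phrases = lz78_parse("10110111101100010011010")
```

Apart from a possible incomplete last phrase, every phrase is new when it is emitted, which is why the parsing is distinct.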

7 Quasi-Distinct Parsing
Quasi-distinct parsing allows repetitions in the created dictionary, even infinitely many as n grows, as long as the overall number of repetitions is o(n/log n). This gives more flexibility in the parsing definition. Does it affect the efficiency?

8 Quasi-Distinct Parsing
Example: Consider the parsing in which the next i phrases are defined by taking the next i bits, i times. This is a quasi-distinct parsing: a phrase of length i cannot repeat more than i-1 times, so the overall repetitions are bounded by O(n^(2/3)).
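A minimal sketch of this parsing, with a count of repeated phrases, follows. The names `block_parse` and `count_repetitions` are hypothetical, and counting a repetition as every occurrence of a phrase beyond its first is an assumption about the intended definition.

```python
from collections import Counter


def block_parse(bits):
    """In round i = 1, 2, 3, ..., take the next i bits as a phrase, i times."""
    phrases = []
    pos, i = 0, 1
    while pos < len(bits):
        for _ in range(i):
            if pos >= len(bits):
                break
            phrases.append(bits[pos:pos + i])
            pos += i
        i += 1
    return phrases


def count_repetitions(phrases):
    """Number of phrase occurrences beyond the first occurrence of each phrase."""
    return sum(c - 1 for c in Counter(phrases).values())


# An all-zero input makes every phrase of a round identical.
phrases = block_parse("0" * 14)
repetitions = count_repetitions(phrases)
```

Round i consumes i^2 bits, so k full rounds cover about k^3/3 bits, while the repetitions are at most 1 + 2 + ... + (k-1), about k^2/2. With n about k^3/3 this gives O(n^(2/3)) repetitions, which is o(n/log n).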

9 LZ Optimality Theorem For any stochastic ergodic process, if
a distinct parsing of X1,…,Xn gives c(n) phrases and the total codeword length is Len(X1,…,Xn)=c(n)(log c(n)+1), then lim sup Len(X1,…,Xn)/n ≤ H, where H is the entropy rate of the process.
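To make the quantity in the theorem concrete, the sketch below evaluates c(n)(log c(n)+1) for an LZ78 parsing and divides by n. The function names are hypothetical, the logarithm is taken base 2 (an assumption), and the result is left as a real number rather than rounded to whole bits.

```python
import math


def lz78_phrase_count(bits):
    """Number of phrases c(n) produced by LZ78 parsing of a bit string."""
    seen, current, count = set(), "", 0
    for b in bits:
        current += b
        if current not in seen:
            seen.add(current)
            count += 1
            current = ""
    # Count a trailing incomplete phrase, if any.
    return count + (1 if current else 0)


def codeword_length(c):
    """Total codeword length c (log c + 1) from the theorem statement."""
    return c * (math.log2(c) + 1)


def bits_per_symbol(bits):
    """Len(X1..Xn) / n for the LZ78 parsing of the given string."""
    return codeword_length(lz78_phrase_count(bits)) / len(bits)
```

On short inputs the ratio can exceed 1 bit per symbol; the theorem only concerns the limit as n grows.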

10 Generalized Optimality Theorem
For any stochastic ergodic process, if a quasi-distinct parsing of X1,…,Xn gives c(n) phrases and the total codeword length is Len(X1,…,Xn)=c(n)(log c(n)+), then lim sup Len(X1,…,Xn)/n ≤ H, where H is the entropy rate of the process.

11 Correctness Where is the distinctness condition used in the LZ78 proof?
For bounding c(n)=O(n/log n). To get Ziv’s inequality: the relation between c(n)log c(n) and a Markov approximation of the process.

12 The Dictionary Size Bound
c(n)=O(n/log n) also holds for quasi-distinct parsing: Take one instance of each phrase; this set of distinct phrases has size O(n/log n) by the LZ bound. The multiset of the remaining phrases has size o(n/log n) by the definition of quasi-distinctness.
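The argument above amounts to a one-line decomposition. The symbols c_d(n), the number of distinct phrases, and c_r(n), the number of remaining repeated occurrences, are notation introduced here for illustration and do not appear on the slides.

```latex
c(n) \;=\; c_d(n) + c_r(n)
\;\le\; O\!\left(\frac{n}{\log n}\right) + o\!\left(\frac{n}{\log n}\right)
\;=\; O\!\left(\frac{n}{\log n}\right)
```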

13 Ziv’s Inequality We somehow get:
But what about the sum of probabilities?

14 Application of Theorem: APT
Arithmetic Progressions Tree (APT): Recursive cover by arithmetic progressions. Idea presented by Amir, Levy, and Reuveni (FI, 2008) for a different purpose. Example: [Figure: a binary string S with positions indexed 19 down to 1]

15 Application of Theorem: APT…
[Figure: the APT for the example string S, with positions indexed 19 down to 1]

16 Application of Theorem: APT…
Encoding: binary data (user) -> generate APT tree representing data -> APT tree -> generate code representing APT tree -> APT code. Decoding: APT code -> decode data to APT tree -> APT tree -> reconstruct binary data from APT tree -> binary data.

17 Application of Theorem: APT…
How to analyze? By the theorem: show a QD parsing, then bound the total representation of the phrases. First step: parsing by the start positions of the progressions. Example: the start positions in the APT of S are 2,4,11,16.

18 Application of Theorem: APT…
Is it quasi-distinct? There is empirical evidence, but no combinatorial proof yet. This is the APT quasi-distinctness hypothesis.

19 Application of Theorem: APT…
APT parsing considers only leaves. How to apply the theorem? Use the theorem for the analysis of leaf storage, and use the APT structure to infer the tree storage. Idea: get a bound on the total number of APT nodes, based on the leaves bound. Property: there should be enough 1’s, but also enough 0’s, to get APT nodes. Their sum is n.

20 Application of Theorem: APT…
Second step: how to store? The challenge is the references. Use the APT reference string. Example: the APT reference string of S is R = 2,4,11,16 (leaf), 1,2,3,4,17 (internal). The reference string is more compressible: it has smaller entropy.

21 Conclusions Contribution 1: Theoretical tool
Two conditions are sufficient for optimality: being defined by a quasi-distinct parsing, and having total length c(n)(log c(n)+). Contribution 2: The APT coding. First application of the new theorem. First presentation of APT as a code. First (weak) theoretical analysis of APT.

