Compression for Fixed-Width Memories
Ori Rottenstreich, Amit Berman, Yuval Cassuto and Isaac Keslassy, Technion, Israel
Data Compression: Communication Channel vs. Memory
- Communication channel: minimize the average word length.
- Memory: maximize the probability that a data unit can be stored in a row.
Memory Hierarchy
[Figure: the memory hierarchy from the processor through SRAM and DRAM down to Flash / HDD; rows get longer (and capacity grows) down the hierarchy, while performance improves closer to the processor.]
Joint Compression for Two Data Blocks
Long encoded rows cannot be stored within a fixed-width memory word of L bits: successfully-encoded rows fit in the word, while too-long rows must be stored in a slower memory with a longer access time.
- We would like to maximize the probability of a successful encoding within the fixed width.
Related work: compression with a low probability of buffer overflow [Jelinek 1968, Humblet 1981].
Encoding Example
Four possible data entries are encoded within L = 3 bits: each entry is composed of an element of the first field, encoded with a prefix code C1, and an element of the second field, encoded with a prefix code C2. The entries whose two codewords fit together within L bits are the ones encoded successfully, and the success probability of the encoding scheme is the total probability of these entries.
Huffman-based Encoding May Not Be Optimal
[Example table, partially recovered: an entry distribution with element probabilities including p1,1 = 0.9, p2,1 = 0.5 and p2,2 = 0.2. Encoding each field with a Huffman code yields one set of successfully-encoded entries; a different pair of prefix codes yields a strictly larger total probability of successfully-encoded entries.]
The reason: Huffman coding minimizes the average codeword length, which is not the same objective as maximizing the probability that an entry fits within L bits.
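The concrete numbers on this slide did not survive extraction, so the following sketch reproduces the phenomenon on a hypothetical two-field distribution (which happens to contain the probabilities 0.9, 0.5 and 0.2 visible in the residue): per-field Huffman codes minimize average length, yet a plain fixed-length code for the second field fits strictly more entries within L = 3 bits.

```python
import heapq

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    # Heap items: (probability, tie-breaker id, leaf indices under this node).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    uid = len(probs)
    while len(heap) > 1:
        pa, _, a = heapq.heappop(heap)
        pb, _, b = heapq.heappop(heap)
        for i in a + b:              # every merge adds one bit to all leaves below
            lengths[i] += 1
        heapq.heappush(heap, (pa + pb, uid, a + b))
        uid += 1
    return lengths

def success_prob(p1, l1, p2, l2, L):
    """Probability that an entry's two codewords fit together in L bits."""
    return sum(a * b for a, x in zip(p1, l1)
                      for b, y in zip(p2, l2) if x + y <= L)

p1 = [0.9, 0.1]                      # hypothetical first field
p2 = [0.5, 0.2, 0.2, 0.1]            # hypothetical second field
L = 3

huff = success_prob(p1, huffman_lengths(p1), p2, huffman_lengths(p2), L)
fixed = success_prob(p1, [1, 1], p2, [2, 2, 2, 2], L)  # fixed-length second code
print(huff, fixed)                   # Huffman: ~0.7, fixed-length: ~1.0
```

Here the Huffman code for the second field produces length-3 codewords that can never fit next to a first-field codeword, while the flatter 2-bit code succeeds for every entry.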
Problem Definition
Definition (Entry Distribution): An entry distribution D is characterized by two (ordered) sets of elements, S1 = {s1,1, …, s1,k1} and S2 = {s2,1, …, s2,k2}, with their corresponding vectors of positive appearance probabilities p1 and p2, s.t. p1,1 + … + p1,k1 = 1 and p2,1 + … + p2,k2 = 1.
Definition (Encoding Scheme): An encoding scheme of an entry distribution D is a pair of two prefix codes (C1, C2), where Ci is a prefix code of the set of elements Si in the first or second field.
Problem Definition (2)
Definition (Encoding Width Bound): Given an encoding scheme (C1, C2) and an encoding width bound of L bits, we say that an entry (s1,i, s2,j) is encoded successfully if its encoding width is not larger than the encoding width bound, i.e. |C1(s1,i)| + |C2(s2,j)| ≤ L.
Definition (Success Probability): The success probability of an encoding scheme is the probability that the encoding of an arbitrary entry would be successful, i.e. the sum of p1,i · p2,j over all successfully-encoded entries (s1,i, s2,j).
Problem Definition (3)
Definition (Optimal Success Probability): For an entry distribution D, we denote the optimal success probability as the maximal success probability that can be obtained by any encoding scheme.
Our goal: Find an encoding scheme that maximizes the success probability.
Constraint on the encoding scheme (Kraft's inequality): There exists a prefix encoding of the elements in a set with codeword lengths l1, …, lN iff 2^(−l1) + … + 2^(−lN) ≤ 1.
Optimization Problem
For an entry distribution, we define the following optimization problem: maximize the success probability, i.e. the sum of p1,i · p2,j over all pairs (i, j) with |C1(s1,i)| + |C2(s2,j)| ≤ L, while satisfying Kraft's inequality for each of the two prefix codes. We have to represent all elements in S1 and S2 (including those that will never be a part of a successfully-encoded entry), and the codeword lengths should be positive integers.
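For tiny instances the optimization problem can be solved by brute force, which makes the objective and the constraints concrete. A sketch (exponential in the number of elements, for illustration only; the distribution is hypothetical, and the cap on codeword lengths is a heuristic assumption of this sketch):

```python
from itertools import product

def optimal_success(p1, p2, L):
    """Search all positive-integer codeword-length vectors that satisfy
    Kraft's inequality in each field; return the best success probability."""
    def kraft_ok(lengths):
        return sum(2.0 ** -l for l in lengths) <= 1.0
    # Heuristic caps: elements may need lengths beyond L just to free Kraft weight.
    cap1, cap2 = L + len(p1), L + len(p2)
    best = 0.0
    for l1 in product(range(1, cap1 + 1), repeat=len(p1)):
        if not kraft_ok(l1):
            continue
        for l2 in product(range(1, cap2 + 1), repeat=len(p2)):
            if not kraft_ok(l2):
                continue
            s = sum(a * b for a, x in zip(p1, l1)
                           for b, y in zip(p2, l2) if x + y <= L)
            best = max(best, s)
    return best

# Hypothetical distribution: here fixed-length codes reach success probability 1.
print(optimal_success([0.9, 0.1], [0.5, 0.2, 0.2, 0.1], 3))
```

Note that every element receives a codeword, exactly as the constraint above requires, even when it can never be part of a successfully-encoded entry.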
Outline
- Introduction and Problem Definition
- General Properties
- Bounds on the Success Probability
- Optimal Conditional Encoding
- Summary
General Properties
We assume that for i ∈ {1, 2} the elements of Si are ordered in a non-increasing order of probabilities: pi,j ≥ pi,j+1 for j ∈ [1, ki − 1].
Property: The encoding scheme composed of two fixed-length codes with codewords of ⌈log2 k1⌉ and ⌈log2 k2⌉ bits achieves a success probability of 1, and is thus optimal, for L ≥ ⌈log2 k1⌉ + ⌈log2 k2⌉; it is in general not optimal for smaller L.
Property: By Markov's inequality, an encoding scheme with an average encoding width w satisfies a success probability of at least 1 − w / (L + 1).
Monotone Coding
Definition (Monotone Encoding Scheme): An encoding scheme of an entry distribution is called monotone if, for i ∈ {1, 2}, pi,j > pi,j' implies that |Ci(si,j)| ≤ |Ci(si,j')|, i.e. a more probable element never gets a longer codeword.
Lemma: For any distribution D and any L, there exists a monotone optimal encoding scheme.
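A quick numeric check of the exchange argument behind the lemma, on a hypothetical instance: for a fixed multiset of codeword lengths, assigning the shorter codewords to the more probable elements (the monotone assignment) is at least as good as any other assignment.

```python
from itertools import permutations

def success(p1, l1, p2, l2, L):
    # Total probability of entries whose two codewords fit together in L bits.
    return sum(a * b for a, x in zip(p1, l1)
                      for b, y in zip(p2, l2) if x + y <= L)

p1, l1 = [0.9, 0.1], [1, 2]          # hypothetical first field, already monotone
p2 = [0.5, 0.3, 0.2]                 # second-field probabilities, non-increasing
lengths = [1, 2, 2]                  # a fixed multiset of second-field lengths
L = 3

best = max(success(p1, l1, p2, list(l2), L) for l2 in permutations(lengths))
monotone = success(p1, l1, p2, sorted(lengths), L)
print(monotone, best)                # the sorted (monotone) assignment is maximal
```

The check works because the mass an element contributes is non-increasing in its codeword length, so swapping any inverted pair of lengths can only help.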
Bounds on the Optimal Success Probability
Theorem: The optimal success probability is at least (p1,1 + … + p1,a) · (p2,1 + … + p2,b) for any a ∈ [1, k1] and b ∈ [1, k2] for which the first a elements of S1 and the first b elements of S2 can be given fixed codeword widths w1 and w2 with w1 + w2 ≤ L.
Proof outline: We can encode the first a elements in S1 using w1 bits each and the first b elements in S2 using w2 bits each. An entry composed of two such elements has a width of at most w1 + w2 ≤ L bits and is encoded successfully; the first a elements of S1 appear w.p. p1,1 + … + p1,a and the first b elements of S2 w.p. p2,1 + … + p2,b.
Bounds on the Optimal Success Probability (2)
Theorem: The optimal success probability is at most 1 − (p1,a+1 + … + p1,k1) · (p2,b+1 + … + p2,k2) for any a, b such that, in any monotone encoding scheme, the last k1 − a elements of S1 and the last k2 − b elements of S2 must be encoded with codeword widths that sum to more than L.
Proof outline: A prefix code has at most 2^w codewords of length at most w, so in any monotone encoding scheme the last elements in S1 are encoded in at least a certain number of bits, and similarly for the last elements in S2. An entry composed of two such elements has a width of more than L bits, cannot be encoded successfully, and its probability is lost from the success probability.
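The slide's formulas did not survive extraction, so the following is a hedged numeric sketch of both bounds, reconstructed from the proof outlines above: the lower bound builds a pair of truncated fixed-length codes (conservatively reserving one codeword for the leftover elements), and the upper bound uses the fact that a prefix code has at most 2^w codewords of length at most w.

```python
from math import ceil, log2

def lower_bound(p1, p2, L):
    """Encode the a most probable elements of each field with a short fixed
    width (reserving one codeword for the leftovers when a < k); every entry
    of such elements fits in L bits, so their joint mass bounds the optimum."""
    def width(a, k):                  # fixed width for the first a of k elements
        return max(1, ceil(log2(a + 1))) if a < k else max(1, ceil(log2(a)))
    return max((sum(p1[:a]) * sum(p2[:b])
                for a in range(1, len(p1) + 1)
                for b in range(1, len(p2) + 1)
                if width(a, len(p1)) + width(b, len(p2)) <= L), default=0.0)

def upper_bound(p1, p2, L):
    """In a monotone scheme, elements ranked past 2**w need more than w bits;
    a pair of such elements has width >= (w1+1) + (w2+1) > L and must fail."""
    best = 1.0
    for w1 in range(L):
        w2 = L - 1 - w1               # then lengths sum to >= w1 + w2 + 2 = L + 1
        best = min(best, 1.0 - sum(p1[2 ** w1:]) * sum(p2[2 ** w2:]))
    return best

p = [0.7, 0.1, 0.1, 0.1]              # hypothetical field distribution
print(lower_bound(p, p, 3), upper_bound(p, p, 3))   # lower ~0.7, upper ~0.96
```

On this instance the two bounds bracket the optimum between roughly 0.7 and 0.96, which is the kind of gap the analysis in the talk addresses.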
Optimal Conditional Encoding
Given a code of one column, we would like to find a code for the second column that maximizes the success probability. We assume the non-degenerate case (a column with a single element is trivial).
Idea: We suggest a dynamic-programming algorithm. For each element in S2, we consider the possible codeword lengths.
Lemma: We can limit the search space of codeword lengths to [1, 3W] bits (for the relevant width parameter W), so only finitely many lengths have to be considered per element.
Kraft's inequality with integer weights: We define the weight of a codeword of length l as 2^(3W − l). Kraft's inequality then states that the sum of weights of all codewords in C2 should be at most 2^(3W).
Optimal Conditional Encoding (2)
Definition (Function F(n, k)): For n ∈ [0, k2] and k ∈ [0, 2^(3W)], we denote by F(n, k) the maximal sum of probabilities of entries that can be encoded successfully and satisfy:
- the second element in the entry is one of the first n elements in S2;
- the sum of weights of the first n codewords is at most k.
Property: The maximal success probability of a conditional encoding scheme is given by F(k2, 2^(3W)).
Optimal Conditional Encoding (3)
Theorem: The function F satisfies F(0, k) = 0 for k ≥ 0 and F(n, k) = −∞ for k < 0. For n ≥ 1, F(n, k) is obtained by maximizing, over the candidate codeword lengths l ∈ [1, 3W] of the n-th element of S2, the value F(n − 1, k − 2^(3W − l)) plus the probability mass of the entries that the n-th element encodes successfully with a codeword of l bits.
To calculate F(k2, 2^(3W)), we suggest a dynamic-programming algorithm based on this recursive formula.
Property: The time complexity of the algorithm is polynomial: the table has k2 · (2^(3W) + 1) entries, and each is computed by maximizing over at most 3W codeword lengths.
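The recursion lends itself to table filling. The sketch below reconstructs it under assumptions, since the slide's exact formulas are garbled: codeword lengths are capped at a caller-supplied Lmax rather than the talk's 3W bound, and Kraft's inequality is tracked through integer weights 2^(Lmax − l) with total budget 2^Lmax, as on the previous slides.

```python
def best_conditional_code(l1, p1, p2, L, Lmax):
    """F(n, k): best mass of successful entries using the first n elements of
    the second field with total Kraft weight at most k; weight(l) = 2**(Lmax-l)."""
    full = 2 ** Lmax                   # total Kraft budget
    # g[l]: first-field mass that still fits when the second codeword has l bits.
    g = [0.0] + [sum(q for q, a in zip(p1, l1) if a + l <= L)
                 for l in range(1, Lmax + 1)]
    NEG = float("-inf")
    F = [0.0] * (full + 1)             # n = 0: nothing assigned, nothing gained
    for q2 in p2:                      # add second-field elements one by one
        Fn = [NEG] * (full + 1)        # k = 0 cannot hold any codeword
        for k in range(1, full + 1):
            for l in range(1, Lmax + 1):
                w = 2 ** (Lmax - l)
                if k >= w and F[k - w] != NEG:
                    Fn[k] = max(Fn[k], F[k - w] + q2 * g[l])
        F = Fn
    return F[full]                     # = F(k2, full budget)

# Hypothetical instance: first field already encoded with lengths [1, 1].
print(best_conditional_code([1, 1], [0.9, 0.1], [0.5, 0.2, 0.2, 0.1], 3, 5))
```

Every second-field element consumes at least weight 1, so elements that can never fit are automatically pushed to the longest codewords, exactly the role the weight budget plays in the slide's formulation.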
Concluding Remarks
- A new approach for compression in fixed-width memories
- Analysis of the optimal success probability
- Finding the optimal conditional encoding