2004 SDU Uniquely Decodable Code 1. Related Notions 2. Determining UDC 3. Kraft Inequality
2004, 2009 SDU 2 A Resulting Problem Given a coding scheme of the source symbols, how to verify whether it is uniquely decodable or not?
Related Notions
alphabet: $\Sigma = \{0, 1, \ldots, \sigma - 1\}$, where $\sigma = |\Sigma|$
symbol or letter: an element of the alphabet
word: a sequence of symbols of finite length
code: a collection of words over a specified alphabet
codeword: a word in a code
message: a sequence of codewords
uniquely decodable code $C$: every message can be uniquely decomposed into codewords of $C$
Example: {0, 10, 01} vs {0, 10, 11}. The first is not uniquely decodable, since 010 = 0·10 = 01·0; the second is.
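The definition of unique decodability can be probed on a single message with a short Python sketch. The function name `count_decompositions` is hypothetical (it does not come from the slides), and it only counts the decompositions of one given message; it does not decide unique decodability of the whole code.

```python
def count_decompositions(message: str, code: set[str]) -> int:
    """Count the ways `message` can be split into codewords of `code`.

    Hypothetical helper for illustration; codewords are assumed non-empty.
    """
    # ways[i] = number of decompositions of the prefix message[:i]
    ways = [0] * (len(message) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) decomposition
    for end in range(1, len(message) + 1):
        for word in code:
            # Does message[:end] end with this codeword?
            if word and message.endswith(word, 0, end):
                ways[end] += ways[end - len(word)]
    return ways[len(message)]


ambiguous = count_decompositions("010", {"0", "10", "01"})
unambiguous = count_decompositions("010", {"0", "10", "11"})
```

A count above 1 for any message shows that the code is not uniquely decodable; a count of 1 for one message proves nothing about other messages.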
Related Notions
prefix and suffix: if $w = ps$, then $p$ is a prefix of $w$ and $s$ is a suffix of $w$
empty word: a word of length 0
suffix word: a non-empty word $t$ is called a suffix word if there exist two messages $C_1 C_2 \cdots C_m$ and $C_1' C_2' \cdots C_n'$ such that
$C_i$ and $C_j'$ are codewords for all $1 \le i \le m$, $1 \le j \le n$,
$C_1 \ne C_1'$,
$t$ is a suffix of $C_n'$, and
$C_1 C_2 \cdots C_m t = C_1' C_2' \cdots C_n'$.
A Key Lemma for Determining UDC
Lemma. A code $C$ is uniquely decodable if and only if no suffix word is a codeword in $C$.
Proof (necessity). Suppose that $C$ is uniquely decodable but some suffix word $t$ is a codeword in $C$. By the definition of suffix word, there exist two messages $C_1 C_2 \cdots C_m$ and $C_1' C_2' \cdots C_n'$ such that $C_1 \ne C_1'$ and $C_1 C_2 \cdots C_m t = C_1' C_2' \cdots C_n'$. Since $t$ is a codeword, the message $C_1' C_2' \cdots C_n'$ can be decomposed in two ways, so $C$ is not uniquely decodable. This contradicts the assumption that $C$ is a UDC.
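Proof (sufficiency). Suppose that no suffix word is a codeword in $C$, but $C$ is not uniquely decodable. Then there exists some message that can be decomposed in more than one way. Let $M$ be such a message of least length, $M = C_1 C_2 \cdots C_k = C_1' C_2' \cdots C_n'$, where $C_i$ ($1 \le i \le k$) and $C_j'$ ($1 \le j \le n$) are all codewords and $C_1 \ne C_1'$ (otherwise removing the common first codeword would give a shorter such message). Without loss of generality, assume that $C_k$ is a suffix of $C_n'$. Then $C_k$ is a suffix word, because $C_1 \cdots C_{k-1} C_k = C_1' C_2' \cdots C_n'$. But $C_k$ is also a codeword, contradicting the assumption that no suffix word is a codeword in $C$.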
UDC Verification
By the key lemma, if we can generate all the suffix words of a code $C$, then:
If no suffix word is a codeword in $C$, then $C$ is uniquely decodable.
If some suffix word is a codeword in $C$, then $C$ is not uniquely decodable.
The following decision algorithm follows directly from the key lemma.
The Determining Algorithm
UDC-Verification($C$)
  $T \leftarrow \emptyset$
  Step 1: for each pair of codewords $C_i, C_j \in C$ ($i \ne j$) do
    1.1  if $C_i = C_j$, then return NO ($C$ is not uniquely decodable)
    1.2  if there exists a word $s$ such that $C_i s = C_j$ or $C_i = C_j s$, then $T \leftarrow T \cup \{s\}$
  endfor
  Step 2: for each pair of a suffix word $t \in T$ and a codeword $C_k \in C$ (including suffix words added to $T$ during this loop) do
    2.1  if $t = C_k$, then return NO ($C$ is not uniquely decodable)
    2.2  if there exists a word $s$ such that $ts = C_k$ or $C_k s = t$, then $T \leftarrow T \cup \{s\}$
  endfor
  Step 3: return YES ($C$ is uniquely decodable)
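The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `is_uniquely_decodable`, the worklist `pending`, and the choice to reject empty codewords are all assumptions made for this sketch. Codewords are represented as strings.

```python
def is_uniquely_decodable(code) -> bool:
    """Sketch of UDC-Verification; returns True for YES, False for NO."""
    words = list(code)
    if any(w == "" for w in words):
        # Assumption of this sketch: the empty word is not a codeword.
        raise ValueError("codewords are assumed to be non-empty")
    codeset = set(words)
    # Step 1.1: two equal codewords make the code ambiguous at once.
    if len(codeset) != len(words):
        return False

    suffixes = set()  # the set T of suffix words found so far
    pending = []      # suffix words not yet paired with the codewords

    def add(s):
        if s not in suffixes:
            suffixes.add(s)
            pending.append(s)

    # Step 1.2: C_i s = C_j with C_i != C_j yields the suffix word s.
    for ci in codeset:
        for cj in codeset:
            if ci != cj and cj.startswith(ci):
                add(cj[len(ci):])

    # Step 2: pair every suffix word, including new ones, with every codeword.
    while pending:
        t = pending.pop()
        if t in codeset:              # Step 2.1: a suffix word is a codeword
            return False
        for ck in codeset:
            if ck.startswith(t):      # Step 2.2, case t s = C_k
                add(ck[len(t):])
            elif t.startswith(ck):    # Step 2.2, case C_k s = t
                add(t[len(ck):])
    return True                       # Step 3


print(is_uniquely_decodable(["0", "10", "01"]))
print(is_uniquely_decodable(["0", "10", "11"]))
```

The worklist guarantees that each suffix word is processed once. Since every suffix word is a suffix of some codeword, the set `suffixes` is finite and the loop terminates.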
Correctness of the Algorithm
Theorem. The algorithm UDC-Verification correctly decides whether a code $C$ is uniquely decodable.
Proof. We must prove:
(1) Each word $s$ put into $T$ in Step 1.2 or Step 2.2 is a suffix word.
(2) If the algorithm stops at Step 3, then it has computed all the suffix words of $C$ and ensured that none of them is a codeword.
Proof of (1). The word $s$ put into $T$ in Step 1.2 is obviously a suffix word. We next consider the word $s$ put into $T$ in Step 2.2. As $t$ is a suffix word, there exist codewords $C_1, C_2, \ldots, C_m$ and $C_1', C_2', \ldots, C_n'$ such that $C_1 \ne C_1'$ and $C_1 C_2 \cdots C_m t = C_1' C_2' \cdots C_n'$.
If $ts = C_k$, then $C_1 C_2 \cdots C_m C_k = C_1' C_2' \cdots C_n' s$, where $s$ is a suffix of $C_k$, so $s$ is a suffix word.
If $C_k s = t$, then $C_1 C_2 \cdots C_m C_k s = C_1' C_2' \cdots C_n'$, where $s$ is a suffix of $C_n'$, so $s$ is a suffix word.
Proof of (2). For each suffix word $t$ of $C$, let $m(t) = C_1 C_2 \cdots C_m$ be the shortest message satisfying $C_1 C_2 \cdots C_m t = C_1' C_2' \cdots C_n'$ with $t$ a suffix of $C_n'$. We prove by induction on the length of $m(t)$ that $t$ is generated by the algorithm.
Base step: if $|m(t)| = 1$, then $n = m = 1$, so $t$ is generated in Step 1.2.
Inductive step: suppose every suffix word $p$ with $|m(p)| < |m(t)|$ has been generated by the algorithm; we prove that $t$ is also generated. Because $t$ is a suffix of $C_n'$, we can write $pt = C_n'$ for some word $p$, and then $C_1 C_2 \cdots C_m = C_1' C_2' \cdots C_{n-1}' p$.
Proof of (2), continued.
(i) If $p = C_m$, then $C_m t = C_n'$, so $t$ is generated in Step 1.2.
(ii) If $p$ is a suffix of $C_m$, then by $C_1 C_2 \cdots C_m = C_1' C_2' \cdots C_{n-1}' p$, $p$ is a suffix word. Since $|m(p)| < |m(t)|$, the inductive hypothesis indicates that $p$ has been generated by the algorithm. So when Step 2 processes the suffix word $p$ and the codeword $C_n'$, Step 2.2 puts $t$ into $T$, since $pt = C_n'$.
(iii) If $C_m$ is a suffix of $p$, then $C_m t$ is a suffix of $C_n'$, and $C_m t$ is a suffix word because $C_1 C_2 \cdots C_{m-1} (C_m t) = C_1' C_2' \cdots C_n'$. Since $|m(C_m t)| \le |C_1 C_2 \cdots C_{m-1}| < |m(t)|$, the inductive hypothesis indicates that $C_m t$ has been generated by the algorithm. So when Step 2 processes the suffix word $C_m t$ and the codeword $C_m$, Step 2.2 puts $t$ into $T$, since $C_k s = C_m t$ holds with $C_k = C_m$ and $s = t$.
Time Complexity Analysis
Suppose there are $n$ codewords in $C$ and the longest codeword has length $l$. Then:
Step 1: $O(n^2 l)$ comparisons.
Step 2: the number of suffix words is at most $O(nl)$, so there are $O(n^2 l^2)$ comparisons and $O(n^2 l^2)$ insertions of suffix words into $T$.
In total: $O(n^2 l^2)$.
Property of UDC: Kraft Inequality
1. Let $C = \{C_1, C_2, \ldots, C_n\}$ be a uniquely decodable code over an alphabet of cardinality $\sigma$, and let $l_i = |C_i|$ for $1 \le i \le n$. Then the Kraft inequality holds:
$$\sum_{i=1}^{n} \sigma^{-l_i} \le 1.$$
2. Conversely, if a set of integers $\{l_1, l_2, \ldots, l_n\}$ satisfies the Kraft inequality, then a prefix code $C = \{C_1, C_2, \ldots, C_n\}$ can be found with codeword lengths $\{l_1, l_2, \ldots, l_n\}$.
Note: a prefix code $C = \{C_1, C_2, \ldots, C_n\}$ means that, for each pair of codewords $C_i$ and $C_j$ ($i \ne j$), neither is a prefix of the other. Strictly speaking, such a code is called a prefix-free code. Every prefix-free code is a UDC.
Example: {00, 10, 11, 100, 111} vs {00, 10, 11, 010, 011}. Both have the same codeword lengths, but only the second is prefix-free; in the first, 10 is a prefix of 100 and 11 is a prefix of 111.
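The Kraft sum is easy to evaluate exactly with rational arithmetic. The sketch below is illustrative only; the names `kraft_sum` and `satisfies_kraft` and the default `sigma=2` are assumptions, not part of the slides.

```python
from fractions import Fraction


def kraft_sum(lengths, sigma=2):
    """Exact value of the sum of sigma^(-l) over the given codeword lengths."""
    return sum(Fraction(1, sigma ** l) for l in lengths)


def satisfies_kraft(lengths, sigma=2):
    """True if the lengths satisfy the Kraft inequality for alphabet size sigma."""
    return kraft_sum(lengths, sigma) <= 1


# Lengths of the two five-word example codes: 2, 2, 2, 3, 3.
print(kraft_sum([2, 2, 2, 3, 3]))
# Lengths 1, 1, 2 cannot belong to any uniquely decodable binary code.
print(satisfies_kraft([1, 1, 2]))
```

Note that the inequality constrains only the lengths. The code {0, 10, 01} has lengths 1, 2, 2 and Kraft sum 1, yet it is not uniquely decodable, so satisfying the inequality does not by itself make a given code a UDC.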
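Proof of Property 1 (in text book page 246):
Let $m$ be an arbitrary positive integer. Then
$$\Big(\sum_{i=1}^{n} \sigma^{-l_i}\Big)^m = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_m=1}^{n} \sigma^{-(l_{i_1} + l_{i_2} + \cdots + l_{i_m})}.$$
For each of the $n^m$ messages consisting of $m$ codewords, there is a unique corresponding term in the above formula. Let $N(m, j)$ be the number of messages of length $j$ consisting of $m$ codewords. Then
$$\Big(\sum_{i=1}^{n} \sigma^{-l_i}\Big)^m = \sum_{j=1}^{ml} N(m, j)\, \sigma^{-j},$$
where $l$ is the length of the longest codeword in $C$.
Since $C$ is uniquely decodable, there are no identical messages, so $N(m, j) \le \sigma^j$. We have
$$\Big(\sum_{i=1}^{n} \sigma^{-l_i}\Big)^m \le \sum_{j=1}^{ml} \sigma^{j} \sigma^{-j} = ml.$$
So, for any positive integer $m$,
$$\sum_{i=1}^{n} \sigma^{-l_i} \le (ml)^{1/m}.$$
Since $(ml)^{1/m} \to 1$ as $m \to \infty$, the Kraft inequality holds.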
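The counting step of this proof can be checked numerically on a small code. The sketch below is only an illustration under stated assumptions: the function name `counting_identity` is hypothetical, and the code {0, 10, 11} with $m = 3$ is an arbitrary choice. It tabulates $N(m, j)$ by enumerating all $n^m$ codeword sequences and compares both sides of the identity.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def counting_identity(code, sigma, m):
    """Return (lhs, rhs, counts) for the identity used in the proof.

    counts[j] plays the role of N(m, j): the number of sequences of m
    codewords whose total length is j.
    """
    counts = Counter(
        sum(len(w) for w in seq) for seq in product(code, repeat=m)
    )
    # Left side: the Kraft sum raised to the m-th power.
    lhs = sum(Fraction(1, sigma ** len(w)) for w in code) ** m
    # Right side: the same terms grouped by total message length j.
    rhs = sum(Fraction(n_mj, sigma ** j) for j, n_mj in counts.items())
    return lhs, rhs, counts


lhs, rhs, counts = counting_identity(["0", "10", "11"], sigma=2, m=3)
print(lhs == rhs)
print(all(n_mj <= 2 ** j for j, n_mj in counts.items()))
```

For a uniquely decodable code, distinct codeword sequences give distinct strings, which is why each `counts[j]` is bounded by the number $\sigma^j$ of words of length $j$.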
Proof of Property 2
Let $\lambda_1 < \lambda_2 < \cdots < \lambda_m$ be $m$ integers such that $\{l_1, l_2, \ldots, l_n\} = \{\lambda_1, \lambda_2, \ldots, \lambda_m\}$ when ignoring repeats. Let $k_j$ be the number of $l_i$'s that equal $\lambda_j$. We must prove that there exists a prefix code $C$ such that the number of codewords in $C$ with length $\lambda_j$ is $k_j$.
The Kraft inequality becomes
$$\sum_{j=1}^{m} k_j\, \sigma^{-\lambda_j} \le 1.$$
We prove by induction that, for each $1 \le r \le m$, there exists a prefix code $C_r$ such that for every $1 \le j \le r$, the number of codewords in $C_r$ with length $\lambda_j$ is $k_j$.
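Proof of Property 2, continued
Base step: for $r = 1$, the above inequality gives $k_1 \sigma^{-\lambda_1} \le 1$, that is, $k_1 \le \sigma^{\lambda_1}$. There exist $\sigma^{\lambda_1}$ different words of length $\lambda_1$, so we can arbitrarily select $k_1$ of them to form $C_1$.
Inductive step: suppose that $C_r$ exists for some $r < m$; we prove that $C_{r+1}$ exists. From
$$\sum_{j=1}^{r+1} k_j\, \sigma^{-\lambda_j} \le 1$$
we have
$$k_{r+1}\, \sigma^{-\lambda_{r+1}} \le 1 - \sum_{j=1}^{r} k_j\, \sigma^{-\lambda_j},$$
which means
$$k_{r+1} \le \sigma^{\lambda_{r+1}} - \sum_{j=1}^{r} k_j\, \sigma^{\lambda_{r+1} - \lambda_j}.$$
Among the $\sigma^{\lambda_{r+1}}$ different words of length $\lambda_{r+1}$, there are $k_j\, \sigma^{\lambda_{r+1} - \lambda_j}$ words that have a codeword of length $\lambda_j$ in $C_r$ as a prefix. So we can select $k_{r+1}$ different words of length $\lambda_{r+1}$ such that no codeword in $C_r$ is a prefix of any of them. Adding these words extends $C_r$ to $C_{r+1}$.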
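The inductive construction can be made concrete with a short Python sketch. It processes the lengths in increasing order, as the proof does, but instead of selecting words arbitrarily it always takes the smallest unused word (the usual "canonical" choice, which is a swapped-in simplification and not stated on the slides). The name `prefix_code_from_lengths` is hypothetical, and digits are written as single characters, so the sketch assumes $\sigma \le 10$ and positive integer lengths.

```python
from fractions import Fraction


def prefix_code_from_lengths(lengths, sigma=2):
    """Build a prefix-free code with the given codeword lengths.

    Codewords are returned in order of increasing length, not input order.
    """
    if sum(Fraction(1, sigma ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    code = []
    value = 0  # next unused word, read as a base-sigma number
    prev = 0   # length of the previously assigned codeword
    for l in sorted(lengths):
        # Extend the counter to length l by appending zeros.
        value *= sigma ** (l - prev)
        # Write `value` as exactly l base-sigma digits.
        digits = []
        v = value
        for _ in range(l):
            v, d = divmod(v, sigma)
            digits.append(str(d))
        code.append("".join(reversed(digits)))
        value += 1  # skip every word that has this codeword as a prefix
        prev = l
    return code


print(prefix_code_from_lengths([2, 2, 2, 3, 3]))
```

The Kraft inequality is what guarantees that the counter never overflows the $\sigma^{l}$ words available at length $l$, mirroring the bound on $k_{r+1}$ in the inductive step.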
Thank you for your attention!