Shannon Entropy
Shannon worked at Bell Labs (part of AT&T).
Major question for telephone communication: How to transmit signals most efficiently and effectively across telephone wires?
Shannon adapted Boltzmann's statistical mechanics ideas to the field of communication.
Claude Shannon, 1916-2001
Shannon's Formulation of Communication
[Diagram: Message Source -> Message (e.g., a word) -> Receiver]
Message source: the set of all possible messages this source can send, each with its own probability of being sent next.
Message: e.g., a symbol, number, or word.
Information content H of the message source: a function of the number of possible messages and their probabilities.
Informally: the amount of "surprise" the receiver has upon receipt of each message.
Message source: One-year-old
Messages: "Da", with probability 1
No surprise; no information content
InformationContent(one-year-old) = 0 bits
Message source: Three-year-old
Messages: 500 words (w1, w2, ..., w500)
Probabilities: p1, p2, ..., p500
More surprise; more information content
InformationContent(three-year-old) > 0 bits
Shannon information (H): If all M messages have the same probability, then H = log2(M).
Units = "bits per message"
Example: Random bits (1, 0): H = log2(2) = 1 bit per message
Example: Random DNA (A, C, G, T): H = log2(4) = 2 bits per message
Example: Random notes in an octave (C, D, E, F, G, A, B, C'): H = log2(8) = 3 bits per message
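A quick check of the equal-probability case (an illustration added here, not part of the original slides), computing H = log2(M) for the three examples:

```python
# Equal-probability case: H = log2(M) bits per message.
from math import log2

examples = {
    "random bits (1, 0)": 2,
    "random DNA (A, C, G, T)": 4,
    "random notes in an octave (C ... C')": 8,
}
for name, M in examples.items():
    print(f"{name}: H = log2({M}) = {log2(M):.0f} bits per message")
```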
General formula for Shannon Information Content
Let M be the number of possible messages, and p_i be the probability of message i. Then
H = -Σ_{i=1..M} p_i log2(p_i)    (bits per message)
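As an illustration (not from the original slides), the general formula can be written as a small Python function; it reduces to log2(M) when all M messages are equally likely:

```python
from math import log2

def shannon_information(probs):
    """Shannon information content H, in bits per message.

    probs: probabilities of the M possible messages (assumed to sum to 1).
    Messages with probability 0 contribute nothing, so they are skipped.
    """
    return -sum(p * log2(p) for p in probs if p > 0)

print(shannon_information([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits, same as log2(4)
```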
Example: Biased coin (heads and tails have unequal probabilities, so H < 1 bit per toss)
Example: Text (characters occur with unequal probabilities)
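Worked versions of these two examples (the coin bias and the sample sentence are my own choices, added for illustration):

```python
from collections import Counter
from math import log2

def shannon_information(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Biased coin: less surprise than a fair coin, so H is below 1 bit per toss.
print(shannon_information([0.9, 0.1]))     # about 0.47 bits per message

# Text: estimate character probabilities from their observed frequencies.
text = "the quick brown fox jumps over the lazy dog"
n = len(text)
probs = [count / n for count in Counter(text).values()]
print(shannon_information(probs))          # entropy of this character distribution
```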
Relation to Coding Theory: Information content = the average number of bits it takes to encode a message from a given message source, under an "optimal coding". This measures how compressible a text is.
Huffman Coding
An optimal (minimum average length) and unambiguous (prefix-free) coding, based on information theory.
Algorithm devised by David Huffman in 1952.
Online calculator: http://planetcalc.com/2481/
David Huffman
Huffman Coding Example (worksheet)
Name: _____________________________
Phrase: to be or not to be
Frequency: 5  4  3  2  1
Huffman code of phrase: (remember to include a code for sp, the space character)
Average bits per character in code:
Shannon entropy of phrase:
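A minimal Huffman-coding sketch in Python (my own illustration, not the planetcalc calculator's implementation) that can be used to check the worksheet: it builds a prefix code from character frequencies, then reports the average bits per character and the Shannon entropy of the phrase for comparison.

```python
import heapq
from collections import Counter
from math import log2

def huffman_code(text):
    """Map each character of text to a Huffman codeword (a bit string)."""
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (total frequency, tie-breaker, {char: codeword-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, code1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, code2 = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in code1.items()}        # prepend 0 on one side
        merged.update({ch: "1" + c for ch, c in code2.items()})  # and 1 on the other
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

def average_bits_per_char(text, code):
    return sum(len(code[ch]) for ch in text) / len(text)

def entropy_bits_per_char(text):
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in Counter(text).values())

phrase = "to be or not to be"
code = huffman_code(phrase)
print(code)                                  # includes a codeword for the space character
print(average_bits_per_char(phrase, code))   # average bits per character in the code
print(entropy_bits_per_char(phrase))         # Shannon entropy of the phrase
```

As the coding-theory slide above suggests, the average bits per character of the Huffman code should come out at least as large as the Shannon entropy of the phrase.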
Clustering
[Figure: a set of points C grouped into three clusters c1, c2, c3]
What is the entropy of each cluster?
What is the entropy of the clustering?
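Since the figure is not reproduced here, the sketch below uses one common reading of these questions (an assumption on my part): each point carries a class label, a cluster's entropy is the Shannon entropy of the label distribution inside it, and the clustering's entropy is the size-weighted average over the clusters.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of the class-label distribution in one cluster."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def clustering_entropy(clusters):
    """Size-weighted average entropy over a list of clusters (lists of labels)."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * entropy(c) for c in clusters)

# Hypothetical clusters c1, c2, c3 with class labels 'x' and 'o':
c1 = ["x", "x", "x", "x"]   # pure cluster -> entropy 0 bits
c2 = ["x", "o", "o", "o"]   # mixed -> about 0.81 bits
c3 = ["x", "o"]             # maximally mixed -> 1 bit
print(entropy(c1), entropy(c2), entropy(c3))
print(clustering_entropy([c1, c2, c3]))
```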