
(C) 2000, The University of Michigan 1 Language and Information Handout #2 September 21, 2000

(C) 2000, The University of Michigan 2 Course Information
Instructor: Dragomir R. Radev
Office: 305A, West Hall
Phone: (734)
Office hours: TTh 3-4
Course page:
Class meets on Thursdays, 5-8 PM in 311 West Hall

(C) 2000, The University of Michigan 3 Readings
Textbook:
–Oakes, Chapter 2, pages 53–76
Additional readings:
–M&S, Chapter 7 (minus Section 7.4)
–M&S, Chapter 8 (minus Sections 8.3–8.4)

(C) 2000, The University of Michigan 4 Information Theory

(C) 2000, The University of Michigan 5 Entropy
Let p(x) be the probability mass function of a random variable X, over a discrete set of symbols (or alphabet) X: p(x) = P(X = x), x ∈ X
Example: throwing two coins and counting heads and tails
Entropy (self-information) is the average uncertainty of a single random variable: H(X) = -Σ_x p(x) log2 p(x)
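As an illustration (not part of the original handout), here is a minimal Python sketch of this definition applied to the slide's two-coin example, assuming both coins are fair so that X, the number of heads, has P(0) = 1/4, P(1) = 1/2, P(2) = 1/4:

from math import log2

# X = number of heads when throwing two coins (fair coins are an assumption)
p = {0: 0.25, 1: 0.5, 2: 0.25}

# H(X) = -sum_x p(x) log2 p(x)
H = -sum(px * log2(px) for px in p.values())
print(H)  # 1.5 bits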

(C) 2000, The University of Michigan 6 Information theoretic measures
Claude Shannon (information theory): "information = unexpectedness"
Series of events (messages) with associated probabilities p_i (i = 1..n)
Goal: to measure the information content H(p_1, …, p_n) of a particular message
Simplest case: the messages are words
When p_i is low, the word is more informative (it is more unexpected)

(C) 2000, The University of Michigan 7 Properties of information content
H is a continuous function of the p_i
If all p_i are equal (p_i = 1/n), then H is a monotone increasing function of n
If a message is broken into two successive messages, the original H is a weighted sum of the resulting values of H

(C) 2000, The University of Michigan 8 Example
The only function satisfying all three properties is the entropy function: H = -Σ_i p_i log2 p_i
Example distribution: p_1 = 1/2, p_2 = 1/3, p_3 = 1/6

(C) 2000, The University of Michigan 9 Example (cont'd)
H = -(1/2 log2 1/2 + 1/3 log2 1/3 + 1/6 log2 1/6)
  = 1/2 log2 2 + 1/3 log2 3 + 1/6 log2 6
  = 0.500 + 0.528 + 0.431
  = 1.46 bits
Alternative formula for H: H = Σ_i p_i log2 (1/p_i)
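A quick sketch (not from the handout) that checks this arithmetic, using both formulas for H:

from math import log2

def entropy(probs):
    """H = -sum_i p_i * log2(p_i), skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/3, 1/6]))                      # ~1.459 bits
print(sum(p * log2(1/p) for p in [1/2, 1/3, 1/6]))   # same value via the alternative formula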

(C) 2000, The University of Michigan 10 Another example
Example:
–No tickets left: P = 1/2
–Matinee shows only: P = 1/4
–Eve. show, undesirable seats: P = 1/8
–Eve. show, orchestra seats: P = 1/8

(C) 2000, The University of Michigan 11 Example (cont'd)
H = -(1/2 log2 1/2 + 1/4 log2 1/4 + 1/8 log2 1/8 + 1/8 log2 1/8)
  = -[(1/2 × -1) + (1/4 × -2) + (1/8 × -3) + (1/8 × -3)]
  = 1.75 bits per symbol
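The same kind of check for the ticket example (again a sketch, not part of the slides):

from math import log2

probs = [1/2, 1/4, 1/8, 1/8]              # the four ticket-availability messages
print(-sum(p * log2(p) for p in probs))   # 1.75 bits per symbol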

(C) 2000, The University of Michigan 12 Characteristics of Entropy
When one of the messages has a probability approaching 1, then entropy decreases.
When all messages have the same probability, entropy increases.
Maximum entropy: when P = 1/n (H = ??)
Relative entropy: ratio of actual entropy to maximum entropy
Redundancy: 1 - relative entropy
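A small sketch of the last three bullets, reusing the ticket distribution purely for illustration (note that "relative entropy" here is the slide's ratio, not KL divergence):

from math import log2

probs = [1/2, 1/4, 1/8, 1/8]
H = -sum(p * log2(p) for p in probs)   # actual entropy
H_max = log2(len(probs))               # maximum entropy: uniform distribution, p_i = 1/n
relative_entropy = H / H_max           # ratio of actual to maximum entropy
redundancy = 1 - relative_entropy

print(H, H_max, relative_entropy, redundancy)  # 1.75, 2.0, 0.875, 0.125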

(C) 2000, The University of Michigan 13 Entropy examples
Letter frequencies in Simplified Polynesian: P (1/8), T (1/4), K (1/8), A (1/4), I (1/8), U (1/8)
What is H(P)? What is the shortest code that can be designed to describe Simplified Polynesian?
What is the entropy of a weighted coin? Draw a diagram.
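A sketch for these questions; the Simplified Polynesian frequencies come from the slide, while the weighted-coin probabilities below are just sample points for the diagram:

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Simplified Polynesian letter frequencies from the slide
polynesian = {"p": 1/8, "t": 1/4, "k": 1/8, "a": 1/4, "i": 1/8, "u": 1/8}
print(entropy(polynesian.values()))   # 2.5 bits per letter

# Entropy of a weighted coin, H(p) = -p log2 p - (1-p) log2 (1-p):
# sample points for the diagram (0 at p = 0 or 1, maximum of 1 bit at p = 0.5)
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(p, entropy([p, 1 - p]))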

(C) 2000, The University of Michigan 14 Joint entropy and conditional entropy
The joint entropy of a pair of discrete random variables X, Y ~ p(x,y) is the amount of information needed on average to specify both their values:
H(X,Y) = -Σ_x Σ_y p(x,y) log2 p(x,y)
The conditional entropy of a discrete random variable Y given another X, for X, Y ~ p(x,y), expresses how much extra information is needed to communicate Y given that the other party knows X:
H(Y|X) = -Σ_x Σ_y p(x,y) log2 p(y|x)

(C) 2000, The University of Michigan 15 Connection between joint and conditional entropies
There is a chain rule for entropy (note that the products in the chain rules for probabilities have become sums because of the log):
H(X,Y) = H(X) + H(Y|X)
H(X_1,…,X_n) = H(X_1) + H(X_2|X_1) + … + H(X_n|X_1,…,X_{n-1})
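A sketch of these definitions on a small made-up joint distribution (the table of probabilities is an illustrative assumption, not from the handout):

from math import log2

# Hypothetical joint distribution p(x, y); the probabilities sum to 1
joint = {("a", 0): 1/4, ("a", 1): 1/4,
         ("b", 0): 3/8, ("b", 1): 1/8}

def H(dist):
    """Entropy of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# marginal p(x)
px = {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p

# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y) / p(x)
H_Y_given_X = -sum(p * log2(p / px[x]) for (x, y), p in joint.items() if p > 0)

# Chain rule: H(X,Y) = H(X) + H(Y|X)
print(H(joint), H(px) + H_Y_given_X)   # both print ~1.906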

(C) 2000, The University of Michigan 16 Simplified Polynesian revisited

(C) 2000, The University of Michigan 17 Mutual information
Mutual information: the reduction in uncertainty of one random variable due to knowing about another, or the amount of information one random variable contains about another.
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
H(X) – H(X|Y) = H(Y) – H(Y|X) = I(X;Y)

(C) 2000, The University of Michigan 18 Mutual information and entropy
[Figure: diagram relating H(X), H(Y), H(X|Y), H(Y|X), I(X;Y), and H(X,Y)]
I(X;Y) is 0 iff the two variables are independent
For two dependent variables, mutual information grows not only with the degree of dependence, but also with the entropy of the variables

(C) 2000, The University of Michigan 19 Formulas for I(X;Y)
I(X;Y) = H(X) – H(X|Y) = H(X) + H(Y) – H(X,Y)
I(X;Y) = Σ_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
Since H(X|X) = 0, note that H(X) = H(X) – H(X|X) = I(X;X)
Pointwise mutual information: I(x;y) = log2 [ p(x,y) / (p(x) p(y)) ]
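A sketch computing I(X;Y) two equivalent ways, plus pointwise mutual information, on the same kind of made-up joint table (an illustrative assumption):

from math import log2

joint = {("a", 0): 1/4, ("a", 1): 1/4,
         ("b", 0): 3/8, ("b", 1): 1/8}

px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# I(X;Y) = H(X) + H(Y) - H(X,Y)
I_entropies = H(px.values()) + H(py.values()) - H(joint.values())

# I(X;Y) = sum_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
I_direct = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

print(I_entropies, I_direct)   # both ~0.049

# Pointwise mutual information for one particular pair (x, y)
print(log2(joint[("a", 1)] / (px["a"] * py[1])))   # ~0.415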

(C) 2000, The University of Michigan 20 The noisy channel model
[Figure: W (message from a finite alphabet) → Encoder → X (input to channel) → Channel p(y|x) → Y (output from channel) → Decoder → Ŵ (attempt to reconstruct the message based on the output). Below: a binary symmetric channel with crossover probability p.]
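To tie the picture back to mutual information: for a binary symmetric channel with crossover probability p and a uniform input (an assumption made here for illustration, not stated on the slide), I(X;Y) works out to 1 - H(p). A sketch:

from math import log2

def H2(p):
    """Binary entropy of a weighted coin / crossover probability."""
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_mutual_information(p, px0=0.5):
    """I(X;Y) for a binary symmetric channel; px0 is P(X=0) (uniform by default)."""
    px = [px0, 1 - px0]
    channel = [[1 - p, p], [p, 1 - p]]          # p(y|x)
    joint = [[px[x] * channel[x][y] for y in (0, 1)] for x in (0, 1)]
    py = [joint[0][y] + joint[1][y] for y in (0, 1)]
    return sum(joint[x][y] * log2(joint[x][y] / (px[x] * py[y]))
               for x in (0, 1) for y in (0, 1) if joint[x][y] > 0)

print(bsc_mutual_information(0.1))   # ~0.531
print(1 - H2(0.1))                   # same value: 1 - H(p)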

(C) 2000, The University of Michigan 21 Statistical NLP as decoding problems

(C) 2000, The University of Michigan 22 Coding

(C) 2000, The University of Michigan 23 Compression
Huffman coding (prefix property)
Ziv-Lempel codes (better)
arithmetic codes (better for images - why?)

(C) 2000, The University of Michigan 24 Huffman coding
Developed by David Huffman (1952)
Average of 5 bits per character
Based on frequency distributions of symbols
Algorithm: iteratively build a tree of symbols, starting with the two least frequent symbols
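A minimal sketch of the algorithm (repeatedly merge the two least frequent nodes) using Python's heapq; the frequency counts here are made up, since the slide's figure with the real ones is not in the transcript:

import heapq
from itertools import count

def huffman_code(freqs):
    """Build a prefix code by repeatedly merging the two least frequent nodes."""
    tiebreak = count()                      # avoids comparing trees on frequency ties
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent
        f2, _, right = heapq.heappop(heap)  # second least frequent
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: a symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Hypothetical frequency counts, not the ones from the (missing) slide figure
print(huffman_code({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))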

(C) 2000, The University of Michigan 25

(C) 2000, The University of Michigan 26
[Figure: example Huffman tree over the symbols a through j]

(C) 2000, The University of Michigan 27

(C) 2000, The University of Michigan 28 Exercise
Consider the bit string:
Use the Huffman code from the example to decode it.
Try inserting, deleting, and switching some bits at random locations and try decoding.

(C) 2000, The University of Michigan 29 Ziv-Lempel coding
Two types - one is known as LZ77 (used in GZIP)
Code: a set of triples (a, b, c)
–a: how far back in the decoded text to look for the upcoming text segment
–b: how many characters to copy
–c: new character to add to complete the segment

(C) 2000, The University of Michigan 30
[Figure: LZ77 decoding step by step; the decoded text grows one triple at a time: p, pe, pet, peter, peter_, peter_pi, peter_piper, ..., peter_piper_picked_a_peck_of_pickled_peppers]
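A rough LZ77-style sketch of the (a, b, c) triple scheme described above, applied to the peter_piper string; this greedy version is illustrative and not necessarily the exact variant behind the figure:

def lz77_encode(text):
    """Greedy LZ77: emit (a, b, c) = (distance back, copy length, next char)."""
    i, triples = 0, []
    while i < len(text):
        best_dist, best_len = 0, 0
        for dist in range(1, i + 1):                    # look back into the decoded text
            length = 0
            while (i + length < len(text) - 1 and       # keep one char for c
                   text[i + length - dist] == text[i + length]):
                length += 1
            if length > best_len:
                best_dist, best_len = dist, length
        triples.append((best_dist, best_len, text[i + best_len]))
        i += best_len + 1
    return triples

def lz77_decode(triples):
    out = []
    for dist, length, ch in triples:
        start = len(out) - dist
        for k in range(length):                         # copy may overlap itself
            out.append(out[start + k])
        out.append(ch)
    return "".join(out)

s = "peter_piper_picked_a_peck_of_pickled_peppers"
assert lz77_decode(lz77_encode(s)) == s
print(lz77_encode(s)[:8])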

(C) 2000, The University of Michigan 31

(C) 2000, The University of Michigan 32 Arithmetic coding
Uses probabilities
Achieves about 2.5 bits per character
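A sketch of the interval-narrowing idea behind arithmetic coding, which can also be used for the exercise on the last slide; the symbol probabilities for a, b, and c are an assumption, since the slides do not give them:

from fractions import Fraction

def arithmetic_interval(message, probs):
    """Narrow [low, low + width) once per symbol; any number inside the
    final interval (plus the message length) identifies the message."""
    cum, lows = Fraction(0), {}
    for sym in sorted(probs):               # cumulative lower bound per symbol
        lows[sym] = cum
        cum += probs[sym]
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        low += width * lows[sym]
        width *= probs[sym]
    return low, low + width

# Assumed probabilities for the three-letter alphabet
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
for s in ["aaaaab", "ababaa", "abccab", "cbabac"]:
    lo, hi = arithmetic_interval(s, probs)
    print(s, float(lo), float(hi))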

(C) 2000, The University of Michigan 33

(C) 2000, The University of Michigan 34 Exercise
Assuming the alphabet consists of a, b, and c, develop arithmetic encoding for the following strings:
–aaaaab
–ababaa
–abccab
–cbabac