Noise, Information Theory, and Entropy

Noise, Information Theory, and Entropy CS414 – Spring 2007 By Roger Cheng, Karrie Karahalios, Brian Bailey

Communication system abstraction Information source → Encoder → Modulator → Channel → Demodulator → Decoder → Output signal. The encoder and modulator make up the sender side; the demodulator and decoder make up the receiver side.

The additive noise channel The transmitted signal s(t) is corrupted by a noise source n(t), and the resulting received signal is r(t) = s(t) + n(t). Noise could result from many sources, including electronic components and transmission interference.

Random processes A random variable is the result of a single measurement. A random process is an indexed collection of random variables, or equivalently a non-deterministic signal that can be described by a probability distribution. Noise can be modeled as a random process.

WGN (White Gaussian Noise) Properties At each time instant t = t0, the value of n(t) is normally distributed with mean 0 and variance σ² (i.e., E[n(t0)] = 0, E[n(t0)²] = σ²). At any two different time instants, the values of n(t) are uncorrelated (i.e., E[n(t0)n(tk)] = 0). The power spectral density of n(t) has equal power in all frequency bands.
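
As a quick illustration (not part of the original slides), the sketch below generates white Gaussian noise samples with NumPy and checks the stated properties empirically; the variance and sample count are arbitrary choices.

```python
import numpy as np

# Illustrative sketch: draw WGN samples and verify the stated properties numerically.
rng = np.random.default_rng(0)
sigma = 0.5                      # noise standard deviation (assumed value)
n = rng.normal(loc=0.0, scale=sigma, size=100_000)

print("sample mean     ~ 0      :", n.mean())
print("sample variance ~ sigma^2:", n.var(), "vs", sigma**2)

# Values at different time instants should be (nearly) uncorrelated:
lag1_corr = np.corrcoef(n[:-1], n[1:])[0, 1]
print("lag-1 correlation ~ 0    :", lag1_corr)
```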

WGN continued When an additive noise channel has a white Gaussian noise source, we call it an AWGN channel. It is the most frequently used model in communications. Reasons why we use this model: it's easy to understand and compute, and it applies to a broad class of physical channels.

Signal energy and power Energy is defined as E = ∫ |s(t)|² dt; power is defined as P = lim(T→∞) (1/2T) ∫ from −T to T of |s(t)|² dt. Most signals have either finite energy and zero power, or infinite energy and finite power. Noise power is hard to compute in the time domain; the power of WGN is its variance σ².

Signal to Noise Ratio (SNR) Defined as the ratio of signal power to the noise power corrupting the signal, SNR = Psignal/Pnoise. It is usually more practical to measure SNR on a dB scale: SNR(dB) = 10·log10(Psignal/Pnoise). Obviously, we want as high an SNR as possible.
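
A minimal sketch of the SNR computation, assuming a made-up sinusoidal signal and noise level; only the power ratio and the dB conversion come from the slide.

```python
import numpy as np

# Illustrative sketch: SNR of a noisy sinusoid, on linear and dB scales.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 10_000, endpoint=False)
s = np.sin(2 * np.pi * 50 * t)               # transmitted signal s(t) (assumed)
n = rng.normal(scale=0.2, size=t.size)       # additive noise n(t) (assumed level)

p_signal = np.mean(s ** 2)                   # average signal power
p_noise = np.mean(n ** 2)                    # average noise power
snr = p_signal / p_noise
snr_db = 10 * np.log10(snr)
print(f"SNR = {snr:.1f} (linear) = {snr_db:.1f} dB")
```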

Analog vs. Digital Analog system: any amount of noise will create distortion at the output. Digital system: a relatively small amount of noise will cause no harm at all, but too much noise will make decoding of the received signal impossible. Both: the goal is to limit the effects of noise to a manageable/satisfactory amount.

Information theory and entropy Information theory tries to solve the problem of communicating as much data as possible over a noisy channel. The measure of data is entropy. Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (which jump-started the digital age).

Review of Entropy Coding Alphabet: a finite, non-empty set A = {a, b, c, d, e, …}. Symbol (si): an element of the set. String: a sequence of symbols from A. Codeword: a bit sequence representing a coded symbol or string, e.g. 0110010111101001010. p(si): probability of symbol si in the string. Li: length of the codeword of symbol si, in bits.

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point." -Shannon, 1944

Measure of Information Information content of symbol si (in bits): I(si) = −log2 p(si). Examples: p(si) = 1 carries no information; a smaller p(si) carries more information, as the symbol was unexpected or surprising.

Entropy Weigh the information content of each source symbol by its probability of occurrence: the resulting value, H = −Σ p(si)·log2 p(si), is called the entropy (H). It produces a lower bound on the number of bits needed to represent the information with code words.

Entropy Example Alphabet = {A, B}, with p(A) = 0.4 and p(B) = 0.6. Compute entropy (H): H = −0.4·log2 0.4 − 0.6·log2 0.6 ≈ 0.97 bits. Maximum uncertainty (the largest H) occurs when all probabilities are equal.
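
A small sketch (not from the slides, standard library only) that reproduces the 0.97-bit result and the claim about maximum uncertainty:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p*log2(p), in bits (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The two-symbol example from the slide: p(A) = 0.4, p(B) = 0.6
print(entropy([0.4, 0.6]))   # ~0.971 bits

# Maximum uncertainty: equal probabilities give the largest H (1 bit for two symbols)
print(entropy([0.5, 0.5]))   # 1.0
```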

Entropy definitions Shannon entropy: H(X) = −Σx p(x)·log2 p(x). Binary entropy formula: H(p) = −p·log2 p − (1 − p)·log2(1 − p). Differential entropy (for continuous X with density f): h(X) = −∫ f(x)·log f(x) dx.

Properties of entropy Can be defined as the expectation of −log p(x) (i.e., H(X) = E[−log p(X)]). It is not a function of a variable's values, but of the variable's probabilities. Usually measured in "bits" (using logs of base 2) or "nats" (using logs of base e). Maximized when all values are equally likely (i.e., a uniform distribution). Equal to 0 when only one value is possible.

Joint and conditional entropy Joint entropy is the entropy of the pairing (X, Y): H(X, Y) = −Σ p(x, y)·log2 p(x, y). Conditional entropy is the entropy of X if the value of Y were known: H(X|Y) = −Σ p(x, y)·log2 p(x|y). Relationship between the two: H(X, Y) = H(Y) + H(X|Y) = H(X) + H(Y|X).

Mutual information Mutual information is how much information about X can be obtained by observing Y: I(X; Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X, Y).
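
As an illustration, the sketch below computes these quantities for a made-up 2×2 joint distribution; the specific probabilities are arbitrary assumptions, and only the identities come from the slides.

```python
import numpy as np

# Illustrative joint distribution p(x, y) over two binary variables (assumed values).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)          # marginal of X
p_y = p_xy.sum(axis=0)          # marginal of Y

def H(p):
    """Entropy in bits of a probability vector (zeros ignored)."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

H_xy = H(p_xy.ravel())          # joint entropy H(X,Y)
H_x, H_y = H(p_x), H(p_y)
H_x_given_y = H_xy - H_y        # H(X|Y) = H(X,Y) - H(Y)
I_xy = H_x - H_x_given_y        # I(X;Y) = H(X) - H(X|Y)

print(f"H(X)={H_x:.3f}  H(Y)={H_y:.3f}  H(X,Y)={H_xy:.3f}")
print(f"H(X|Y)={H_x_given_y:.3f}  I(X;Y)={I_xy:.3f} bits")
```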

Mathematical model of a channel Assume that our input to the channel is X, and the output is Y. Then the characteristics of the channel can be defined by its conditional probability distribution p(y|x).

Channel capacity and rate Channel capacity is defined as the maximum possible value of the mutual information: C = max over input distributions f(x) of I(X; Y). We choose the best input distribution f(x) to maximize C. For any rate R < C, we can transmit information with arbitrarily small probability of error.

Binary symmetric channel Correct bit transmitted with probability 1-p Wrong bit transmitted with probability p Sometimes called “cross-over probability” Capacity C = 1 - H(p,1-p)

Binary erasure channel Correct bit transmitted with probability 1-p “Erasure” transmitted with probability p Capacity C = 1 - p
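
A short sketch (not from the slides) evaluating the two capacity formulas, C = 1 − H(p, 1−p) for the binary symmetric channel and C = 1 − p for the binary erasure channel; the sample values of p are arbitrary.

```python
import math

def binary_entropy(p):
    """H(p, 1-p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Binary symmetric channel with cross-over probability p: C = 1 - H(p, 1-p)."""
    return 1 - binary_entropy(p)

def bec_capacity(p):
    """Binary erasure channel with erasure probability p: C = 1 - p."""
    return 1 - p

for p in (0.0, 0.1, 0.5):
    print(f"p={p}: BSC capacity={bsc_capacity(p):.3f}, BEC capacity={bec_capacity(p):.3f}")
```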

Coding theory Information theory only gives us an upper bound on communication rate. We need to use coding theory to find a practical method to achieve a high rate. Two types: Source coding - compresses source data to a smaller size; Channel coding - adds redundancy bits to make transmission across a noisy channel more robust.

Source-channel separation theorem Shannon showed that when dealing with one transmitter and one receiver, we can break up source coding and channel coding into separate steps without loss of optimality Does not apply when there are multiple transmitters and/or receivers Need to use network information theory principles in those cases

Coding Intro Assume an alphabet K of {A, B, C, D, E, F, G, H}. In general, if we want to distinguish n different symbols, we will need to use ⌈log2 n⌉ bits per symbol, i.e. 3 here. We can code alphabet K as: A 000, B 001, C 010, D 011, E 100, F 101, G 110, H 111.

Coding Intro “BACADAEAFABBAAAGAH” is encoded as the string of 54 bits 001000010000011000100000101000001001000000000110000111 (fixed length code)

Coding Intro With this coding: A 0, B 100, C 1010, D 1011, E 1100, F 1101, G 1110, H 1111, the same string is encoded as 100010100101101100011010100100000111001111 (42 bits), which saves more than 20% in space.
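
A small sketch (not from the slides) that reproduces the 54-bit and 42-bit encodings of the example string and the resulting savings:

```python
# Fixed-length code A=000 ... H=111, versus the variable-length code from the slide.
fixed = {c: format(i, "03b") for i, c in enumerate("ABCDEFGH")}
variable = {"A": "0", "B": "100", "C": "1010", "D": "1011",
            "E": "1100", "F": "1101", "G": "1110", "H": "1111"}

message = "BACADAEAFABBAAAGAH"
fixed_bits = "".join(fixed[c] for c in message)
var_bits = "".join(variable[c] for c in message)

print(len(fixed_bits), "bits (fixed length)")       # 54
print(len(var_bits), "bits (variable length)")      # 42
print(f"savings: {1 - len(var_bits) / len(fixed_bits):.1%}")   # ~22%
```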

Huffman Tree Symbol frequencies from "BACADAEAFABBAAAGAH": A (9), B (3), C (1), D (1), E (1), F (1), G (1), H (1).

Huffman Encoding Use the probability distribution to determine how many bits to use for each symbol; higher-frequency symbols are assigned shorter codes. It is an entropy-based, block-variable coding scheme.

Huffman Encoding Produces a code which uses a minimum number of bits to represent each symbol: no symbol-by-symbol code can represent the same sequence using fewer bits per symbol. It is optimal among codes that use one code word per symbol, but this may differ slightly from the theoretical lower limit (entropy). It is lossless. Build a Huffman tree to assign the codes.

Informal Problem Description Given a set of symbols from an alphabet and their probability distribution (assumes the distribution is known and stable), find a prefix-free binary code with minimum weighted path length. Prefix-free means no codeword is a prefix of any other codeword.

Huffman Algorithm Construct a binary tree of codes: leaf nodes represent the symbols to encode, interior nodes represent cumulative probability, and edges are assigned 0 or 1, with the path from root to leaf giving the output code. Construct the tree bottom-up: connect the two nodes with the lowest probability, and repeat until there are no more nodes to connect.
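
A minimal, illustrative implementation of this bottom-up construction (not the course's reference code), using a heap to repeatedly merge the two lowest-probability nodes; the function name huffman_code is my own, and it is applied to the distribution from the example that follows.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: bitstring}.
    Repeatedly merges the two lowest-probability nodes (bottom-up tree construction)."""
    tiebreak = count()                      # avoids comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # two smallest-probability nodes
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

# Example distribution from the next slide:
probs = {"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18}
print(huffman_code(probs))   # code lengths A=2, B=2, C=3, D=3, E=2; 0/1 labels may differ
```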

Huffman Example Construct the Huffman coding tree (in class). Symbol probabilities P(S): A 0.25, B 0.30, C 0.12, D 0.15, E 0.18.

Characteristics of Solution The lowest-probability symbol is always furthest from the root. The assignment of 0/1 to the children edges is arbitrary: other solutions are possible, but the code lengths remain the same. If two nodes have equal probability, either can be selected. Notes: prefix-free code; O(n log n) complexity. Resulting codes: A 11, B 00, C 010, D 011, E 10.

Example Encoding/Decoding Using the code A 11, B 00, C 010, D 011, E 10: encode "BEAD" → 001011011; decode "0101100".
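
A small sketch (not from the slides) of encoding and decoding with this code table; the decode loop relies on the code being prefix-free, so a symbol is emitted as soon as a codeword matches.

```python
# Code table from the slide.
code = {"A": "11", "B": "00", "C": "010", "D": "011", "E": "10"}

def encode(text):
    return "".join(code[s] for s in text)

def decode(bits):
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:                 # prefix-free: emit as soon as a codeword matches
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

print(encode("BEAD"))     # 001011011
print(decode("0101100"))  # solves the decoding exercise from the slide
```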

Entropy (Theoretical Limit) H = −(.25·log2 .25 + .30·log2 .30 + .12·log2 .12 + .15·log2 .15 + .18·log2 .18) = 2.24 bits. Symbol / P(S) / Code: A 0.25 11, B 0.30 00, C 0.12 010, D 0.15 011, E 0.18 10.

Average Codeword Length L = .25(2) + .30(2) + .12(3) + .15(3) + .18(2) = 2.27 bits. Symbol / P(S) / Code: A 0.25 11, B 0.30 00, C 0.12 010, D 0.15 011, E 0.18 10.
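
A quick numerical check (a sketch, not from the slides) of the two figures above:

```python
import math

probs = {"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18}
code_len = {"A": 2, "B": 2, "C": 3, "D": 3, "E": 2}   # lengths of the codes above

H = -sum(p * math.log2(p) for p in probs.values())     # entropy (theoretical limit)
L = sum(probs[s] * code_len[s] for s in probs)         # average codeword length
print(f"H = {H:.2f} bits, L = {L:.2f} bits")           # H = 2.24, L = 2.27
```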

Code Length Relative to Entropy Huffman reaches the entropy limit when all probabilities are negative powers of 2 (i.e., 1/2, 1/4, 1/8, 1/16, etc.). In general, H ≤ average code length < H + 1.

Example H = −.01·log2 .01 − .99·log2 .99 = .08 bits; L = .01(1) + .99(1) = 1 bit. Symbol / P(S) / Code: A 0.01 1, B 0.99 0.
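
A quick numerical check of this example (a sketch, not from the slides), using the 1-bit codewords given above:

```python
import math

p = {"A": 0.01, "B": 0.99}
H = -sum(q * math.log2(q) for q in p.values())
L = 0.01 * 1 + 0.99 * 1          # both codewords are 1 bit long
print(f"H = {H:.2f} bits, L = {L:.2f} bits")   # H ~ 0.08, L = 1.00
```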

Exercise Compute entropy (H), build the Huffman tree, compute the average code length, and code "BCCADE". Symbol / P(S): A 0.1, B 0.2, C 0.4, D, E.

Solution Entropy H = 2.1 bits; average code length L = 2.2 bits. Code "BCCADE" => 10000111101110. Symbol / P(S) / Code: A 0.1 111, B 0.2 100, C 0.4 0, D 101, E 110.

Limitations Diverges from the lower limit when the probability of a particular symbol becomes high, because it always uses an integral number of bits per symbol. Must send the code book with the data, which lowers overall efficiency. Must determine the frequency distribution, which must remain stable over the data set.

Error detection and correction Error detection is the ability to detect errors caused by noise or other impairments during transmission from the transmitter to the receiver. Error correction has the additional feature of localizing the errors and correcting them. Error detection always precedes error correction. (more next week)