Lecture 2: Basic Information Theory
Thinh Nguyen, Oregon State University


What is information?
Can we measure information? Consider the following two sentences:
1. There is a traffic jam on I5
2. There is a traffic jam on I5 near Exit 234
Sentence 2 seems to have more information than sentence 1. From the semantic viewpoint, sentence 2 provides more useful information.

What is information?
It is hard to measure the “semantic” information! Consider the following two sentences:
1. There is a traffic jam on I5 near Exit …
2. There is a traffic jam on I5 near Exit 234
It is not clear whether sentence 1 or sentence 2 carries more information!

What is information?
Let's attempt a different definition of information. How about counting the number of letters in the two sentences?
1. There is a traffic jam on I5 (22 letters)
2. There is a traffic jam on I5 near Exit 234 (33 letters)
That is definitely something we can measure and compare!

What is information?
The first attempt to quantify information was by Hartley (1928). Every symbol of the message has a choice of s possibilities, so a message of length L can have s^L distinguishable possibilities. The information measure is then the logarithm of the number of possibilities:
H = log(s^L) = L log(s)
Intuitively, this definition makes sense: if one symbol (letter) carries information log(s), then a sentence of length L should carry L times more information, i.e. L log(s). It is interesting that the logarithm is the only function that satisfies this additivity.
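A minimal numeric sketch of Hartley's measure (Python; the 26-letter alphabet and the 22-symbol message length are illustrative assumptions, not values from the slide):

import math

def hartley_information(alphabet_size, message_length):
    # A message of length L over an alphabet of s symbols has s**L
    # distinguishable possibilities; Hartley's measure is log2(s**L) = L * log2(s).
    return message_length * math.log2(alphabet_size)

# Assumed example: a 26-letter alphabet and a 22-letter message.
print(hartley_information(26, 22))   # 22 * log2(26), about 103.4 bits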

How about measuring information as the number of Yes/No questions one has to ask to get the correct answer in the simple game below: a circle is hidden in one of the cells of a grid, and you must find it. How many questions are needed? (The slide's figure gives the answers 2 and 4.) The randomness is due to uncertainty about where the circle is!
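A small sketch of this idea (Python). Assuming the circle hides in one of N equally likely positions, each Yes/No answer can rule out half of the remaining candidates, so about log2(N) questions are needed; the grid sizes 4 and 16 below are assumptions chosen to reproduce the answers 2 and 4.

import math

def questions_needed(num_positions):
    # With N equally likely positions, ceil(log2(N)) Yes/No questions suffice:
    # each answer halves the set of remaining positions.
    return math.ceil(math.log2(num_positions))

for n in (4, 16):
    print(n, "positions ->", questions_needed(n), "questions")   # 2 and 4 questions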

Shannon's Information Theory
Claude Shannon: "A Mathematical Theory of Communication", The Bell System Technical Journal, 1948.
Shannon's measure of information is the number of bits needed to represent the amount of uncertainty (randomness) in a data source, and is defined as the entropy. Where the source has symbols 1, 2, …, n, each with probability of occurrence p_i, the entropy is
H = -Σ_i p_i log2(p_i)
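A minimal sketch of this formula (Python; the helper name entropy is mine):

import math

def entropy(probs):
    # Shannon entropy in bits: H = -sum_i p_i * log2(p_i);
    # zero-probability symbols contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two equally likely symbols (like the a/b source on the next slide):
print(entropy([0.5, 0.5]))   # 1.0 bit per symbol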

Shannon's Entropy
Consider the following string consisting of the symbols a and b: abaabaababbbaabbabab…
On average, there are equal numbers of a's and b's. The string can be considered as the output of a source that emits symbol a or b with equal probability 0.5 (figure: a source with P(a) = P(b) = 0.5). We want to characterize the average information generated by the source!

Intuition on Shannon's Entropy
Suppose you have a long random string of two binary symbols 0 and 1, where symbol 1 occurs with probability p and symbol 0 with probability 1 - p.
If the string is long enough, say of length n, it is likely to contain about np 1's and n(1 - p) 0's. The probability of such a typical string occurring is
P = p^(np) (1 - p)^(n(1 - p)) = 2^(-nH),  where H = -p log2(p) - (1 - p) log2(1 - p)
(Why? The symbols are independent, and a typical string has about np 1's and n(1 - p) 0's.)
Hence the number of possible (typical) patterns is about 1/P = 2^(nH), and the number of bits needed to represent all possible patterns is log2(2^(nH)) = nH. The average number of bits to represent one symbol is therefore H.
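A numeric check of this counting argument (Python; p = 0.25 and n = 1000 are assumed values for illustration):

import math

p, n = 0.25, 1000                     # assumed values for illustration
ones, zeros = n * p, n * (1 - p)      # a typical string has about n*p 1's and n*(1-p) 0's

# -log2 of the typical string's probability p**(n*p) * (1-p)**(n*(1-p)),
# divided by n, should match the entropy H.
log2_prob = ones * math.log2(p) + zeros * math.log2(1 - p)
bits_per_symbol = -log2_prob / n

H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(bits_per_symbol, H)             # both are about 0.811 bits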

More Intuition on Entropy
Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
- If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
- If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
- If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
- Intuitively, the amount of information received is the same if P(heads) = 0.9 as if P(heads) = 0.1.

Self Information
So, let's look at it the way Shannon did. Assume a memoryless source with alphabet A = (a_1, …, a_n) and symbol probabilities (p_1, …, p_n). How much information do we get when finding out that the next symbol is a_i? According to Shannon, the self information of a_i is
i(a_i) = -log(p_i) = log(1/p_i)

Why? Assume two independent events A and B, with probabilities P(A) = p_A and P(B) = p_B. For both events to happen, the probability is p_A · p_B, but the amounts of information should be added, not multiplied. Logarithms satisfy this:
-log(p_A · p_B) = -log(p_A) - log(p_B)
And we want the information to increase with decreasing probability, so let's use the negative logarithm.
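A quick numeric illustration of this additivity (Python; the probabilities 0.5 and 0.25 are assumed example values):

import math

p_A, p_B = 0.5, 0.25                       # assumed probabilities of two independent events

def self_info(p):
    # Self-information in bits: -log2(p).
    return -math.log2(p)

print(self_info(p_A) + self_info(p_B))     # 1 + 2 = 3 bits
print(self_info(p_A * p_B))                # -log2(0.125) = 3 bits, the same value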

Self Information
Which logarithm? Pick the one you like! If you pick the natural log, you'll measure in nats; if you pick the 10-log, you'll get Hartleys; if you pick the 2-log (like everyone else), you'll get bits.
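The same self-information expressed in the three units (Python; the probability 0.25 is an assumed example):

import math

p = 0.25                      # assumed symbol probability

print(-math.log2(p))          # 2.0   bits     (base-2 logarithm)
print(-math.log(p))           # 1.386 nats     (natural logarithm)
print(-math.log10(p))         # 0.602 Hartleys (base-10 logarithm)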

Self Information
On average over all the symbols, we get
H(X) = Σ_i p_i · i(a_i) = -Σ_i p_i log(p_i)
H(X) is called the first order entropy of the source. This can be regarded as the degree of uncertainty about the following symbol.

Entropy
Example: Binary Memoryless Source (BMS).
Let X be a binary source with P(X = 1) = p and P(X = 0) = 1 - p. Then
H(X) = -p log2(p) - (1 - p) log2(1 - p)
This is often denoted h(p), the binary entropy function. The uncertainty (information) is greatest when p = 0.5, where h(p) = 1 bit.
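A small sketch of the binary entropy function (Python), showing the maximum at p = 0.5 and that h(0.9) = h(0.1):

import math

def h(p):
    # Binary entropy: h(p) = -p*log2(p) - (1-p)*log2(1-p), with h(0) = h(1) = 0.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.1, 0.5, 0.9, 1.0):
    print(p, round(h(p), 3))   # 0.469, 1.0, 0.469, 0.0; greatest at p = 0.5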

Example
Three symbols a, b, c with corresponding probabilities P = {0.5, 0.25, 0.25}. What is H(P)?
Three weather conditions in Corvallis (rain, sunny, cloudy) with corresponding probabilities Q = {0.48, 0.32, 0.20}. What is H(Q)?
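A sketch computing the two entropies asked for on this slide (Python; the helper entropy is the same as in the earlier sketch):

import math

def entropy(probs):
    # Shannon entropy in bits.
    return -sum(p * math.log2(p) for p in probs if p > 0)

P = [0.5, 0.25, 0.25]    # symbols a, b, c
Q = [0.48, 0.32, 0.20]   # rain, sunny, cloudy in Corvallis

print(entropy(P))        # exactly 1.5 bits
print(entropy(Q))        # about 1.499 bits, just under 1.5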

Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., p_i = 1/N.
3. The difference log N - H is called the redundancy of the source.
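A sketch of property 3 (Python; reusing the weather source Q from the previous slide as the example):

import math

def entropy(probs):
    # Shannon entropy in bits.
    return -sum(p * math.log2(p) for p in probs if p > 0)

Q = [0.48, 0.32, 0.20]       # the Corvallis weather source from the previous slide
N = len(Q)

H_max = math.log2(N)         # log N, about 1.585 bits, reached when symbols are equiprobable
H = entropy(Q)               # about 1.499 bits, so 0 <= H <= log N holds
print(H_max - H)             # redundancy, about 0.086 bits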