Presentation transcript: "Today: Entropy Information Theory. Claude Shannon Ph.D. 1916-2001"

1 Today: Entropy Information Theory

2

3 Claude Shannon Ph.D. 1916-2001

4

5

6 Entropy

7 A measure of the disorder in a system

8 Entropy The (average) number of yes/no questions needed to completely specify the state of a system

9

10 What if there were two coins?

11

12

13

14 2 states: 1 question. 4 states: 2 questions. 8 states: 3 questions. 16 states: 4 questions. number of states = 2^(number of yes-no questions)

15 number of states = 2^(number of yes-no questions), so log2(number of states) = number of yes-no questions

16 H = log2(n). H is entropy, the number of yes-no questions required to specify the state of the system; n is the number of states of the system, assumed (for now) to be equally likely
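
A minimal sketch of that relationship in Python (the function name is mine, not from the slides):

```python
import math

def entropy_equally_likely(n):
    """Entropy in bits of a system with n equally likely states:
    the number of yes/no questions needed to pin down the state."""
    return math.log2(n)

# 2 states -> 1 question, 4 -> 2, 8 -> 3, 16 -> 4
for n in (2, 4, 8, 16):
    print(n, "states:", entropy_equally_likely(n), "questions")
```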

17

18 Consider Dice

19 The Six-Sided Die: H = log2(6) = 2.585 bits

20 The Four-Sided Die: H = log2(4) = 2.000 bits

21 The Twenty-Sided Die: H = log2(20) = 4.322 bits

22 What about all three dice? H = log2(4 × 6 × 20)

23 What about all three dice? H = log2(4) + log2(6) + log2(20)

24 What about all three dice? H = 8.907 bits

25 What about all three dice? Entropy, from independent elements of a system, adds
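
A quick numerical check of that additivity claim for the three dice above (a sketch, using only the slides' numbers):

```python
import math

h4, h6, h20 = math.log2(4), math.log2(6), math.log2(20)
print(h4 + h6 + h20)          # ~8.907 bits: the three entropies summed
print(math.log2(4 * 6 * 20))  # same value: log2 of the 480 joint states
```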

26 Let's rewrite this a bit... Trivial Fact 1: log2(x) = -log2(1/x)

27 Trivial Fact 1: log2(x) = -log2(1/x). Trivial Fact 2: if there are n equally likely possibilities, p = 1/n

28 Trivial Fact 2: if there are n equally likely possibilities, p = 1/n, so H = log2(n) = -log2(1/n) = -log2(p)

29

30 What if the n states are not equally probable? Maybe we should use the expected value of the entropies, a weighted average by probability: H = -Σ p_i log2(p_i)
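
A hedged sketch of that weighted average (the helper name is mine):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the probability-weighted average of -log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit    (fair coin)
print(entropy([0.9, 0.1]))   # ~0.47 bits (biased coin: easier to guess)
print(entropy([1/6] * 6))    # ~2.585 bits (fair six-sided die, matching slide 19)
```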

31 Let's do a simple example: n = 2. How does H change as we vary p1 and p2?

32 n = 2, with p1 + p2 = 1 (H is maximized at p1 = p2 = 1/2)

33 How about n = 3? n = 3, with p1 + p2 + p3 = 1

34 The bottom line intuitions for Entropy:
Entropy is a statistic for describing a probability distribution.
Probability distributions which are flat, broad, sparse, etc. have HIGH entropy.
Probability distributions which are peaked, sharp, narrow, compact, etc. have LOW entropy.
Entropy adds for independent elements of a system, thus entropy grows with the dimensionality of the probability distribution.
Entropy is zero IFF the system is in a definite state, i.e. p = 1 somewhere and 0 everywhere else.
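
These intuitions can be checked numerically with the entropy helper sketched above (the distributions below are my own illustrative picks):

```python
flat    = [0.25, 0.25, 0.25, 0.25]   # broad/flat -> HIGH entropy (2.0 bits)
peaked  = [0.85, 0.05, 0.05, 0.05]   # sharp/peaked -> LOW entropy (~0.85 bits)
certain = [1.0, 0.0, 0.0, 0.0]       # definite state -> zero entropy

for dist in (flat, peaked, certain):
    print(dist, "->", entropy(dist), "bits")
```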

35 Pop Quiz: 1. 2. 3. 4.

36 Entropy The (average) number of yes/no questions needed to completely specify the state of a system

37 At 11:16 am (Pacific) on June 29th of the year 2001, there were approximately 816,119 words in the English language. H(English) = 19.6 bits. Twenty Questions: 2^20 = 1,048,576. What's a winning 20 Questions strategy?
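
The 19.6-bit figure is just log2 of the word count; a small sketch of the arithmetic, with the usual halving strategy noted as a comment:

```python
import math

print(math.log2(816_119))    # ~19.6 bits needed to isolate one English word
print(math.log2(1_048_576))  # 20.0 bits: what twenty ideal yes/no questions resolve

# A winning strategy asks questions that cut the remaining candidates roughly in
# half each time, so every answer is worth close to 1 bit.
```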

38

39 So, what is information? It’s a change in what you don’t know. It’s a change in the entropy.

40 Information as a measure of correlation: two coins, X and Y

41 (figure: the two coins, X and Y)

42 P(Y): heads 1/2, tails 1/2, so H(Y) = 1 bit. P(Y|x=heads): heads 1/2, tails 1/2, so H(Y|x=heads) = 1 bit. I(X;Y) = H(Y) - H(Y|X) = 0 bits.

43 Information as a measure of correlation: two coins, X and Y

44 (figure: the two coins, X and Y)

45 P(Y): heads 1/2, tails 1/2, so H(Y) = 1 bit. P(Y|x=heads): nearly all of the probability on one outcome, so H(Y|x=heads) ≈ 0 bits. I(X;Y) = H(Y) - H(Y|X) ≈ 1 bit.
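
A sketch of both coin scenarios, computing I(X;Y) = H(Y) - H(Y|X) from joint distributions I made up for illustration (the "correlated" coins agree 95% of the time; the slide only says the information is roughly 1 bit):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(Y) - H(Y|X) for a joint distribution {(x, y): p}."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    h_y = entropy([sum(joint[x, y] for x in xs) for y in ys])
    h_y_given_x = 0.0
    for x in xs:
        p_x = sum(joint[x, y] for y in ys)
        h_y_given_x += p_x * entropy([joint[x, y] / p_x for y in ys])
    return h_y - h_y_given_x

# Two fair coins flipped independently: knowing X says nothing about Y
independent = {("H", "H"): 0.25, ("H", "T"): 0.25, ("T", "H"): 0.25, ("T", "T"): 0.25}
# Two coins that almost always land the same way
correlated = {("H", "H"): 0.475, ("H", "T"): 0.025, ("T", "H"): 0.025, ("T", "T"): 0.475}

print(mutual_information(independent))  # 0.0 bits
print(mutual_information(correlated))   # ~0.71 bits, approaching 1 as the coupling tightens
```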

46 Information Theory in Neuroscience

47 The Critical Observation: Information is mutual. I(X;Y) = I(Y;X), i.e. H(Y) - H(Y|X) = H(X) - H(X|Y)

48 The Critical Observation: What a spike tells the brain about the stimulus is the same as what our stimulus choice tells us about the likelihood of a spike. I(Stimulus;Spike) = I(Spike;Stimulus)

49 The Critical Observation: What our stimulus choice tells us about the likelihood of a spike (stimulus → response). This, we can measure...

50 How to use Information Theory:
Show your system stimuli. Measure neural responses.
Estimate: P( neural response | stimulus presented )
From that, estimate: P( neural response )
Compute: H(neural response) and H(neural response | stimulus presented)
Calculate: I(response ; stimulus)
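
A minimal sketch of that recipe, assuming the data arrive as (stimulus presented, neural response) pairs; this is the naive plug-in estimator, which is exactly where the "not enough data" pitfall on the next slide bites:

```python
import math
from collections import Counter

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information(trials):
    """Estimate I(response; stimulus) in bits from (stimulus, response) pairs."""
    n = len(trials)
    # Estimate P(neural response) and H(neural response)
    h_resp = entropy([c / n for c in Counter(r for _, r in trials).values()])
    # Estimate P(neural response | stimulus presented) and the conditional entropy
    h_cond = 0.0
    for stim, n_s in Counter(s for s, _ in trials).items():
        cond = [c / n_s for c in Counter(r for s, r in trials if s == stim).values()]
        h_cond += (n_s / n) * entropy(cond)
    return h_resp - h_cond  # I(response; stimulus)

# Toy usage: responses that track the stimulus only imperfectly
trials = [("A", 1), ("A", 1), ("A", 0), ("B", 0), ("B", 0), ("B", 1)] * 50
print(information(trials))  # ~0.08 bits
```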

51 How to screw it up:
Choose stimuli which are not representative.
Measure the "wrong" aspect of the response.
Don't take enough data to estimate P( ) well.
Use a crappy method of computing H( ).
Calculate I( ) and report it without comparing it to anything...

52 Here's an example of Information Theory applied appropriately: Temporal Coding of Visual Information in the Thalamus. Pamela Reinagel and R. Clay Reid, J. Neurosci. 20(14):5392-5400 (2000).

53 LGN responses are very reliable. Is there information in the temporal pattern of spikes?

54 Patterns of Spikes in the LGN: the spike train binned into 1-bit words, e.g. ...0..., ...1...

55 Patterns of Spikes in the LGN: 2-bit words, e.g. ...00..., ...10..., ...01..., ...11...

56 Patterns of Spikes in the LGN: 3-bit words, e.g. ...000..., ...101..., ...011..., ...100...

57 Patterns of Spikes in the LGN: 6-bit words, e.g. ...000100..., ...101101..., ...011110..., ...010001...

58 P( spike pattern)

59 P( spike pattern | stimulus )
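
A rough sketch of how those pattern distributions can be estimated, assuming the spike train has already been discretized into 0/1 time bins (bin size and word length here are free choices, not values from the paper):

```python
from collections import Counter

def pattern_distribution(binned_spikes, word_length):
    """Slide a window over a binned spike train and estimate P(spike pattern)."""
    words = [tuple(binned_spikes[i:i + word_length])
             for i in range(len(binned_spikes) - word_length + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# P(spike pattern | stimulus) is the same estimate restricted to the responses
# recorded while one particular stimulus was shown.
train = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1]
print(pattern_distribution(train, word_length=3))
```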

60 There is some extra Information in Temporal Patterns of spikes.

61 Claude Shannon Ph.D. 1916-2001

62 Prof. Tom Cover EE376A & B

63

