Today: Entropy & Information Theory
Claude Shannon, Ph.D.
Entropy
A measure of the disorder in a system
Entropy: The (average) number of yes/no questions needed to completely specify the state of a system
What if there were two coins?
2 states: 1 question. 4 states: 2 questions. 8 states: 3 questions. 16 states: 4 questions.
number of states = 2^(number of yes/no questions)
log2(number of states) = number of yes/no questions
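A quick numerical check of the relation above (not from the slides, just the arithmetic):

```python
import math

# Each yes/no question halves the set of possible states,
# so n equally likely states need log2(n) questions.
for n in [2, 4, 8, 16]:
    print(n, "states ->", int(math.log2(n)), "questions")
```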
H = log2(n), where H is the entropy, the number of yes/no questions required to specify the state of the system, and n is the number of states of the system, assumed (for now) to be equally likely
Consider Dice
The Six-Sided Die: H = log2(6) ≈ 2.58 bits
The Four-Sided Die: H = log2(4) = 2 bits
The Twenty-Sided Die: H = log2(20) ≈ 4.32 bits
What about all three dice? H = log2(4 · 6 · 20)
What about all three dice? H = log2(4) + log2(6) + log2(20)
What about all three dice? H = log2(480) ≈ 8.91 bits
What about all three dice? Entropy, from independent elements of a system, adds
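The additivity claim is just the product rule for logarithms; a one-line check:

```python
import math

# Entropy of the joint roll of a 4-, 6-, and 20-sided die.
h_joint = math.log2(4 * 6 * 20)                          # log2(480)
h_sum = math.log2(4) + math.log2(6) + math.log2(20)      # entropies add
print(h_joint)  # ≈ 8.907 bits
assert abs(h_joint - h_sum) < 1e-12
```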
Let’s rewrite this a bit...
Trivial Fact 1: log2(x) = -log2(1/x)
Trivial Fact 2: if there are n equally likely possibilities, p = 1/n
So H = log2(n) = -log2(1/n) = -log2(p)
What if the n states are not equally probable? Maybe we should use the expected value of the entropies, a weighted average by probability: H = -Σ_i p_i log2(p_i)
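That weighted average, H = -Σ p_i log2(p_i), is the Shannon entropy. A minimal sketch in Python:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum_i p_i * log2(p_i).
    Terms with p = 0 contribute nothing, so they are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25] * 4))         # 2.0 bits: four equally likely states
print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits: a weighted average of question counts
```

Note that for equal probabilities this reduces to the earlier H = log2(n).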
Let’s do a simple example: n = 2. How does H change as we vary p1 and p2?
n = 2, p1 + p2 = 1
How about n = 3? n = 3, p1 + p2 + p3 = 1
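Sweeping p1 for the two-state case shows the shape of the curve: H peaks at 1 bit when the outcomes are equally likely and falls to 0 as one outcome becomes certain. A small sketch:

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# For n = 2, p2 = 1 - p1: H is maximal at p1 = 0.5, zero at p1 = 0 or 1.
for p1 in [0.01, 0.1, 0.25, 0.5, 0.75, 0.99]:
    print(f"p1 = {p1:.2f}  H = {entropy([p1, 1 - p1]):.3f} bits")
```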
The bottom-line intuitions for Entropy:
Entropy is a statistic for describing a probability distribution.
Probability distributions which are flat, broad, sparse, etc. have HIGH entropy.
Probability distributions which are peaked, sharp, narrow, compact, etc. have LOW entropy.
Entropy adds for independent elements of a system, thus entropy grows with the dimensionality of the probability distribution.
Entropy is zero IFF the system is in a definite state, i.e. p = 1 somewhere and 0 everywhere else.
Pop Quiz:
Entropy: The (average) number of yes/no questions needed to completely specify the state of a system
At 11:16 am (Pacific) on June 29th of the year 2001, there were approximately 816,119 words in the English language. H(English) = log2(816,119) ≈ 19.6 bits. Twenty Questions: 2^20 = 1,048,576. What’s a winning 20 Questions strategy?
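The winning strategy is the one behind the log: ask questions that split the remaining candidates in half, so 20 questions cover 2^20 > 816,119 words. A sketch of that halving strategy as a binary search (the word list and predicate here are placeholders, not from the slides):

```python
import math

N_WORDS = 816_119
print(math.log2(N_WORDS))            # ≈ 19.6 bits of entropy
print(math.ceil(math.log2(N_WORDS))) # 20 ideal questions suffice

def twenty_questions(candidates, is_at_or_before):
    """Binary search over a sorted candidate list: each yes/no
    question ('Is the secret word <= this one?') halves the field."""
    lo, hi, asked = 0, len(candidates) - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        asked += 1
        if is_at_or_before(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo], asked

# Toy usage: find the secret number 11 among 16 candidates in 4 questions.
word, asked = twenty_questions(list(range(16)), lambda c: 11 <= c)
print(word, asked)
```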
So, what is information? It’s a change in what you don’t know. It’s a change in the entropy.
Information as a measure of correlation
[Figure: two independent coin flips X and Y. P(Y) is uniform (heads 1/2, tails 1/2), and P(Y | x = heads) is still uniform.]
H(Y) = 1, H(Y | x = heads) = 1
I(X;Y) = H(Y) - H(Y|X) = 0 bits
Information as a measure of correlation
[Figure: two correlated coin flips X and Y. P(Y) is uniform (heads 1/2, tails 1/2), but P(Y | x = heads) is concentrated on one outcome.]
H(Y) = 1, H(Y | x = heads) ≈ 0
I(X;Y) = H(Y) - H(Y|X) ≈ 1 bit
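Both coin examples can be checked numerically. This sketch uses the equivalent identity I(X;Y) = H(X) + H(Y) - H(X,Y), which equals the H(Y) - H(Y|X) form above:

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_info(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    pxy = [p for row in joint for p in row]
    return entropy(px) + entropy(py) - entropy(pxy)

independent = [[0.25, 0.25],  # two fair coins flipped separately
               [0.25, 0.25]]
correlated = [[0.5, 0.0],     # second coin always copies the first
              [0.0, 0.5]]
print(mutual_info(independent))  # 0.0 bits
print(mutual_info(correlated))   # 1.0 bits
```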
Information Theory in Neuroscience
The Critical Observation: Information is Mutual. I(X;Y) = I(Y;X), i.e. H(Y) - H(Y|X) = H(X) - H(X|Y)
The Critical Observation: What a spike tells the brain about the stimulus is the same as what our stimulus choice tells us about the likelihood of a spike. I(Stimulus; Spike) = I(Spike; Stimulus)
The Critical Observation: What our stimulus choice tells us about the likelihood of a spike. [Figure: stimulus → response] This, we can measure...
How to use Information Theory:
Show your system stimuli. Measure neural responses.
Estimate: P( neural response | stimulus presented )
From that, estimate: P( neural response )
Compute: H(neural response) and H(neural response | stimulus presented)
Calculate: I(response ; stimulus)
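The recipe above can be sketched end-to-end with plug-in (count-based) probability estimates. The trial data here are made up for illustration; real experiments need far more data for these estimates to be trustworthy, as the next slide warns:

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical trials: (stimulus presented, binned spike count observed).
trials = [("A", 0), ("A", 0), ("A", 1), ("A", 0),
          ("B", 2), ("B", 1), ("B", 2), ("B", 2)]
n = len(trials)

# Estimate P(response) from counts, then H(response).
p_resp = [c / n for c in Counter(r for _, r in trials).values()]
h_resp = entropy(p_resp)

# Estimate P(response | stimulus), then H(response | stimulus)
# as the probability-weighted average over stimuli.
by_stim = {}
for s, r in trials:
    by_stim.setdefault(s, []).append(r)
h_cond = 0.0
for s, resps in by_stim.items():
    p_s = len(resps) / n
    p_r_given_s = [c / len(resps) for c in Counter(resps).values()]
    h_cond += p_s * entropy(p_r_given_s)

info = h_resp - h_cond  # I(response; stimulus)
print(f"H(resp) = {h_resp:.3f}, H(resp|stim) = {h_cond:.3f}, I = {info:.3f} bits")
```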
How to screw it up:
Choose stimuli which are not representative.
Measure the “wrong” aspect of the response.
Don’t take enough data to estimate P( ) well.
Use a crappy method of computing H( ).
Calculate I( ) and report it without comparing it to anything...
Here’s an example of Information Theory applied appropriately: “Temporal Coding of Visual Information in the Thalamus,” Pamela Reinagel and R. Clay Reid, J. Neurosci. 20(14) (2000)
LGN responses are very reliable. Is there information in the temporal pattern of spikes?
Patterns of Spikes in the LGN
[Figure: a spike train discretized into time bins; each bin holds a 1 (spike) or a 0 (no spike).]
1-bin words: …0…, …1…
2-bin words: …00…, …10…, …01…, …11…
3-bin words: …000…, …101…, …011…, …100…
Longer words: every binary pattern of spikes across more bins.
P( spike pattern)
P( spike pattern | stimulus )
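Estimating P(spike pattern) amounts to counting binary words in the binned spike train. A minimal sketch on a toy train (the data here are invented, not from the paper):

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A toy binned spike train: 1 = spike in that bin, 0 = no spike.
train = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0]

def word_entropy(train, length):
    """Entropy of overlapping binary words of the given length."""
    words = [tuple(train[i:i + length]) for i in range(len(train) - length + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    return entropy([c / total for c in counts.values()])

for L in (1, 2, 3):
    print(f"{L}-bin words: H = {word_entropy(train, L):.3f} bits")
```

Repeating the same counts conditioned on the stimulus gives P(spike pattern | stimulus), and the difference of the two entropies is the information carried by the patterns.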
There is some extra information in temporal patterns of spikes.
For more: Prof. Tom Cover, EE376A & B