Today: Entropy and Information Theory
Claude Shannon Ph.D. 1916-2001
Entropy
A measure of the disorder in a system
Entropy: the (average) number of yes/no questions needed to completely specify the state of a system.
What if there were two coins?
2 states: 1 question. 4 states: 2 questions. 8 states: 3 questions. 16 states: 4 questions.
number of states = 2^(number of yes-no questions)
number of states = 2^(number of yes-no questions)
log2(number of states) = number of yes-no questions
H = log2(n), where H is the entropy, the number of yes-no questions required to specify the state of the system, and n is the number of states of the system, assumed (for now) to be equally likely.
Consider Dice
The Six-Sided Die: H = log2(6) = 2.585 bits
The Four-Sided Die: H = log2(4) = 2.000 bits
The Twenty-Sided Die: H = log2(20) = 4.322 bits
What about all three dice? H = log2(4 × 6 × 20)
What about all three dice? H = log2(4) + log2(6) + log2(20)
What about all three dice? H = 8.907 bits
What about all three dice? Entropy, from independent elements of a system, adds
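(Not from the slides: a quick numerical check of that additivity claim, using H = log2(number of states) for each die.)

```python
# A minimal sketch: entropies of independent dice add.
import math

dice_sides = [4, 6, 20]                       # the three dice from the slides

individual = [math.log2(n) for n in dice_sides]
print(individual)                             # roughly [2.0, 2.585, 4.322] bits

# Joint system: every combination of the three faces is equally likely.
joint_states = 1
for n in dice_sides:
    joint_states *= n
print(math.log2(joint_states))                # 8.907 bits
print(sum(individual))                        # same 8.907 bits: entropy adds
```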
Let's rewrite this a bit...
Trivial Fact 1: log2(x) = -log2(1/x)
Trivial Fact 2: if there are n equally likely possibilities, p = 1/n
So H = log2(n) = -log2(1/n) = -log2(p)
What if the n states are not equally probable? Maybe we should use the expected value of the entropies, a weighted average by probability:
H = -Σ pi log2(pi), summed over the n states i.
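(A minimal sketch, not from the slides, of that weighted-average formula; the helper name entropy is my own.)

```python
# Sketch of the general entropy formula H = -sum_i p_i * log2(p_i).
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)   # treat 0*log(0) as 0

print(entropy([0.5, 0.5]))        # n = 2, p1 = p2 = 1/2  -> 1.0 bit
print(entropy([0.9, 0.1]))        # skewed two-state case -> about 0.47 bits
print(entropy([1/6] * 6))         # fair six-sided die    -> 2.585 bits
print(entropy([1.0, 0.0, 0.0]))   # definite state        -> 0 bits
```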
Let's do a simple example: n = 2, how does H change as we vary p1 and p2?
n = 2, p1 + p2 = 1, so H = -p1 log2(p1) - (1 - p1) log2(1 - p1), which peaks at 1 bit when p1 = p2 = 1/2.
How about n = 3? p1 + p2 + p3 = 1
The bottom line intuitions for Entropy:
Entropy is a statistic for describing a probability distribution.
Probability distributions which are flat, broad, sparse, etc. have HIGH entropy.
Probability distributions which are peaked, sharp, narrow, compact, etc. have LOW entropy.
Entropy adds for independent elements of a system, thus entropy grows with the dimensionality of the probability distribution.
Entropy is zero IFF the system is in a definite state, i.e. p = 1 somewhere and 0 everywhere else.
Pop Quiz
Entropy: the (average) number of yes/no questions needed to completely specify the state of a system.
At 11:16 am (Pacific) on June 29th, 2001, there were approximately 816,119 words in the English language. H(English) = log2(816,119) ≈ 19.6 bits. Twenty Questions: 2^20 = 1,048,576. What's a winning 20 Questions strategy?
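(One winning strategy is to halve the remaining candidates with every question. A minimal sketch, assuming a sorted word list and yes/no answers of the form "does the word come at or before X alphabetically?"; the tiny list is a made-up stand-in for the full lexicon.)

```python
# Sketch of a halving (binary-search) strategy for 20 Questions over a word list.
import math

words = sorted(["apple", "banana", "cherry", "date", "elderberry",
                "fig", "grape", "honeydew"])

def guess(is_at_or_before):
    """is_at_or_before(w) answers: 'is the secret word <= w alphabetically?'"""
    lo, hi = 0, len(words) - 1
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1
        if is_at_or_before(words[mid]):   # each answer halves the candidates
            hi = mid
        else:
            lo = mid + 1
    return words[lo], questions

secret = "fig"
word, q = guess(lambda w: secret <= w)
print(word, q)                           # 'fig', found in ceil(log2(8)) = 3 questions
print(math.ceil(math.log2(816_119)))     # 20 questions suffice for the whole lexicon
```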
So, what is information? It’s a change in what you don’t know. It’s a change in the entropy.
Information as a measure of correlation
P(Y): heads 1/2, tails 1/2, so H(Y) = 1 bit.
P(Y | x = heads): still heads 1/2, tails 1/2, so H(Y | x = heads) = 1 bit.
I(X;Y) = H(Y) - H(Y|X) = 0 bits.
Information as a measure of correlation
P(Y): heads 1/2, tails 1/2, so H(Y) = 1 bit.
P(Y | x = heads): nearly all the probability on one outcome, so H(Y | x = heads) ≈ 0 bits.
I(X;Y) = H(Y) - H(Y|X) ≈ 1 bit.
Information Theory in Neuroscience
The Critical Observation: Information is Mutual
I(X;Y) = I(Y;X)
H(Y) - H(Y|X) = H(X) - H(X|Y)
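(A small sketch, not from the slides, that computes I(X;Y) from a joint distribution of two coins and checks the symmetry numerically; the joint tables are illustrative.)

```python
# Sketch: mutual information from a joint distribution, computed as H(Y) - H(Y|X).
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """joint[x][y] = P(X=x, Y=y); returns H(Y) - H(Y|X) in bits."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    H_Y_given_X = sum(px[i] * H([p / px[i] for p in joint[i]])
                      for i in range(len(joint)) if px[i] > 0)
    return H(py) - H_Y_given_X

independent = [[0.25, 0.25], [0.25, 0.25]]   # second coin ignores the first
copied      = [[0.5,  0.0 ], [0.0,  0.5 ]]   # second coin copies the first
noisy       = [[0.4,  0.1 ], [0.0,  0.5 ]]   # an asymmetric, illustrative joint

print(mutual_information(independent))                     # 0.0 bits
print(mutual_information(copied))                           # 1.0 bit
print(mutual_information(noisy))                            # about 0.61 bits
print(mutual_information([list(c) for c in zip(*noisy)]))   # same value with X and Y swapped
```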
The Critical Observation: What a spike tells the brain about the stimulus is the same as what our stimulus choice tells us about the likelihood of a spike. I(Stimulus;Spike) = I(Spike;Stimulus)
The Critical Observation: What our stimulus choice tells us about the likelihood of a spike (stimulus → response). This, we can measure...
How to use Information Theory:
Show your system stimuli. Measure neural responses.
Estimate: P( neural response | stimulus presented )
From that, estimate: P( neural response )
Compute: H(neural response) and H(neural response | stimulus presented)
Calculate: I(response ; stimulus)
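(A minimal sketch of that recipe, assuming the experiment is summarized as a list of (stimulus, response) trials; the names and toy data are illustrative, not from the slides.)

```python
# Sketch: estimate P(response | stimulus) and P(response) from trial counts,
# then compute H(response), H(response | stimulus), and their difference I.
import math
from collections import Counter

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Each trial: (stimulus presented, neural response observed), e.g. a spike count.
trials = [("grating", 2), ("grating", 2), ("grating", 1), ("grating", 2),
          ("blank",   0), ("blank",   0), ("blank",   1), ("blank",   0)]

n = len(trials)
p_stim = {s: c / n for s, c in Counter(s for s, _ in trials).items()}
p_resp = {r: c / n for r, c in Counter(r for _, r in trials).items()}

H_resp = H(p_resp.values())                       # H(neural response)

# H(neural response | stimulus): probability-weighted average of the
# conditional entropies, one per stimulus.
H_resp_given_stim = 0.0
for s, ps in p_stim.items():
    resp_counts = Counter(r for stim, r in trials if stim == s)
    total = sum(resp_counts.values())
    H_resp_given_stim += ps * H([c / total for c in resp_counts.values()])

print(H_resp - H_resp_given_stim)                 # I(response ; stimulus), in bits
```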
How to screw it up:
Choose stimuli which are not representative.
Measure the "wrong" aspect of the response.
Don't take enough data to estimate P( ) well.
Use a crappy method of computing H( ).
Calculate I( ) and report it without comparing it to anything...
Here's an example of Information Theory applied appropriately:
Temporal Coding of Visual Information in the Thalamus
Pamela Reinagel and R. Clay Reid, J. Neurosci. 20(14):5392-5400 (2000)
LGN responses are very reliable. Is there information in the temporal pattern of spikes?
Patterns of Spikes in the LGN: represent the spike train as a string of time bins, 1 for a spike and 0 for no spike, and group the bins into words.
1-bin words: …0…, …1…
2-bin words: …00…, …10…, …01…, …11…
3-bin words: …000…, …101…, …011…, …100…
6-bin words: …000100…, …101101…, …011110…, …010001…
P( spike pattern)
P( spike pattern | stimulus )
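(A minimal sketch of how such a pattern distribution might be estimated from a binned spike train; the word length and toy spike train are illustrative choices, not Reinagel & Reid's actual parameters.)

```python
# Sketch: carve a binary spike train into fixed-length "words", estimate
# P(spike pattern) from the counts, and compute the entropy of that distribution.
import math
from collections import Counter

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

spike_train = "0001001011010010001101001011"   # 1 = spike in that time bin (toy data)
word_len = 3

words = [spike_train[i:i + word_len]
         for i in range(0, len(spike_train) - word_len + 1, word_len)]
counts = Counter(words)
total = sum(counts.values())

print(counts)                                   # estimated P(spike pattern), as counts
print(H([c / total for c in counts.values()]))  # entropy of the word distribution

# Repeating the estimate separately for each stimulus gives P(spike pattern | stimulus);
# the drop in entropy is the information carried by temporal patterns of spikes.
```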
There is some extra information in temporal patterns of spikes.
Claude Shannon Ph.D. 1916-2001
Prof. Tom Cover EE376A & B