Presentation transcript: "Today: Entropy Information Theory. Claude Shannon Ph.D. 1916-2001"

1 Today: Entropy Information Theory

2

3 Claude Shannon Ph.D. 1916-2001

4

5

6 Entropy

7 A measure of the disorder in a system

8 Entropy The (average) number of yes/no questions needed to completely specify the state of a system

9

10 What if there were two coins?

11

12

13

14 2 states: 1 question. 4 states: 2 questions. 8 states: 3 questions. 16 states: 4 questions. number of states = 2^(number of yes-no questions)

15 number of states = 2^(number of yes-no questions), so log2(number of states) = number of yes-no questions

16 H = log2(n). H is entropy, the number of yes-no questions required to specify the state of the system; n is the number of states of the system, assumed (for now) to be equally likely
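
A minimal sketch of that relationship in Python (the function name is mine, not from the slides):

```python
import math

def entropy_equally_likely(n):
    """Entropy in bits of a system with n equally likely states:
    the number of yes/no questions needed to pin down the state."""
    return math.log2(n)

# 2 states -> 1 question, 4 -> 2, 8 -> 3, 16 -> 4
for n in (2, 4, 8, 16):
    print(n, "states:", entropy_equally_likely(n), "questions")
```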

17

18 Consider Dice

19 The Six-Sided Die: H = log2(6) = 2.585 bits

20 The Four-Sided Die: H = log2(4) = 2.000 bits

21 The Twenty-Sided Die: H = log2(20) = 4.322 bits

22 What about all three dice? H = log2(4 × 6 × 20)

23 What about all three dice? H = log2(4) + log2(6) + log2(20)

24 What about all three dice? H = 8.907 bits

25 What about all three dice? Entropy, from independent elements of a system, adds
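
A quick numerical check of that additivity claim for the three dice above (a sketch, using only the slides' numbers):

```python
import math

h4, h6, h20 = math.log2(4), math.log2(6), math.log2(20)
print(h4 + h6 + h20)          # ~8.907 bits: the three entropies summed
print(math.log2(4 * 6 * 20))  # same value: log2 of the 480 joint states
```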

26 Let's rewrite this a bit... Trivial Fact 1: log2(x) = -log2(1/x)

27 Trivial Fact 1: log2(x) = -log2(1/x). Trivial Fact 2: if there are n equally likely possibilities, p = 1/n

28 Trivial Fact 2: if there are n equally likely possibilities, p = 1/n, so H = log2(n) = -log2(1/n) = -log2(p)

29

30 What if the n states are not equally probable? Maybe we should use the expected value of the entropies, a weighted average by probability: H = -Σ p_i log2(p_i)
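
A hedged sketch of that weighted average (the helper name is mine):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the probability-weighted average of -log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit    (fair coin)
print(entropy([0.9, 0.1]))   # ~0.47 bits (biased coin: easier to guess)
print(entropy([1/6] * 6))    # ~2.585 bits (fair six-sided die, matching slide 19)
```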

31 Let's do a simple example: n = 2. How does H change as we vary p1 and p2?

32 n = 2, with p1 + p2 = 1 (H is maximized at p1 = p2 = 1/2)

33 How about n = 3? n = 3, with p1 + p2 + p3 = 1

34 The bottom line intuitions for Entropy:
Entropy is a statistic for describing a probability distribution.
Probability distributions which are flat, broad, sparse, etc. have HIGH entropy.
Probability distributions which are peaked, sharp, narrow, compact, etc. have LOW entropy.
Entropy adds for independent elements of a system, thus entropy grows with the dimensionality of the probability distribution.
Entropy is zero IFF the system is in a definite state, i.e. p = 1 somewhere and 0 everywhere else.
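
These intuitions can be checked numerically with the entropy helper sketched above (the distributions below are my own illustrative picks):

```python
flat    = [0.25, 0.25, 0.25, 0.25]   # broad/flat -> HIGH entropy (2.0 bits)
peaked  = [0.85, 0.05, 0.05, 0.05]   # sharp/peaked -> LOW entropy (~0.85 bits)
certain = [1.0, 0.0, 0.0, 0.0]       # definite state -> zero entropy

for dist in (flat, peaked, certain):
    print(dist, "->", entropy(dist), "bits")
```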

35 Pop Quiz: 1. 2. 3. 4.

36 Entropy The (average) number of yes/no questions needed to completely specify the state of a system

37 At 11:16 am (Pacific) on June 29th of the year 2001, there were approximately 816,119 words in the English language. H(English) = 19.6 bits. Twenty Questions: 2^20 = 1,048,576. What's a winning 20 Questions strategy?
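
The 19.6-bit figure is just log2 of the word count; a small sketch of the arithmetic, with the usual halving strategy noted as a comment:

```python
import math

print(math.log2(816_119))    # ~19.6 bits needed to isolate one English word
print(math.log2(1_048_576))  # 20.0 bits: what twenty ideal yes/no questions resolve

# A winning strategy asks questions that cut the remaining candidates roughly in
# half each time, so every answer is worth close to 1 bit.
```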

38

39 So, what is information? It’s a change in what you don’t know. It’s a change in the entropy.

40 Information as a measure of correlation: two coins, X and Y

41 (figure: the two coins, X and Y)

42 P(Y): heads 1/2, tails 1/2, so H(Y) = 1 bit. P(Y|x=heads): heads 1/2, tails 1/2, so H(Y|x=heads) = 1 bit. I(X;Y) = H(Y) - H(Y|X) = 0 bits.

43 Information as a measure of correlation: two coins, X and Y

44 (figure: the two coins, X and Y)

45 P(Y): heads 1/2, tails 1/2, so H(Y) = 1 bit. P(Y|x=heads): nearly all of the probability on one outcome, so H(Y|x=heads) ≈ 0 bits. I(X;Y) = H(Y) - H(Y|X) ≈ 1 bit.
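
A sketch of both coin scenarios, computing I(X;Y) = H(Y) - H(Y|X) from joint distributions I made up for illustration (the "correlated" coins agree 95% of the time; the slide only says the information is roughly 1 bit):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(Y) - H(Y|X) for a joint distribution {(x, y): p}."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    h_y = entropy([sum(joint[x, y] for x in xs) for y in ys])
    h_y_given_x = 0.0
    for x in xs:
        p_x = sum(joint[x, y] for y in ys)
        h_y_given_x += p_x * entropy([joint[x, y] / p_x for y in ys])
    return h_y - h_y_given_x

# Two fair coins flipped independently: knowing X says nothing about Y
independent = {("H", "H"): 0.25, ("H", "T"): 0.25, ("T", "H"): 0.25, ("T", "T"): 0.25}
# Two coins that almost always land the same way
correlated = {("H", "H"): 0.475, ("H", "T"): 0.025, ("T", "H"): 0.025, ("T", "T"): 0.475}

print(mutual_information(independent))  # 0.0 bits
print(mutual_information(correlated))   # ~0.71 bits, approaching 1 as the coupling tightens
```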

46 Information Theory in Neuroscience

47 The Critical Observation: Information is mutual. I(X;Y) = I(Y;X), i.e. H(Y) - H(Y|X) = H(X) - H(X|Y)

48 The Critical Observation: What a spike tells the brain about the stimulus is the same as what our stimulus choice tells us about the likelihood of a spike. I(Stimulus;Spike) = I(Spike;Stimulus)

49 The Critical Observation: What our stimulus choice tells us about the likelihood of a spike (stimulus → response). This, we can measure...

50 How to use Information Theory:
Show your system stimuli. Measure neural responses.
Estimate: P( neural response | stimulus presented )
From that, estimate: P( neural response )
Compute: H(neural response) and H(neural response | stimulus presented)
Calculate: I(response ; stimulus)
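
A minimal sketch of that recipe, assuming the data arrive as (stimulus presented, neural response) pairs; this is the naive plug-in estimator, which is exactly where the "not enough data" pitfall on the next slide bites:

```python
import math
from collections import Counter

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information(trials):
    """Estimate I(response; stimulus) in bits from (stimulus, response) pairs."""
    n = len(trials)
    # Estimate P(neural response) and H(neural response)
    h_resp = entropy([c / n for c in Counter(r for _, r in trials).values()])
    # Estimate P(neural response | stimulus presented) and the conditional entropy
    h_cond = 0.0
    for stim, n_s in Counter(s for s, _ in trials).items():
        cond = [c / n_s for c in Counter(r for s, r in trials if s == stim).values()]
        h_cond += (n_s / n) * entropy(cond)
    return h_resp - h_cond  # I(response; stimulus)

# Toy usage: responses that track the stimulus only imperfectly
trials = [("A", 1), ("A", 1), ("A", 0), ("B", 0), ("B", 0), ("B", 1)] * 50
print(information(trials))  # ~0.08 bits
```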

51 How to screw it up:
Choose stimuli which are not representative.
Measure the "wrong" aspect of the response.
Don't take enough data to estimate P( ) well.
Use a crappy method of computing H( ).
Calculate I( ) and report it without comparing it to anything...

52 Here's an example of Information Theory applied appropriately: Temporal Coding of Visual Information in the Thalamus. Pamela Reinagel and R. Clay Reid, J. Neurosci. 20(14):5392-5400 (2000).

53 LGN responses are very reliable. Is there information in the temporal pattern of spikes?

54 Patterns of Spikes in the LGN: the spike train binned into 1-bit words, e.g. ...0..., ...1...

55 Patterns of Spikes in the LGN: 2-bit words, e.g. ...00..., ...10..., ...01..., ...11...

56 Patterns of Spikes in the LGN: 3-bit words, e.g. ...000..., ...101..., ...011..., ...100...

57 Patterns of Spikes in the LGN: 6-bit words, e.g. ...000100..., ...101101..., ...011110..., ...010001...

58 P( spike pattern)

59 P( spike pattern | stimulus )
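
A rough sketch of how those pattern distributions can be estimated, assuming the spike train has already been discretized into 0/1 time bins (bin size and word length here are free choices, not values from the paper):

```python
from collections import Counter

def pattern_distribution(binned_spikes, word_length):
    """Slide a window over a binned spike train and estimate P(spike pattern)."""
    words = [tuple(binned_spikes[i:i + word_length])
             for i in range(len(binned_spikes) - word_length + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# P(spike pattern | stimulus) is the same estimate restricted to the responses
# recorded while one particular stimulus was shown.
train = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1]
print(pattern_distribution(train, word_length=3))
```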

60 There is some extra Information in Temporal Patterns of spikes.

61 Claude Shannon Ph.D. 1916-2001

62 Prof. Tom Cover EE376A & B

63

