Published by Fay Stewart. Modified over 9 years ago.
1
Recognition: stimulus (input) → observer (information transmission channel) → response. The response tells which category the stimulus belongs to. What is the "information value" of recognizing the category?
2
Information [figure: the area of possible locations reduced to 1/2, to 63/64 and to 1/64; regions marked "NOT HERE"]
3
The amount of information gained by receiving the signal is determined by the ratio of two areas: the prior (the space of possible signals) and the posterior (the space still possible after the signal is received). The less likely the outcome, the more information is gained: the information in a symbol s should grow as the probability p of that symbol decreases.
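A minimal Python sketch of the area argument, assuming the standard logarithmic measure (bits = log2 of prior area / posterior area) that the following slides develop. The function name `information_from_area` is hypothetical; the three area reductions are the ones from the figure, with the prior area normalized to 1.

```python
import math

def information_from_area(prior_area: float, posterior_area: float) -> float:
    """Bits gained when the space of possible signals shrinks from
    prior_area to posterior_area (log2 of the area ratio)."""
    return math.log2(prior_area / posterior_area)

# Area reductions from the figure, prior area normalized to 1
for posterior in (1 / 2, 1 / 64, 63 / 64):
    bits = information_from_area(1.0, posterior)
    print(f"area reduced to {posterior:.4f}: {bits:.4f} bits")
```

Halving the area gives 1 bit and shrinking it to 1/64 gives 6 bits, while learning that the target is "NOT HERE" in one of 64 cells (area 63/64) yields only a small fraction of a bit.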
4
Basics of Information Theory. Claude Elwood Shannon (1916-2001), who also built a juggling machine, rocket-powered Frisbees, motorized Pogo sticks, a device that could solve the Rubik's Cube puzzle, … Observe the output message and try to reconstruct the input message (gain new information).
5
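Measuring the information: the information in an event of probability p is $I(p) = \log \frac{1}{p} = -\log p$. Multiplication turns into addition: for independent events, $I(p_1 p_2) = I(p_1) + I(p_2)$. The information is never negative, since $p \le 1$ (and it is strictly positive when $p < 1$).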
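A short check of the additivity property, assuming base-2 logarithms so the result is in bits; `self_information` is a hypothetical helper name and the two event probabilities are illustrative.

```python
import math

def self_information(p: float) -> float:
    """Information (in bits) carried by an event of probability p, 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

p1, p2 = 0.5, 0.25          # two independent events (illustrative values)
joint = p1 * p2             # probability that both happen
# Multiplying probabilities corresponds to adding information
print(self_information(joint), self_information(p1) + self_information(p2))
```

Both printed values are 3 bits: 1 bit for the event of probability 1/2 plus 2 bits for the event of probability 1/4.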
7
1 bit of information reduces the area of possible messages by half. With base-2 logarithms, entropy is measured in bits. Information gained when deciding among N (equally likely) alternatives:

Number of stimulus alternatives N | Number of bits (log2 N)
2^1 = 2 | 1
2^2 = 4 | 2
2^3 = 8 | 3
2^4 = 16 | 4
2^5 = 32 | 5
2^6 = 64 | 6
2^7 = 128 | 7
2^8 = 256 | 8
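The table can be regenerated with a one-line formula; this sketch assumes the alternatives are equally likely, and `bits_for_alternatives` is a hypothetical name.

```python
import math

def bits_for_alternatives(n: int) -> float:
    """Information (bits) gained when deciding among n equally likely alternatives."""
    return math.log2(n)

for k in range(1, 9):
    n = 2 ** k
    print(f"N = {n:3d} -> {bits_for_alternatives(n):.0f} bits")
```

For N that is not a power of two the answer is fractional, e.g. 5 alternatives carry about 2.32 bits.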
8
Experiments with two possible outcomes with probabilities $p_1$ and $p_2$. The total probability must be 1, so $p_2 = 1 - p_1$ and
$H = -p_1 \log_2 p_1 - (1 - p_1) \log_2 (1 - p_1)$,
i.e. H = 0 for $p_1 = 0$ (the second outcome is certain) or $p_1 = 1$ (the first outcome is certain), using the convention $0 \log_2 0 = 0$.
For $p_1 = p_2 = 0.5$: $H = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = -\log_2 0.5 = 1$ bit.
Entropy H (information) is maximal when the outcome is the least predictable!
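A sketch of the two-outcome entropy formula above, evaluated over a few values of $p_1$; `binary_entropy` is a hypothetical name, and zero-probability terms are skipped to implement the $0 \log_2 0 = 0$ convention.

```python
import math

def binary_entropy(p1: float) -> float:
    """H = -p1*log2(p1) - (1-p1)*log2(1-p1), with 0*log2(0) taken as 0."""
    h = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0:
            h -= p * math.log2(p)
    return h

for p1 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p1 = {p1:.2f}: H = {binary_entropy(p1):.3f} bits")
```

The printed curve is symmetric around $p_1 = 0.5$, where it peaks at 1 bit, and drops to 0 at both certain outcomes.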
9
1st or 2nd half? With equal prior probability for each category, 3 binary digits (3 bits) are needed to describe 2^3 = 8 categories: each "1st or 2nd half?" question halves the remaining set. More bits are needed when dealing with symbols that are not all equally likely (5 bits).
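The "1st or 2nd half?" strategy can be simulated by repeatedly halving a range of category indices. This is a minimal sketch assuming equally likely categories numbered 0 to n-1; `halving_questions` is a hypothetical helper.

```python
import math

def halving_questions(n_categories: int, target: int) -> int:
    """Count '1st or 2nd half?' questions needed to locate `target`
    among categories 0..n_categories-1 by repeatedly halving the range."""
    lo, hi = 0, n_categories          # current candidate range [lo, hi)
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1                # ask: is the target in the 1st half?
        if target < mid:
            hi = mid
        else:
            lo = mid
    return questions

# With 8 equally likely categories, every target needs log2(8) = 3 questions
print([halving_questions(8, t) for t in range(8)], math.log2(8))
```

Each answer is one bit, so the number of questions equals log2 of the number of equally likely categories.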
10
The Bar Code
11
Information transfer through a communication channel: two-element (binary) channel.
Transmitter (source) p(X) → channel p(Y|X), with noise → receiver p(Y). The channel is described by the transition probabilities $p(y_1|x_1)$, $p(y_2|x_1)$, $p(y_1|x_2)$ and $p(y_2|x_2)$.
With no noise in the channel, $p(y_i|x_i) = 1$ and $p(y_j|x_i) = 0$ for $j \ne i$, so $p(y_1) = p(x_1)$ and $p(y_2) = p(x_2)$.
With noise, $p(y_j|x_i) \ne 0$ for $j \ne i$. Example with $p(x_1) = 0.8$, $p(x_2) = 0.2$, $p(y_1|x_1) = 5/8$, $p(y_2|x_1) = 3/8$, $p(y_1|x_2) = 1/4$, $p(y_2|x_2) = 3/4$:
$p(y_1) = (5/8 \times 0.8) + (1/4 \times 0.2) = 0.55$
$p(y_2) = (3/8 \times 0.8) + (3/4 \times 0.2) = 0.45$
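The output probabilities of the noisy binary channel follow from $p(y_k) = \sum_i p(x_i)\,p(y_k|x_i)$. A minimal sketch using the example numbers; the list layout (rows indexed by input, columns by output) is an assumption of this sketch.

```python
# Source probabilities p(x) and channel transition probabilities p(y|x)
p_x = [0.8, 0.2]
p_y_given_x = [
    [5 / 8, 3 / 8],   # p(y1|x1), p(y2|x1)
    [1 / 4, 3 / 4],   # p(y1|x2), p(y2|x2)
]

# p(y_k) = sum over i of p(x_i) * p(y_k | x_i)
p_y = [sum(p_x[i] * p_y_given_x[i][k] for i in range(len(p_x)))
       for k in range(len(p_y_given_x[0]))]
print([round(p, 4) for p in p_y])
```

Replacing `p_y_given_x` by the identity matrix reproduces the noiseless case, where `p_y` equals `p_x`.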
12
Binary channel: counting stimuli and responses

stimulus \ response | response 1 | response 2 | number of stimuli
stimulus 1 | N_11 | N_12 | N_stim 1
stimulus 2 | N_21 | N_22 | N_stim 2
number of responses | N_res 1 | N_res 2 | N (total number of stimuli, or responses)

$p(x_j) = N_{stim\,j} / N$
joint probability that both $x_j$ and $y_k$ happen: $p(x_j, y_k) = N_{jk} / N$
$p(x_j | y_k) = N_{jk} / N_{res\,k}$
$p(y_k) = N_{res\,k} / N$
13
Stimulus-Response Confusion Matrix (rows: called stimulus; columns: received response)

called \ received | y_1 | y_2 | … | y_n | total
x_1 | N_11 | N_12 | … | N_1n | N_stim 1
x_2 | N_21 | N_22 | … | N_2n | N_stim 2
… | … | … | … | … | …
x_n | N_n1 | N_n2 | … | N_nn | N_stim n
total | N_res 1 | N_res 2 | … | N_res n | N

number of j-th stimuli: $\sum_k N_{jk} = N_{stim\,j}$
number of k-th responses: $\sum_j N_{jk} = N_{res\,k}$
number of called stimuli = number of responses = $\sum_k N_{res\,k} = \sum_j N_{stim\,j} = N$
probability of the j-th symbol: $p(x_j) = N_{stim\,j} / N$
joint probability that both $x_j$ and $y_k$ happen: $p(x_j, y_k) = N_{jk} / N$
conditional probability that $x_j$ was sent when $y_k$ was received: $p(x_j | y_k) = N_{jk} / N_{res\,k}$
probability of the k-th response symbol: $p(y_k) = N_{res\,k} / N$
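The estimates above translate directly into code. This sketch assumes counts are given as a list of rows (`counts[j][k]` = $N_{jk}$) and uses `fractions.Fraction` so the probabilities stay exact; the function name and the 2 × 2 counts are hypothetical, for illustration only.

```python
from fractions import Fraction

def channel_probabilities(counts):
    """Estimate p(x_j), p(y_k), p(x_j, y_k) and p(x_j | y_k) from a confusion
    matrix counts[j][k] = N_jk (rows: called stimuli, columns: responses)."""
    n_stim = [sum(row) for row in counts]            # N_stim j
    n_res = [sum(col) for col in zip(*counts)]       # N_res k
    total = sum(n_stim)                              # N
    p_x = [Fraction(n, total) for n in n_stim]
    p_y = [Fraction(n, total) for n in n_res]
    p_xy = [[Fraction(n, total) for n in row] for row in counts]
    # Conditional p(x_j | y_k); a response that never occurred gets 0
    p_x_given_y = [[Fraction(n, n_res[k]) if n_res[k] else Fraction(0)
                    for k, n in enumerate(row)] for row in counts]
    return p_x, p_y, p_xy, p_x_given_y

counts = [[8, 2], [4, 6]]    # hypothetical counts, N = 20
p_x, p_y, p_xy, p_x_given_y = channel_probabilities(counts)
print(p_x, p_y, p_x_given_y[0][0])
```

Each column of `p_x_given_y` sums to 1, since it conditions on one received response.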
14
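For given input and output probabilities, the joint entropy reaches its maximum, $H_{max}(X,Y)$, when the input and the output are independent (the joint probabilities are the products of the individual probabilities, $p(x_j, y_k) = p(x_j)\,p(y_k)$). Then the output bears no relation to the input, i.e. there is no information transfer. The information transferred by the system is
$I(X;Y) = H_{max}(X,Y) - H(X,Y)$,
where $H(X,Y) = -\sum_{j,k} p(x_j, y_k) \log_2 p(x_j, y_k)$ is the joint entropy of the real system.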
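A minimal sketch of $I(X;Y) = H_{max}(X,Y) - H(X,Y)$, where $H_{max}$ is computed as the joint entropy of the matrix of products $p(x_j)\,p(y_k)$. The marginals used for the independent test case are illustrative assumptions, and both function names are hypothetical.

```python
import math

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute 0 (0*log2(0) = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def transferred_information(joint):
    """I(X;Y) = H_max(X,Y) - H(X,Y) for joint[j][k] = p(x_j, y_k)."""
    p_x = [sum(row) for row in joint]              # input probabilities
    p_y = [sum(col) for col in zip(*joint)]        # output probabilities
    # H_max: joint entropy of the matrix of independent events p(x_j) p(y_k)
    h_max = entropy(px * py for px in p_x for py in p_y)
    # H: joint entropy of the real system
    h_xy = entropy(p for row in joint for p in row)
    return h_max - h_xy

# Illustrative marginals; an independent system multiplies them cell by cell
p_in, p_out = [0.8, 0.2], [0.55, 0.45]
independent = [[px * py for py in p_out] for px in p_in]
print(math.isclose(transferred_information(independent), 0.0, abs_tol=1e-12))
```

For an independent system the two entropies coincide, so the printed check is `True`; any dependence between input and output makes $H(X,Y)$ smaller and $I(X;Y)$ positive.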
15
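Run the experiment 20 times and get it always RIGHT:

 | stim 1 | stim 2 | total
resp 1 | 10 | 0 | 10
resp 2 | 0 | 10 | 10
total | 10 | 10 | 20

Input probabilities: $p(x_1) = 0.5$, $p(x_2) = 0.5$. Output probabilities: $p(y_1) = 0.5$, $p(y_2) = 0.5$.
Joint probabilities $p(x_j, y_k)$: 0.5 for (stim 1, resp 1) and (stim 2, resp 2), 0 otherwise. Probabilities of independent events: $p(x_j)\,p(y_k) = 0.25$ for every cell.
Transferred information: $I(X;Y) = H_{max}(X,Y) - H(X,Y) = 2 - 1 = 1$ bit.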
16
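Run the experiment 20 times and get it always WRONG:

 | stim 1 | stim 2 | total
resp 1 | 0 | 10 | 10
resp 2 | 10 | 0 | 10
total | 10 | 10 | 20

Input probabilities: $p(x_1) = 0.5$, $p(x_2) = 0.5$. Output probabilities: $p(y_1) = 0.5$, $p(y_2) = 0.5$.
Joint probabilities $p(x_j, y_k)$: 0.5 for (stim 2, resp 1) and (stim 1, resp 2), 0 otherwise. Probabilities of independent events: $p(x_j)\,p(y_k) = 0.25$ for every cell.
Transferred information: $I(X;Y) = H_{max}(X,Y) - H(X,Y) = 2 - 1 = 1$ bit (the stimulus can still be recovered exactly from the response, which is simply inverted).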
17
Run the experiment 20 times and get it 10 times right and 10 times wrong:

 | stim 1 | stim 2 | total
resp 1 | 5 | 5 | 10
resp 2 | 5 | 5 | 10
total | 10 | 10 | 20

Input probabilities: $p(x_1) = 0.5$, $p(x_2) = 0.5$. Output probabilities: $p(y_1) = 0.5$, $p(y_2) = 0.5$.
Joint probabilities $p(x_j, y_k) = 0.25$ for every cell, identical to the probabilities of independent events $p(x_j)\,p(y_k) = 0.25$.
Transferred information: $I(X;Y) = H_{max}(X,Y) - H(X,Y) = 2 - 2 = 0$ bits.
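The three 20-trial experiments can be checked by computing $H_{max}$ and $H$ directly from the count tables. This sketch is self-contained (it repeats a small entropy helper) and its function names are hypothetical; it keeps the slides' orientation (rows: responses, columns: stimuli), which does not affect the result because $I(X;Y)$ is symmetric in X and Y.

```python
import math

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def transferred_information(counts):
    """I(X;Y) = H_max(X,Y) - H(X,Y) computed from a table of counts."""
    total = sum(map(sum, counts))
    joint = [[n / total for n in row] for row in counts]
    p_rows = [sum(row) for row in joint]
    p_cols = [sum(col) for col in zip(*joint)]
    h_max = entropy(pr * pc for pr in p_rows for pc in p_cols)
    h = entropy(p for row in joint for p in row)
    return h_max - h

experiments = {
    "always right": [[10, 0], [0, 10]],
    "always wrong": [[0, 10], [10, 0]],
    "half right":   [[5, 5], [5, 5]],
}
for name, counts in experiments.items():
    print(f"{name}: {transferred_information(counts):.3f} bits")
```

The results match the slides: 1 bit, 1 bit and 0 bits.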
18
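stimulus categories \ response categories | y_1 | y_2 | y_3 | y_4 | y_5 | number of stimuli
x_1 | 20 | 5 | 0 | 0 | 0 | 25
x_2 | 5 | 15 | 5 | 0 | 0 | 25
x_3 | 0 | 6 | 17 | 2 | 0 | 25
x_4 | 0 | 0 | 5 | 12 | 8 | 25
x_5 | 0 | 0 | 0 | 6 | 19 | 25
number of responses | 25 | 26 | 27 | 20 | 27 | 125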
19
Stimulus-response matrix (counts):

 | y_1 | y_2 | … | y_n
x_1 | N_11 | N_12 | … | N_1n
x_2 | N_21 | N_22 | … | N_2n
… | … | … | … | …
x_n | N_n1 | N_n2 | … | N_nn

Matrix of joint probabilities (stimulus-response matrix divided by the total number of stimuli):

 | y_1 | y_2 | … | y_n
x_1 | p(x_1, y_1) | p(x_1, y_2) | … | p(x_1, y_n)
x_2 | p(x_2, y_1) | p(x_2, y_2) | … | p(x_2, y_n)
… | … | … | … | …
x_n | p(x_n, y_1) | p(x_n, y_2) | … | p(x_n, y_n)

Number of called stimuli = number of responses = N, and $p(x_i, y_j) = N_{ij} / N$.
20
Stimulus/response confusion matrix with marginal probabilities:

stimuli \ responses | y_1 | y_2 | y_3 | y_4 | y_5 | number of stimuli | probability of stimulus
x_1 | 20 | 5 | 0 | 0 | 0 | 25 | 25/125 = 0.2
x_2 | 5 | 15 | 5 | 0 | 0 | 25 | 25/125 = 0.2
x_3 | 0 | 6 | 17 | 2 | 0 | 25 | 25/125 = 0.2
x_4 | 0 | 0 | 5 | 12 | 8 | 25 | 25/125 = 0.2
x_5 | 0 | 0 | 0 | 6 | 19 | 25 | 25/125 = 0.2
number of responses | 25 | 26 | 27 | 20 | 27 | 125 |
probability of response | 25/125 = 0.2 | 26/125 = 0.208 | 27/125 = 0.216 | 20/125 = 0.16 | 27/125 = 0.216 | |
21
Matrix of joint probabilities $p(x_j, y_k)$; total number of stimuli (responses) N = 125:

 | y_1 | y_2 | y_3 | y_4 | y_5
x_1 | 20/125 = 0.16 | 5/125 = 0.04 | 0 | 0 | 0
x_2 | 5/125 = 0.04 | 15/125 = 0.12 | 5/125 = 0.04 | 0 | 0
x_3 | 0 | 6/125 = 0.048 | 17/125 = 0.136 | 2/125 = 0.016 | 0
x_4 | 0 | 0 | 5/125 = 0.04 | 12/125 = 0.096 | 8/125 = 0.064
x_5 | 0 | 0 | 0 | 6/125 = 0.048 | 19/125 = 0.152

Joint probability: $p(x_j, y_k) = N_{jk} / N$
22
When $x_i$ and $y_j$ are independent events (i.e. the output does not depend on the input), the joint probability is the product of the probabilities of these independent events, $p(x_i, y_j) = p(x_i)\,p(y_j)$, and the joint entropy of the system takes its maximum value $H_{max}(X,Y)$ for the given input and output probabilities. Such a system would be entirely useless for transmitting information, since its output would not depend on its input. Here every $p(x_i) = 0.2$, so each row of the matrix of independent events is 0.2 × (0.2, 0.208, 0.216, 0.16, 0.216) = (0.04, 0.0416, 0.0432, 0.032, 0.0432).
23
The information transmitted by the system is the difference between the maximum joint entropy $H_{max}(X,Y)$ of the matrix of independent events and the joint entropy $H(X,Y)$ of the real system (derived from the confusion matrix):
$I(X;Y) = H_{max}(X,Y) - H(X,Y) \approx 4.64 - 3.43 \approx 1.2$ bits
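A sketch that reproduces the 5 × 5 calculation from the raw counts of the confusion matrix: joint probabilities $N_{jk}/N$, the matrix of independent events $p(x_j)\,p(y_k)$, and the two joint entropies. Only the counts come from the example; the variable names are this sketch's own.

```python
import math

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Stimulus-response confusion matrix (rows x1..x5, columns y1..y5)
counts = [
    [20, 5, 0, 0, 0],
    [5, 15, 5, 0, 0],
    [0, 6, 17, 2, 0],
    [0, 0, 5, 12, 8],
    [0, 0, 0, 6, 19],
]
N = sum(map(sum, counts))                             # total number of stimuli
joint = [[n / N for n in row] for row in counts]      # p(x_j, y_k)
p_x = [sum(row) for row in joint]                     # stimulus probabilities
p_y = [sum(col) for col in zip(*joint)]               # response probabilities

h_max = entropy(px * py for px in p_x for py in p_y)  # independent events
h_xy = entropy(p for row in joint for p in row)       # real system
print(f"H_max = {h_max:.2f}, H = {h_xy:.2f}, I = {h_max - h_xy:.2f} bits")
```

This prints H_max = 4.64, H = 3.43 and I = 1.20 bits.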
24
Capacity of the human channel for one-dimensional stimuli
25
Magic number 7 ± 2 (roughly 2 to 3 bits) (George Miller, 1956)
26
Human perception seems to distinguish only about 7 (plus or minus 2) different entities along one perceptual dimension. To recognize more items:
- long training (musicians)
- use more than one perceptual dimension (e.g. pitch and loudness)
- chunk the items into larger chunks (phonemes to words, words to phrases, …)
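Converting 7 ± 2 categories into bits, under the assumption that the categories are equally likely so that the capacity is simply log2 of their number; `capacity_bits` is a hypothetical name.

```python
import math

def capacity_bits(n_categories: int) -> float:
    """Bits transmitted when n equally likely categories can be told apart."""
    return math.log2(n_categories)

for n in (5, 7, 9):   # 7 plus or minus 2
    print(f"{n} categories -> {capacity_bits(n):.2f} bits")
```

This gives about 2.32, 2.81 and 3.17 bits, so 7 ± 2 categories correspond to a capacity of roughly 2.3 to 3.2 bits.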