Recognition: a stimulus is the input to an observer, who acts as an information transmission channel and produces a response: which category does the stimulus belong to? What is the "information value" of recognizing the category?
Information: a received signal shrinks the area of possible messages, e.g. to 1/2, to 1/64, or only to 63/64 of the prior area (the regions marked "not here" are eliminated).
The amount of information gained by receiving the signal is proportional to the ratio of these two areas: the prior (the space of possible signals) and the posterior (the space of possibilities remaining after the signal is received). The less likely the outcome, the more information is gained! The information in a symbol s should therefore be inversely related to the probability p of the symbol.
Basics of Information Theory
Claude Elwood Shannon (1916–2001). Also built a juggling machine, rocket-powered Frisbees, motorized Pogo sticks, a device that could solve the Rubik's Cube puzzle, ...
Observe the output message and try to reconstruct the input message (gain new information).
Measuring the information: the information in an event of probability p is I = log2(1/p) = -log2 p. Taking the logarithm turns multiplication (of probabilities) into addition (of information), and I is always positive (since p < 1).
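This definition can be sketched in a few lines of Python (the helper name `information_bits` is ours, not from the slides):

```python
import math

def information_bits(p: float) -> float:
    """Self-information of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

# Less likely outcomes carry more information:
print(information_bits(0.5))    # 1.0 bit  (halves the space of possibilities)
print(information_bits(1/64))   # 6.0 bits
# Multiplication of probabilities turns into addition of information:
print(information_bits(0.5 * 0.25) == information_bits(0.5) + information_bits(0.25))  # True
```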
1 bit of information reduces the area of possible messages to half. When the logarithm is taken to base 2, entropy is measured in bits. The information gained when deciding among N (equally likely) alternatives is log2 N:

Number of stimulus alternatives N    Number of bits (log2 N)
          2                                   1
          4                                   2
          8                                   3
         16                                   4
        256                                   8
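The relation between the number of equally likely alternatives and the number of bits can be printed directly:

```python
import math

# Information gained when deciding among N equally likely alternatives: log2(N) bits.
for n in (2, 4, 8, 16, 32, 64, 128, 256):
    print(f"N = {n:3d}  ->  {math.log2(n):.0f} bit(s)")
```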
For experiments with two possible outcomes with probabilities p1 and p2, the total probability must be 1, so p2 = 1 - p1 and

H = -p1 log2 p1 - (1 - p1) log2(1 - p1)

i.e. H = 0 for p1 = 0 (the second outcome certain) or p1 = 1 (the first outcome certain). For p1 = 0.5, p2 = 0.5:

H = -0.5 log2 0.5 - 0.5 log2 0.5 = -log2 0.5 = 1

Entropy H (information) is maximum when the outcome is the least predictable!
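A sketch of the two-outcome entropy function, confirming the endpoint and midpoint values above (the function name is ours):

```python
import math

def binary_entropy(p1: float) -> float:
    """Entropy H (bits) of a two-outcome experiment with P(outcome 1) = p1."""
    if p1 in (0.0, 1.0):     # one outcome certain: no information
        return 0.0
    return -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)

print(binary_entropy(0.0))   # 0.0 -- the second outcome is certain
print(binary_entropy(1.0))   # 0.0 -- the first outcome is certain
print(binary_entropy(0.5))   # 1.0 -- least predictable: maximum entropy
print(binary_entropy(0.1) < binary_entropy(0.5))   # True
```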
First or second half? With equal prior probability of each category, every such binary question halves the set of possibilities, so 3 binary numbers (3 bits) are needed to describe 2^3 = 8 categories (and 5 bits describe 2^5 = 32 categories). More bits are needed when dealing with symbols that are not all equally likely.
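The "first or second half?" strategy is simply binary search; a sketch (the helper is hypothetical) counting how many yes/no questions pin down one of 8 equally likely categories:

```python
def questions_needed(n_categories: int, target: int) -> int:
    """Count yes/no 'first or second half?' questions to identify one category."""
    lo, hi = 0, n_categories          # the target lies in [lo, hi)
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if target < mid:              # "is it in the first half?" -- yes
            hi = mid
        else:                         # no: keep the second half
            lo = mid
        questions += 1
    return questions

# Every one of 8 equally likely categories needs exactly log2(8) = 3 questions.
print([questions_needed(8, t) for t in range(8)])   # [3, 3, 3, 3, 3, 3, 3, 3]
```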
The Bar Code
Information transfer through a communication channel

transmitter (source) -> channel -> receiver, with noise entering the channel:
p(X) -> p(Y|X) -> p(Y)

Two-element (binary) channel, with transition probabilities p(y1|x1) and p(y2|x2) (correct transmission) and p(y2|x1) and p(y1|x2) (errors). With no noise in the channel, p(xi|yi) = 1 and p(xi, yj) = 0 for i ≠ j, so p(y1) = p(x1) and p(y2) = p(x2). With noise, the cross terms p(yk|xj), j ≠ k, are no longer zero. Example with p(x1) = 0.8, p(x2) = 0.2, p(y1|x1) = 5/8, p(y2|x1) = 3/8, p(y1|x2) = 1/4, p(y2|x2) = 3/4:

p(y1) = (5/8 × 0.8) + (1/4 × 0.2) = 0.55
p(y2) = (3/8 × 0.8) + (3/4 × 0.2) = 0.45
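The output probabilities are a matrix-vector product, p(yk) = Σj p(yk|xj) p(xj); a sketch using the example values above:

```python
# Output probabilities of the noisy binary channel: p(y_k) = sum_j p(y_k|x_j) p(x_j).
# Priors and transition probabilities are the example values from the text.
p_x = [0.8, 0.2]                       # p(x1), p(x2)
p_y_given_x = [[5/8, 1/4],             # p(y1|x1), p(y1|x2)
               [3/8, 3/4]]             # p(y2|x1), p(y2|x2)

p_y = [sum(row[j] * p_x[j] for j in range(2)) for row in p_y_given_x]
print(p_y)   # approximately [0.55, 0.45]
```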
Binary Channel

              response 1   response 2   total
stimulus 1       N11          N12       Nstim1
stimulus 2       N21          N22       Nstim2
total            Nres1        Nres2     N

N = total number of stimuli (or responses)
probability of stimulus xj:                    p(xj) = Nstimj / N
joint probability that both xj and yk happen:  p(xj, yk) = Njk / N
conditional probability of xj given yk:        p(xj|yk) = Njk / Nresk
probability of response yk:                    p(yk) = Nresk / N
Stimulus-Response Confusion Matrix (called stimulus xj, received response yk)

          y1      y2     ...   yn      total
x1        N11     N12    ...   N1n     Nstim1
x2        N21     N22    ...   N2n     Nstim2
...
xn        Nn1     Nn2    ...   Nnn     Nstimn
total     Nres1   Nres2  ...   Nresn   N

number of j-th stimuli:    Σk Njk = Nstimj
number of k-th responses:  Σj Njk = Nresk
number of called stimuli = number of responses = Σk Nresk = Σj Nstimj = N
probability of the xj-th symbol:  p(xj) = Nstimj / N
joint probability that both xj and yk happen:  p(xj, yk) = Njk / N
conditional probability that xj was sent when yk was received:  p(xj|yk) = Njk / Nresk
probability of the yk-th symbol:  p(yk) = Nresk / N
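These definitions translate directly to code; a sketch with illustrative 2×2 counts (the numbers are ours, not from the slides):

```python
# Probabilities estimated from a stimulus-response count matrix counts[j][k],
# rows = stimuli x_j, columns = responses y_k.
counts = [[10, 2],
          [3, 15]]

N = sum(sum(row) for row in counts)                                # total count
p_x = [sum(row) / N for row in counts]                             # p(x_j) = Nstim_j / N
p_y = [sum(counts[j][k] for j in range(2)) / N for k in range(2)]  # p(y_k) = Nres_k / N
p_joint = [[counts[j][k] / N for k in range(2)] for j in range(2)] # p(x_j, y_k) = N_jk / N
# conditional probability: p(x_j | y_k) = N_jk / Nres_k
p_x_given_y = [[counts[j][k] / (p_y[k] * N) for k in range(2)] for j in range(2)]

print(p_x, p_y, p_x_given_y[0][0])
```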
The information transferred by the system is I(X;Y) = Hmax(X,Y) - H(X,Y). The joint entropy reaches its maximum Hmax(X,Y) when the input and the output are independent, i.e. when the joint probabilities are given by products of the individual probabilities; in that case there is no relation of the output to the input, and no information is transferred.
Run the experiment 20 times; the response is always RIGHT:

           stim 1   stim 2
resp 1       10        0
resp 2        0       10

input probabilities:  p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5
joint probabilities p(xj, yk): 0.5 on the diagonal, 0 elsewhere
probabilities of independent events: all 0.25
transferred information: I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 1 = 1 bit
Run the experiment 20 times; the response is always WRONG:

           stim 1   stim 2
resp 1        0       10
resp 2       10        0

input probabilities:  p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5
joint probabilities p(xj, yk): 0.5 off the diagonal, 0 elsewhere
probabilities of independent events: all 0.25
transferred information: I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 1 = 1 bit
(a consistently wrong channel transfers just as much information as a consistently right one)
Run the experiment 20 times; the response is right 10 times and wrong 10 times:

           stim 1   stim 2
resp 1        5        5
resp 2        5        5

input probabilities:  p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5
joint probabilities p(xj, yk): all 0.25, equal to the probabilities of independent events
transferred information: I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 2 = 0 bits
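The three confusion matrices above can be checked with a short script (a sketch; `transferred_information` is our own helper, computing I(X;Y) = Hmax(X,Y) - H(X,Y) from raw counts):

```python
import math

def transferred_information(counts):
    """I(X;Y) = Hmax(X,Y) - H(X,Y) from a stimulus-response count matrix."""
    n = sum(sum(row) for row in counts)
    joint = [[c / n for c in row] for row in counts]
    p_x = [sum(row) for row in joint]
    p_y = [sum(joint[j][k] for j in range(len(joint))) for k in range(len(joint[0]))]

    def h(probs):   # entropy in bits, skipping zero cells
        return -sum(p * math.log2(p) for p in probs if p > 0)

    h_joint = h([p for row in joint for p in row])
    h_max = h([px * py for px in p_x for py in p_y])  # independent-events entropy
    return h_max - h_joint

print(transferred_information([[10, 0], [0, 10]]))   # always right -> 1.0 bit
print(transferred_information([[0, 10], [10, 0]]))   # always wrong -> 1.0 bit
print(transferred_information([[5, 5], [5, 5]]))     # chance       -> 0.0 bits
```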
Stimulus-response confusion matrix: 5 stimulus categories (x1 ... x5) by 5 response categories (y1 ... y5); row sums give the number of stimuli, column sums the number of responses.
Matrix of Joint Probabilities (the stimulus-response matrix divided by the total number of stimuli). Number of called stimuli = number of responses = N, and p(xi, yj) = Nij / N.

stimulus-response counts:
      y1    y2   ...  yn
x1    N11   N12  ...  N1n
x2    N21   N22  ...  N2n
...
xn    Nn1   Nn2  ...  Nnn

joint probabilities:
      y1         y2        ...  yn
x1    p(x1,y1)   p(x1,y2)  ...  p(x1,yn)
x2    p(x2,y1)   p(x2,y2)  ...  p(x2,yn)
...
xn    p(xn,y1)   p(xn,y2)  ...  p(xn,yn)
stimulus/response confusion matrix (N = 125 presentations, 25 of each stimulus)

         y1    y2    y3    y4    y5   number of stimuli   probability of stimulus
x1       20     5     0     0     0         25                25/125 = 0.2
x2        5    15     5     0     0         25                25/125 = 0.2
x3        0     6     ·     ·     0         25                25/125 = 0.2
x4        0     0     5    12     8         25                25/125 = 0.2
x5        0     0     0     6    19         25                25/125 = 0.2

number of responses:     25     26     ·     ·    27
probability of response: 0.2  0.208    ·     ·  27/125 = 0.216
matrix of joint probabilities p(xj, yk); total number of stimuli (responses) N = 125; joint probability p(xj, yk) = Njk / N

         y1             y2              y3             y4              y5
x1   20/125 = 0.16   5/125 = 0.04       0              0               0
x2    5/125 = 0.04  15/125 = 0.12   5/125 = 0.04       0               0
x3       0           6/125 = 0.048      ·              ·               0
x4       0              0           5/125 = 0.04  12/125 = 0.096   8/125 = 0.064
x5       0              0               0           6/125 = 0.048  19/125 = 0.152
When xi and yj are independent events (i.e. the output does not depend on the input), the joint probabilities would be given by products of the probabilities of these independent events, p(xi, yj) = p(xi) p(yj), and the entropy of the system would be maximum, Hmax. Such a system would be entirely useless for transmission of information, since its output would not depend on its input.
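A quick numerical check of this claim, using made-up marginals (not the 5×5 values from the slides): when the joint matrix is the product of the marginals, the joint entropy equals H(X) + H(Y), its maximum.

```python
import math

def entropy(probs):
    """Entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical marginal distributions for illustration.
p_x = [0.5, 0.25, 0.25]
p_y = [0.2, 0.3, 0.5]

# If input and output are independent, p(x_i, y_j) = p(x_i) p(y_j) ...
joint_independent = [px * py for px in p_x for py in p_y]
h_max = entropy(joint_independent)

# ... and the joint entropy is maximal: Hmax(X,Y) = H(X) + H(Y).
print(round(h_max, 3), round(entropy(p_x) + entropy(p_y), 3))
```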
The information transmitted by the system is given by the difference between the maximum joint entropy of the matrix of independent events, Hmax(X,Y), and the joint entropy of the real system, H(X,Y), derived from the confusion matrix:

I(X;Y) = Hmax(X,Y) - H(X,Y) = 4.63 - 3.41 = 1.22 bits
Capacity of the human channel for one-dimensional stimuli
Magic number 7±2 (between 2 and 3 bits) (George Miller, 1956)
Human perception seems to distinguish only among 7 (plus or minus 2) different entities along one perceptual dimension. To recognize more items:
– long training (musicians)
– use more than one perceptual dimension (e.g. pitch and loudness)
– chunk the items into larger units (phonemes to words, words to phrases, ...)