Recognition: a stimulus is input to an observer (an information transmission channel), which produces a response: which category does the stimulus belong to? What is the "information value" of recognizing the category?

Information: [Figure: a space of possibilities progressively narrowed; regions marked "NOT HERE" are excluded, leaving an area reduced to 63/64, 1/2, or 1/64 of the original.]

The amount of information gained by receiving the signal is proportional to the ratio of these two areas: the prior (the possible space of signals) and the posterior (the possible space remaining after the signal is received). The less likely the outcome, the more information is gained! The information in a symbol s should therefore be inversely proportional to the probability p of the symbol.

Basics of Information Theory. Claude Elwood Shannon (1916-2001). Observe the output message and try to reconstruct the input message (and thereby gain new information). Besides information theory, Shannon also built a juggling machine, rocket-powered Frisbees, motorized Pogo sticks, a device that could solve the Rubik's Cube puzzle, ...

Measuring the information: the information in an event of probability p is I = log2(1/p) = -log2 p. With a logarithmic measure, multiplication of probabilities turns into addition of information, and I is always positive (since p < 1).
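As a quick check of these two properties, here is a minimal Python sketch (the helper name information_bits is illustrative, not from the lecture). It computes I = -log2 p for a few probabilities and verifies that multiplying probabilities adds their information values.

```python
import math

def information_bits(p: float) -> float:
    """Information (in bits) carried by an event of probability p, 0 < p <= 1."""
    return -math.log2(p)

# Less likely events carry more information.
for p in (0.5, 0.25, 0.125):
    print(f"p = {p:5.3f}  ->  I = {information_bits(p):.1f} bits")

# Multiplication of probabilities turns into addition of information.
p1, p2 = 0.5, 0.25
assert math.isclose(information_bits(p1 * p2),
                    information_bits(p1) + information_bits(p2))
```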

1 bit of information reduces the space of possible messages to half. When the logarithm is taken to base 2, the entropy is measured in bits. The information gained when deciding among N equally likely alternatives is log2 N bits; for example, 2^1 = 2 alternatives need 1 bit and 2^8 = 256 alternatives need 8 bits.
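A one-line check of this relation in Python (illustrative, not from the lecture):

```python
import math

# Bits needed to decide among N equally likely alternatives: log2(N).
for n in (2, 4, 8, 16, 32, 64, 128, 256):
    print(f"{n:4d} alternatives -> {math.log2(n):.0f} bits")
```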

Consider an experiment with two possible outcomes with probabilities p1 and p2. The total probability must be 1, so p2 = 1 - p1, and

H = -p1 log2 p1 - (1 - p1) log2 (1 - p1)

H = 0 for p1 = 0 (the second outcome is certain) or p1 = 1 (the first outcome is certain). For p1 = p2 = 0.5, H = -0.5 log2 0.5 - 0.5 log2 0.5 = -log2 0.5 = 1 bit. Entropy H (information) is maximum when the outcome is the least predictable!
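A minimal Python sketch of this two-outcome entropy (the function name and the sampled probabilities are illustrative); it reproduces H = 0 at the endpoints and the maximum of 1 bit at p1 = 0.5.

```python
import math

def binary_entropy(p1: float) -> float:
    """Entropy (bits) of an experiment with outcome probabilities p1 and 1 - p1."""
    if p1 in (0.0, 1.0):          # one outcome is certain -> no uncertainty
        return 0.0
    p2 = 1.0 - p1
    return -p1 * math.log2(p1) - p2 * math.log2(p2)

for p1 in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p1 = {p1:4.2f}  ->  H = {binary_entropy(p1):.3f} bits")
# H peaks at 1 bit for p1 = 0.5, i.e. when the outcome is least predictable.
```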

Is it in the 1st or the 2nd half? With an equal prior probability for each category, we need 3 binary digits (3 bits) to describe 2^3 = 8 categories (and 5 bits for 2^5 = 32 categories); we need more bits when dealing with symbols that are not all equally likely.

The Bar Code

Information transfer through a communication channel: transmitter (source) with input distribution p(X) -> channel with transition probabilities p(Y|X), subject to noise -> receiver with output distribution p(Y).

Two-element (binary) channel: with no noise in the channel, p(xi|yi) = 1 and the cross probabilities p(xi, yj), i ≠ j, are 0, so p(y1) = p(x1) and p(y2) = p(x2). With noise, the cross probabilities are no longer 0. For example, with p(x1) = 0.8, p(x2) = 0.2 and transition probabilities p(y1|x1) = 5/8, p(y2|x1) = 3/8, p(y1|x2) = 1/4, p(y2|x2) = 3/4:

p(y1) = (5/8 x 0.8) + (1/4 x 0.2) = 0.55
p(y2) = (3/8 x 0.8) + (3/4 x 0.2) = 0.45
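The arithmetic above can be reproduced with a few lines of Python; the variable names and the explicit 2x2 layout are mine, and the prior p(x1) = 0.8, p(x2) = 0.2 is the one implied by the slide's numbers.

```python
# Input (stimulus) probabilities implied by the example: p(x1) = 0.8, p(x2) = 0.2.
p_x = [0.8, 0.2]

# Channel transition probabilities p(y|x): row j holds p(y1|xj), p(y2|xj).
p_y_given_x = [[5/8, 3/8],
               [1/4, 3/4]]

# p(y_k) = sum_j p(x_j) * p(y_k | x_j)
p_y = [sum(p_x[j] * p_y_given_x[j][k] for j in range(2)) for k in range(2)]
print([round(p, 2) for p in p_y])   # [0.55, 0.45], matching the example
```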

Binary channel: stimulus-response count matrix

              response 1   response 2   number of stimuli
stimulus 1    N11          N12          N_stim1
stimulus 2    N21          N22          N_stim2
number of
responses     N_res1       N_res2       N = total number of stimuli (or responses)

probability of stimulus xj:  p(xj) = N_stim_j / N
probability of response yk:  p(yk) = N_res_k / N
joint probability that both xj and yk happen:  p(xj, yk) = N_jk / N
conditional probability that xj was sent when yk was received:  p(xj|yk) = N_jk / N_res_k

Stimulus-Response Confusion Matrix (called stimuli x1 ... xn as rows, received responses y1 ... yn as columns):

         y1       y2      ...  yn       total
x1       N11      N12     ...  N1n      N_stim1
x2       N21      N22     ...  N2n      N_stim2
...
xn       Nn1      Nn2     ...  Nnn      N_stimn
total    N_res1   N_res2  ...  N_resn   N

number of j-th stimuli:  Σk Njk = N_stim_j
number of k-th responses:  Σj Njk = N_res_k
number of called stimuli = number of responses = Σk N_res_k = Σj N_stim_j = N

probability of the xj-th symbol:  p(xj) = N_stim_j / N
probability of the yk-th symbol:  p(yk) = N_res_k / N
joint probability that both xj and yk happen:  p(xj, yk) = Njk / N
conditional probability that xj was sent when yk was received:  p(xj|yk) = Njk / N_res_k
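A short Python sketch of these definitions, using a made-up 3x3 count matrix purely for illustration (the numbers are not from the lecture):

```python
# Hypothetical 3x3 stimulus-response count matrix N_counts[j][k] (illustrative numbers).
N_counts = [[8, 1, 1],
            [2, 6, 2],
            [0, 3, 7]]

n_rows, n_cols = len(N_counts), len(N_counts[0])

N = sum(sum(row) for row in N_counts)                 # total number of stimuli (= responses)
N_stim = [sum(row) for row in N_counts]               # N_stim_j = sum_k N_jk
N_res = [sum(N_counts[j][k] for j in range(n_rows)) for k in range(n_cols)]   # N_res_k = sum_j N_jk

p_x = [n / N for n in N_stim]                         # p(x_j) = N_stim_j / N
p_y = [n / N for n in N_res]                          # p(y_k) = N_res_k / N
p_joint = [[N_counts[j][k] / N for k in range(n_cols)]
           for j in range(n_rows)]                    # p(x_j, y_k) = N_jk / N
p_x_given_y = [[N_counts[j][k] / N_res[k] for k in range(n_cols)]
               for j in range(n_rows)]                # p(x_j | y_k) = N_jk / N_res_k

print(p_x, p_y, sep="\n")
```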

The joint entropy reaches its maximum Hmax(X,Y) when the input and the output are independent (the joint probabilities are given by products of the individual probabilities). In that case there is no relation of the output to the input, i.e. no information transfer. The information transferred by the system is

I(X;Y) = Hmax(X,Y) - H(X,Y)

Example 1: run the experiment 20 times and get it always RIGHT.

          stim 1   stim 2
resp 1      10       0
resp 2       0      10

Input probabilities p(x1) = p(x2) = 0.5; output probabilities p(y1) = p(y2) = 0.5. The joint probabilities p(xj, yk) are 0.5 on the diagonal and 0 elsewhere, while the probabilities of independent events would all be 0.25 (giving Hmax(X,Y) = 2 bits). Transferred information: I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 1 = 1 bit.

Example 2: run the experiment 20 times and get it always WRONG.

          stim 1   stim 2
resp 1       0      10
resp 2      10       0

Input probabilities p(x1) = p(x2) = 0.5; output probabilities p(y1) = p(y2) = 0.5. The joint probabilities are 0.5 off the diagonal and 0 on it; the probabilities of independent events would again all be 0.25. Transferred information: I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 1 = 1 bit (a consistently wrong channel still transfers information).

Example 3: run the experiment 20 times and get it 10 times right and 10 times wrong.

          stim 1   stim 2
resp 1       5       5
resp 2       5       5

Input probabilities p(x1) = p(x2) = 0.5; output probabilities p(y1) = p(y2) = 0.5. The joint probabilities are all 0.25, exactly the probabilities of independent events. Transferred information: I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 2 = 0 bits.
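The three toy experiments can be verified with a short Python script. The function below (names are mine, not from the lecture) computes I(X;Y) = Hmax(X,Y) - H(X,Y) directly from a matrix of counts, using the fact that Hmax(X,Y), the joint entropy of the independent-events matrix, equals H(X) + H(Y).

```python
import math

def entropy(probs):
    """Entropy in bits of a list of probabilities (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def transferred_information(counts):
    """I(X;Y) = Hmax(X,Y) - H(X,Y) = H(X) + H(Y) - H(X,Y), from counts[stimulus][response]."""
    total = sum(sum(row) for row in counts)
    p_x = [sum(row) / total for row in counts]
    p_y = [sum(counts[j][k] for j in range(len(counts))) / total
           for k in range(len(counts[0]))]
    p_joint = [counts[j][k] / total
               for j in range(len(counts)) for k in range(len(counts[0]))]
    return entropy(p_x) + entropy(p_y) - entropy(p_joint)

print(transferred_information([[10, 0], [0, 10]]))   # always right        -> 1.0 bit
print(transferred_information([[0, 10], [10, 0]]))   # always wrong        -> 1.0 bit
print(transferred_information([[5, 5], [5, 5]]))     # right half the time -> 0.0 bits
```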

The raw results: a 5x5 stimulus-response count table, with stimulus categories x1 ... x5 as rows, response categories y1 ... y5 as columns, row totals giving the number of each stimulus and column totals giving the number of each response. (The individual counts are not legible in the transcript.)

Matrix of Joint Probabilities: the stimulus-response count matrix (entries Nij) divided by the total number of stimuli, N.

        y1     y2    ...  yn                  y1         y2        ...  yn
x1      N11    N12   ...  N1n         x1      p(x1,y1)   p(x1,y2)  ...  p(x1,yn)
x2      N21    N22   ...  N2n    ->   x2      p(x2,y1)   p(x2,y2)  ...  p(x2,yn)
...                                   ...
xn      Nn1    Nn2   ...  Nnn         xn      p(xn,y1)   p(xn,y2)  ...  p(xn,yn)

number of called stimuli = number of responses = N;  p(xi, yj) = Nij / N

Stimulus/response confusion matrix for the worked example: 5 stimulus categories x1 ... x5 (rows) and 5 response categories y1 ... y5 (columns), with 125 presentations in total. Each stimulus was presented 25 times, so every stimulus probability is p(xj) = 25/125 = 0.2. The response probabilities p(yk) = N_res_k / 125 are only approximately uniform (the first is 25/125 = 0.2, the last is 27/125 = 0.216).

Matrix of joint probabilities p(xj, yk) = Njk / N for this example, with N = 125 stimuli (responses). Only some entries are legible: the large values sit on or near the diagonal, e.g. p(x1,y1) = 20/125 = 0.16, p(x2,y2) = 15/125 = 0.12, p(x4,y5) = 8/125 = 0.064 and p(x5,y5) = 19/125 = 0.152, with small values such as 5/125 = 0.04 next to them and zeros far from the diagonal.

When xi and yj are independent events (i.e. the output does not depend on the input), the joint probabilities would be given by products of the probabilities of these independent events, p(xi, yj) = p(xi) p(yj), and the joint entropy of the system would be at its maximum, Hmax. Such a system would be entirely useless for transmission of information, since its output would not depend on its input.

The information transmitted by the system is the difference between the maximum joint entropy Hmax(X,Y), computed from the matrix of independent events, and the joint entropy of the real system H(X,Y), derived from the confusion matrix. Here Hmax(X,Y) = H(X) + H(Y) ≈ 2.32 + 2.31 = 4.63 bits, since the five stimuli are equiprobable (H(X) = log2 5 ≈ 2.32 bits) and the responses are nearly so. Thus I(X;Y) = Hmax(X,Y) - H(X,Y) = 4.63 - 3.41 ≈ 1.2 bits.

Capacity of the human channel for one-dimensional stimuli

Magic number 7±2, i.e. between 2 and 3 bits (George Miller, 1956)

Magic number 7±2, i.e. between 2 and 3 bits (George Miller, 1956): human perception seems to distinguish only among about 7 (plus or minus 2) different entities along one perceptual dimension. To recognize more items:
– long training (e.g. musicians),
– use more than one perceptual dimension (e.g. pitch and loudness),
– chunk the items into larger units (phonemes into words, words into phrases, ...).