Measures of Information
Hartley defined the first information measure:
– H = n log s
– n is the length of the message and s is the number of possible values for each symbol in the message
– Assumes all symbols are equally likely to occur
Shannon proposed a variant (Shannon's entropy) that weights the information by the probability that each outcome will occur:
– H = Σ p_i log(1/p_i) = -Σ p_i log p_i
– The second term, log(1/p_i), shows that the amount of information an event provides increases as its probability of occurring decreases
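Both measures are easy to compute directly. Below is a minimal Python sketch (the helper names hartley_information and shannon_entropy are illustrative, not from the slides) that evaluates Hartley's measure for a message of length n over an alphabet of s symbols, and Shannon's entropy for a discrete distribution.

```python
import numpy as np

def hartley_information(n, s):
    """Hartley measure: H = n * log2(s), assuming all s symbols are equally likely."""
    return n * np.log2(s)

def shannon_entropy(p):
    """Shannon entropy H = -sum(p_i * log2(p_i)) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

# A message of 10 symbols drawn from an 8-symbol alphabet
print(hartley_information(10, 8))              # 30.0 bits

# For a uniform distribution over 8 symbols the per-symbol Shannon entropy
# equals Hartley's log2(s) = 3 bits
print(shannon_entropy(np.full(8, 1 / 8)))      # 3.0 bits
```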

Three Interpretations of Entropy
The amount of information an event provides
– An infrequently occurring event provides more information than a frequently occurring event
The uncertainty in the outcome of an event
– Systems with one very common event have less entropy than systems with many equally probable events
The dispersion in the probability distribution
– An image of a single amplitude has a less disperse histogram than an image of many greyscales; the lower dispersion implies lower entropy
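As a quick illustration of the dispersion interpretation, the sketch below uses hypothetical grey-level histograms (and scipy.stats.entropy) to show that a histogram concentrated on one grey level has far lower entropy than one spread evenly over 256 grey levels.

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical histogram of an image that is almost a single amplitude:
# one dominant grey level plus a little spread over the remaining 255 levels
peaked = np.array([0.97] + [0.03 / 255] * 255)

# Hypothetical histogram of an image whose grey levels are spread evenly over 256 values
uniform = np.full(256, 1 / 256)

print(entropy(peaked, base=2))    # low dispersion, low entropy (about 0.43 bits)
print(entropy(uniform, base=2))   # maximum entropy for 256 grey levels: 8.0 bits
```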

Definitions of Mutual Information
Three commonly used definitions (the third is given on the next slide):
– 1) I(A,B) = H(B) - H(B|A) = H(A) - H(A|B)
Mutual information is the amount by which the uncertainty in B (or A) is reduced when A (or B) is known.
– 2) I(A,B) = H(A) + H(B) - H(A,B)
Maximizing the mutual information is equivalent to minimizing the joint entropy (the last term) when the marginal entropies are fixed.
The advantage of mutual information over joint entropy is that it includes the individual inputs' entropies.
It works better than joint entropy alone in low-contrast image background regions: the joint entropy there is low, but it is offset by low individual entropies as well, so the overall mutual information is also low.
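A minimal sketch of definition 2 for two discrete-valued images, assuming the joint distribution is estimated with a 2-D histogram (np.histogram2d); the helper names entropy_bits and mutual_information are illustrative.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (in bits) of a probability array, ignoring zero bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(a, b, bins=32):
    """Definition 2: I(A,B) = H(A) + H(B) - H(A,B), with the joint distribution
    estimated from a 2-D histogram of the paired samples."""
    joint_counts, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint_counts / joint_counts.sum()
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)
    return entropy_bits(p_a) + entropy_bits(p_b) - entropy_bits(p_ab)

# Toy example: b is a noisy copy of a, so I(A,B) should be well above zero
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(64, 64)).astype(float)
b = a + rng.normal(0, 20, size=a.shape)
print(mutual_information(a, b))
```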

Definitions of Mutual Information II
– 3) I(A,B) = Σ_{a,b} p(a,b) log [ p(a,b) / ( p(a) p(b) ) ]
This definition is related to the Kullback-Leibler distance between two distributions: it is the KL distance between the joint distribution p(a,b) and the product of the marginals p(a) p(b).
It measures the dependence of the two distributions.
In image registration, I(A,B) will be maximized when the images are aligned.
In feature selection, choose features that minimize I(A,B) with one another, to ensure the selected features are not related.
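The KL form can be evaluated from the same 2-D histogram used in the previous sketch; the function name mutual_information_kl is illustrative, and the result agrees numerically with the H(A) + H(B) - H(A,B) form.

```python
import numpy as np

def mutual_information_kl(a, b, bins=32):
    """Definition 3: I(A,B) = sum_{a,b} p(a,b) log[ p(a,b) / (p(a) p(b)) ],
    i.e. the KL distance between the joint distribution and the product of marginals."""
    joint_counts, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint_counts / joint_counts.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0                            # 0 * log(0) terms contribute nothing
    return np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz]))

# mutual_information_kl(a, b) matches mutual_information(a, b) from the previous sketch
```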

Additional Definitions of Mutual Information
Two definitions exist for normalizing mutual information:
– Normalized Mutual Information: NMI(A,B) = [ H(A) + H(B) ] / H(A,B)
– Entropy Correlation Coefficient: ECC(A,B) = 2 I(A,B) / [ H(A) + H(B) ] = 2 - 2 / NMI(A,B)
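A small sketch computing both normalized measures from histogram-based entropy estimates, assuming the definitions written above (the helper name nmi_and_ecc is illustrative).

```python
import numpy as np

def nmi_and_ecc(a, b, bins=32):
    """Normalized mutual information and entropy correlation coefficient,
    computed from histogram-based entropy estimates."""
    joint_counts, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint_counts / joint_counts.sum()

    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    h_a, h_b, h_ab = h(p_ab.sum(axis=1)), h(p_ab.sum(axis=0)), h(p_ab)
    nmi = (h_a + h_b) / h_ab
    ecc = 2.0 * (h_a + h_b - h_ab) / (h_a + h_b)
    return nmi, ecc
```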

Derivation of M. I. Definitions
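The derivation itself is not reproduced in the transcript; a brief sketch of the standard argument connecting the three definitions, using the chain rule for entropy:
– The chain rule for entropy gives H(A,B) = H(A) + H(B|A) = H(B) + H(A|B).
– Substituting into definition 1: I(A,B) = H(B) - H(B|A) = H(B) - [ H(A,B) - H(A) ] = H(A) + H(B) - H(A,B), which is definition 2.
– Expanding definition 3: Σ p(a,b) log[ p(a,b) / (p(a) p(b)) ] = Σ p(a,b) log p(a,b) - Σ p(a,b) log p(a) - Σ p(a,b) log p(b) = -H(A,B) + H(A) + H(B), which again is definition 2.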

Properties of Mutual Information
MI is symmetric: I(A,B) = I(B,A)
I(A,A) = H(A)
I(A,B) <= H(A), I(A,B) <= H(B)
– The information each image contains about the other cannot be greater than the information they themselves contain
I(A,B) >= 0
– Knowing B cannot increase the uncertainty in A
If A, B are independent, then I(A,B) = 0
If A, B are jointly Gaussian with correlation coefficient ρ, then I(A,B) = -(1/2) log(1 - ρ²)
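A quick numerical check of a few of these properties on toy joint probability tables, using the I(A,B) = H(A) + H(B) - H(A,B) form (the helper names h and mi are illustrative).

```python
import numpy as np

def h(p):
    """Shannon entropy (bits) of a probability array."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mi(p_ab):
    """I(A,B) = H(A) + H(B) - H(A,B) for a joint probability table p_ab."""
    return h(p_ab.sum(axis=1)) + h(p_ab.sum(axis=0)) - h(p_ab)

# Independent A and B: the joint is the outer product of the marginals, so I(A,B) = 0
p_a = np.array([0.5, 0.3, 0.2])
p_b = np.array([0.6, 0.4])
print(mi(np.outer(p_a, p_b)))          # ~0.0

# Fully dependent pair (B = A): the joint is diagonal, so I(A,A) = H(A)
print(mi(np.diag(p_a)), h(p_a))        # both ~1.485 bits

# Symmetry: I(A,B) = I(B,A)
p_ab = np.array([[0.30, 0.10],
                 [0.10, 0.20],
                 [0.05, 0.25]])
print(mi(p_ab), mi(p_ab.T))            # equal
```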

Mutual Information based Feature Selection
Tested using a 2-class occupant sensing problem:
– Classes are RFIS (rear-facing infant seat) and everything else (children, adults, etc.)
– Use the edge map of the imagery and compute features: Legendre moments up to order 36
– This generates 703 features, from which we select the best 51
Tested 3 filter-based methods (a scoring sketch follows below):
– Mann-Whitney statistic
– Kullback-Leibler statistic
– Mutual Information criterion
Tested both single M.I. and Joint M.I. (JMI)
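A hypothetical sketch of how the three filter criteria might score one continuous feature against the binary class labels; scipy's mannwhitneyu, a symmetric KL statistic between class-conditional histograms, and scikit-learn's mutual_info_classif stand in for the statistics named above and are not taken from the slides.

```python
import numpy as np
from scipy.stats import mannwhitneyu, entropy
from sklearn.feature_selection import mutual_info_classif

def filter_scores(feature, labels, bins=32):
    """Score one continuous feature against binary class labels with the
    three filter criteria named on the slide (illustrative implementations)."""
    x0, x1 = feature[labels == 0], feature[labels == 1]

    # Mann-Whitney statistic: rank-based measure of class separation
    mw = mannwhitneyu(x0, x1).statistic

    # Kullback-Leibler statistic: symmetric KL between class-conditional histograms
    edges = np.histogram_bin_edges(feature, bins=bins)
    p0, _ = np.histogram(x0, bins=edges)
    p1, _ = np.histogram(x1, bins=edges)
    p0 = (p0 + 1e-9) / (p0 + 1e-9).sum()     # smooth to avoid log(0)
    p1 = (p1 + 1e-9) / (p1 + 1e-9).sum()
    kl = entropy(p0, p1) + entropy(p1, p0)

    # Mutual information criterion: estimated I(feature, class)
    mi = mutual_info_classif(feature.reshape(-1, 1), labels, random_state=0)[0]

    return mw, kl, mi
```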

Mutual Information based Feature Selection Method
M.I. tests a feature's ability to separate two classes.
– Based on definition 3) of M.I.
– Here A is the feature vector and B is the classification; note that A is continuous but B is discrete
– By maximizing the M.I. we maximize the separability of the feature
Note that this method only tests each feature individually (a ranking sketch follows below).
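A minimal sketch of the individual-feature ranking described above, assuming a feature matrix X of shape (n_samples, 703) and binary labels y; sklearn's mutual_info_classif is used as a stand-in estimator of I(feature, class), and the function name select_top_k_by_mi is illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_top_k_by_mi(X, y, k=51):
    """Rank each feature individually by its estimated mutual information with
    the (discrete) class label and keep the k highest-scoring features."""
    mi_scores = mutual_info_classif(X, y, random_state=0)   # one score per column
    top_k = np.argsort(mi_scores)[::-1][:k]
    return top_k, mi_scores[top_k]

# Usage (hypothetical data shapes matching the slides: 703 features, best 51 kept):
# X has shape (n_samples, 703); y holds 0/1 class labels
# selected_idx, selected_scores = select_top_k_by_mi(X, y, k=51)
```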