Mutual Information Brian Dils I590 – ALife/AI 02.28.05.

Presentation transcript:


What is Mutual Information? Mutual information (MI) is an essential quantity in probability and information theory. MI quantifies the degree of dependence between two variables, that is, how far they are from being independent.

What is Mutual Information? MI measures the amount of information in variable x that is shared by variable y. Formally, MI quantifies the distance between the joint distribution of x and y and the product of their marginal distributions.
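For reference, the standard discrete form of this definition (a KL divergence, not written out on the original slide) is:

I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = D_{\mathrm{KL}}\big( p(x,y) \,\|\, p(x)\,p(y) \big)

which is zero exactly when p(x,y) = p(x)p(y), i.e., when x and y are independent.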

When is MI important? Suppose we know y. If x contains no shared information with y, then the variables are totally independent.
– Mutual information: 0
– The entropy of x may still be very high on its own
– However, x is not important here, since it is not informative about y

When is MI important? Again we know y, but this time all of the information conveyed by x is also conveyed by y.
– Mutual information is at its maximum: I(x;y) = H(x)
– Given y, there is nothing surprising about x, so the conditional entropy H(x|y) is zero
– x is not important, because we could simply study y instead (see the numeric check below)
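A minimal numeric check of both extremes (this demo and its toy joint distributions are mine, not from the slides):

```python
# Compute I(X;Y) in bits from a 2-D table of joint probabilities p(x, y),
# then evaluate the two extreme cases discussed on the slides.
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits, given a 2-D array of joint probabilities p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x), column vector
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    nz = joint > 0                          # skip empty cells to avoid log(0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# Case 1: x and y independent -> I(X;Y) = 0
independent = np.outer([0.5, 0.5], [0.5, 0.5])
print(mutual_information(independent))   # 0.0

# Case 2: x fully determined by y -> I(X;Y) = H(X) = 1 bit here
identical = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(identical))     # 1.0
```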

When is MI important? MI is important (and powerful) when two variables are neither independent nor identical in the information they convey.

Why Apply MI? If mutual information is maximized (i.e., the dependence between variables is increased), conditional entropy is minimized. Reducing conditional entropy makes the behavior of random variables more predictable, because their values depend more strongly on one another.
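This follows from a standard identity relating the quantities (not written out on the original slide):

I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)

so for a fixed marginal entropy H(X), maximizing I(X;Y) is exactly the same as minimizing the conditional entropy H(X|Y).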

MI Applications Discriminative training procedures for hidden Markov models have been proposed based on the maximum mutual information (MMI) criterion, sketched below.
– Hidden parameters are predicted from known observations
– Applicable to speech recognition, character recognition, and natural language processing
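As a point of reference (the notation here is mine, not from the slides), the MMI objective for training utterances O_r with correct transcriptions w_r is commonly written as

F_{\mathrm{MMI}}(\lambda) = \sum_r \log \frac{p_\lambda(O_r \mid w_r)\, P(w_r)}{\sum_{w} p_\lambda(O_r \mid w)\, P(w)}

i.e., it maximizes the posterior probability of the correct transcription against all competing hypotheses, rather than the likelihood of the correct one alone.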

MI Applications Mutual information is often used as a significance function for the computation of collocations in corpus linguistics (see the sketch below).
– Collocations are essential to coherent speech
– They are easy for humans but hard for artificial systems
– MI has been shown to improve how AI systems learn such word associations
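A minimal sketch of the collocation idea (the toy corpus and helper below are mine, not from the slides): pointwise mutual information (PMI) scores a word pair by how much more often it co-occurs than independence would predict.

```python
# Illustrative sketch: scoring adjacent word pairs (bigrams) with pointwise
# mutual information over a toy corpus.
import math
from collections import Counter

corpus = ("strong tea strong coffee strong tea powerful computer "
          "powerful engine strong tea").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

def pmi(w1, w2):
    """PMI = log2( p(w1,w2) / (p(w1) * p(w2)) ); assumes the pair occurs."""
    p_joint = bigrams[(w1, w2)] / n_bi
    p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
    return math.log2(p_joint / p_indep)

print(pmi("strong", "tea"))         # ~1.71 bits: frequent, genuine collocation
print(pmi("powerful", "computer"))  # ~2.71 bits: PMI can overweight rare pairs
```

The second score illustrates a well-known caveat: raw PMI tends to overweight rare pairs, which is why corpus work usually filters candidates by frequency first.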

MI Applications Mutual information is used in medical imaging for image registration.
– Given a reference image (for example, a brain scan) and a second image that needs to be put into the same coordinate system as the reference, the second image is deformed until the mutual information between it and the reference image is maximized.
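A minimal sketch of that objective (the histogram estimator and the toy translation search below are my illustration, not the slides' method):

```python
# Histogram-based MI between two images: the quantity a registration
# algorithm would maximize while transforming the moving image.
import numpy as np

def image_mutual_information(a, b, bins=32):
    """Estimate I(A;B) in bits from a 2-D joint histogram of intensities."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# Toy registration loop: slide the moving image horizontally and keep the
# shift with the highest MI against the reference.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, 5, axis=1)        # reference shifted by 5 pixels
best = max(range(-8, 9),
           key=lambda s: image_mutual_information(
               reference, np.roll(moving, s, axis=1)))
print(best)  # -5: undoing the shift maximizes MI
```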

MI Applications Mutual information has been used as a criterion for feature selection and feature transformations in machine learning and agent-based learning.
– Using MI criteria, it was found that the more input variables were available, the lower the conditional entropy became
– MI-based criteria could effectively select features AND roughly estimate optimal feature subsets, two classic problems in feature selection
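A minimal sketch of MI-based feature ranking (the dataset choice and the use of scikit-learn's estimator are mine, not from the cited papers):

```python
# Rank features by their estimated mutual information with the class label.
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
scores = mutual_info_classif(X, y, random_state=0)

for name, score in sorted(zip(load_iris().feature_names, scores),
                          key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")  # petal features rank highest for iris
```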

References
Huang, D., & Chow, T.W.S. (2003). Searching optimal feature subset using mutual information. Proceedings of the 2003 International Symposium on Artificial Neural Networks (pp. ). Bruges, Belgium.
Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. Neural Networks, 5.
Bonnlander, B., & Weigend, A.S. (1994). Selecting input variables using mutual information and nonparametric density estimations. Proceedings of the 1994 International Symposium on Artificial Neural Networks (pp. ). Tainan, Taiwan.
Wikipedia entries on “Mutual Information”, “Probability Theory”, and “Information Theory”.