MML Inference of RBFs
Enes Makalic, Lloyd Allison, Andrew Paplinski

Presentation Outline
- RBF architecture selection
  - Existing methods
- Overview of MML
  - MML87
- MML inference of RBFs
  - MML estimators for RBF parameters
- Results
- Conclusion
- Future work

RBF Architecture Selection (1)
- Determine the optimal network architecture for a given problem
- Involves choosing the number and type of basis functions
- Influences the success of the training process
- If the chosen RBF network is:
  - Too small: poor performance
  - Too large: overfitting

RBF Architecture Selection (2)
[Figure: example fits illustrating poor performance and overfitting]

RBF Architecture Selection (3)
- Architecture selection solutions:
  - Use as many basis functions as there are data points
  - Expectation Maximization (EM)
  - K-means clustering
  - Regression trees (M. Orr): BIC, GPE, etc.
  - Bayesian inference: reversible jump MCMC

Overview of MML (1)
- An objective function to estimate the goodness of a model
- A sender wishes to send data, x, to a receiver over a noiseless transmission channel
- How well is the data encoded?
- Measured by the message length (for example, in bits)

Overview of MML (2)
- Transmit the data in two parts:
  - Part 1: encoding of the model (hypothesis), costing -log Pr(H)
  - Part 2: encoding of the data given the model, costing -log Pr(D|H)
- A quantitative form of Occam's razor
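A minimal sketch of the two-part coding idea (the probabilities in the example are made up for illustration):

```python
import math

def two_part_message_length(pr_hypothesis, pr_data_given_hypothesis):
    """Total two-part message length in bits:
    part 1 encodes the hypothesis H, part 2 encodes the data D given H."""
    return -math.log2(pr_hypothesis) - math.log2(pr_data_given_hypothesis)

# A hypothesis with prior probability 1/8, under which the data has
# probability 1/4, costs 3 + 2 = 5 bits in total.
print(two_part_message_length(1/8, 1/4))  # 5.0
```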

Overview of MML (3)
MML87
- An efficient approximation to strict MML
- Total message length for a model with parameters θ:

  I(θ) = -log h(θ) - log f(x|θ) + (1/2) log |F(θ)| + (k/2) (1 + log κ_k)

Overview of MML (4)
MML87
- h(θ) is the prior information
- f(x|θ) is the likelihood function
- k is the number of parameters
- κ_k is a dimension constant
- |F(θ)| is the determinant of the expected Fisher information matrix, with entries (i, j):

  F_ij(θ) = -E[ ∂² log f(x|θ) / (∂θ_i ∂θ_j) ]
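The total length above can be computed directly once its ingredients are known. A minimal sketch, assuming the caller supplies -log h(θ), -log f(x|θ), log|F(θ)| and k, and approximating the dimension constant κ_k by its asymptotic value 1/(2πe):

```python
import math

def mml87_message_length(neg_log_prior, neg_log_likelihood,
                         log_det_fisher, k):
    """MML87 message length (in nits):
    -log h(theta) - log f(x|theta) + 0.5*log|F(theta)| + (k/2)(1 + log kappa_k).
    kappa_k is approximated by its asymptotic value 1/(2*pi*e); exact
    low-dimensional lattice constants differ slightly (e.g. kappa_1 = 1/12)."""
    kappa_k = 1.0 / (2.0 * math.pi * math.e)
    return (neg_log_prior + neg_log_likelihood
            + 0.5 * log_det_fisher
            + 0.5 * k * (1.0 + math.log(kappa_k)))
```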

Overview of MML (5)
MML87
- Fisher information:
  - Sensitivity of the likelihood function to the parameters
  - Determines the accuracy of stating the model
  - Small second derivatives: state the parameters less precisely
  - Large second derivatives: state the parameters more accurately
- The model that minimises the total message length is optimal

MML Inference of RBFs (1)
- Regression problems
- We require:
  - A likelihood function
  - The Fisher information
  - Priors on all model parameters

MML Inference of RBFs (2) Notation

MML Inference of RBFs (3)
RBF Network
- m inputs, n parameters, o outputs
- Mapping from parameters to outputs
  - w: vector of network parameters
- The network output implicitly depends on the network input vector, x
- Define the output non-linearity
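A minimal sketch of the forward pass of a Gaussian-basis RBF network with this parameterisation (the function and variable names, and the choice of one radius per hidden unit, are illustrative assumptions):

```python
import numpy as np

def rbf_forward(x, centres, radii, weights, bias):
    """Output of a Gaussian RBF network for one input vector x (length m).
    centres: (h, m) basis-function centres; radii: (h,) widths;
    weights: (o, h) linear output weights; bias: (o,) output offsets."""
    d2 = np.sum((centres - x) ** 2, axis=1)   # squared distance to each centre
    phi = np.exp(-d2 / (2.0 * radii ** 2))    # Gaussian basis activations
    return weights @ phi + bias               # linear (identity) output layer

# Example: 1 input, 2 hidden units, 1 output
y = rbf_forward(np.array([0.5]),
                centres=np.array([[0.0], [1.0]]),
                radii=np.array([1.0, 1.0]),
                weights=np.array([[1.0, -1.0]]),
                bias=np.array([0.0]))
```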

MML Inference of RBFs (4)
Likelihood function
- Learning: minimisation of a scalar function
- We define L as the negative log-likelihood
  - L implicitly depends on the given targets, z, for the network outputs
  - Different input-target pairs are considered independent

MML Inference of RBFs (5)
Likelihood function
- Regression problems
- The network error, ε = z - y, is assumed Gaussian with mean zero and variance σ², giving

  L = (N/2) log(2πσ²) + (1/(2σ²)) Σ_t (z_t - y_t)²
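A minimal sketch of this negative log-likelihood for scalar outputs (names are illustrative):

```python
import numpy as np

def negative_log_likelihood(targets, outputs, sigma):
    """Gaussian negative log-likelihood L: i.i.d. errors z - y with
    mean 0 and standard deviation sigma."""
    residuals = np.asarray(targets) - np.asarray(outputs)
    n = residuals.size
    return (0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
            + np.sum(residuals ** 2) / (2.0 * sigma ** 2))
```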

MML Inference of RBFs (6)
Fisher information
- Expected Hessian matrix: F = E[H]
- Jacobian matrix of L: g_i = ∂L/∂w_i
- Hessian matrix of L: H_ij = ∂²L/(∂w_i ∂w_j)

MML Inference of RBFs (7)
Fisher information
- Taking expectations and simplifying yields the expected Fisher information matrix F(w)
  - Positive semi-definite
  - The complete Fisher information includes a summation over the whole data set D
- We used an approximation to F:
  - Block-diagonal
  - Hidden basis functions assumed to be independent
  - Simplified determinant: the product of the determinants of each block
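A minimal sketch of the determinant simplification (the per-unit blocks below are placeholders; in the paper each block corresponds to one hidden basis function's parameters):

```python
import numpy as np

def log_det_block_diagonal(blocks):
    """log-determinant of a block-diagonal matrix: with the off-block
    terms dropped, log|F| is the sum of the per-block log-determinants."""
    return sum(np.linalg.slogdet(np.asarray(b))[1] for b in blocks)

# Example: two independent 3x3 blocks; 2*I has log-determinant 3*log(2) each.
blocks = [2.0 * np.eye(3), 2.0 * np.eye(3)]
print(log_det_block_diagonal(blocks))  # 6 * log(2) ≈ 4.159
```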

MML Inference of RBFs (8)
Priors
- Must specify a prior density for each parameter:
  - Centres: uniform
  - Radii: uniform (on a log scale)
  - Weights: Gaussian with zero mean; the standard deviation is usually taken to be large (a vague prior)
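A minimal sketch of -log h(w) under these priors (the range and standard-deviation hyper-parameters are illustrative assumptions, not the paper's values):

```python
import numpy as np

def neg_log_prior(centres, log_radii, weights,
                  centre_range=16.0, log_radius_range=10.0, weight_sd=100.0):
    """-log h(w): centres uniform over an interval of width centre_range,
    radii uniform on a log scale over an interval of width log_radius_range,
    and zero-mean Gaussian weights with a large (vague) standard deviation."""
    nlp = centres.size * np.log(centre_range)
    nlp += log_radii.size * np.log(log_radius_range)
    nlp += np.sum(0.5 * np.log(2.0 * np.pi * weight_sd ** 2)
                  + weights ** 2 / (2.0 * weight_sd ** 2))
    return nlp
```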

MML Inference of RBFs (9)
Message length of an RBF

  MsgLen = I(h) - log π(w) + (1/2) log F(w) + L + C

where:
- I(h) denotes the cost of transmitting the number of basis functions, h
- π(w) is the prior over the network parameters (previous slide)
- F(w) is the determinant of the expected Fisher information
- L is the negative log-likelihood
- C is a dimension constant, independent of w

MML Inference of RBFs (10)
MML estimators for parameters
- Error s.d.: the standard unbiased estimator
- Remaining parameters: numerical optimisation, which requires differentiation of the expected Fisher information determinant
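A minimal sketch of the error-s.d. estimate, assuming the usual degrees-of-freedom correction N - n (the exact correction used in the paper is an assumption here):

```python
import numpy as np

def sigma_hat(targets, outputs, n_params):
    """Residual-based estimate of the error s.d.: RSS / (N - n_params),
    the standard unbiased-style estimator for regression noise variance."""
    residuals = np.asarray(targets) - np.asarray(outputs)
    dof = residuals.size - n_params
    return np.sqrt(np.sum(residuals ** 2) / dof)
```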

Results (1)
- The MML inference criterion is compared to:
  - The conventional MATLAB RBF implementation
  - M. Orr's regression tree method
- Functions used for criteria evaluation:
  - Correct answer known
  - Correct answer not known

Results (2)
Correct answer known
- Generate data from a known RBF (one, three and five basis functions respectively)
- Inputs uniformly sampled in the range (-8, 8)
  - 1D and 2D inputs were considered
- Gaussian noise N(0, 0.1) added to the network outputs
- Training and test sets comprise 100 and 1000 patterns respectively
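A minimal sketch of this data-generating setup for the 1D case (the generating network's centres, radii and weights are illustrative, and the N(0, 0.1) noise is taken here as standard deviation 0.1):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n_patterns, centres, radii, weights, noise_sd=0.1):
    """Synthetic 1D regression data from a known Gaussian RBF:
    inputs uniform on (-8, 8), Gaussian noise added to the outputs."""
    x = rng.uniform(-8.0, 8.0, size=(n_patterns, 1))
    phi = np.exp(-(x - centres) ** 2 / (2.0 * radii ** 2))
    y = phi @ weights + rng.normal(0.0, noise_sd, size=n_patterns)
    return x, y

# A three-basis-function generator: 100 training, 1000 test patterns.
gen = dict(centres=np.array([-4.0, 0.0, 4.0]),
           radii=np.array([1.0, 1.5, 1.0]),
           weights=np.array([1.0, -2.0, 1.5]))
x_train, y_train = make_dataset(100, **gen)
x_test, y_test = make_dataset(1000, **gen)
```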

Results (3): MSE, correct answer known (1D input)

Results (4): MSE, correct answer known (2D inputs)

Results (5)
Correct answer not known
- The following functions were used: [function definitions shown as equations on the original slide]

Results (6)
Correct answer not known
- Gaussian noise N(0, 0.1) added to the network outputs
- Training and test sets comprise 100 and 1000 patterns respectively

Results (7)

Results (8)

Results (9): MSE, correct answer not known

Results (10) Sensitivity of criteria to noise

Results (11) Sensitivity of criteria to data set size

Conclusion (1)
- A novel approach to architecture selection in RBF networks
  - MML87
  - Block-diagonal Fisher information matrix approximation
- MATLAB code available from:

Conclusion (2)
- Results
  - Initial testing
  - Good performance as the level of noise and the dataset size are varied
  - No over-fitting
- Future work
  - Further testing
  - Examine whether the MML parameter estimators improve performance
  - MML and regularization

Conclusion (3) Questions?

Conclusion (4)