Supervised learning: Mixture Of Experts (MOE) Network.

MOE Module
[Figure: an MOE module. A gating network and several local experts all receive the input x; expert j produces P(y | x, Θ_j) and the gating network produces the corresponding weights a_j(x).]

For a given input x, the posterior probability of generating class y given x using K experts can be computed as

P(y | x, Φ) = Σ_j P(y | x, Θ_j) a_j(x)

The objective is to estimate the model parameters so as to attain the highest probability of the training set given the estimated parameters.
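As a small illustration (not part of the original slides, with made-up numbers), the mixture posterior is simply a gating-weighted average of the expert posteriors:

```python
import numpy as np

# Evaluate P(y | x, Phi) for one input, assuming the K expert posteriors
# P(y | x, Theta_j) and the gating weights a_j(x) are already available.
expert_posteriors = np.array([0.9, 0.4, 0.1])  # P(y | x, Theta_j) for K = 3 experts
gating_weights = np.array([0.7, 0.2, 0.1])     # a_j(x): non-negative, sums to 1

mixture_posterior = gating_weights @ expert_posteriors  # sum_j P(y | x, Theta_j) a_j(x)
print(mixture_posterior)                                # 0.72
```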

Each RBF Gaussian kernel can be viewed as a local expert.
[Figure: an RBF network with its Gaussian kernels acting as local experts; diagram labels: GatingNET, MAXNET.]

MOE Classifier
[Figure: each expert E_k outputs class posteriors P(ω_c | x, E_k); the gating network supplies P(E_k | x); the combined posterior Σ_k P(E_k | x) P(ω_c | x, E_k) feeds a MAXNET, which outputs the winning class ω_winner.]
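A minimal sketch of the combination and MAXNET stages, using illustrative numbers that are not from the slides:

```python
import numpy as np

# Rows: experts E_k; columns: classes omega_c.
expert_class_posteriors = np.array([
    [0.8, 0.1, 0.1],   # P(omega_c | x, E_1)
    [0.3, 0.6, 0.1],   # P(omega_c | x, E_2)
])
gating = np.array([0.6, 0.4])                 # P(E_k | x)

combined = gating @ expert_class_posteriors   # sum_k P(E_k | x) P(omega_c | x, E_k)
winner = int(np.argmax(combined))             # MAXNET: class with the largest combined posterior
print(combined, winner)                       # [0.6 0.3 0.1] -> class 0
```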

Mixture of Experts
The MOE [Jacobs91] exhibits an explicit relationship with statistical pattern classification methods as well as a close resemblance to fuzzy inference systems. Given a pattern, each expert network estimates the pattern's conditional a posteriori probability on the (adaptively tuned or pre-assigned) feature space. Each local expert network performs multi-way classification over K classes by using either K independent binomial models, each modeling only one class, or one multinomial model for all classes.
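The two output parameterizations mentioned above can be sketched as follows (illustrative code, not the slides' exact formulation):

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])  # an expert's per-class scores for one input, K = 3

# (a) K independent binomial models: one sigmoid per class; outputs need not sum to 1.
per_class_binomial = 1.0 / (1.0 + np.exp(-logits))

# (b) One multinomial model: a single softmax over all K classes; outputs sum to 1.
e = np.exp(logits - logits.max())
multinomial = e / e.sum()

print(per_class_binomial, multinomial)
```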

Two Components of MOE: local experts and a gating network.

Local Experts
The design of modular neural networks hinges upon the choice of local experts. Usually, a local expert is adaptively trained to extract a certain local feature particularly relevant to its local decision. Sometimes, a local expert can instead be assigned a predetermined feature space. Based on the local feature, a local expert gives its local recommendation.

LBF vs. RBF Local Experts
LBF expert: MLP, whose units form hyperplane decision boundaries.
RBF expert: RBF network, whose units are kernel functions.
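A small sketch contrasting the two unit types (parameters are made up, not from the slides):

```python
import numpy as np

x = np.array([1.0, 2.0])

# LBF unit (as in an MLP): response driven by the projection onto a hyperplane w.x + b.
w, b = np.array([0.5, -0.3]), 0.1
lbf_response = np.tanh(w @ x + b)

# RBF unit: response driven by the radial distance to a centre c (Gaussian kernel).
c, sigma = np.array([0.0, 1.5]), 1.0
rbf_response = np.exp(-np.sum((x - c) ** 2) / (2.0 * sigma ** 2))

print(lbf_response, rbf_response)
```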

Mixture of Experts
[Figure: a two-class data set, Class 1 vs. Class 2.]

Mixture of Experts
[Figure: the regions of the same data handled by Expert #1 and Expert #2.]

Gating Network
The gating network computes the proper weights to be used for the final weighted decision. A probabilistic rule is used to integrate the recommendations from several local experts, taking into account the experts' confidence levels.
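A minimal sketch of a linear softmax gating network (an assumed form; the slides do not fix the architecture), producing K non-negative weights that sum to one:

```python
import numpy as np

def gating_weights(x, V, c):
    """V: (K, d) weights, c: (K,) biases for K experts; returns a_j(x), summing to 1."""
    scores = V @ x + c
    scores = scores - scores.max()   # subtract the max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

x = np.array([0.2, -1.0, 0.5])
V = np.zeros((3, 3))                 # with zero weights, every expert gets weight 1/3
c = np.zeros(3)
print(gating_weights(x, V, c))       # [0.333... 0.333... 0.333...]
```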

The training of the local experts as well as (the confidence levels in) the gating network of the MOE network is based on the expectation-maximization (EM) algorithm.
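As a rough sketch of how EM applies here (notation follows the mixture formula above; this is an outline, not the slides' derivation), the E-step computes each expert's responsibility for each training pattern, and the M-step refits the experts and the gating network using those responsibilities as weights:

```python
import numpy as np

def e_step(gating, expert_likelihoods):
    """
    gating:             (N, K) array of a_j(x_n) for N training patterns and K experts.
    expert_likelihoods: (N, K) array of P(y_n | x_n, Theta_j).
    Returns h[n, j], the posterior probability that expert j generated pattern n.
    """
    joint = gating * expert_likelihoods              # a_j(x_n) * P(y_n | x_n, Theta_j)
    return joint / joint.sum(axis=1, keepdims=True)  # normalize over the K experts

# M-step (outline): refit each expert j by maximum likelihood with pattern n weighted
# by h[n, j], and retrain the gating network so that a_j(x_n) tracks h[n, j].
```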