Hierarchical Mixture of Experts
Presented by Qi An
Machine Learning Reading Group, Duke University
07/15/2005
Outline
- Background
- Hierarchical tree structure
- Gating networks
- Expert networks
- E-M algorithm
- Experimental results
- Conclusions
Background
- The idea of the mixture of experts (ME): first presented by Jacobs and Hinton in 1988
- The hierarchical mixture of experts (HME): proposed by Jordan and Jacobs in 1994
- Difference from earlier mixture models: the mixing weights depend on the input (rather than being fixed constants), and the posterior weights used in training depend on the output as well
Example (ME)
[Figure: example of a one-level mixture-of-experts architecture; image not transcribed.]
One-layer structure
[Figure: a one-level mixture of experts. Expert networks map the input x to outputs μ1, μ2, μ3; a gating network maps x to weights g1, g2, g3 that blend the expert outputs into the overall output μ. The slide notes an ellipsoidal gating function.]
Example (HME)
[Figure: example of a hierarchical mixture-of-experts architecture; image not transcribed.]
Hierarchical tree structure
[Figure: a two-level HME tree, with a linear gating function at each nonterminal node.]
Expert network
- At the leaves of the tree; for each expert (i, j):
  linear predictor: $\xi_{ij} = U_{ij} x$
  output of the expert: $\mu_{ij} = f(\xi_{ij})$, where $f$ is a link function
- For example: the logistic function for binary classification
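To make this concrete, here is a minimal numpy sketch of a single expert network: a linear predictor followed by a link function. The function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def expert_output(U, x, link="logistic"):
    """One expert: linear predictor U @ x passed through a link function.

    U: (output_dim, input_dim) expert weight matrix (append a constant 1
    to x if a bias term is wanted). link: "logistic" for binary
    classification; anything else gives the identity link (regression).
    """
    xi = U @ x                            # linear predictor xi_ij = U_ij x
    if link == "logistic":
        return 1.0 / (1.0 + np.exp(-xi))  # logistic link
    return xi                             # identity link
```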
Gating network
- At the nonterminal nodes of the tree
- Top layer: $g_i = \frac{e^{\xi_i}}{\sum_k e^{\xi_k}}$, where $\xi_i = v_i^T x$
- Other layers: $g_{j|i} = \frac{e^{\xi_{ij}}}{\sum_k e^{\xi_{ik}}}$, where $\xi_{ij} = v_{ij}^T x$
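The softmax gate above is short to write down; a sketch (names assumed):

```python
import numpy as np

def gating_weights(V, x):
    """Softmax gate at one node: g_i = exp(v_i^T x) / sum_k exp(v_k^T x).

    V: (n_branches, input_dim) with one row v_i per branch.
    Returns nonnegative weights that sum to 1.
    """
    xi = V @ x
    xi = xi - xi.max()   # subtract the max for numerical stability
    e = np.exp(xi)
    return e / e.sum()
```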
Output
- At the non-leaf nodes
- Top node: $\mu = \sum_i g_i \mu_i$
- Other nodes: $\mu_i = \sum_j g_{j|i} \mu_{ij}$
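Putting the two previous pieces together, a sketch of the full forward pass for a two-level HME with scalar linear-regression experts (all shapes and names are assumptions for illustration):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def hme_forward(V_top, V_low, U, x):
    """mu = sum_i g_i * (sum_j g_{j|i} * mu_ij) for a two-level tree.

    V_top: (m, d) top-level gating params; V_low: (m, m2, d) second-level
    gating params; U: (m, m2, d) one weight vector per expert (identity link).
    """
    g_top = softmax(V_top @ x)        # g_i
    mu = 0.0
    for i, gi in enumerate(g_top):
        g_low = softmax(V_low[i] @ x) # g_{j|i}
        mu_i = g_low @ (U[i] @ x)     # mu_i = sum_j g_{j|i} mu_ij
        mu += gi * mu_i               # top node: mu = sum_i g_i mu_i
    return mu
```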
Probability model
- For each expert, assume the true output y is drawn from a distribution $P_{ij}$ with mean $\mu_{ij}$
- Therefore, the total probability of generating y from x is given by
  $P(y|x) = \sum_i g_i(x) \sum_j g_{j|i}(x)\, P_{ij}(y|x)$
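As an illustration, the mixture density under the common assumption of Gaussian experts with a shared variance (the model allows other expert distributions; names and shapes are assumed):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def hme_density(y, x, V_top, V_low, U, sigma=1.0):
    """P(y|x) = sum_i g_i(x) sum_j g_{j|i}(x) P_ij(y|x), with Gaussian
    experts P_ij = N(y; U[i, j] @ x, sigma^2)."""
    g_top = softmax(V_top @ x)
    total = 0.0
    for i, gi in enumerate(g_top):
        g_low = softmax(V_low[i] @ x)
        mu_ij = U[i] @ x                         # expert means
        p_ij = (np.exp(-0.5 * ((y - mu_ij) / sigma) ** 2)
                / (np.sqrt(2.0 * np.pi) * sigma))
        total += gi * (g_low @ p_ij)
    return total
```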
Posterior probabilities
- Since the $g_{j|i}$ and $g_i$ are computed from the input x alone, we refer to them as prior probabilities.
- With knowledge of both the input x and the output y, we can define posterior probabilities using Bayes' rule:
  $h_i = \frac{g_i \sum_j g_{j|i} P_{ij}(y)}{\sum_k g_k \sum_l g_{l|k} P_{kl}(y)}$,  $h_{j|i} = \frac{g_{j|i} P_{ij}(y)}{\sum_l g_{l|i} P_{il}(y)}$,  $h_{ij} = h_i\, h_{j|i}$
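A sketch of the Bayes'-rule computation, again assuming Gaussian experts (common normalizing constants cancel in the ratio):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def hme_posteriors(y, x, V_top, V_low, U, sigma=1.0):
    """Joint posteriors h_ij = g_i g_{j|i} P_ij(y) / P(y|x) and the
    top-level marginals h_i = sum_j h_ij."""
    g_top = softmax(V_top @ x)
    joint = np.empty(U.shape[:2])
    for i, gi in enumerate(g_top):
        g_low = softmax(V_low[i] @ x)
        mu_ij = U[i] @ x
        p_ij = np.exp(-0.5 * ((y - mu_ij) / sigma) ** 2)  # constants cancel
        joint[i] = gi * g_low * p_ij
    h_ij = joint / joint.sum()     # normalize by the total probability of y
    return h_ij.sum(axis=1), h_ij  # (h_i, h_ij)
```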
E-M algorithm
- Introduce auxiliary variables $z_{ij}$, which can be interpreted as indicator labels for the experts: for each data point, exactly one $z_{ij} = 1$ and the rest are 0.
- With knowledge of the auxiliary variables, the probability model simplifies to
  $P(y, z \mid x) = \prod_i \prod_j \left[ g_i\, g_{j|i}\, P_{ij}(y) \right]^{z_{ij}}$
E-M algorithm
- Complete-data log-likelihood:
  $l_c(\theta) = \sum_t \sum_i \sum_j z_{ij}^{(t)} \left[ \ln g_i^{(t)} + \ln g_{j|i}^{(t)} + \ln P_{ij}(y^{(t)}) \right]$
- The E-step replaces each $z_{ij}^{(t)}$ with its expected value, the posterior $h_{ij}^{(t)}$
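In code, the E-step is just the posterior computation applied to every training pair; a sketch (posterior_fn stands for something like the hme_posteriors sketch above):

```python
import numpy as np

def e_step(X, Y, posterior_fn):
    """Replace each indicator z_ij^(t) by its expectation h_ij^(t).

    posterior_fn(y, x) -> (m, m2) array of joint posteriors for one pair.
    Returns an (n, m, m2) array of responsibilities for the M-step.
    """
    return np.stack([posterior_fn(y, x) for x, y in zip(X, Y)])
```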
E-M algorithm
- The M-step decouples into separate weighted maximum-likelihood problems:
  experts: $\theta_{ij} \leftarrow \arg\max_{\theta_{ij}} \sum_t h_{ij}^{(t)} \ln P_{ij}(y^{(t)})$
  top-level gating: $v_i \leftarrow \arg\max_{v} \sum_t \sum_k h_k^{(t)} \ln g_k^{(t)}$
  lower-level gating: $v_{ij} \leftarrow \arg\max_{v} \sum_t \sum_k h_i^{(t)} h_{k|i}^{(t)} \ln g_{k|i}^{(t)}$
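For Gaussian experts, the expert part of the M-step is a posterior-weighted least-squares problem with a closed-form solution; a sketch for a single expert (the gating fits are analogous IRLS problems, see the next slide):

```python
import numpy as np

def m_step_expert(X, Y, h):
    """Solve min_u sum_t h^(t) * (y^(t) - u^T x^(t))^2 in closed form.

    X: (n, d) inputs; Y: (n,) targets; h: (n,) posteriors h_ij^(t)
    for this expert from the E-step.
    """
    sw = np.sqrt(h)   # weighted least squares via sqrt-reweighting
    u, *_ = np.linalg.lstsq(sw[:, None] * X, sw * Y, rcond=None)
    return u
```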
IRLS
- Iteratively reweighted least squares algorithm
- An iterative algorithm for computing the maximum-likelihood estimates of the parameters of a generalized linear model
- A special case of the Fisher scoring method
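A minimal sketch of IRLS for case-weighted logistic regression, the building block used to fit the gating networks; in the HME M-step the targets t are posterior probabilities rather than hard labels:

```python
import numpy as np

def irls_logistic(X, t, h=None, n_iter=20):
    """Fisher-scoring / Newton updates for logistic regression.

    X: (n, d) inputs; t: (n,) targets in [0, 1]; h: (n,) optional
    case weights (e.g. posteriors from the E-step).
    """
    n, d = X.shape
    h = np.ones(n) if h is None else h
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # current predictions
        s = h * p * (1.0 - p) + 1e-10       # Fisher weights S (kept > 0)
        grad = X.T @ (h * (t - p))          # gradient of the log-likelihood
        H = X.T @ (s[:, None] * X)          # Fisher information X^T S X
        w = w + np.linalg.solve(H, grad)    # w <- w + H^{-1} grad
    return w
```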
Algorithm
- E-step: compute the posterior probabilities $h_i^{(t)}$, $h_{j|i}^{(t)}$, and $h_{ij}^{(t)}$ for every data point
- M-step: solve the weighted maximum-likelihood problems for the expert and gating networks by IRLS, using the posteriors as observation weights
Online algorithm
- This algorithm can also be used for online regression, processing one data point at a time
- For the expert networks: the parameters are updated with a posterior-weighted recursive least-squares step, where $R_{ij}$ is the inverse covariance matrix for expert network EN(i,j)
Online algorithm
- For the gating networks: the updates are analogous, where $S_i$ is the inverse covariance matrix for the top-level gating network and $S_{ij}$ is the inverse covariance matrix for the gating network at node (i, j)
Results
Simulated data of a four-joint robot arm moving in three-dimensional space
Conclusions
- Introduced a tree-structured architecture for supervised learning
- Trains much faster than the traditional back-propagation algorithm
- Can be used for on-line learning
Thank you. Questions?