Hierarchical Mixture of Experts
Presented by Qi An
Machine Learning Reading Group, Duke University
07/15/2005
Outline
- Background
- Hierarchical tree structure
- Gating networks
- Expert networks
- E-M algorithm
- Experimental results
- Conclusions
Background
- The idea of the mixture of experts (ME): first presented by Jacobs and Hinton in 1988
- The hierarchical mixture of experts (HME): proposed by Jordan and Jacobs in 1994
- Difference from earlier mixture models: the mixing weights depend on the input (rather than being fixed constants), and the posterior weights used in training depend on the output as well
Example (ME)
[Figure: example of a one-level mixture-of-experts architecture; image not transcribed.]
One-layer structure
[Figure: a one-level mixture of experts. Expert networks map the input x to outputs μ1, μ2, μ3; a gating network maps x to weights g1, g2, g3 that blend the expert outputs into the overall output μ. The slide notes an ellipsoidal gating function.]
Example (HME)
[Figure: example of a hierarchical mixture-of-experts architecture; image not transcribed.]
Hierarchical tree structure
[Figure: a two-level HME tree, with a linear gating function at each nonterminal node.]
Expert network
- At the leaves of the tree; for each expert (i, j):
  linear predictor: $\xi_{ij} = U_{ij} x$
  output of the expert: $\mu_{ij} = f(\xi_{ij})$, where $f$ is a link function
- For example: the logistic function for binary classification
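To make this concrete, here is a minimal numpy sketch of a single expert network: a linear predictor followed by a link function. The function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def expert_output(U, x, link="logistic"):
    """One expert: linear predictor U @ x passed through a link function.

    U: (output_dim, input_dim) expert weight matrix (append a constant 1
    to x if a bias term is wanted). link: "logistic" for binary
    classification; anything else gives the identity link (regression).
    """
    xi = U @ x                            # linear predictor xi_ij = U_ij x
    if link == "logistic":
        return 1.0 / (1.0 + np.exp(-xi))  # logistic link
    return xi                             # identity link
```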
Gating network
- At the nonterminal nodes of the tree
- Top layer: $g_i = \frac{e^{\xi_i}}{\sum_k e^{\xi_k}}$, where $\xi_i = v_i^T x$
- Other layers: $g_{j|i} = \frac{e^{\xi_{ij}}}{\sum_k e^{\xi_{ik}}}$, where $\xi_{ij} = v_{ij}^T x$
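The softmax gate above is short to write down; a sketch (names assumed):

```python
import numpy as np

def gating_weights(V, x):
    """Softmax gate at one node: g_i = exp(v_i^T x) / sum_k exp(v_k^T x).

    V: (n_branches, input_dim) with one row v_i per branch.
    Returns nonnegative weights that sum to 1.
    """
    xi = V @ x
    xi = xi - xi.max()   # subtract the max for numerical stability
    e = np.exp(xi)
    return e / e.sum()
```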
Output
- At the non-leaf nodes
- Top node: $\mu = \sum_i g_i \mu_i$
- Other nodes: $\mu_i = \sum_j g_{j|i} \mu_{ij}$
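Putting the two previous pieces together, a sketch of the full forward pass for a two-level HME with scalar linear-regression experts (all shapes and names are assumptions for illustration):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def hme_forward(V_top, V_low, U, x):
    """mu = sum_i g_i * (sum_j g_{j|i} * mu_ij) for a two-level tree.

    V_top: (m, d) top-level gating params; V_low: (m, m2, d) second-level
    gating params; U: (m, m2, d) one weight vector per expert (identity link).
    """
    g_top = softmax(V_top @ x)        # g_i
    mu = 0.0
    for i, gi in enumerate(g_top):
        g_low = softmax(V_low[i] @ x) # g_{j|i}
        mu_i = g_low @ (U[i] @ x)     # mu_i = sum_j g_{j|i} mu_ij
        mu += gi * mu_i               # top node: mu = sum_i g_i mu_i
    return mu
```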
Probability model
- For each expert, assume the true output y is drawn from a distribution $P_{ij}$ with mean $\mu_{ij}$
- Therefore, the total probability of generating y from x is given by
  $P(y|x) = \sum_i g_i(x) \sum_j g_{j|i}(x)\, P_{ij}(y|x)$
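As an illustration, the mixture density under the common assumption of Gaussian experts with a shared variance (the model allows other expert distributions; names and shapes are assumed):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def hme_density(y, x, V_top, V_low, U, sigma=1.0):
    """P(y|x) = sum_i g_i(x) sum_j g_{j|i}(x) P_ij(y|x), with Gaussian
    experts P_ij = N(y; U[i, j] @ x, sigma^2)."""
    g_top = softmax(V_top @ x)
    total = 0.0
    for i, gi in enumerate(g_top):
        g_low = softmax(V_low[i] @ x)
        mu_ij = U[i] @ x                         # expert means
        p_ij = (np.exp(-0.5 * ((y - mu_ij) / sigma) ** 2)
                / (np.sqrt(2.0 * np.pi) * sigma))
        total += gi * (g_low @ p_ij)
    return total
```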
Posterior probabilities
- Since the $g_{j|i}$ and $g_i$ are computed from the input x alone, we refer to them as prior probabilities.
- With knowledge of both the input x and the output y, we can define posterior probabilities using Bayes' rule:
  $h_i = \frac{g_i \sum_j g_{j|i} P_{ij}(y)}{\sum_k g_k \sum_l g_{l|k} P_{kl}(y)}$,  $h_{j|i} = \frac{g_{j|i} P_{ij}(y)}{\sum_l g_{l|i} P_{il}(y)}$,  $h_{ij} = h_i\, h_{j|i}$
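A sketch of the Bayes'-rule computation, again assuming Gaussian experts (common normalizing constants cancel in the ratio):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def hme_posteriors(y, x, V_top, V_low, U, sigma=1.0):
    """Joint posteriors h_ij = g_i g_{j|i} P_ij(y) / P(y|x) and the
    top-level marginals h_i = sum_j h_ij."""
    g_top = softmax(V_top @ x)
    joint = np.empty(U.shape[:2])
    for i, gi in enumerate(g_top):
        g_low = softmax(V_low[i] @ x)
        mu_ij = U[i] @ x
        p_ij = np.exp(-0.5 * ((y - mu_ij) / sigma) ** 2)  # constants cancel
        joint[i] = gi * g_low * p_ij
    h_ij = joint / joint.sum()     # normalize by the total probability of y
    return h_ij.sum(axis=1), h_ij  # (h_i, h_ij)
```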
E-M algorithm
- Introduce auxiliary variables $z_{ij}$, which can be interpreted as indicator labels for the experts: for each data point, exactly one $z_{ij} = 1$ and the rest are 0.
- With knowledge of the auxiliary variables, the probability model simplifies to
  $P(y, z \mid x) = \prod_i \prod_j \left[ g_i\, g_{j|i}\, P_{ij}(y) \right]^{z_{ij}}$
E-M algorithm
- Complete-data log-likelihood:
  $l_c(\theta) = \sum_t \sum_i \sum_j z_{ij}^{(t)} \left[ \ln g_i^{(t)} + \ln g_{j|i}^{(t)} + \ln P_{ij}(y^{(t)}) \right]$
- The E-step replaces each $z_{ij}^{(t)}$ with its expected value, the posterior $h_{ij}^{(t)}$
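In code, the E-step is just the posterior computation applied to every training pair; a sketch (posterior_fn stands for something like the hme_posteriors sketch above):

```python
import numpy as np

def e_step(X, Y, posterior_fn):
    """Replace each indicator z_ij^(t) by its expectation h_ij^(t).

    posterior_fn(y, x) -> (m, m2) array of joint posteriors for one pair.
    Returns an (n, m, m2) array of responsibilities for the M-step.
    """
    return np.stack([posterior_fn(y, x) for x, y in zip(X, Y)])
```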
E-M algorithm
- The M-step decouples into separate weighted maximum-likelihood problems:
  experts: $\theta_{ij} \leftarrow \arg\max_{\theta_{ij}} \sum_t h_{ij}^{(t)} \ln P_{ij}(y^{(t)})$
  top-level gating: $v_i \leftarrow \arg\max_{v} \sum_t \sum_k h_k^{(t)} \ln g_k^{(t)}$
  lower-level gating: $v_{ij} \leftarrow \arg\max_{v} \sum_t \sum_k h_i^{(t)} h_{k|i}^{(t)} \ln g_{k|i}^{(t)}$
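For Gaussian experts, the expert part of the M-step is a posterior-weighted least-squares problem with a closed-form solution; a sketch for a single expert (the gating fits are analogous IRLS problems, see the next slide):

```python
import numpy as np

def m_step_expert(X, Y, h):
    """Solve min_u sum_t h^(t) * (y^(t) - u^T x^(t))^2 in closed form.

    X: (n, d) inputs; Y: (n,) targets; h: (n,) posteriors h_ij^(t)
    for this expert from the E-step.
    """
    sw = np.sqrt(h)   # weighted least squares via sqrt-reweighting
    u, *_ = np.linalg.lstsq(sw[:, None] * X, sw * Y, rcond=None)
    return u
```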
IRLS
- Iteratively reweighted least squares algorithm
- An iterative algorithm for computing the maximum-likelihood estimates of the parameters of a generalized linear model
- A special case of the Fisher scoring method
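A minimal sketch of IRLS for case-weighted logistic regression, the building block used to fit the gating networks; in the HME M-step the targets t are posterior probabilities rather than hard labels:

```python
import numpy as np

def irls_logistic(X, t, h=None, n_iter=20):
    """Fisher-scoring / Newton updates for logistic regression.

    X: (n, d) inputs; t: (n,) targets in [0, 1]; h: (n,) optional
    case weights (e.g. posteriors from the E-step).
    """
    n, d = X.shape
    h = np.ones(n) if h is None else h
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # current predictions
        s = h * p * (1.0 - p) + 1e-10       # Fisher weights S (kept > 0)
        grad = X.T @ (h * (t - p))          # gradient of the log-likelihood
        H = X.T @ (s[:, None] * X)          # Fisher information X^T S X
        w = w + np.linalg.solve(H, grad)    # w <- w + H^{-1} grad
    return w
```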
Algorithm
- E-step: compute the posterior probabilities $h_i^{(t)}$, $h_{j|i}^{(t)}$, and $h_{ij}^{(t)}$ for every data point
- M-step: solve the weighted maximum-likelihood problems for the expert and gating networks by IRLS, using the posteriors as observation weights
Online algorithm
- This algorithm can also be used for online regression, processing one data point at a time
- For the expert networks: the parameters are updated with a posterior-weighted recursive least-squares step, where $R_{ij}$ is the inverse covariance matrix for expert network EN(i,j)
Online algorithm
- For the gating networks: the updates are analogous, where $S_i$ is the inverse covariance matrix for the top-level gating network and $S_{ij}$ is the inverse covariance matrix for the gating network at node (i, j)
Results
Simulated data of a four-joint robot arm moving in three-dimensional space
Conclusions
- Introduced a tree-structured architecture for supervised learning
- Trains much faster than the traditional back-propagation algorithm
- Can be used for on-line learning
Thank you. Questions?