
1 Hierarchical Mixture of Experts Presented by Qi An Machine learning reading group Duke University 07/15/2005

2 Outline: Background; Hierarchical tree structure; Gating networks; Expert networks; EM algorithm; Experimental results; Conclusions

3 Background The idea of a mixture of experts was first presented by Jacobs and Hinton in 1988. The hierarchical mixture of experts was proposed by Jordan and Jacobs in 1994. Difference from previous mixture models: the mixing weights depend on both the input and the output.

4 Example (ME)

5 One-layer structure (Figure: a one-layer mixture of experts. Expert networks map the input x to outputs μ_1, μ_2, μ_3; a gating network maps x to proportions g_1, g_2, g_3 that blend them into the overall output μ. The gating function shown is ellipsoidal.)
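A minimal sketch of this forward pass, assuming linear experts and a plain linear-softmax gate standing in for the ellipsoidal gate in the figure (the names me_forward, U, and V are mine):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def me_forward(x, U, V):
    """One-layer mixture of experts.

    x : (d,) input, assumed to carry a trailing bias entry
    U : (K, d) expert weights; expert k outputs mu_k = U[k] @ x
    V : (K, d) gating weights (a linear softmax gate for simplicity)
    """
    mu = U @ x           # expert outputs mu_1 .. mu_K
    g = softmax(V @ x)   # gating proportions g_1 .. g_K (sum to 1)
    return g @ mu        # blended overall output mu

# toy usage: 3 experts, 3 features plus a bias
rng = np.random.default_rng(0)
x = np.append(rng.normal(size=3), 1.0)
U = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(me_forward(x, U, V))
```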

6 Example (HME)

7 Hierarchical tree structure (Figure: a two-level tree with expert networks at the leaves and gating networks at the internal nodes; the gating functions shown are linear.)

8 Expert network At the leaves of the tree. Each expert forms a linear predictor ξ_ij = U_ij x and passes it through a link function f to produce its output μ_ij = f(ξ_ij). For example, f is the logistic function for binary classification.
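A minimal sketch of one expert's computation (the function name and the link flag are mine):

```python
import numpy as np

def expert_output(x, U_ij, link="identity"):
    """Output of expert (i, j): a linear predictor through a link function.

    The identity link suits regression; the logistic link suits
    binary classification, as the slide notes.
    """
    xi = U_ij @ x                          # linear predictor
    if link == "logistic":
        return 1.0 / (1.0 + np.exp(-xi))   # logistic link
    return xi                              # identity link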

9 Gating network At the nonterminal nodes of the tree. Top layer: g_i = exp(ξ_i) / Σ_k exp(ξ_k), with ξ_i = v_i^T x. Other layers: g_{j|i} = exp(ξ_ij) / Σ_k exp(ξ_ik), with ξ_ij = v_ij^T x.

10 Output At the non-leaf nodes. Top node: μ = Σ_i g_i μ_i. Other nodes: μ_i = Σ_j g_{j|i} μ_ij.
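Putting slides 9 and 10 together, a minimal sketch of the full two-level forward pass (the names hme_forward, V_top, V_low, and U are my assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hme_forward(x, V_top, V_low, U):
    """Two-level HME forward pass.

    V_top : (I, d)     top gating weights   -> g_i
    V_low : (I, J, d)  lower gating weights -> g_{j|i}
    U     : (I, J, d)  expert weights       -> mu_ij = U[i, j] @ x
    """
    g_i = softmax(V_top @ x)            # top gate, shape (I,)
    g_ji = softmax(V_low @ x, axis=-1)  # g_{j|i}, normalized over j
    mu_ij = U @ x                       # expert outputs, shape (I, J)
    mu_i = (g_ji * mu_ij).sum(axis=-1)  # blend within each branch
    return g_i @ mu_i                   # blend at the top node
```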

11 Probability model For each expert, assume the true output y is drawn from a distribution P with mean μ_ij. The total probability of generating y from x is therefore P(y | x) = Σ_i g_i Σ_j g_{j|i} P_ij(y).

12 Posterior probabilities Since g_{j|i} and g_i are computed from the input x alone, we refer to them as prior probabilities. Given both the input x and the output y, Bayes' rule defines the posterior probabilities h_i = g_i Σ_j g_{j|i} P_ij(y) / Σ_i g_i Σ_j g_{j|i} P_ij(y) and h_{j|i} = g_{j|i} P_ij(y) / Σ_j g_{j|i} P_ij(y).
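A sketch of these formulas, assuming Gaussian experts P_ij(y) = N(y; μ_ij, σ²) (the model admits other expert distributions; function names are mine):

```python
import numpy as np

def gauss_pdf(y, mu, sigma):
    """Gaussian density N(y; mu, sigma^2), applied elementwise."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posteriors(y, g_i, g_ji, mu_ij, sigma=1.0):
    """Posterior branch/expert probabilities via Bayes' rule.

    g_i   : (I,)   prior top-level gates (functions of x alone)
    g_ji  : (I, J) prior lower-level gates g_{j|i}
    mu_ij : (I, J) expert means
    Returns h_i, h_{j|i}, and the total likelihood P(y|x).
    """
    P = gauss_pdf(y, mu_ij, sigma)     # P_ij(y), shape (I, J)
    branch = (g_ji * P).sum(axis=1)    # sum_j g_{j|i} P_ij(y), shape (I,)
    total = g_i @ branch               # P(y|x) = sum_i g_i sum_j g_{j|i} P_ij(y)
    h_i = g_i * branch / total         # posterior over top-level branches
    h_ji = g_ji * P / branch[:, None]  # posterior over experts within a branch
    return h_i, h_ji, total
```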

13 EM algorithm Introduce auxiliary variables z_ij, which can be interpreted as labels corresponding to the experts. Given the auxiliary variables, the probability model simplifies.

14 EM algorithm Complete-data likelihood: l_c(θ) = Σ_t Σ_i Σ_j z_ij^(t) [ln g_i^(t) + ln g_{j|i}^(t) + ln P_ij(y^(t))]. The E-step replaces each z_ij^(t) by its expected value, the joint posterior h_ij^(t) = h_i^(t) h_{j|i}^(t).

15 EM algorithm The M-step maximizes the expected complete-data likelihood. It decouples into separate weighted maximum-likelihood problems, one for each expert network and one for each gating network.
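For linear-Gaussian experts, each expert's M-step reduces to a weighted least-squares fit with the E-step posteriors as weights; a minimal sketch under that assumption (function name mine):

```python
import numpy as np

def m_step_expert(X, y, h):
    """M-step for one linear-Gaussian expert: weighted least squares.

    X : (T, d) inputs, y : (T,) targets,
    h : (T,) joint posteriors h_t = h_i^(t) * h_{j|i}^(t) for this expert.
    Solves min_w  sum_t h_t * (y_t - w @ x_t)^2  in closed form.
    """
    Xh = X * h[:, None]                        # weight each row by h_t
    return np.linalg.solve(Xh.T @ X, Xh.T @ y)
```

The gating networks face an analogous weighted fit, but with multinomial targets, which is where IRLS (next slide) comes in.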

16 IRLS Iteratively reweighted least squares algorithm. An iterative algorithm for computing the maximum-likelihood estimates of the parameters of a generalized linear model. A special case of the Fisher scoring method.
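A minimal sketch of IRLS for the logistic case, where each iteration solves a weighted least-squares problem on a "working response" (the small ridge stabilizer is my addition, not part of the classical algorithm):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, ridge=1e-8):
    """IRLS for logistic regression (a generalized linear model).

    X : (T, d) design matrix, y : (T,) labels in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # current predictions
        s = np.maximum(p * (1.0 - p), 1e-10)  # IRLS weights
        z = X @ w + (y - p) / s               # working response
        Xs = X * s[:, None]
        w = np.linalg.solve(Xs.T @ X + ridge * np.eye(X.shape[1]), Xs.T @ z)
    return w
```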

17 Algorithm E-step: compute the posteriors h_i^(t) and h_{j|i}^(t) for every data point. M-step: refit each expert network and each gating network by weighted maximum likelihood (using IRLS where needed), then iterate.

18 Online algorithm This algorithm can also be used for online regression. For the expert networks, the parameters are updated recursively, where R_ij is the inverse covariance matrix for EN(i,j).

19 Online algorithm For the gating networks, the updates are analogous, where S_i is the inverse covariance matrix for the top-level gating network and S_ij is the inverse covariance matrix for the lower-level gating networks.
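The slides' update equations were lost in transcription; as a stand-in, here is a generic posterior-weighted recursive least-squares update for one expert network, where P plays the role of the slide's inverse covariance matrix R_ij (class name, initialization, and forgetting factor are my assumptions, not necessarily the paper's exact recursion):

```python
import numpy as np

class RLSExpert:
    """Posterior-weighted recursive least squares for one linear expert.

    P estimates the inverse covariance matrix (the slide's R_ij);
    each sample's influence is scaled by its posterior weight h.
    NOTE: a generic RLS sketch, not the paper's exact recursion.
    """
    def __init__(self, d, lam=1.0):
        self.w = np.zeros(d)       # expert weights
        self.P = 1e3 * np.eye(d)   # inverse covariance estimate
        self.lam = lam             # forgetting factor (1.0 = no decay)

    def update(self, x, y, h):
        Px = self.P @ x
        k = h * Px / (self.lam + h * (x @ Px))  # gain vector
        self.w = self.w + k * (y - self.w @ x)  # correct toward the target
        self.P = (self.P - np.outer(k, Px)) / self.lam

    def predict(self, x):
        return self.w @ x
```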

20 Results Simulated data from a four-joint robot arm moving in three-dimensional space

21 Results

22 Conclusions Introduces a tree-structured architecture for supervised learning. Trains much faster than the traditional back-propagation algorithm. Can be used for on-line learning.

23 Thank you Questions?

