
1 Convex Point Estimation using Undirected Bayesian Transfer Hierarchies
Gal Elidan, Ben Packer, Geremy Heitz, Daphne Koller
Computer Science Dept., Stanford University, UAI 2008
Presented by Haojun Chen, August 1st, 2008

2 Outline
– Background and motivation
– Undirected transfer hierarchies
– Experiments
– Degree of transfer coefficients
– Experiments
– Summary

3 Background (1/2)
Transfer learning: data from "similar" tasks/distributions are used to compensate for the sparsity of training data in the primary class or task.
Example: use rhinos to help learn elephants' shape.
Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

4 Background (2/2)
Hierarchical Bayes (HB) framework: a principled approach for transfer learning.
Example of a hierarchical Bayes parameterization, where C is a set of related learning tasks/classes, D = {D_c} is the observed data, and θ_c are the task/class parameters.
The joint distribution over the observed data and all class parameters is
P(D, θ) = P(θ_root) · ∏_{c ≠ root} P(θ_c | θ_pa(c)) · ∏_c P(D_c | θ_c),
where pa(c) denotes the parent of class c in the hierarchy.
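As a concrete, simplified illustration of such a parameterization (not the paper's model), here is a minimal sketch of the HB joint for a two-level hierarchy of Gaussian means; the Gaussian choice and the names mu_root, tau, and sigma are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

def log_joint(mu_root, mu_classes, data, tau=1.0, sigma=1.0):
    """log P(D, theta) for a two-level Gaussian hierarchy:
    each class mean mu_c ~ N(mu_root, tau^2), each datum x ~ N(mu_c, sigma^2)."""
    lp = 0.0
    for mu_c, x_c in zip(mu_classes, data):
        lp += norm.logpdf(mu_c, loc=mu_root, scale=tau)      # P(theta_c | theta_root)
        lp += norm.logpdf(x_c, loc=mu_c, scale=sigma).sum()  # P(D_c | theta_c)
    return lp

# Toy usage: two related classes with very little data each.
data = [np.array([2.1, 1.9]), np.array([2.4])]
print(log_joint(mu_root=2.0, mu_classes=[2.0, 2.3], data=data))
```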

5 Motivation
In practice, MAP point estimation is desirable, since full Bayesian computations can be difficult and computationally demanding.
Efficient point estimation is not achievable in many standard hierarchical Bayes models, because many common conjugate priors, such as the Dirichlet or normal-inverse-Wishart, are not log-concave in the parameters, so the resulting MAP objective is not convex.
This paper proposes an undirected hierarchical Bayes (HB) reformulation that allows efficient point estimation.

6 Undirected HB Reformulation
Replace the directed HB joint with an undirected objective that trades off data fit against child-parent divergences:
F(θ) = F_data(D; θ) − λ · ∑_c Div(θ_c, θ_pa(c))
– F_data: data-dependent objective
– Div: divergence function over child and parent parameters
– λ → 0: encourages parameters to explain the data
– λ → ∞: encourages parameters to be similar to their parents
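A minimal sketch of how such an objective could be assembled, assuming the additive form above (data term minus weighted child-parent divergences); the function names and the L2 divergence in the toy example are illustrative choices, not the paper's exact notation.

```python
import numpy as np

def undirected_hb_objective(theta, parent_of, f_data, divergence, lam):
    """Data-fit term minus weighted child-parent divergences over the hierarchy.
    theta: dict node -> parameter vector; parent_of: dict child -> parent node."""
    score = f_data(theta)  # encourages parameters to explain the data
    for child, parent in parent_of.items():
        score -= lam * divergence(theta[child], theta[parent])  # pulls child toward parent
    return score

# Toy usage: one child under a root, L2 divergence, dummy data term.
l2 = lambda a, b: np.sum((a - b) ** 2)
f_data = lambda th: -np.sum((th["elephant"] - 1.0) ** 2)  # stand-in for a log-likelihood
theta = {"root": np.zeros(3), "elephant": np.ones(3)}
print(undirected_hb_objective(theta, {"elephant": "root"}, f_data, l2, lam=0.5))
```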

7 Purpose of Reformulation
Easy to specify
– F_data can be a likelihood, classification, or other objective
– Divergence can be the L1-norm, L2-norm, ε-insensitive loss, KL divergence, etc.
– No conjugacy or proper-prior restrictions
Easy to optimize
– Convex in θ if F_data is concave and the divergence is convex

8 Experiment: Text Categorization
Bag-of-words model, Newsgroup20 dataset
– F_data: multinomial log-likelihood (regularized), where θ_i models the frequency of word i
– Divergence: L2 norm
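A possible rendering of this slide's model in code: a multinomial log-likelihood over word counts combined with an L2 pull toward the parent's parameters. The log-space (softmax) parameterization is an assumption made so the example stays well defined; the paper's exact parameterization and regularization may differ.

```python
import numpy as np

def multinomial_log_lik(log_theta, counts):
    """Multinomial log-likelihood of word counts under softmax(log_theta)."""
    log_p = log_theta - np.logaddexp.reduce(log_theta)  # normalize in log space
    return np.dot(counts, log_p)

def bow_objective(log_theta_child, log_theta_parent, counts, lam):
    """Data fit at the child class minus an L2 divergence toward the parent."""
    fit = multinomial_log_lik(log_theta_child, counts)
    return fit - lam * np.sum((log_theta_child - log_theta_parent) ** 2)

# Toy usage on a 4-word vocabulary.
counts = np.array([5.0, 1.0, 0.0, 2.0])
child, parent = np.zeros(4), np.zeros(4)
print(bow_objective(child, parent, counts, lam=0.1))
```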

9 Text Categorization Results
[Plot: Newsgroup Topic Classification; classification rate vs. total number of training instances for Max Likelihood (no regularization), Regularized Max Likelihood, Shrinkage, and Undirected HB.]
Baselines:
– Maximum likelihood at each node (no hierarchy)
– Cross-validated regularization (no hierarchy)
– Shrinkage (McCallum et al. '98, with hierarchy)

10 Experiment: Shape Modeling (density estimation, test likelihood)
Mammals dataset (Fink, '05); instances represented by 60 x-y coordinates of landmarks on the outline.
Parameters: mean landmark location and covariance over landmarks, with regularization.
Divergence: L2 norm over mean and variance.
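A rough sketch of the shape objective described here, assuming an independent (diagonal) Gaussian over the stacked landmark coordinates; the diagonal simplification and all variable names are illustrative only, since the slide's full model uses a covariance over landmarks.

```python
import numpy as np
from scipy.stats import norm

def shape_objective(mean_c, var_c, mean_p, var_p, shapes, lam):
    """Per-coordinate Gaussian log-likelihood of the training shapes at the child class,
    minus an L2 divergence on both mean and variance toward the parent class."""
    fit = norm.logpdf(shapes, loc=mean_c, scale=np.sqrt(var_c)).sum()
    div = np.sum((mean_c - mean_p) ** 2) + np.sum((var_c - var_p) ** 2)
    return fit - lam * div

# Toy usage: 3 shapes, each with 60 x-y landmarks = 120 coordinates.
rng = np.random.default_rng(0)
shapes = rng.normal(size=(3, 120))
print(shape_objective(np.zeros(120), np.ones(120),
                      np.zeros(120), np.ones(120), shapes, lam=0.2))
```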

11 Undirected HB Shape Modeling Results
[Plot: delta log-loss per instance vs. total number of training instances, relative to Regularized Max Likelihood, for ten mammal pairs: Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino.]

12 Problem in Transfer
Not all parameters deserve equal sharing.

13 Degrees of Transfer (DOT)
The parameters are split into subcomponents with per-component weights β, so different strengths of transfer are allowed for different subcomponents and child-parent pairs.
– β → 0: forces the corresponding parameters to agree
– β → ∞: allows the parameters to be flexible
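A minimal sketch of how per-subcomponent degree-of-transfer coefficients could enter the divergence: each component's squared difference is scaled by 1/beta[i], so a small beta pins that component to the parent and a large beta leaves it free. The reciprocal scaling is an illustrative choice consistent with the limits stated on this slide, not necessarily the paper's exact formulation.

```python
import numpy as np

def dot_divergence(theta_child, theta_parent, beta):
    """Componentwise squared difference scaled by per-component DOT coefficients.
    beta[i] -> 0 forces component i to agree with the parent; beta[i] -> inf frees it."""
    return np.sum((theta_child - theta_parent) ** 2 / beta)

# Toy usage: the first component is shared strongly, the second barely at all.
theta_c, theta_p = np.array([1.0, 1.0]), np.array([0.0, 0.0])
print(dot_divergence(theta_c, theta_p, beta=np.array([0.1, 10.0])))
```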

14 Estimation of DOT Parameters
Hyper-prior approach
– Bayesian idea: put a prior on the DOT coefficients β and add them as parameters to the optimization along with θ
– Concretely: an inverse-Gamma prior on the degree of transfer (forces β to be positive)
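A sketch of the hyper-prior idea, assuming the divergence is scaled by 1/beta and an inverse-Gamma log-prior on beta is added to the objective so beta can be optimized jointly with theta; the hyper-parameters a and b and the exact functional form are assumptions for illustration.

```python
import numpy as np
from scipy.stats import invgamma

def dot_objective(theta_c, theta_p, beta, f_data, a=2.0, b=1.0):
    """Data fit, minus divergence scaled by the DOT coefficients, plus an
    inverse-Gamma log-prior that keeps each beta positive and moderately sized."""
    div = np.sum((theta_c - theta_p) ** 2 / beta)
    log_prior = invgamma.logpdf(beta, a, scale=b).sum()
    return f_data(theta_c) - div + log_prior

# Toy usage: dummy data term and two parameter components.
f_data = lambda th: -np.sum((th - 1.0) ** 2)
print(dot_objective(np.ones(2), np.zeros(2),
                    beta=np.array([0.5, 2.0]), f_data=f_data))
```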

15 DOT Shape Modeling Result
[Plot: delta log-loss per instance vs. total number of training instances for the Hyperprior DOT approach, relative to Regularized Max Likelihood, for the same ten mammal pairs: Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino.]

16 Distribution of DOT coefficients
[Histogram: distribution of the DOT coefficients (1/β) learned with the hyperprior approach, with the axis running from stronger transfer to weaker transfer; the coefficient at the root is marked.]

17 Summary
– Undirected reformulation of the hierarchical Bayes framework is proposed for efficient convex point estimation
– Different degrees of transfer for different parameters are introduced so that some parts of the distribution can be transferred to a greater extent than others

