
1 Convex Point Estimation using Undirected Bayesian Transfer Hierarchies
Gal Elidan, Ben Packer, Geremy Heitz, Daphne Koller
Computer Science Dept., Stanford University, UAI 2008
Presented by Haojun Chen, August 1st, 2008

2 Outline
– Background and motivation
– Undirected transfer hierarchies
– Experiments
– Degree of transfer coefficients
– Experiments
– Summary

3 Background (1/2)
Transfer learning: data from "similar" tasks/distributions are used to compensate for the sparsity of training data in the primary class or task.
Example: use rhinos to help learn elephants' shape.
Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt

4 Background (2/2)
Hierarchical Bayes (HB) framework: a principled approach for transfer learning.
Example of a hierarchical Bayes parameterization, where C is a set of related learning tasks/classes, D = {D_c} is the observed data, and θ_c are the task/class parameters.
The joint distribution over the observed data and all class parameters is
P(D, θ) = P(θ_root) · ∏_{c ≠ root} P(θ_c | θ_pa(c)) · ∏_c P(D_c | θ_c),
where pa(c) denotes the parent of class c in the hierarchy.
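As a concrete, simplified illustration of such a parameterization (not the paper's model), here is a minimal sketch of the HB joint for a two-level hierarchy of Gaussian means; the Gaussian choice and the names mu_root, tau, and sigma are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

def log_joint(mu_root, mu_classes, data, tau=1.0, sigma=1.0):
    """log P(D, theta) for a two-level Gaussian hierarchy:
    each class mean mu_c ~ N(mu_root, tau^2), each datum x ~ N(mu_c, sigma^2)."""
    lp = 0.0
    for mu_c, x_c in zip(mu_classes, data):
        lp += norm.logpdf(mu_c, loc=mu_root, scale=tau)      # P(theta_c | theta_root)
        lp += norm.logpdf(x_c, loc=mu_c, scale=sigma).sum()  # P(D_c | theta_c)
    return lp

# Toy usage: two related classes with very little data each.
data = [np.array([2.1, 1.9]), np.array([2.4])]
print(log_joint(mu_root=2.0, mu_classes=[2.0, 2.3], data=data))
```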

5 Motivation
In practice, MAP point estimation is desirable, since full Bayesian computations can be difficult and computationally demanding.
Efficient point estimation is not achievable in many standard hierarchical Bayes models, because many common conjugate priors, such as the Dirichlet or normal-inverse-Wishart, are not log-concave in the parameters, so the resulting MAP objective is not convex.
This paper proposes an undirected hierarchical Bayes (HB) reformulation that allows efficient point estimation.

6 Undirected HB Reformulation
Replace the directed HB joint with an undirected objective that trades off data fit against child-parent divergences:
F(θ) = F_data(D; θ) − λ · ∑_c Div(θ_c, θ_pa(c))
– F_data: data-dependent objective
– Div: divergence function over child and parent parameters
– λ → 0: encourages parameters to explain the data
– λ → ∞: encourages parameters to be similar to their parents
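A minimal sketch of how such an objective could be assembled, assuming the additive form above (data term minus weighted child-parent divergences); the function names and the L2 divergence in the toy example are illustrative choices, not the paper's exact notation.

```python
import numpy as np

def undirected_hb_objective(theta, parent_of, f_data, divergence, lam):
    """Data-fit term minus weighted child-parent divergences over the hierarchy.
    theta: dict node -> parameter vector; parent_of: dict child -> parent node."""
    score = f_data(theta)  # encourages parameters to explain the data
    for child, parent in parent_of.items():
        score -= lam * divergence(theta[child], theta[parent])  # pulls child toward parent
    return score

# Toy usage: one child under a root, L2 divergence, dummy data term.
l2 = lambda a, b: np.sum((a - b) ** 2)
f_data = lambda th: -np.sum((th["elephant"] - 1.0) ** 2)  # stand-in for a log-likelihood
theta = {"root": np.zeros(3), "elephant": np.ones(3)}
print(undirected_hb_objective(theta, {"elephant": "root"}, f_data, l2, lam=0.5))
```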

7 Purpose of Reformulation
Easy to specify
– F_data can be a likelihood, classification, or other objective
– Divergence can be the L1-norm, L2-norm, ε-insensitive loss, KL divergence, etc.
– No conjugacy or proper-prior restrictions
Easy to optimize
– Convex in θ if F_data is concave and the divergence is convex

8 Experiment: Text Categorization
Bag-of-words model, Newsgroup20 dataset
– F_data: multinomial log-likelihood (regularized), where θ_i models the frequency of word i
– Divergence: L2 norm
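A possible rendering of this slide's model in code: a multinomial log-likelihood over word counts combined with an L2 pull toward the parent's parameters. The log-space (softmax) parameterization is an assumption made so the example stays well defined; the paper's exact parameterization and regularization may differ.

```python
import numpy as np

def multinomial_log_lik(log_theta, counts):
    """Multinomial log-likelihood of word counts under softmax(log_theta)."""
    log_p = log_theta - np.logaddexp.reduce(log_theta)  # normalize in log space
    return np.dot(counts, log_p)

def bow_objective(log_theta_child, log_theta_parent, counts, lam):
    """Data fit at the child class minus an L2 divergence toward the parent."""
    fit = multinomial_log_lik(log_theta_child, counts)
    return fit - lam * np.sum((log_theta_child - log_theta_parent) ** 2)

# Toy usage on a 4-word vocabulary.
counts = np.array([5.0, 1.0, 0.0, 2.0])
child, parent = np.zeros(4), np.zeros(4)
print(bow_objective(child, parent, counts, lam=0.1))
```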

9 Text Categorization Results
[Plot: Newsgroup Topic Classification; classification rate vs. total number of training instances for Max Likelihood (no regularization), Regularized Max Likelihood, Shrinkage, and Undirected HB.]
Baselines:
– Maximum likelihood at each node (no hierarchy)
– Cross-validated regularization (no hierarchy)
– Shrinkage (McCallum et al. '98, with hierarchy)

10 Experiment: Shape Modeling (density estimation, test likelihood)
Mammals dataset (Fink, '05); instances represented by 60 x-y coordinates of landmarks on the outline.
Parameters: mean landmark location and covariance over landmarks, with regularization.
Divergence: L2 norm over mean and variance.
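A rough sketch of the shape objective described here, assuming an independent (diagonal) Gaussian over the stacked landmark coordinates; the diagonal simplification and all variable names are illustrative only, since the slide's full model uses a covariance over landmarks.

```python
import numpy as np
from scipy.stats import norm

def shape_objective(mean_c, var_c, mean_p, var_p, shapes, lam):
    """Per-coordinate Gaussian log-likelihood of the training shapes at the child class,
    minus an L2 divergence on both mean and variance toward the parent class."""
    fit = norm.logpdf(shapes, loc=mean_c, scale=np.sqrt(var_c)).sum()
    div = np.sum((mean_c - mean_p) ** 2) + np.sum((var_c - var_p) ** 2)
    return fit - lam * div

# Toy usage: 3 shapes, each with 60 x-y landmarks = 120 coordinates.
rng = np.random.default_rng(0)
shapes = rng.normal(size=(3, 120))
print(shape_objective(np.zeros(120), np.ones(120),
                      np.zeros(120), np.ones(120), shapes, lam=0.2))
```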

11 Undirected HB Shape Modeling Results
[Plot: delta log-loss per instance vs. total number of training instances, relative to Regularized Max Likelihood, for ten mammal pairs: Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino.]

12 Problem in Transfer
Not all parameters deserve equal sharing.

13 Degrees of Transfer (DOT)
The parameters are split into subcomponents with per-component weights β, so different strengths of transfer are allowed for different subcomponents and child-parent pairs.
– β → 0: forces the corresponding parameters to agree
– β → ∞: allows the parameters to be flexible
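A minimal sketch of how per-subcomponent degree-of-transfer coefficients could enter the divergence: each component's squared difference is scaled by 1/beta[i], so a small beta pins that component to the parent and a large beta leaves it free. The reciprocal scaling is an illustrative choice consistent with the limits stated on this slide, not necessarily the paper's exact formulation.

```python
import numpy as np

def dot_divergence(theta_child, theta_parent, beta):
    """Componentwise squared difference scaled by per-component DOT coefficients.
    beta[i] -> 0 forces component i to agree with the parent; beta[i] -> inf frees it."""
    return np.sum((theta_child - theta_parent) ** 2 / beta)

# Toy usage: the first component is shared strongly, the second barely at all.
theta_c, theta_p = np.array([1.0, 1.0]), np.array([0.0, 0.0])
print(dot_divergence(theta_c, theta_p, beta=np.array([0.1, 10.0])))
```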

14 Estimation of DOT Parameters
Hyper-prior approach
– Bayesian idea: put a prior on the DOT coefficients β and add them as parameters to the optimization along with θ
– Concretely: an inverse-Gamma prior on the degree of transfer (forces β to be positive)
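A sketch of the hyper-prior idea, assuming the divergence is scaled by 1/beta and an inverse-Gamma log-prior on beta is added to the objective so beta can be optimized jointly with theta; the hyper-parameters a and b and the exact functional form are assumptions for illustration.

```python
import numpy as np
from scipy.stats import invgamma

def dot_objective(theta_c, theta_p, beta, f_data, a=2.0, b=1.0):
    """Data fit, minus divergence scaled by the DOT coefficients, plus an
    inverse-Gamma log-prior that keeps each beta positive and moderately sized."""
    div = np.sum((theta_c - theta_p) ** 2 / beta)
    log_prior = invgamma.logpdf(beta, a, scale=b).sum()
    return f_data(theta_c) - div + log_prior

# Toy usage: dummy data term and two parameter components.
f_data = lambda th: -np.sum((th - 1.0) ** 2)
print(dot_objective(np.ones(2), np.zeros(2),
                    beta=np.array([0.5, 2.0]), f_data=f_data))
```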

15 DOT Shape Modeling Result
[Plot: delta log-loss per instance vs. total number of training instances for the Hyperprior DOT approach, relative to Regularized Max Likelihood, for the same ten mammal pairs: Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino.]

16 Distribution of DOT coefficients
[Histogram: distribution of the DOT coefficients (1/β) learned with the hyperprior approach, with the axis running from stronger transfer to weaker transfer; the coefficient at the root is marked.]

17 Summary
– Undirected reformulation of the hierarchical Bayes framework is proposed for efficient convex point estimation
– Different degrees of transfer for different parameters are introduced so that some parts of the distribution can be transferred to a greater extent than others

