Guillaume Bouchard Xerox Research Centre Europe

1 Efficient Bounds for the Softmax Function: Applications to Inference in Hybrid Models

2 Deterministic Inference in Hybrid Graphical Models
[Figure: example hybrid graphical model over nodes X0 to X5 and Y1 to Y3; legend: discrete, continuous, observed and hidden variables]
Discrete variables with continuous* parents: no sufficient statistic and no conjugate distribution, hence intractable inference.
Approximate inference: local sampling; deterministic approximations (Gaussian quadrature, delta method, Laplace approximation); maximizing a lower bound to the variational free energy.
*or a large number of discrete parents
December 7, 2007 Guillaume Bouchard, Xerox Research Center Europe
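To make the contrast between deterministic approximations and sampling concrete, here is a small sketch (our own illustration, not from the talk) for the quantity that drives the rest of the slides: the expectation of log-sum-exp under a diagonal Gaussian. The second-order delta method uses the Hessian diag(p) − ppᵀ of log-sum-exp, whose diagonal is p_k(1 − p_k); a seeded Monte Carlo average plays the role of sampling. All function names are hypothetical.

```python
import math
import random

def logsumexp(xs):
    """Numerically stable log(sum_k exp(x_k))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    lse = logsumexp(xs)
    return [math.exp(x - lse) for x in xs]

def delta_method_lse(mean, var):
    """Second-order delta method for E[logsumexp(X)], X ~ N(mean, diag(var)).

    E[f(X)] ~= f(m) + 0.5 * sum_k var_k * H_kk(m), where the Hessian of
    logsumexp is diag(p) - p p^T with p = softmax(m), so H_kk = p_k (1 - p_k).
    """
    p = softmax(mean)
    return logsumexp(mean) + 0.5 * sum(v * pk * (1.0 - pk) for v, pk in zip(var, p))

def monte_carlo_lse(mean, var, n_samples=20000, seed=0):
    """Plain Monte Carlo estimate of the same expectation (sampling baseline)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
        total += logsumexp(x)
    return total / n_samples
```

For small variances the two estimates agree closely. The delta method is an approximation, not a bound, which is one motivation for the bounds on the following slides.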

3 Variational inference
[Figure: plate diagram with X1i, X2i, β1, β2 and Yi, replicated over data points i; legend: discrete, continuous, observed and hidden variables]
Focus on Bayesian multinomial logistic regression with a mean-field approximation: Q belongs to an approximation family. Upper bound? Max upper bound?
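For concreteness, here is the standard mean-field lower bound for Bayesian multinomial logistic regression, in our own notation (K class weight vectors β_k, inputs x_i, labels y_i; the slide's equations did not survive). The last expectation has no closed form when Q is Gaussian; replacing it with an upper bound U_i(Q) keeps the whole expression a lower bound, which is then maximized over Q and over the parameters of U_i.

```latex
\log p(y \mid X) \;\ge\; \mathbb{E}_Q[\log p(\beta)] + H[Q]
  + \sum_{i} \Big( \mathbb{E}_Q\big[x_i^\top \beta_{y_i}\big]
  - \mathbb{E}_Q\Big[\log \sum_{k=1}^{K} e^{x_i^\top \beta_k}\Big] \Big)

\mathbb{E}_Q\Big[\log \sum_{k=1}^{K} e^{x_i^\top \beta_k}\Big] \le U_i(Q)
\;\Longrightarrow\;
\log p(y \mid X) \ge \mathbb{E}_Q[\log p(\beta)] + H[Q]
  + \sum_{i} \Big( \mathbb{E}_Q\big[x_i^\top \beta_{y_i}\big] - U_i(Q) \Big)
```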

4 Bounding the log-partition function (1)
Binary case: classical bound [Jordan and Jaakkola]. We propose its multiclass extension.
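A minimal sketch of the binary bound and one way to extend it to K classes, assuming the standard Jaakkola-Jordan form log(1 + e^t) ≤ log(1 + e^ξ) + (t − ξ)/2 + λ(ξ)(t² − ξ²) with λ(ξ) = tanh(ξ/2)/(4ξ). The multiclass step first bounds log Σ_k e^{x_k} by α + Σ_k log(1 + e^{x_k − α}) for a free scalar α, then applies the binary bound to each of the K terms. Helper names are ours.

```python
import math

def softplus(t):
    """log(1 + e^t), computed stably."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def jj_lambda(xi):
    """Jaakkola-Jordan curvature tanh(xi/2) / (4 xi), with limit 1/8 at xi = 0."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)

def jj_bound(t, xi):
    """Quadratic upper bound on log(1 + e^t), tight at t = +xi and t = -xi."""
    return softplus(xi) + (t - xi) / 2.0 + jj_lambda(xi) * (t * t - xi * xi)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softplus_sum_bound(xs, alpha):
    """log sum_k e^{x_k} <= alpha + sum_k log(1 + e^{x_k - alpha}), for any real alpha."""
    return alpha + sum(softplus(x - alpha) for x in xs)

def quadratic_bound(xs, alpha, xis):
    """Replace each softplus term by its Jaakkola-Jordan bound (quadratic in xs)."""
    return alpha + sum(jj_bound(x - alpha, xi) for x, xi in zip(xs, xis))
```

Choosing ξ_k = |x_k − α| makes the quadratic bound coincide with the softplus bound, so the remaining slack comes only from α.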

5 Bounding the log-partition function (2)
[Figure: panels for K=2 and K=10]
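As a hedged numerical illustration (not the slide's data), the sketch below minimizes the softplus bound over α by bisection, using f′(α) = 1 − Σ_k σ(x_k − α), and reports the gap to the exact log-sum-exp. At the all-zeros point the optimum is α = log(K − 1), so the gap is log(K − 1) + K log(K/(K − 1)) − log K: about 0.69 for K=2 and about 0.95 for K=10. Function names are hypothetical.

```python
import math

def softplus(t):
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def best_alpha(xs, iters=200):
    """Minimize f(alpha) = alpha + sum_k softplus(x_k - alpha) by bisection (needs K >= 2).

    f is convex and f'(alpha) = 1 - sum_k sigmoid(x_k - alpha) is increasing.
    At min(xs) - 1 every sigmoid exceeds 0.73, so f' < 0 there; at max(xs) + log K
    every sigmoid is at most 1/(K+1), so f' > 0 there.
    """
    lo, hi = min(xs) - 1.0, max(xs) + math.log(len(xs))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(sigmoid(x - mid) for x in xs) > 1.0:
            lo = mid  # f'(mid) < 0: the minimizer lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def softplus_bound_gap(xs):
    """Bound minus exact log-sum-exp, at the optimal alpha."""
    alpha = best_alpha(xs)
    return alpha + sum(softplus(x - alpha) for x in xs) - logsumexp(xs)
```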

6 Other upper bounds
Concavity of the log [e.g. Blei et al.]; worst curvature [Bohning]; bound using hyperbolic cosines [Jebara]; local approximation [Gibbs], not proved to be an upper bound.
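For comparison, a sketch of the worst-curvature bound, assuming the usual form attributed to Bohning: a second-order expansion around ψ whose Hessian is replaced by the fixed matrix A = ½(I − 11ᵀ/K). This is valid because the log-sum-exp Hessian diag(p) − ppᵀ annihilates the all-ones vector and has eigenvalues at most ½. The function name is hypothetical.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    lse = logsumexp(xs)
    return [math.exp(x - lse) for x in xs]

def bohning_bound(xs, psi):
    """Quadratic upper bound on logsumexp(xs), expanded around psi.

    Uses the fixed ("worst") curvature A = 0.5 * (I - 1 1^T / K), which
    dominates the logsumexp Hessian diag(p) - p p^T at every point.
    """
    K = len(xs)
    d = [x - p for x, p in zip(xs, psi)]
    g = softmax(psi)  # gradient of logsumexp at psi
    linear = sum(gk * dk for gk, dk in zip(g, d))
    # d^T A d = 0.5 * (||d||^2 - (sum_k d_k)^2 / K)
    quad = 0.5 * (sum(dk * dk for dk in d) - sum(d) ** 2 / K)
    return logsumexp(psi) + linear + 0.5 * quad
```

Because A1 = 0, the gap between the bound and log-sum-exp is unchanged when the same constant is added to every logit.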

7 Proof
Idea: expand the product of inverted sigmoids. The log-partition function is then upper-bounded by a sum of K quadratic upper bounds and lower-bounded by a linear function (log-convexity of f).
Proof: apply Jensen’s inequality to
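The proof idea can be written out as follows; this is a reconstruction in our notation, since the slide's equations did not survive. Each factor 1 + e^{x_k − α} equals 1/σ(α − x_k), an inverted sigmoid, and expanding the product gives 1 + Σ_k e^{x_k − α} plus nonnegative cross terms, which yields the first two lines. Each log(1 + e^{x_k − α}) is then replaced by its Jaakkola-Jordan quadratic bound. For the linear lower bound, the third line assumes the tangent plane of the convex log-sum-exp at a point ξ.

```latex
\begin{aligned}
\sum_{k=1}^{K} e^{x_k}
  &= e^{\alpha}\sum_{k=1}^{K} e^{x_k-\alpha}
  \;\le\; e^{\alpha}\prod_{k=1}^{K}\bigl(1+e^{x_k-\alpha}\bigr)
  = e^{\alpha}\prod_{k=1}^{K}\frac{1}{\sigma(\alpha-x_k)},\\
\log\sum_{k=1}^{K} e^{x_k}
  &\le \alpha+\sum_{k=1}^{K}\log\bigl(1+e^{x_k-\alpha}\bigr),\\
\log\sum_{k=1}^{K} e^{x_k}
  &\ge \log\sum_{k=1}^{K} e^{\xi_k}
  +\sum_{k=1}^{K}\frac{e^{\xi_k}}{\sum_{j=1}^{K} e^{\xi_j}}\,(x_k-\xi_k).
\end{aligned}
```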

8 Bounds on the Expectation
Exponential bound; quadratic bound; simulations.
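For independent Gaussian logits X_k ~ N(m_k, v_k) (an assumption of this sketch), both bounds on E[log Σ_k e^{X_k}] have closed forms. The exponential bound follows from Jensen's inequality, since log is concave: E[lse(X)] ≤ log Σ_k exp(m_k + v_k/2). The quadratic bound takes the expectation of the K Jaakkola-Jordan terms and is tightened by alternating ξ_k = sqrt((m_k − α)² + v_k) with the stationary point in α, α = (K/2 − 1 + 2Σ_k λ(ξ_k) m_k) / (2Σ_k λ(ξ_k)). A seeded Monte Carlo estimate stands in for the simulations; all names are hypothetical.

```python
import math
import random

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softplus(t):
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def jj_lambda(xi):
    return 0.125 if abs(xi) < 1e-8 else math.tanh(xi / 2.0) / (4.0 * xi)

def exponential_bound(m, v):
    """E[lse(X)] <= log sum_k E[e^{X_k}] = log sum_k exp(m_k + v_k / 2)."""
    return logsumexp([mk + vk / 2.0 for mk, vk in zip(m, v)])

def expected_quadratic_bound(m, v, iters=50):
    """Expected quadratic bound, tightened by alternating updates of xi_k and alpha."""
    K = len(m)
    alpha = 0.0
    for _ in range(iters):
        # xi_k^2 = E[(X_k - alpha)^2] minimizes the expected Jaakkola-Jordan term
        xis = [math.sqrt((mk - alpha) ** 2 + vk) for mk, vk in zip(m, v)]
        lams = [jj_lambda(xi) for xi in xis]
        # the bound is a convex quadratic in alpha; jump to its minimizer
        alpha = (K / 2.0 - 1.0 + 2.0 * sum(l * mk for l, mk in zip(lams, m))) / (2.0 * sum(lams))
    xis = [math.sqrt((mk - alpha) ** 2 + vk) for mk, vk in zip(m, v)]
    return alpha + sum(
        softplus(xi) + (mk - alpha - xi) / 2.0
        + jj_lambda(xi) * ((mk - alpha) ** 2 + vk - xi * xi)
        for mk, vk, xi in zip(m, v, xis))

def monte_carlo_lse(m, v, n_samples=20000, seed=0):
    """Sampling estimate of E[lse(X)] for comparison."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += logsumexp([rng.gauss(mk, math.sqrt(vk)) for mk, vk in zip(m, v)])
    return total / n_samples
```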

9 Bayesian multinomial logistic regression
Exponential bound: cannot be maximized in closed form, so it needs gradient-based optimization or a fixed-point equation (unstable!).
Quadratic bound: analytic update.
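A minimal sketch of why the quadratic bound gives analytic updates, assuming a N(0, prior_var·I) prior on each class weight vector β_k (unit variance, as in the Iris experiment), one Gaussian factor per class, and the alternating scheme below; the slide's update equations did not survive, so this is not necessarily the talk's algorithm. Under the bound, the terms involving β_k are ([y_i = k] − ½ + 2λ_ik α_i) x_iᵀβ_k − λ_ik (x_iᵀβ_k)², so Q(β_k) is Gaussian with precision I/prior_var + 2Σ_i λ_ik x_i x_iᵀ. A tiny Gauss-Jordan inverse keeps the sketch dependency-free; all names are hypothetical.

```python
import math

def jj_lambda(xi):
    """Jaakkola-Jordan curvature tanh(xi/2) / (4 xi); limit 1/8 at xi = 0."""
    return 0.125 if abs(xi) < 1e-8 else math.tanh(xi / 2.0) / (4.0 * xi)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(a, v):
    return [dot(row, v) for row in a]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [val / piv for val in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def fit_bayes_mlr(X, y, K, d, prior_var=1.0, iters=50):
    """Gaussian factors Q(beta_k) = N(mu_k, cov_k), k = 0..K-1.

    Each point's log-sum-exp is replaced by alpha_i + sum_k JJ(x_i^T beta_k - alpha_i; xi_ik),
    which is quadratic in every beta_k, so each factor update is closed form.
    """
    mu = [[0.0] * d for _ in range(K)]
    cov = [[[prior_var if a == b else 0.0 for b in range(d)] for a in range(d)] for _ in range(K)]
    alpha = [0.0] * len(X)
    for _ in range(iters):
        # 1) bound parameters: xi_ik^2 = E[(x_i^T beta_k - alpha_i)^2], then alpha_i
        lams = []
        for i, x in enumerate(X):
            m = [dot(x, mu[k]) for k in range(K)]              # E[x^T beta_k]
            s = [dot(x, matvec(cov[k], x)) for k in range(K)]  # Var[x^T beta_k]
            lam = [jj_lambda(math.sqrt((m[k] - alpha[i]) ** 2 + s[k])) for k in range(K)]
            alpha[i] = (K / 2.0 - 1.0 + 2.0 * dot(lam, m)) / (2.0 * sum(lam))
            lams.append(lam)
        # 2) analytic Gaussian update of each class factor
        for k in range(K):
            prec = [[(1.0 / prior_var if a == b else 0.0) for b in range(d)] for a in range(d)]
            lin = [0.0] * d
            for i, x in enumerate(X):
                lam = lams[i][k]
                coef = (1.0 if y[i] == k else 0.0) - 0.5 + 2.0 * lam * alpha[i]
                for a in range(d):
                    lin[a] += coef * x[a]
                    for b in range(d):
                        prec[a][b] += 2.0 * lam * x[a] * x[b]
            cov[k] = inverse(prec)
            mu[k] = matvec(cov[k], lin)
    return mu, cov
```

With X = [[1, -2], [1, -1], [1, 1], [1, 2]] (bias plus one feature) and y = [0, 0, 1, 1], the posterior mean slope comes out positive for class 1 and negative for class 0, and every posterior variance stays below the prior variance.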

10 Numerical experiments
Iris dataset: 4 dimensions, 3 classes. Prior: unit variance.
Experiment: learning with batch updates, compared to an MCMC estimate based on 100K samples. Error = Euclidean distance between the mean and variance parameters.
Results: the “worst curvature” bound is faster and better…
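The error metric can be sketched as follows, assuming the variance parameters are per-coordinate posterior variances and that means and variances are stacked into a single vector before taking the Euclidean distance; the slide does not say whether full covariances were compared. The MCMC reference is summarized by sample means and unbiased sample variances. Names are hypothetical.

```python
import math

def sample_moments(samples):
    """Per-coordinate mean and unbiased variance of MCMC draws (list of equal-length vectors)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    cols = list(zip(*samples))
    means = [sum(c) / n for c in cols]
    variances = [sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(cols, means)]
    return means, variances

def posterior_error(mean_a, var_a, mean_b, var_b):
    """Euclidean distance between (means, variances) of two posteriors, stacked into one vector."""
    if len(mean_a) != len(mean_b) or len(var_a) != len(var_b):
        raise ValueError("parameter vectors must have matching lengths")
    diffs = [a - b for a, b in zip(mean_a, mean_b)] + [a - b for a, b in zip(var_a, var_b)]
    return math.sqrt(sum(d * d for d in diffs))
```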

11 Conclusion
Multinomial links in graphical models are feasible. Existing bounds work well, and further improvements can be expected.
Remark: better bounds are only needed in the Bayesian setting; for MAP estimation, even a loose bound converges.
Future work: application to discriminative learning; mixture-based mean-field approximation.
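The remark about MAP estimation can be illustrated on the binary case, assuming a scalar weight, no bias and a N(0, prior_var) prior (a toy setting of ours). With the fixed worst-case curvature 1/4 of the logistic log-likelihood, each minorize-maximize step is a Newton-like step with a constant denominator; the bound is loose away from its expansion point, yet the iterates still converge to the MAP estimate. Names are hypothetical.

```python
import math

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_posterior_grad(w, xs, ys, prior_var):
    """Gradient of the log posterior of 1-D logistic regression with a N(0, prior_var) prior."""
    return sum((y - sigmoid(w * x)) * x for x, y in zip(xs, ys)) - w / prior_var

def map_with_bohning_mm(xs, ys, prior_var=1.0, iters=200):
    """MAP estimate by minorize-maximize with the fixed ("worst") curvature bound.

    The logistic log-likelihood has curvature at most 1/4 per unit x^2, so
    maximizing the quadratic minorizer gives a step with a constant denominator;
    no step can decrease the log posterior.
    """
    curvature = sum(x * x for x in xs) / 4.0 + 1.0 / prior_var
    w = 0.0
    for _ in range(iters):
        w += log_posterior_grad(w, xs, ys, prior_var) / curvature
    return w
```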

13 Backup slides

17 Jebara’s bound
One dimension: hyperbolic cosine bound. Multi-dimensional case.

