Information Geometry: Duality, Convexity, and Divergences

Jun Zhang*, University of Michigan, Ann Arbor, Michigan 48104, junz@umich.edu
*Currently on leave to AFOSR under IPA

Lecture Plan
1) A revisit to Bregman divergence
2) Generalization (α-divergence on R^n) and α-Hessian geometry
3) Embedding into infinite-dimensional function space
4) Generalized Fisher metric and α-connection on Banach space
Clarify two senses of duality in information geometry:
Reference duality: choice of the reference vs. comparison point on the manifold;
Representational duality: choice of a monotonic scaling of the density function.

Bregman Divergence
i) Quadrilateral relation:
Triangular relation (generalized cosine) as a special case:
ii) Reference-representation biduality:
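The formulas on this slide are not preserved in the transcript. As a rough numerical sketch, assuming the standard definition B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)> and an illustrative potential F(x) = sum_i exp(x_i) (both are assumptions, not taken from the slides), one common four-point identity and its three-point special case can be checked as follows. The slide's own quadrilateral relation may be stated in a different but equivalent arrangement.

```python
import numpy as np

def F(x):
    # Illustrative strictly convex potential (assumed; not from the slides).
    return np.sum(np.exp(x))

def grad_F(x):
    return np.exp(x)

def bregman(x, y):
    # B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

rng = np.random.default_rng(0)
x, y, u, v = rng.normal(size=(4, 3))

# Four-point relation among x, y, u, v.
quad_lhs = bregman(x, u) + bregman(y, v) - bregman(x, v) - bregman(y, u)
quad_rhs = np.dot(x - y, grad_F(v) - grad_F(u))

# Three-point ("generalized cosine") relation: the special case u = y,
# which works because B_F(y, y) = 0.
tri_lhs = bregman(x, y) + bregman(y, v) - bregman(x, v)
tri_rhs = np.dot(x - y, grad_F(v) - grad_F(y))

print(np.isclose(quad_lhs, quad_rhs), np.isclose(tri_lhs, tri_rhs))
```

The right-hand sides pair a difference of points with a difference of gradients, which is why the three-point relation is read as a generalized cosine law.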

Canonical Divergence and Fenchel Inequality
An alternative expression of Bregman divergence is the canonical divergence A:
or explicitly:
That A is non-negative is a direct consequence of the Fenchel inequality for a strictly convex function:
where equality holds if and only if
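A minimal sketch of this relationship, assuming the usual form A(x, v) = F(x) + F*(v) - <x, v> for the canonical divergence and the illustrative pair F(x) = sum_i exp(x_i), F*(v) = sum_i (v_i log v_i - v_i) (the choice of F is an assumption made for the example). It checks that A coincides with the Bregman divergence when v = grad F(y), and that it vanishes when v = grad F(x).

```python
import numpy as np

def F(x):
    # Illustrative strictly convex potential (assumed; not from the slides).
    return np.sum(np.exp(x))

def F_star(v):
    # Fenchel conjugate of F above, defined for v > 0 componentwise.
    return np.sum(v * np.log(v) - v)

def grad_F(x):
    return np.exp(x)

def canonical(x, v):
    # A(x, v) = F(x) + F*(v) - <x, v>; non-negative by the Fenchel inequality.
    return F(x) + F_star(v) - np.dot(x, v)

def bregman(x, y):
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

rng = np.random.default_rng(1)
x, y = rng.normal(size=(2, 3))

a_xy = canonical(x, grad_F(y))   # dual coordinate v = grad F(y)
b_xy = bregman(x, y)
a_xx = canonical(x, grad_F(x))   # equality case of the Fenchel inequality

print(np.isclose(a_xy, b_xy), abs(a_xx) < 1e-12)
```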

Convex Inequality and α-Divergence Induced by It
By the definition of a strictly convex function F,
It is easy to show that the following is non-negative for all α,
Conjugate-symmetry:
Easily verifiable:
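The defining expression is missing from the transcript. The sketch below assumes the convex-inequality form D(α)(x, y) = 4/(1 - α²) [ (1-α)/2 F(x) + (1+α)/2 F(y) - F((1-α)/2 x + (1+α)/2 y) ], with an illustrative F (both are assumptions). Under that convention it checks non-negativity, the conjugate symmetry D(α)(x, y) = D(-α)(y, x), and that α close to 1 approaches the Bregman divergence B_F(x, y). Which endpoint yields which argument order depends on the sign convention for α.

```python
import numpy as np

def F(x):
    # Illustrative strictly convex potential (assumed; not from the slides).
    return np.sum(np.exp(x))

def grad_F(x):
    return np.exp(x)

def bregman(x, y):
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

def alpha_divergence(x, y, alpha):
    # Gap in the convexity inequality, rescaled by 4 / (1 - alpha^2).
    lam = (1.0 - alpha) / 2.0
    gap = lam * F(x) + (1.0 - lam) * F(y) - F(lam * x + (1.0 - lam) * y)
    return 4.0 / (1.0 - alpha ** 2) * gap

rng = np.random.default_rng(2)
x, y = rng.normal(size=(2, 3))

values = [alpha_divergence(x, y, a) for a in (-3.0, -0.5, 0.0, 0.5, 3.0)]
symmetric = np.isclose(alpha_divergence(x, y, 0.3), alpha_divergence(y, x, -0.3))
near_limit = alpha_divergence(x, y, 1.0 - 1e-6)

print(values, symmetric, near_limit, bregman(x, y))
```

For |α| > 1 the convexity gap and the prefactor both change sign, so the product stays non-negative.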

Significance of Bregman Divergence Among α-Divergence Family
Proposition: For a smooth function F: R^n → R, the following are equivalent:

Statistical Manifold Structure Induced From Divergence Function (Eguchi, 1983)
Given a divergence D(x,y) with D(x,x)=0, one can derive the Riemannian metric and a pair of conjugate connections by expanding D(x,y) around x=y:
i) 2nd order: one (and the same) metric
ii) 3rd order: a pair of conjugated connections
In essence, the conjugacy condition relating the metric and the two connections is satisfied by such identification of derivatives of D.
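As a hedged illustration of the second-order part, the sketch below assumes Eguchi's identification g_ij = -∂²D/∂x_i ∂y_j evaluated at x = y, and approximates it with central finite differences. The divergence used is a Bregman divergence with an illustrative potential (an assumption), for which the induced metric should be the Hessian of F.

```python
import numpy as np

def F(x):
    # Illustrative strictly convex potential (assumed; not from the slides).
    return np.sum(np.exp(x))

def grad_F(x):
    return np.exp(x)

def bregman(x, y):
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

def eguchi_metric(D, x, h=1e-4):
    # g_ij = - d^2 D(x, y) / dx_i dy_j at y = x, via central differences.
    n = x.size
    basis = np.eye(n)
    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            mixed = (D(x + h * basis[i], x + h * basis[j])
                     - D(x + h * basis[i], x - h * basis[j])
                     - D(x - h * basis[i], x + h * basis[j])
                     + D(x - h * basis[i], x - h * basis[j])) / (4.0 * h * h)
            g[i, j] = -mixed
    return g

x0 = np.array([0.3, -0.5, 1.0])
g = eguchi_metric(bregman, x0)
hessian_F = np.diag(np.exp(x0))   # exact Hessian of the illustrative F

print(np.round(g, 4))
```

The conjugate connections come from third-order mixed derivatives in the same way, differentiating twice in one argument and once in the other.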

α-Hessian Geometry (of Finite-Dimensional Vector Space)
Theorem. D(α) induces the α-Hessian manifold, i.e.
i) The metric and conjugate affine connections are given by:
ii) Riemann curvature is given by:

iii) The manifold is equi-affine, with the Tchebychev potential given by:
and the α-parallel volume form given by:
iv) There exist biorthogonal coordinates:
with

From Vector Space to Function Space
Question: How to extend the above analysis to infinite-dimensional function space?
A General Divergence Function(al)
for any two functions in some function space, and an arbitrary, strictly increasing function.
Remark: Induced by convex inequality

A Special Case of D(α): Classic α-Divergence
For parameterized pdf's, such divergence induces an α-independent metric, but α-dependent dual connections:
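The slide's formula is not preserved. A small sketch on discrete distributions, assuming the common form A(α)(p, q) = 4/(1 - α²) Σ [ (1-α)/2 p + (1+α)/2 q - p^((1-α)/2) q^((1+α)/2) ] (the sign convention for α and the example vectors are assumptions). Under this convention α near -1 approaches KL(p||q), α near +1 approaches KL(q||p), and α = 0 gives twice the squared Hellinger-type distance.

```python
import numpy as np

def alpha_div(p, q, alpha):
    # Classic alpha-divergence between positive vectors, for alpha != +1, -1.
    lam = (1.0 - alpha) / 2.0
    integrand = lam * p + (1.0 - lam) * q - p ** lam * q ** (1.0 - lam)
    return 4.0 / (1.0 - alpha ** 2) * np.sum(integrand)

def kl(p, q):
    # Kullback-Leibler divergence for normalized p and q.
    return np.sum(p * np.log(p / q))

# Hypothetical example distributions on three points.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

at_zero = alpha_div(p, q, 0.0)
hellinger_like = 2.0 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)
near_minus_one = alpha_div(p, q, -1.0 + 1e-6)
near_plus_one = alpha_div(p, q, 1.0 - 1e-6)

print(at_zero, near_minus_one, kl(p, q), near_plus_one, kl(q, p))
```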

Other Examples of D(α)
Jensen Difference
U-Divergence (α=1)

A Short Detour: Monotone Scaling
Define monotone embedding ("scaling") of a measurable function p as the transformation ρ(p), where ρ is a strictly monotone function.
Observe:
i) ρ is strictly monotone iff ρ^(-1) is strictly monotone;
ii) ρ(t) = t as the identity element;
iii) if ρ1, ρ2 are strictly monotone, so is their composition.
Therefore, monotone embeddings of a given probability density function form a group, with functional composition as group operation.
We recall that for a strictly convex function f:

DEFINITION: A ρ-embedding is said to be conjugated to a τ-embedding with respect to a strictly convex function f (whose conjugate is f*) if:
Example: α-embedding
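The defining condition is missing from the transcript. The sketch below assumes it reads τ(t) = f'(ρ(t)), and assumes the α-embedding has the familiar form l(α)(t) = 2/(1-α) t^((1-α)/2) for α ≠ 1 and log t for α = 1. Under those assumptions, the candidate f(u) = 2/(1+α) ((1-α)/2 u)^(2/(1-α)) makes the α-embedding conjugate to the (-α)-embedding. The function f here is a reconstruction for illustration, checked only numerically.

```python
import numpy as np

def alpha_embedding(t, alpha):
    # l^(alpha)(t): log t when alpha == 1, else (2/(1-alpha)) * t**((1-alpha)/2).
    if alpha == 1:
        return np.log(t)
    return 2.0 / (1.0 - alpha) * t ** ((1.0 - alpha) / 2.0)

def f(u, alpha):
    # Candidate strictly convex function for -1 < alpha < 1 and u > 0 (assumed form).
    return 2.0 / (1.0 + alpha) * ((1.0 - alpha) / 2.0 * u) ** (2.0 / (1.0 - alpha))

def f_prime(u, alpha, h=1e-6):
    # Central-difference derivative of f, so the check does not rely on algebra.
    return (f(u + h, alpha) - f(u - h, alpha)) / (2.0 * h)

alpha = 0.5
t = np.linspace(0.1, 2.0, 20)
rho = alpha_embedding(t, alpha)    # rho-embedding
tau = alpha_embedding(t, -alpha)   # proposed conjugate tau-embedding

max_gap = np.max(np.abs(f_prime(rho, alpha) - tau))
print(max_gap)
```

A gap near zero indicates that τ = f' ∘ ρ holds on the sampled grid for this α.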

Parameterized Functions as Forming a Submanifold under Monotone Scaling
A submanifold is said to be ρ-affine if there exists a countable set of linearly independent functions λ_i(z) over a measurable space such that:
Here, θ is called the "natural parameter". The "expectation parameter" η is defined by projecting the conjugated τ-embedding onto the λ_i(z):
Example: For the log-linear model (exponential family)
The expectation parameter is:

Proposition. For the ρ-affine submanifold:
i) The following potential function is strictly convex:
F(θ) is called the generating (partition) functional.
ii) Define, under the conjugate representations,
then F*(η) is the Fenchel conjugate of F(θ). F*(η) is called the generalized entropy functional.
Theorem. The ρ-affine submanifold is an α-Hessian manifold.
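A toy sketch of the log-linear case on a finite sample space with counting measure. It assumes ρ = log, f = exp, τ(p) = p, the potential F(θ) = Σ_z f(ρ(p(z))), and unnormalized positive densities. The functions λ_i and the value of θ are hypothetical. Under these assumptions the expectation parameter η should equal the gradient of F, and the Hessian of F should be positive definite.

```python
import numpy as np

# Hypothetical sample space of 4 points with 2 linearly independent functions lambda_i(z).
lam = np.array([[1.0, 0.0, 1.0, 2.0],
                [0.0, 1.0, 1.0, -1.0]])

def density(theta):
    # rho = log: log p(z) = sum_i theta^i * lambda_i(z); positive, not normalized.
    return np.exp(theta @ lam)

def potential(theta):
    # F(theta) = sum_z f(rho(p(z))) with f = exp, i.e. the total mass of p.
    return density(theta).sum()

def expectation_parameter(theta):
    # eta_i = sum_z tau(p(z)) * lambda_i(z), with tau(p) = p.
    return lam @ density(theta)

def hessian(theta):
    # Second derivatives of F: sum_z p(z) * lambda_i(z) * lambda_j(z).
    return (lam * density(theta)) @ lam.T

theta = np.array([0.2, -0.4])
eta = expectation_parameter(theta)

h = 1e-6
num_grad = np.array([(potential(theta + h * e) - potential(theta - h * e)) / (2.0 * h)
                     for e in np.eye(2)])
min_eig = np.linalg.eigvalsh(hessian(theta)).min()

print(eta, num_grad, min_eig)
```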

An Application: the (α,β)-Divergence
α: parameter reflecting reference duality
β: parameter reflecting representation duality
Take f=r-(b), where: called "alpha-embedding", now denoted by β.
They reduce to α-divergence proper A(α) and to Jensen difference E(α):

Information Geometry on Banach Space
Proposition 1. Denote tangent vector fields which are, at a given p on the manifold, themselves functions in Banach space. The metric and dual connections induced by the divergence functional take the forms:
Written in dually symmetric form:

Corollary 1a. For a finite-dimensional submanifold (parametric model), the metric and dual connections associated with the divergence are given by:
Remark: For a particular choice of the functions involved, these reduce to the forms of the Fisher metric and the α-connections in classical parametric information geometry.
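For reference, the classical parametric forms alluded to in the remark are usually written as follows. This uses Amari's standard convention for a model p(z; θ) with ∂_i = ∂/∂θ^i; that the slide used exactly this notation is an assumption.

```latex
g_{ij}(\theta) = E_{\theta}\!\left[ \partial_i \log p \;\, \partial_j \log p \right],
\qquad
\Gamma^{(\alpha)}_{ij,k}(\theta) = E_{\theta}\!\left[ \left( \partial_i \partial_j \log p
  + \frac{1-\alpha}{2}\, \partial_i \log p \;\, \partial_j \log p \right) \partial_k \log p \right]
```

In this convention the connection conjugate to Γ(α) with respect to g is Γ(-α), and α = 0 gives the Levi-Civita connection of the Fisher metric.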

Proposition 2. The curvature R(α) and torsion T(α) tensors associated with any α-connection on the infinite-dimensional function space B are identically zero.
Remark: The ambient space B is flat, so it embeds, as proper submanifolds:
the manifold M_μ of probability density functions (constrained to be positive-valued and normalized to unit measure);
the finite-dimensional manifold M_θ of parameterized probability models.
[Diagram labels: B (ambient manifold), M_μ, M_θ]
CAVEAT: Topology? (G. Pistone and his colleagues)

Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and alpha-connections proper:
Remark: The (α,β)-divergence is the homogeneous f-divergence. As such, it should reproduce the standard Fisher metric and the dual alpha-connections in their proper form. Again, it is the product αβ that takes the role of the conventional "alpha" parameter.

Summary of Current Approach
Geometry:
Riemannian metric: Fisher information
Conjugate connections: α-connection family
Equi-affine structure: cubic form, Tchebychev 1-form
Curvature
Divergence:
α-divergence: equivalent to δ-divergence (Zhu & Rohwer, 1985); includes KL divergence as a special case
f-divergence (Csiszar)
Bregman divergence: equivalent to the canonical divergence
U-divergence (Eguchi)
Current approach:
Convex-based α-divergence for vector space of finite dimension and function space of infinite dimension
Generalized expressions of Fisher metric and α-connections

Questions?