A Class of Problems
We use numerical continuation and bifurcation theory with symmetries to analyze a class of optimization problems of the form $\max_{q \in \Delta} F(q, \beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$. The goal is to solve for $\beta = B \in (0, \infty)$, where:
- G and D are infinitely differentiable in $\Delta$.
- G is strictly concave.
- D is convex.
- G and D must be invariant under relabeling of the classes.
- The Hessian of F is block diagonal with N blocks $\{B_\nu\}$, and $B_\nu = B_\eta$ if $q(z_\nu|y) = q(z_\eta|y)$ for every $y \in Y$.
Problems in this class:
- Deterministic Annealing (Rose 1998): $\max H(Z|Y) - \beta D(Y,Z)$. Clustering algorithm.
- Rate Distortion Theory (Shannon ~1950): $\max -I(Y,Z) - \beta D(Y,Z)$. Optimal source coding.
- Information Distortion (Dimitrov and Miller 2001): $\max H(Z|Y) + \beta I(X,Z)$. Used in neural coding.
- Information Bottleneck Method (Tishby, Pereira, Bialek 2000): $\max -I(Y,Z) + \beta I(X,Z)$. Used for document classification, gene expression, neural coding and spectral analysis.
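As an illustration of one member of this class, the following is a minimal sketch of the Information Distortion cost $F(q, \beta) = H(Z|Y) + \beta I(X,Z)$. It assumes that Z depends on X only through Y, so that $p(x,z) = \sum_y p(x,y) q(z|y)$, and that entropies are measured in bits. The function names and the array layout (`q[z, y]`, `p_xy[x, y]`) are hypothetical choices made for this example, not taken from the slides.

```python
import numpy as np

def conditional_entropy(q, p_y):
    """H(Z|Y) in bits, for q[z, y] = q(z|y) and p_y[y] = p(y)."""
    # Replace zeros by 1 inside the log so that 0 * log 0 contributes 0.
    safe_q = np.where(q > 0, q, 1.0)
    return float(-np.sum(p_y * np.sum(q * np.log2(safe_q), axis=0)))

def mutual_information(p_ab):
    """I(A;B) in bits, for a joint distribution p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    ratio = p_ab[mask] / (p_a @ p_b)[mask]
    return float(np.sum(p_ab[mask] * np.log2(ratio)))

def information_distortion_cost(q, p_xy, beta):
    """F(q, beta) = H(Z|Y) + beta * I(X;Z) (assumed Markov chain X -> Y -> Z)."""
    p_y = p_xy.sum(axis=0)
    p_xz = p_xy @ q.T  # p(x, z) = sum_y p(x, y) q(z|y)
    return conditional_entropy(q, p_y) + beta * mutual_information(p_xz)
```

For the uniform quantizer $q \equiv 1/N$ the second term vanishes and the cost reduces to $\log_2 N$, while a deterministic quantizer has $H(Z|Y) = 0$, so the cost is carried entirely by $\beta I(X,Z)$.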
A good communication system has p(X,Y) like:
[Figure: p(X,Y), with axes X and Y]
- $2^{H(X)}$ input sequences
- $2^{H(Y)}$ output sequences
- $2^{I(X,Y)}$ distinguishable input/output classes of (x,y) pairs
- Size of an input/output class: $2^{H(X|Y) + H(Y|X)}$ pairs
Rate Distortion
How well is the source X represented by Z? Z is a representation of X using N symbols (or clusters).
[Diagram: source X with distribution p(X), mapped to Z]
Information Distortion
Goal: Determine the input/output classes of (x,y) pairs.
Idea: We seek to quantize (X,Y) into clusters which correspond with the input/output classes.
Method: We determine a quantizer, $Q^*$, between X and Z, where Z is a representation of Y using N elements, such that $F(Q^*, B)$ is a maximum for some $B \in (0, \infty)$.
[Diagram: input source X, output source Y related by P(Y|X); clustered outputs Z obtained from Y by $q^*(Z|Y)$; induced quantizer $Q^*(Z|X)$ from X to Z]
Some nice properties of the problem
The feasible region $\Delta$, a product of simplices, is nice.
Lemma: $\Delta$ is the convex hull of vertices($\Delta$).
The optimal quantizer $q^*$ is DETERMINISTIC.
Theorem: The extrema of the optimization problem lie generically on the vertices of $\Delta$.
Corollary: The optimal quantizer is invariant to small perturbations in the model.
Solution of the problem when p(X,Y) := 4 Gaussian blobs
[Figures: p(X,Y); I(X,Z) vs. N]
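The Dynamical System
Goal: To efficiently solve $\max_{q \in \Delta} (G(q) + \beta D(q))$ for each $\beta$, incremented in sufficiently small steps, as $\beta \to B$.
Method: Study the equilibria of the gradient flow of the Lagrangian, $(\dot{q}, \dot{\lambda}) = \nabla_{q,\lambda} L(q, \lambda, \beta)$, where $L(q, \lambda, \beta) = F(q, \beta) + \sum_y \lambda_y (\sum_z q(z|y) - 1)$.
- The Jacobian with respect to q of the K constraints $\{\sum_z q(z|y) - 1\}$ is $J = (I_K \; I_K \; \dots \; I_K)$.
- The first equilibrium is $q^*(\beta_0 = 0) \equiv 1/N$.
- The Hessian $\Delta_{q,\lambda} L(q^*, \lambda^*, \beta)$ determines stability and location of bifurcation.
Assumptions:
- Let $q^*$ be a local solution to the optimization problem that is fixed by $S_M$.
- Call the M identical blocks of $\Delta_q F(q^*, \beta)$: B. Call the other N-M blocks of $\Delta_q F(q^*, \beta)$: $\{R_\nu\}$.
- At a singularity $(q^*, \lambda^*, \beta^*)$, B has a single nullvector v and $R_\nu$ is nonsingular for every $\nu$.
- If M < N, then $B \sum_\nu R_\nu^{-1} + M I_K$ is nonsingular.
Theorem: If $\Delta_{q,\lambda} L(q^*, \lambda^*, \beta^*)$ is singular, then $\Delta_q F(q^*, \beta^*)$ is singular.
Theorem: $(q^*, \lambda^*, \beta^*)$ is a bifurcation of equilibria of the flow if and only if $\Delta_{q,\lambda} L(q^*, \lambda^*, \beta^*)$ is singular.
Theorem: If $(q^*, \lambda^*, \beta^*)$ is a bifurcation of equilibria of the flow, then $\beta^* \geq 1$.
Theorem: $\dim(\ker \Delta_q F(q^*, \beta^*)) = M$ with basis vectors $w_1, w_2, \dots, w_M$.
Theorem: $\dim(\ker \Delta_{q,\lambda} L(q^*, \lambda^*, \beta^*)) = M-1$ with basis vectors [expression missing in source].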
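The two kernel dimensions above can be checked numerically on a toy example. The sketch below is illustrative only: it assumes the Hessian of the Lagrangian has the bordered form $\begin{pmatrix} \Delta_q F & J^T \\ J & 0 \end{pmatrix}$, which follows if the constraints enter the Lagrangian linearly through multipliers, and it uses a made-up symmetric block B with a single nullvector. The helper names are hypothetical.

```python
import numpy as np

def constraint_jacobian(N, K):
    """J = (I_K I_K ... I_K), the Jacobian of the K constraints sum_z q(z|y) - 1."""
    return np.kron(np.ones((1, N)), np.eye(K))

def lagrangian_hessian(hess_F, J):
    """Assemble the bordered matrix [[hess_F, J^T], [J, 0]] (assumed form)."""
    K = J.shape[0]
    return np.block([[hess_F, J.T], [J, np.zeros((K, K))]])

def nullity(A):
    """Dimension of the kernel of a square matrix A."""
    return A.shape[1] - np.linalg.matrix_rank(A)

# Toy block: symmetric, with the single nullvector (1, 1, 1).
B = np.array([[1.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])
M = N = K = 3                       # all N classes unresolved, so M = N
hess_F = np.kron(np.eye(M), B)      # M identical diagonal blocks
hess_L = lagrangian_hessian(hess_F, constraint_jacobian(N, K))

print(nullity(hess_F), nullity(hess_L))
```

In this toy case the block-diagonal matrix has one nullvector per identical block, and the constraint rows remove one of those directions, in line with the dimensions M and M-1 stated in the theorems.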
Investigating the Dynamical System
How:
- Use numerical continuation in a constrained system to choose $\beta$ and to choose an initial guess to find the equilibria $q^*(\beta)$.
- Use bifurcation theory with symmetries to understand bifurcations of the equilibria.
Continuation
- A local maximum $q_k^*(\beta_k)$ of the optimization problem is an equilibrium of the gradient flow.
- The initial condition $q_{k+1}^{(0)}(\beta_{k+1}^{(0)})$ is sought in the tangent direction $\partial_\beta q_k$, which is found by solving the matrix system $\Delta_{q,\lambda} L(q_k, \lambda_k, \beta_k) \, (\partial_\beta q_k, \partial_\beta \lambda_k)^T = -\partial_\beta \nabla_{q,\lambda} L(q_k, \lambda_k, \beta_k)$.
- The continuation algorithm used to find $q_{k+1}^*(\beta_{k+1})$ is based on Newton's method.
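A tangent system of this kind can be derived by implicit differentiation along a branch of equilibria. The short derivation below is a sketch under the assumption that the Lagrangian is $L(q, \lambda, \beta) = G(q) + \beta D(q) + \sum_y \lambda_y (\sum_z q(z|y) - 1)$; the slides do not show the system explicitly, so the right-hand side here is an inference from that assumed form.

```latex
% Along a branch of equilibria, the gradient of the Lagrangian vanishes:
\begin{align}
\nabla_{q,\lambda} L\bigl(q(\beta), \lambda(\beta), \beta\bigr) &= 0 . \\
% Differentiating with respect to \beta by the chain rule:
\Delta_{q,\lambda} L \begin{pmatrix} \partial_\beta q \\ \partial_\beta \lambda \end{pmatrix}
  + \partial_\beta \nabla_{q,\lambda} L &= 0 . \\
% For the assumed L, only the term \beta D(q) depends on \beta explicitly:
\Delta_{q,\lambda} L(q_k, \lambda_k, \beta_k)
  \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix}
  &= - \begin{pmatrix} \nabla_q D(q_k) \\ 0 \end{pmatrix} .
\end{align}
```

The system is solvable for the tangent exactly when $\Delta_{q,\lambda} L$ is nonsingular, which is why the tangent predictor breaks down at the bifurcation points discussed next.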
Bifurcations of $q^*(\beta)$
[Figures: observed bifurcations for the 4 blob problem, with axis label q*(Y_N|Y); conceptual bifurcation structure]
Bifurcations with symmetry
To better understand the bifurcation structure, we capitalize on the symmetries of the optimization function $F(q, \beta)$.
- The "obvious" symmetry is that $F(q, \beta)$ is invariant to relabeling of the N classes of Z.
- The symmetry group of all permutations on N symbols is $S_N$.
- The action of $S_N$ on $q$ and $\nabla_{q,\lambda} L(q, \lambda, \beta)$ is represented by the finite Lie group [group definition missing in source], where P is a "block permutation" matrix.
- The symmetry of a solution is measured by its isotropy group, the subgroup of this group which fixes it.
Bifurcation Structure
The Equivariant Branching Lemma gives the existence of bifurcating solutions for every isotropy subgroup which fixes a one-dimensional subspace of $\ker \Delta_{q,\lambda} L(q^*, \lambda, \beta)$.
Theorem: Let $(q^*, \lambda^*, \beta^*)$ be a singular point of the flow such that $q^*$ is fixed by $S_M$. Then there exist M bifurcating solutions, $(q^*, \lambda^*, \beta^*) + (t u_k, 0, \beta(t))$, each with isotropy group $S_{M-1}$, where the $\nu$-th block of $u_k$ is $(M-1)v$ if $\nu = k$, $-v$ if $\nu \neq k$ is another of the M classes fixed by $S_M$, and 0 otherwise.
What do the bifurcations look like?
Let $T(q^*, \beta^*) =$ [expression missing in source].
Transcritical or degenerate?
Theorem: If $T(q^*, \beta^*) \neq 0$ and M > 2, then the bifurcation at $(q^*, \beta^*)$ is transcritical. If $T(q^*, \beta^*) = 0$, it is degenerate.
Branch orientation?
Theorem: If $T(q^*, \beta^*) > 0$ or if $T(q^*, \beta^*) < 0$, then the branch is supercritical or subcritical, respectively. If $T(q^*, \beta^*) = 0$, then $\partial^4_{qqqq} F(q, \beta)$ dictates orientation.
Branch stability?
Theorem: If $T(q^*, \beta^*) \neq 0$, then all branches fixed by $S_{M-1}$ are unstable.
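The bifurcating directions in the theorem have a simple block pattern, which the sketch below builds explicitly. It assumes the pattern visible in the 4 blob directions, namely $(M-1)v$ in class k, $-v$ in the other unresolved classes, and 0 in the resolved classes and in the multiplier component. The function name, the zero-based class indices, and the flattened layout are hypothetical choices for this example.

```python
import numpy as np

def bifurcating_direction(v, N, unresolved, k):
    """Return the stacked vector (u_k, 0).

    v          : nullvector of the identical Hessian block B (length K)
    N          : total number of classes
    unresolved : indices of the M classes fixed by S_M
    k          : the class in `unresolved` that breaks away
    """
    v = np.asarray(v, dtype=float)
    M = len(unresolved)
    # Rows 0..N-1 are the class blocks; the last row is the multiplier part.
    u = np.zeros((N + 1, v.size))
    for nu in unresolved:
        u[nu] = (M - 1) * v if nu == k else -v
    return u.ravel()

v = np.array([1.0, 2.0])
# Breaking S_4 -> S_3 with the third class breaking away: (-v, -v, 3v, -v, 0).
print(bifurcating_direction(v, 4, [0, 1, 2, 3], 2).reshape(5, 2))
```

By construction the class blocks of each direction sum to zero, so the direction respects the constraints $\sum_z q(z|y) = 1$ to first order.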
[Figure: partial lattice of the isotropy subgroups of $S_4$ (and associated bifurcating directions)]
For the 4 blob problem: the isotropy subgroups and bifurcating directions of the observed bifurcating branches
isotropy group | bifurcating direction
$S_4$ | $(-v, -v, 3v, -v, 0)^T$
$S_3$ | $(-v, 2v, 0, -v, 0)^T$
$S_2$ | $(-v, 0, 0, v, 0)^T$
1 | no more bifurcations
Other Branches
The Smoller-Wasserman Theorem ascertains the existence of bifurcating branches for every maximal isotropy subgroup.
Theorem: If M is a composite number, then there exist bifurcating solutions with isotropy group $\langle \sigma^p \rangle$ for every element $\sigma$ of order M in $S_M$ and every prime $p \mid M$. The bifurcating direction is in the (p-1)-dimensional subspace of $\ker \Delta_{q,\lambda} L(q^*, \lambda, \beta)$ which is fixed by $\langle \sigma^p \rangle$.
We have never numerically observed solutions fixed by $\langle \sigma^p \rangle$, and so perhaps they are unstable.
An example of redundancy: $(1423)^2 = (1324)^2 = (12)(34)$.
The full lattice of subgroups of the group $S_M$ is not known for arbitrary M.
[Figure: lattice of the maximal isotropy subgroups in $S_4$]
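The redundancy example can be verified directly by squaring the two 4-cycles. The sketch below represents a permutation as a dictionary on the symbols 1..n and assumes the cycles passed in are disjoint; the helper names are hypothetical.

```python
def cycles_to_map(cycles, n):
    """Build the map i -> sigma(i) on {1, ..., n} from disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        # Each element is sent to its successor, the last wraps to the first.
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            perm[a] = b
    return perm

def compose(f, g):
    """Return the permutation i -> f(g(i))."""
    return {i: f[g[i]] for i in g}

def power(perm, p):
    """Return perm applied p times."""
    result = {i: i for i in perm}
    for _ in range(p):
        result = compose(perm, result)
    return result

target = cycles_to_map([(1, 2), (3, 4)], 4)
print(power(cycles_to_map([(1, 4, 2, 3)], 4), 2) == target)
print(power(cycles_to_map([(1, 3, 2, 4)], 4), 2) == target)
```

Both elements of order 4 therefore generate the same subgroup when raised to the power p = 2, so they contribute a single isotropy subgroup rather than two.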
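The efficient algorithm to solve $\max F(q, \beta)$
Let $q_0$ be the maximizer of $\max_q G(q)$, $\beta_0 = 1$ and $\Delta s > 0$. For $k \geq 0$, let $(q_k, \beta_k)$ be a solution to $\max_q (G(q) + \beta_k D(q))$. Iterate the following steps until $\beta_K = B$ for some K.
1. Perform $\beta$-step: solve $\Delta_{q,\lambda} L(q_k, \lambda_k, \beta_k) \, (\partial_\beta q_k, \partial_\beta \lambda_k)^T = -\partial_\beta \nabla_{q,\lambda} L(q_k, \lambda_k, \beta_k)$ for $(\partial_\beta q_k, \partial_\beta \lambda_k)$ and select $\beta_{k+1} = \beta_k + d_k$, where $d_k = \Delta s / (\|\partial_\beta q_k\|^2 + \|\partial_\beta \lambda_k\|^2 + 1)^{1/2}$.
2. The initial guess for $q_{k+1}$ at $\beta_{k+1}$ is $q_{k+1}^{(0)} = q_k + d_k \, \partial_\beta q_k$.
3. Optimization: solve $\max_q (G(q) + \beta_{k+1} D(q))$ to get the maximizer $q^*_{k+1}$, using initial guess $q_{k+1}^{(0)}$.
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of $\Delta_q [G(q_k) + \beta_k D(q_k)]$ and $\Delta_q [G(q_{k+1}) + \beta_{k+1} D(q_{k+1})]$. If a bifurcation is detected, then set $q_{k+1}^{(0)} = q_k + d_k u$, where u is given by [expression missing in source], and repeat step 3.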
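The bookkeeping in steps 1, 2 and 4 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the tangent system has right-hand side $-(\nabla_q D(q_k), 0)$, which holds if the Lagrangian depends on $\beta$ only through the term $\beta D(q)$, and it leaves the optimization in step 3 to whatever solver is available. All function names are hypothetical.

```python
import numpy as np

def tangent(hess_L, grad_D, K):
    """Step 1: solve hess_L @ (dq, dlam) = -(grad_D, 0) for the tangent.

    hess_L : Hessian of the Lagrangian at (q_k, lambda_k, beta_k)
    grad_D : gradient of D at q_k (assumed form of the right-hand side)
    K      : number of constraints (length of lambda)
    """
    rhs = -np.concatenate([grad_D, np.zeros(K)])
    sol = np.linalg.solve(hess_L, rhs)
    return sol[:-K], sol[-K:]

def beta_step(dq, dlam, ds):
    """Step 1: d_k = ds / sqrt(||dq||^2 + ||dlam||^2 + 1)."""
    return ds / np.sqrt(np.dot(dq, dq) + np.dot(dlam, dlam) + 1.0)

def predictor(q_k, dq, d_k):
    """Step 2: initial guess q_k + d_k * dq for the next optimization."""
    return q_k + d_k * dq

def bifurcation_detected(block_k, block_k1):
    """Step 4: a sign change of det of an identical Hessian block."""
    sign_k = np.sign(np.linalg.det(block_k))
    sign_k1 = np.sign(np.linalg.det(block_k1))
    return bool(sign_k != sign_k1)
```

The normalization by $(\|\partial_\beta q_k\|^2 + \|\partial_\beta \lambda_k\|^2 + 1)^{1/2}$ makes the step in $\beta$ shrink where the solution branch changes quickly, so that the distance moved along the branch per iteration stays close to $\Delta s$.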