Symmetry Breaking Bifurcations of the Information Distortion. Dissertation Defense, April 8, 2003. Albert E. Parker III. Complex Biological Systems, Department of Mathematical Sciences, Center for Computational Biology, Montana State University.
Goal: Solve the Information Distortion Problem. The goal of my thesis is to solve the Information Distortion problem, an optimization problem of the form $\max_{q \in \Delta} G(q)$ constrained by $D(q) \le D_0$, where $\Delta$ is a subset of $\mathbb{R}^n$, $G$ and $D$ are sufficiently smooth in $\Delta$, and $G$ and $D$ have symmetry: they are invariant to some group action. Problems of this form arise in the study of clustering problems and optimal source coding systems.
Goal: Another Formulation. Using the method of Lagrange multipliers, the goal of finding solutions of the optimization problem can be rephrased as finding stationary points of the problem $\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$, where $\beta \in [0,\infty)$, $\Delta$ is a subset of $\mathbb{R}^{NK}$, $G$ and $D$ are sufficiently smooth in $\Delta$, and $G$ and $D$ have symmetry: they are invariant to some group action.
How: Determine the Bifurcation Structure. We have described the bifurcation structure of stationary points of any problem of the form $\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$, where $\beta \in [0,\infty)$, $\Delta$ is a linear subset of $\mathbb{R}^{NK}$, $G$ and $D$ are sufficiently smooth in $\Delta$, and $G$ and $D$ have symmetry: they are invariant to some group action.
Thesis Topics: The Data Clustering Problem; The Neural Coding Problem; Information Theory / Probability Theory; Optimization Theory; Dynamical Systems; Bifurcation Theory with Symmetries; Group Theory; Continuation Techniques.
Outline of this talk: The Data Clustering Problem; A Class of Optimization Problems; Bifurcation with Symmetries; Numerical Results.
The Data Clustering Problem. Data Classification: identifying all of the books printed in 2002 which address the martial art Kempo. Data Compression: converting a bitmap file to a jpeg file. [Diagram: a clustering $q(Y_N|Y)$ maps $Y$ ($K$ objects $\{y_i\}$) to $Y_N$ ($N$ objects $\{y_{N_i}\}$).]
A Symmetry: invariance to relabelling of the clusters of $Y_N$. [Diagram: a clustering $q(Y_N|Y)$ maps $Y$ ($K$ objects $\{y_i\}$) to $Y_N$ ($N$ objects $\{y_{N_i}\}$), shown once with the two clusters labelled class 1 and class 2 and once with the labels swapped to class 2 and class 1.]
Requirements of a Clustering Method. The original data is represented reasonably well by the clusters: choosing a cost function $D(Y,Y_N)$, called a distortion function, rigorously defines what we mean by "the data is represented reasonably well". Fast implementation.
Examples optimizing at a distortion level $D(Y,Y_N) \le D_0$. Deterministic Annealing (Rose 1998), a fast clustering algorithm: $\max H(Y_N|Y)$ constrained by $D(Y,Y_N) \le D_0$. Rate Distortion Theory (Shannon ~1950), minimum informative compression: $\min I(Y;Y_N)$ constrained by $D(Y,Y_N) \le D_0$.
Inputs and Outputs and Clustered Outputs. The Information Distortion method clusters the outputs $Y$ into clusters $Y_N$ so that the information that one can learn about $X$ by observing $Y_N$, $I(X;Y_N)$, is as close as possible to the mutual information $I(X;Y)$. The corresponding information distortion function is $D_I(Y;Y_N) = I(X;Y) - I(X;Y_N)$. [Diagram: inputs $X$ ($L$ objects $\{x_i\}$) and outputs $Y$ ($K$ objects $\{y_i\}$) are related by $p(X,Y)$; the clustering $q(Y_N|Y)$ maps $Y$ to clusters $Y_N$ ($N$ objects $\{y_{N_i}\}$).]
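The distortion $D_I = I(X;Y) - I(X;Y_N)$ can be evaluated directly from $p(X,Y)$ and a clustering $q(Y_N|Y)$. The following is a minimal sketch, assuming natural logarithms and a made-up toy joint distribution; the function names and the array layout `q[nu, y]` are illustrative choices, not taken from the talk.

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in nats for a joint distribution given as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0                      # skip zero cells (0 log 0 = 0)
    return float(np.sum(pab[mask] * np.log(pab[mask] / (pa @ pb)[mask])))

def information_distortion(pxy, q):
    """D_I = I(X;Y) - I(X;Y_N) for a clustering q[nu, y] = q(nu | y)."""
    pxyn = pxy @ q.T                    # p(x, nu) = sum_y p(x, y) q(nu | y)
    return mutual_information(pxy) - mutual_information(pxyn)

# Toy joint p(X, Y): L = 2 inputs, K = 4 outputs (made-up numbers).
pxy = np.array([[0.20, 0.20, 0.05, 0.05],
                [0.05, 0.05, 0.20, 0.20]])

# Hard clustering into N = 2 classes: {y1, y2} and {y3, y4}.
q_good = np.array([[1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
# Uniform clustering q = 1/N: Y_N is independent of X.
q_unif = np.full((2, 4), 0.5)

d_good = information_distortion(pxy, q_good)   # clusters merge outputs with equal p(x|y)
d_unif = information_distortion(pxy, q_unif)   # all information about X is lost
```

In this toy case the hard clustering merges only outputs with identical $p(x|y)$, so it loses no information, while the uniform clustering has distortion equal to $I(X;Y)$.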
Two optimization problems which use the information distortion function. Information Distortion Method (Dimitrov and Miller 2001): $\max H(Y_N|Y)$ constrained by $D_I(Y,Y_N) \le D_0$, or in Lagrangian form $\max H(Y_N|Y) + \beta I(X;Y_N)$. Information Bottleneck Method (Tishby, Pereira, Bialek 1999): $\min I(Y;Y_N)$ constrained by $D_I(Y,Y_N) \le D_0$, or in Lagrangian form $\max -I(Y;Y_N) + \beta I(X;Y_N)$.
An annealing algorithm to solve $\max_q F(q,\beta) = \max_q (G(q) + \beta D(q))$. Let $q_0$ be the maximizer of $\max_q G(q)$, and let $\beta_0 = 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some $K$.
1. Perform $\beta$-step: let $\beta_{k+1} = \beta_k + d_k$ where $d_k > 0$.
2. The initial guess for $q_{k+1}$ at $\beta_{k+1}$ is $q_{k+1}^{(0)} = q_k + \eta$ for some small perturbation $\eta$.
3. Optimization: solve $\max_q (G(q) + \beta_{k+1} D(q))$ to get the maximizer $q_{k+1}$, using initial guess $q_{k+1}^{(0)}$.
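The three annealing steps can be sketched for the Information Distortion cost $H(Y_N|Y) + \beta I(X;Y_N)$. This is an illustration under stated assumptions, not the implementation used in the thesis: the inner optimizer is a fixed-point iteration on the stationarity condition $q(\nu|y) \propto \exp(\beta \sum_x p(x|y) \log p(x|\nu))$, the step $d_k$ is constant, and the perturbation $\eta$ is uniform random noise. All function names and the toy $p(X,Y)$ are hypothetical.

```python
import numpy as np

def optimize(pxy, beta, q_init, iters=500):
    """Assumed optimizer: fixed-point iteration on the stationarity condition."""
    py = pxy.sum(axis=0)
    px_given_y = pxy / py                              # columns hold p(x | y)
    q = q_init
    for _ in range(iters):
        pxn = pxy @ q.T                                # p(x, nu)
        px_given_n = pxn / pxn.sum(axis=0, keepdims=True)
        # logits[nu, y] = beta * sum_x p(x|y) log p(x|nu)
        logits = beta * (np.log(px_given_n + 1e-300).T @ px_given_y)
        logits -= logits.max(axis=0, keepdims=True)    # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=0, keepdims=True)              # enforce sum_nu q(nu|y) = 1
    return q

def anneal(pxy, n_classes, beta_max, d_beta=0.1, eps=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    K = pxy.shape[1]
    q = np.full((n_classes, K), 1.0 / n_classes)       # q_0: maximizer of H(Y_N|Y)
    beta = 0.0
    path = [(beta, q)]
    while beta < beta_max:
        beta = min(beta + d_beta, beta_max)            # step 1: beta-step
        guess = q + eps * rng.random(q.shape)          # step 2: perturbed guess
        guess /= guess.sum(axis=0, keepdims=True)
        q = optimize(pxy, beta, guess)                 # step 3: optimization
        path.append((beta, q))
    return path

# Toy joint distribution (made-up numbers), clustered into N = 2 classes.
pxy = np.array([[0.20, 0.20, 0.05, 0.05],
                [0.05, 0.05, 0.20, 0.20]])
path = anneal(pxy, n_classes=2, beta_max=2.0)
```

The perturbation in step 2 matters: started exactly at $q \equiv 1/N$, the fixed-point iteration stays at the uniform solution for every $\beta$, so the symmetric branch would never be left.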
Application of the annealing method to the Information Distortion problem $\max_q (H(Y_N|Y) + \beta I(X;Y_N))$ when $p(X,Y)$ is defined by four Gaussian blobs. Here $D(q(Y_N|Y)) = I(X;Y_N)$. [Diagram: inputs $X$ (52 objects) and outputs $Y$ (52 objects) are related by $p(X,Y)$; the clustering $q(Y_N|Y)$ maps $Y$ to $Y_N$ ($N$ objects).]
Observed Bifurcations for the Four Blob problem: We just saw the optimal clusterings $q^*$ at some $\beta^* = \beta_{\max}$. What do the clusterings look like for $\beta < \beta_{\max}$?
Bifurcations of $q^*(\beta)$. [Figures: observed bifurcations for the 4 Blob Problem; conceptual bifurcation structure of $q^*$.]
Questions: [Figures: conceptual bifurcation structure of $q^*$; observed bifurcations for the 4 Blob Problem.]
Why are there only 3 bifurcations observed? In general, are there only $N-1$ bifurcations?
What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
How many bifurcating solutions are there?
What do the bifurcating branches look like? Are they subcritical or supercritical?
What is the stability of the bifurcating branches?
Is there always a bifurcating branch which contains solutions of the optimization problem?
Are there bifurcations after all of the classes have resolved?
Bifurcations with symmetry. To better understand the bifurcation structure, we capitalize on the symmetries of the function $G(q) + \beta D(q)$. The "obvious" symmetry is that $G(q) + \beta D(q)$ is invariant to relabelling of the $N$ classes of $Y_N$. The symmetry group of all permutations on $N$ symbols is $S_N$. [Diagram: switch labels 1 and 3.]
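The relabelling invariance is easy to check numerically for the Information Distortion cost. The sketch below assumes $F(q,\beta) = H(Y_N|Y) + \beta I(X;Y_N)$ with natural logarithms and a strictly positive random $q$; the function name `cost` and the random test data are illustrative only.

```python
import numpy as np

def cost(pxy, q, beta):
    """F(q, beta) = H(Y_N|Y) + beta * I(X;Y_N); q[nu, y] must be strictly positive."""
    py = pxy.sum(axis=0)
    px = pxy.sum(axis=1)
    h = -np.sum(py * q * np.log(q))                  # H(Y_N|Y)
    pxn = pxy @ q.T                                  # p(x, nu)
    pn = pxn.sum(axis=0)                             # p(nu)
    i = np.sum(pxn * np.log(pxn / np.outer(px, pn))) # I(X;Y_N)
    return h + beta * i

rng = np.random.default_rng(1)
pxy = rng.random((3, 5))
pxy /= pxy.sum()                                     # random joint p(X, Y)
q = rng.random((4, 5))
q /= q.sum(axis=0)                                   # random clustering with N = 4 classes

perm = [2, 1, 0, 3]                                  # switch labels 1 and 3 (rows 0 and 2)
f_before = cost(pxy, q, beta=1.7)
f_after = cost(pxy, q[perm], beta=1.7)               # equal up to rounding
```

Any other permutation of the four rows leaves the value unchanged in the same way, which is the $S_N$ symmetry used in the rest of the talk.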
Symmetry Breaking Bifurcations. [Figure: bifurcation diagram of $q^*$.]
Existence Theorems for Bifurcating Branches. Given a bifurcation at a point fixed by $S_N$: Equivariant Branching Lemma (Vanderbauwhede and Cicogna): there are $N$ bifurcating branches, each of which has symmetry $S_{N-1}$. The Smoller-Wasserman Theorem (Smoller and Wasserman 1985-6): there are bifurcating branches which have symmetry $\langle \gamma^p \rangle$, where $\gamma$ is an element of order $N$, for every prime $p \mid N$, $p < N$.
Existence Theorems for Bifurcating Branches. Given a bifurcation at a point fixed by $S_{N-1}$: Equivariant Branching Lemma (Vanderbauwhede and Cicogna): gives $N-1$ bifurcating branches which have symmetry $S_{N-2}$. The Smoller-Wasserman Theorem (Smoller and Wasserman 1985-6): gives bifurcating branches which have symmetry $\langle \gamma^p \rangle$, where $\gamma$ is an element of order $N-1$, for every prime $p \mid N-1$, $p < N-1$. When $N = 4$, $N-1 = 3$, there are no bifurcating branches given by the SW Theorem.
Bifurcation Structure corresponds with Group Structure
A partial subgroup lattice for $S_4$ and the corresponding bifurcating directions given by the Equivariant Branching Lemma
A partial subgroup lattice for $S_4$ and the corresponding bifurcating directions given by the Smoller-Wasserman Theorem
[Figure: conceptual bifurcation structure of $q^*$.]
The Equivariant Branching Lemma shows that the bifurcation structure from $S_M$ to $S_{M-1}$ is … [Figure: conceptual bifurcation structure of $q^*$ alongside the group structure.]
The Smoller-Wasserman Theorem shows additional structure … 3 branches from the $S_4$ to $S_3$ bifurcation only. [Figure: conceptual bifurcation structure of $q^*$ alongside the group structure.]
If we stay on a branch which is fixed by $S_M$, how many bifurcations are there? [Figure: conceptual bifurcation structure of $q^*$.]
Theorem: There are exactly $K/N$ bifurcations on the branch $(q_{1/N}, \beta)$ for the Information Distortion problem. For the four blob problem ($K = 52$, $N = 4$), there are 13 bifurcations on the first branch. [Figure: conceptual bifurcation structure of $q^*$ alongside the group structure.]
Bifurcation theory in the presence of symmetries enables us to answer the questions previously posed …
Answers: [Figures: conceptual bifurcation structure of $q^*$; observed bifurcations for the 4 Blob Problem.]
Why are there only 3 bifurcations observed? In general, are there only $N-1$ bifurcations? There are $N-1$ symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ for $M \le N$.
What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
How many bifurcating solutions are there? There are at least $N$ from the first bifurcation, at least $N-1$ from the next one, etc.
What do the bifurcating branches look like? They are subcritical or supercritical depending on the sign of the bifurcation discriminator $\zeta(q^*, \beta^*, u_k)$.
What is the stability of the bifurcating branches?
Is there always a bifurcating branch which contains solutions of the optimization problem? No.
Are there bifurcations after all of the classes have resolved? In general, no.
We can explain the bifurcation structure of all problems of the form $\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$, where $\beta \in [0,\infty)$, $\Delta$ is a subset of $\mathbb{R}^{NK}$, $G$ and $D$ are sufficiently smooth in $\Delta$, $G$ and $D$ are invariant to relabelling of the classes of $Y_N$, and the blocks of the Hessian $\Delta_q (G + \beta D)$ at bifurcation satisfy a set of generic conditions. This class of problems includes the Information Distortion problem.
[Figure: singularity structure, with panels labelled Symmetry breaking bifurcation, Impossible scenario, Saddle-node bifurcation, Impossible scenario, and Non-generic; thesis chapters referenced: chapter 6, chapter 8, chapter 4.]
Continuation techniques provide numerical confirmation of the theory
Previously Observed Bifurcation Structure for the Four Blob problem:
Equivariant Branching Lemma: Previous vs. Actual Bifurcation Structure. We used Continuation Techniques and the Theory of Bifurcations with Symmetries on the 4 Blob Problem using the Information Distortion method to get this picture. [Figure: previous results vs. actual structure; the legend marks singularities of $F$ and singularities of $\mathcal{L}$ (*).]
Smoller-Wasserman Theorem: there are bifurcating branches with symmetry $\langle \gamma^2 \rangle$. [Figure: bifurcation diagram of $q^*$.]
A closer look … [Figure: bifurcation diagram of $q^*$.]
Bifurcation from $S_4$ to $S_3$ … [Figure: bifurcation diagram of $q^*$.]
The bifurcation from $S_4$ to $S_3$ is subcritical … (the theory predicted this since the bifurcation discriminator $\zeta(q_{1/4}, \beta^*, u) < 0$)
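Bifurcation from $S_3$ to $S_2$ … [Figure: bifurcation diagram of $q^*$.]
The bifurcation from $S_3$ to $S_2$ is subcritical …
Bifurcation from $S_2$ to $S_1$ … [Figure: bifurcation diagram of $q^*$.]
The bifurcation from $S_2$ to $S_1$ …
What are these branches? [Figure: bifurcation diagram of $q^*$.]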
Conclusions. Theorem: In general, either symmetry breaking bifurcations or saddle-node bifurcations can occur. Outline of proof: The Equivariant Branching Lemma, the Smoller-Wasserman Theorem, and the following singularity structure. [Figure: singularity structure, with panels labelled Symmetry breaking bifurcation, Impossible scenario, Saddle-node bifurcation, Impossible scenario, and Non-generic.]
Conclusions. Theorem: All symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ are pitchfork-like, and there exist $M$ bifurcating branches, for which we have explicit directions. [Figure: bifurcation diagram of $q^*$.]
Conclusions. Theorem: The bifurcation discriminator of the pitchfork-like branch $(q^*, \lambda^*, \beta^*) + (tu, 0, \beta(t))$ is $\zeta(q^*, \beta^*, u_k)$ [equation]. If $\zeta(q^*, \beta^*, u_k) > 0$, then the branch is supercritical.
Conclusions. Theorem: Solutions of the optimization problem do not always persist from bifurcation. Theorem: In general, bifurcations do not occur after all of the classes have resolved.
A numerical algorithm to solve $\max_q (G(q) + \beta D(q))$. Let $q_0$ be the maximizer of $\max_q G(q)$, $\beta_0 = 1$ and $\Delta s > 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some $K$.
1. Perform $\beta$-step: solve [matrix system] for the tangent direction $(\partial_\beta q_k, \partial_\beta \lambda_k)$ and select $\beta_{k+1} = \beta_k + d_k$, where $d_k = \Delta s \, \mathrm{sgn}(\cos\theta) / \left( \|\partial_\beta q_k\|^2 + \|\partial_\beta \lambda_k\|^2 + 1 \right)^{1/2}$.
2. The initial guess for $(q_{k+1}, \lambda_{k+1})$ at $\beta_{k+1}$ is $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)}) = (q_k, \lambda_k) + d_k (\partial_\beta q_k, \partial_\beta \lambda_k)$.
3. Optimization: solve $\max_q (G(q) + \beta_{k+1} D(q))$ using pseudoarclength continuation to get the maximizer $q_{k+1}$ and the vector of Lagrange multipliers $\lambda_{k+1}$, using initial guess $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)})$.
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of $\Delta_q [G(q_k) + \beta_k D(q_k)]$ and $\Delta_q [G(q_{k+1}) + \beta_{k+1} D(q_{k+1})]$. If a bifurcation is detected, then set $q_{k+1}^{(0)} = q_k + d_k u$, where $u$ is a bifurcating direction, and repeat step 3.
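Step 4 amounts to a sign test on a determinant at two consecutive continuation points. The sketch below shows only that test; the matrix `block(beta)` is a hypothetical stand-in for one of the identical Hessian blocks (it is not derived from the Information Distortion cost), chosen so that it becomes singular at $\beta = 2$.

```python
import numpy as np

def det_sign(block):
    """Sign of det(block), computed via slogdet to avoid overflow or underflow."""
    sign, _ = np.linalg.slogdet(block)
    return sign

def bifurcation_between(block_k, block_k1):
    """True if the determinant changes sign between two consecutive steps."""
    return det_sign(block_k) * det_sign(block_k1) < 0

# Hypothetical stand-in for an identical Hessian block along a solution branch.
C = np.array([[0.3, 0.2],
              [0.2, 0.3]])              # eigenvalues 0.5 and 0.1

def block(beta):
    return beta * C - np.eye(2)         # singular at beta = 2 (and at beta = 10)

betas = np.linspace(1.1, 2.9, 10)       # steps of 0.2 that straddle beta = 2
crossings = [(b0, b1) for b0, b1 in zip(betas[:-1], betas[1:])
             if bifurcation_between(block(b0), block(b1))]
```

A determinant sign test detects only singularities where an odd number of eigenvalues cross zero between the two steps, which is consistent with the assumption stated in the talk that the block has a single nullvector at bifurcation.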
Details … The Dynamical System; Types of Singularities; Continuation Techniques; The Explicit Group of Symmetries; Explicit Existence Theorems for bifurcating branches.
A Class of Problems: $\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$. $G$ and $D$ are sufficiently smooth in $\Delta$. $G$ and $D$ must be invariant under relabelling of the classes.
The Dynamical System. Goal: To determine the bifurcation structure of solutions to $\max_{q \in \Delta} (G(q) + \beta D(q))$ for $\beta \in [0,\infty)$. Method: Study the equilibria of the gradient flow $(\dot{q}, \dot{\lambda}) = \nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta)$, where $\mathcal{L}$ is the Lagrangian of the constrained problem. The Jacobian with respect to $q$ of the $K$ constraints $\{\sum_{Y_N} q(Y_N|y) - 1\}$ is $J = (I_K \; I_K \; \cdots \; I_K)$. If $w^T \Delta_q F(q^*,\beta)\, w < 0$ for every $w \in \ker J$, where $\Delta_q F$ is the Hessian of $F$ with respect to $q$, then $q^*(\beta)$ is a maximizer of the problem above. The first equilibrium is $q^*(\beta_0 = 0) \equiv 1/N$.
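The second-order condition on $\ker J$ can be checked numerically at the first equilibrium. The sketch below assumes $G(q) = H(Y_N|Y)$ with natural logarithms (so that at $\beta = 0$ the Hessian is diagonal with entries $-p(y)/q(\nu|y)$), stacks $q$ class by class, and uses made-up values for $p(y)$; the variable names are illustrative.

```python
import numpy as np

N, K = 3, 4
py = np.array([0.1, 0.2, 0.3, 0.4])          # made-up marginal p(y)

# Jacobian of the K constraints sum_nu q(nu|y) = 1, with q stacked class by class.
J = np.hstack([np.eye(K)] * N)               # K x NK

# Hessian of H(Y_N|Y) at q = 1/N: diagonal entries -p(y) / q(nu|y) = -N p(y).
hess = np.diag(np.tile(-py * N, N))          # NK x NK

# Orthonormal basis Z of ker J: rows of Vt beyond the rank K of J.
_, _, vt = np.linalg.svd(J)
Z = vt[K:].T                                 # NK x (N-1)K

# w^T hess w < 0 for all w in ker J  <=>  Z^T hess Z is negative definite.
reduced = Z.T @ hess @ Z
eigs = np.linalg.eigvalsh(reduced)
is_max = bool(np.all(eigs < 0))
```

Here the full Hessian is already negative definite, so the restriction to $\ker J$ is automatic; for $\beta > 0$ the reduced matrix is the object whose eigenvalues signal a loss of stability.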
Properties of the Dynamical System. In our dynamical system, the Hessian determines the stability of equilibria and the location of bifurcation.
Investigating the Dynamical System. How: Use numerical continuation in a constrained system to choose $\beta$ and to choose an initial guess to find the equilibria $q^*(\beta)$. Use bifurcation theory with symmetries to understand bifurcations of the equilibria.
Continuation. A local maximum $q_k^*(\beta_k)$ of the optimization problem is an equilibrium of the gradient flow. The initial condition $q_{k+1}^{(0)}(\beta_{k+1}^{(0)})$ is sought in the tangent direction $\partial_\beta q_k$, which is found by solving the matrix system [equation]. The continuation algorithm used to find $q_{k+1}^*(\beta_{k+1})$ is based on Newton's method. [Figure: parameter continuation follows the dashed (---) path, pseudoarclength continuation follows the dotted (…) path.]
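The difference between parameter continuation and pseudoarclength continuation shows up at a fold, where the branch turns back in $\beta$. The following sketch runs pseudoarclength continuation on a toy scalar equation $f(x,\beta) = x^2 + \beta - 1 = 0$, which has a fold at $(x,\beta) = (0,1)$; the toy equation, step size, and function names are assumptions for illustration and are not the constrained system from the talk.

```python
import numpy as np

def f(x, beta):
    return x**2 + beta - 1.0                   # toy equilibrium equation

def jac(x, beta):
    return np.array([2.0 * x, 1.0])            # (df/dx, df/dbeta)

def tangent(x, beta, prev):
    fx, fb = jac(x, beta)
    t = np.array([-fb, fx])                    # orthogonal to the gradient of f
    t /= np.linalg.norm(t)
    return t if t @ prev > 0 else -t           # keep the direction of travel

def step(x, beta, t, ds, newton_iters=20):
    u0 = np.array([x, beta])
    u = u0 + ds * t                            # predictor along the tangent
    for _ in range(newton_iters):              # Newton corrector
        res = np.array([f(u[0], u[1]), t @ (u - u0) - ds])
        A = np.vstack([jac(u[0], u[1]), t])    # augmented Jacobian
        u = u - np.linalg.solve(A, res)
    return u

point = np.array([1.0, 0.0])                   # start on the branch at beta = 0
t = np.array([-1.0, 0.0])                      # initial orientation: decreasing x
branch = [point]
for _ in range(30):
    t = tangent(point[0], point[1], t)
    point = step(point[0], point[1], t, ds=0.1)
    branch.append(point)
branch = np.array(branch)                      # columns: x, beta
```

Stepping in $\beta$ alone would fail once $\beta$ exceeds 1, since no solution exists there; the arclength constraint lets the corrector follow the branch around the fold, so the computed $x$ changes sign while $\beta$ turns back.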
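The Groups. Let $\mathcal{P}$ be the finite group of $n \times n$ "block" permutation matrices which represents the action of $S_N$ on $q$ and $F(q,\beta)$. For example, if $N = 3$, $\rho = \begin{pmatrix} 0 & I_K & 0 \\ I_K & 0 & 0 \\ 0 & 0 & I_K \end{pmatrix}$ permutes $q(Y_{N_1}|y)$ with $q(Y_{N_2}|y)$ for every $y$. $F(q,\beta)$ is $\mathcal{P}$-invariant means that for every $\rho \in \mathcal{P}$, $F(\rho q, \beta) = F(q,\beta)$. Let $\Gamma$ be the finite group of $(n+K) \times (n+K)$ block permutation matrices which represents the action of $S_N$ on $(q,\lambda)$ and $\nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta)$. $\nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta)$ is $\Gamma$-equivariant means that for every $\gamma \in \Gamma$, $\gamma \nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta) = \nabla_{q,\lambda} \mathcal{L}(\gamma (q,\lambda), \beta)$.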
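A block permutation matrix of this kind can be built as a Kronecker product of an ordinary permutation matrix with $I_K$. The sketch below assumes $q$ is stacked class by class into a vector of length $NK$; the helper name `block_permutation` and the test values are illustrative (the entries of `q` are chosen to be distinguishable, not to be probabilities).

```python
import numpy as np

N, K = 3, 2

def block_permutation(perm, K):
    """NK x NK matrix that moves block perm[i] of the stacked q into position i."""
    p_small = np.eye(len(perm))[perm]          # row i is the unit vector e_{perm[i]}
    return np.kron(p_small, np.eye(K))

q = np.arange(1.0, N * K + 1).reshape(N, K)    # row i stands for q(nu_i | .)
q_vec = q.reshape(-1)                          # stacked class by class

perm = [1, 0, 2]                               # swap the first two classes
P = block_permutation(perm, K)
permuted = P @ q_vec                           # same as q[perm].reshape(-1)

# The constraint Jacobian J = (I_K ... I_K) is unchanged by relabelling.
J = np.hstack([np.eye(K)] * N)
```

Because $JP = J$, a relabelled clustering satisfies the same normalization constraints, so the group acts on the constrained problem and not only on the cost function.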
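Notation and Definitions. The symmetry of $x$ is measured by its isotropy subgroup $\Sigma_x = \{\gamma \in \Gamma : \gamma x = x\}$. An isotropy subgroup $\Sigma$ is a maximal isotropy subgroup of $\Gamma$ if there does not exist an isotropy subgroup $H$ of $\Gamma$ such that $\Sigma \subsetneq H \subsetneq \Gamma$. At bifurcation, the fixed point subspace of $\Sigma = \Sigma_{(q^*,\lambda^*)}$ is $\mathrm{Fix}(\Sigma) = \{w \in \ker \Delta_{q,\lambda} \mathcal{L}(q^*,\lambda^*,\beta^*) : \gamma w = w \text{ for every } \gamma \in \Sigma\}$.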
Equivariant Branching Lemma. One of the Existence Theorems we use to describe a bifurcation in the presence of symmetries is the Equivariant Branching Lemma (Vanderbauwhede and Cicogna). Idea: The bifurcation structure of local solutions is described by the isotropy subgroups $\Sigma$ of $\Gamma$ which have $\dim \mathrm{Fix}(\Sigma) = 1$. System: $\dot{x} = r(x,\beta)$. $r(x,\beta)$ is $G$-equivariant for some compact Lie group $G$. $\mathrm{Fix}(G) = \{0\}$. Let $H$ be an isotropy subgroup of $G$ such that $\dim \mathrm{Fix}(H) = 1$. Assume $\partial^2_{x\beta} r(0,0) \neq 0$ (crossing condition). Then there is a unique smooth solution branch $(t x_0, \beta(t))$ to $r = 0$ such that $x_0 \in \mathrm{Fix}(H)$ and the isotropy subgroup of each solution is $H$.
Symmetry Breaking from $S_M$ to $S_{M-1}$. From bifurcation, the Equivariant Branching Lemma shows that the following solutions emerge. A stationary point $q^*$ is M-uniform if there exist $1 \le M \le N$ and a $K \times 1$ vector $P$ such that $q(y_{N_i}|Y) = P$ for $M$ and only $M$ classes $\{y_{N_i}\}_{i=1}^N$ of $Y_N$. These $M$ classes of $Y_N$ are unresolved classes. The classes of $Y_N$ that are not unresolved are called resolved. The first equilibrium, $q^* \equiv 1/N$, is N-uniform. Theorem: $q^*$ is M-uniform if and only if $q^*$ is fixed by $S_M$.
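The definition of M-uniform can be turned into a small check on the rows of $q$. This is a minimal sketch, assuming $q$ is stored as an $N \times K$ array with one row per class; the function name, the tolerance, and the example values are illustrative. When several groups of equal rows exist, it reports the largest one.

```python
import numpy as np

def unresolved_classes(q, tol=1e-8):
    """Indices of the largest set of classes whose rows q(nu | .) coincide."""
    n = q.shape[0]
    best = []
    for i in range(n):
        same = [j for j in range(n)
                if np.allclose(q[i], q[j], rtol=0.0, atol=tol)]
        if len(same) > len(best):
            best = same
    return best

# The first equilibrium q = 1/N with N = 4 classes and K = 5 objects: 4-uniform.
q_uniform = np.full((4, 5), 0.25)

# Class 3 has resolved; classes 1, 2 and 4 still share the vector P: 3-uniform.
P = np.full(5, 0.2)
R = np.full(5, 0.4)
q_partial = np.vstack([P, P, R, P])

m_uniform = len(unresolved_classes(q_uniform))
m_partial = len(unresolved_classes(q_partial))
```

By the theorem on this slide, the number of unresolved classes $M$ also identifies the symmetry group $S_M$ that fixes the solution.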
Kernel of the Hessian at Symmetry Breaking Bifurcation. Theorem: $\dim \ker \Delta_q F(q^*,\beta) = M$ with basis vectors $\{v_i\}_{i=1}^M$. Theorem: $\dim \ker \Delta_{q,\lambda} \mathcal{L}(q^*,\lambda,\beta) = M-1$ with basis vectors [equation]. Point: Since the bifurcating solutions whose existence is guaranteed by the EBL and the SW Theorem are tangential to $\ker \Delta_{q,\lambda} \mathcal{L}(q^*,\lambda,\beta)$, we know the explicit form of the bifurcating directions.
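Symmetry Breaking Bifurcation from M-uniform solutions. Assumptions: Let $q^*$ be M-uniform. Call the $M$ identical blocks of $\Delta_q F(q^*,\beta)$: $B$. Call the other $N-M$ blocks of $\Delta_q F(q^*,\beta)$: $\{R_\nu\}$. We assume that $B$ has a single nullvector $v$ and that $R_\nu$ is nonsingular for every $\nu$. If $M < N$, then $B \sum_\nu R_\nu^{-1} + M I_K$ is nonsingular. Theorem: Let $(q^*, \lambda^*, \beta^*)$ be a singular point of the flow such that $q^*$ is M-uniform. Then there exist $M$ bifurcating $(M-1)$-uniform solutions $(q^*, \lambda^*, \beta^*) + (t u_k, 0, \beta(t))$, where $u_k$ has $(M-1)v$ in the block of the $k$-th unresolved class, $-v$ in the blocks of the other unresolved classes, and $0$ elsewhere.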
Some of the bifurcating branches when $N = 4$ are given by the following isotropy subgroup lattice for $S_4$. [Figure: isotropy subgroup lattice.]
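For the 4 Blob problem: the isotropy subgroups and bifurcating directions of the observed bifurcating branches.
Isotropy group $S_4$: bifurcating direction $(-v, -v, 3v, -v, 0)^T$.
Isotropy group $S_3$: bifurcating direction $(-v, 2v, 0, -v, 0)^T$.
Isotropy group $S_2$: bifurcating direction $(-v, 0, 0, v, 0)^T$.
Isotropy group $1$: no more bifurcations!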
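The three directions listed above follow one pattern: $(M-1)v$ in the block of the class that breaks away, $-v$ in the blocks of the other unresolved classes, $0$ in the resolved blocks, and a final zero block for the Lagrange multipliers. The sketch below builds such a vector; the pattern is read off the listed directions, and the function name, the 0-based class indices, and the sample nullvector `v` are hypothetical.

```python
import numpy as np

def bifurcating_direction(v, n_classes, unresolved, k):
    """Stack (M-1)v for class k, -v for other unresolved classes, 0 otherwise,
    followed by a zero block for the K Lagrange multipliers."""
    assert k in unresolved
    M = len(unresolved)
    zero = np.zeros(len(v))
    blocks = []
    for nu in range(n_classes):
        if nu == k:
            blocks.append((M - 1) * v)
        elif nu in unresolved:
            blocks.append(-v)
        else:
            blocks.append(zero)
    blocks.append(zero)                        # multiplier component
    return np.concatenate(blocks)

v = np.array([1.0, -1.0])                      # hypothetical nullvector of B (K = 2)
z = np.zeros(2)

u_s4 = bifurcating_direction(v, 4, [0, 1, 2, 3], k=2)   # (-v, -v, 3v, -v, 0)
u_s3 = bifurcating_direction(v, 4, [0, 1, 3], k=1)      # (-v, 2v, 0, -v, 0)
u_s2 = bifurcating_direction(v, 4, [0, 3], k=3)         # (-v, 0, 0, v, 0)
```

In each direction the $q$-blocks sum to zero, so the perturbed clustering still satisfies the normalization constraints to first order.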
Smoller-Wasserman Theorem. The other Existence Theorem: the Smoller-Wasserman Theorem (1985-6). For variational problems, there is a bifurcating solution tangential to $\mathrm{Fix}(H)$ for every maximal isotropy subgroup $H$, not only those with $\dim \mathrm{Fix}(H) = 1$. $\dim \mathrm{Fix}(H) = 1$ implies that $H$ is a maximal isotropy subgroup.
Other branches. The Smoller-Wasserman Theorem shows that (under the same assumptions as before) if $M$ is composite, then there exist bifurcating solutions with isotropy group $\langle \gamma^p \rangle$ for every element $\gamma$ of order $M$ in $\Gamma$ and every prime $p \mid M$, $p < M$. Furthermore, $\dim \mathrm{Fix}\langle \gamma^p \rangle = p - 1$.
Bifurcating branches from a 4-uniform solution are given by the following isotropy subgroup lattice for $S_4$. [Figure: isotropy subgroup lattice.]
Maximal isotropy subgroup for $S_4$. [Figure: maximal isotropy subgroup.]
Issues with $S_M$: The full lattice of subgroups of the group $S_M$ is not known for arbitrary $M$. The lattice of maximal subgroups of the group $S_M$ is not known for arbitrary $M$.
More about the Bifurcation Structure. Theorem: All symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ are pitchfork-like. Outline of proof: $\beta'(0) = 0$ since $\partial^2_{xx} r(0,0) = 0$. Theorem: The bifurcation discriminator of the pitchfork-like branch $(q^*, \lambda^*, \beta^*) + (t u_k, 0, \beta(t))$ is $\zeta(q^*, \beta^*, u_k)$ [equation]. If $\zeta(q^*, \beta^*, u_k) > 0$, then the branch is supercritical. Theorem: Generically, bifurcations do not occur after all of the classes have resolved. Theorem: If $\dim \ker \Delta_{q,\lambda} \mathcal{L}(q^*, \lambda, \beta) = 1$, and if a crossing condition is satisfied, then saddle-node bifurcation must occur.