A Class of Problems

We use numerical continuation and bifurcation theory with symmetries to analyze a class of optimization problems of the form

    max_q F(q, β) = max_q (G(q) + β D(q)).

The goal is to solve this problem for β = B ∈ (0, ∞), where:
- G and D are infinitely differentiable in q.
- G is strictly concave.
- D is convex.
- G and D are invariant under relabeling of the classes.
- The Hessian of F is block diagonal with N blocks {B_ν}, and B_ν = B_η whenever q(z_ν|y) = q(z_η|y) for every y ∈ Y.

Problems in this class:
- Deterministic Annealing (Rose 1998): max H(Z|Y) − β D(Y,Z), a clustering algorithm.
- Rate Distortion Theory (Shannon ~1950): max −I(Y,Z) − β D(Y,Z), optimal source coding.
- Information Distortion (Dimitrov and Miller 2001): max H(Z|Y) + β I(X,Z), used in neural coding.
- Information Bottleneck Method (Tishby, Pereira, Bialek 2000): max −I(Y,Z) + β I(X,Z), used for document classification, gene expression, neural coding and spectral analysis.
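All four problems share the form F(q, β) = G(q) + β D(q). As one concrete instance, the sketch below (my own illustration, not code from the talk) evaluates the Information Distortion cost with G(q) = H(Z|Y) and D(q) = I(X;Z), assuming p_xy is a discrete joint distribution stored as a NumPy array and q[z, y] = q(z|y):

```python
# Evaluate F(q, beta) = G(q) + beta * D(q) for the Information Distortion
# cost: G(q) = H(Z|Y), D(q) = I(X;Z).  p_xy[x, y] = p(x, y) and
# q[z, y] = q(z|y) are NumPy arrays; the layout, names, and epsilon
# smoothing are illustrative choices, not the talk's.
import numpy as np

def info_distortion_objective(p_xy, q, beta, eps=1e-12):
    """Return F(q, beta) = H(Z|Y) + beta * I(X;Z) in bits."""
    p_y = p_xy.sum(axis=0)                      # p(y)
    p_zy = q * p_y[np.newaxis, :]               # p(z, y) = q(z|y) p(y)
    H_Z_given_Y = -np.sum(p_zy * np.log2(q + eps))
    p_xz = p_xy @ q.T                           # p(x, z), since X -> Y -> Z
    p_x, p_z = p_xz.sum(axis=1), p_xz.sum(axis=0)
    I_XZ = np.sum(p_xz * np.log2(p_xz / (np.outer(p_x, p_z) + eps) + eps))
    return H_Z_given_Y + beta * I_XZ
```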

A good communication system has p(X,Y) with 2^H(X) input sequences, 2^H(Y) output sequences, and 2^I(X,Y) distinguishable input/output classes of (x,y) pairs; the size of an input/output class is 2^(H(X|Y) + H(Y|X)) pairs.

Rate Distortion: how well is the source X represented by Z?

Information Distortion:
- Goal: determine the input/output classes of (x,y) pairs.
- Idea: we seek to quantize (X,Y) into clusters which correspond with the input/output classes.
- Method: we determine a quantizer q*(Z|Y), where Z is a representation of Y using N elements (or clusters), such that F(q*, B) is a maximum for some B ∈ (0, ∞); composing with the channel gives Q*(Z|X), so that Z also represents X using N symbols.

[Diagram: input source X with prior p(X) → channel P(Y|X) → output source Y → quantizer q*(Z|Y) → clustered outputs Z.]
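To make the counts above concrete, here is a small sketch (mine, not from the slides) that computes them from a joint distribution p_xy[x, y] = p(x, y) given as a NumPy array:

```python
# Compute the sequence and class counts quoted above from a discrete
# joint distribution p_xy[x, y] = p(x, y); helper names are mine.
import numpy as np

def entropy(p, eps=1e-12):
    p = p[p > eps]
    return -np.sum(p * np.log2(p))

def class_counts(p_xy):
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())
    I_XY = H_X + H_Y - H_XY                      # mutual information I(X,Y)
    return {
        "input sequences":         2 ** H_X,
        "output sequences":        2 ** H_Y,
        "distinguishable classes": 2 ** I_XY,
        "pairs per class":         2 ** ((H_XY - H_Y) + (H_XY - H_X)),  # H(X|Y)+H(Y|X)
    }
```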

Some nice properties of the problem

The feasible region Δ, a product of simplices, is nice.
Lemma: Δ is the convex hull of its vertices, vert(Δ).
The optimal quantizer q* is DETERMINISTIC.
Theorem: The extrema lie generically on the vertices of Δ.
Corollary: The optimal quantizer is invariant to small perturbations in the model.

[Figure: solution of the problem when p(X,Y) := 4 Gaussian blobs; panels show p(X,Y) and I(X,Z) vs. N.]
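The statement that optimal quantizers are deterministic says that they sit on vertices of Δ. A minimal sketch of this geometry, with my own names and tolerances (q stored as an (N, K) array with q[z, y] = q(z|y)):

```python
# The feasible region Delta is a product of simplices (one per y in Y),
# and its vertices are exactly the deterministic quantizers.
import numpy as np

def in_feasible_region(q, tol=1e-9):
    """Each column q(.|y) must be a probability vector."""
    return bool(np.all(q >= -tol) and np.allclose(q.sum(axis=0), 1.0, atol=tol))

def is_vertex(q, tol=1e-9):
    """Vertices of Delta: every column of q is a standard basis vector."""
    return in_feasible_region(q, tol) and bool(np.allclose(q.max(axis=0), 1.0, atol=tol))

# The uniform quantizer (the starting equilibrium) is feasible but interior;
# a hard clustering of Y into N classes sits on a vertex.
N, K = 4, 5
uniform = np.full((N, K), 1.0 / N)
hard = np.eye(N)[:, np.random.default_rng(0).integers(0, N, size=K)]
print(in_feasible_region(uniform), is_vertex(uniform))   # True False
print(in_feasible_region(hard), is_vertex(hard))         # True True
```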

The Dynamical System

Goal: to efficiently solve max_q (G(q) + β D(q)) for each β, incremented in sufficiently small steps, as β → B.

Method: study the equilibria of the gradient flow of the Lagrangian, (q, λ)' = ∇_{q,λ} L(q, λ, β), where λ enforces the K constraints {Σ_z q(z|y) − 1 = 0}. The Jacobian with respect to q of these K constraints is J = (I_K I_K … I_K). The first equilibrium is q*(β_0 = 0) = 1/N, the uniform quantizer. The Hessian ∇²_{q,λ}L determines stability and the location of bifurcations.

Assumptions: let q* be a local solution fixed by S_M. Call the M identical blocks of ∇²_q F(q*, β): B. Call the other N − M blocks of ∇²_q F(q*, β): {R_η}. At a singularity (q*, λ*, β*), B has a single nullvector v and R_η is nonsingular for every η. If M < N, then B Σ_η R_η⁻¹ + M I_K is nonsingular.

Theorem: If ∇²_{q,λ}L(q*, λ*, β*) is singular, then ∇²_q F(q*, β*) is singular.
Theorem: (q*, λ*, β*) is a bifurcation of equilibria of the flow if and only if ∇²_{q,λ}L(q*, λ*, β*) is singular.
Theorem: If (q*, λ*, β*) is a bifurcation of equilibria of the flow, then β* ≥ 1.
Theorem: dim ker ∇²_q F(q*, β*) = M, with basis vectors w_1, w_2, …, w_M.
Theorem: dim ker ∇²_{q,λ}L(q*, λ*, β*) = M − 1, with basis vectors formed from w_1, …, w_M.
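A sketch of how the constrained Hessian whose singularity signals a bifurcation can be assembled, given the N diagonal K×K blocks of ∇²_q F (the block-diagonal structure stated earlier). The assembly code and names are mine:

```python
# Assemble the constraint Jacobian J = (I_K I_K ... I_K) and the Hessian
# of the Lagrangian whose singularity signals a bifurcation.
import numpy as np
from scipy.linalg import block_diag

def constraint_jacobian(N, K):
    """Jacobian wrt q of the K constraints sum_z q(z|y) - 1 = 0."""
    return np.hstack([np.eye(K)] * N)

def lagrangian_hessian(F_blocks):
    """[[ Hess_q F, J^T ], [ J, 0 ]] for the constrained problem."""
    N, K = len(F_blocks), F_blocks[0].shape[0]
    J = constraint_jacobian(N, K)
    top = np.hstack([block_diag(*F_blocks), J.T])
    bottom = np.hstack([J, np.zeros((K, K))])
    return np.vstack([top, bottom])

# Bifurcations of equilibria occur exactly where this matrix is singular,
# so one can monitor np.linalg.slogdet(lagrangian_hessian(blocks)) for a
# sign change as beta increases.
```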

Investigating the Dynamical System

How: use numerical continuation in a constrained system to choose β and an initial guess with which to find the equilibria q*(β). Use bifurcation theory with symmetries to understand bifurcations of the equilibria.

Continuation: a local maximum q_k*(β_k) is an equilibrium of the gradient flow. The initial condition q_{k+1}^(0)(β_{k+1}^(0)) is sought in the tangent direction ∂_β q_k, which is found by solving the matrix system obtained by implicitly differentiating the equilibrium condition ∇_{q,λ}L = 0 with respect to β. The continuation algorithm used to find q_{k+1}*(β_{k+1}) is based on Newton's method.
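A minimal sketch of that tangent (predictor) computation, assuming caller-supplied callables hess_L and dgrad_dbeta for the Hessian of L and the β-derivative of its gradient (placeholder names, not routines from the talk):

```python
# Tangent direction (d q/d beta, d lambda/d beta) for the predictor step,
# from implicit differentiation of grad_{q,lambda} L(q, lambda, beta) = 0.
import numpy as np

def tangent_direction(q, lam, beta, hess_L, dgrad_dbeta, nq):
    """Solve  Hess L . (dq, dlam) = - d(grad L)/d beta  for the tangent."""
    A = hess_L(q, lam, beta)                     # (nq + K) x (nq + K)
    b = -dgrad_dbeta(q, lam, beta)               # (nq + K,)
    t = np.linalg.lstsq(A, b, rcond=None)[0]     # tolerant of near-singularity
    return t[:nq], t[nq:]                        # (d q/d beta, d lambda/d beta)

# The predictor is then beta_next = beta + d and q_guess = q + d * dq_dbeta,
# and Newton's method (the corrector) refines q_guess at beta_next.
```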

Bifurcations with symmetry

[Bifurcation diagrams: the conceptual bifurcation structure and the observed bifurcations for the 4-blob problem, plotting the quantizer q*(Y_N|Y) against β.]

To better understand the bifurcation structure, we capitalize on the symmetries of the optimization function F(q, β). The "obvious" symmetry is that F(q, β) is invariant to relabeling of the N classes of Z. The symmetry group of all permutations on N symbols is S_N. The action of S_N on q and on ∇²_{q,λ}L(q, λ, β) is represented by the finite Lie group Γ of "block permutation" matrices P. The symmetry of a solution is measured by its isotropy group, the subgroup of Γ which fixes it.
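A small sketch of that block-permutation action (the storage convention, with q as an (N, K) array vectorized row by row, is mine): relabeling the classes of Z permutes the K-blocks of the vectorized quantizer, and since H(Z|Y) and I(X;Z) do not depend on the labels, F is unchanged.

```python
# The S_N action on the vectorized quantizer as a "block permutation"
# matrix P = (permutation matrix) kron I_K.
import numpy as np

def block_permutation_matrix(sigma, K):
    N = len(sigma)
    perm = np.zeros((N, N))
    perm[np.arange(N), sigma] = 1.0              # row i picks out block sigma[i]
    return np.kron(perm, np.eye(K))

N, K = 4, 3
rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(N), size=K).T          # each column q(.|y) on the simplex
sigma = [2, 0, 3, 1]                             # an element of S_4
P = block_permutation_matrix(sigma, K)
# Relabeling the classes of Z permutes the K-blocks (rows) of q:
assert np.allclose(P @ q.reshape(-1), q[sigma, :].reshape(-1))
```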

Bifurcation Structure

The Equivariant Branching Lemma gives the existence of bifurcating solutions for every isotropy subgroup which fixes a one-dimensional subspace of ker ∇²_{q,λ}L(q*, λ*, β*).

Theorem: Let (q*, λ*, β*) be a singular point of the flow such that q* is fixed by S_M. Then there exist M bifurcating solutions, (q*, λ*, β*) + (t u_k, 0, β(t)), each with isotropy group S_{M−1}, where u_k consists of −v in M − 1 of the M identical blocks, (M − 1)v in the remaining block, and 0 elsewhere.

What do the bifurcations look like? Let T(q*, β*) denote the bifurcation discriminant, built from the third derivative of F at (q*, β*) along the nullvector v.

Transcritical or degenerate? Theorem: If T(q*, β*) ≠ 0 and M > 2, then the bifurcation at (q*, β*) is transcritical. If T(q*, β*) = 0, it is degenerate.

Branch orientation? Theorem: If T(q*, β*) > 0 or T(q*, β*) < 0, then the branch is supercritical or subcritical respectively. If T(q*, β*) = 0, then ∇⁴_{qqqq}F(q, β) dictates the orientation.

Branch stability? Theorem: If T(q*, β*) ≠ 0, then all branches fixed by S_{M−1} are unstable.
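A sketch of the M candidate symmetry-breaking directions built from a single nullvector v, in the form suggested by the S_4 examples on the next slide (block bookkeeping, names, and the omission of the Lagrange-multiplier components are my assumptions):

```python
# Candidate directions u_1..u_M: -v in M-1 of the identical blocks,
# (M-1)v in the remaining one, 0 in all other blocks.
import numpy as np

def bifurcating_directions(v, symmetric_blocks, n_blocks):
    K, M = len(v), len(symmetric_blocks)
    directions = []
    for k in symmetric_blocks:
        u = np.zeros(n_blocks * K)
        for j in symmetric_blocks:
            u[j * K:(j + 1) * K] = (M - 1) * v if j == k else -v
        directions.append(u)
    return np.array(directions)

# Example: N = 4 classes, all currently identical (M = 4), scalar nullvector.
print(bifurcating_directions(np.array([1.0]), [0, 1, 2, 3], 4))
# rows: [ 3 -1 -1 -1], [-1  3 -1 -1], [-1 -1  3 -1], [-1 -1 -1  3]
```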

Partial lattice of the isotropy subgroups of S_4 (and associated bifurcating directions). For the 4-blob problem, the isotropy subgroups and bifurcating directions of the observed bifurcating branches are:

isotropy group:  S_4                 S_3                 S_2                 1
bif direction:   (-v,-v,3v,-v,0)^T   (-v,2v,0,-v,0)^T    (-v,0,0,v,0)^T      …no more bifurcations!

Other Branches

The Smoller-Wasserman Theorem ascertains the existence of bifurcating branches for every maximal isotropy subgroup.

Theorem: If M is a composite number, then there exist bifurcating solutions, one for each element σ of order M in Γ and each prime p | M, whose isotropy group is determined by σ and p. The bifurcating direction lies in the (p − 1)-dimensional subspace of ker ∇²_{q,λ}L(q*, λ*, β*) which is fixed by that subgroup. We have never numerically observed solutions fixed by these subgroups, and so perhaps they are unstable.

An example of redundancy: (1423)² = (1324)² = (12)(34).

The full lattice of subgroups of the group S_M is not known for arbitrary M.

[Figure: lattice of the maximal isotropy subgroups in S_4.]

The efficient algorithm to solve max_q F(q, β)

Let q_0 be the maximizer of max_q G(q), β_0 = 1 and Δ_s > 0. For k ≥ 0, let (q_k, β_k) be a solution to max_q (G(q) + β_k D(q)). Iterate the following steps until β_K = B for some K.

1. Perform a β-step: solve the tangent system for (∂_β q_k, ∂_β λ_k) and select β_{k+1} = β_k + d_k, where d_k = Δ_s / (||∂_β q_k||² + ||∂_β λ_k||² + 1)^{1/2}.
2. The initial guess for q_{k+1} at β_{k+1} is q_{k+1}^(0) = q_k + d_k ∂_β q_k.
3. Optimization: solve max_q (G(q) + β_{k+1} D(q)) to get the maximizer q*_{k+1}, using initial guess q_{k+1}^(0).
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of ∇²_q[G(q_k) + β_k D(q_k)] and ∇²_q[G(q_{k+1}) + β_{k+1} D(q_{k+1})]. If a bifurcation is detected, set q_{k+1}^(0) = q_k + d_k u, where u is a bifurcating direction given by the Equivariant Branching Lemma, and repeat step 3.
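A minimal sketch of this annealing loop, with the problem-specific pieces (tangent solve, constrained optimizer, Hessian-block determinant, symmetry-breaking direction) supplied as callables; every name below is a placeholder of mine, not a routine from the talk.

```python
# Sketch of the annealing/continuation loop described above.
import numpy as np

def anneal(q0, beta0, B, step, solve_tangent, maximize_F, block_det, branch_direction):
    q, beta = q0, beta0
    history = [(beta, q)]
    while beta < B:
        # 1. beta-step: tangent direction and pseudo-arclength step size d_k.
        dq, dlam = solve_tangent(q, beta)
        d = step / np.sqrt(dq @ dq + dlam @ dlam + 1.0)
        beta_next = beta + d
        # 2. Initial guess along the tangent direction.
        q_guess = q + d * dq
        # 3. Optimization at beta_next from the initial guess.
        q_next = maximize_F(q_guess, beta_next)
        # 4. Bifurcation check: sign change in the determinant of an identical
        #    Hessian block; if detected, restart step 3 from a symmetry-breaking
        #    direction u.
        if block_det(q, beta) * block_det(q_next, beta_next) < 0:
            q_next = maximize_F(q + d * branch_direction(q, beta), beta_next)
        q, beta = q_next, beta_next
        history.append((beta, q))
    return history
```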