A Class of Problems

We use numerical continuation and bifurcation theory with symmetries to analyze a class of optimization problems of the form max_q F(q, β) = max_q (G(q) + βD(q)). The goal is to solve this problem for β up to some β = B ∈ (0, ∞), where:
G and D are infinitely differentiable in the interior of the feasible region Δ.
G has a known local maximum.
G and D must be invariant under relabeling of the classes.

Problems in this class:
Deterministic Annealing (Rose 1998): max H(Z|Y) - βD(Y,Z). A clustering algorithm.
Rate Distortion Theory (Shannon ~1950): max -I(Y,Z) - βD(Y,Z). Optimal source coding.
Information Distortion (Dimitrov and Miller 2001): max H(Z|Y) + βI(X,Z). Used in neural coding.
Information Bottleneck Method (Tishby, Pereira, Bialek 2000): max -I(Y,Z) + βI(X,Z). Used for document classification, gene expression, neural coding, and spectral analysis.
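
To make the cost function concrete, here is a minimal Python sketch (not from the slides; the function and variable names are assumed) of evaluating F(q, β) = G(q) + βD(q) in the Information Distortion case, G(q) = H(Z|Y) and D(q) = I(X,Z), given a joint distribution p(x,y) and a soft quantizer q(z|y).

import numpy as np

def information_distortion_cost(p_xy, q_zy, beta):
    """F(q, beta) = H(Z|Y) + beta * I(X;Z); p_xy is |X| x |Y|, q_zy is N x |Y|."""
    p_y = p_xy.sum(axis=0)
    # G(q) = H(Z|Y) = -sum_y p(y) sum_z q(z|y) log2 q(z|y), with 0 log 0 taken as 0
    safe_q = np.where(q_zy > 0, q_zy, 1.0)
    h_z_given_y = -np.sum(p_y[None, :] * q_zy * np.log2(safe_q))
    # D(q) = I(X;Z), computed from p(x,z) = sum_y p(x,y) q(z|y)
    p_xz = p_xy @ q_zy.T
    p_x = p_xz.sum(axis=1, keepdims=True)
    p_z = p_xz.sum(axis=0, keepdims=True)
    denom = np.where(p_x * p_z > 0, p_x * p_z, 1.0)
    ratio = np.where(p_xz > 0, p_xz / denom, 1.0)
    i_xz = np.sum(p_xz * np.log2(ratio))
    return h_z_given_y + beta * i_xz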

A good communication system has p(X,Y) like: [Figure: 2^H(X) input sequences X, 2^H(Y) output sequences Y, and 2^I(X,Y) distinguishable input/output classes of (x,y) pairs; the size of an input/output class is 2^(H(X|Y) + H(Y|X)) pairs.]

Rate Distortion: How well is the source X represented by Z? [Diagram: an input source X with prior p(X) is clustered into Z by Q*(Z|X); Z is a representation of X using N symbols (or clusters).]

Information Distortion:
Goal: Determine the input/output classes of (x,y) pairs.
Idea: We seek to quantize (X,Y) into clusters which correspond with the input/output classes.
Method: We determine a quantizer q*(Z|Y), making Z a representation of Y using N elements, such that the cost function F(q*, β) is a maximum for some β = B ∈ (0, ∞). [Diagram: input source X → output source Y via P(Y|X); the clustered outputs Z are obtained from Y via q*(Z|Y).]
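
For reference, a small sketch (assumed, not from the slides) of the quantities on this slide: H(X), H(Y), and I(X,Y) computed from a joint distribution p(x,y).

import numpy as np

def entropies(p_xy):
    """Return H(X), H(Y), I(X;Y) in bits for a joint distribution p_xy (|X| x |Y|)."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))   # entropy, 0 log 0 = 0
    H_X, H_Y, H_XY = h(p_x), h(p_y), h(p_xy)
    I_XY = H_X + H_Y - H_XY                                # mutual information
    return H_X, H_Y, I_XY

# Example: 4 well-separated "Gaussian blob" classes would give roughly
# 2**I_XY ≈ 4 distinguishable input/output classes.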

Some nice properties of the problem

The feasible region Δ, a product of simplices, is nice.
Lemma: Δ is the convex hull of its vertices, vert(Δ).
When D is convex, the optimal quantizer q* is DETERMINISTIC.
Theorem: The extrema lie generically on the vertices of Δ.
Corollary: The optimal quantizer is invariant to small perturbations in the model.

[Figures: solution of the problem when p(X,Y) is 4 Gaussian blobs: the joint p(X,Y), and I(X,Z) vs. N.]
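
A small sketch (assumed helper code, not from the slides) illustrating Δ and its vertices: each column q(·|y) lies in a probability simplex, and rounding to the nearest vertex produces a deterministic quantizer.

import numpy as np

def random_quantizer(N, K, rng=np.random.default_rng(0)):
    """A point in Δ: an N x K matrix whose columns q(.|y) each sum to 1."""
    q = rng.random((N, K))
    return q / q.sum(axis=0, keepdims=True)

def nearest_vertex(q):
    """A vertex of Δ (deterministic quantizer): all mass of q(.|y) on its largest class."""
    det = np.zeros_like(q)
    det[q.argmax(axis=0), np.arange(q.shape[1])] = 1.0
    return det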

The Dynamical System

Goal: To efficiently solve max_{q∈Δ} (G(q) + βD(q)) for each β, incremented in sufficiently small steps, as β → B.
Method: Study the equilibria of the gradient flow of the Lagrangian L(q, λ, β).
The Jacobian with respect to q of the K constraints {Σ_z q(z|y) - 1} is J = (I_K I_K ... I_K).
The equilibrium at β = 0 is q*(0) ≡ 1/N, the uniform quantizer.
The Hessian ∇_q^2 F(q*, β) determines stability and the location of bifurcations.

Assumptions: Let q* be a local solution to the problem, fixed by S_M. Call the M identical blocks of ∇_q^2 F(q*, β): B. Call the other N-M blocks of ∇_q^2 F(q*, β): {R_η}. At a singularity (q*, λ*, β*), B has a single nullvector v and R_η is nonsingular for every η. If M < N, then B Σ_η R_η^{-1} + M I_K is nonsingular.

Theorem: If (q*, λ*, β*) is a bifurcation of equilibria of the flow, then β* ≥ 1.

For the four-blob problem when N > 2, the first bifurcation is subcritical (a first order phase transition). [Figure.]
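
A sketch (assumed, not the authors' code) of two ingredients on this slide, with q stored as an N x K array (N classes, K = |Y|) and vectorized as N blocks of length K: the Jacobian J of the K constraints, and an Euler step of a gradient flow whose equilibria are the constrained critical points of F.

import numpy as np

def constraint_jacobian(N, K):
    """J = (I_K I_K ... I_K): the K constraints sum_z q(z|y) = 1, differentiated in q."""
    return np.hstack([np.eye(K)] * N)            # shape (K, N*K)

def flow_step(q, grad_F, dt=1e-2):
    """One Euler step of dq/dt = grad F(q), projected onto the tangent space of Δ."""
    g = grad_F(q)                                # N x K array, one column per y
    g -= g.mean(axis=0, keepdims=True)           # remove the component along the constraint normals
    return q + dt * g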

Investigating the Dynamical System

Continuation: A local maximum q_k*(β_k) is an equilibrium of the gradient flow. The initial condition q_{k+1}^(0)(β_{k+1}^(0)) is sought in the tangent direction ∂_β q_k, which is found by solving the matrix system obtained by differentiating the equilibrium condition ∇_{q,λ} L(q, λ, β) = 0 with respect to β. The continuation algorithm used to find q_{k+1}*(β_{k+1}) is based on Newton's method.

How: Use numerical continuation in a constrained system to choose β and to choose an initial guess for finding the equilibria q*(β). Use bifurcation theory with symmetries to understand bifurcations of the equilibria.
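
A sketch (assumed names, not the authors' code) of the continuation predictor: differentiate the equilibrium condition in β, solve the resulting matrix system for the tangent, and take an Euler step with a step size of the form Δs / (||tangent||^2 + 1)^{1/2}.

import numpy as np

def tangent_direction(hess_L, dbeta_grad_L):
    """Solve Hess_{q,lambda} L * t = -d/dbeta grad_{q,lambda} L for the tangent t."""
    return np.linalg.solve(hess_L, -dbeta_grad_L)

def predictor(x, beta, t, ds):
    """Euler predictor for the next (q, lambda) and beta along the solution branch."""
    d = ds / np.sqrt(np.dot(t, t) + 1.0)
    return x + d * t, beta + d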

Bifurcations of q*(β)

[Figure: conceptual bifurcation structure, q*(Y_N|Y) vs. β.]

Bifurcations with symmetry: To better understand the bifurcation structure, we use the symmetries of the cost function F(q, β). The symmetry is that F(q, β) is invariant to relabeling of the N classes of Z. The symmetry group of all permutations on N symbols is S_N. The action of S_N on q and on ∇_{q,λ} L(q, λ, β) is represented by the finite Lie group Γ of "block permutation" matrices P. The symmetry of a solution is measured by its isotropy group, the subgroup of Γ which fixes it.

[Figure: observed bifurcations for the 4-blob problem.]
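
A sketch (assumed, not from the slides) of this S_N action: relabeling the N classes of Z permutes the rows of q(z|y), which on the vectorized q is a "block permutation" matrix P = kron(Π, I_K).

import numpy as np

def block_permutation(perm, K):
    """Block permutation matrix permuting the N blocks of length K according to perm (a list)."""
    N = len(perm)
    Pi = np.zeros((N, N))
    Pi[np.arange(N), perm] = 1.0                 # ordinary permutation matrix
    return np.kron(Pi, np.eye(K))

# Relabeling the classes of Z acts on vec(q) as P @ q.reshape(-1); for the cost
# functions in this class, F(q_perm, beta) == F(q, beta) for every such P.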

Bifurcation Structure

The Equivariant Branching Lemma gives the existence of bifurcating solutions for every isotropy subgroup which fixes a one-dimensional subspace of ker ∇_{q,λ} L(q*, λ*, β*).

Theorem: Let (q*, λ*, β*) be a singular point of the flow such that q* is fixed by S_M. Then there exist M bifurcating solutions, (q*, λ*, β*) + (t u_k, 0, β(t)), each with isotropy group S_{M-1}, where u_k is built from v, a nullvector of an unresolved block of the Hessian.

What do the bifurcations look like? Pitchfork-like bifurcations: let T(q*, β*) be the scalar whose sign determines the orientation and stability of the branches.
Theorem: All bifurcations are "pitchfork-like."
Branch orientation? Theorem: If T(q*, β*) > 0, then the branch is supercritical. If T(q*, β*) < 0, then the branch is subcritical.
Branch stability? Theorem: If T(q*, β*) < 0, then all branches fixed by S_{M-1} are unstable.
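
A small sketch (assumed helper, not from the slides) of locating the nullvector v: at a singularity the identical block B of the Hessian becomes singular, and its smallest singular value and corresponding singular vector provide both a singularity test and the v used to build bifurcating directions.

import numpy as np

def nullvector(B, tol=1e-8):
    """Return (is_singular, v) for a symmetric Hessian block B."""
    U, s, Vt = np.linalg.svd(B)
    return s[-1] < tol, Vt[-1]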

Partial lattice of the isotropy subgroups of S_4 (and associated bifurcating directions)

For the 4-blob problem, the isotropy subgroups and bifurcating directions of the observed bifurcating branches:

isotropy group:  S_4                 S_3                 S_2               1
bif direction:   (-v,-v,3v,-v,0)^T   (-v,2v,0,-v,0)^T    (-v,0,0,v,0)^T    ...no more bifurcations!
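
As a concrete illustration (assumed helper code, not the authors'), the directions listed above can be assembled block-by-block from the nullvector v:

import numpy as np

def bifurcating_direction(pattern, v):
    """Stack scaled copies of v, e.g. pattern = [-1, -1, 3, -1, 0] gives (-v,-v,3v,-v,0)^T."""
    return np.concatenate([c * v for c in pattern])

# Directions observed as the symmetry breaks S_4 -> S_3 -> S_2 -> 1:
# bifurcating_direction([-1, -1, 3, -1, 0], v)
# bifurcating_direction([-1, 2, 0, -1, 0], v)
# bifurcating_direction([-1, 0, 0, 1, 0], v)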

Other Branches

The Smoller-Wasserman Theorem ascertains the existence of bifurcating branches for every maximal isotropy subgroup.

Theorem: If M is a composite number, then there exist bifurcating solutions, with the associated isotropy group, for every element γ of order M in Γ and every prime p | M. The bifurcating direction lies in the (p-1)-dimensional subspace of ker ∇_{q,λ} L(q*, λ*, β*) which is fixed by that group.

The above theorem states that there are bifurcating solutions from q_{1/4}, the uniform quantizer, with the symmetries it guarantees. The full lattice of subgroups of the group S_M is not known for arbitrary M.

[Figure: lattice of the maximal isotropy subgroups in S_4.]

A numerical algorithm to solve max F(q, β)

Let q_0 be the maximizer of max_q G(q), let β_0 = 1 and Δs > 0. For k ≥ 0, let (q_k, β_k) be a solution to max_q (G(q) + β_k D(q)). Iterate the following steps until β_K = B for some K.

1. Perform a β-step: solve the continuation matrix system for the tangent (∂_β q_k, ∂_β λ_k) and select β_{k+1} = β_k + d_k, where d_k = Δs / (||∂_β q_k||^2 + ||∂_β λ_k||^2 + 1)^{1/2}.
2. The initial guess for q_{k+1} at β_{k+1} is q_{k+1}^(0) = q_k + d_k ∂_β q_k.
3. Optimization: solve max_q (G(q) + β_{k+1} D(q)) to get the maximizer q*_{k+1}, using the initial guess q_{k+1}^(0).
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of ∇_q^2 [G(q_k) + β_k D(q_k)] and ∇_q^2 [G(q_{k+1}) + β_{k+1} D(q_{k+1})]. If a bifurcation is detected, then set q_{k+1}^(0) = q_k + d_k u, where u is a bifurcating direction given by the theorem above, and repeat step 3.
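
A minimal sketch (assumed, not the authors' implementation) of this annealing loop. The tangent solve, the inner maximizer, the Hessian-block determinant test, and the bifurcating direction are passed in as placeholder callables standing in for steps 1-4, and the Lagrange-multiplier term in d_k is omitted for brevity.

import numpy as np

def anneal(q0, beta0, B, ds, tangent, maximize, hessian_block_det, bifurcating_dir):
    """Follow the maximizer q*(beta) from beta0 up to B in small continuation steps."""
    q, beta = q0, beta0
    sign = np.sign(hessian_block_det(q, beta))
    while beta < B:
        dq = tangent(q, beta)                            # step 1: d(q)/d(beta)
        d = ds / np.sqrt(np.sum(dq**2) + 1.0)            # step size d_k
        beta_next = beta + d
        q_guess = q + d * dq                             # step 2: predictor
        q_next = maximize(q_guess, beta_next)            # step 3: corrector
        new_sign = np.sign(hessian_block_det(q_next, beta_next))
        if new_sign != sign:                             # step 4: bifurcation detected
            q_guess = q + d * bifurcating_dir(q, beta)
            q_next = maximize(q_guess, beta_next)
            new_sign = np.sign(hessian_block_det(q_next, beta_next))
        q, beta, sign = q_next, beta_next, new_sign
    return q, beta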