Modelling and Control Issues Arising in the Quest for a Neural Decoder. Computation, Control, and Biological Systems Conference VIII, July 30, 2003. Albert E. Parker.

Presentation transcript:

Modelling and Control Issues Arising in the Quest for a Neural Decoder. Computation, Control, and Biological Systems Conference VIII, July 30, 2003. Albert E. Parker, Complex Biological Systems, Department of Mathematical Sciences, Center for Computational Biology, Montana State University. Collaborators: Tomas Gedeon, Alex Dimitrov, John Miller, and Zane Aldworth.

Talk Outline
- The Neural Coding Problem
- A Clustering Problem
- The Dynamical System
- The Role of Bifurcation Theory
- A new algorithm to solve the Neural Coding Problem

The Neural Coding Problem
GOAL: To understand the neural code.
EASIER GOAL: We seek an answer to the question: how does neural activity represent information about environmental stimuli?
"The little fly sitting in the fly's brain trying to fly the fly"

Looking for the dictionary to the neural code … (diagram): the inputs are stimuli X, the outputs are neural responses Y; encoding maps stimuli to responses, decoding maps responses back to stimuli.

… but the dictionary is not deterministic! Given a stimulus, an experimenter observes many different neural responses: Y_i | X, i = 1, 2, 3, 4. Neural coding is stochastic!!

Similarly, neural decoding is stochastic: given a response, an experimenter observes many different stimuli: X_i | Y, i = 1, 2, …, 9.

Probability Framework (diagram): environmental stimuli X and neural responses Y, linked by the encoder P(Y|X) and the decoder P(X|Y).

The Neural Coding Problem: how to determine the encoder P(Y|X) or the decoder P(X|Y)?
Common approaches: parametric estimation, linear methods.
Difficulty: there is never enough data.
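Both conditionals are ratios of the joint distribution p(X, Y), which is precisely what limited data fail to pin down. A minimal numpy illustration; the (L, K) matrix layout, the function name, and the strictly positive marginals are assumptions of this sketch, not part of the talk:

```python
import numpy as np

def encoder_decoder(P):
    """Given the joint p(X, Y) as an (L, K) array with positive
    marginals, return the encoder P(Y|X) and the decoder P(X|Y)."""
    p_x = P.sum(axis=1, keepdims=True)   # p(x), shape (L, 1)
    p_y = P.sum(axis=0, keepdims=True)   # p(y), shape (1, K)
    return P / p_x, P / p_y              # P(y|x) rows, P(x|y) columns
```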

One Approach: Cluster the responses. The stimuli X (L objects {x_i}) and the responses Y (K objects {y_i}) have joint distribution p(X, Y); q(Y_N|Y) is a stochastic clustering of the responses into the clustered responses Y_N (N objects {y_Ni}). To address the insufficient-data problem, one clusters the outputs Y into clusters Y_N so that the information that one can learn about X by observing Y_N, I(X; Y_N), is as close as possible to the mutual information I(X; Y). The encoder P(Y|X) and decoder P(X|Y) are then replaced by their clustered counterparts P(Y_N|X) and P(X|Y_N).

Two optimization problems which use this approach:
Information Bottleneck Method (Tishby, Pereira, Bialek 1999):
min I(Y; Y_N) constrained by I(X; Y_N) ≥ I_0, or equivalently max −I(Y; Y_N) + β I(X; Y_N).
Information Distortion Method (Dimitrov and Miller 2001):
max H(Y_N|Y) constrained by I(X; Y_N) ≥ I_0, or equivalently max H(Y_N|Y) + β I(X; Y_N).
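To make the objects concrete, here is a minimal numpy sketch of the annealed Information Distortion objective F(q, β) = H(Y_N|Y) + β I(X; Y_N). The matrix layout (p(X, Y) as an (L, K) array, q(Y_N|Y) as an (N, K) column-stochastic array) and all names are illustrative assumptions, not the talk's implementation:

```python
import numpy as np

def information_distortion_objective(Q, P, beta):
    """F(q, beta) = H(Y_N | Y) + beta * I(X; Y_N).

    Q : (N, K) array, Q[t, y] = q(y_N = t | y); each column sums to 1.
    P : (L, K) array, P[x, y] = p(x, y); all entries sum to 1.
    """
    eps = 1e-12                                # guard against log(0)
    p_y = P.sum(axis=0)                        # p(y), shape (K,)
    # H(Y_N | Y) = -sum_y p(y) sum_t q(t|y) log q(t|y)
    H_cond = -np.sum(p_y * np.sum(Q * np.log(Q + eps), axis=0))
    # p(x, y_N) = sum_y p(x, y) q(t|y)
    p_x_yn = P @ Q.T                           # shape (L, N)
    p_x = P.sum(axis=1, keepdims=True)         # p(x),   shape (L, 1)
    p_yn = p_x_yn.sum(axis=0, keepdims=True)   # p(y_N), shape (1, N)
    # I(X; Y_N) = sum_{x,t} p(x,t) log[ p(x,t) / (p(x) p(t)) ]
    I_x_yn = np.sum(p_x_yn * np.log(p_x_yn / (p_x * p_yn + eps) + eps))
    return H_cond + beta * I_x_yn
```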

In General: We have developed an approach to solve optimization problems of the form
max_{q ∈ Δ} G(q) constrained by D(q) ≥ D_0,
or (using the method of Lagrange multipliers)
max_{q ∈ Δ} F(q, β) = max_{q ∈ Δ} (G(q) + β D(q)), where β ∈ [0, ∞).
Here Δ is a subset of valid stochastic clusterings in R^{NK}, G and D are sufficiently smooth in Δ, and G and D have symmetry: they are invariant to relabelling of the classes of Y_N.

Symmetry: invariance to relabelling of the clusters of Y_N. A clustering q(Y_N|Y) maps the K objects {y_i} of Y to the N objects {y_Ni} of Y_N; swapping the labels of two classes (e.g. class 1 and class 2) yields an equivalent clustering.

An annealing algorithm to solve max_{q ∈ Δ} (G(q) + β D(q)):
Let q_0 be the maximizer of max_q G(q), and let β_0 = 0. For k ≥ 0, let (q_k, β_k) be a solution to max_q G(q) + β_k D(q). Iterate the following steps until β_K = β_max for some K.
1. Perform a β-step: let β_{k+1} = β_k + d_k, where d_k > 0.
2. The initial guess for q_{k+1} at β_{k+1} is q_{k+1}^(0) = q_k + η for some small perturbation η.
3. Optimization: solve max_q (G(q) + β_{k+1} D(q)) to get the maximizer q_{k+1}, using the initial guess q_{k+1}^(0).
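A compact Python sketch of this annealing loop, under stated assumptions: q is parametrized through a column-wise softmax so that it stays a valid stochastic clustering (an implementation convenience, not part of the algorithm as stated), a generic quasi-Newton optimizer stands in for a problem-specific solver, and information_distortion_objective is the sketch above:

```python
import numpy as np
from scipy.optimize import minimize

def softmax_cols(Z):
    """Column-wise softmax: maps unconstrained Z to a stochastic clustering."""
    Z = Z - Z.max(axis=0, keepdims=True)   # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def anneal(P, N, beta_max, d_beta=0.05, jitter=1e-3, seed=0):
    """Anneal in beta, reusing each maximizer (plus a small random
    perturbation, step 2) as the initial guess at the next beta."""
    rng = np.random.default_rng(seed)
    K = P.shape[1]
    Z = np.zeros((N, K))   # softmax(0) is the uniform q, the maximizer of H(Y_N|Y)
    path, beta = [], 0.0
    while beta < beta_max:
        beta += d_beta                                   # step 1: beta-step
        Z = Z + jitter * rng.standard_normal(Z.shape)    # step 2: perturbed guess
        res = minimize(                                  # step 3: inner maximization
            lambda z: -information_distortion_objective(
                softmax_cols(z.reshape(N, K)), P, beta),
            Z.ravel(), method="L-BFGS-B")
        Z = res.x.reshape(N, K)
        path.append((beta, softmax_cols(Z)))
    return path
```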

Application of the annealing method to the Information Distortion problem max_{q ∈ Δ} (H(Y_N|Y) + β I(X; Y_N)), where p(X, Y) is defined by four Gaussian blobs over 52 stimuli and 52 responses, and q(Y_N|Y) clusters the 52 responses into 4 clusters.
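For readers who want to reproduce a comparable setup, a hypothetical stand-in for the four-Gaussian-blob joint distribution; the grid size matches the slide (52 x 52), but the blob centers and width below are guesses, not the values used in the talk:

```python
import numpy as np

def four_blob_joint(n=52, centers=((10, 10), (23, 17), (36, 36), (17, 44)),
                    width=2.5):
    """A 52 x 52 joint distribution p(X, Y) built from four Gaussian
    blobs (illustrative parameters only)."""
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    P = np.zeros((n, n))
    for cx, cy in centers:
        P += np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * width ** 2))
    return P / P.sum()

# usage with the sketches above:
# P = four_blob_joint()
# path = anneal(P, N=4, beta_max=2.0)
```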

Evolution of the optimal clustering: Observed Bifurcations for the Four Blob problem. We just saw the optimal clusterings q* at some β* = β_max. What do the clusterings look like for β < β_max?

Conceptual Bifurcation Structure: Observed Bifurcations for the 4 Blob Problem (figure; axes β and q*). Open questions:
- Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations?
- What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
- How many bifurcating solutions are there?
- What do the bifurcating branches look like? Are they subcritical or supercritical?
- What is the stability of the bifurcating branches?
- Is there always a bifurcating branch which contains solutions of the optimization problem?
- Are there bifurcations after all of the classes have resolved?

Bifurcation theory in the presence of symmetries enables us to answer the questions previously posed …

Recall the Symmetries: to better understand the bifurcation structure, we capitalize on the symmetries of the function G(q) + β D(q). Relabelling the classes of a clustering q(Y_N|Y) from the K objects {y_i} to the N objects {y_Ni} (e.g. swapping class 1 and class 3) leaves the function unchanged.

The symmetry group of all permutations on N symbols is S_N.

Formulate a Dynamical System. Goal: to solve max_{q ∈ Δ} (G(q) + β D(q)) for each β, incremented in sufficiently small steps, as β → ∞. Method: study the equilibria of the gradient flow of the Lagrangian L(q, λ, β). Equilibria of this system are possible solutions of the maximization problem (they satisfy the necessary conditions of constrained optimality). The Jacobian ∇²_{q,λ} L(q*, λ*) is symmetric, and so only bifurcations of equilibria can occur.
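As a sanity check on what "equilibria of the gradient flow" means in practice, here is a brute-force numerical sketch of the vector field ∇_{q,λ} L, assuming the standard equality-constrained Lagrangian L(q, λ, β) = F(q, β) + Σ_y λ_y (Σ_t q(t|y) − 1); the function name and finite-difference approach are assumptions, and a real implementation would use analytic gradients. Pass information_distortion_objective from the earlier sketch as F:

```python
import numpy as np

def lagrangian_gradient(Q, lam, P, beta, F, h=1e-6):
    """Finite-difference gradient of
    L(q, lam) = F(q, beta) + sum_y lam[y] * (sum_t q(t|y) - 1).
    Equilibria of the gradient flow are zeros of this vector field."""
    N, K = Q.shape
    gF = np.zeros_like(Q)
    F0 = F(Q, P, beta)
    for t in range(N):                  # forward differences in each q(t|y)
        for y in range(K):
            dQ = np.zeros_like(Q)
            dQ[t, y] = h
            gF[t, y] = (F(Q + dQ, P, beta) - F0) / h
    g_q = gF + lam[None, :]             # dL/dq(t|y) = dF/dq(t|y) + lam[y]
    g_lam = Q.sum(axis=0) - 1.0         # dL/dlam[y]: the constraints
    return g_q, g_lam
```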

Observed Bifurcation Structure

Observed Bifurcation Structure, with the corresponding Group Structure (figure).

Observed Bifurcation Structure (figure; axes β and q*): the Equivariant Branching Lemma shows that the bifurcation structure contains the branches …

Observed Bifurcation Structure with Group Structure (figure; axes β and q*): the Smoller-Wasserman Theorem shows additional structure …

Theorem: there are exactly K/N bifurcations on the branch (q_{1/N}, β) for the Information Distortion problem. For the Four Blob problem, K = 52 and N = 4, so there are 52/4 = 13 bifurcations on the first branch.


Conceptual Bifurcation Structure: answers for the 4 Blob Problem.
- Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations? There are N-1 symmetry-breaking bifurcations from S_M to S_{M-1} for M ≤ N.
- What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type? How many bifurcating solutions are there? There are at least N from the first bifurcation, at least N-1 from the next one, etc.
- What do the bifurcating branches look like? They are subcritical or supercritical depending on the sign of the bifurcation discriminator ζ(q*, β*, u_k).
- What is the stability of the bifurcating branches?
- Is there always a bifurcating branch which contains solutions of the optimization problem? No.
- Are there bifurcations after all of the classes have resolved? In general, no.

Continuation techniques provide numerical confirmation of the theory

A closer look … (figure; axes β and q*).

Bifurcation from S_4 to S_3 … (figure; axes β and q*).

The bifurcation from S_4 to S_3 is subcritical … (the theory predicted this, since the bifurcation discriminator ζ(q_{1/4}, β*, u) < 0).

Additional structure!!

Conclusions … We have a complete theoretical picture of how the clusterings evolve for any problem of the form max_{q ∈ Δ} (G(q) + β D(q)), subject to the assumptions stated earlier.
- When clustering to N classes, there are N-1 bifurcations.
- In general, there are only pitchfork and saddle-node bifurcations.
- We can determine whether pitchfork bifurcations are subcritical or supercritical (1st- or 2nd-order phase transitions).
- We know the explicit bifurcating directions.
SO WHAT?? There are theoretical consequences … This yields a new and improved algorithm for solving the neural coding problem …

A numerical algorithm to solve max_q (G(q) + β D(q)):
Let q_0 be the maximizer of max_q G(q), let β_0 = 1, and fix an arclength step Δs > 0. For k ≥ 0, let (q_k, β_k) be a solution to max_q G(q) + β_k D(q). Iterate the following steps until β_K = β_max for some K.
1. Perform a β-step: solve for the tangent (∂_β q_k, ∂_β λ_k) and select β_{k+1} = β_k + d_k, where d_k = (Δs · sgn(cos θ)) / (||∂_β q_k||² + ||∂_β λ_k||² + 1)^{1/2}.
2. The initial guess for (q_{k+1}, λ_{k+1}) at β_{k+1} is (q_{k+1}^(0), λ_{k+1}^(0)) = (q_k, λ_k) + d_k (∂_β q_k, ∂_β λ_k).
3. Optimization: solve max_q (G(q) + β_{k+1} D(q)) using pseudoarclength continuation to get the maximizer q_{k+1} and the vector of Lagrange multipliers λ_{k+1}, using the initial guess (q_{k+1}^(0), λ_{k+1}^(0)).
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of the Hessians ∇²_q [G(q_k) + β_k D(q_k)] and ∇²_q [G(q_{k+1}) + β_{k+1} D(q_{k+1})]. If a bifurcation is detected, then set q_{k+1}^(0) = q_k + d_k u, where u is the bifurcating direction, and repeat step 3.
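Step 4 can be sketched directly: a bifurcation between consecutive continuation steps is flagged when the determinant of an identically chosen Hessian block changes sign, i.e. an eigenvalue crosses zero. The block selection and the slogdet-based sign test below are illustrative assumptions, not the talk's exact test:

```python
import numpy as np

def bifurcation_detected(hess_k, hess_k1, block=None):
    """Compare the sign of the determinant of an identical block of the
    Hessian of G(q) + beta*D(q) at steps k and k+1; a sign change
    signals a singularity (candidate bifurcation) in between."""
    m = block if block is not None else hess_k.shape[0]
    s_k, _ = np.linalg.slogdet(hess_k[:m, :m])     # sign of det, step k
    s_k1, _ = np.linalg.slogdet(hess_k1[:m, :m])   # sign of det, step k+1
    return s_k * s_k1 < 0
```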

Application to cricket sensory data (figure): E(X|Y_N), the stimulus means conditioned on each of the classes; typical spike patterns; the optimal quantizer.