A Constraint Generation Approach to Learning Stable Linear Dynamical Systems Sajid M. Siddiqi Byron Boots Geoffrey J. Gordon Carnegie Mellon University.

Presentation transcript:

A Constraint Generation Approach to Learning Stable Linear Dynamical Systems Sajid M. Siddiqi Byron Boots Geoffrey J. Gordon Carnegie Mellon University NIPS 2007 poster W22

(Video: the steam dynamic texture.)

Application: Dynamic Textures 1
–Videos of moving scenes that exhibit stationarity properties
–Dynamics can be captured by a low-dimensional model
–Learned models can efficiently simulate realistic sequences
–Applications: compression, recognition, synthesis of videos
(Example textures: steam, river, fountain.)
1 S. Soatto, G. Doretto and Y. Wu. Dynamic Textures. Proceedings of the ICCV, 2001

Linear Dynamical Systems
–Input data sequence $y_1, \ldots, y_\tau$
–Assume a latent variable $x_t$ that explains the data
–Assume a Markov model on the latent variable

Linear Dynamical Systems 2
State and observation models:
  $x_{t+1} = A x_t + w_t$,  $w_t \sim \mathcal{N}(0, Q)$
  $y_t = C x_t + v_t$,  $v_t \sim \mathcal{N}(0, R)$
Dynamics matrix: $A \in \mathbb{R}^{n \times n}$. Observation matrix: $C \in \mathbb{R}^{m \times n}$.
This talk: learning LDS parameters from data while ensuring a stable dynamics matrix A, more efficiently and accurately than previous methods.
2 Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Trans. ASME–JBE
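A minimal NumPy sketch of these two update equations; the dimensions, parameter values, and noise covariances below are illustrative assumptions, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, tau = 2, 5, 100                      # latent dim, observation dim, length (assumed)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])                 # dynamics matrix (a stable example)
C = rng.standard_normal((m, n))            # observation matrix
Q, R = 0.01 * np.eye(n), 0.1 * np.eye(m)   # state and observation noise covariances

x = np.zeros(n)
Y = np.empty((m, tau))
for t in range(tau):
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)        # x_{t+1} = A x_t + w_t
    Y[:, t] = C @ x + rng.multivariate_normal(np.zeros(m), R)  # y_t = C x_t + v_t
```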

Learning Linear Dynamical Systems
Suppose we have an estimated state sequence $x_1, \ldots, x_\tau$.
Define state reconstruction error as the objective:
  $J(A) = \sum_{t=1}^{\tau-1} \| x_{t+1} - A x_t \|_2^2$
We would like to learn $\hat{A} = \arg\min_A J(A)$, i.e. the least-squares solution $\hat{A} = X_1 X_0^{\dagger}$, where $X_0 = [x_1 \cdots x_{\tau-1}]$ and $X_1 = [x_2 \cdots x_\tau]$.
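In this notation the least-squares estimate has a closed form via the pseudoinverse; a small sketch (X is assumed to be an n × τ array holding the estimated states as columns):

```python
import numpy as np

def least_squares_dynamics(X):
    """Return argmin_A sum_t ||x_{t+1} - A x_t||^2 (unconstrained least squares)."""
    X0, X1 = X[:, :-1], X[:, 1:]        # [x_1 ... x_{tau-1}] and [x_2 ... x_tau]
    return X1 @ np.linalg.pinv(X0)      # A_hat = X1 X0^+
```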

Subspace Identification 3
–Subspace ID uses matrix decomposition to estimate the state sequence.
–Build a Hankel matrix D of stacked observations; for D with k observations per column,
  $D = \begin{bmatrix} y_1 & y_2 & \cdots \\ \vdots & \vdots & \\ y_k & y_{k+1} & \cdots \end{bmatrix}$
–In expectation, the Hankel matrix is inherently low-rank!
–Can use SVD to obtain the low-dimensional state sequence: $D = U \Sigma V^\top$, keeping the top $n$ singular values, with estimated states $\hat{X} = \Sigma V^\top$.
3 P. Van Overschee and B. De Moor. Subspace Identification for Linear Systems. Kluwer, 1996
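A sketch of this pipeline in NumPy; the Hankel construction and rank-n truncation follow the slide, while returning the first m rows of U as the observation-matrix estimate is one common convention, not necessarily the talk's exact variant:

```python
import numpy as np

def subspace_id(Y, k, n):
    """Estimate an n-dimensional state sequence by SVD of a Hankel matrix.

    Y: m x T observations; k: observations stacked per Hankel column.
    """
    m, T = Y.shape
    cols = T - k + 1
    D = np.vstack([Y[:, i:i + cols] for i in range(k)])  # (k*m) x cols Hankel matrix
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    U, s, Vt = U[:, :n], s[:n], Vt[:n]                   # keep top-n singular triplets
    X_hat = np.diag(s) @ Vt                              # estimated states: Sigma V^T
    C_hat = U[:m, :]                                     # observation matrix estimate
    return X_hat, C_hat
```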

Let's train an LDS for steam textures using this algorithm, and simulate a video from it! (Latent state $x_t \in \mathbb{R}^{40}$.)

Simulating from a learned model
(Plot: components of $x_t$ against $t$.)
The model is unstable.

Notation
$\lambda_1, \ldots, \lambda_n$: eigenvalues of A ($|\lambda_1| > \cdots > |\lambda_n|$)
$\nu_1, \ldots, \nu_n$: unit-length eigenvectors of A
$\sigma_1, \ldots, \sigma_n$: singular values of A ($\sigma_1 > \sigma_2 > \cdots > \sigma_n$)
$S_\lambda$: matrices with $|\lambda_1| \le 1$
$S_\sigma$: matrices with $\sigma_1 \le 1$

Stability
A matrix A is stable if $|\lambda_1| \le 1$, i.e. if all its eigenvalues lie on or inside the unit circle.
(Plots of $x_t(1), x_t(2)$ trajectories: $|\lambda_1| = 0.3$ converges; $|\lambda_1| = 1.295$ diverges.)
We would like to solve
  $\hat{A} = \arg\min_A \sum_t \| x_{t+1} - A x_t \|_2^2$  s.t.  $|\lambda_1| \le 1$
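Checking the constraint numerically is a single eigendecomposition; a small sketch using the slide's example magnitudes:

```python
import numpy as np

def is_stable(A, tol=1e-10):
    """True iff the spectral radius |lambda_1(A)| is at most 1 (up to tol)."""
    return np.max(np.abs(np.linalg.eigvals(A))) <= 1 + tol

print(is_stable(np.diag([0.3, 0.1])))    # True:  |lambda_1| = 0.3
print(is_stable(np.diag([1.295, 0.5])))  # False: |lambda_1| = 1.295
```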

Stability and Convexity
But $S_\lambda$ is non-convex! (Figure: two stable matrices $A_1$ and $A_2$ whose convex combinations leave $S_\lambda$.)
Let's look at $S_\sigma$ instead …
–$S_\sigma$ is convex
–$S_\sigma \subseteq S_\lambda$
Previous work 4 exploits these properties to learn a stable A by solving the semi-definite program
  $\min_A \sum_t \| x_{t+1} - A x_t \|_2^2$  s.t.  $\sigma_1 \le 1$
4 S. L. Lacy and D. S. Bernstein. Subspace identification with guaranteed stability using constrained optimization. In Proc. of the ACC (2002), IEEE Trans. Automatic Control (2003)
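A sketch of this $\sigma_1$-constrained least-squares fit, written with the off-the-shelf solver cvxpy (an assumed modern stand-in; the cited work predates it and formulates the SDP directly):

```python
import cvxpy as cp
import numpy as np

def fit_sigma_stable(X):
    """Least-squares dynamics fit restricted to S_sigma: sigma_1(A) <= 1."""
    n = X.shape[0]
    X0, X1 = X[:, :-1], X[:, 1:]
    A = cp.Variable((n, n))
    problem = cp.Problem(cp.Minimize(cp.sum_squares(X1 - A @ X0)),
                         [cp.sigma_max(A) <= 1])  # convex spectral-norm constraint
    problem.solve()
    return A.value
```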

Let's train an LDS for steam again, this time constraining A to be in $S_\sigma$.

Simulating from a Lacy–Bernstein stable texture model
(Plot: components of $x_t$ against $t$.)
The model is over-constrained. Can we do better?

Our Approach
–Formulate the $S_\sigma$ approximation of the problem as a semi-definite program (SDP).
–Start with a quadratic program (QP) relaxation of this SDP, and incrementally add constraints.
–Because $S_\sigma$ is an inner approximation of the set of stable matrices, we reach stability early, before reaching the feasible set of the SDP.
–We interpolate the solution to return the best stable matrix possible.

The Algorithm
(Figure: objective function contours with the iterates marked.)
–$A_1$: unconstrained QP solution (least squares estimate)
–$A_2$: QP solution after adding 1 constraint (happens to be stable)
–$A_{\text{final}}$: interpolation of the stable solution with the last one
–$A_{\text{previous method}}$: Lacy–Bernstein (2002)
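A simplified sketch of the loop these slides illustrate, assuming states X as columns and using cvxpy for the constrained QPs. The generated constraint $u_1^\top A v_1 \le 1$ (top singular vectors of the current iterate) is one natural linearization of $\sigma_1(A) \le 1$, and the bisection blend at the end mirrors the interpolation step; treat this as an illustration of the idea, not the paper's exact implementation:

```python
import numpy as np
import cvxpy as cp

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def constraint_generation(X, tol=1e-6, max_iter=50):
    n = X.shape[0]
    X0, X1 = X[:, :-1], X[:, 1:]
    A_cur = X1 @ np.linalg.pinv(X0)            # A_1: unconstrained least squares
    A_prev, cons_uv = None, []

    for _ in range(max_iter):
        if spectral_radius(A_cur) <= 1 + tol:  # stable: stop adding constraints
            break
        U, _, Vt = np.linalg.svd(A_cur)        # linearize sigma_1 at the iterate
        cons_uv.append((U[:, 0], Vt[0, :]))
        A_prev = A_cur
        Avar = cp.Variable((n, n))
        constraints = [u @ Avar @ v <= 1 for u, v in cons_uv]
        cp.Problem(cp.Minimize(cp.sum_squares(X1 - Avar @ X0)), constraints).solve()
        A_cur = Avar.value

    # A_final: blend back toward the last unstable iterate to regain accuracy
    if A_prev is not None:
        lo, hi = 0.0, 1.0                      # weight on the unstable iterate
        for _ in range(30):                    # bisection on the blend weight
            mid = 0.5 * (lo + hi)
            blend = (1 - mid) * A_cur + mid * A_prev
            lo, hi = (mid, hi) if spectral_radius(blend) <= 1 else (lo, mid)
        A_cur = (1 - lo) * A_cur + lo * A_prev
    return A_cur
```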

Let's train an LDS for steam using constraint generation, and simulate …

Simulating from a Constraint Generation stable texture model
(Plot: components of $x_t$ against $t$.)
The model captures more dynamics and is still stable.

(Side-by-side simulated frames: Least Squares vs. Constraint Generation.)

Empirical Evaluation
Algorithms:
–Constraint Generation – CG (our method)
–Lacy and Bernstein (2002) – LB-1: finds a $\sigma_1 \le 1$ solution
–Lacy and Bernstein (2003) – LB-2: solves a similar problem in a transformed space
Data sets:
–Dynamic textures
–Biosurveillance baseline models (see paper)

Reconstruction error
(Plot, steam video: % decrease in objective (lower is better) against number of latent dimensions.)

Running time
(Plot, steam video: running time (s) against number of latent dimensions.)

Conclusion
–A novel constraint generation algorithm for learning stable linear dynamical systems
–The QP relaxation of the SDP enables us to optimize over a larger set of matrices while being more efficient
Future work:
–Adding stability constraints to EM
–Stable models for more structured dynamic textures

Thank you!

Subspace ID with Hankel matrices
Stacking multiple observations in D forces latent states to model the future, e.g. annual sunspots data with 12-year cycles.
(Plots: first 2 columns of U against t, for k = 1 and k = 12.)

Stability and Convexity
–The state space of a stable LDS lies inside some ellipse.
–The set of matrices that map a particular ellipse into itself (and hence are stable) is convex.
–If we knew in advance which ellipse contains our state space, finding A would be a convex problem. But we don't …
–… and the set of all stable matrices is non-convex.
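A hedged formalization of this ellipse picture (a standard Lyapunov argument, added here as an aside rather than taken from the slides). Fix a positive definite $P$ and let $E_P = \{x : x^\top P x \le 1\}$ be the corresponding ellipse:

```latex
% For fixed P \succ 0, the set of matrices mapping E_P into itself is
%   \{ A : A^\top P A \preceq P \},
% which is convex in A, and every member is stable:
A^\top P A \preceq P \;\Longrightarrow\; |\lambda_1(A)| \le 1 .
% The constraint set S_\sigma is exactly the special case P = I:
A^\top A \preceq I \;\Longleftrightarrow\; \sigma_1(A) \le 1 ,
% so constraining \sigma_1 guarantees stability but commits in advance to the
% unit ball as the ellipse, which is why it can over-constrain.
```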