Language of Motion: Hybrid Systems Modeling

René Vidal, Center for Imaging Science, Johns Hopkins University

Recognition of individual and crowd motions
Figure labels: Input video; Rigid backgrounds; Dynamic backgrounds; Individual motions; Group motions; Crowd motions
NSF CAREER: Recognition of Dynamic Activities in Unstructured Environments
NSF CDI: A Bio-Inspired Approach to Recognition of Human Movements and Movement Styles

Modeling videos with hybrid systems
Model the output with a mixture of dynamical models exhibiting changes in
  Space: multiple motions in a video
  Time: appearing and disappearing motions in a video
Solve a very complex hybrid system identification problem
Figure labels: SARX1, SARX2, SARXnt
NSF EHS: An Algebraic Geometric Approach to Hybrid System Identification

Overall goals of hybrid system modeling
Bottom-up Modeling: The models should compactly capture the underlying structure of the raw motion signal. This will be done by developing methods for hybrid dynamical system (HDS) identification.
Top-down Inference: The models should capture variations in the motion signal between two instances of the same surgeme, performed by either the same or a different surgeon. Variations may be purely stochastic, due to surgical context, or caused by the surgeon's skill level. This will be done using HMMs and ideas from automatic speech recognition.
Joint Top-down and Bottom-up Modeling and Inference: Identification of structure in the motion signal via an HDS need not be purely data-driven. We will investigate injection of top-down information into HDS identification for surgeme recognition, such as prior distributions on the identified HDS parameters and temporal dependencies in the surgeme sequence.

Specific goals of hybrid system modeling
Data
  Motion data: surgical, hand, whole body
  Video: surgical, whole body
Model learning: from data to models
  Dynamical models (Vidal)
    Sparse representation techniques for hybrid system identification
  Language models (Khudanpur)
    Hidden Markov Models of observed HDS parameters
    Language models of surgeme sequences
  “Dynamical language” models (Khudanpur & Vidal)
    Prior models for supervised hybrid system identification

Model comparison
  Distances between dynamical models: Binet-Cauchy kernels
  Distances between discrete trajectories of an HMM
  Metrics on hybrid systems (Petreczky-Vidal HSCC’07)
Model classification
  Dynamic Boost (Vidal-Favaro ICCV’07): extending boosting to dynamical systems
  Bag of dynamical systems (Ravichandran et al. CVPR’09): using dictionaries of motion primitives to make recognition invariant to changes in viewpoint, scale, and illumination

Outline of today’s talk
What are hybrid dynamical systems (HDS)?
How can hybrid systems be used for video
  Synthesis
  Registration
  Classification
  Segmentation
What’s next?
  Sparse representation techniques for hybrid system identification
  Distances on hybrid systems for time-series classification
  Time series classification with invariance
  Co-registration of motion and video data

Dynamical systems
Figure labels: y1, y2, y3, yt; Discrete; Continuous

Dynamical systems
Figure labels: x1, x2, x3, xt; y1, y2, y3, yt
Hidden Markov Models: discrete state; discrete or continuous output
Linear Dynamical Systems: continuous state; continuous output
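The model equations for this slide did not survive in the transcript. As a reminder only, the two model classes are commonly written as below. The symbols A, C, v_t, w_t and P_ij are assumptions of this note, chosen to agree with the A and C that appear later under identification of linear systems.

```latex
% Linear dynamical system: continuous state x_t, continuous output y_t,
% with process noise v_t and measurement noise w_t
x_{t+1} = A x_t + v_t, \qquad y_t = C x_t + w_t

% Hidden Markov model: discrete state x_t \in \{1, \dots, N\},
% discrete or continuous output y_t drawn from a state-dependent distribution
\Pr(x_{t+1} = j \mid x_t = i) = P_{ij}, \qquad y_t \sim p(y \mid x_t)
```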

Dynamical systems
Figure labels: q1, q2, q3, qt; x1, x2, x3, xt; y1, y2, y3, yt
Hybrid Systems: Switched; Jump Markov

Dynamical systems
Figure labels: q1, q2, q3, qt; x1, x2, x3, xt; y1, y2, y3, yt; o1, o2, o3, ot
Hybrid Systems

Identification of linear systems
Model is an LDS driven by IID white Gaussian noise
Bilinear problem, can do EM
Optimal solution: subspace identification (Van Overschee & De Moor ’94)
PCA-based solution in the absence of noise (Soatto et al. ’01)
  Can compute C and z(t) from the SVD of the images
  Given z(t), solving for A is a linear problem
Figure labels: dynamics; images; appearance
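The PCA-based recipe above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name identify_lds_pca, the toy matrices, and the noise-free data are all hypothetical, and mean subtraction and noise covariance estimation are left out.

```python
import numpy as np

def identify_lds_pca(Y, n):
    """Estimate (C, Z, A) from a data matrix Y (pixels x frames) with n states."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                    # appearance: orthonormal basis of the image subspace
    Z = s[:n, None] * Vt[:n, :]     # state trajectory z(t), one column per frame
    # Given Z, the dynamics solve the linear least-squares problem Z[:, 1:] ~ A Z[:, :-1]
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
    return C, Z, A

# Toy noise-free data from a known system (all values are illustrative assumptions)
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.7]])
C_true = rng.standard_normal((50, 2))
z = np.array([1.0, 1.0])
frames = []
for _ in range(40):
    frames.append(C_true @ z)
    z = A_true @ z
Y = np.column_stack(frames)

C_hat, Z_hat, A_hat = identify_lds_pca(Y, n=2)
# A_hat equals A_true only up to a change of basis, so compare eigenvalues
print(np.sort(np.linalg.eigvals(A_hat).real))
```

The recovered state lives in an arbitrary basis, which is why the comparison is made on eigenvalues rather than on the entries of A.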

Using linear systems to model time series
Extract a set of features from the video sequence
  Spatial filters
  ICA/PCA
  Wavelets
  Intensities of all pixels
Model spatiotemporal evolution of features as the output of a linear dynamical system (LDS): Soatto et al. ’01
  Dynamic textures: Soatto ICCV’01
  Human gaits: Bissacco CVPR’01
Figure labels: dynamics; appearance; images

Using linear systems for video synthesis
Once a model of a dynamic texture has been learned, one can use it to synthesize novel sequences: Schödl et al. ’00, Soatto et al. ’01, Doretto et al. ’03, Yuan et al. ’04
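Synthesis from a learned linear model amounts to running the state recursion forward and mapping each state to a frame. The sketch below assumes a model of the form z(t+1) = A z(t) + noise, y(t) = C z(t). The function name synthesize, the noise level, and the small matrices are illustrative assumptions rather than values from the talk.

```python
import numpy as np

def synthesize(A, C, z0, num_frames, noise_std=0.0, seed=0):
    """Generate frames y_t = C z_t with z_{t+1} = A z_t + Gaussian noise."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    frames = []
    for _ in range(num_frames):
        frames.append(C @ z)                                   # render the current state
        z = A @ z + noise_std * rng.standard_normal(z.shape)   # advance the dynamics
    return np.column_stack(frames)                             # pixels x frames

# Hypothetical learned model: stable dynamics and a 6-pixel appearance basis
A = np.array([[0.95, 0.10], [-0.10, 0.95]])
C = np.random.default_rng(1).standard_normal((6, 2))
video = synthesize(A, C, z0=[1.0, 0.0], num_frames=100, noise_std=0.05)
```

Driving the state with fresh noise is what lets the generated sequence run longer than the training clip without repeating it.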

Using linear systems for video mosaicing
Given a non-rigid dynamical scene captured through multiple static cameras, we want to register the two sequences spatially and temporally
Challenges
  We are dealing with non-rigid dynamical scenes, where feature tracking and matching is very difficult.
  We are dealing with both spatial and temporal misalignments.
Goal
  We would like to develop a spatial alignment technique that is invariant to the temporal alignment by reducing video registration to an image registration problem.
Remove synch and unsynch
A. Ravichandran and R. Vidal, ICCV Workshop on Dynamical Vision, 2007
A. Ravichandran and R. Vidal, European Conference on Computer Vision, 2008

Overview of our approach
Pipeline applied to each sequence: system identification; conversion to canonical form; extract SIFT features
Matching between the two sequences

Results: format
Figure labels: Register; RGB; Decomposition

Results: non rigid scenes

Results: more sequences

Classifying/recognizing novel sequences
Given videos of several classes of dynamic textures, one can use their models to classify new sequences (Saisan et al. ’01)
  Identify dynamical models for all sequences in the training set
  Identify a dynamical model for each novel sequence
  Assign each novel sequence to the class of its nearest neighbor
Requires one to compute a distance between dynamical models
  Martin distance (Martin ’00)
  Subspace angles (De Cock ’02 ’05)
  Kullback-Leibler divergence (Chan-Vasconcelos ’07)
  Binet-Cauchy kernels (Vishwanathan-Smola-Vidal ’07)
S.V.N. Vishwanathan, A. Smola, and R. Vidal. Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes. International Journal of Computer Vision, 2007
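The nearest-neighbor scheme above can be sketched with a subspace-angle distance. The code below is an approximation under stated assumptions: it uses a finite truncation (k blocks) of the observability matrix of each model (A, C) and a Martin-type distance built from the principal angles between the two ranges. The function names, the truncation length, and the toy models are hypothetical.

```python
import numpy as np

def observability(A, C, k):
    """Stack C, CA, ..., CA^(k-1): a finite truncation of the observability matrix."""
    blocks, M = [], C
    for _ in range(k):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def subspace_angle_distance_sq(model1, model2, k=20):
    """Martin-type squared distance: -log of the product of cos^2 of the principal angles."""
    Q1, _ = np.linalg.qr(observability(*model1, k))
    Q2, _ = np.linalg.qr(observability(*model2, k))
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cosines = np.clip(cosines, 1e-12, 1.0)   # guard against round-off and log(0)
    return float(-2.0 * np.sum(np.log(cosines)))

def nearest_neighbor_label(query, training_models, labels):
    """Assign the query model the label of its closest training model."""
    d = [subspace_angle_distance_sq(query, m) for m in training_models]
    return labels[int(np.argmin(d))]

# Toy example: the query is model_a expressed in a different state basis
rng = np.random.default_rng(0)
C = rng.standard_normal((3, 2))
model_a = (np.diag([0.9, 0.5]), C)
model_b = (np.diag([-0.9, 0.3]), C)
T = np.array([[2.0, 1.0], [0.0, 1.0]])
T_inv = np.linalg.inv(T)
query = (T @ model_a[0] @ T_inv, C @ T_inv)
predicted = nearest_neighbor_label(query, [model_a, model_b], ["a", "b"])
```

Because the distance depends only on the range of the observability matrix, it is unaffected by the arbitrary state basis that identification returns, which is the property a distance between models needs here.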

Binet-Cauchy kernels for AR models
Consider two stable AR models
Define an embedding
Binet-Cauchy kernel
Trace kernel for AR models, where M satisfies the equation
Determinant kernel for AR models, where M satisfies the equation
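The kernel equations on this slide were lost in extraction, so the following is only an illustration in the spirit of the trace kernel, not the paper's exact formulation. It assumes noise-free state-space models x(t+1) = A x(t), y(t) = C x(t) with known initial states. Under those assumptions the discounted sum of output inner products, sum over t of exp(-lam t) y(t)^T W y'(t), equals x0^T M x0', where M solves M = exp(-lam) A^T M A' + C^T W C'. The discount lam, the weight W, and all names are assumptions.

```python
import numpy as np

def trace_kernel(model1, model2, lam=0.1, W=None):
    """Discounted output inner product of two noise-free linear models."""
    A1, C1, x1 = model1
    A2, C2, x2 = model2
    W = np.eye(C1.shape[0]) if W is None else W
    n1, n2 = A1.shape[0], A2.shape[0]
    # Solve M = exp(-lam) * A1^T M A2 + C1^T W C2 using row-major vectorization:
    # vec(A1^T M A2) = kron(A1^T, A2^T) vec(M)
    K = np.eye(n1 * n2) - np.exp(-lam) * np.kron(A1.T, A2.T)
    M = np.linalg.solve(K, (C1.T @ W @ C2).reshape(-1)).reshape(n1, n2)
    return float(x1 @ M @ x2)

def discounted_sum(model1, model2, lam=0.1, horizon=500):
    """Brute-force check: truncate the infinite sum (W = identity)."""
    (A1, C1, x1), (A2, C2, x2) = model1, model2
    total = 0.0
    for t in range(horizon):
        total += np.exp(-lam * t) * (C1 @ x1) @ (C2 @ x2)
        x1, x2 = A1 @ x1, A2 @ x2
    return float(total)

# Two stable toy models (illustrative values)
rng = np.random.default_rng(0)
m1 = (np.array([[0.9, 0.1], [0.0, 0.8]]), rng.standard_normal((3, 2)), np.array([1.0, 1.0]))
m2 = (np.array([[0.7, -0.2], [0.2, 0.7]]), rng.standard_normal((3, 2)), np.array([1.0, 0.0]))
k_closed = trace_kernel(m1, m2)
k_brute = discounted_sum(m1, m2)
```

Stability of both models is what makes the infinite sum converge, which is why the slide restricts attention to stable models.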

Results: clustering video clips
Kill Bill: Vol. 1 (2003)
Randomly sample 480 clips from the movie, 120 frames each
Fit a linear dynamical model to each clip
Use the trace kernel to compute the k-nearest neighbors of each clip
Use Locally Linear Embedding (LLE) for clustering the clips and embedding them in 2D space

Results: clustering video clips
Figure labels: Two people talking; Sword fight; Person rolling in the snow

Results: dynamic texture recognition
UCLA database: 200 sequences (75 frames, 160 x 110 pixels), 50 classes; dynamics extracted from a 48 x 48 window

Results: human gait recognition
Weizmann database: 10 activities
R. Chaudhry, A. Ravichandran, G. Hager and R. Vidal. Histograms of Oriented Optical Flow and Binet-Cauchy Kernels on Nonlinear Dynamical Systems for the Recognition of Human. CVPR 2009.

Identification of hybrid systems
Given input/output data, identify
  Number of discrete states
  Model parameters of linear systems
  Hybrid state (continuous & discrete)
  Switching parameters (partition of state space)
Piecewise ARX systems
  Clustering approach: k-means clustering + regression + classification + iterative refinement (Ferrari-Trecate et al. ’03)
  Bayesian approach: ML via EM algorithm (Juloski et al. ’05)
  Mixed integer quadratic programming (Bemporad et al. ’01)
  Greedy/iterative approach (Bemporad et al. ’03)
Switched ARX systems
  Batch algebraic approach (Vidal et al. ’03 ’04, Ma-Vidal ’05, Bako-Vidal ’07, Lauer et al. ’09)
  Recursive algebraic approach (Vidal et al. ’04 ’05 ’07)
  Support vector regression approach (Lauer et al. ’09)
NSF 2006: An Algebraic Geometric Approach to Hybrid System Identification
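To make the algebraic idea for switched ARX systems concrete, here is a minimal noise-free sketch for a deliberately small case: two modes, first order, no input, so y(t) = a_q y(t-1). Every data point z = (y(t-1), y(t)) then satisfies the polynomial (a_1 z1 - z2)(a_2 z1 - z2) = 0 regardless of the active mode. The polynomial's coefficients follow from a null-space computation, and its gradient at each point reveals that point's parameter. The function names and toy values are assumptions, and the published methods handle noise, inputs, higher orders, and an unknown number of modes, none of which is attempted here.

```python
import numpy as np

def simulate_sarx(params, modes, y0=1.0):
    """y_t = a_{q_t} * y_{t-1}: a first-order switched AR system without input."""
    y = [y0]
    for q in modes:
        y.append(params[q] * y[-1])
    return np.array(y)

def identify_two_mode_sarx(y):
    # Data points z_t = (y_{t-1}, y_t); p is homogeneous, so normalizing is harmless
    Z = np.column_stack([y[:-1], y[1:]])
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    z1, z2 = Z[:, 0], Z[:, 1]
    # p(z) = h0*z1^2 + h1*z1*z2 + h2*z2^2 vanishes on all data, so h spans the
    # null space of the degree-2 embedded data matrix
    V = np.column_stack([z1**2, z1 * z2, z2**2])
    h = np.linalg.svd(V)[2][-1]
    # The gradient of p at a point generated by mode i is proportional to (a_i, -1)
    g1 = 2 * h[0] * z1 + h[1] * z2
    g2 = h[1] * z1 + 2 * h[2] * z2
    a_t = -g1 / g2                                   # per-sample parameter estimate
    labels = (a_t > 0.5 * (a_t.min() + a_t.max())).astype(int)
    a_hat = np.array([a_t[labels == 0].mean(), a_t[labels == 1].mean()])
    return a_hat, labels

true_modes = [0] * 10 + [1] * 10 + [0] * 10          # illustrative switching sequence
y = simulate_sarx([0.9, 1.1], true_modes)
a_hat, labels = identify_two_mode_sarx(y)
switch_times = np.flatnonzero(np.diff(labels)) + 1   # temporal segmentation
```

The recovered labels give a temporal segmentation of the signal as a by-product of identification.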

Hybrid systems for temporal segmentation
R. Vidal, Recursive Identification of Switched ARX Systems. Automatica, 2008

Hybrid systems for temporal segmentation
Empty living room
Middle-aged man enters
Woman enters
Young man enters, introduces the woman and leaves
Middle-aged man flirts with woman and steals her tiara
Middle-aged man checks the time, rises and leaves
Woman walks him to the door
Woman returns to her seat
Woman misses her tiara
Woman searches for her tiara
Woman sits down, dismayed

Using hybrid systems for spatial segmentation
Fixed boundary segmentation results
Moving boundary segmentation results
Figure labels: Ocean-smoke; Ocean-dynamics; Ocean-appearance; Racoon; Ocean-fire
A. Ghoreyshi and R. Vidal, Segmenting Dynamic Textures with Ising Descriptors, ARX Models and Level Sets, ECCV Workshop on Dynamical Vision, 2006

Sparse representation techniques for hybrid system identification (Vidal)
Extending boosting to dynamical systems? DynamicBoost (Vidal-Favaro ICCV’07)
Recognizing videos with multiple dynamic textures
Metrics on hybrid systems (Petreczky-Vidal HSCC’07)
Bag of dynamical systems: making recognition invariant to changes in viewpoint, scale, and illumination

Sparse hybrid system identification

Bag-of-Words: Sample Topic (Economy)

Bag of dynamical systems
Language of motion primitives
  Each motion primitive is represented with a dynamical system
  Motion words are obtained by clustering dynamical systems
Ravichandran and Vidal, IEEE Conference on Computer Vision and Pattern Recognition, 2009
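Forming motion words by clustering can be sketched once pairwise distances between the identified systems are available. The code below assumes such a distance matrix D is given and uses k-medoids, chosen here only because it needs distances rather than vectors. That choice, the function names, and the 1-D stand-in data are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def k_medoids(D, k, num_iters=20):
    """Cluster items from a pairwise distance matrix D; returns medoid indices and labels."""
    medoids = [0]
    while len(medoids) < k:                          # deterministic farthest-point init
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(num_iters):
        labels = np.argmin(D[:, medoids], axis=1)    # assign each item to its nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue                             # keep the old medoid for an empty cluster
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

def word_histogram(word_ids, k):
    """Normalized histogram of motion words: the bag-of-systems descriptor of one video."""
    counts = np.bincount(word_ids, minlength=k).astype(float)
    return counts / counts.sum()

# Stand-in data: 1-D points play the role of dynamical systems identified from video patches
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(points[:, None] - points[None, :])
medoids, words = k_medoids(D, k=2)
hist = word_histogram(words[[0, 1, 3]], k=2)         # a video containing patches 0, 1 and 3
```

In a real pipeline D would hold distances between dynamical systems, for instance subspace-angle distances, and each video's histogram would be passed to a standard classifier.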

Bag of dynamical systems
UCLA database: 200 sequences, 50 classes (8 view-invariant classes)
Recognition using bag of dynamical systems versus using Doretto et al.

Acknowledgements
2009 Sloan Fellowship
ONR YIP N00014-09-1-0839
ONR N; ONR N; NSF CAREER ISS; NSF CNS; NSF CNS; ARL Robotics-CTA; JHU APL; NIH RO1 HL082729; WSE-APL; NIH-NHLBI
JHU: Rizwan Chaudhry, Atiyeh Ghoreyshi, Avinash Ravichandran
UIUC: Yi Ma
Heriot-Watt: Paolo Favaro
Yahoo: Alex Smola
Purdue: SVN Vishwanathan
