Real-time on-line learning of transformed hidden Markov models
Nemanja Petrovic, Nebojsa Jojic, Brendan Frey and Thomas Huang
Microsoft, University of Toronto, University of Illinois

2 Six break points vs. six things in video
Traditional video segmentation: find breakpoints. Example: MovieMaker (cut and paste).
Our goal: find possibly recurring scenes or objects.
[Figure: timeline with representative frames]

3 Transformed hidden Markov model
[Graphical model: class c -> latent image z -> translation T -> observed frame x]
Class c with prior P(c = k) = π_k
Latent image z: P(z|c) = N(z; μ_c, Φ_c)
Translation T with uniform prior
Observed frame x = Tz, so E[x] = Tμ_c, Var[x] = TΦ_cT^T, and
p(x|c,T) = N(x; Tμ_c, TΦ_cT^T)
Generation is repeated for each frame of the sequence, with the pair (T, c) being the state of a Markov chain.
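A minimal NumPy sketch of one generative step (the names and the cyclic-shift implementation of T are our assumptions, not the authors' code):

import numpy as np

rng = np.random.default_rng(0)

def generate_frame(pi, mus, phis):
    # pick a class c ~ pi
    c = rng.choice(len(pi), p=pi)
    # sample the latent image z ~ N(mu_c, Phi_c), diagonal covariance
    z = mus[c] + np.sqrt(phis[c]) * rng.standard_normal(mus[c].shape)
    # draw a translation T uniformly and apply it as a cyclic shift: x = T z
    ty = rng.integers(0, z.shape[0])
    tx = rng.integers(0, z.shape[1])
    x = np.roll(z, (ty, tx), axis=(0, 1))
    return x, c, (ty, tx)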

4 Goal: maximize the total likelihood of a dataset
log p(X) = log Σ_{T,c} Σ_Z p(X, {T,c}, Z)
         = log Σ_{T,c} Σ_Z q({T,c}, Z) p(X, {T,c}, Z) / q({T,c}, Z)
         ≥ Σ_{T,c} Σ_Z q({T,c}, Z) log p(X, {T,c}, Z) - Σ_{T,c} Σ_Z q({T,c}, Z) log q({T,c}, Z) = B
(by Jensen's inequality).
We factor the approximate posterior as q({T,c}, Z) = q({T,c}) q(Z|{T,c}).
Here {T,c} denotes the values of transformation and class for all frames, i.e., the path that the video sequence takes through the state space of the model.
Instead of the likelihood, we optimize the bound B, which is tight for q = p({T,c}, Z | X).

5 Posterior approximation
We allow q({T,c}) to have non-zero probability only on the M most probable paths:
q({T,c}) = Σ_{m=1..M} r_m δ({T,c} - {T,c}*_m)    (Viterbi 1982)
This avoids a number of problems with adaptive scaling in exact forward-backward inference.
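One way to maintain such an M-best approximation frame by frame is a beam search over the joint state s = (T, c). The sketch below is our illustration, not the authors' procedure; frame_logliks, log_trans and the state encoding are assumed:

def beam_step(prev_paths, frame_logliks, log_trans, M):
    # prev_paths: list of (path, log_prob) pairs, one per retained path
    # frame_logliks[s]: log p(x_t | s) for each joint state s = (T, c)
    # log_trans[s_prev][s]: log transition probability between joint states
    candidates = []
    for path, logp in prev_paths:
        s_prev = path[-1]
        for s, loglik in enumerate(frame_logliks):
            candidates.append((path + [s], logp + log_trans[s_prev][s] + loglik))
    # keep only the M most probable extended paths
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:M]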

6 Expensive part of the E step
Find a quick way to calculate
log p(x|c,T) = -(N/2) log(2π) - ½ log|TΦ_cT^T| - ½ (x - Tμ_c)^T (TΦ_cT^T)^{-1} (x - Tμ_c)
for all possible shifts T in the E step of the EM algorithm.
[Figure: shifted cluster mean Tμ, frame x, and log p plotted over shifts T]

7 Computing the Mahalanobis distance using FFTs
x^T (TΦT^T)^{-1} Tμ = x^T T(Φ^{-1}μ) = Σ x .* T(diag(Φ^{-1}) .* μ)    (summation is over pixels)
All terms that have to be evaluated for all T can be expressed as correlations, e.g.:
x^T T(Φ^{-1}μ) = IFFT( FFT(x) .* conj(FFT(Φ^{-1}μ)) )
N log N versus N²!
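The full log-likelihood for every shift follows the same pattern. A minimal NumPy sketch under our assumptions (diagonal covariance, cyclic 2-D shifts, convention (Tμ)_i = μ_{i-t}); an illustration of the trick, not the authors' code:

import numpy as np

def loglik_all_shifts(x, mu, phi):
    # x, mu, phi: (H, W) frame, class mean, per-pixel variance
    N = x.size
    inv_phi = 1.0 / phi
    # terms independent of T (a permutation T leaves |T Phi T^T| = |Phi|)
    const = -0.5 * N * np.log(2 * np.pi) - 0.5 * np.sum(np.log(phi))
    # corr(a, b)[t] = sum_i a[i] * b[i - t], evaluated for all cyclic shifts t
    def corr(a, b):
        return np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))))
    # (x - T mu)^T (T Phi T^T)^{-1} (x - T mu), expanded into correlations
    quad = (corr(x * x, inv_phi)
            - 2.0 * corr(x, inv_phi * mu)
            + np.sum(mu * mu * inv_phi))
    return const - 0.5 * quad   # (H, W) array: log p(x | c, T) for every shift T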

8 Parameter optimization
F = Σ_{T,c} Σ_Z q({T,c}, Z) log p(X, {T,c}, Z)
  = Σ_{T,c} Σ_Z q({T,c}) q(Z|{T,c}) × ( log π_{T_1,c_1} + Σ_t log p(x_t, z_t | T_t, c_t) + Σ_t log p(c_{t+1} | c_t) + Σ_t log p(T_{t+1} | T_t, c_t) )
Solve ∂F/∂θ = 0 for each model parameter θ, using the estimated q.

9 On-line vs. batch EM
Example: update equation for the class mean
Σ_t Σ_T q(T_t, c_t) E[z | x_t, c_t, T_t] = Σ_t q(c_t) μ_c
Batch EM:
– Solve for μ_c using all frames.
– Inference and parameter optimization are iterated.
On-line EM:
– Rewrite the equation for one extra frame.
– Establish the relationship between μ^(t+1) and μ^(t).
– Parameters are updated after each frame; no need for iteration. (A sketch of this update follows.)
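Solving the mean equation incrementally amounts to keeping running sums of the expected sufficient statistics. A minimal sketch (the class and all names are our illustration, not the authors' code):

import numpy as np

class OnlineMean:
    def __init__(self, shape):
        self.weighted_sum = np.zeros(shape)  # accumulates sum_T q(T_t, c_t) E[z | ...]
        self.total_resp = 0.0                # accumulates q(c_t)

    def update(self, q_c, expected_z):
        # q_c = q(c_t = c); expected_z = sum_T q(T_t, c_t) E[z | x_t, c_t, T_t]
        self.weighted_sum += expected_z
        self.total_resp += q_c
        # mu^(t+1) follows from mu^(t) plus the new frame's statistics
        return self.weighted_sum / max(self.total_resp, 1e-12)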

10 Reducing the complexity of the M step
Σ_T q(T_t, c_t) E[z | x_t, c_t, T_t] can be expressed as a sum of convolutions.
For example, when there is no observation noise, E[z | x_t, c_t, T_t] = T_t^T x_t, and
Σ_T q(T_t, c_t) E[z | x_t, c_t, T_t] = IFFT( FFT(q) .* FFT(x) )
(a similar trick applies to the variance estimates)
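In code this is again a single FFT pass per frame. A minimal sketch following the slide's convolution form (whether FFT(q) or its conjugate appears depends on the shift convention; names are illustrative):

import numpy as np

def expected_latent(q_shift, x):
    # q_shift: (H, W) posterior over 2-D translations for this frame
    # x: (H, W) observed frame
    # returns sum_T q(T) T^T x as a cyclic convolution of q with x
    return np.real(np.fft.ifft2(np.fft.fft2(q_shift) * np.fft.fft2(x)))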

11 How to deal with scale and rotation?
Represent pixels on a log-polar grid!
Shifts in log-polar coordinates correspond to scale and rotation changes in the Cartesian coordinate system.
[Figure: rotation and scale axes on the log-polar grid]
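The log-polar resampling can be done with one interpolation pass. A minimal SciPy sketch (grid sizes and centering are our assumptions):

import numpy as np
from scipy.ndimage import map_coordinates

def to_log_polar(img, n_r=64, n_theta=128):
    # resample img onto a log-polar grid centered at the image center, so
    # that scalings become shifts along the radial axis and rotations
    # become shifts along the angular axis
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    log_r = np.linspace(0.0, np.log(min(cy, cx)), n_r)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    r = np.exp(log_r)[:, None]                 # (n_r, 1)
    ys = cy + r * np.sin(theta)[None, :]       # (n_r, n_theta)
    xs = cx + r * np.cos(theta)[None, :]
    return map_coordinates(img, [ys, xs], order=1, mode='nearest')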

12 Estimating the number of classes
The algorithm is initialized with a single class.
A new class is introduced whenever the frame likelihood drops below a threshold.
Classes can be merged at the end to achieve a more compact representation. (A sketch of the spawning rule follows.)
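The spawning rule is a one-line test per frame. A minimal sketch (the threshold value and the new-class initialization are our assumptions):

import numpy as np

def maybe_spawn_class(classes, frame, frame_loglik, threshold):
    # if no existing class explains the frame well enough, seed a new
    # class from the frame itself (mean = frame, broad initial variance)
    if frame_loglik < threshold:
        classes.append({'mu': frame.copy(), 'phi': np.ones_like(frame)})
    return classes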

13 Clustering a 20-minute whale watching video

14 Clustering a 20-minute beach video

15 Shots from the first class (0 min to 9 min)

16 Discovering objects using motion priors
A different motion prior is predefined for each of the classes.
[Figure: three characteristic frames from a 240x320 input sequence; learned means and variances]

17 Tracking results

18 Summary
Before (CVPR 99/00):
– 28x44 images
– Grayscale images
– 1 day of computation for 15 sec of video
– Batch EM
– Exact inference
– Fixed number of clusters
– Limited number of translations
– Memory inefficient
Now:
– 120x160 images
– Full color images
– 5-10 frames/sec
– On-line EM
– Approximate inference
– Variable number of clusters
– All possible translations
– Memory efficient

19 Sneak preview: Panoramic THMMs
[Graphical model as on slide 3, with an additional operator W applied after the transformation T]
P(c = k) = π_k
P(z|c) = N(z; μ_c, Φ_c)
x = WTz, so E[x] = WTμ_c, Var[x] = WTΦ_cT^TW^T, and
p(x|c,T) = N(x; WTμ_c, WTΦ_cT^TW^T)

20 Video clustering - model
Appearance: mean and variance
Camera/object motion
Temporal constraints
Unsupervised learning – the only input is the video

21 Current implementation
DirectShow filter for frame clustering (5-15 frames/sec!)
Translation invariance
On-line learning algorithm
Classes repeating across the video
Potential applications:
– Video segmentation
– Content-based search/retrieval
– Short video summary creation
– DVD chapter creation

22 Comparing with layered sprites
[Figure: segmentation results from layered sprites, Jojic, CVPR 01]
Layered sprites give perfect segmentation, but THMM is hundreds to thousands of times faster!

Example with more content