1
Learning Layered Motion Segmentations of Video
Learning Layered Motion Segmentations of Video
M. Pawan Kumar, Philip Torr, Andrew Zisserman
University of Oxford
2
Aim Given a video, to learn a model for the object
Input: video. Output: model. The model should (ideally):
- describe the object completely and accurately
- handle self-occlusion
- be learnt in an unsupervised manner
3
Motivation Object Recognition and Segmentation
Current object recognition methods often learn a model manually: hand-labelling the positions of parts, or manually segmenting training images (Leibe and Schiele, DAGM '04; Borenstein and Ullman, ECCV '02).
4
Motivation
Problem: such 'supervised' methods are manually intensive and practically infeasible on a large scale.
Solution: use readily available data such as videos, and automatically learn models that can be used to perform object recognition.
5
Challenges: articulation, self-occlusion, lighting, motion blur.
Lighting: c'(x) = diag(a) c(x) + b
Motion blur: c'(y) = ∫ c(y − m(t)) dt
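The two effects can be sketched numerically. A minimal NumPy sketch, assuming a per-channel scale `a` and offset `b` for lighting and a sampled motion path of integer pixel shifts for blur (the function names are illustrative, not from the paper):

```python
import numpy as np

def apply_lighting(frame, a, b):
    """Per-channel affine lighting change, c'(x) = diag(a) c(x) + b.
    frame: (H, W, 3) array; a, b: length-3 scale and offset."""
    return frame * np.asarray(a, dtype=float) + np.asarray(b, dtype=float)

def apply_motion_blur(frame, path):
    """Discrete version of c'(y) = integral of c(y - m(t)) dt:
    average the frame translated along the sampled motion path."""
    acc = np.zeros(frame.shape, dtype=float)
    for dy, dx in path:
        acc += np.roll(frame, shift=(dy, dx), axis=(0, 1))
    return acc / len(path)
```

A uniform frame is unchanged by the blur, while lighting rescales each channel independently.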
6
Using a Generative Model
Parameters Θ:
- segments (mattes + appearance)
- layering
- transformations Tt
- lighting parameters a and b
- motion parameters m, obtained from Tt−1 and Tt
- a latent image per segment per frame
7
Learning the Model
Given a video D, we need to learn all model parameters Θ:
- segments (mattes + appearance)
- layering
- transformations
- lighting and motion blur parameters
We define the posterior Pr(Θ | D), which measures how well the generated frames match the observed data, and learn the 'best' model by maximizing Pr(Θ | D).
8
Previous Work
Sprite-based approaches: Jojic and Frey, ICCV '01; Williams and Titsias, Neural Computation '04.
Limitations:
- restricted to translation and rotation
- greedy optimisation
- spatial continuity not considered
- motion blur and lighting not handled
9
Outline
- Model Description
- Learning the Model
  - Initial Estimate
  - Refining Mattes
  - Updating Appearance
  - Refining Transformations
- Results
10
Model Description Layered Representation
Mattes of segments are represented as binary masks.
Appearance of a part: an RGB value per point.
T: translation, rotation and anisotropic scale factors.
11
Layering
Layer number li for segment pi. For non-overlapping segments, li = lj. Here pi occludes pj, so li > lj.
12
Layering
Layer number li for segment pi. For non-overlapping segments, li = lj. Here pj occludes pi, so li < lj.
13
Energy of the Model
Pr(Θ | D) ∝ Pr(D | Θ) Pr(Θ)
Energy Ψ = −log Pr(Θ | D)
Maximizing Pr(Θ | D) is equivalent to minimizing Ψ = Appearance + Boundary.
14
Appearance
The appearance term measures the consistency of observed and generated RGB values over the entire video sequence: it accumulates, over all frames, the difference between the generated and the observed frame.
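The accumulation above can be sketched as a sum of squared RGB differences over the sequence (an assumed SSD form; the paper's exact likelihood may differ):

```python
import numpy as np

def appearance_term(generated, observed):
    """Sum of squared RGB differences between corresponding
    generated and observed frames, over the whole sequence."""
    return float(sum(np.sum((g - o) ** 2)
                     for g, o in zip(generated, observed)))
```

A perfect generative model would drive this term to zero.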
15
Boundary
The boundary term gives preference to parts that are separated by image edges in most frames. For neighbouring points x and y: if their intensities are similar, the penalty on the energy is high; if they are different, the penalty is low.
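A common contrast-sensitive form of such a penalty is a Gaussian of the intensity difference (a sketch; `lam` and `sigma` are illustrative parameters, and the paper's exact weighting may differ):

```python
import math

def boundary_penalty(ix, iy, lam=1.0, sigma=5.0):
    """Contrast-sensitive penalty for cutting between neighbouring
    points with intensities ix, iy: high when the intensities are
    similar, low across a strong image edge."""
    diff = float(ix) - float(iy)
    return lam * math.exp(-diff * diff / (2.0 * sigma * sigma))
```

Cutting between two similar pixels therefore costs nearly `lam`, while cutting across a strong edge costs almost nothing.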
16
Our Approach
1) An initial estimate of Θ is obtained by dividing the scene into rigidly moving components.
2) Mattes are optimised using graph cuts.
3) Appearance parameters are updated.
4) Transformation, lighting and motion blur parameters are re-estimated.
17
Outline
- Model Description
- Learning the Model
  - Initial Estimate
  - Refining Mattes
  - Updating Appearance
  - Refining Transformations
- Results
18
1. Initial Estimate
Divide frame n into rectangular patches fi (e.g. 3×3 pixels). Track the patches into frame n+1 and use them to reconstruct frame n+1.
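The division step might look like the following sketch (`divide_into_patches` is an illustrative helper, not a function from the paper):

```python
import numpy as np

def divide_into_patches(frame, size=3):
    """Split a frame into non-overlapping size x size patches,
    keyed by the top-left corner of each patch."""
    h, w = frame.shape[:2]
    patches = {}
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            patches[(i, j)] = frame[i:i + size, j:j + size]
    return patches
```

Each patch is then tracked independently in the next frame.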
19
Tracking Patches
Patch fk in frame n is mapped into frame n+1 by a candidate transformation tk, with likelihood φ(tk) = 0.6. An MRF is defined over the patches.
20
Tracking Patches
Patch fk in frame n is mapped into frame n+1 by a candidate transformation tk, with likelihood φ(tk) = 0.9. An MRF is defined over the patches.
21
Tracking Patches
Patch fk in frame n is mapped into frame n+1 by a candidate transformation tk, with likelihood φ(tk) = 0.7. An MRF is defined over the patches.
22
Tracking Patches
Pairwise term for neighbouring patches fj, fk (frame n → frame n+1): ψ(tj, tk) = d1 if the pair of transformations is consistent with rigid motion.
23
Tracking Patches
ψ(tj, tk) = d2 otherwise.
24
Tracking Patches
Pr(t) ∝ ∏i φ(ti) ∏(i,j) ψ(ti, tj)
Inference using belief propagation.
- Time complexity: speed-up using distance transforms (Felzenszwalb and Huttenlocher, NIPS 2004)
- Memory requirements: coarse-to-fine strategy (Vogiatzis et al., BMVC 2004); multiple coarse labels are chosen instead of only the best one
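The distance-transform speed-up exploits the structure of the pairwise cost: for a linear (unit-slope) label cost, the O(L²) min-sum message computation collapses to two O(L) passes. A sketch under that assumption:

```python
def min_convolution(costs):
    """Compute out[i] = min_j (costs[j] + |i - j|) in O(L):
    the forward/backward pass behind the distance-transform
    speed-up for belief-propagation messages with linear cost."""
    out = [float(c) for c in costs]
    for i in range(1, len(out)):            # forward pass
        out[i] = min(out[i], out[i - 1] + 1)
    for i in range(len(out) - 2, -1, -1):   # backward pass
        out[i] = min(out[i], out[i + 1] + 1)
    return out
```

The naive double loop over labels gives the same result in quadratic time.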
25
Coarse-to-fine Strategy
Original MRF: each site has a large set of labels, many of them similar.
26
Coarse-to-fine Strategy
Group similar labels into one representative label: φ(Ti) = maxj φ(tj).
27
Coarse-to-fine Strategy
Solve the 'coarser' MRF using belief propagation, with ψ(Ti, Tj) = maxk,l ψ(tk, tl).
28
Coarse-to-fine Strategy
Choose the m best representative labels per site.
29
Coarse-to-fine Strategy
Expand the chosen representative labels to obtain a 'smaller' MRF over fine labels.
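The grouping rules from the slides above can be sketched directly (a list-based sketch; the function names are illustrative):

```python
def coarsen_unary(phi, groups):
    """phi(T) = max over fine labels t in group T of phi(t)."""
    return [max(phi[t] for t in g) for g in groups]

def coarsen_pairwise(psi, groups):
    """psi(T_i, T_j) = max over t_k in T_i, t_l in T_j of psi(t_k, t_l)."""
    return [[max(psi[k][l] for k in gi for l in gj) for gj in groups]
            for gi in groups]

def best_m(coarse_phi, m):
    """Keep the m best representative labels (highest potential)."""
    order = sorted(range(len(coarse_phi)), key=lambda i: -coarse_phi[i])
    return order[:m]
```

Keeping m > 1 representatives per site is what protects the coarse-to-fine pass from discarding the true label too early.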
30
Tracking Patches
31
Initial Estimate Cluster rigidly moving points to obtain components
Frame n Frame n+1 Components
32
Initial Estimate
Cluster components based on appearance (cross-correlation); the smallest member of each cluster is taken as a segment. Components → Segments.
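The appearance similarity could be computed with standard normalised cross-correlation (a sketch; the paper may use a different exact measure):

```python
import numpy as np

def normalised_cross_correlation(a, b):
    """NCC between two equally-sized patches; result lies in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

Components whose pairwise NCC exceeds a threshold would fall into the same cluster.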
33
The object is not yet described completely, and the layering is not determined. We refine this estimate by minimizing the energy Ψ: surrounding points are re-labelled using consistency of motion and consistency of texture. The form of Ψ suggests using graph cuts.
34
Graph Cuts
Consider the case of two segments, ph and pt. Each point xi is a graph node, connected to the terminals ph and pt with weights W(xi, ph), W(xi, pt) given by the appearance component, and to its neighbours with weights W(xj, xk) given by the boundary component.
35
Graph Cuts
(Graph construction repeated: terminal weights W(x1, ph), …, W(xn, pt); neighbour weights W(xj, xk).)
36
Graph Cuts
The energy is of the form Σ D(fx) + Σ V(fx, fy).
V is called regular if V(0,0) + V(1,1) ≤ V(0,1) + V(1,0). For LPS, V is regular.
Theorem: if V is regular, then the minimum cut minimizes the energy (Kolmogorov and Zabih, PAMI '04).
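The regularity (submodularity) condition is easy to check for a binary pairwise term (a minimal sketch):

```python
def is_regular(V):
    """Check the regularity condition V(0,0) + V(1,1) <= V(0,1) + V(1,0)
    for a binary pairwise term V, given as a dict over {0,1} x {0,1}."""
    return V[(0, 0)] + V[(1, 1)] <= V[(0, 1)] + V[(1, 0)]
```

A Potts-style term, which penalises only disagreement, satisfies the condition; a term that rewards agreement more than it costs disagreement does not.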
37
Multi-way Graph Cuts
Each cut assigns the labels pi and ¬pi to points in the binary matte of segment pi. Number of cuts = number of parts. Ideally, all cuts should be found simultaneously, but this is an NP-hard problem; we use the α-swap / α-expansion algorithms.
38
α-swap
One pair of parts is considered at a time; all other parts are kept fixed. Points belonging to one part of the pair can be re-labelled as the other part.
39
α-expansion
Iteratively find graph cuts: a cut corresponding to one part is considered at a time, with all other parts kept fixed. Theorem: α-expansion finds a strong local minimum.
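The α-expansion loop can be sketched as follows. Here the graph-cut solve of each binary move is replaced by exhaustive enumeration, so this sketch only scales to toy problems; `energy` is any user-supplied labelling cost, and the function names are illustrative:

```python
from itertools import product

def expansion_move(labels, alpha, energy):
    # One alpha-expansion move: each site either keeps its current
    # label or switches to alpha.  The binary subproblem is solved
    # here by brute force (a graph cut is used in practice).
    best, best_e = list(labels), energy(labels)
    for mask in product([0, 1], repeat=len(labels)):
        cand = [alpha if m else l for m, l in zip(mask, labels)]
        e = energy(cand)
        if e < best_e:
            best, best_e = cand, e
    return best

def alpha_expansion(labels, label_set, energy):
    # Cycle over labels until no expansion move lowers the energy.
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for alpha in label_set:
            new = expansion_move(labels, alpha, energy)
            if energy(new) < energy(labels):
                labels, improved = new, True
    return labels
```

On a small chain MRF with unary data costs and a Potts pairwise cost, the loop recovers the optimal labelling.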
40
Outline
- Model Description
- Learning the Model
  - Initial Estimate
  - Refining Mattes
  - Updating Appearance
  - Refining Transformations
- Results
41
2. Refining Mattes
Consider one segment at a time, along with its neighbouring segments.
42
2. Refining Mattes
Apply α-swap between the segment to be refined and a neighbouring segment.
43
2. Refining Mattes
Apply α-swap between the segment to be refined and a neighbouring segment.
44
2. Refining Mattes
Apply α-expansion to the segment to be refined, keeping the neighbouring segment fixed.
45
2. Refining Mattes
Apply α-expansion to the segment to be refined, keeping the neighbouring segment fixed.
46
2. Refining Mattes
Apply α-expansion, giving the refined segment and its neighbouring segment. Iterate over the segments until the energy cannot be reduced further.
47
#iterations Mattes Frame 1 Frame 30
48
#iterations Mattes 1 Frame 1 Frame 30
49
#iterations Mattes 2 Frame 1 Frame 30
50
#iterations Mattes 3 Frame 1 Frame 30
51
#iterations Mattes 4 Frame 1 Frame 30
52
#iterations Mattes 5 Frame 1 Frame 30
53
#iterations Mattes 6 Frame 1 Frame 30
54
#iterations Mattes 7 Frame 1 Frame 30
55
#iterations Mattes 8 Frame 1 Frame 30
56
#iterations Mattes 9 Frame 1 Frame 30
57
Outline
- Model Description
- Learning the Model
  - Initial Estimate
  - Refining Mattes
  - Updating Appearance
  - Refining Transformations
- Results
58
3. Updating Appearance
The appearance of a point is the mean of the RGB values of all visible points it projects onto.
4. Refining Transformations
Transformations around the initial estimate are explored; the transformation resulting in the least SSD is chosen.
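The transformation refinement can be sketched for pure translation (a toy version; the paper's search also covers rotation and scale, and `refine_translation` is an illustrative name):

```python
import numpy as np

def refine_translation(segment, frame, init, search=2):
    """Exhaustive search over translations around the initial
    top-left estimate `init`; the translation with the least SSD
    between the segment appearance and the frame window is chosen."""
    h, w = segment.shape[:2]
    (cy, cx), best, best_ssd = init, (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = cy + dy, cx + dx
            if y < 0 or x < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
                continue
            ssd = float(np.sum((frame[y:y + h, x:x + w] - segment) ** 2))
            if ssd < best_ssd:
                best, best_ssd = (dy, dx), ssd
    return best, best_ssd
```

With a one-pixel offset between the initial estimate and the true position, the search recovers the offset exactly.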
60
Outline Model Description Learning the Model Results Initial Estimate
Refining Mattes Updating appearance Refining Transformation Results
61
Results
62
Results – Complex Motion
63
Results – Poor Quality Video
64
Applications
The learnt model is used for several applications:
- motion segmentation
- object recognition
- object-category-specific segmentation
65
Object Recognition
The model is matched to still images using multiple shape exemplars and texture exemplars (Extending Pictorial Structures for Object Recognition, BMVC '04).
66
Class-Specific Segmentation
A global shape prior for graph-cut-based segmentation (OBJ CUT, CVPR '05).
67
Conclusions and Future Work
We have presented a method for unsupervised learning of a generative model from videos, and demonstrated applications to object recognition and segmentation. The method needs to be extended to handle the various visual aspects of an object.