1
Learning Layered Motion Segmentations of Video
Learning Layered Motion Segmentations of Video
M. Pawan Kumar, Philip Torr, Andrew Zisserman
University of Oxford
2
Aim Given a video, to learn a model for the object
Input: video. Output: model. The model should (ideally):
- describe the object completely and accurately
- handle self-occlusion
- be learnt in an unsupervised manner
3
Motivation Object Recognition and Segmentation
Current object recognition methods often learn a model manually: hand-labelling the positions of parts, or manually segmenting training images (Leibe and Schiele, DAGM '04; Borenstein and Ullman, ECCV '02).
4
Motivation
Problem: such 'supervised' methods are manually intensive and practically infeasible on a large scale.
Solution: use readily available data such as videos, and automatically learn models that can be used to perform object recognition.
5
Challenges: articulation, self-occlusion, lighting, motion blur.
Lighting: c'(x) = diag(a) c(x) + b
Motion blur: c'(y) = ∫ c(y − m(t)) dt
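The two effects can be sketched numerically. A minimal NumPy sketch, assuming a per-channel scale `a` and offset `b` for lighting and a sampled motion path of integer pixel shifts for blur (the function names are illustrative, not from the paper):

```python
import numpy as np

def apply_lighting(frame, a, b):
    """Per-channel affine lighting change, c'(x) = diag(a) c(x) + b.
    frame: (H, W, 3) array; a, b: length-3 scale and offset."""
    return frame * np.asarray(a, dtype=float) + np.asarray(b, dtype=float)

def apply_motion_blur(frame, path):
    """Discrete version of c'(y) = integral of c(y - m(t)) dt:
    average the frame translated along the sampled motion path."""
    acc = np.zeros(frame.shape, dtype=float)
    for dy, dx in path:
        acc += np.roll(frame, shift=(dy, dx), axis=(0, 1))
    return acc / len(path)
```

A uniform frame is unchanged by the blur, while lighting rescales each channel independently.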
6
Using a Generative Model
Parameters Θ:
- segments (mattes + appearance)
- layering
- transformations Tt
- lighting parameters a and b
- motion parameters m, obtained from Tt−1 and Tt
- a latent image per segment per frame
7
Learning the Model
Given a video D, we need to learn all model parameters Θ:
- segments (mattes + appearance)
- layering
- transformations
- lighting and motion blur parameters
We define the posterior Pr(Θ | D), which measures how well the generated frames match the observed data, and learn the 'best' model by maximizing Pr(Θ | D).
8
Previous Work
Sprite-based approaches: Jojic and Frey, ICCV '01; Williams and Titsias, Neural Computation '04.
Limitations:
- restricted to translation and rotation
- greedy optimisation
- spatial continuity not considered
- motion blur and lighting not handled
9
Outline
- Model Description
- Learning the Model
  - Initial Estimate
  - Refining Mattes
  - Updating Appearance
  - Refining Transformations
- Results
10
Model Description Layered Representation
Mattes of segments are represented as binary masks.
Appearance of a part: an RGB value per point.
T: translation, rotation and anisotropic scale factors.
11
Layering
Layer number li for segment pi. For non-overlapping segments, li = lj. Here pi occludes pj, so li > lj.
12
Layering
Layer number li for segment pi. For non-overlapping segments, li = lj. Here pj occludes pi, so li < lj.
13
Energy of the Model
Pr(Θ | D) ∝ Pr(D | Θ) Pr(Θ)
Energy Ψ = −log Pr(Θ | D)
Maximizing Pr(Θ | D) is equivalent to minimizing Ψ = Appearance + Boundary.
14
Appearance
The appearance term measures the consistency of observed and generated RGB values over the entire video sequence: it accumulates, over all frames, the difference between the generated and the observed frame.
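The accumulation above can be sketched as a sum of squared RGB differences over the sequence (an assumed SSD form; the paper's exact likelihood may differ):

```python
import numpy as np

def appearance_term(generated, observed):
    """Sum of squared RGB differences between corresponding
    generated and observed frames, over the whole sequence."""
    return float(sum(np.sum((g - o) ** 2)
                     for g, o in zip(generated, observed)))
```

A perfect generative model would drive this term to zero.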
15
Boundary
The boundary term gives preference to parts that are separated by image edges in most frames. For neighbouring points x and y: if their intensities are similar, the penalty on the energy is high; if they are different, the penalty is low.
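A common contrast-sensitive form of such a penalty is a Gaussian of the intensity difference (a sketch; `lam` and `sigma` are illustrative parameters, and the paper's exact weighting may differ):

```python
import math

def boundary_penalty(ix, iy, lam=1.0, sigma=5.0):
    """Contrast-sensitive penalty for cutting between neighbouring
    points with intensities ix, iy: high when the intensities are
    similar, low across a strong image edge."""
    diff = float(ix) - float(iy)
    return lam * math.exp(-diff * diff / (2.0 * sigma * sigma))
```

Cutting between two similar pixels therefore costs nearly `lam`, while cutting across a strong edge costs almost nothing.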
16
Our Approach
1) An initial estimate of Θ is obtained by dividing the scene into rigidly moving components.
2) Mattes are optimised using graph cuts.
3) Appearance parameters are updated.
4) Transformation, lighting and motion blur parameters are re-estimated.
17
Outline
- Model Description
- Learning the Model
  - Initial Estimate
  - Refining Mattes
  - Updating Appearance
  - Refining Transformations
- Results
18
1. Initial Estimate
Divide frame n into rectangular patches fi (e.g. 3×3 pixels). Track the patches into frame n+1 and use them to reconstruct frame n+1.
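The division step might look like the following sketch (`divide_into_patches` is an illustrative helper, not a function from the paper):

```python
import numpy as np

def divide_into_patches(frame, size=3):
    """Split a frame into non-overlapping size x size patches,
    keyed by the top-left corner of each patch."""
    h, w = frame.shape[:2]
    patches = {}
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            patches[(i, j)] = frame[i:i + size, j:j + size]
    return patches
```

Each patch is then tracked independently in the next frame.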
19
Tracking Patches
Patch fk in frame n is mapped into frame n+1 by a candidate transformation tk, with likelihood φ(tk) = 0.6. An MRF is defined over the patches.
20
Tracking Patches
Patch fk in frame n is mapped into frame n+1 by a candidate transformation tk, with likelihood φ(tk) = 0.9. An MRF is defined over the patches.
21
Tracking Patches
Patch fk in frame n is mapped into frame n+1 by a candidate transformation tk, with likelihood φ(tk) = 0.7. An MRF is defined over the patches.
22
Tracking Patches
Pairwise term for neighbouring patches fj, fk (frame n → frame n+1): ψ(tj, tk) = d1 if the pair of transformations is consistent with rigid motion.
23
Tracking Patches
ψ(tj, tk) = d2 otherwise.
24
Tracking Patches
Pr(t) ∝ ∏i φ(ti) ∏(i,j) ψ(ti, tj)
Inference using belief propagation.
- Time complexity: speed-up using distance transforms (Felzenszwalb and Huttenlocher, NIPS 2004)
- Memory requirements: coarse-to-fine strategy (Vogiatzis et al., BMVC 2004); multiple coarse labels are chosen instead of only the best one
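The distance-transform speed-up exploits the structure of the pairwise cost: for a linear (unit-slope) label cost, the O(L²) min-sum message computation collapses to two O(L) passes. A sketch under that assumption:

```python
def min_convolution(costs):
    """Compute out[i] = min_j (costs[j] + |i - j|) in O(L):
    the forward/backward pass behind the distance-transform
    speed-up for belief-propagation messages with linear cost."""
    out = [float(c) for c in costs]
    for i in range(1, len(out)):            # forward pass
        out[i] = min(out[i], out[i - 1] + 1)
    for i in range(len(out) - 2, -1, -1):   # backward pass
        out[i] = min(out[i], out[i + 1] + 1)
    return out
```

The naive double loop over labels gives the same result in quadratic time.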
25
Coarse-to-fine Strategy
Original MRF: each site has a large set of labels, many of them similar.
26
Coarse-to-fine Strategy
Group similar labels into one representative label: φ(Ti) = maxj φ(tj).
27
Coarse-to-fine Strategy
Solve the 'coarser' MRF using belief propagation, with ψ(Ti, Tj) = maxk,l ψ(tk, tl).
28
Coarse-to-fine Strategy
Choose the m best representative labels per site.
29
Coarse-to-fine Strategy
Expand the chosen representative labels to obtain a 'smaller' MRF over fine labels.
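The grouping rules from the slides above can be sketched directly (a list-based sketch; the function names are illustrative):

```python
def coarsen_unary(phi, groups):
    """phi(T) = max over fine labels t in group T of phi(t)."""
    return [max(phi[t] for t in g) for g in groups]

def coarsen_pairwise(psi, groups):
    """psi(T_i, T_j) = max over t_k in T_i, t_l in T_j of psi(t_k, t_l)."""
    return [[max(psi[k][l] for k in gi for l in gj) for gj in groups]
            for gi in groups]

def best_m(coarse_phi, m):
    """Keep the m best representative labels (highest potential)."""
    order = sorted(range(len(coarse_phi)), key=lambda i: -coarse_phi[i])
    return order[:m]
```

Keeping m > 1 representatives per site is what protects the coarse-to-fine pass from discarding the true label too early.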
30
Tracking Patches
31
Initial Estimate Cluster rigidly moving points to obtain components
Frame n Frame n+1 Components
32
Initial Estimate
Cluster components based on appearance (cross-correlation); the smallest member of each cluster is taken as a segment. Components → Segments.
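The appearance similarity could be computed with standard normalised cross-correlation (a sketch; the paper may use a different exact measure):

```python
import numpy as np

def normalised_cross_correlation(a, b):
    """NCC between two equally-sized patches; result lies in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

Components whose pairwise NCC exceeds a threshold would fall into the same cluster.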
33
The object is not yet described completely, and the layering is not determined. We refine this estimate by minimizing the energy Ψ: surrounding points are re-labelled using consistency of motion and consistency of texture. The form of Ψ suggests using graph cuts.
34
Graph Cuts
Consider the case of two segments, ph and pt. Each point xi is a graph node, connected to the terminals ph and pt with weights W(xi, ph), W(xi, pt) given by the appearance component, and to its neighbours with weights W(xj, xk) given by the boundary component.
35
Graph Cuts
(Graph construction repeated: terminal weights W(x1, ph), …, W(xn, pt); neighbour weights W(xj, xk).)
36
Graph Cuts
The energy is of the form Σ D(fx) + Σ V(fx, fy).
V is called regular if V(0,0) + V(1,1) ≤ V(0,1) + V(1,0). For LPS, V is regular.
Theorem: if V is regular, then the minimum cut minimizes the energy (Kolmogorov and Zabih, PAMI '04).
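The regularity (submodularity) condition is easy to check for a binary pairwise term (a minimal sketch):

```python
def is_regular(V):
    """Check the regularity condition V(0,0) + V(1,1) <= V(0,1) + V(1,0)
    for a binary pairwise term V, given as a dict over {0,1} x {0,1}."""
    return V[(0, 0)] + V[(1, 1)] <= V[(0, 1)] + V[(1, 0)]
```

A Potts-style term, which penalises only disagreement, satisfies the condition; a term that rewards agreement more than it costs disagreement does not.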
37
Multi-way Graph Cuts
Each cut assigns the labels pi and ¬pi to points in the binary matte of segment pi. Number of cuts = number of parts. Ideally, all cuts should be found simultaneously, but this is an NP-hard problem; we use the α-swap / α-expansion algorithms.
38
α-swap
One pair of parts is considered at a time; all other parts are kept fixed. Points belonging to one part of the pair can be re-labelled as the other part.
39
α-expansion
Iteratively find graph cuts: a cut corresponding to one part is considered at a time, with all other parts kept fixed. Theorem: α-expansion finds a strong local minimum.
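The α-expansion loop can be sketched as follows. Here the graph-cut solve of each binary move is replaced by exhaustive enumeration, so this sketch only scales to toy problems; `energy` is any user-supplied labelling cost, and the function names are illustrative:

```python
from itertools import product

def expansion_move(labels, alpha, energy):
    # One alpha-expansion move: each site either keeps its current
    # label or switches to alpha.  The binary subproblem is solved
    # here by brute force (a graph cut is used in practice).
    best, best_e = list(labels), energy(labels)
    for mask in product([0, 1], repeat=len(labels)):
        cand = [alpha if m else l for m, l in zip(mask, labels)]
        e = energy(cand)
        if e < best_e:
            best, best_e = cand, e
    return best

def alpha_expansion(labels, label_set, energy):
    # Cycle over labels until no expansion move lowers the energy.
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for alpha in label_set:
            new = expansion_move(labels, alpha, energy)
            if energy(new) < energy(labels):
                labels, improved = new, True
    return labels
```

On a small chain MRF with unary data costs and a Potts pairwise cost, the loop recovers the optimal labelling.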
40
Outline
- Model Description
- Learning the Model
  - Initial Estimate
  - Refining Mattes
  - Updating Appearance
  - Refining Transformations
- Results
41
2. Refining Mattes
Consider one segment at a time, along with its neighbouring segments.
42
2. Refining Mattes
Apply α-swap between the segment to be refined and a neighbouring segment.
43
2. Refining Mattes
Apply α-swap between the segment to be refined and a neighbouring segment.
44
2. Refining Mattes
Apply α-expansion to the segment to be refined, keeping the neighbouring segment fixed.
45
2. Refining Mattes
Apply α-expansion to the segment to be refined, keeping the neighbouring segment fixed.
46
2. Refining Mattes
Apply α-expansion, giving the refined segment and its neighbouring segment. Iterate over the segments until the energy cannot be reduced further.
47
#iterations Mattes Frame 1 Frame 30
48
#iterations Mattes 1 Frame 1 Frame 30
49
#iterations Mattes 2 Frame 1 Frame 30
50
#iterations Mattes 3 Frame 1 Frame 30
51
#iterations Mattes 4 Frame 1 Frame 30
52
#iterations Mattes 5 Frame 1 Frame 30
53
#iterations Mattes 6 Frame 1 Frame 30
54
#iterations Mattes 7 Frame 1 Frame 30
55
#iterations Mattes 8 Frame 1 Frame 30
56
#iterations Mattes 9 Frame 1 Frame 30
57
Outline
- Model Description
- Learning the Model
  - Initial Estimate
  - Refining Mattes
  - Updating Appearance
  - Refining Transformations
- Results
58
3. Updating Appearance
The appearance of a point is the mean of the RGB values of all visible points it projects onto.
4. Refining Transformations
Transformations around the initial estimate are explored; the transformation resulting in the least SSD is chosen.
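The transformation refinement can be sketched for pure translation (a toy version; the paper's search also covers rotation and scale, and `refine_translation` is an illustrative name):

```python
import numpy as np

def refine_translation(segment, frame, init, search=2):
    """Exhaustive search over translations around the initial
    top-left estimate `init`; the translation with the least SSD
    between the segment appearance and the frame window is chosen."""
    h, w = segment.shape[:2]
    (cy, cx), best, best_ssd = init, (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = cy + dy, cx + dx
            if y < 0 or x < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
                continue
            ssd = float(np.sum((frame[y:y + h, x:x + w] - segment) ** 2))
            if ssd < best_ssd:
                best, best_ssd = (dy, dx), ssd
    return best, best_ssd
```

With a one-pixel offset between the initial estimate and the true position, the search recovers the offset exactly.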
60
Outline Model Description Learning the Model Results Initial Estimate
Refining Mattes Updating appearance Refining Transformation Results
61
Results
62
Results – Complex Motion
63
Results – Poor Quality Video
64
Applications
The learnt model is used for several applications:
- motion segmentation
- object recognition
- object-category-specific segmentation
65
Object Recognition
The model is matched to still images using multiple shape exemplars and texture exemplars (Extending Pictorial Structures for Object Recognition, BMVC '04).
66
Class-Specific Segmentation
A global shape prior for graph-cut-based segmentation (OBJ CUT, CVPR '05).
67
Conclusions and Future Work
We have presented a method for unsupervised learning of a generative model from videos, and demonstrated applications to object recognition and segmentation. The method needs to be extended to handle the various visual aspects of an object.