Learning Layered Motion Segmentations of Video


1 Learning Layered Motion Segmentations of Video
University of Oxford. M. Pawan Kumar, Philip Torr, Andrew Zisserman.

2 Aim
Given a video, to learn a model for the object (input: video; output: model). The model should (ideally) describe the object completely and accurately, handle self-occlusion, and be learnt in an unsupervised manner.

3 Motivation
Object Recognition and Segmentation. Current object recognition methods often learn a model manually, either by hand-labelling the position of parts or by manually segmenting training images (Leibe and Schiele, DAGM ‘04; Borenstein and Ullman, ECCV ‘02).

4 Motivation
Problem: such ‘supervised’ methods are labour-intensive and practically infeasible. Solution: use readily available data such as videos, and automatically learn models which can be used to perform object recognition.

5 Challenges
Articulation. Self-occlusion. Lighting: c(y) = diag(a) c(x) + b. Motion blur: c(y) = ∫ c(y - m(t)) dt.
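
The two formulas on this slide can be illustrated with a minimal sketch. It assumes a single row of RGB pixels, integer pixel motion, and a discrete average in place of the integral, with indices clamped at the borders. The function names `apply_lighting` and `apply_motion_blur` are illustrative and do not come from the presentation.

```python
def apply_lighting(colour, a, b):
    """Lighting model c(y) = diag(a) c(x) + b, applied per RGB channel."""
    return tuple(a_k * c_k + b_k for a_k, c_k, b_k in zip(a, colour, b))


def apply_motion_blur(row, offsets):
    """Discrete stand-in for the motion-blur integral (an assumption).

    Each output pixel is the mean of the input pixels found at
    position - m for every sampled integer motion offset m.
    Indices are clamped to the row borders.
    """
    n = len(row)
    blurred = []
    for y in range(n):
        samples = [row[min(max(y - m, 0), n - 1)] for m in offsets]
        blurred.append(tuple(
            sum(s[k] for s in samples) / len(samples) for k in range(3)
        ))
    return blurred


# Hypothetical values, chosen only to exercise the two functions.
lit = apply_lighting((100.0, 50.0, 10.0), a=(1.1, 1.0, 0.5), b=(5.0, 0.0, 2.0))
row = [(0.0, 0.0, 0.0), (90.0, 90.0, 90.0), (0.0, 0.0, 0.0)]
blurred = apply_motion_blur(row, offsets=[0, 1])
```

With offsets 0 and 1, the bright middle pixel is smeared over itself and its right-hand neighbour, which is the qualitative effect the motion parameters m are meant to capture.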

6 Using a Generative Model
Parameters Θ: segments (mattes + appearance); layering; transformations Tt; lighting parameters a and b; motion parameters m, obtained using Tt-1 and Tt. Latent image per segment per frame.

7 Learning the Model
Given a video D, we need to learn all model parameters Θ: segments (mattes + appearance), layering, transformations, and lighting and motion blur parameters. We define the posterior Pr(Θ | D), which measures how well the generated frames match the observed data. We learn the ‘best’ model by maximizing Pr(Θ | D).

8 Previous Work
Sprite-based approach: Jojic and Frey, ICCV ’01; Williams and Titsias, Neural Computation ‘04. Restricted to translation and rotation. Greedy optimisation. Spatial continuity not considered. Motion blur and lighting not handled.

9 Outline
Model Description. Learning the Model: Initial Estimate, Refining Mattes, Updating Appearance, Refining Transformation. Results.

10 Model Description
Layered representation. Mattes of segments are represented as binary masks. Appearance of a part: an RGB value per point. Transformation T: translation, rotation and anisotropic scale factors.
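
As a rough illustration of such a transformation, the sketch below maps a 2D point using translation, rotation and anisotropic scale. The order of composition (scale, then rotate, then translate) and the name `transform_point` are assumptions made for this example; the slide does not specify them.

```python
import math


def transform_point(point, tx, ty, angle, sx, sy):
    """Apply anisotropic scale, then rotation, then translation.

    The composition order is an assumption for illustration.
    `angle` is in radians; (sx, sy) are the per-axis scale factors.
    """
    x, y = point
    xs, ys = sx * x, sy * y                          # anisotropic scale
    xr = xs * math.cos(angle) - ys * math.sin(angle)  # rotation
    yr = xs * math.sin(angle) + ys * math.cos(angle)
    return (xr + tx, yr + ty)                         # translation


# Hypothetical parameters: stretch x by 2, rotate 90 degrees, shift by (2, 3).
moved = transform_point((1.0, 0.0), tx=2.0, ty=3.0,
                        angle=math.pi / 2, sx=2.0, sy=1.0)
```

Here the point (1, 0) is stretched to (2, 0), rotated onto the y axis, and shifted, so it lands at approximately (2, 5).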

11 Layering
Layer number li for segment pi. For non-overlapping segments, li = lj. Figure label: li > lj.

12 Layering
Layer number li for segment pi. For non-overlapping segments, li = lj. Figure label: li < lj.

13 Energy of the model
Pr(Θ | D) ∝ Pr(D | Θ) Pr(Θ). Energy = -log(Pr(D | Θ)). Maximizing Pr(Θ | D) implies minimizing the energy. Energy = Appearance + Boundary.
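
The link between maximizing a probability and minimizing its negative logarithm can be checked with a few lines of code. This is a minimal sketch: the three candidate models and their probabilities are made-up numbers, and `energy` is a hypothetical helper name.

```python
import math


def energy(probability):
    """Energy as the negative log of a probability."""
    return -math.log(probability)


# Hypothetical probabilities for three candidate models.
candidates = {"model_a": 0.02, "model_b": 0.30, "model_c": 0.11}

best_by_probability = max(candidates, key=candidates.get)
best_by_energy = min(candidates, key=lambda name: energy(candidates[name]))
```

Because -log is strictly decreasing, both selections always agree; working with the energy turns products of probabilities into sums, which is the form that graph-cut methods expect.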

14 Appearance
Appearance measures the consistency of observed and generated RGB values over the entire video sequence. [Figure: generated frames minus observed frames, summed to give Appearance.]

15 Boundary
Boundary gives preference to parts that are separated by edges in most frames. For points x and y: if the intensities of x and y are similar, the penalty on the energy is more; if they are different, the penalty is less.
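
One common way to obtain this behaviour is a contrast-sensitive penalty that decays with the squared intensity difference. The sketch below assumes that form, with an exponential fall-off and a hypothetical `sigma` parameter; it is not necessarily the exact term used in the presentation.

```python
import math


def boundary_penalty(intensity_x, intensity_y, same_part, sigma=10.0):
    """Penalty for placing a part boundary between points x and y.

    Assumed form: exp(-(Ix - Iy)^2 / (2 sigma^2)) when x and y lie in
    different parts, and no penalty when they lie in the same part.
    """
    if same_part:
        return 0.0
    diff = intensity_x - intensity_y
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))


# Hypothetical intensities: a flat region versus a strong edge.
flat_region = boundary_penalty(100.0, 102.0, same_part=False)
strong_edge = boundary_penalty(100.0, 180.0, same_part=False)
```

Cutting through a flat region costs close to 1, while cutting along a strong edge costs almost nothing, so minimizing the energy pulls part boundaries towards image edges.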

16 Our Approach
1) An initial estimate of the model is obtained by dividing the scene into rigidly moving components.
2) Mattes are optimised using graph cuts.
3) Appearance parameters are updated.
4) Transformation, lighting and motion blur parameters are re-estimated.

17 Outline
Model Description. Learning the Model: Initial Estimate, Refining Mattes, Updating Appearance, Refining Transformation. Results.

18 1. Initial Estimate
Divide frame n into rectangular patches fi (e.g. 3x3). Track the patches to reconstruct frame n+1.

19 Tracking Patches
MRF over the patches of frame n (sites n1, n2, n3, ..., nj, nk, ...). The label of patch fk is its transformation tk into frame n+1, and each candidate tk has a unary term, here 0.6.

20 Tracking Patches
A second candidate transformation tk for patch fk, with unary term 0.9.

21 Tracking Patches
A third candidate transformation tk for patch fk, with unary term 0.7.

22 Tracking Patches
For sites nj and nk, the pairwise term for (tj, tk) is d1 if the two transformations describe a rigid motion.

23 Tracking Patches
Otherwise, the pairwise term for (tj, tk) is d2.

24 Tracking Patches
Pr(t) is proportional to the product of the unary terms over ti and the pairwise terms over (ti, tj). Inference uses belief propagation. Time complexity: speed-up using distance transforms (Felzenszwalb and Huttenlocher, NIPS 2004). Memory requirements: coarse-to-fine strategy (Vogiatzis et al., BMVC 2004), with multiple coarse labels chosen instead of only the best one.
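
The patch-tracking model can be sketched on a toy problem. The example below is a simplification under stated assumptions: the patches form a chain rather than a grid (so max-product message passing is exact, whereas a grid would need loopy belief propagation), labels are integer translations, and "rigid motion" is approximated by two neighbours sharing the same translation. The values of `d1`, `d2` and the unary terms are made up.

```python
def pairwise(t_j, t_k, d1=1.0, d2=0.2):
    """d1 if both patches share one translation (rigid motion), else d2."""
    return d1 if t_j == t_k else d2


def map_labels_on_chain(unary):
    """Max-product message passing on a chain of patches.

    unary[i][t] is the unary term of label t at site i.
    Returns the labelling that maximizes the product of all
    unary and pairwise terms.
    """
    labels = list(unary[0].keys())
    # best[i][t]: best score of any labelling of sites 0..i ending in t.
    best = [dict(unary[0])]
    back = []
    for i in range(1, len(unary)):
        scores, pointers = {}, {}
        for t in labels:
            prev = max(labels, key=lambda s: best[-1][s] * pairwise(s, t))
            scores[t] = best[-1][prev] * pairwise(prev, t) * unary[i][t]
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final label.
    result = [max(labels, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        result.append(pointers[result[-1]])
    return list(reversed(result))


# Hypothetical unary terms for three patches and three translations.
unary = [
    {(0, 0): 0.6, (1, 0): 0.9, (2, 0): 0.7},
    {(0, 0): 0.5, (1, 0): 0.6, (2, 0): 0.4},
    {(0, 0): 0.3, (1, 0): 0.5, (2, 0): 0.9},
]
tracked = map_labels_on_chain(unary)
```

The third patch on its own prefers translation (2, 0), but the pairwise term rewards agreeing with its neighbours, so all three patches end up with (1, 0).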

25 Coarse-to-fine Strategy
Original MRF over sites nj and nk, in which many labels are similar.

26 Coarse-to-fine Strategy
Group similar labels into one representative label. The unary term of a representative label Ti is the maximum of the unary terms of the labels tj in its group.

27 Coarse-to-fine Strategy
The pairwise term for (Ti, Tj) is the maximum over k, l of the pairwise terms for (tk, tl). Solve the ‘coarser’ MRF using belief propagation.

28 Coarse-to-fine Strategy
Choose the ‘m’ best representative labels per site.

29 Coarse-to-fine Strategy
Expand the chosen labels to obtain a ‘smaller’ MRF.
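
The coarsening and expansion steps for a single site can be sketched as follows. All function names are hypothetical. One loud simplification: the slides rank representative labels after solving the coarse MRF with belief propagation, whereas this sketch ranks them by their coarse unary term alone, as a stand-in for the beliefs.

```python
def coarsen(unary, groups):
    """Unary term of each representative = max over the labels it groups.

    unary: label -> unary term; groups: representative -> list of labels.
    """
    return {rep: max(unary[t] for t in members)
            for rep, members in groups.items()}


def coarse_pairwise(pairwise, group_i, group_j):
    """Pairwise term of two representatives = max over their members."""
    return max(pairwise(tk, tl) for tk in group_i for tl in group_j)


def expand_best(coarse_scores, groups, m):
    """Keep the m best representatives and expand them to fine labels.

    Assumption: coarse_scores would normally be beliefs from the coarse
    MRF; here any per-representative score can be passed in.
    """
    best = sorted(coarse_scores, key=coarse_scores.get, reverse=True)[:m]
    return [t for rep in best for t in groups[rep]]


# Hypothetical fine labels 0..5 grouped in pairs.
unary = {0: 0.1, 1: 0.3, 2: 0.8, 3: 0.7, 4: 0.2, 5: 0.4}
groups = {"A": [0, 1], "B": [2, 3], "C": [4, 5]}
coarse = coarsen(unary, groups)
kept_labels = expand_best(coarse, groups, m=2)
```

Keeping m representatives instead of one reduces the risk that the true fine label is discarded because its group looked slightly worse at the coarse level.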

30 Tracking Patches

31 Initial Estimate
Cluster rigidly moving points to obtain components. [Figure: frame n, frame n+1, components.]

32 Initial Estimate
Cluster components based on appearance (cross-correlation). The smallest member of a cluster is a segment. [Figure: components, segments.]
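
A similarity score of this kind can be sketched as below. The slide only says "cross-correlation"; the zero-mean normalized variant used here, and the representation of a component as a flat list of intensities of equal length, are assumptions for illustration.

```python
import math


def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-length patches.

    Returns a value in [-1, 1]; 0.0 is returned if either patch is
    constant, since the score is undefined in that case.
    """
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [v - mean_a for v in patch_a]
    db = [v - mean_b for v in patch_b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


# Hypothetical patches: the second is a brighter copy of the first.
similar = normalized_cross_correlation([1.0, 2.0, 3.0, 4.0],
                                       [2.0, 4.0, 6.0, 8.0])
opposite = normalized_cross_correlation([1.0, 2.0, 3.0, 4.0],
                                        [4.0, 3.0, 2.0, 1.0])
```

Normalizing by the mean and variance makes the score insensitive to a gain and offset in brightness, so two components with the same texture can still be clustered together under different lighting.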

33 Object is not described completely; layering is not determined
We need to refine this estimate by minimizing the energy. Re-label surrounding points using consistency of motion and consistency of texture. The form of the energy suggests using graph cuts.

34 Graph Cuts
Consider the case of two segments, ph and pt. [Figure: graph with terminals ph and pt and point nodes x1, x2, x3, ..., xj, xk, ..., xn.] Edge weights W(xi, pj) carry the appearance component; edge weights W(xj, xk) carry the boundary component. Form of the energy function; examples of functions that can and cannot be minimized.

35 Graph Cuts
[Figure: the same graph, showing terminal edges W(x1, ph) and W(xn, pt) and the neighbour edge W(xj, xk).]

36 Graph Cuts
The energy is of the form Σ D(fX) + Σ V(fX, fY). V is called regular if V(0,0) + V(1,1) <= V(0,1) + V(1,0). For LPS, V is regular. Theorem: if V is regular, then the minimum cut minimizes the energy (Kolmogorov and Zabih, PAMI ‘04).
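
The regularity condition is easy to test for any binary pairwise term. The sketch below checks it for two illustrative functions; `potts` and `anti_potts` are example terms chosen for this note, not terms taken from the presentation.

```python
def is_regular(v):
    """Check V(0,0) + V(1,1) <= V(0,1) + V(1,0) for a binary term V."""
    return v(0, 0) + v(1, 1) <= v(0, 1) + v(1, 0)


def potts(f_x, f_y, weight=1.0):
    """Penalize neighbours that take different labels (regular)."""
    return 0.0 if f_x == f_y else weight


def anti_potts(f_x, f_y):
    """Penalize neighbours that take the same label (not regular)."""
    return 1.0 if f_x == f_y else 0.0


potts_ok = is_regular(potts)
anti_potts_ok = is_regular(anti_potts)
```

A term that rewards neighbouring points for sharing a label passes the check, while one that rewards disagreement fails it and falls outside the theorem quoted on the slide.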

37 Multi-way Graph Cuts
Each cut assigns the labels pi and ~pi to points in the binary matte of segment pi. Number of cuts = number of parts. Ideally, all cuts must be found simultaneously, but this is an NP-hard problem. Instead, use the αβ-swap / α-expansion algorithms.

38 αβ-swap
One pair of parts is considered at a time. All other parts are kept fixed. Points belonging to one part can be re-labelled as the other part. [Figure labels: Relabel, Fixed.]

39 α-expansion
Iteratively find graph cuts. A cut corresponding to one part is considered at a time. All other parts are kept fixed. Theorem: α-expansion finds a strong local minimum. [Figure labels: Refine, Fixed.]
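
The structure of α-expansion can be sketched on a tiny problem. One important substitution, stated plainly: each expansion move below is found by exhaustive search over all subsets of points, not by a minimum cut, so the code is exponential in the number of points and only suitable for toy inputs. The data costs, edges and Potts weight are made-up values.

```python
from itertools import product


def total_energy(labels, data_cost, edges, smooth_cost):
    """Sum of per-point data costs and pairwise costs over the edges."""
    unary = sum(data_cost[i][l] for i, l in enumerate(labels))
    pairwise = sum(smooth_cost(labels[i], labels[j]) for i, j in edges)
    return unary + pairwise


def alpha_expansion(labels, label_set, data_cost, edges, smooth_cost):
    """Repeat expansion moves until no label alpha lowers the energy.

    In one move, every point either keeps its label or switches to
    alpha. Exhaustive search stands in for the graph cut here.
    """
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for alpha in label_set:
            best = labels
            best_energy = total_energy(labels, data_cost, edges, smooth_cost)
            for switch in product([False, True], repeat=len(labels)):
                candidate = [alpha if s else l for s, l in zip(switch, labels)]
                e = total_energy(candidate, data_cost, edges, smooth_cost)
                if e < best_energy:
                    best, best_energy = candidate, e
            if best != labels:
                labels, improved = best, True
    return labels


# Hypothetical problem: four points on a chain, three parts (0, 1, 2).
data_cost = [[0, 5, 5], [4, 1, 5], [5, 0, 5], [5, 5, 0]]
edges = [(0, 1), (1, 2), (2, 3)]
potts = lambda a, b: 0 if a == b else 2
final_labels = alpha_expansion([0, 0, 0, 0], [0, 1, 2], data_cost, edges, potts)
final_energy = total_energy(final_labels, data_cost, edges, potts)
```

Starting from all points in part 0, the expansion for part 1 claims the two middle points, and the expansion for part 2 then claims the last point, after which no further move lowers the energy.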

40 Outline
Model Description. Learning the Model: Initial Estimate, Refining Mattes, Updating Appearance, Refining Transformation. Results.

41 2. Refining Mattes
Consider one segment at a time, along with its neighbouring segments. [Figure: segment to be refined and neighbouring segments.]

42 to 43 2. Refining Mattes
Apply αβ-swap to the segment to be refined and its neighbouring segments.

44 to 46 2. Refining Mattes
Apply α-expansion to the segment to be refined, giving the refined segment. Iterate over segments until the energy cannot be reduced further.

47 to 56 Mattes
[Figure sequence: mattes for Frame 1 and Frame 30 as the number of iterations increases, up to 9.]

57 Outline
Model Description. Learning the Model: Initial Estimate, Refining Mattes, Updating Appearance, Refining Transformation. Results.

58 3. Updating Appearance and 4. Refining Transformations
3. Updating Appearance: the appearance of a point is the mean of the RGB values of all visible points it projects onto. 4. Refining Transformations: transformations around the initial estimate are explored, and the transformation resulting in the least SSD (sum of squared differences) is chosen.
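
The search in step 4 can be sketched as a small local search. This example is restricted to integer translations of a grayscale patch, whereas the model's transformations also include rotation and scale; the names `ssd` and `refine_translation`, the `radius` parameter and the pixel values are assumptions for illustration.

```python
def ssd(patch, frame, dx, dy):
    """Sum of squared differences between the patch and the frame region
    whose top-left corner is at column dx, row dy."""
    total = 0.0
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            diff = value - frame[y + dy][x + dx]
            total += diff * diff
    return total


def refine_translation(patch, frame, initial, radius=1):
    """Explore translations within `radius` of the initial estimate and
    return the one with the least SSD, together with its cost."""
    x0, y0 = initial
    height, width = len(patch), len(patch[0])
    best, best_cost = None, float("inf")
    for dy in range(y0 - radius, y0 + radius + 1):
        for dx in range(x0 - radius, x0 + radius + 1):
            # Skip translations that push the patch outside the frame.
            if (dx < 0 or dy < 0 or dy + height > len(frame)
                    or dx + width > len(frame[0])):
                continue
            cost = ssd(patch, frame, dx, dy)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Hypothetical 4x4 frame containing the 2x2 patch at column 2, row 1.
frame = [
    [0, 0, 0, 0],
    [0, 0, 9, 8],
    [0, 0, 7, 6],
    [0, 0, 0, 0],
]
patch = [[9, 8], [7, 6]]
best_translation, best_cost = refine_translation(patch, frame, initial=(1, 1))
```

The initial estimate (1, 1) is off by one column; the local search finds the exact match at (2, 1) with zero SSD.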

60 Outline
Model Description. Learning the Model: Initial Estimate, Refining Mattes, Updating Appearance, Refining Transformation. Results.

61 Results

62 Results – Complex Motion

63 Results – Poor Quality Video

64 Applications
The learnt model is used for several applications: motion segmentation, object recognition, and object category specific segmentation.

65 Object Recognition
Matching the model to still images, using multiple shape exemplars and texture examples. See: Extending Pictorial Structures for Object Recognition, BMVC ‘04.

66 Class-Specific Segmentation
Global shape prior for graph cut based segmentation. See: OBJ CUT, CVPR ‘05.

67 Conclusions and Future Work
We have presented a method for unsupervised learning of a generative model from videos. Applications for object recognition and segmentation are demonstrated. Method needs to be extended to handle various visual aspects.

