Inference in generative models of images and video John Winn MSR Cambridge May 2004.


1 Inference in generative models of images and video John Winn MSR Cambridge May 2004

2 Overview: Generative vs. conditional models; Combined approach; Inference in the flexible sprite model; Extending the model

3 Generative vs. conditional models. We have an image I and latent variables H which we wish to infer, e.g. object position, orientation, class. There will also be other sources of variability, e.g. illumination, parameterised by θ. Generative model: P(H, θ, I). Conditional model: P(H, θ|I) or P(H|I).
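The two kinds of model are linked by the standard rules of probability. As a brief worked relation (assuming, purely for illustration, that H is discrete and θ is continuous; the slides do not say which):

```latex
\begin{align}
P(H, \theta \mid I) &= \frac{P(H, \theta, I)}{P(I)},
\qquad P(I) = \sum_{H} \int P(H, \theta, I)\, d\theta, \\
P(H \mid I) &= \int P(H, \theta \mid I)\, d\theta .
\end{align}
```

A generative model therefore determines both conditional forms in principle, at the cost of the sum and integral over all latent variables; a conditional model parameterises the left-hand side directly.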

4 Conditional models use features. Features are functions of I which aim to be informative about H but invariant to θ. Examples: edge features, corner features, blob features.

5 Conditional models. Using features f(I), train a conditional model, e.g. using labelled data. Example: Viola & Jones face detection using rectangle features and AdaBoost.
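To make "rectangle features" concrete, here is a minimal sketch of a two-rectangle feature computed from an integral image, the device usually associated with Viola & Jones. The function names and the plain list-of-lists image format are assumptions for illustration, and the AdaBoost training stage is not shown.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Right half minus left half of a rectangle: responds to vertical edges."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum


# Tiny image with a dark left half and a bright right half.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
ii = integral_image(image)
feature_value = two_rect_feature(ii, 0, 0, 2, 4)
```

Each feature costs a fixed number of table lookups regardless of rectangle size, which is one reason inference with such features is fast.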

6 Conditional models. Advantages: simple (only model variables of interest); inference is fast (due to use of features and simple model). Disadvantages: non-robust; difficult to compare different models; difficult to combine different models.

7 Generative models. A generative model defines a process of generating the image pixels I from the latent variables H and θ, giving a joint distribution over all variables: P(H, θ, I). Learning and inference are carried out using standard machine learning techniques, e.g. Expectation Maximisation, MCMC, variational methods. No features!

8 Generative models Example: image modeled as layers of ‘flexible’ sprites.

9 Generative models. Advantages: accurate, as the entire image is modeled; can compare different models; can combine different models; can generate new images. Disadvantages: inference is difficult due to local minima; inference is slower due to complex model; limitations on model complexity.

10 Combined approach. Use a generative model, but speed up inference using proposal distributions given by a conditional model. A proposal R(X) suggests a new distribution over some of the latent variables X ⊆ {H, θ}. Inference is extended to allow accepting or rejecting the proposal, e.g. depending on whether it improves the model evidence.
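One way to read the accept/reject step is as a comparison of a lower bound on the log evidence before and after adopting the proposal. The sketch below assumes a single discrete latent variable, a known table of log joint probabilities, and a variational-style bound; the function names are hypothetical and the slides do not commit to this exact criterion.

```python
import math


def evidence_lower_bound(q, log_joint):
    """sum_t q(t) * (log p(x, t) - log q(t)), a lower bound on log p(x)."""
    return sum(qt * (lj - math.log(qt))
               for qt, lj in zip(q, log_joint) if qt > 0)


def consider_proposal(q, r, log_joint):
    """Adopt proposal r in place of q only if it raises the bound."""
    if evidence_lower_bound(r, log_joint) > evidence_lower_bound(q, log_joint):
        return r, True
    return q, False


# Toy joint p(x, t) over three values of t; the evidence p(x) is 0.1.
log_joint = [math.log(p) for p in (0.01, 0.06, 0.03)]

uniform = [1 / 3, 1 / 3, 1 / 3]
good_proposal = [0.1, 0.6, 0.3]      # equals the exact posterior
bad_proposal = [0.98, 0.01, 0.01]

q, accepted_good = consider_proposal(uniform, good_proposal, log_joint)
q, accepted_bad = consider_proposal(q, bad_proposal, log_joint)
```

Because a rejected proposal leaves the current distribution untouched, a poor conditional model costs only the time spent evaluating it.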

11 Using proposals in an MCMC framework. Panels: proposals for text and faces; accepted proposals. From Tu et al., 2003. Generative model: textured regions combined with face and text models. Conditional model: face and text detector using AdaBoost (Viola & Jones).

12 Using proposals in an MCMC framework. Panels: proposals for text and faces; reconstructed image. From Tu et al., 2003. Generative model: textured regions combined with face and text models. Conditional model: face and text detector using AdaBoost (Viola & Jones).

13 Proposals in the flexible sprite model

14 Flexible sprite model x Set of images e.g. frames from a video

15 Flexible sprite model x

16 π f x Sprite shape and appearance

17 Flexible sprite model π m f T x Sprite transform for this image (discretised) Transformed mask instance for this image

18 Flexible sprite model π m f b T x Background
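The layered model built up over these slides can be read as a sampling procedure. The following is a minimal sketch only, assuming that T is a pure translation, that each mask pixel is drawn independently from the transformed shape π, and that pixel noise is Gaussian; the slides do not spell out these details, and all function and parameter names are hypothetical.

```python
import random


def translate(grid, dy, dx, fill=0.0):
    """Shift a 2D grid by (dy, dx), filling uncovered cells with `fill`."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = grid[src_r][src_c]
    return out


def sample_image(pi, f, b, shift, noise_std=0.0, rng=random):
    """Sample a mask instance m and an image x from sprite (pi, f) and background b."""
    dy, dx = shift
    pi_t = translate(pi, dy, dx)   # transformed shape (mask probabilities)
    f_t = translate(f, dy, dx)     # transformed appearance
    h, w = len(b), len(b[0])
    # Mask instance for this image: 1 where the sprite is visible.
    m = [[1 if rng.random() < pi_t[r][c] else 0 for c in range(w)]
         for r in range(h)]
    x = []
    for r in range(h):
        row = []
        for c in range(w):
            clean = f_t[r][c] if m[r][c] else b[r][c]
            noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
            row.append(clean + noise)
        x.append(row)
    return m, x


# One-pixel sprite at the top-left corner, moved to (1, 1) over a flat background.
pi = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
f = [[9.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [[2.0] * 3 for _ in range(3)]
m, x = sample_image(pi, f, b, shift=(1, 1))
```

Inference runs this process in reverse: given only the images x, recover π, f, b and the per-image m and T.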

19 Inference method & problems. Apply variational inference with factorised Q distribution. Slow, since we have to search the entire discrete transform space. Limited size of transform space, e.g. translations only (160 × 120). Many local minima.
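The cost of the exhaustive search is easy to see in a stripped-down example. This sketch scores every translation of a known 1D sprite against a signal and normalises the scores into a distribution Q(T). It assumes a uniform prior over T, Gaussian noise, a fixed sprite and a known constant background, which is far simpler than the full factorised update; the names are hypothetical.

```python
import math


def transform_posterior(image, sprite, background, noise_var=1.0):
    """Q(T = t) for every shift t, found by scoring each shift in turn."""
    n, k = len(image), len(sprite)
    log_scores = []
    for t in range(n - k + 1):            # every candidate translation
        sse = 0.0
        for i in range(n):                # every pixel, for every translation
            predicted = sprite[i - t] if t <= i < t + k else background
            sse += (image[i] - predicted) ** 2
        log_scores.append(-0.5 * sse / noise_var)
    top = max(log_scores)                 # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


image = [0.0, 0.0, 5.0, 7.0, 0.0, 0.0]
q_t = transform_posterior(image, sprite=[5.0, 7.0], background=0.0)
best_shift = max(range(len(q_t)), key=q_t.__getitem__)

# Size of a translations-only transform space for a 160 x 120 image.
num_transforms = 160 * 120
```

The inner loop touches every pixel for every transform, so the work grows with the product of image size and transform-space size; for 160 × 120 translations that is 19,200 candidate transforms per image.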

20 Proposals in the flexible sprite model (diagram nodes: π, m, T, proposal). We wish to create a proposal R(T). We cannot use features of the image directly until the object appearance has been found, so we use features of the inferred mask.

21 Moment-based features. Use the first and second moments of the inferred mask as features. Learn a proposal distribution R(T). Can also use R to get a probabilistic bound on T. Figure labels: true location; centre of gravity (C-of-G) of mask; contour of proposal distribution over object location.
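The moment features themselves are simple to compute. The sketch below returns the centre of gravity (first moments) and the covariance (second central moments) of a soft or binary mask; the function name and return layout are assumptions, and the learned mapping from these features to R(T) is not shown because the slides do not describe it.

```python
def mask_moments(mask):
    """Return ((mean_y, mean_x), (var_y, cov_yx, var_x)) of a 2D mask."""
    total = sum(sum(row) for row in mask)
    if total <= 0:
        raise ValueError("mask has no mass")
    # First moments: centre of gravity of the mask.
    mean_y = sum(y * v for y, row in enumerate(mask) for v in row) / total
    mean_x = sum(x * v for row in mask for x, v in enumerate(row)) / total
    # Second central moments: spread of the mask around its centre.
    var_y = cov_yx = var_x = 0.0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            dy, dx = y - mean_y, x - mean_x
            var_y += v * dy * dy
            cov_yx += v * dy * dx
            var_x += v * dx * dx
    return (mean_y, mean_x), (var_y / total, cov_yx / total, var_x / total)


mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
centre, spread = mask_moments(mask)
```

Since the features depend only on the inferred mask, they can be computed before the object appearance is known.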

22 Iteration #1

23 Iteration #2

24 Iteration #3

25 Iteration #4

26 Iteration #5

27 Iteration #6

28 Iteration #7

29 Results on scissors video. On average, ~1% of transform space searched. Always converges, independent of initialisation. Panels: original; reconstruction; foreground only.

30 Beyond translation

31 Extended transform space. Panels: original; reconstruction.

32 Extended transform space. Panels: original; reconstruction.

33 Extended transform space. Panels: normalised video; learned sprite appearance.

34 Corner features. Panels: learned sprite appearance; masked normalised image.

35 Corner feature proposals

36 Preliminary results

37 Future directions

38 Extensions to the generative model. Very wide range of possible extensions: local appearance model, e.g. patch-based; multiple layered objects; object classes; illumination modelling; incorporation of object-specific models, e.g. faces; articulated models.

39 Further investigation of using proposals. Investigate other bottom-up features, including: optical flow; color/texture; use of standard invariant features, e.g. SIFT; discriminative models for particular object classes, e.g. faces, text.

40 π m f b T x N

