Learning the Appearance and Motion of People in Video. Hedvig Sidenbladh and Michael J. Black. February 2002.


1 Michael J. Black, February 2002
Learning the Appearance and Motion of People in Video (The Science of Silly Walks)
Hedvig Sidenbladh, Defense Research Institute, Stockholm, Sweden, http://www.nada.kth.se/~hedvig
Michael J. Black, Department of Computer Science, Brown University, http://www.cs.brown.edu/~black

2 Collaborators
* David Fleet, Xerox PARC
* Nancy Pollard, Brown University
* Dirk Ormoneit and Trevor Hastie, Dept. of Statistics, Stanford University
* Allan Jepson, University of Toronto

3 The (Silly) Problem
Unsolved without manual intervention.

4 Inferring 3D Human Motion
* No special clothing
* Monocular, grayscale sequences (archival data)
* Unknown, cluttered environment
* Incremental estimation
* Infer 3D human motion from 2D image properties.

5 Why is it Hard?
* Low contrast
* Self occlusion
* Singularities in viewing direction
* Unusual viewpoints
* Ambiguous matches

6 Clothing and Lighting

7 Large Motions
Limbs move rapidly with respect to their width. Non-linear dynamics. Motion blur.

8 Ambiguities
Where is the leg? Which leg is in front?

9 Ambiguities
Accidental alignment

10 Ambiguities
Whose legs are whose? Occlusion

11 Requirements
1. Represent uncertainty and multiple hypotheses.
2. Model non-linear dynamics of the body.
3. Exploit image cues in a robust fashion.
4. Integrate information over time.
5. Combine multiple image cues.

12 Simple Body Model
* Limbs are truncated cones
* Parameter vector of joint angles and angular velocities

13 Inference/Issues
Bayesian formulation: p(model | cues) = p(cues | model) p(model) / p(cues)
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.
3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.
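As a minimal sketch of the Bayesian formulation, the snippet below applies Bayes' rule to a small discrete set of candidate models. The function name `posterior`, the two hypothetical poses, and the numbers are illustrative assumptions and do not come from the talk, which works with a continuous, high-dimensional pose space.

```python
def posterior(likelihoods, priors):
    """Bayes' rule over a discrete set of candidate models.

    likelihoods[i] = p(cues | model_i), priors[i] = p(model_i).
    The sum in the denominator plays the role of p(cues).
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(cues)
    if evidence == 0.0:
        raise ValueError("no hypothesis explains the cues")
    return [j / evidence for j in joint]


# Two hypothetical poses: "left leg in front" vs. "right leg in front".
post = posterior(likelihoods=[0.30, 0.10], priors=[0.5, 0.5])
```

With equal priors the posterior simply follows the likelihoods, which is why the likelihood model (issue 1) and the prior (issue 2) are treated as separate design problems.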

14 What Image Cues?
Pixels? Temporal differences? Background differences? Edges? Color? Silhouettes? Optical flow?

15 Brightness Constancy
I(x, t+1) = I(x+u, t) + η, where η is a noise term.
Image motion of foreground as a function of the 3D motion of the body.
Problem: no fixed model of appearance (drift).
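To make the brightness constancy relation concrete, here is a small sketch that computes the residual I(x, t+1) - I(x+u, t) at a few pixels. It assumes integer pixel positions and a single integer displacement u shared by all points. The function name and the toy frames are illustrative assumptions. In the talk, u would instead be predicted from the 3D body motion.

```python
def brightness_residual(frame_t, frame_t1, points, flow):
    """Residuals I(x, t+1) - I(x+u, t) for integer pixel positions.

    frame_t, frame_t1: 2D lists of gray values indexed [row][col].
    points: list of (row, col) positions x.
    flow: one integer displacement u = (d_row, d_col), assumed constant
          over all points for simplicity.
    """
    d_row, d_col = flow
    residuals = []
    for r, c in points:
        residuals.append(frame_t1[r][c] - frame_t[r + d_row][c + d_col])
    return residuals


# Toy one-row frames: frame_t1 is frame_t shifted by one column.
frame_t = [[10, 20, 30, 40]]
frame_t1 = [[20, 30, 40, 40]]
points = [(0, 0), (0, 1), (0, 2)]
good = brightness_residual(frame_t, frame_t1, points, flow=(0, 1))
bad = brightness_residual(frame_t, frame_t1, points, flow=(0, 0))
```

The correct displacement gives zero residuals, while a wrong one leaves large residuals. This difference is what a likelihood based on brightness constancy rewards.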

16 State of the Art: Bregler and Malik ‘98
* Brightness constancy cue, insensitive to appearance
* Full-body tracking required multiple cameras.
* Single hypothesis (MAP estimate).

17 State of the Art: Cham and Rehg ‘99
* Single camera, multiple hypotheses.
* 2D templates (solves drift but is view dependent): I(x, t) = I(x+u, 0) + η

18 Edges as a Cue?
Probabilistic model? Under/over-segmentation, thresholds, …

19 State of the Art: Deutscher, North, Bascle, & Blake ‘99
* Multiple cameras
* Simplified clothing, lighting and background.

20 Changing background; low contrast limb boundaries; occlusion; varying shadows; deforming clothing. What do people look like? What do non-people look like?

21 Key Idea #1 (Rigorous Likelihood)
1. Use the 3D model to predict the location of limb boundaries (not necessarily features) in the scene.
2. Compute various filter responses steered to the predicted orientation of the limb.
3. Compute likelihood of filter responses using a statistical model learned from examples.

22 Natural Image Statistics
Ruderman; Lee, Mumford, & Huang; Portilla and Simoncelli; Olshausen & Field; Xu, Wu, & Mumford; …
* Statistics of image derivatives are non-Gaussian.
* Consistent across scale.

23 Statistics of Edges
Statistics of filter responses, F, on edges, p_on(F), differ from background statistics, p_off(F). The likelihood ratio, p_on / p_off, can be used for edge detection and road following (Geman & Jedynak; Konishi, Yuille, & Coughlan).
What about the object-specific statistics of limbs?
* An edge may be present or not.
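The likelihood ratio idea can be sketched with two learned histograms. The snippet below is an illustration under stated assumptions: the bin count, the small probability floor `eps`, and the training responses are all made up, and the helper names are hypothetical rather than taken from the talk.

```python
import math


def histogram(samples, n_bins, lo, hi, eps=1e-6):
    """Normalized histogram with a small floor so no bin has zero mass."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in samples:
        idx = min(n_bins - 1, max(0, int((s - lo) / width)))
        counts[idx] += 1
    total = float(len(samples))
    probs = [c / total + eps for c in counts]
    norm = sum(probs)
    return [p / norm for p in probs]


def log_likelihood_ratio(response, p_on, p_off, lo, hi):
    """log(p_on(F) / p_off(F)) for one filter response F."""
    n_bins = len(p_on)
    width = (hi - lo) / n_bins
    idx = min(n_bins - 1, max(0, int((response - lo) / width)))
    return math.log(p_on[idx] / p_off[idx])


# Made-up training responses: mostly strong on edges, mostly weak off edges.
on_samples = [0.8, 0.9, 0.7, 0.85, 0.2]
off_samples = [0.1, 0.05, 0.2, 0.15, 0.7]
p_on = histogram(on_samples, n_bins=4, lo=0.0, hi=1.0)
p_off = histogram(off_samples, n_bins=4, lo=0.0, hi=1.0)
strong = log_likelihood_ratio(0.9, p_on, p_off, 0.0, 1.0)
weak = log_likelihood_ratio(0.1, p_on, p_off, 0.0, 1.0)
```

A positive log ratio says the response is better explained by the "on edge" distribution, and a negative one favors the background.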

24 Object-Specific Statistics

25 Edge Filters
Normalized derivatives of Gaussians (Lindeberg; Granlund and Knutsson; Perona; Freeman & Adelson; …).
Edge filter response steered to limb orientation.
Figure: filter responses steered to arm orientation.
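Steering a first-derivative filter amounts to combining the horizontal and vertical derivatives according to the limb orientation. The sketch below is an assumption-laden illustration: it swaps the Gaussian derivatives for plain central differences, omits contrast normalization, and uses one common sign convention, f(θ) = sin θ · f_x − cos θ · f_y, which may differ from the one on the original slide.

```python
import math


def steered_edge_response(image, row, col, theta):
    """Derivative across a limb whose axis has image orientation theta.

    Uses central differences in place of Gaussian derivatives (a
    simplification) and the steering rule
        f(theta) = sin(theta) * f_x - cos(theta) * f_y,
    i.e. the derivative along the normal (sin theta, -cos theta).
    """
    f_x = (image[row][col + 1] - image[row][col - 1]) / 2.0  # d/dx (columns)
    f_y = (image[row + 1][col] - image[row - 1][col]) / 2.0  # d/dy (rows)
    return math.sin(theta) * f_x - math.cos(theta) * f_y


# A vertical step edge: dark on the left, bright on the right.
image = [[0, 0, 10, 10] for _ in range(3)]
across = steered_edge_response(image, 1, 1, math.pi / 2)  # limb axis vertical
along = steered_edge_response(image, 1, 1, 0.0)           # limb axis horizontal
```

The response is large when the hypothesized limb axis runs along the edge and vanishes when the axis is perpendicular to it. This contrast is what makes the steered response informative about pose.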

26 Distribution of Edge Filter Responses
Figure panels: p_on(F), p_off(F)

27 Contrast Normalization?
Lee, Mumford & Huang

28 Contrast Normalization
Maximize difference between distributions
* e.g. Bhattacharyya distance
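The formula from this slide is not preserved in the transcript. As a sketch, the snippet below uses the standard textbook definition of the Bathacharyya distance for two normalized histograms, written here with its usual spelling as Bhattacharyya distance: the negative log of the sum of sqrt(p_i q_i). The example histograms are made up.

```python
import math


def bhattacharyya_distance(p, q):
    """-ln(sum_i sqrt(p_i * q_i)) for two normalized histograms."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # the histograms do not overlap at all
    return -math.log(coefficient)


same = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])
apart = bhattacharyya_distance([0.9, 0.1], [0.1, 0.9])
```

Choosing the normalization parameters that maximize this distance between p_on and p_off makes the two distributions as distinguishable as possible.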

29 Local Contrast Normalization

30 Ridge Features
Scale specific

31 Ridge Filters
Relationship between limb diameter in image and scale of maximum ridge filter response.

32 Ridge Thigh Statistics

33 Brightness Constancy
What are the statistics of brightness variation I(x, t) - I(x+u, t+1)? Variation due to clothing, self shadowing, etc.
Figure panels: I(x, t), I(x+u, t+1)

34 Brightness Constancy
Well fit by a t-distribution or Cauchy distribution (heavy tails); related to robust statistics.
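The link between heavy tails and robustness can be seen by comparing how strongly each distribution penalizes a large brightness difference. This is a sketch with assumed values: the scale of 2.0 and the two residuals are arbitrary and not taken from the talk.

```python
import math


def gaussian_log_pdf(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def cauchy_log_pdf(x, scale):
    """Log density of a zero-centered Cauchy distribution."""
    return -math.log(math.pi * scale * (1.0 + (x / scale) ** 2))


# Drop in log-likelihood when the residual grows from 1 to 20 gray levels.
small, large = 1.0, 20.0
gauss_penalty = gaussian_log_pdf(small, 2.0) - gaussian_log_pdf(large, 2.0)
cauchy_penalty = cauchy_log_pdf(small, 2.0) - cauchy_log_pdf(large, 2.0)
```

The Gaussian penalty grows quadratically with the residual while the Cauchy penalty grows only logarithmically, so a few outlying pixels (for example from self shadowing) cannot dominate the likelihood.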

35 Key Idea #2 (Explain the Image)
p(image | foreground, background)
Background: generic, unknown. Foreground: person.
Foreground should explain what the background can’t.
See also MacCormick and Isard, ICCV’01.

36 Likelihood
Steered edge filter responses.
Crude assumption: filter responses independent across scale.

37 Inference/Issues
Bayesian formulation: p(model | cues) = p(cues | model) p(model) / p(cues)
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.

38 Learning Human Motion
* Constrain the posterior to likely & valid poses/motions
* Model the variability
3D motion-capture data (plot: joint angles over time).
* Database with multiple actors and a variety of motions (from M. Gleicher).

39 Key Idea #3 (Trade learning for search)
Problem:
* Insufficient data to learn a prior probabilistic model of human motion.
Alternative:
* The data represents all we know.
* Replace representation and learning with search (challenge: search has to be fast).

40 Texture Synthesis
Figure (Efros & Freeman ’01): “database” and synthetic texture.
* De Bonet & Viola, Efros & Leung, Efros & Freeman, Pasztor & Freeman, Hertzmann et al., …
* Image(s) as an implicit probabilistic model.

41 Implicit Probabilistic Model
Key idea: probabilistic search (log time) of this tree approximates sampling from p(stored sequence | generated sequence).

42 Synthesis
* Colors indicate different training sequences.
* For graphics, we need
  - editability, constraints (ground contact, pose, interpenetration), key frames, style, …

43 Tracking
* Efficiently generate samples (image data will sort out which are good).
* Temperature parameter controls randomness of tree search.
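The role of a temperature parameter can be illustrated with a much simpler stand-in for the tree search. The sketch below scans a tiny made-up database linearly and samples a match with softmax weights exp(-distance / temperature). This swapped-in linear scan is an assumption for illustration only and not the log-time tree search described in the talk. The function name and data are hypothetical.

```python
import math
import random


def sample_match(query, database, temperature, rng):
    """Pick the index of a stored motion snippet resembling `query`.

    Linear scan with softmax weights exp(-distance / temperature); this
    stands in for the (faster) probabilistic tree search on the slides.
    """
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(query, entry)))
        for entry in database
    ]
    closest = min(distances)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - closest) / temperature) for d in distances]
    return rng.choices(range(len(database)), weights=weights, k=1)[0]


rng = random.Random(0)
database = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]  # made-up joint-angle vectors
cold = [sample_match([0.0, 0.0], database, 1e-3, rng) for _ in range(200)]
hot = [sample_match([0.0, 0.0], database, 10.0, rng) for _ in range(200)]
```

At a low temperature the search behaves like a nearest-neighbor lookup. At a high temperature it spreads samples over less similar motions, leaving the image likelihood to decide which ones survive.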

44 Bayesian Formulation
* Posterior over model parameters given an image sequence.
* Likelihood of observing the image given the model parameters.
* Temporal model (prior).
* Posterior from previous time instant.
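The equation these labels annotate is not preserved in the transcript. The four terms match the standard recursive Bayesian filtering equation, sketched here with assumed symbols: φ_t for the model parameters at time t, I_t for the image at time t, \vec{I}_t for the image sequence up to time t, and κ for a normalizing constant.

```latex
\[
\underbrace{p(\phi_t \mid \vec{I}_t)}_{\text{posterior}}
  = \kappa \,
    \underbrace{p(I_t \mid \phi_t)}_{\text{likelihood}}
    \int
    \underbrace{p(\phi_t \mid \phi_{t-1})}_{\text{temporal model (prior)}} \,
    \underbrace{p(\phi_{t-1} \mid \vec{I}_{t-1})}_{\text{previous posterior}}
    \, d\phi_{t-1}
\]
```

This form assumes first-order Markov dynamics and that each image depends only on the current model parameters.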

45 What does the posterior look like?
Shoulder: 3 dof. Elbow: 1 dof. Elbow bends.

46 Inference/Issues
Bayesian formulation: p(model | cues) = p(cues | model) p(model) / p(cues)
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.
3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.

47 Key Idea #4 (Represent Ambiguity)
Samples from a distribution over 3D poses.
* Represent a multi-modal posterior probability distribution over model parameters
  - sampled representation
  - each sample is a pose and its probability
  - predict over time using a particle filtering approach.

48 Particle Filtering
* Large literature (Gordon et al. ‘93, Isard & Blake ‘96, …)
* Non-Gaussian posterior approximated by N discrete samples
* Explicitly represent the ambiguities
* Exploit stochastic sampling for tracking

49 Particle Filter
Diagram: posterior → sample → temporal dynamics → sample → likelihood → normalize → posterior
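One cycle of a particle filter (sample from the posterior, sample from the temporal dynamics, weight by the likelihood, normalize) can be sketched on a toy problem. The example below tracks a single scalar state with a random-walk motion model and a Gaussian likelihood. All of these choices, along with the noise values and function name, are illustrative assumptions, and the tracker in the talk uses a 3D body model with learned likelihoods instead.

```python
import math
import random


def particle_filter_step(particles, weights, observation, rng,
                         motion_noise=0.5, obs_noise=1.0):
    """One sample / predict / weight / normalize cycle for a 1D state."""
    n = len(particles)
    # 1. Sample from the previous posterior (resampling by weight).
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Sample from the temporal dynamics (here: a random walk).
    predicted = [p + rng.gauss(0.0, motion_noise) for p in resampled]
    # 3. Weight each sample by the likelihood of the observation.
    new_weights = [
        math.exp(-0.5 * ((observation - p) / obs_noise) ** 2)
        for p in predicted
    ]
    # 4. Normalize so the weights form a discrete posterior.
    total = sum(new_weights)
    if total == 0.0:
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]
    return predicted, new_weights


rng = random.Random(1)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for observation in [2.0, 2.1, 1.9, 2.0]:
    particles, weights = particle_filter_step(particles, weights, observation, rng)
estimate = sum(p * w for p, w in zip(particles, weights))
```

Because the posterior is carried as a weighted sample set, several distinct hypotheses can survive from frame to frame until the image evidence resolves the ambiguity.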

50 Particle Filter
Isard & Blake ‘96

51 Tracking with Occlusion
1500 samples, ~2 minutes/frame.

52 Moving Camera
1500 samples, ~2 minutes/frame.

53 Stochastic 3D Tracking
* 2500 samples (now down as low as 300 with the new prior).

54 Conclusions
Inferring human motion, silly or not, from video is challenging. We have tackled three important parts of the problem:
1. Probabilistically modeling human appearance in a generic, yet useful, way.
2. Representing the range of possible motions using techniques from texture modeling.
3. Dealing with ambiguities and non-linearities using particle filtering for Bayesian inference.

55 Ongoing and Future Work
* Better search algorithms
  - Hybrid Monte Carlo tracker (Choo and Fleet ’01)
  - Covariance scaled sampling (Sminchisescu & Triggs ’01)
* Richer prior models of motion.
* Estimate background motion.
* Statistical models of color and texture.
* Automatic initialization.
* Training data and likelihood models to be made available on the web.

