The Science of Silly Walks Hedvig Sidenbladh Michael J. Black Department of Computer Science Brown University Royal Inst.


1 The Science of Silly Walks Hedvig Sidenbladh Michael J. Black http://www.cs.brown.edu/~black Department of Computer Science Brown University Royal Inst. of Technology, KTH Stockholm Sweden http://www.nada.kth.se/~hedvig

2 Collaborators David Fleet, Xerox PARC Nancy Pollard, Brown University Dirk Ormoneit and Trevor Hastie Dept. of Statistics, Stanford University Allan Jepson, University of Toronto

3 The (Silly) Problem

4 Inferring 3D Human Motion * No special clothing * Monocular, grayscale, sequences (archival data) * Unknown, cluttered, environment * Incremental estimation * Infer 3D human motion from 2D image properties.

5 Why is it Hard? Low contrast Self occlusion Singularities in viewing direction Unusual viewpoints

6 Clothing and Lighting

7 Large Motions Limbs move rapidly with respect to their width. Non-linear dynamics. Motion blur.

8 Ambiguities Where is the leg? Which leg is in front?

9 Ambiguities Accidental alignment

10 Ambiguities Whose legs are whose? Occlusion

11 Inference/Issues Bayesian formulation: p(model | cues) = p(cues | model) p(model) / p(cues) 1. Need a constraining likelihood model that is also invariant to variations in human appearance. 2. Need a prior model of how people move. 3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.
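
The Bayesian formulation on this slide can be illustrated with a toy discrete case. This is only a sketch: the two candidate poses, their priors, and the cue likelihoods are made-up numbers, not values from the talk.

```python
def posterior(prior, likelihood):
    """Return p(model | cues) for a discrete set of candidate models.

    prior:      dict mapping model -> p(model)
    likelihood: dict mapping model -> p(cues | model)
    """
    # Unnormalized posterior: p(cues | model) * p(model)
    joint = {m: likelihood[m] * prior[m] for m in prior}
    # p(cues) is the sum over all models (the normalizing constant)
    evidence = sum(joint.values())
    return {m: v / evidence for m, v in joint.items()}


# Hypothetical numbers for the "which leg is in front?" ambiguity.
prior = {"left_leg_front": 0.5, "right_leg_front": 0.5}
likelihood = {"left_leg_front": 0.08, "right_leg_front": 0.02}
post = posterior(prior, likelihood)
```

With equal priors the posterior simply follows the likelihoods; a stronger motion prior would shift it toward the pose that continues the previous motion.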

12 Simple Body Model * Limbs are truncated cones * Parameter vector of joint angles and angular velocities = 

13 Key Idea #1 (Likelihood) 1. Use the 3D model to predict the location of limb boundaries (not necessarily features) in the scene. 2. Compute various filter responses steered to the predicted orientation of the limb. 3. Compute likelihood of filter responses using a statistical model learned from examples.

14 Example Training Images

15 Edge Filters Normalized derivatives of Gaussians (Lindeberg, Granlund and Knutsson, Perona, Freeman & Adelson, …) Edge filter response steered to limb orientation: Filter responses steered to arm orientation.

16 Distribution of Edge Filter Responses p_on(F), p_off(F). Likelihood ratio, p_on / p_off, used for edge detection (Geman & Jedynak; Konishi, Yuille, & Coughlan). Object-specific statistics.
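
A rough sketch of how a likelihood ratio of this kind could be learned from example filter responses. The histogram binning, the add-one smoothing, and the training values are assumptions made for illustration; the slides do not specify them.

```python
import math


def bin_index(f, n_bins, lo, hi):
    """Map a filter response f to a histogram bin, clamping out-of-range values."""
    width = (hi - lo) / n_bins
    return min(n_bins - 1, max(0, int((f - lo) / width)))


def learn_histogram(responses, n_bins=8, lo=-1.0, hi=1.0):
    """Estimate a discrete distribution over filter responses from examples."""
    counts = [1.0] * n_bins  # add-one smoothing so no bin has zero probability
    for f in responses:
        counts[bin_index(f, n_bins, lo, hi)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def log_likelihood_ratio(f, p_on, p_off, lo=-1.0, hi=1.0):
    """log(p_on(F) / p_off(F)) for a single filter response."""
    idx = bin_index(f, len(p_on), lo, hi)
    return math.log(p_on[idx] / p_off[idx])


# Hypothetical training responses: strong on limb edges, weak on background.
on_edge = [0.6, 0.7, 0.8, 0.9, 0.75]
off_edge = [0.0, 0.05, -0.1, 0.1, -0.05, 0.02]
p_on = learn_histogram(on_edge)
p_off = learn_histogram(off_edge)

strong = log_likelihood_ratio(0.8, p_on, p_off)  # positive: looks like a limb edge
weak = log_likelihood_ratio(0.0, p_on, p_off)    # negative: looks like background
```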

17 Other Cues I(x, t) I(x+u, t+1) Ridges Motion

18 Key Idea #2 (Likelihood) “Explain” the entire image. p(image | foreground, background) Generic, unknown, background Foreground person Foreground should explain what the background can’t.

19 Likelihood Steered edge filter responses. Crude assumption: filter responses independent across scale.

20 Learning Human Motion * constrain the posterior to likely & valid poses/motions * model the variability (Plot: joint angles vs. time.) 3D motion-capture data. * Database with multiple actors and a variety of motions. (from M. Gleicher)

21 Key Idea #3 (Prior) Problem: * insufficient data to learn probabilistic model of human motion. Alternative: * the data represents all we know * replace representation and learning with search. (search has to be fast) * De Bonet & Viola, Efros & Leung, Efros & Freeman, Pasztor & Freeman, Hertzmann et al, … Efros & Freeman’01

22 Implicit Empirical Distribution Off-line: learn a low-dimensional model of every n-frame sequence of joint angles and angular velocities (Leventon & Freeman, Ormoneit et al, …) project training data onto model to get small number of coefficients describing each time instant build a tree structured representation

23 “Textural” Model On-line: Given an n-frame input motion project onto low-dimensional model. index in log time using the coefficients. return the best k approximate matches (and form a “proposal” distribution). sample from them and return the n+1 st pose.
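
The on-line lookup described above might be sketched as follows. Several things here are assumptions: a linear scan stands in for the tree-structured index (which is what gives the log-time lookup on the slide), the distance-based proposal weights are one plausible choice, and the database entries are invented.

```python
import math
import random


def k_best_matches(query, database, k):
    """Return the k entries whose coefficients are closest to the query.

    database: list of (coefficients, next_pose) pairs, where the coefficients
    describe an n-frame motion in the low-dimensional model.
    """
    return sorted(database, key=lambda entry: math.dist(query, entry[0]))[:k]


def sample_next_pose(query, database, k, rng):
    """Form a proposal distribution from the k best matches and sample from it."""
    matches = k_best_matches(query, database, k)
    # Assumed weighting: closer matches are proposed more often.
    weights = [math.exp(-math.dist(query, coeffs)) for coeffs, _ in matches]
    poses = [pose for _, pose in matches]
    return rng.choices(poses, weights=weights, k=1)[0]


# Hypothetical database: 2-D coefficients paired with the pose that followed.
database = [
    ((0.0, 0.0), (10.0, 20.0)),
    ((1.0, 1.0), (15.0, 25.0)),
    ((5.0, 5.0), (40.0, 60.0)),
    ((0.1, 0.0), (11.0, 21.0)),
]
best_two = k_best_matches((0.0, 0.0), database, 2)
next_pose = sample_next_pose((0.0, 0.0), database, 2, random.Random(0))
```

Sampling from the matches, rather than always returning the single nearest one, is what lets the model generate varied continuations of the same input motion.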

24 Synthetic Walker * Colors indicate different training sequences.

25 Synthetic Swing Dancer

26 Bayesian Formulation Posterior over model parameters given an image sequence. Likelihood of observing the image given the model parameters Temporal model (prior) Posterior from previous time instant

27 Key Idea #4 (Ambiguity) Samples from a distribution over 3D poses. * Represent a multi-modal posterior probability distribution over model parameters - sampled representation - each sample is a pose and its probability - predict over time using a particle filtering approach.

28 Particle Filter Posterior -> (sample) -> Temporal dynamics -> (sample) -> Likelihood -> (normalize) -> Posterior

29 What does the posterior look like? Shoulder: 3dof Elbow: 1dof Elbow bends

30 Stochastic 3D Tracking * 2500 samples, multiple cues. Preliminary result

31 Conclusions Inferring human motion, silly or not, from video is challenging. We have tackled three important parts of the problem: 1. Probabilistically modeling human appearance in a generic, yet useful, way. 2. Representing the range of possible motions using techniques from texture modeling. 3. Dealing with ambiguities and non-linearities using particle filtering for Bayesian inference.

32 Learned Walking Model * mean walker

33 Learned Walking Model * sample with small 

34 Learned Walking Model * sample with moderate 

35 Learned Walking Model * sample with very large  (Silly-Walk Generator)

36 Preliminary result Tracking with Occlusion 1500 samples, ~2 minutes/frame.

37 Preliminary result Moving Camera 1500 samples, ~2 minutes/frame.

38 Ongoing and Future Work Hybrid Monte Carlo tracker (Choo and Fleet ’01) * analytic, differentiable likelihood. Learned dynamics. Correlation across scale. Estimate background motion. Statistical models of color and texture. Automatic initialization. Training data and likelihood models to be made available on the web.

39 Lessons Learned * Probabilistic (Bayesian) framework allows - integration of information over time - modeling of priors * Particle filtering allows - multi-modal distributions - tracking with ambiguities and non-linear models * Learning image statistics and combining cues improves robustness and reduces computation

40 Outlook 5 years: - Relatively reliable people tracking in monocular video. - Path is pretty clear. … solve the vision problem. Next step: Beyond person-centric - people interacting with object/world Beyond that: Recognizing action - goals, intentions,... … solve the AI problem.

41 Conclusions * Generic, learned, model of appearance. Combines multiple cues. * Exploits work on image statistics. * Use the 3D model to predict features. * Principled way to choose filters. * Model of foreground and background is incorporated into the tracking framework: it exploits the ratio between foreground and background likelihood and improves tracking.

42 Motion Blur

43 Requirements 1. Represent uncertainty and multiple hypotheses. 2. Model non-linear dynamics of the body. 3. Exploit image cues in a robust fashion. 4. Integrate information over time. 5. Combine multiple image cues.

44 What Image Cues? Pixels? Temporal differences? Background differences? Edges? Color? Silhouettes?

45 Brightness Constancy I(x, t+1) = I(x+u, t) +  Image motion of foreground as a function of the 3D motion of the body. Problem: no fixed model of appearance (drift).

46 Changing background Low contrast limb boundaries Occlusion Varying shadows Deforming clothing What do people look like? What do non-people look like?

47 Edges as a Cue? Probabilistic model? Under/over-segmentation, thresholds, …

48 Contrast Normalization? Lee, Mumford & Huang

49 Contrast Normalization Maximize difference between distributions * e.g. Bhattacharyya distance:
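
The formula from this slide did not survive in the transcript. As a sketch, the standard Bhattacharyya distance between two discrete distributions p and q is -ln(sum_i sqrt(p_i * q_i)); whether the slide used exactly this form is an assumption, and the example histograms are invented.

```python
import math


def bhattacharyya_distance(p, q):
    """Standard Bhattacharyya distance between two discrete distributions."""
    # Bhattacharyya coefficient: 1 for identical distributions, 0 for disjoint ones.
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # no overlap at all
    return -math.log(coefficient)


# Hypothetical on-edge and off-edge response histograms.
p_on = [0.1, 0.1, 0.2, 0.6]
p_off = [0.6, 0.2, 0.1, 0.1]
separation = bhattacharyya_distance(p_on, p_off)
same = bhattacharyya_distance(p_on, p_on)
disjoint = bhattacharyya_distance([1.0, 0.0], [0.0, 1.0])
```

A larger distance means the on and off distributions overlap less, so a normalization that increases it should make the likelihood ratio more discriminative.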

50 Local Contrast Normalization

51 Ridge Features Scale specific

52 Ridge Thigh Statistics

53 Brightness Constancy What are the statistics of brightness variation I(x, t) - I(x+u, t+1)? Variation due to clothing, self shadowing, etc. I(x, t) I(x+u, t+1)

54 Brightness Constancy Scale 4 Scale 0

55 Edges

56 Temporal Model: Smooth Motion * individual angles and velocities assumed independent

57 Particle Filtering * large literature (Gordon et al ‘93, Isard & Blake ‘96,…) * non-Gaussian posterior approximated by N discrete samples * explicitly represent the ambiguities * exploit stochastic sampling for tracking

58 Representing the Posterior represented by discrete set of N samples Normalized likelihood:

59 Condensation 1. Selection Sample from posterior at t-1 Most probable states selected most often. 2. Prediction. 3. Updating

60 states p 1. Selection 2. Prediction/Diffusion (sample from ) Models the dynamics: 3. Updating Condensation

61 Condensation 1. Selection 2. Prediction 3. Updating (the distribution) Evaluate new likelihood. Repeat until N new samples have been generated. Compute normalized probability distribution.
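
The three Condensation steps can be sketched for a single scalar state. This is a toy illustration under stated assumptions, not the tracker from the talk: the state is one joint angle, the dynamics are a Gaussian random walk, and the likelihood is a Gaussian around a made-up observation.

```python
import math
import random


def condensation_step(samples, weights, dynamics, likelihood, rng):
    """One Condensation iteration over a weighted sample set."""
    n = len(samples)
    # 1. Selection: draw N states from the posterior at t-1, so the most
    #    probable states are selected most often.
    selected = rng.choices(samples, weights=weights, k=n)
    # 2. Prediction: push each selected state through the stochastic dynamics.
    predicted = [dynamics(s, rng) for s in selected]
    # 3. Updating: evaluate the new likelihood and normalize the weights.
    raw = [likelihood(s) for s in predicted]
    total = sum(raw)
    return predicted, [w / total for w in raw]


def expected_value(samples, weights):
    """Weighted mean of the sample set (the quantity displayed as the result)."""
    return sum(s * w for s, w in zip(samples, weights))


# Toy setup (all values hypothetical): a single joint angle in radians.
rng = random.Random(0)
observed_angle = 1.0


def dynamics(angle, rng):
    return angle + rng.gauss(0.0, 0.1)


def likelihood(angle):
    return math.exp(-0.5 * ((angle - observed_angle) / 0.2) ** 2)


samples = [rng.uniform(-2.0, 2.0) for _ in range(200)]
weights = [1.0 / len(samples)] * len(samples)
for _ in range(5):
    samples, weights = condensation_step(samples, weights, dynamics, likelihood, rng)
estimate = expected_value(samples, weights)
```

Because the sample set is never collapsed to a single estimate, several competing hypotheses can survive from frame to frame until the image evidence separates them.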

62 Temporal Model: Walking Parameters of the generative model are now Probabilistic model for

63 No likelihood * how strong is the walking prior? (or is our likelihood doing anything?)

64 Other Related Work J. Sullivan, A. Blake, M. Isard, and J. MacCormick. Object localization by Bayesian correlation. ICCV’99. J. Sullivan, A. Blake, and J. Rittscher. Statistical foreground modelling for object localisation. ECCV, 2000. J. Rittscher, J. Kato, S. Joga, and A. Blake. A Probabilistic Background Model for Tracking. ECCV, 2000. S. Wachter and H. Nagel. Tracking of persons in monocular image sequences. CVIU, 74(3), 1999.

65 What does the posterior look like? Shoulder: 3dof Elbow: 1dof Elbow bends

66 Statistics of Limbs How do people appear in natural scenes? Want a general model. Edge Filters Ridge Filters

67 Other Related Work * Bregler & Malik: image motion, single hypothesis, full-body required multiple cameras, scaled ortho. * Ju, Black, Yacoob: cardboard person model, image motion, 2D * Deutscher et al: Condensation, edge cues, background subtraction. * Cham & Rehg: known templates, 2D (SPM), particle filter. * Wachter & Nagel: nicely combines motion and edges, single hypothesis (Kalman filter). * Leventon & Freeman: assumes 2D tracking, probabilistic formulation, learned temporal model (full body, monocular, articulated)

68 Open Questions Representation of human motions * model the range of human activity * constrain the estimation to plausible motions Representation of human appearance * (somewhat) invariant to the variation in human appearance * specific enough to constrain the estimation

69 Likelihood Foreground pixels Background pixels

70 Overview * Why is 3D human motion important? * Why is recovering it hard? * A Bayesian approach * generative model * robust likelihood function * temporal prior model (learning) * stochastic search (particle filtering) * Where are we going? * Recent advances & state of the art. * What remains to be done?

71 Problems A simple articulated human model may have 30+ parameters (e.g. joint angles; 60+ with velocities). Models of human action are non-linear and likelihood models will be multi-modal. Key challenges (common to other domains): representation, learning, and search in high dimensional spaces.

72 Bayesian Formulation Represent a distribution over 3D poses. * define generative model of image appearance * multi-modal posterior over model parameters - sampled representation - particle filtering approach. * focus on image motion as a cue (adding edges,…)

73 Generative Model: Temporal * general smooth motion or, * action-specific motion (walking) First order Markov assumption on angles, , and angular velocity, V: Explore two models of human motion

74 Arm Tracking: Smooth motion prior Particle filter * represents ambiguity * propagates information over time Display: expected value of joint angles.

75 Learning Temporal Models * Motion capture data is noisy, data is missing, activities are performed differently. * For cyclic motion (important but special class): 1. Detect cycles and segment 2. Account for missing data 3. Preserve continuity of cycles 4. Statistical model of variation * Approaches should generalize to non-cyclic motion. (Dirk Ormoneit & Trevor Hastie)

76 Detecting Cycles Automatically detect length of cycles, Automatically segment and align cycles.

77 Modeling Cyclic Motion Automatically align 3D data with a reference curve represented using periodically constrained regression splines.

78 Modeling Cyclic Motion * Iterative SVD method (from gene expression work) * computes SVD in Fourier domain * construct a rank-q approximation and take inverse Fourier transform * impute missing data from the approximation * repeat until convergence. * Segment into cycles, compute mean curve and represent variation by performing PCA on data. * SVD must enforce periodicity and cope with missing data.

79 Issues * Large parameter space * approx. 10000 samples * sparsely represented * not real time * Flow-based models can drift * Requires initialization

80 Conclusions Bayesian formulation for tracking 3D human figures using monocular image information. * Generative model of image appearance. * Non-linear model represents ambiguities, singularities, occlusion, etc. - sampled representation of posterior. * Particle filtering for incremental estimation. * Automatic learning of cyclic motion prior. Rich framework for modeling the complexity of human motion.

81 Initialization Using 2D Model * Full-body walking model. * Constructed from 3D mocap data. * 2D, view-based (every 30 degrees) * 4 subjects, 14 cycles

82 2D, View-Based Walker * Construct linear optical flow basis * Use similar Bayesian framework for tracking (Black CVPR’99) * Coarse estimate of 3D parameters * Automatic initialization Example Bases:... 0 degrees 90 degrees

83 Recent Results * Box indicates mean position and scale. * Recovers distribution over phase and 3D scale.

84 Motion Converged: Dense optical flow. Open questions: appearance change, textural motion. Converging: Human motion. Faces. Here we focus on full-body.

85 Truth in Advertising Not about realistic models for synthesizing * faces * clothing * skin * hair Focus on generic models of appearance for human motion capture.

86 Graphics to the Rescue? Hodgins and Pollard ‘97 How big is the parameter space of all possible appearances? Accurately synthesize appearance?

87 Human Appearance

88 Likelihood * To cope with occluded limbs or those viewed at narrow angles, we introduce a probability of occlusion. * likelihood of observing limb j is then * likelihood of the model is product of limb likelihoods

89 Generative Model: Motion t-1 t

90 Learned Walking Model * sample with large 

91 Temporal Model: Walking Parameters of the generative model are now Probabilistic model for

92 Common Assumptions * Multiple Cameras (additional constraints, occlusion) * Color Images (locate face and hands) * Known Background (background subtraction to locate person) * Batch process an entire sequence. * Known Initialization (to be avoided)

93 Ratios for different limbs

94 Modeling Appearance What do people look like? What do non-people look like? How can we model appearance in a way that captures the variability across people, clothing, lighting, pose, …?

95 Ridge Filters Relationship between limb diameter in image and scale of maximum ridge filter response.

96 Ridges

97 Brightness Constancy Correct position at t Incorrect position at t Vary position at t+1

98 1. Selection 2. Prediction/Diffusion (sample from ) ie from the temporal prior: 1. Compute 2. Sample from 3. Sample from 3. Updating Condensation

99 Visualizing Results Expected value of state parameter

100 Why is it hard? Geometrically under-constrained.

101 Vigil Calculare Watchful computation.

102 Tiny People

103 Why is it Important? Applications Human-Computer Interaction Surveillance Motion capture (games and animation) Video search/annotation Work practice analysis. Social display of puzzlement * detect moving regions * estimate motion * model articulated objects * model temporal patterns of activity * interpret the motion

104 Why is it Hard? The appearance of people can vary dramatically. Bones and joints are unobservable (muscle, skin, clothing hide the underlying structure). (inference)

105 Why is it hard? People can appear in arbitrary poses. They can deform in complex ways. Occlusion results in ambiguities and multiple interpretations.

106 Other Problems * geometrically under-constrained * non-linear dynamics of limbs * similarity of appearance of different limbs (matching ambiguities) * image noise * outliers Our models are approximations. Image changes that are not modeled (e.g. clothing deformation) will be outliers.

107 Bregler and Malik ‘98 State of the Art. * Brightness constancy cue insensitive to appearance * Full-body required multiple cameras. * Single hypothesis. MAP estimate

108 Cham and Rehg ‘99 State of the Art. * Single camera, multiple hypotheses. * 2D templates (solves drift but is view dependent) I(x, t) = I(x+u, 0) + 

109 Deutscher, North, Bascle, & Blake ‘99 State of the Art. * Multiple cameras * Simplified, clothing, lighting and background.

110 Sidenbladh, Black, & Fleet ‘00 * Monocular. Brightness constancy as the only cue. * Significant changes in view and depth. * Template-based methods will fail. State of the Art.

111 Bayesian Inference Exploit cues in the images. Learn likelihood models: p(image cue | model) Build models of human form and motion. Learn priors over model parameters: p(model) Represent the posterior distribution: p(model | cue) ∝ p(cue | model) p(model)

112 Natural Image Statistics Ruderman. Lee, Mumford, Huang. Portilla and Simoncelli. Olshausen & Field. Xu, Wu, & Mumford. … * Statistics of image derivatives are non-Gaussian. * Consistent across scale.

113 Statistics of Edges Statistics of filter responses, F, on edges, p_on(F), differ from background statistics, p_off(F). Likelihood ratio, p_on / p_off, can be used for edge detection and road following (Geman & Jedynak; Konishi, Yuille, & Coughlan). What about the object-specific statistics of limbs? * edge may be present or not.

114 Distribution of Edge Filter Responses

115 Likelihood Foreground pixels Background pixels

116 Action-Specific Model The joint angles at time t are a linear combination of the basis motions evaluated at the phase. Mean curve. Basis curves.
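
A minimal sketch of the action-specific model for a single joint angle. The equation itself is missing from the transcript, so the form used here (mean curve plus a weighted sum of basis curves, all evaluated at the same phase) is an assumption based on the slide text; the curve samples and coefficients are invented, and linear interpolation is a simplification.

```python
def eval_cyclic(curve, phase):
    """Linearly interpolate a curve sampled uniformly over one cycle.

    phase is measured in cycles, so phase and phase + 1 give the same value.
    """
    n = len(curve)
    x = (phase % 1.0) * n
    i = int(x) % n
    frac = x - int(x)
    return (1.0 - frac) * curve[i] + frac * curve[(i + 1) % n]


def joint_angle(phase, mean_curve, basis_curves, coeffs):
    """Mean curve plus a linear combination of basis curves at the given phase."""
    value = eval_cyclic(mean_curve, phase)
    for c, basis in zip(coeffs, basis_curves):
        value += c * eval_cyclic(basis, phase)
    return value


# Hypothetical curves, each sampled at four points of the walking cycle.
mean_curve = [0.0, 1.0, 0.0, -1.0]
basis_curves = [[1.0, 1.0, 1.0, 1.0]]

mean_walker = joint_angle(0.25, mean_curve, basis_curves, [0.0])
varied_walker = joint_angle(0.25, mean_curve, basis_curves, [0.5])
next_cycle = joint_angle(1.25, mean_curve, basis_curves, [0.5])
```

Setting all coefficients to zero gives the mean walker; drawing larger coefficients moves the motion further from the mean, which corresponds to the increasingly silly samples in the Learned Walking Model slides.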

