1 Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions Taipeng Tian, Rui Li and Stan Sclaroff Computer Science Dept. Boston University.

2 Introduction Motivating application: gesture recognition with a fixed gesture lexicon. For example: aircraft signaler hand gestures, traffic controller hand signals, basketball referee hand signals.

3 Pose Estimation Problem Definition Input (Observation): Silhouette (Alt Moments). Output: 2D Projected Marker Positions.

4 Related Work: Pose Estimation from a Single Image Geometry Based –Taylor CVIU ’01 –Barron & Kakadiaris IVC ’01 –Parameswaran & Chellappa CVPR ’04 Learning Based –Rosales & Sclaroff HUMO ’00 –Agarwal & Triggs CVPR ’04 Others –Lee & Cohen CVPR ’04 –Shakhnarovich, Viola, Darrell ICCV ’03 –Mori, Ren, Efros and Malik CVPR ’04 –Many more …

5 Idea 1: Learning Mappings Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01] Relevance Vector Regression [Agarwal and Triggs CVPR ’04] Image Features → Pose

6 Idea 1: Learning Mappings Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01] Relevance Vector Regression [Agarwal and Triggs CVPR ’04] Image Features → Pose

7 Idea 2: Exploring the Solution Space Simulated Annealing [Deutscher et al. CVPR ’00] Markov Chain Monte Carlo [Lee and Cohen CVPR ’04] etc. …

8 Idea 2: Exploring the Solution Space Simulated Annealing [Deutscher et al. CVPR ’00] Markov Chain Monte Carlo [Lee and Cohen CVPR ’04] etc. … Uses an accurate model, typically with high DOF. Explores the pose space for a solution consistent with the observations. Difficult for high DOF. Computationally intensive.

9 Key Observations We have a constrained set of poses, so it is not necessary to explore the full parameter space. Combine the two ideas: –Learn mappings –Explore a constrained space (i.e., a learned model of body poses). Examples of constrained pose sets: aircraft signaler hand gestures, traffic controller hand signals, basketball referee hand signals.

10 Overview of Framework Learning Phase: learn the rendering function Φ(·); learn a model of human body poses (Y: training data, X: latent space). Pose Inference Phase: input silhouette → output pose.

11 Learning a Model of Human Poses The Gaussian Process Latent Variable Model (GPLVM) [Neil Lawrence NIPS ’04] is used. GPLVM was originally used for visualizing high-dimensional data. Grochow et al. (SIGGRAPH ’04) use it to solve the inverse kinematics problem for human motion animation. Here we use it for automated articulated body pose inference.

12 Gaussian Process Latent Variable Model (GPLVM) Overview Higher-dimensional space ↔ lower-dimensional / latent space, related by a probabilistic mapping.

13 GPLVM Training: Learning a Model of Body Poses Given: a training set of 2D projected marker positions {y_i} (each y_i is D-dimensional). Goal: learn the parameters: –the corresponding latent variable value x_i for each training data point –the variables related to the kernel.

14 Kernel Function Also known as the covariance function. Measures the similarity of the latent variables x and x’. For a data set of size N, we form an N × N kernel matrix K, in which K_{i,j} = k(x_i, x_j).
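The kernel formula itself did not survive in this transcript. As a minimal sketch, assuming the RBF kernel with additive noise that is commonly used with GPLVM, k(x, x’) = α exp(−γ/2 ||x − x’||²) + δ_{x,x’} β⁻¹, the kernel matrix can be built as follows. The function names and the default values of `alpha`, `gamma`, and `beta` are illustrative assumptions, not values from the talk.

```python
import numpy as np

def rbf_kernel(X1, X2, alpha=1.0, gamma=1.0):
    """Assumed RBF kernel: alpha * exp(-gamma/2 * ||x - x'||^2) for all row pairs."""
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    # Clip tiny negative values caused by floating-point round-off
    return alpha * np.exp(-0.5 * gamma * np.maximum(sq, 0.0))

def kernel_matrix(X, alpha=1.0, gamma=1.0, beta=100.0):
    """N x N matrix with K[i, j] = k(x_i, x_j), plus noise 1/beta on the diagonal."""
    return rbf_kernel(X, X, alpha, gamma) + np.eye(len(X)) / beta

# Five hypothetical 2D latent points
X = np.random.default_rng(0).normal(size=(5, 2))
K = kernel_matrix(X)
```

The noise term on the diagonal keeps K positive definite, which the later likelihood computations rely on.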

15 GPLVM Training: Learning a Model of Body Poses For a single dimension, the likelihood of y given the Gaussian Process (GP) model parameters is: Joint likelihood for D dimensions is:
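The two likelihood equations on this slide were not captured in the transcript. For reference, the standard zero-mean GP forms are given below, assuming Y is the N × D matrix whose rows are the training poses y_i, y_{:,d} is its d-th column, and θ denotes the kernel parameters. This notation is an assumption and may differ from the original slide.

```latex
% Likelihood of one output dimension d under a zero-mean GP with kernel matrix K
p(y_{:,d} \mid X, \theta)
  = \frac{1}{(2\pi)^{N/2} |K|^{1/2}}
    \exp\!\left( -\tfrac{1}{2}\, y_{:,d}^{\top} K^{-1} y_{:,d} \right)

% Joint likelihood over D independent dimensions
p(Y \mid X, \theta)
  = \prod_{d=1}^{D} p(y_{:,d} \mid X, \theta)
  = \frac{1}{(2\pi)^{DN/2} |K|^{D/2}}
    \exp\!\left( -\tfrac{1}{2}\, \mathrm{tr}\!\left( K^{-1} Y Y^{\top} \right) \right)
```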

16 To learn the GPLVM from the training set {y_i}, we place priors on the parameters and maximize the resulting posterior, or equivalently minimize its negative log.

17 To learn the GPLVM from the training set {y_i}, we maximize the posterior, or equivalently minimize its negative log. This is computationally intensive, so a subset of poses is chosen to compute the kernel matrix. This subset of poses is called the Active Set.
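The negative log posterior is not shown in the transcript. A minimal sketch of one common form follows, assuming an RBF kernel with noise, a unit Gaussian prior on the latent points, and a 1/(αβγ) prior on the kernel parameters. These priors are assumptions taken from the usual GPLVM formulation, and the additive constants are dropped.

```python
import numpy as np

def neg_log_posterior(X, Y, alpha, gamma, beta):
    """Assumed objective: D/2 ln|K| + 1/2 tr(K^-1 Y Y^T) + 1/2 sum ||x_i||^2 + ln(alpha*gamma*beta)."""
    N, D = Y.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = alpha * np.exp(-0.5 * gamma * sq) + np.eye(N) / beta
    # Cholesky factor gives a stable log-determinant
    chol = np.linalg.cholesky(K)
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    data_term = 0.5 * D * log_det + 0.5 * np.sum(Y * np.linalg.solve(K, Y))
    prior_term = 0.5 * np.sum(X**2) + np.log(alpha * gamma * beta)
    return data_term + prior_term

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 4))   # six hypothetical poses, D = 4
X = rng.normal(size=(6, 2))   # their 2D latent positions
value = neg_log_posterior(X, Y, alpha=1.0, gamma=1.0, beta=100.0)
```

The cost is dominated by the N × N solve. Passing only the rows of X and Y that belong to an active set shrinks that matrix, which is the motivation for the Active Set, although the actual selection procedure is not described in the slides.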

18 For a new pair (x, y) we can predict using the learned model. The resulting equation can be used to solve for x given y, or for y given x, via gradient descent.
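The prediction equation is missing from the transcript. The sketch below assumes the form used in the GPLVM literature: the GP predictive mean f(x) and variance σ²(x), combined into a score L(x, y) = ||y − f(x)||² / (2σ²(x)) + (D/2) ln σ²(x) + ½||x||². The function names and hyperparameter defaults are hypothetical.

```python
import numpy as np

def rbf(A, B, alpha, gamma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return alpha * np.exp(-0.5 * gamma * sq)

def predict(x, X, Y, alpha=1.0, gamma=1.0, beta=100.0):
    """GP predictive mean and variance at latent point x."""
    K = rbf(X, X, alpha, gamma) + np.eye(len(X)) / beta
    k = rbf(X, x[None, :], alpha, gamma)[:, 0]
    Kinv_k = np.linalg.solve(K, k)
    mean = Y.T @ Kinv_k
    var = alpha + 1.0 / beta - k @ Kinv_k
    return mean, var

def latent_pose_score(x, y, X, Y, **hyp):
    """Assumed L(x, y): lower means (x, y) agrees better with the learned model."""
    mean, var = predict(x, X, Y, **hyp)
    D = Y.shape[1]
    return np.sum((y - mean) ** 2) / (2.0 * var) + 0.5 * D * np.log(var) + 0.5 * x @ x

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))   # hypothetical latent points
Y = rng.normal(size=(8, 3))   # hypothetical poses, D = 3
mean_0, var_0 = predict(X[0], X, Y)                          # at a training point
mean_far, var_far = predict(np.array([100.0, 100.0]), X, Y)  # far from all data
```

Because the score is smooth in both x and y, fixing one argument and descending on the other gives the "x given y" and "y given x" solutions mentioned on the slide.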

19 GPLVM

20 GPLVM

21 GPLVM Left hand raised silhouettes tend to be clustered together

22 GPLVM Does not always do a good job

23 About GPLVM Allows mapping to and from the lower-dimensional space. Allows a smooth parameterization (i.e., allows derivatives) in the lower-dimensional space. Two dimensions work well for our data set. (Grochow et al. use 2-5.)

24 Learning the Forward/Rendering Function Input: 2D Pose. Output: Silhouettes (represented using Alt Moments). Similar to Rosales and Sclaroff.

25 Overview of Framework Learning Phase: learn the rendering function Φ(·); learn a model of human body poses (Y: training data, X: latent space). Pose Inference Phase: input silhouette → output pose.

26 Pose Inference Typical Regularization (Also used by Agarwal and Triggs)

27 Data Term Forward function (rendering function): 2D Projected Marker Positions → Silhouette (Alt Moments)

28 Regularization Term Replace with a prior knowledge term (i.e., the learned model of poses). Independent of the feature s.
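The objective functions on slides 26 to 28 were not captured. Based on the description in the talk comments (a data term ||Φ(y) − s|| plus a regularization term, with the usual operator-based regularizer replaced by L(x, y)), they plausibly take the form below. The squared norms, the weight λ, and the use of y for the unknown pose in the first line are assumptions.

```latex
% Typical Tikhonov-style regularization with a generic operator D
\hat{y} = \arg\min_{y} \; \|\Phi(y) - s\|^{2} + \lambda \, \|D y\|^{2}

% Regularizer replaced by the learned pose model L(x, y)
(\hat{x}, \hat{y}) = \arg\min_{x,\, y} \; \|\Phi(y) - s\|^{2} + \lambda \, L(x, y)
```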

29 Pose Inference Solution obtained using Conjugate Gradient. Initialization using the Active Set. (Also need to talk about initialization.)
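A minimal sketch of the inference step, under several labeled assumptions: the training data are random toy values, the rendering function `phi` is a linear stand-in rather than a learned model, the weight `lam` is hypothetical, and the starting point is taken to be the active-set pose whose rendered features best match the observation (the slides do not say how the initial point is chosen). It uses `scipy.optimize.minimize` with `method="CG"` and numerically estimated gradients.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, Q, D, F = 20, 2, 4, 3                 # active-set size, latent dim, pose dim, feature dim
X = rng.normal(size=(N, Q))              # toy latent points of the active set
Y = np.tanh(X @ rng.normal(size=(Q, D))) # toy poses that vary smoothly with X
A = rng.normal(size=(D, F))

def phi(y):
    """Stand-in for the learned rendering function: pose -> silhouette features."""
    return y @ A

ALPHA, GAMMA, BETA = 1.0, 1.0, 100.0     # assumed kernel parameters

def rbf(P, R):
    sq = ((P[:, None, :] - R[None, :, :]) ** 2).sum(-1)
    return ALPHA * np.exp(-0.5 * GAMMA * sq)

K_INV = np.linalg.inv(rbf(X, X) + np.eye(N) / BETA)

def pose_model_score(x, y):
    """Assumed L(x, y) built from the GP predictive mean and variance."""
    k = rbf(X, x[None, :])[:, 0]
    mean = Y.T @ (K_INV @ k)
    var = ALPHA + 1.0 / BETA - k @ K_INV @ k
    return np.sum((y - mean) ** 2) / (2.0 * var) + 0.5 * D * np.log(var) + 0.5 * x @ x

def objective(z, s, lam):
    x, y = z[:Q], z[Q:]
    return np.sum((phi(y) - s) ** 2) + lam * pose_model_score(x, y)

def infer_pose(s, lam=1.0):
    # Start from the active-set pair whose rendered features are closest to s
    i0 = int(np.argmin(np.sum((phi(Y) - s) ** 2, axis=1)))
    z0 = np.concatenate([X[i0], Y[i0]])
    res = minimize(objective, z0, args=(s, lam), method="CG")
    return res.x[:Q], res.x[Q:], res.fun, objective(z0, s, lam)

s_obs = phi(Y[3]) + 0.01 * rng.normal(size=F)   # noisy observation of a known pose
x_hat, y_hat, f_final, f_init = infer_pose(s_obs)
```

Optimizing jointly over the latent point and the pose keeps the search inside the learned space of feasible poses while the data term pulls the pose toward the observed silhouette features.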

30 Data Collection 12 gestures in the flight director lexicon. Synthesize 6000 (Silhouette, Pose) pairs using Poser: 3000 training (male model), 3000 testing (female model). 3D Pose → Synthesized Silhouettes, sampled uniformly over the frontal view-sphere.

31 Experiments (Synthetic Data) (a) Silhouette images generated by Poser 5 (Test Set) (b) Estimation from SMA (Specialized Mapping Architecture) (c) Our Approach (d) Ground Truth

32 Comparison with SMA

33 Additional Constraints Additional constraints can be added to achieve a more accurate estimate, e.g., temporal consistency.

34 Experiments (Real Data) (a) Silhouette images of real person (b) SMA (Specialized Mapping Architecture) (c) Our Approach (Without Temporal Consistency) (d) Our Approach (With Temporal Consistency)

35 Experiments (Real Data) (a) Silhouette images of real person (b) SMA (Specialized Mapping Architecture) (c) Our Approach (Without Temporal Consistency) (d) Our Approach (With Temporal Consistency)

36 Conclusion Proposed a novel method for pose estimation for a pre-defined gesture lexicon. It is interesting to note that two dimensions are enough in our case. The technique is fast (about 0.1 sec per frame in Matlab). Tracking as an extension. [video]

37 Thank You

38 Comments after the talk Related Work –Bullets / summary of strengths vs. weaknesses. –Why do we need this work? Include the year of publication for the related work (e.g., Rosales and Sclaroff work not mentioned, Sminchisescu work not mentioned). Order the related work temporally? Include an introduction slide and a motivating slide. –How to motivate this work? –The state of the art is so and so… We found this common weakness, so we proposed this work. Human pose is not mentioned in the intro. At the end of the talk, say why to use this work over the others. Why GPLVM and not other dimensionality reduction techniques, like LLE/PCA/ISOMAP? Give a top-level overview of the algorithm. A flow chart view? Explain the L(x, y) mapping using an illustration, like the mapping between two planes. Clearly say what the high-dimensional y is and what the low-dimensional x is. Give a reference for GPLVM or a website link. Add a slide on the math of GPLVM. The Tikhonov regularization approach minimizes ||Φ(y) − s|| + a regularization term. Usually the regularization term is ||Dx||, but here we chose L(x, y). Explain why. Add a slide to talk about the temporal constraint. Why learn the rendering function? I.e., because we want to take the derivative… Give the numbers for the training set; this gives an idea of how good the quantitative results are.

39 Related Work Model Based: Simulated Annealing [Deutscher et al. CVPR ’00], Kinematic Jump Processes [Sminchisescu and Triggs CVPR ’03], Markov Chain Monte Carlo [Lee and Cohen CVPR ’04], etc. … Learning Based: Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01], Relevance Vector Regression [Agarwal and Triggs CVPR ’04], Parameter Sensitive Hashing [Shakhnarovich et al. ICCV ’03], etc. …

40 To learn the GPLVM from the training set {y_i}, we maximize the posterior, or equivalently minimize its negative log.

41 Overview of Framework (Learning Phase) Learn the rendering function Φ(·). Learn a model of human body poses (using GPLVM).

42 Overview of Framework (Estimation Phase) Input Silhouette Output Pose Search over learned model of human body pose for solution consistent with observation

43 Kernel Function Measures the similarity of the latent variables x and x’. For a data set of size N, we can form an N × N kernel matrix K, in which K_{i,j} = k(x_i, x_j). The kernel parameters control: how correlated x and x’ are in general, the spread of the function, and the noise in the prediction.

44 GPLVM Training: Learning a Model of Body Poses To learn the parameters of the GPLVM from the training set {y_i}, we place priors on the parameters and maximize the resulting posterior.

45 Gaussian Process Latent Variable Model (GPLVM) Low-dimensional parameterization; original-space representation; a term expressing how well the two values match. Together these define the space of feasible poses.

46 For a new pair (x,y) we can predict using
