Learning to Estimate Human Pose with Data Driven Belief Propagation. Gang Hua, Ming-Hsuan Yang, Ying Wu. CVPR 2005.
Project Goal. Learning human motion requires knowing the human body configuration. Detect human body parts from a single image: where are the head, arms, legs, and torso? Estimate the human body configuration: what are the size, location, and orientation of each part? This is the first step (i.e., initialization) for full human body tracking. [Figure: statistical inference maps the input image to the output location, size, and orientation of the head, arms, and legs.]
Challenges. Large variation in pose. Occlusion: some parts are not visible. Lighting variation: affects appearance. Cluttered background: noisy visual cues. High-dimensional state variables.
Main Idea. Analysis by synthesis (i.e., hypothesize and test), cast as statistical inference. Locate body parts using visual cues and importance sampling. Learn the shapes of human body parts. Intelligently guess some possible answers, i.e., assemblies of body parts. Match each guessed answer with the image observation using the shape prior and geometry constraints. Which observed assembly looks most likely to be a human? [Figure: image, then potential body parts (head, torso, upper leg, and lower arm samples) found with visual cues and importance sampling, then assemblies of body parts, then the best assembly chosen with local observation and belief propagation.]
In Plain English. Learning shape: collect prior knowledge of body parts. Importance sampling: an intelligent guess of the answer. Observation: what is seen in the image, such as appearance, color, and edges. Belief: local evidence. Belief propagation: inference using all relevant local evidence. Potential functions: encode constraints.
Markov Network. X_i: pose state of each limb. Z_i: image observation of each limb. Ψ_ij(X_i, X_j): each undirected link represents a potential function. Φ_i(Z_i | X_i): each directed link represents an observation likelihood. The goal is to infer P(X_i | Z), i.e., P(state variables | image observations).
Learning Body Shapes. For each body part, normalize the labeled shape and learn a low-dimensional representation, ps_i, using probabilistic Principal Component Analysis (PCA). Pose parameters: X_i = {ps_i, s_x, s_y, θ, t_x, t_y}. [Figure: normalizing the labeled shape, showing (1) the normalized shape, (2) the originally labeled shape, and (3) the reconstructed shape.]
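The role of the pose parameters can be illustrated with a minimal sketch that places a normalized shape into the image. It assumes the parameters besides ps_i are a per-axis scale (s_x, s_y), a rotation angle (called theta here), and a translation (t_x, t_y), applied in that order; the function name, the order of operations, and the example values are illustrative assumptions, not taken from the slides.

```python
import math

def place_shape(normalized_pts, sx, sy, theta, tx, ty):
    """Map normalized shape points into image coordinates.

    Assumed order: scale per axis, rotate by theta (radians), then translate.
    """
    c, s = math.cos(theta), math.sin(theta)
    placed = []
    for x, y in normalized_pts:
        xs, ys = sx * x, sy * y            # scale
        xr = c * xs - s * ys               # rotate
        yr = s * xs + c * ys
        placed.append((xr + tx, yr + ty))  # translate
    return placed

# Hypothetical example: a unit square stands in for a normalized limb shape.
unit_square = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
limb = place_shape(unit_square, sx=20.0, sy=60.0,
                   theta=math.pi / 2, tx=100.0, ty=50.0)
```

In this reading, ps_i controls the outline being placed (here the fixed unit square), while the remaining five parameters control where and how it appears in the image.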
Face Detection for Head Pose. An AdaBoost-based face detector is used; its detection results are good but not precise. A 2-class k-means algorithm clusters the skin color pixels. The head pose hypothesis Ix_h is obtained by re-centering the face rectangle to the centroid of the skin color cluster and then projecting to the head PCA space. The importance function is Gaussian.
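The re-centering step can be sketched as follows. This is a minimal illustration covering only the 2-class k-means and the centroid shift, not the face detector or the PCA projection. The deterministic initialization and the rule for deciding which of the two clusters is skin (larger red-minus-blue value) are assumptions made for the example, as are all names and pixel values.

```python
def two_means(colors, iters=10):
    """Cluster RGB tuples into two groups; returns (labels, centers)."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    # Deterministic init (assumption): first color and the color farthest from it.
    c0 = colors[0]
    c1 = max(colors, key=lambda c: dist2(c, c0))
    centers = [c0, c1]
    labels = [0] * len(colors)
    for _ in range(iters):
        labels = [0 if dist2(c, centers[0]) <= dist2(c, centers[1]) else 1
                  for c in colors]
        for k in (0, 1):
            members = [c for c, lab in zip(colors, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers

def recenter_face(rect, pixels):
    """rect = (x, y, w, h); pixels = list of ((px, py), (r, g, b))."""
    labels, centers = two_means([rgb for _, rgb in pixels])
    # Assumption: the skin cluster has the larger red-minus-blue value.
    skin = max((0, 1), key=lambda k: centers[k][0] - centers[k][2])
    pts = [pos for (pos, _), lab in zip(pixels, labels) if lab == skin]
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    x, y, w, h = rect
    return (cx - w / 2, cy - h / 2, w, h)

# Hypothetical pixels: three bluish background pixels, four skin-toned pixels.
pixels = [((0, 0), (30, 40, 200)), ((1, 0), (32, 41, 198)), ((0, 1), (29, 39, 201)),
          ((12, 10), (220, 170, 140)), ((14, 10), (222, 171, 139)),
          ((12, 12), (219, 169, 141)), ((14, 12), (221, 170, 140))]
new_rect = recenter_face((0, 0, 10, 10), pixels)
```

The rectangle keeps its size and only moves so that its center coincides with the centroid of the skin cluster.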
Arm/Leg Importance Functions. Image-specific skin color segmentation. Least-squares rectangle fitting for lower-arm and upper-leg hypotheses. Upper-arm and lower-leg hypotheses from a constrained local search. Gaussian mixture importance function. [Figure panels: skin color segmentation; rectangle fitting; upper-arm and lower-leg search.]
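How a Gaussian mixture importance function is used can be sketched in one dimension: draw samples from the mixture, then weight each sample by the ratio of a target score to the mixture density. The two mixture components stand in for two limb hypotheses, and the target is a toy likelihood; all names, numbers, and the 1-D state are assumptions for illustration, not the slides' actual pose space.

```python
import math
import random

def sample_mixture(components, rng):
    """components: list of (weight, mean, std); draws one 1-D sample."""
    r = rng.random() * sum(w for w, _, _ in components)
    for w, mean, std in components:
        r -= w
        if r <= 0:
            return rng.gauss(mean, std)
    # Fallback for floating-point round-off: use the last component.
    return rng.gauss(components[-1][1], components[-1][2])

def mixture_pdf(x, components):
    """Density of the normalized Gaussian mixture at x."""
    total = sum(w for w, _, _ in components)
    return sum(
        (w / total) * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in components
    )

def importance_weights(samples, target, components):
    """Weight each sample by target(x) / q(x), then normalize to sum to 1."""
    raw = [target(x) / mixture_pdf(x, components) for x in samples]
    z = sum(raw)
    return [r / z for r in raw]

# Hypothetical setup: two hypotheses at x = 40 and x = 90; toy likelihood near 42.
rng = random.Random(0)
components = [(0.5, 40.0, 5.0), (0.5, 90.0, 5.0)]
samples = [sample_mixture(components, rng) for _ in range(500)]
target = lambda x: math.exp(-0.5 * ((x - 42.0) / 3.0) ** 2)
weights = importance_weights(samples, target, components)
estimate = sum(w * x for w, x in zip(weights, samples))
```

Samples drawn around the hypothesis that the image does not support receive negligible weight, so the weighted estimate concentrates near the supported one.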
Torso Pose Importance Function. A probabilistic Hough transform detects line segments. The lines are assembled into quad-shapes, which are then pruned. A Canny-edge-masked likelihood is evaluated for each good hypothesis Ix_t^(n). Gaussian mixture importance function. [Figure panels: results from the Hough transform; torso hypothesis.]
Potential Constraint. Potential functions encode the physical constraints between human body parts. Link points are defined between two adjacent body parts. The potential function is defined by a Gaussian radial basis function. [Figure: defined link points.]
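A Gaussian radial basis potential between two link points might look like the sketch below. It assumes the function depends on the Euclidean distance between the link points of two adjacent parts and uses the common exp(-d²/(2σ²)) form; the exact form, the value of sigma, and the coordinates are assumptions, since the slide does not give them.

```python
import math

def link_potential(link_i, link_j, sigma):
    """Gaussian radial basis function of the distance between two link points."""
    d2 = (link_i[0] - link_j[0]) ** 2 + (link_i[1] - link_j[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

# Hypothetical link points: elbow end of an upper arm vs. elbow end of a lower arm.
upper_arm_elbow = (120.0, 80.0)
well_connected = link_potential(upper_arm_elbow, (121.0, 81.0), sigma=5.0)
detached = link_potential(upper_arm_elbow, (150.0, 95.0), sigma=5.0)
```

The potential is 1 when the link points coincide and decays smoothly as the parts drift apart, so a physically implausible assembly is penalized rather than strictly forbidden.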
Likelihood Model. The average normalized steered edge response is computed in the R, G, and B bands. The likelihood is the maximum of the three.
Experiment: Likelihood Model. [Figure: translation of the left lower leg, with the corresponding curve of the likelihood value.]
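Joint Posterior Distribution. The joint posterior distribution of the Markov network is
$$P(X \mid Z) \propto \prod_{(i,j) \in E} \Psi_{ij}(X_i, X_j) \prod_{i} \Phi_i(Z_i \mid X_i),$$
where X = {X_1, X_2, …, X_9} and E denotes the set of undirected links between body parts. The goal is to infer the marginal posterior P(X_i | Z), i.e., P(configuration of body part i | image observation).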
Belief Propagation. Message passing: non-Gaussian distributions make a closed-form implementation intractable. Belief propagation Monte Carlo: evidence from neighboring nodes is combined with local evidence from the observation.
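The way messages from neighbors combine with local evidence can be sketched on a small discrete problem. The example below is not the Monte Carlo variant from the talk: it swaps the continuous pose states for a finite set of states and uses a three-node chain, so the sum-product messages can be computed exactly and checked against brute-force enumeration. All names and numbers are illustrative assumptions.

```python
from itertools import product

def bp_chain_marginals(phi, psi):
    """Sum-product on a chain.

    phi[i][a]: local evidence for node i in state a.
    psi[a][b]: pairwise potential between state a (node i) and b (node i+1).
    """
    n, k = len(phi), len(phi[0])
    fwd = [[1.0] * k for _ in range(n)]  # message into node i from node i-1
    bwd = [[1.0] * k for _ in range(n)]  # message into node i from node i+1
    for i in range(1, n):
        fwd[i] = [sum(phi[i - 1][a] * fwd[i - 1][a] * psi[a][b] for a in range(k))
                  for b in range(k)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(phi[i + 1][b] * bwd[i + 1][b] * psi[a][b] for b in range(k))
                  for a in range(k)]
    marginals = []
    for i in range(n):
        # Belief = local evidence times all incoming messages, normalized.
        belief = [phi[i][a] * fwd[i][a] * bwd[i][a] for a in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals

def brute_force_marginals(phi, psi):
    """Reference marginals by enumerating every joint state."""
    n, k = len(phi), len(phi[0])
    marg = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        p = 1.0
        for i, s in enumerate(states):
            p *= phi[i][s]
        for i in range(n - 1):
            p *= psi[states[i]][states[i + 1]]
        for i, s in enumerate(states):
            marg[i][s] += p
    z = sum(marg[0])
    return [[v / z for v in row] for row in marg]

phi = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
psi = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.5], [0.1, 0.5, 1.0]]
bp = bp_chain_marginals(phi, psi)
exact = brute_force_marginals(phi, psi)
```

In the continuous, non-Gaussian setting described on the slide, the sums over states have no closed form, which is where sample-based (Monte Carlo) approximations of the messages come in.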
Belief Propagation Monte Carlo
Experimental Results
State of the Art. Comparison of the proposed method with the USC and Brown approaches.

Proposed method
Set up: single frame
Algorithm: Data Driven Belief Propagation Monte Carlo (DDBPMC)
Characteristics: efficient (+); well-posed problem (+); more robust to lighting change (+); can be applied to ASIMO directly (++); extended to full body tracker easily (+); numerous experiments (++); overall: ++
Speed: 2 to 3 minutes per frame

USC
Set up: single frame
Algorithm: Markov Chain Monte Carlo (MCMC)
Characteristics: ad hoc (-); not a well-posed problem (-); may be sensitive to lighting change (-); not applicable to ASIMO directly (-); may not be extended to full body tracker (-); few results are available (--); overall: -
Speed: 5+ minutes per frame

Brown
Set up: multi-view and video
Algorithm: Belief Propagation (BP) and PAMPAS
Characteristics: systematic (+); works for a specific environment (-); may be sensitive to lighting change (-); requires multiple cameras (--); may be extended to full body tracker (+); few results are available (--); overall: +
Speed: unknown, but should be more than 3 minutes
Limitations of Current Work. The method requires some visible skin color regions, a face in frontal pose, reasonable contrast (visible edges), and a low degree of occlusion.
Concluding Remarks. A novel algorithm for pose estimation. A principled statistical formulation for recovering human pose in 2-D. A working prototype. Work towards full human body tracking.