Published by Jalen Swindall. Modified over 9 years ago.
1
Recovering Human Body Configurations: Combining Segmentation and Recognition
Greg Mori, Xiaofeng Ren, and Jitendra Malik (UC Berkeley)
Alexei A. Efros (Oxford)
2
The goal
- Given an image:
  - Detect a human figure
  - Localize joints and limbs
  - Create a skeleton of their pose
  - Create a segmentation mask of the person
3
Other approaches: Simple features
- Model people as generalized cylinders (1980s)
- Easily implemented bottom-up
- Often use a tree to express relations
- Problems:
  - Cylinders are common
  - Body parts often depend on one another
  - Really need context
4
Other approaches: Probable pose
- Often use probable pose
- Template matching
- Top-down constraints on pose
- But even highly improbable poses are still possible
5
Other approaches: Frequent simplifications
- Nude models
- Limited poses
- Background subtraction or limited clutter
6
“Arguably the most difficult recognition problem in computer vision”
- Variation in clothing
- Variation in limbs
- Variation in pose
7
Solution: “Islands of Saliency”
- Use low-level features that are informative independent of context
- Starting from these islands, fill in the gaps using context
8
Algorithm
9
Algorithm: Segmenting into regions and superpixels
10
Segmentation
- Combine boundary finder (Martin et al., 2002) with Normalized Cuts (Malik, Belongie, et al., 2001)
- Groups similar pixels into regions
11
Segmentation: Regions
- 40 regions
- Most salient parts of the body become regions
- Limbs usually appear as two “half-limbs”
12
Segmentation: Superpixels
- 200 regions (oversegmentation)
- Retains virtually all structures in the original
- Still reduces complexity from 400,000 pixels to 200 superpixels
13
Algorithm: Finding salient limbs and torsos
14
Finding limbs
- Candidates: all 40 regions
- Four cues for half-limb detection
  - Contour: probability of the boundary
    - Average probability of the region’s boundary, as measured by Martin’s boundary finder
  - Shape: how close to a rectangle
    - Area of overlap with reconstructed rectangle
15
Finding limbs
- Shading
  - Limbs are roughly cylindrical, so they should show 3D pop-out due to shading
  - Compare $I_{x-}, I_{x+}, I_{y-}, I_{y+}$ for the region to the mean of $I_{x-}, I_{x+}, I_{y-}, I_{y+}$ for the training set
- Focus cue
  - Background is often not in focus
  - $C_{\text{focus}} = E_{\text{high}} / (a E_{\text{low}} + b)$
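The focus cue formula on this slide can be sketched in a few lines. The slide does not say how E high and E low are measured or what values a and b take, so this sketch assumes they are precomputed high- and low-frequency energies for a region, and it uses placeholder constants for a and b.

```python
def focus_cue(e_high: float, e_low: float, a: float = 1.0, b: float = 1.0) -> float:
    """Return C_focus = E_high / (a * E_low + b).

    a and b are placeholder constants (an assumption, not values from the slides).
    """
    denom = a * e_low + b
    if denom <= 0:
        raise ValueError("a * e_low + b must be positive")
    return e_high / denom


# A sharp region carries more high-frequency energy than a blurred one
# with the same low-frequency energy, so it receives a higher score.
sharp = focus_cue(e_high=8.0, e_low=3.0)
blurred = focus_cue(e_high=1.0, e_low=3.0)
```

The constant b keeps the ratio bounded when a region has almost no low-frequency energy.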
16
Finding limbs
- Cues are combined by summing
- Use logistic regression to learn weights (training set of hand-labeled half-limbs)
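A minimal sketch of how a logistic-regression model scores one candidate region from its cue values. It shows only the scoring step (weighted sum followed by the logistic function), not the training. The weights, bias, and cue values below are hypothetical; the slides give no learned values.

```python
import math


def half_limb_probability(cues, weights, bias):
    """Weighted sum of cue scores passed through the logistic function."""
    if len(cues) != len(weights):
        raise ValueError("cues and weights must have the same length")
    z = bias + sum(w * c for w, c in zip(weights, cues))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights; cue order: contour, shape, shading, focus.
WEIGHTS = [2.0, 1.0, 0.5, 0.5]
BIAS = -2.0

strong = half_limb_probability([0.9, 0.8, 0.6, 0.7], WEIGHTS, BIAS)
weak = half_limb_probability([0.2, 0.3, 0.1, 0.2], WEIGHTS, BIAS)
```

Candidates can then be ranked by this probability, which is how a "top 8" list of half-limb candidates could be produced.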
17
Evaluation: Cues
[Figure: plots with axes labeled “Number of candidates generated” and “Number of hits”]
18
Evaluation: Performance
19
Evaluation summary
- Not very good detectors
- Strength of boundary is the best cue
- Combining cues yields better performance
- On average, 4.08 of the top 8 candidates produced were hits
- 89% have at least 3 hits among the top 8
- Motivates search for 3 half-limbs combined with head and torso
20
Finding torsos
- Unlike half-limbs, a torso typically spans several regions
- Consider all sets of adjacent regions within some range of total sizes
- Set of cues:
  - Contour
  - Shape
  - Focus
  - (No shading)
21
Finding torsos
- Find orientation of torso
  - Find best matching head
  - Again contour, shape, and focus cues, with a disk as the shape
- Score for torso, score for head, and score for relative position of head to torso are multiplied to create the score for an oriented torso
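The multiplicative scoring described on this slide can be sketched as below. The function names and the example scores are hypothetical; the slides only state that the three scores are multiplied.

```python
def oriented_torso_score(torso, head, relative_position):
    """Product of the torso, head, and head-to-torso position scores."""
    return torso * head * relative_position


def best_head(torso_score, head_candidates):
    """Return the index of the head candidate that maximizes the product.

    head_candidates is a list of (head_score, relative_position_score) pairs.
    """
    return max(
        range(len(head_candidates)),
        key=lambda i: oriented_torso_score(torso_score, *head_candidates[i]),
    )


# Hypothetical scores: a good-looking head in an implausible position
# loses to a decent head in a plausible position.
candidates = [(0.9, 0.1), (0.6, 0.8), (0.5, 0.5)]
chosen = best_head(0.7, candidates)
chosen_score = oriented_torso_score(0.7, *candidates[chosen])
```

Because the scores are multiplied rather than added, a near-zero value for any one factor suppresses the whole oriented-torso hypothesis.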
23
Evaluation
- Success if all four torso points are within 60 pixels of ground truth
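The success criterion can be written as a short check. This sketch assumes Euclidean distance and assumes the predicted and ground-truth torso points are listed in corresponding order; the slides specify neither.

```python
from math import dist


def torso_detected(predicted, ground_truth, max_error=60.0):
    """True if every predicted torso point lies within max_error pixels
    of its corresponding ground-truth point (Euclidean distance assumed)."""
    if len(predicted) != 4 or len(ground_truth) != 4:
        raise ValueError("expected four torso points in each list")
    return all(dist(p, g) <= max_error for p, g in zip(predicted, ground_truth))


# Hypothetical coordinates in pixels.
truth = [(100, 100), (200, 100), (200, 300), (100, 300)]
close = [(110, 95), (190, 120), (230, 310), (100, 340)]
far = [(110, 95), (190, 120), (230, 310), (100, 400)]

hit = torso_detected(close, truth)
miss = torso_detected(far, truth)
```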
24
Algorithm: Pruning to form partial configurations
25
Body building
- From 5-7 half-limbs and ~50 candidate oriented torsos, form partial configurations consisting of:
  - Each torso
  - Three half-limbs, each assigned to:
    - One of 8 half-limb body parts
    - One of two polarities
- 2-3 million partial configurations!
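One way to see where a count of this size comes from is to enumerate the choices. The slides do not state how the 2-3 million figure was derived, so the counting scheme below is an assumption: the three half-limbs are an unordered subset of the candidates, each is assigned a distinct one of the 8 body parts, and each independently takes one of 2 polarities.

```python
from math import comb, perm


def count_partial_configurations(n_torsos, n_half_limbs,
                                 n_parts=8, n_polarities=2, k=3):
    """Count torso + k-half-limb configurations under the assumed scheme."""
    limb_subsets = comb(n_half_limbs, k)      # which candidates are used
    part_assignments = perm(n_parts, k)       # distinct body part per limb
    polarity_choices = n_polarities ** k      # polarity per limb
    return n_torsos * limb_subsets * part_assignments * polarity_choices


counts = {n: count_partial_configurations(50, n) for n in (5, 6, 7)}
```

Under these assumptions, 50 torsos give 1,344,000 configurations with 5 half-limbs, 2,688,000 with 6, and 4,704,000 with 7, which is the same order of magnitude as the figure on the slide.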
26
Enforce constraints
- Relative widths
  - Foreshortening doesn’t affect width of limbs much
  - Use anthropometric data to rule out limbs more than 4 standard deviations wider than expected
- Length of limbs relative to torso
  - Assume torso not too foreshortened
    - No more than ±40° angle with image plane
  - Again, prune limbs more than 4 standard deviations away from mean length, relative to torso
- Seems to be making some assumptions of probable pose
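The two pruning tests on this slide differ slightly: the width test is one-sided (only limbs that are too wide are rejected), while the length test is two-sided. A minimal sketch, with hypothetical means and standard deviations standing in for the body-proportion statistics the slide refers to:

```python
def width_ok(width_ratio, mean, std, k=4.0):
    """One-sided test: reject only limbs more than k std devs too wide."""
    return width_ratio <= mean + k * std


def length_ok(length_ratio, mean, std, k=4.0):
    """Two-sided test: reject limbs more than k std devs from the mean length."""
    return abs(length_ratio - mean) <= k * std


# Hypothetical statistics, expressed as ratios to torso size.
WIDTH_MEAN, WIDTH_STD = 0.25, 0.03
LENGTH_MEAN, LENGTH_STD = 0.55, 0.05

narrow_passes = width_ok(0.20, WIDTH_MEAN, WIDTH_STD)
wide_fails = width_ok(0.40, WIDTH_MEAN, WIDTH_STD)
short_fails = length_ok(0.30, LENGTH_MEAN, LENGTH_STD)
```

A threshold of 4 standard deviations is loose, so these tests remove only clearly implausible configurations.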
27
Enforce constraints
- Adjacency
  - Upper limbs must be adjacent to torso
  - Lower limbs must be adjacent to upper limbs
- Symmetry in clothing: color histograms must not be overly dissimilar for corresponding segments
  - E.g. right and left upper arms should be similar
- Makes some small assumptions about variations in clothing
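The clothing-symmetry test compares color histograms of corresponding segments. The slides do not name the distance measure or the cutoff, so this sketch swaps in the chi-squared histogram distance and a hypothetical threshold purely for illustration.

```python
def chi_squared_distance(h1, h2):
    """Chi-squared distance between two normalized histograms (0 = identical)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for p, q in zip(h1, h2):
        if p + q > 0:  # skip bins that are empty in both histograms
            total += (p - q) ** 2 / (p + q)
    return 0.5 * total


def clothing_symmetric(h_left, h_right, threshold=0.3):
    """True if the two segments are not overly dissimilar.

    threshold is a hypothetical value, not taken from the slides.
    """
    return chi_squared_distance(h_left, h_right) <= threshold


# Hypothetical 3-bin color histograms for left and right upper arms.
left_arm = [0.5, 0.3, 0.2]
right_arm = [0.45, 0.35, 0.2]
background = [0.0, 0.1, 0.9]

arms_match = clothing_symmetric(left_arm, right_arm)
arm_vs_background = clothing_symmetric(left_arm, background)
```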
28
Body building: slimming down
- Reduces to ~1000 partial configurations
- Sorted by linear combination of the torso and the three half-limb scores
- (This score can be used to improve torso detection)
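Ranking the surviving configurations by a linear combination of scores can be sketched as follows. The weights and the dictionary layout are hypothetical; the slides do not give the coefficients.

```python
def configuration_score(torso_score, limb_scores, w_torso=1.0, w_limb=1.0):
    """Linear combination of one torso score and the half-limb scores."""
    return w_torso * torso_score + w_limb * sum(limb_scores)


def rank_configurations(configs, w_torso=1.0, w_limb=1.0):
    """Sort configurations from best to worst combined score."""
    return sorted(
        configs,
        key=lambda c: configuration_score(c["torso"], c["limbs"], w_torso, w_limb),
        reverse=True,
    )


# Hypothetical partial configurations: one torso score, three half-limb scores.
configs = [
    {"name": "A", "torso": 0.4, "limbs": [0.5, 0.5, 0.5]},
    {"name": "B", "torso": 0.9, "limbs": [0.8, 0.7, 0.6]},
    {"name": "C", "torso": 0.6, "limbs": [0.2, 0.3, 0.1]},
]
ranked_names = [c["name"] for c in rank_configurations(configs)]
```

Taking the torso from the top-ranked configuration is one way the combined score could feed back into torso detection, as the slide suggests.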
29
Algorithm
30
Extending to full limbs
- Fill empty limb joints by adding rectangles, evaluated on adjacent superpixels
- Want high internal similarity and high dissimilarity to surroundings
31
Algorithm
34
Summary
- “Arguably the most difficult problem in computer vision”
  - Not solved here
- Method here is appealing:
  - Don’t need to store exemplars
  - Island of saliency approach seems useful in many contexts
  - Use some configural knowledge to make reasonable guesses
  - Good illustration of integrating recognition and segmentation