Zhuode Liu 2016/2/13 University of Texas at Austin CS 381V: Visual Recognition Discovering the Spatial Extent of Relative Attributes Xiao and Lee, ICCV 2015
Outline: Problem statement, Approach, Evaluation, Discussion Points
Problem statement: Discover which part of the image is related to the attribute (called “spatial extent” in the paper). Rank images w.r.t. this attribute. Example attribute: Smile [Xiao and Lee 2015]
Problem statement. Training: many pairs of images, labeled as ordered (one image shows the attribute more strongly) or equal (=). Testing: given a new pair, rank the images in it. Attribute: Smile [Xiao and Lee 2015]
Why we need the spatial extent: Many attributes are local, so focusing on local parts can give better results. It is also difficult, because the region is not given in the training data. Example attributes: Mountainous, Pointy
Approach [Xiao and Lee 2015] Key idea: “visual chain” (images ordered Strong to Weak). If I rank these images correctly… Observation: local smoothness of adjacent images. Therefore: gradually discover useful information (the spatial extent of the attribute).
Initializing chains: Train a ranker with global features [Parikh & Grauman, 2011]. Select (say) the 5 top-ranked images (Strong to Weak ...). Slide Credit: Xiao and Lee
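The first initialization step (train a ranker on global features, then keep the top-ranked images) can be sketched as below. This is a minimal illustration, not the solver of Parikh & Grauman: it assumes a linear scorer trained with a hinge loss on ordered pairs only, and the names `train_pairwise_ranker` and `top_ranked` are hypothetical.

```python
import numpy as np

def train_pairwise_ranker(X, ordered_pairs, C=1.0, lr=0.01, epochs=200):
    """Fit a linear scorer w so that w @ X[i] > w @ X[j] for each (i, j).

    Minimizes 0.5*||w||^2 + C * sum(hinge(1 - w @ (X[i] - X[j]))) by plain
    subgradient descent. Illustrative stand-in only; it ignores "equal" pairs.
    """
    X = np.asarray(X, dtype=float)
    diffs = np.array([X[i] - X[j] for i, j in ordered_pairs])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        violated = diffs @ w < 1.0          # pairs still inside the margin
        grad = w - C * diffs[violated].sum(axis=0)
        w -= lr * grad
    return w

def top_ranked(X, w, k=5):
    """Indices of the k images with the highest predicted attribute strength."""
    scores = np.asarray(X, dtype=float) @ w
    return np.argsort(-scores, kind="stable")[:k]
```

Here `X` holds one global feature vector per image and each pair `(i, j)` means image `i` shows the attribute more strongly than image `j`; the indices returned by `top_ranked` would seed a chain.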
Initializing chains Search for locally similar-looking patches Slide Credit: Xiao and Lee
Initializing chains: Search for locally similar-looking patches. Solve for the chain of patches that keeps adjacent images similar-looking, using dynamic programming. Slide Credit: Xiao and Lee
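The dynamic-programming search can be sketched as a Viterbi-style pass over the ranked images. The objective on the slide is not reproduced here, so this sketch assumes the cost of a chain is the sum of Euclidean distances between the features of patches chosen in adjacent images; `find_smooth_chain` and the candidate-feature layout are hypothetical.

```python
import numpy as np

def find_smooth_chain(candidates):
    """Pick one patch per image so that adjacent picks look alike.

    candidates: list of (m_i, d) arrays, one per image in ranked order,
    each row being the feature vector of one candidate patch.
    Returns (chosen patch index per image, total chain cost).
    """
    cost = np.zeros(len(candidates[0]))   # best cost of a chain ending at each patch
    back = []                             # back-pointers, one array per transition
    for i in range(1, len(candidates)):
        prev, cur = candidates[i - 1], candidates[i]
        # pairwise[a, b] = distance between patch a of image i-1 and patch b of image i
        pairwise = np.linalg.norm(prev[:, None, :] - cur[None, :, :], axis=2)
        total = cost[:, None] + pairwise
        back.append(total.argmin(axis=0))
        cost = total.min(axis=0)
    # Trace the cheapest chain backwards from the last image.
    path = [int(cost.argmin())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    path.reverse()
    return path, float(cost.min())
```

With n images and m candidates per image, this takes O(n m^2) distance evaluations instead of enumerating all m^n chains.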
Initializing chains: Compute multiple initial chains. Slide Credit: Xiao and Lee
Iterative growing of visual chains: Train a detector for the chain (learn detector). Slide Credit: Xiao and Lee
Iterative growing of visual chains (iter 1): Select only a subset of the images, not all of them, because the models (SVM detector and SVM ranker) are still weak at this stage. Images are ordered by predicted attribute strength, Strong to Weak. Slide Credit: Xiao and Lee
Add the selected images to a new training set. Figure label: Initial image set. Slide Credit: Xiao and Lee
Add the selected images to a new training set, then search for patches again (solve for the patch chain). Figure label: Initial image set. Slide Credit: Xiao and Lee
Search for patches: solve for the patch chain with dynamic programming. Slide Credit: Xiao and Lee
Update the detector (learn detector). Figure label: Initial chain. Slide Credit: Xiao and Lee
Iterative growing of visual chains: Select an image subset based on the ranking (predicted attribute strength, Strong to Weak) at iter 1, iter 2, iter 3. Slide Credit: Xiao and Lee
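The subset-selection step can be sketched as below. The slides do not state the selection rule, so this is only an assumption for illustration: pick `k` images spread evenly across the current predicted ranking, with the caller increasing `k` at each iteration. The name `select_subset` is hypothetical.

```python
import numpy as np

def select_subset(scores, k):
    """Return k image indices spread evenly over the predicted ranking.

    scores: predicted attribute strength per image (higher = stronger).
    The even-spacing rule is an assumption, not taken from the slides.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")  # strongest first
    k = min(k, len(order))
    positions = np.linspace(0, len(order) - 1, num=k)
    return order[np.round(positions).astype(int)].tolist()
```

A growth loop would call `select_subset` with a larger `k` each round, re-run the patch search on the selected images, and retrain the detector and ranker.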
Creating a chain ensemble Slide Credit: Xiao and Lee
Train an SVM ranker for each chain, then rank the validation set. Example (validation set, attribute: Smile): score 3/4. Slide Credit: Xiao and Lee
Creating a chain ensemble: chains ordered by score, High to Low. Slide Credit: Xiao and Lee
Creating a chain ensemble: Learn the final image-level SVM ranker [Parikh & Grauman 2011]. Features: Dense SIFT or pool5 activations of AlexNet. Slide Credit: Xiao and Lee
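The per-chain validation score (such as the 3/4 in the Smile example) and the High-to-Low ordering of chains can be sketched as below. It assumes the score is the fraction of validation pairs that the chain's ranker orders correctly; `pairwise_accuracy` and `best_chains` are hypothetical names.

```python
def pairwise_accuracy(scores, ordered_pairs):
    """Fraction of (stronger, weaker) index pairs that the scores order correctly."""
    correct = sum(1 for i, j in ordered_pairs if scores[i] > scores[j])
    return correct / len(ordered_pairs)

def best_chains(chain_accuracies, k):
    """Indices of the k chains with the highest validation accuracy, best first."""
    order = sorted(range(len(chain_accuracies)),
                   key=lambda c: chain_accuracies[c], reverse=True)
    return order[:k]
```

Each chain's ranker would be scored with `pairwise_accuracy` on held-out pairs, and only the chains returned by `best_chains` would feed the final ensemble.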
Evaluation
Dataset: LFW-10. Attributes shown (Strong to Weak): Smile, Visible teeth, Bald head, Dark hair. Slide Credit: Xiao and Lee. The method finds that the mouth region is related to the first two attributes (Smile, Visible teeth) and the head region to the last two (Bald head, Dark hair).
Dataset: UTZAP50K. Attributes shown (Strong to Weak): Pointy, Sporty, Comfort, Open. Slide Credit: Xiao and Lee. For pointy shoes, the method discovered not only the toe but also the heel, because pointy shoes are often high-heeled.
Results. Datasets: LFW-10 and UTZAP50K. Slide Credit: Xiao and Lee. Annotations: not much gain; Global: 73.7% -> This: 83.5%; Global: 74.6% -> This: 84.6%
Discussions
1. Drawback: the method relies on good initialization.
–Every chain is grown using the initial top (say) 5 images as seed.
–Whether the ranker used to pick these first 5 images gives a good ranking is very important: if “local smoothness” does not hold for these 5 images, then dynamic programming cannot find good patches.
2. Is there a reason the authors tested only on human faces and shoes?
3. Given that the approach samples many features densely from many candidate patches, how well does the algorithm scale to large datasets where the key features are much harder to localize than in ideal face and shoe views?