Zhuode Liu 2016/2/13 University of Texas at Austin CS 381V: Visual Recognition Discovering the Spatial Extent of Relative Attributes Xiao and Lee, ICCV.


1 Zhuode Liu 2016/2/13 University of Texas at Austin CS 381V: Visual Recognition Discovering the Spatial Extent of Relative Attributes Xiao and Lee, ICCV 2015

2 Outline Problem statement Approach Evaluation Discussion Points

3 Problem statement Discover which part of the image is related to the attribute (called the “spatial extent” in the paper). Rank images w.r.t. this attribute. Example attribute: Smile [Xiao and Lee 2015]

4 Problem statement Training: many pairs of images, each labeled either with which image shows the attribute more strongly or as showing it equally (=). Testing: given a new pair, rank the images in it. Attribute: Smile [Xiao and Lee 2015]

5 Why spatial extent is needed: many attributes are local, so focusing on local parts can give better results. It is also difficult, because the region is not given in the training data. Example attributes: Mountainous, Pointy

6 Approach [Xiao and Lee 2015] Key idea: “visual chain”. Strong Weak If I rank these images correctly… Observation: adjacent images in the ranking are locally smooth (they look similar in the relevant region). Therefore: gradually discover useful information (the spatial extent of the attribute)

7 Initializing chains Train a ranker with global features [Parikh & Grauman, 2011]. Select (say) the 5 top-ranked images. Strong ... Weak Slide Credit: Xiao and Lee
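
The initialization step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each image is already described by a global feature vector, and it swaps in a plain hinge-loss update on feature differences in place of a full RankSVM solver. The names `train_pairwise_ranker`, `score` and `top_k` are hypothetical.

```python
def train_pairwise_ranker(features, ordered_pairs, epochs=200, lr=0.1, reg=0.01):
    """Learn a weight vector w so that w.x_i > w.x_j for every (i, j) pair.

    features: list of global feature vectors, one per image (assumed given).
    ordered_pairs: list of (i, j) meaning image i shows the attribute more than j.
    This is a simplified stand-in for a RankSVM, not the solver used in the paper.
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(epochs):
        for i, j in ordered_pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # L2 regularization: shrink the weights a little on every step.
            w = [wk * (1.0 - lr * reg) for wk in w]
            # Hinge loss: only update when the pair is not separated by margin 1.
            if margin < 1.0:
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def score(w, x):
    """Predicted attribute strength of one image."""
    return sum(wk * xk for wk, xk in zip(w, x))


def top_k(features, w, k=5):
    """Indices of the k images with the highest predicted attribute strength."""
    order = sorted(range(len(features)), key=lambda i: score(w, features[i]), reverse=True)
    return order[:k]
```

Calling `top_k(features, w, k=5)` would then give the seed images for one chain, under the assumptions above.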

8 Initializing chains Search for locally similar-looking patches Slide Credit: Xiao and Lee

9 Search for locally similar-looking patches Solve for Slide Credit: Xiao and Lee Initializing chains

10 Search for locally similar-looking patches Solve for Solve with dynamic programming Slide Credit: Xiao and Lee Initializing chains
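
The objective on this slide did not survive in the transcript, so the sketch below assumes a simple one: pick one candidate patch per image so that the sum of squared feature distances between patches in adjacent images is as small as possible. Under that assumption the problem is a shortest path through a chain and can be solved exactly with dynamic programming. The function names and the input layout are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_smooth_chain(candidates):
    """Pick one patch per image, minimizing the total adjacent-patch distance.

    candidates[i][p] is the feature vector of candidate patch p in image i,
    with images ordered by predicted attribute strength (assumed layout).
    Returns (chain, total_cost), where chain[i] is the chosen patch index.
    """
    cost = [0.0] * len(candidates[0])  # best cost of a chain ending at each patch
    back = []                          # back[i-1][p]: best predecessor of patch p in image i
    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_cost, pointers = [], []
        for feat in candidates[i]:
            best_q = min(range(len(prev)), key=lambda q: cost[q] + sq_dist(prev[q], feat))
            new_cost.append(cost[best_q] + sq_dist(prev[best_q], feat))
            pointers.append(best_q)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final patch to recover the whole chain.
    last = min(range(len(cost)), key=cost.__getitem__)
    total = cost[last]
    chain = [last]
    for pointers in reversed(back):
        chain.append(pointers[chain[-1]])
    chain.reverse()
    return chain, total
```

With n images and m candidate patches per image, this runs in O(n·m²) distance evaluations, compared with mⁿ for exhaustive search.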

11 Compute multiple initial chains Slide Credit: Xiao and Lee Initializing chains

12 Iterative growing of visual chains Train a detector for the chain Learn detector Slide Credit: Xiao and Lee

13 Iterative growing of visual chains Select only a subset of the images, not all of them, because the models (SVM detector and SVM ranker) are still weak at this stage. Predicted Attribute Strength: Strong Weak, iter 1 Slide Credit: Xiao and Lee

14 Add the selected images to a new training set Initial image set Slide Credit: Xiao and Lee

15 Add the selected images to a new training set Initial image set Search for patches again. Solve for Slide Credit: Xiao and Lee

16 Search for patches Solve for Solve with dynamic programming Slide Credit: Xiao and Lee

17 Update the detector Learn detector Initial chain Slide Credit: Xiao and Lee

18 Iterative growing of visual chains Select image subset based on ranking Predicted Attribute Strength: Strong Weak, iter 1, iter 2, iter 3 Slide Credit: Xiao and Lee
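
One way to picture the growing subsets is as a window over the ranked images that widens at each iteration. The sketch below assumes a fixed widening step around the seed images; the slides do not state the actual schedule, so `step`, `seed_start` and the function name are hypothetical.

```python
def growing_subsets(ranked_ids, seed_start=0, seed_count=5, step=3, iterations=3):
    """Return one image subset per iteration, each wider than the last.

    ranked_ids: image ids sorted by predicted attribute strength (strong to weak).
    The seeds are ranked_ids[seed_start : seed_start + seed_count]; at iteration t
    the window is extended by t * step positions on both sides (assumed schedule).
    """
    subsets = []
    for t in range(1, iterations + 1):
        lo = max(0, seed_start - t * step)
        hi = min(len(ranked_ids), seed_start + seed_count + t * step)
        subsets.append(ranked_ids[lo:hi])
    return subsets
```

After each subset is selected, the patch search and the detector update from the earlier slides would be repeated on the enlarged set.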

19 Creating a chain ensemble Slide Credit: Xiao and Lee

20 Train an SVM ranker for each chain, then rank the validation set. Validation set Attribute: Smile Score: 3/4 Slide Credit: Xiao and Lee

21 Creating a chain ensemble Scores: High Low Slide Credit: Xiao and Lee
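
The validation score for a chain (such as the 3/4 on the previous slide) can be read as the fraction of validation pairs that the chain's ranker orders correctly. The sketch below assumes that reading and then keeps the highest-scoring chains; the function names and the parameter `k` are hypothetical.

```python
def pairwise_accuracy(scores, ordered_pairs):
    """Fraction of validation pairs ranked in the correct order.

    scores[i] is the chain ranker's predicted attribute strength for image i.
    ordered_pairs holds (i, j) pairs where image i should rank above image j.
    """
    correct = sum(1 for i, j in ordered_pairs if scores[i] > scores[j])
    return correct / len(ordered_pairs)


def select_top_chains(chain_scores, k):
    """Names of the k chains with the highest validation accuracy."""
    ranked = sorted(chain_scores, key=chain_scores.get, reverse=True)
    return ranked[:k]
```

For example, a ranker that gets three of four validation pairs right would score 0.75 under this measure.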

22 Creating a chain ensemble Learn the final image-level SVM ranker [Parikh & Grauman 2011]. Features: dense SIFT or pool5 activations of AlexNet Slide Credit: Xiao and Lee

23 Evaluation

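24 Dataset LFW-10 Attributes: Smile, Visible teeth, Bald head, Dark hair (each shown from Strong to Weak) Slide Credit: Xiao and Lee The method finds that the mouth region is related to the first two attributes (smile, visible teeth) and the head region to the last two (bald head, dark hair).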

25 Dataset UTZAP50K Attributes: Pointy, Sporty, Comfort, Open (each shown from Strong to Weak) Slide Credit: Xiao and Lee For pointy shoes, it discovered not only the toe but also the heel, because pointy shoes are often high-heeled.

26 Results Dataset: LFW-10 Dataset: UTZAP50K Slide Credit: Xiao and Lee not much gain Global: 73.7% -> This: 83.5% Global: 74.6% -> This: 84.6%

27 Discussions 1. Drawback: the method relies on good initialization. Every chain is grown using the initial top (say) 5 images as seeds, so it matters a great deal whether the ranker used to pick those 5 images gives a good ranking: if “local smoothness” does not hold for them, dynamic programming cannot find good patches. 2. Is there a reason the authors only tested on human faces and shoes? 3. Given that the approach densely samples many features from many candidate patches, how well does the algorithm scale to large datasets where the key features are much harder to localize than in ideal face and shoe views?

