Published by Δήλια Θεοδωρίδης. Modified over 6 years ago.
1
ICCV 2013
Hierarchical Part Matching for Fine-Grained Image Classification
Speaker: Lingxi Xie
Authors: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
State Key Laboratory of Intelligent Technology and Systems
Department of Computer Science and Technology
Tsinghua University
2
Outline
    Introduction
    The Bag-of-Feature Model
    Hierarchical Part Matching
    Experimental Results
    Conclusions
9/19/2018 ICCV Presentation
4
Image Classification
    A basic task towards image understanding
    General vs. Fine-Grained
6
Image-level Vector
    Spatial Pooling: Sum Pooling/Max Pooling, Spatial Pyramid Matching [Lazebnik, CVPR06], Geometric Phrase Pooling [Xie, ACMMM12]
Compact Feature Codes
    Hard/Soft/Sparse Coding methods: Vector Quantization, ScSPM encoding [Yang, CVPR09], LLC encoding [Wang, CVPR10]
Visual Vocabulary
    Clustering methods: K-Means, Hierarchical K-Means [Nister, CVPR06], Approximate K-Means [Philbin, CVPR07]
Image Descriptors
    Gradient-based local descriptors: SIFT [Lowe, IJCV04], HOG [Dalal, CVPR05]
Raw Image
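Read from the raw image up to the image-level vector, the stages on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes NumPy, uses random vectors as stand-ins for SIFT descriptors, and picks the simplest option at each stage (plain K-Means, hard vector quantization, sum or max pooling). The function names are hypothetical.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=10, seed=0):
    """Plain K-Means: returns k cluster centers (the visual vocabulary)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every descriptor to every center.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def encode_hard(descriptors, centers):
    """Vector quantization: one-hot code of the nearest visual word."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros((len(descriptors), len(centers)))
    codes[np.arange(len(descriptors)), d2.argmin(axis=1)] = 1.0
    return codes

def pool(codes, how="max"):
    """Collapse all codes of one image into a single image-level vector."""
    return codes.max(axis=0) if how == "max" else codes.sum(axis=0)

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 128))  # random stand-ins for SIFT descriptors
centers = build_vocabulary(descriptors, k=16)
codes = encode_hard(descriptors, centers)
image_vector = pool(codes, how="sum")      # histogram of visual words
```

With hard quantization and sum pooling the image-level vector is a histogram of visual words; soft or sparse coding (ScSPM, LLC) would replace only `encode_hard`.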
8
Spatial Pyramid Matching (SPM) [Lazebnik, CVPR06]
[Figure: image regions labeled Part 1 to Part 5]
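As a rough sketch of what pooling over spatial regions means, the snippet below max-pools feature codes inside each cell of a grid pyramid and concatenates the results. The grid levels (1x1, 2x2, 4x4), the function name `spm_pool`, and the random inputs are assumptions for illustration; NumPy is assumed.

```python
import numpy as np

def spm_pool(codes, xy, width, height, levels=(1, 2, 4)):
    """Max-pool codes inside every cell of each grid level and concatenate."""
    pooled = []
    for g in levels:
        # Cell index of each descriptor on a g x g grid.
        col = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        row = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        for r in range(g):
            for c in range(g):
                inside = (row == r) & (col == c)
                if inside.any():
                    pooled.append(codes[inside].max(axis=0))
                else:
                    pooled.append(np.zeros(codes.shape[1]))  # empty cell
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
codes = rng.random((50, 8))              # 50 descriptors, 8 visual words
xy = rng.random((50, 2)) * [640, 480]    # descriptor positions in pixels
vec = spm_pool(codes, xy, width=640, height=480)
```

Here the regions are fixed grid cells; pooling over semantic parts instead would only change how the `inside` mask is computed.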
9
Hierarchical Part Matching (HPM) [Xie, ICCV13]
[Figure: image regions labeled Part 1 to Part 5]
11
Model Overview
Key Modules
    Foreground Inference
    Part Segmentation
    Hierarchical Structure Learning
    Geometric Phrase Pooling
12
Foreground Inference
[Figure: bird image with bounding box and landmarks: crown, forehead, left eye, right eye, beak, nape, throat, breast, back, belly, left wing, right wing, left leg, right leg, tail]
13
Foreground Inference
Grab-Cut Algorithm
[Figure labels: definite background, possible]
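One common way to start GrabCut from a bounding box is to mark everything outside the box as definite background and everything inside as possible foreground. The sketch below only builds that initial mask with NumPy. It assumes that the truncated "possible" label on the slide refers to possible foreground, and the helper name `init_mask` is hypothetical. The two label values match OpenCV's `cv2.GC_BGD` and `cv2.GC_PR_FGD` constants.

```python
import numpy as np

# Label values follow OpenCV's GrabCut convention.
GC_BGD = 0      # definite background (cv2.GC_BGD)
GC_PR_FGD = 3   # possible foreground (cv2.GC_PR_FGD)

def init_mask(height, width, box):
    """box = (x, y, w, h): outside is definite background, inside is possible foreground."""
    x, y, w, h = box
    mask = np.full((height, width), GC_BGD, dtype=np.uint8)
    mask[y:y + h, x:x + w] = GC_PR_FGD
    return mask

mask = init_mask(6, 8, box=(2, 1, 4, 3))
```

With OpenCV installed, such a mask could be passed to `cv2.grabCut` in `cv2.GC_INIT_WITH_MASK` mode to refine the foreground; that step is left out here to keep the sketch dependency-free.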
14
Part Segmentation
Ultrametric Contour Map
15
Part Segmentation
edge response matrix
16
Part Segmentation
[Figure: edge response values 0.50, 0.00, 0.85, 0.15]
17
Part Segmentation
step penalty = 0.01
[Figure: edge response values 0.50, 0.00, 0.85 and 0.15, shown together with 0.51, 0.01, 0.86 and 0.16, i.e. each value plus the step penalty]
18
Part Segmentation
19
Part Segmentation
[Figure labels: back, breast, left wing, belly]
20
Part Segmentation
Shortest-Path Algorithm
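The slides name an edge response matrix, a step penalty of 0.01, and a shortest-path algorithm, but not how they fit together. One plausible reading, sketched below, is that stepping onto a pixel costs its edge response plus the step penalty, and each pixel is assigned to the landmark with the cheapest path to it. That reading, the 4-connected grid, and the function names are assumptions, not the authors' stated method.

```python
import heapq

def geodesic_costs(edge, source, step_penalty=0.01):
    """Dijkstra on a 4-connected grid: entering a pixel costs its edge
    response plus the step penalty."""
    rows, cols = len(edge), len(edge[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + edge[nr][nc] + step_penalty
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist

def assign_parts(edge, landmarks, step_penalty=0.01):
    """Label each pixel with the landmark that has the cheapest path to it."""
    maps = {name: geodesic_costs(edge, pos, step_penalty)
            for name, pos in landmarks.items()}
    rows, cols = len(edge), len(edge[0])
    return [[min(maps, key=lambda n: maps[n][r][c]) for c in range(cols)]
            for r in range(rows)]

# Toy edge response matrix: a strong contour (0.85) separates the two halves.
edge = [
    [0.00, 0.00, 0.85, 0.00, 0.00],
    [0.00, 0.00, 0.85, 0.00, 0.00],
    [0.00, 0.00, 0.85, 0.00, 0.00],
]
labels = assign_parts(edge, {"back": (1, 0), "belly": (1, 4)})
```

The step penalty keeps paths through flat regions (edge response 0.00) from being free, so distance still matters when no contour lies between a pixel and two competing landmarks.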
21
Part Segmentation
[Figure labels: forehead, left eye, beak, nape, throat, back, breast, left wing, belly, tail, left leg]
22
Hierarchical Structure Learning
Discovering Mid-Level Parts
    Part Distance
23
Hierarchical Structure Learning
Discovering Mid-Level Parts
    Cost Function when Merging Parts
24
Hierarchical Structure Learning
Discovering Mid-Level Parts
    Hierarchical Structure Learning (HSL) Algorithm
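The part distance, the merging cost, and the HSL algorithm itself did not survive in this transcript. As a generic stand-in only, the sketch below shows greedy agglomerative merging: repeatedly join the two closest groups of parts and stop once the closest pair is farther apart than a threshold. The average-linkage distance, the use of `mu` as a stopping threshold, and all distance values are made up for illustration and should not be read as the authors' definitions.

```python
from itertools import combinations

def merge_parts(names, distance, mu):
    """Greedy agglomerative merging: join the closest pair of groups until the
    smallest average-linkage distance exceeds mu (hypothetical stopping rule)."""
    groups = [frozenset([n]) for n in names]

    def group_distance(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(distance[frozenset(p)] for p in pairs) / len(pairs)

    while len(groups) > 1:
        a, b = min(combinations(groups, 2), key=lambda ab: group_distance(*ab))
        if group_distance(a, b) > mu:
            break
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return groups

# Made-up pairwise distances, for illustration only.
names = ["beak", "forehead", "nape", "throat"]
distance = {
    frozenset(["beak", "forehead"]): 0.1,
    frozenset(["nape", "throat"]): 0.2,
    frozenset(["beak", "nape"]): 0.8,
    frozenset(["beak", "throat"]): 0.7,
    frozenset(["forehead", "nape"]): 0.6,
    frozenset(["forehead", "throat"]): 0.9,
}
groups = merge_parts(names, distance, mu=0.5)
```

In this toy setting a larger `mu` merges more parts into fewer, coarser groups, which is one way a single parameter can control model complexity.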
25
Hierarchical Structure Learning
    beak + crown + forehead + eyes = head
    nape + throat = neck
    back + belly + breast + tail = body
26
Hierarchical Structure Learning
μ: controlling the complexity of the model!
[Figure label: best choice]
27
Geometric Phrase Pooling
[Figure labels: tail, visual words, central word, side words, visual phrase]
28
ACM Multimedia 2012 - Oral Presentation
Phrase Vector for 1st Word Pair
Phrase Vector for 2nd Word Pair
……
Phrase Vector for the Visual Phrase
[Figure: code vectors with entries 1 to F for the central word and the side words, combined into one vector per word pair and then into one vector for the visual phrase]
9/19/2018 ACM Multimedia Oral Presentation
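The slides show a vector per word pair (central word plus one side word) and then a single vector for the whole visual phrase, but the exact formulas are not in this transcript. The sketch below assumes, purely for illustration, that a pair vector is the sum of the two codes and that the phrase vector is the element-wise maximum over all pair vectors. NumPy is assumed and the function name is hypothetical.

```python
import numpy as np

def phrase_vector(central_code, side_codes):
    """Pool one visual phrase: a pair vector per side word, then an element-wise max."""
    # Assumption: a word pair is represented by the sum of its two codes.
    pair_vectors = [central_code + side for side in side_codes]
    return np.max(pair_vectors, axis=0)

central = np.array([0.0, 0.5, 0.0, 0.2])
sides = [np.array([0.3, 0.0, 0.0, 0.0]), np.array([0.0, 0.1, 0.4, 0.0])]
phrase = phrase_vector(central, sides)
```

Whatever the precise combination rule, the point of the construction is that the phrase vector carries information from spatially neighboring words, not from the central word alone.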
31
Model Summarization
Foreground Inference and Part Segmentation
    Accurate Segmentation, Better Representation
Hierarchical Structure Learning
    Discovering High-level Semantic Parts
Geometric Phrase Pooling
    Capturing Geometric Information
33
Dataset and Annotations
Caltech-UCSD Bird Dataset
    200 Bird Categories
    11,788 Images (at least 55 per Category)
    Accuracy by Category (5, 10, 20, or 30 training images per category)
Manual Annotation by Web Users
    At Most 15 Landmarks per Image
    Beak, Crown, Forehead, Nape, Throat, Left/Right Eyes;
    Belly, Breast, Back, Tail, Left/Right Wings/Legs.
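"Accuracy by Category" is usually read as the mean of per-category accuracies, so that every category counts equally regardless of how many test images it has. A minimal sketch under that reading follows; the labels are toy data and the function name is hypothetical.

```python
from collections import defaultdict

def accuracy_by_category(true_labels, predicted_labels):
    """Average the per-category accuracies so every category counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

true_labels = ["gull", "gull", "gull", "gull", "wren", "wren"]
predicted = ["gull", "gull", "gull", "wren", "wren", "gull"]
score = accuracy_by_category(true_labels, predicted)  # (3/4 + 1/2) / 2
```

On this toy input the per-category mean is 0.625, while the plain fraction of correct images is 4/6; the two differ whenever categories have unequal test-set sizes.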
34
Part Segmentation
#training     5      10     20     30
Baseline      13.64  20.25  28.36  33.63
+FG Inf.      19.25  27.66  37.08  43.06
+Part Seg.    28.55  40.46  52.52  58.09
35
Hierarchical Structure Learning
#training     5      10     20     30
No Struct.    28.55  40.46  52.52  58.09
Struct. #1    29.29  41.62  53.36  59.24
Struct. #2    29.75  42.03  53.55  59.32
Struct. #3    30.33  42.66  53.94  59.86
Struct. #4    27.38  38.64  50.22  56.11
36
Hierarchical Structure Learning
μ: controlling the complexity of the model!
[Figure label: best choice]
37
Geometric Phrase Pooling
#training     5      10     20     30
No Phrase     30.33  42.66  53.94  59.86
GPP (5,5)     31.69  43.80  55.26  60.80
GPP (5,10)    32.23  45.10  56.11  61.93
GPP (5,20)    34.13  47.29  58.60  64.01
GPP (5,40)    36.09  48.87  60.56  65.62
38
Comparison
#training               5      10     20     30
Wang et al., CVPR10     13.64  20.25  28.36  33.63
Xie et al., ACMMM12     15.34  22.91  31.01  36.17
Ours                    36.09  48.87  60.56  65.62
Single reported values (training-set column not recoverable from the slide text):
Wah et al., TechRep11   10.05
Zhang et al., CVPR12    24.21
39
Summarization
All the Components Help!
    The HUGE Improvement Mainly Comes from Part Segmentation
    Comparing Directly with Previous Methods That Do Not Use Landmarks is NOT Fair
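The claim that part segmentation contributes most can be checked against the result tables in this deck. The snippet below chains the 30-training-image column across the three ablation tables (the last row of each table matches the first row of the next) and computes the gain of each added component. The best-performing rows, Struct. #3 and GPP (5,40), are used for the last two stages.

```python
# Accuracy (%) with 30 training images per category, copied from the tables in this deck.
stages = [
    ("Baseline", 33.63),
    ("+FG Inf.", 43.06),
    ("+Part Seg.", 58.09),
    ("+Struct. #3", 59.86),
    ("+GPP (5,40)", 65.62),
]

# Gain of each stage over the one before it, in percentage points.
gains = {
    name: round(acc - prev_acc, 2)
    for (_, prev_acc), (name, acc) in zip(stages, stages[1:])
}
largest = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name:<12} +{gain:.2f}")
```

Part segmentation adds about 15 points, foreground inference about 9, geometric phrase pooling about 6, and the learned structure about 2.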
40
Updates after Publication
Baseline Performance might be Much Better
    Using Fisher Kernels [Perronnin, ECCV10]
    Using Deep Features [Donahue, ICML14]
Automatically Detected Parts Work Well
    Landmark Detection [Berg, CVPR13]
    Symbiotic Localization [Chai, ICCV13]
    Geometric Segmentation [Gavves, ICCV13]
State-of-the-Art: ~80% with / ~70% without using Landmarks (30 training images per category).
42
Main Contributions
An Important Clue to Fine-Grained Problem
    Part Information is Crucial
    Automatic Annotation is Required
A Complete Flowchart for Part Representation
    Starting from Landmarks
    Foreground Inference and Segmentation
    Hierarchical Structure Learning
    Transplantable to Other Datasets
43
Conclusions and Future Work
Fine-Grained Classification is More Difficult
    Surprising Inter-class Similarity
Discovering Parts is Very Important!
    BUT, Annotation is still Expensive and Unrealistic
Alternative Methods?
    Template Matching [Yang, NIPS12]
    Co-Segmentation and Localization [Chai, ICCV13]
    Geometric Shape Alignment [Gavves, ICCV13]
44
Thank you! Questions please?