ICCV 2013 Hierarchical Part Matching for Fine-Grained Image Classification
Speaker: Lingxi Xie
Authors: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
State Key Laboratory of Intelligent Technology and Systems
Department of Computer Science and Technology
Tsinghua University

Outline
Introduction
The Bag-of-Feature Model
Hierarchical Part Matching
Experimental Results
Conclusions
9/19/2018 ICCV Presentation

Image Classification
A basic task towards image understanding
General vs. Fine-Grained

Raw Image
Image Descriptors
  Gradient-based local descriptors: SIFT [Lowe, IJCV04], HOG [Dalal, CVPR05]
Visual Vocabulary
  Clustering methods: K-Means, Hierarchical K-Means [Nister, CVPR06], Approximate K-Means [Philbin, CVPR07]
Compact Feature Codes
  Hard/Soft/Sparse Coding methods: Vector Quantization, ScSPM encoding [Yang, CVPR09], LLC encoding [Wang, CVPR10]
Image-level Vector
  Spatial Pooling: Sum Pooling/Max Pooling, Spatial Pyramid Matching [Lazebnik, CVPR06], Geometric Phrase Pooling [Xie, ACMMM12]
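The stages listed on this slide can be illustrated with a minimal sketch. It assumes plain K-Means for the vocabulary, hard vector quantization for the codes, and sum/max pooling over the whole image. Random vectors stand in for SIFT descriptors, and all function names are hypothetical; none of this is the authors' implementation.

```python
import numpy as np

def squared_distances(a, b):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def build_vocabulary(descriptors, k, iters=10, seed=0):
    # Plain K-Means: pick k descriptors as initial centers, then alternate
    # nearest-center assignment and center update.
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks].copy()
    for _ in range(iters):
        labels = squared_distances(descriptors, centers).argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

def hard_code(descriptors, vocabulary):
    # Vector quantization: each descriptor becomes a one-hot code
    # marking its nearest visual word.
    nearest = squared_distances(descriptors, vocabulary).argmin(axis=1)
    codes = np.zeros((len(descriptors), len(vocabulary)))
    codes[np.arange(len(descriptors)), nearest] = 1.0
    return codes

def pool(codes, mode="max"):
    # Collapse all codes of one image into a single image-level vector.
    return codes.max(axis=0) if mode == "max" else codes.sum(axis=0)

# Toy data: 200 random 8-D vectors standing in for local descriptors.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))
vocabulary = build_vocabulary(descriptors, k=16)
codes = hard_code(descriptors, vocabulary)
image_vector = pool(codes, mode="max")
histogram = pool(codes, mode="sum")
```

With hard coding, sum pooling gives a word-count histogram, while max pooling only records which visual words occur at all.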

Spatial Pyramid Matching (SPM) [Lazebnik, CVPR06]
[Figure: regions labelled Part 1 to Part 5]
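A rough sketch of spatial pyramid pooling follows. It assumes the five parts on this slide correspond to the whole image plus a 2x2 grid, which the slide text does not state explicitly. The function name and the toy data are hypothetical.

```python
import numpy as np

def spm_pool(codes, xy, width, height, levels=(1, 2)):
    # Max-pool the feature codes inside every cell of every grid level
    # and concatenate the results. levels=(1, 2) yields 1 + 4 = 5 regions.
    pooled = []
    for g in levels:
        col = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        row = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        for r in range(g):
            for c in range(g):
                in_cell = (row == r) & (col == c)
                if in_cell.any():
                    pooled.append(codes[in_cell].max(axis=0))
                else:
                    # Empty cell: contribute an all-zero block.
                    pooled.append(np.zeros(codes.shape[1]))
    return np.concatenate(pooled)

# Toy example: four descriptors, one per image quadrant, with one-hot codes.
codes = np.eye(4)
xy = np.array([[10.0, 10.0], [90.0, 10.0], [10.0, 90.0], [90.0, 90.0]])
vector = spm_pool(codes, xy, width=100, height=100)
```

The first block of the output covers the whole image; the remaining four blocks each see only the descriptor that falls in their quadrant.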

Hierarchical Part Matching (HPM) [Xie, ICCV13]
[Figure: regions labelled Part 1 to Part 5]

Model Overview
Key Modules
  Foreground Inference
  Part Segmentation
  Hierarchical Structure Learning
  Geometric Phrase Pooling

Foreground Inference
[Figure: bounding box and landmarks: crown, forehead, left eye, right eye, beak, nape, throat, breast, back, belly, left wing, right wing, left leg, right leg, tail]

Foreground Inference
Grab-Cut Algorithm
[Figure labels: definite background, possible]
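One way to seed GrabCut from a bounding box is sketched below, using only NumPy. Several points are assumptions: that "possible" on the slide refers to GrabCut's probable-foreground label, that everything outside the box is definite background, and that landmark neighbourhoods are marked as definite foreground. The integer label values follow OpenCV's GrabCut convention; the function name and the `radius` parameter are hypothetical.

```python
import numpy as np

# Label values used by OpenCV's GrabCut implementation.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

def init_grabcut_mask(height, width, box, landmarks, radius=2):
    # box is (x0, y0, x1, y1); landmarks is a list of (x, y) pixel positions.
    x0, y0, x1, y1 = box
    # Everything outside the bounding box is definite background.
    mask = np.full((height, width), GC_BGD, dtype=np.uint8)
    # Everything inside the box starts as possible foreground.
    mask[y0:y1, x0:x1] = GC_PR_FGD
    # Assumption: a small square around each landmark is definite foreground.
    for x, y in landmarks:
        mask[max(y - radius, 0):y + radius + 1,
             max(x - radius, 0):x + radius + 1] = GC_FGD
    return mask

mask = init_grabcut_mask(20, 30, box=(5, 4, 25, 16), landmarks=[(10, 8)], radius=1)
```

A mask like this could then be refined by `cv2.grabCut` in `cv2.GC_INIT_WITH_MASK` mode, if OpenCV is available.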

Part Segmentation
Ultrametric Contour Map

Part Segmentation
edge response matrix

Part Segmentation
[Figure: edge responses 0.50, 0.00, 0.85, 0.15]

Part Segmentation
step penalty = 0.01
[Figure: path costs in which each edge response (0.00, 0.15, 0.50, 0.85) appears with the step penalty added (0.01, 0.16, 0.51, 0.86)]

Part Segmentation

Part Segmentation
[Figure: segmented parts: back, breast, left wing, belly]

Part Segmentation
Shortest-Path Algorithm
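The shortest-path idea can be sketched as a multi-source Dijkstra search. The sketch assumes that entering a pixel costs its edge response plus the step penalty of 0.01, and that each pixel is assigned to the landmark with the cheapest path. The function name, the 4-connected neighbourhood, and the toy edge map are assumptions, not details taken from the slides.

```python
import heapq
import numpy as np

def segment_parts(edge, seeds, step_penalty=0.01):
    # edge: 2-D array of edge responses; seeds: list of (row, col) landmarks.
    h, w = edge.shape
    dist = np.full((h, w), np.inf)
    label = np.full((h, w), -1, dtype=int)
    heap = []
    for part_id, (r, c) in enumerate(seeds):
        dist[r, c] = 0.0
        label[r, c] = part_id
        heapq.heappush(heap, (0.0, r, c, part_id))
    while heap:
        d, r, c, part_id = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale queue entry, a cheaper path was found already
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # Crossing a strong edge is expensive; every step costs a little.
                nd = d + edge[nr, nc] + step_penalty
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    label[nr, nc] = part_id
                    heapq.heappush(heap, (nd, nr, nc, part_id))
    return label, dist

# Toy 1x7 strip with one strong edge (0.85) between two landmarks.
edge = np.array([[0.0, 0.0, 0.0, 0.0, 0.85, 0.0, 0.0]])
label, dist = segment_parts(edge, seeds=[(0, 0), (0, 5)])
```

In this toy strip the pixel at column 3 is two steps from the second landmark and three from the first, yet it joins the first part because the path from the second landmark has to cross the strong edge.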

Part Segmentation
[Figure: segmented parts: forehead, left eye, beak, nape, throat, back, breast, left wing, belly, tail, left leg]

Hierarchical Structure Learning
Discovering Mid-Level Parts
Part Distance

Discovering Mid-Level Parts
Cost Function when Merging Parts

Discovering Mid-Level Parts
Hierarchical Structure Learning (HSL) Algorithm

beak + crown + forehead + eyes = head
nape + throat = neck
back + belly + breast + tail = body

best choice μ: controlling the complexity of the model!
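The slide text does not preserve the part distance, the merging cost function, or the exact role of μ, so the following is only a generic greedy agglomerative merge that illustrates how mid-level parts can emerge from basic ones. The average-linkage distance, the stopping threshold `max_merge_distance` (loosely analogous to a complexity control), and the toy landmark coordinates are all assumptions.

```python
import numpy as np

def greedy_merge(names, dist, max_merge_distance):
    # Start with one group per basic part and repeatedly merge the two
    # closest groups until the closest pair is farther than the threshold.
    groups = [[i] for i in range(len(names))]
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Average-linkage distance between two groups (an assumption).
                d = np.mean([dist[i, j] for i in groups[a] for j in groups[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > max_merge_distance:
            break
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return [sorted(names[i] for i in g) for g in groups]

# Toy layout: made-up 2-D positions for six landmarks.
names = ["beak", "forehead", "nape", "throat", "back", "belly"]
positions = np.array([[0, 0], [1, 0], [4, 0], [4, 1], [8, 0], [8, 1]], dtype=float)
dist = np.sqrt(((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2))

mid_level = greedy_merge(names, dist, max_merge_distance=2.0)
everything = greedy_merge(names, dist, max_merge_distance=100.0)
```

A small threshold keeps many small groups, while a large one collapses all parts into a single group, which is the kind of trade-off a complexity parameter has to balance.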

Geometric Phrase Pooling
[Figure labels: tail, visual words, visual phrase, central word, side words]

ACM Multimedia 2012 - Oral Presentation
[Figure: central word and side words, vector dimensions labelled 1 to F: Phrase Vector for 1st Word Pair]

[Figure: central word and side words, vector dimensions labelled 1 to F: Phrase Vector for 2nd Word Pair]

[Figure: vector dimensions labelled 1 to F: Phrase Vector for the Visual Phrase]
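The phrase-vector slides can be illustrated with a small sketch. It assumes that a word pair is combined by elementwise addition of the central and side codes and that the pairs are combined by an elementwise maximum; the slide text states neither, so treat both as assumptions. The function name, the nearest-neighbour choice of side words, and the toy data are hypothetical as well.

```python
import numpy as np

def phrase_vector(codes, xy, center, num_side):
    # Pick the num_side words closest to the central word as side words.
    d2 = ((xy - xy[center]) ** 2).sum(axis=1)
    d2[center] = np.inf  # the central word is not its own side word
    side = np.argsort(d2)[:num_side]
    # One vector per (central word, side word) pair: assumed to be a sum.
    pair_vectors = codes[center] + codes[side]
    # Vector for the whole phrase: assumed to be a max over the pairs.
    return pair_vectors.max(axis=0)

# Toy example: four visual words with 3-D codes; the fourth is far away.
codes = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.5, 0.5]])
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
phrase = phrase_vector(codes, xy, center=0, num_side=2)
```

The resulting vector mixes the central word's response with the strongest responses among its neighbours, which is how local geometric context enters the pooled representation.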

Model Summary
Foreground Inference and Part Segmentation
  Accurate Segmentation, Better Representation
Hierarchical Structure Learning
  Discovering High-level Semantic Parts
Geometric Phrase Pooling
  Capturing Geometric Information

Dataset and Annotations
Caltech-UCSD Bird Dataset
  200 Bird Categories
  11788 Images (at least 55 per Category)
  Accuracy by Category (5, 10, 20, 30 training images per category)
Manual Annotation by Web Users
  At Most 15 Landmarks per Image
  Beak, Crown, Forehead, Nape, Throat, Left/Right Eyes; Belly, Breast, Back, Tail, Left/Right Wings, Left/Right Legs

Part Segmentation
#training     5      10     20     30
Baseline      13.64  20.25  28.36  33.63
+FG Inf.      19.25  27.66  37.08  43.06
+Part Seg.    28.55  40.46  52.52  58.09

#training     5      10     20     30
No Struct.    28.55  40.46  52.52  58.09
Struct. #1    29.29  41.62  53.36  59.24
Struct. #2    29.75  42.03  53.55  59.32
Struct. #3    30.33  42.66  53.94  59.86
Struct. #4    27.38  38.64  50.22  56.11

best choice μ: controlling the complexity of the model!

Geometric Phrase Pooling
#training     5      10     20     30
No Phrase     30.33  42.66  53.94  59.86
GPP (5,5)     31.69  43.80  55.26  60.80
GPP (5,10)    32.23  45.10  56.11  61.93
GPP (5,20)    34.13  47.29  58.60  64.01
GPP (5,40)    36.09  48.87  60.56  65.62

Comparison
#training               5      10     20     30
Wang et al., CVPR10     13.64  20.25  28.36  33.63
Xie et al., ACMMM12     15.34  22.91  31.01  36.17
Ours                    36.09  48.87  60.56  65.62
Single reported accuracy (column not recoverable from the slide text):
Wah et al., TechRep11   10.05
Zhang et al., CVPR12    24.21

Summary
All the Components Help!
The HUGE Improvement Mainly Comes from Part Segmentation
Direct Comparison with Previous Methods that Do Not Use Landmarks is NOT Fair
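The claim about where the improvement comes from can be checked with simple arithmetic on the 30-training-image column of the result tables. The sketch below chains the best configuration of each stage (Baseline, +FG Inf., +Part Seg., Struct. #3, GPP (5,40)) and computes the gain of each step; the variable names are illustrative only.

```python
# Accuracy (%) with 30 training images, taken from the result tables.
stages = [
    ("Baseline", 33.63),
    ("+FG Inf.", 43.06),
    ("+Part Seg.", 58.09),
    ("Struct. #3", 59.86),
    ("GPP (5,40)", 65.62),
]

# Gain of each stage over the previous one, in percentage points.
gains = {
    name: round(acc - prev_acc, 2)
    for (_, prev_acc), (name, acc) in zip(stages, stages[1:])
}
largest = max(gains, key=gains.get)
total = round(stages[-1][1] - stages[0][1], 2)

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
print(f"largest single gain: {largest}; total: +{total:.2f}")
```

Part segmentation contributes roughly 15 of the 32 points gained over the baseline, with foreground inference second at about 9.4 points.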

Updates after Publication
Baseline Performance might be Much Better
  Using Fisher Kernels [Perronnin, ECCV10]
  Using Deep Features [Donahue, ICML14]
Automatically Detected Parts Work Well
  Landmark Detection [Berg, CVPR13]
  Symbiotic Localization [Chai, ICCV13]
  Geometric Segmentation [Gavves, ICCV13]
State-of-the-Art: ~80% / ~70% with / without using Landmarks (30 training images)

Main Contributions
An Important Clue to the Fine-Grained Problem
  Part Information is Crucial
  Automatic Annotation is Required
A Complete Flowchart for Part Representation
  Starting from Landmarks
  Foreground Inference and Segmentation
  Hierarchical Structure Learning
  Transplantable to Other Datasets

Conclusions and Future Work
Fine-Grained Classification is More Difficult
  Surprising Inter-class Similarity
  Discovering Parts is Very Important!
BUT, Annotation is still Expensive and Unrealistic
  Alternative Methods?
  Template Matching [Yang, NIPS12]
  Co-Segmentation and Localization [Chai, ICCV13]
  Geometric Shape Alignment [Gavves, ICCV13]

Thank you! Questions please?
