ICCV 2013
Hierarchical Part Matching for Fine-Grained Image Classification
Speaker: Lingxi Xie
Authors: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University
http://www.tsinghua.edu.cn
Outline
Introduction
The Bag-of-Features Model
Hierarchical Part Matching
Experimental Results
Conclusions
9/19/2018 ICCV 2013 - Presentation
Outline: Introduction
Image Classification
A Basic Task toward Image Understanding
General vs. Fine-Grained Classification
Outline: The Bag-of-Features Model
The Bag-of-Features Pipeline (from raw image to image-level vector)
Raw Image
→ Image Descriptors: gradient-based local descriptors, e.g. SIFT [Lowe, IJCV04], HOG [Dalal, CVPR05]
→ Visual Vocabulary: clustering methods, e.g. K-Means, Hierarchical K-Means [Nister, CVPR06], Approximate K-Means [Philbin, CVPR07]
→ Compact Feature Codes: hard/soft/sparse coding methods, e.g. Vector Quantization, ScSPM encoding [Yang, CVPR09], LLC encoding [Wang, CVPR10]
→ Spatial Pooling: sum pooling/max pooling, Spatial Pyramid Matching [Lazebnik, CVPR06], Geometric Phrase Pooling [Xie, ACMMM12]
→ Image-level Vector
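To make the bottom-up pipeline concrete, here is a minimal Python sketch of its simplest configuration: hard vector quantization against a given vocabulary, followed by sum pooling over the whole image. It assumes toy 2-D descriptors and an already-learned vocabulary (in practice SIFT or HOG descriptors and a K-Means vocabulary); the function names are hypothetical. Soft or sparse coding (e.g. ScSPM, LLC) would replace the one-hot code with a weighted one.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Return the index of the visual word closest to `descriptor` (Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))


def bag_of_features(descriptors: Sequence[Vector],
                    vocabulary: Sequence[Vector]) -> List[float]:
    """Hard vector quantization followed by sum pooling over the whole image."""
    histogram = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        # Vector quantization: a one-hot code on the nearest visual word.
        histogram[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(histogram)
    # L1-normalize so images with different numbers of descriptors are comparable.
    return [h / total for h in histogram] if total > 0 else histogram


vocabulary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(bag_of_features(descriptors, vocabulary))  # [0.25, 0.5, 0.25]
```

Sum pooling discards where each descriptor was found; the spatial pooling stage is where that geometry is (partially) recovered.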
Spatial Pyramid Matching (SPM) [Lazebnik, CVPR06]
The image is divided into fixed geometric regions (Part 1 to Part 5), and features are pooled separately within each region.
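A minimal sketch of SPM-style pooling, assuming the five regions are the whole image plus a 2×2 grid (1 + 4 = 5) and that codes are non-negative and max-pooled; the feature tuple layout and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (x, y, code): a local feature's position and its coding vector.
Feature = Tuple[float, float, Sequence[float]]


def max_pool(codes: List[Sequence[float]], dim: int) -> List[float]:
    """Element-wise max over codes; assumes non-negative codes, empty region -> zeros."""
    pooled = [0.0] * dim
    for code in codes:
        pooled = [max(p, c) for p, c in zip(pooled, code)]
    return pooled


def spatial_pyramid(features: Sequence[Feature], width: float, height: float,
                    dim: int, grids: Sequence[int] = (1, 2)) -> List[float]:
    """Concatenate max-pooled codes of every cell of each grid (1x1 + 2x2 = 5 regions)."""
    output: List[float] = []
    for g in grids:
        cells: List[List[Sequence[float]]] = [[] for _ in range(g * g)]
        for x, y, code in features:
            col = min(int(x * g / width), g - 1)
            row = min(int(y * g / height), g - 1)
            cells[row * g + col].append(code)
        for cell_codes in cells:
            output.extend(max_pool(cell_codes, dim))
    return output


features = [(1, 1, [1.0, 0.0]), (8, 1, [0.0, 2.0]), (8, 8, [3.0, 0.0])]
print(spatial_pyramid(features, width=10, height=10, dim=2))
# [3.0, 2.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0]
```

The regions depend only on image coordinates, so a bird's head can fall into different cells in different images.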
Hierarchical Part Matching (HPM) [Xie, ICCV13]
Part 1 to Part 5 are semantic object parts instead of fixed grid cells, and features are pooled separately within each part.
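A hypothetical sketch of pooling over semantic parts instead of grid cells. It assumes each local feature already carries a leaf part label from part segmentation (None for background) and that a mid-level part such as a head is the union of its leaf parts; the part dictionary below is illustrative, and max pooling is an assumed choice.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# (leaf part label from part segmentation, or None for background; code vector)
Feature = Tuple[Optional[str], Sequence[float]]


def hierarchical_part_pooling(features: Sequence[Feature],
                              parts: Dict[str, FrozenSet[str]],
                              dim: int) -> List[float]:
    """Max-pool codes inside every part (leaf or mid-level) and concatenate."""
    output: List[float] = []
    for leaves in parts.values():          # dict order fixes the vector layout
        pooled = [0.0] * dim               # an empty part contributes zeros
        for label, code in features:
            if label in leaves:            # background (None) never matches
                pooled = [max(p, c) for p, c in zip(pooled, code)]
        output.extend(pooled)
    return output


parts = {
    "foreground": frozenset({"beak", "crown", "back"}),
    "head": frozenset({"beak", "crown"}),  # hypothetical mid-level part
    "back": frozenset({"back"}),
}
features = [("beak", [1.0, 0.0]), ("crown", [0.0, 2.0]),
            ("back", [3.0, 1.0]), (None, [9.0, 9.0])]
print(hierarchical_part_pooling(features, parts, dim=2))
# [3.0, 2.0, 1.0, 2.0, 3.0, 1.0]
```

Because each slot of the output vector always corresponds to the same semantic part, two images are compared part by part regardless of pose.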
Outline: Hierarchical Part Matching
Model Overview
Key Modules:
Foreground Inference
Part Segmentation
Hierarchical Structure Learning
Geometric Phrase Pooling
Foreground Inference
Input annotations: a bounding box and landmarks (crown, forehead, left eye, beak, nape, right eye, throat, right wing, right leg, breast, back, belly, left wing, tail, left leg)
Foreground Inference: GrabCut Algorithm
Figure: regions marked as definite background and as possible foreground/background
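A sketch of how the bounding box and landmarks could seed GrabCut, under stated assumptions: pixels outside the box are definite background, pixels inside are only possible foreground, and a small neighborhood around each landmark is definite foreground. The landmark rule and the `radius` parameter are assumptions, not taken from the slides; `init_grabcut_mask` is a hypothetical helper. The label values match OpenCV's GrabCut mask constants.

```python
from typing import List, Sequence, Tuple

# Same numeric values as OpenCV's cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def init_grabcut_mask(height: int, width: int,
                      bbox: Tuple[int, int, int, int],
                      landmarks: Sequence[Tuple[int, int]],
                      radius: int = 0) -> List[List[int]]:
    """bbox = (x0, y0, x1, y1), half-open; landmarks = [(x, y), ...]."""
    x0, y0, x1, y1 = bbox
    # Outside the bounding box: definite background.
    mask = [[GC_BGD] * width for _ in range(height)]
    # Inside the box: only "possible" foreground, left for GrabCut to decide.
    for y in range(max(y0, 0), min(y1, height)):
        for x in range(max(x0, 0), min(x1, width)):
            mask[y][x] = GC_PR_FGD
    # Around each landmark: definite foreground (assumed rule).
    for lx, ly in landmarks:
        for y in range(ly - radius, ly + radius + 1):
            for x in range(lx - radius, lx + radius + 1):
                if 0 <= y < height and 0 <= x < width:
                    mask[y][x] = GC_FGD
    return mask


mask = init_grabcut_mask(4, 5, bbox=(1, 1, 4, 3), landmarks=[(2, 1)])
for row in mask:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 3, 1, 3, 0]
# [0, 3, 3, 3, 0]
# [0, 0, 0, 0, 0]
```

To run GrabCut itself, the mask can be converted to a NumPy `uint8` array and passed to `cv2.grabCut(img, mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)`, with `bgd_model` and `fgd_model` as `np.zeros((1, 65), np.float64)`; afterwards, pixels labeled `GC_FGD` or `GC_PR_FGD` form the inferred foreground. The iteration count of 5 is arbitrary.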
Part Segmentation: Ultrametric Contour Map
Part Segmentation: Edge Response Matrix
Part Segmentation
Figure: example entries of the edge response matrix (0.50, 0.00, 0.85, 0.15)
Part Segmentation: Step Penalty
Figure: a step penalty of 0.01 is added to every entry of the edge response matrix (e.g. 0.50 → 0.51, 0.00 → 0.01, 0.85 → 0.86, 0.15 → 0.16), so that every step along a path has a positive cost, even through pixels with zero edge response.
Part Segmentation
Part Segmentation
Example segmented parts: back, breast, left wing, belly
Part Segmentation: Shortest-Path Algorithm
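The slides do not spell out the exact formulation, so the following is a sketch under one assumption: each foreground pixel is assigned to the landmark reachable by the cheapest 4-connected path, where entering a pixel costs its edge response plus the step penalty (0.01, as on the slides). It is implemented as a multi-source Dijkstra search; `segment_parts` and its argument layout are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple


def segment_parts(edge: Sequence[Sequence[float]],
                  landmarks: Dict[str, Tuple[int, int]],
                  foreground: Sequence[Sequence[bool]],
                  step_penalty: float = 0.01) -> List[List[Optional[str]]]:
    """Label each foreground pixel with the landmark reachable by the cheapest path.

    Entering pixel (r, c) costs edge[r][c] + step_penalty, so crossing strong
    contours is expensive and every step has a positive cost.
    Landmarks are given as (row, col).
    """
    h, w = len(edge), len(edge[0])
    dist = [[float("inf")] * w for _ in range(h)]
    label: List[List[Optional[str]]] = [[None] * w for _ in range(h)]
    heap: List[Tuple[float, int, int]] = []
    for part, (r, c) in landmarks.items():   # every landmark is a source
        dist[r][c], label[r][c] = 0.0, part
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue                          # stale heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and foreground[nr][nc]:
                nd = d + edge[nr][nc] + step_penalty
                if nd < dist[nr][nc]:
                    dist[nr][nc], label[nr][nc] = nd, label[r][c]
                    heapq.heappush(heap, (nd, nr, nc))
    return label


# A strong vertical contour in column 2 separates the two landmarks.
edge = [[0.0, 0.0, 0.9, 0.0, 0.0] for _ in range(3)]
foreground = [[True] * 5 for _ in range(3)]
foreground[0][0] = False
labels = segment_parts(edge, {"head": (1, 0), "tail": (1, 4)}, foreground)
print(labels[1][1], labels[0][3], labels[0][0])  # head tail None
```

Without the step penalty, all paths through zero-response pixels would cost nothing, and the assignment inside a flat region would be arbitrary.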
Part Segmentation
Segmented parts: forehead, left eye, beak, nape, throat, back, breast, left wing, belly, tail, left leg
Hierarchical Structure Learning
Discovering Mid-Level Parts: Part Distance
Hierarchical Structure Learning
Discovering Mid-Level Parts: Cost Function when Merging Parts
Hierarchical Structure Learning
Discovering Mid-Level Parts: Hierarchical Structure Learning (HSL) Algorithm
Hierarchical Structure Learning
Discovered mid-level parts:
beak + crown + forehead + eyes = head
nape + throat = neck
back + belly + breast + tail = body
Hierarchical Structure Learning
Choosing the Best μ: the parameter μ controls the complexity of the model!
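The slides name the pieces of HSL (part distance, merge cost, μ) but not their formulas, so the following is only a generic stand-in: greedy agglomerative merging in which the two closest parts are merged while the merge cost stays at or below μ. The cost used here (average distance between hypothetical mean landmark positions) and `learn_hierarchy` are assumptions. It does reproduce the qualitative role of μ: a larger μ allows more merges and therefore more mid-level parts.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def learn_hierarchy(leaf_positions: Dict[str, Tuple[float, float]],
                    mu: float) -> List[FrozenSet[str]]:
    """Greedily merge the closest pair of parts while the merge cost is <= mu.

    Returns the mid-level parts created, in merge order.
    """
    def cost(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        # Hypothetical merge cost: average pairwise distance between leaf positions.
        pairs = [(p, q) for p in a for q in b]
        return sum(math.dist(leaf_positions[p], leaf_positions[q])
                   for p, q in pairs) / len(pairs)

    active = [frozenset([leaf]) for leaf in leaf_positions]
    mid_level: List[FrozenSet[str]] = []
    while len(active) > 1:
        best, a, b = min(((cost(x, y), x, y) for x, y in combinations(active, 2)),
                         key=lambda t: t[0])
        if best > mu:
            break                              # remaining merges are too costly
        active = [g for g in active if g != a and g != b] + [a | b]
        mid_level.append(a | b)
    return mid_level


positions = {"beak": (0.0, 0.0), "crown": (1.0, 0.0), "tail": (10.0, 0.0)}
print(learn_hierarchy(positions, mu=2.0))  # one merge: beak + crown
```

With μ = 2.0 only the nearby beak and crown merge (cost 1.0); the tail stays separate because joining it would cost 9.5.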
Geometric Phrase Pooling
Figure (tail region): local features are visual words; a visual phrase groups a central word with its neighboring side words.
ACM Multimedia 2012 - Oral Presentation
Phrase Vector for 1st Word Pair
Figure: codes of the central word and the side words over visual-word dimensions 1 to F
Phrase Vector for 2nd Word Pair
Figure: codes of the central word and the side words over visual-word dimensions 1 to F
Phrase Vector for the Visual Phrase
Figure: the vectors of all word pairs are combined into one phrase vector over visual-word dimensions 1 to F
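A sketch of the pooling steps suggested by the slide sequence, under assumptions: each word pair's vector is the element-wise sum of the central and side codes, the phrase vector is the element-wise max over its pairs, and the image-level vector is the max over all phrases. The sum/max operators, the nearest-neighbor rule for choosing side words, and `num_side_words` are assumptions; the slides do not say what the two numbers in settings like GPP (5,40) denote.

```python
from typing import List, Sequence, Tuple

# ((x, y), code vector) for one local feature.
Feature = Tuple[Tuple[float, float], Sequence[float]]


def phrase_vector(central: Sequence[float],
                  sides: Sequence[Sequence[float]]) -> List[float]:
    """Word-pair vector = central + side (element-wise); phrase = max over pairs."""
    if not sides:
        return list(central)
    pairs = [[c + s for c, s in zip(central, side)] for side in sides]
    return [max(values) for values in zip(*pairs)]


def geometric_phrase_pooling(features: Sequence[Feature],
                             num_side_words: int) -> List[float]:
    """Every feature is a central word; its spatially nearest features are its side words."""
    phrases = []
    for i, ((x, y), code) in enumerate(features):
        neighbors = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: (features[j][0][0] - x) ** 2 + (features[j][0][1] - y) ** 2)
        sides = [features[j][1] for j in neighbors[:num_side_words]]
        phrases.append(phrase_vector(code, sides))
    # Image-level vector: max pooling over all phrase vectors.
    return [max(values) for values in zip(*phrases)]


print(phrase_vector([1.0, 0.0, 0.0], [[0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]))  # [1.0, 2.0, 3.0]
features = [((0, 0), [1.0, 0.0]), ((1, 0), [0.0, 1.0]), ((10, 0), [5.0, 0.0])]
print(geometric_phrase_pooling(features, num_side_words=1))  # [5.0, 1.0]
```

Adding the side words' codes to the central word's code lets the pooled vector record which words co-occur nearby, which plain max pooling over single words cannot.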
Model Summary
Foreground Inference and Part Segmentation: Accurate Segmentation, Better Representation
Hierarchical Structure Learning: Discovering High-Level Semantic Parts
Geometric Phrase Pooling: Capturing Geometric Information
Outline: Experimental Results
Dataset and Annotations
Caltech-UCSD Birds-200-2011 Dataset
200 Bird Categories
11,788 Images (at least 55 per Category)
Accuracy by Category, with 5, 10, 20 or 30 Training Images per Category
Manual Annotation by Web Users
At Most 15 Landmarks per Image: Beak, Crown, Forehead, Nape, Throat, Left/Right Eyes; Belly, Breast, Back, Tail, Left/Right Wings, Left/Right Legs
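A minimal sketch of the evaluation protocol as read here: for each category, n training images (5, 10, 20 or 30) are drawn and the rest are used for testing, and "accuracy by category" is taken to mean the average of per-category accuracies. The random sampling, the fixed seed and both function names are assumptions; the classifier itself is not shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_per_category(labels: Sequence[int], n_train: int,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly pick n_train image indices per category for training; the rest are test."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, y in enumerate(labels):
        by_class[y].append(index)
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        indices = indices[:]              # shuffle a copy, not the stored list
        rng.shuffle(indices)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


def mean_per_category_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average of per-category accuracies, in percent."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return 100.0 * sum(correct[c] / total[c] for c in total) / len(total)


print(mean_per_category_accuracy([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))  # 62.5
```

Averaging per category keeps categories with more test images from dominating the score.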
Part Segmentation (accuracy by category, %)
#training     5      10     20     30
Baseline      13.64  20.25  28.36  33.63
+FG Inf.      19.25  27.66  37.08  43.06
+Part Seg.    28.55  40.46  52.52  58.09
Hierarchical Structure Learning (accuracy by category, %)
#training     5      10     20     30
No Struct.    28.55  40.46  52.52  58.09
Struct. #1    29.29  41.62  53.36  59.24
Struct. #2    29.75  42.03  53.55  59.32
Struct. #3    30.33  42.66  53.94  59.86
Struct. #4    27.38  38.64  50.22  56.11
Geometric Phrase Pooling (accuracy by category, %)
#training     5      10     20     30
No Phrase     30.33  42.66  53.94  59.86
GPP (5,5)     31.69  43.80  55.26  60.80
GPP (5,10)    32.23  45.10  56.11  61.93
GPP (5,20)    34.13  47.29  58.60  64.01
GPP (5,40)    36.09  48.87  60.56  65.62
Comparison (accuracy by category, %; "-" = not reported)
#training              5      10     20     30
Wah et al., TechRep11  -      -      -      10.05
Zhang et al., CVPR12   -      -      -      24.21
Wang et al., CVPR10    13.64  20.25  28.36  33.63
Xie et al., ACMMM12    15.34  22.91  31.01  36.17
Ours                   36.09  48.87  60.56  65.62
Summary
All the Components Help!
The HUGE Improvement Mainly Comes from Part Segmentation (43.06% → 58.09% with 30 training images per category)
A Direct Comparison with Previous Methods that Do Not Use Landmarks is NOT Fair, since Our Method Relies on Annotated Landmarks
Updates after Publication
Baseline Performance Could Be Much Better:
Using Fisher Kernels [Perronnin, ECCV10]
Using Deep Features [Donahue, ICML14]
Automatically Detected Parts Work Well:
Landmark Detection [Berg, CVPR13]
Symbiotic Localization [Chai, ICCV13]
Geometric Segmentation [Gavves, ICCV13]
State of the Art (30 training images per category): about 80% with landmarks, about 70% without landmarks
Outline: Conclusions
Main Contributions
An Important Clue to the Fine-Grained Problem:
Part Information is Crucial
Automatic Annotation is Required
A Complete Flowchart for Part Representation:
Starting from Landmarks
Foreground Inference and Segmentation
Hierarchical Structure Learning
Transferable to Other Datasets
Conclusions and Future Work
Fine-Grained Classification is More Difficult: Surprisingly High Inter-Class Similarity
Discovering Parts is Very Important!
BUT Annotation is Still Expensive and Unrealistic
Alternative Methods?
Template Matching [Yang, NIPS12]
Co-Segmentation and Localization [Chai, ICCV13]
Geometric Shape Alignment [Gavves, ICCV13]
Thank you! Questions, please?