ICCV Hierarchical Part Matching for Fine-Grained Image Classification

Presentation transcript:

ICCV 2013 Hierarchical Part Matching for Fine-Grained Image Classification Speaker: Lingxi Xie Authors: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang State Key Laboratory of Intelligent Technology and Systems Department of Computer Science and Technology Tsinghua University http://www.tsinghua.edu.cn

Outline Introduction The Bag-of-Feature Model Hierarchical Part Matching Experimental Results Conclusions 9/19/2018 ICCV 2013 - Presentation

Image Classification
A basic task towards image understanding
General vs. Fine-Grained

The Bag-of-Feature Model (stages in processing order)
Raw Image
Image Descriptors. Gradient-based local descriptors: SIFT [Lowe, IJCV04], HOG [Dalal, CVPR05]
Visual Vocabulary. Clustering methods: K-Means, Hierarchical K-Means [Nister, CVPR06], Approximate K-Means [Philbin, CVPR07]
Compact Feature Codes. Hard/Soft/Sparse coding methods: Vector Quantization, ScSPM encoding [Yang, CVPR09], LLC encoding [Wang, CVPR10]
Spatial Pooling: Sum Pooling/Max Pooling, Spatial Pyramid Matching [Lazebnik, CVPR06], Geometric Phrase Pooling [Xie, ACMMM12]
Image-level Vector

Spatial Pyramid Matching (SPM) [Lazebnik, CVPR06] (figure: image regions labeled Part 1 to Part 5)

Hierarchical Part Matching (HPM) [Xie, ICCV13] (figure: image regions labeled Part 1 to Part 5)
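The two slides above contrast pooling over regular grid cells (SPM) with pooling over object parts (HPM). A minimal sketch of that difference, assuming max pooling over non-negative feature codes: the same pooling routine is used in both cases, and only the feature-to-region assignment changes. The function names, feature positions, codes, and part labels are all hypothetical and chosen for illustration.

```python
def max_pool(codes, region_of, num_regions):
    """Max-pool feature codes per region and concatenate the region vectors.

    Assumes non-negative codes, so 0.0 is a safe initial value.
    """
    dim = len(codes[0])
    pooled = [[0.0] * dim for _ in range(num_regions)]
    for code, r in zip(codes, region_of):
        pooled[r] = [max(p, c) for p, c in zip(pooled[r], code)]
    return [v for region in pooled for v in region]


def grid_region(x, y, width, height, cells):
    """Index of the grid cell containing (x, y) on a cells-by-cells grid."""
    col = min(int(x * cells / width), cells - 1)
    row = min(int(y * cells / height), cells - 1)
    return row * cells + col


# Four local features in a 100x100 image (hypothetical positions and codes).
positions = [(10, 10), (90, 10), (20, 80), (80, 90)]
codes = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]

# SPM-style: regions are cells of a 2x2 grid.
spm_regions = [grid_region(x, y, 100, 100, 2) for x, y in positions]
spm_vector = max_pool(codes, spm_regions, 4)

# HPM-style: regions are parts (hypothetical labels: 0 and 1).
part_of_feature = [0, 0, 1, 1]
hpm_vector = max_pool(codes, part_of_feature, 2)
```

With grid cells, a feature's region depends only on where it falls in the image; with parts, features on the same part are pooled together regardless of the object's pose.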

Model Overview
Key Modules: Foreground Inference, Part Segmentation, Hierarchical Structure Learning, Geometric Phrase Pooling

Foreground Inference (figure labels: bounding box; landmarks crown, forehead, left eye, beak, nape, right eye, throat, right wing, right leg, breast, back, belly, left wing, tail, left leg)

Foreground Inference: Grab-Cut Algorithm (figure labels: definite background, possible)
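Grab-Cut is typically initialized with a per-pixel label mask. A minimal sketch of building such a mask from a bounding box, under two assumptions that the slide does not confirm: pixels inside the box start as "possible foreground", and landmark pixels are marked as definite foreground. The function name `init_mask` and the example box are hypothetical. The four label values match OpenCV's `cv2.GC_BGD`, `cv2.GC_FGD`, `cv2.GC_PR_BGD`, and `cv2.GC_PR_FGD` constants, so a mask like this could be converted to an array and refined by a Grab-Cut implementation.

```python
# Label values, identical to OpenCV's cv2.GC_* constants.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def init_mask(height, width, box, landmarks=()):
    """Build an initial Grab-Cut label mask.

    box is (x0, y0, x1, y1) with x1 and y1 exclusive.
    Outside the box: definite background.
    Inside the box: possible foreground (assumption).
    Landmark pixels (x, y): definite foreground (assumption).
    """
    x0, y0, x1, y1 = box
    mask = [[GC_BGD] * width for _ in range(height)]
    for y in range(max(y0, 0), min(y1, height)):
        for x in range(max(x0, 0), min(x1, width)):
            mask[y][x] = GC_PR_FGD
    for x, y in landmarks:
        mask[y][x] = GC_FGD
    return mask


# A 4x6 image with a box covering columns 1..3 and rows 1..2,
# and one landmark at (x=2, y=1).
mask = init_mask(4, 6, (1, 1, 4, 3), landmarks=[(2, 1)])
```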

Part Segmentation: Ultrametric Contour Map

Part Segmentation: edge response matrix

Part Segmentation (figure: edge responses 0.50, 0.00, 0.85, 0.15)

Part Segmentation (figure: the edge responses 0.50, 0.00, 0.85, 0.15 shown with step penalty = 0.01 added, giving 0.51, 0.01, 0.86, 0.16)

Part Segmentation

Part Segmentation (figure labels: back, breast, left wing, belly)

Part Segmentation: Shortest-Path Algorithm

Part Segmentation (figure labels: forehead, left eye, beak, nape, throat, back, breast, left wing, belly, tail, left leg)
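The slides name a shortest-path algorithm over the edge response matrix with a step penalty of 0.01. A minimal sketch of one way this could work, assuming each pixel is assigned to the landmark it can reach at the lowest accumulated cost, where entering a pixel costs its edge response plus the step penalty. The function name `segment_parts`, the 4-connected neighborhood, the toy grid, and the landmark positions are illustrative assumptions rather than details taken from the paper.

```python
import heapq


def segment_parts(edge, landmarks, step_penalty=0.01):
    """Multi-source Dijkstra: label each cell with its cheapest-to-reach landmark.

    edge: 2D list of edge responses in [0, 1].
    landmarks: dict mapping part name -> (row, col).
    """
    rows, cols = len(edge), len(edge[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    label = [[None] * cols for _ in range(rows)]
    heap = []
    for name, (r, c) in landmarks.items():
        dist[r][c] = 0.0
        label[r][c] = name
        heapq.heappush(heap, (0.0, r, c, name))
    while heap:
        d, r, c, name = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry, a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + edge[nr][nc] + step_penalty
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    label[nr][nc] = name
                    heapq.heappush(heap, (nd, nr, nc, name))
    return label, dist


# Toy edge map: a strong contour (0.85) in column 2 separates two flat regions.
edge = [
    [0.00, 0.00, 0.85, 0.00, 0.00],
    [0.00, 0.00, 0.85, 0.00, 0.00],
    [0.00, 0.00, 0.85, 0.00, 0.00],
]
labels, dists = segment_parts(edge, {"back": (1, 0), "belly": (1, 3)})
```

The step penalty makes path length matter even in regions with zero edge response, so a pixel far from every contour is still assigned to the nearest landmark rather than an arbitrary one.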

Hierarchical Structure Learning
Discovering Mid-Level Parts: Part Distance

Hierarchical Structure Learning
Discovering Mid-Level Parts: Cost Function when Merging Parts

Hierarchical Structure Learning
Discovering Mid-Level Parts: Hierarchical Structure Learning (HSL) Algorithm

Hierarchical Structure Learning
beak + crown + forehead + eyes = head
nape + throat = neck
back + belly + breast + tail = neck

Hierarchical Structure Learning
μ: controlling the complexity of the model! (figure label: best choice)
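The transcript does not preserve the part distance, the merging cost function, or the HSL algorithm itself. As a stand-in, here is a minimal sketch of generic greedy agglomerative merging, assuming the part distance is the Euclidean distance between part centers and that a threshold `mu` decides when merging stops. This is not the paper's cost function; `merge_parts`, the coordinates, and the stopping rule are hypothetical and only illustrate how a single parameter can control how many mid-level parts emerge.

```python
from itertools import combinations
from math import dist


def merge_parts(centers, mu):
    """Greedily merge the two closest groups until their distance reaches mu.

    centers: dict mapping part name -> (x, y) center (hypothetical values).
    Returns the groups as sorted lists of part names.
    """
    groups = {frozenset([name]): pos for name, pos in centers.items()}
    while len(groups) > 1:
        a, b = min(
            combinations(groups, 2),
            key=lambda pair: dist(groups[pair[0]], groups[pair[1]]),
        )
        if dist(groups[a], groups[b]) >= mu:
            break  # remaining groups are too far apart to merge
        na, nb = len(a), len(b)
        # Size-weighted mean keeps the merged center at the group centroid.
        merged = tuple(
            (na * p + nb * q) / (na + nb) for p, q in zip(groups[a], groups[b])
        )
        del groups[a], groups[b]
        groups[a | b] = merged
    return sorted(sorted(g) for g in groups)


# Hypothetical landmark centers laid out along a bird's body.
centers = {
    "beak": (0, 0), "forehead": (1, 0), "crown": (1, 1),
    "nape": (5, 0), "throat": (5, 1),
    "belly": (10, 0), "back": (10, 1),
}
structure = merge_parts(centers, mu=2.0)
```

In this toy setup a small `mu` leaves every landmark as its own part, and a large `mu` collapses everything into one part, which mirrors the slide's point that μ controls model complexity.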

Geometric Phrase Pooling (figure labels: tail, visual words, central word, side words, visual phrase)

Geometric Phrase Pooling, slides from ACM Multimedia 2012 - Oral Presentation (figure: a central word and its side words, each shown as a code with entries indexed 1 to F; Phrase Vector for 1st Word Pair)

(figure: central word and side words; Phrase Vector for 2nd Word Pair)

(figure: all word pairs; Phrase Vector for the Visual Phrase)
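The figures above build one vector per (central word, side word) pair and then a single vector for the whole visual phrase, but the transcript does not preserve the actual formulas. A minimal sketch of one plausible reading, assuming a pair vector is the element-wise sum of the two codes and the phrase vector is the element-wise maximum over all pair vectors. Both combination rules, the function names, and the example codes are assumptions made for illustration.

```python
def pair_vector(central, side):
    """Combine a central word's code with one side word's code (assumed: sum)."""
    return [c + s for c, s in zip(central, side)]


def phrase_vector(central, sides):
    """Pool all word-pair vectors of a phrase into one vector (assumed: max)."""
    pairs = [pair_vector(central, side) for side in sides]
    return [max(column) for column in zip(*pairs)]


# Hypothetical sparse codes over a 4-word vocabulary.
central_word = [0.0, 0.5, 0.0, 0.2]
side_words = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.0],
]
phrase = phrase_vector(central_word, side_words)
```

Under these assumptions the phrase vector keeps the central word's responses and adds the strongest responses contributed by its neighbors.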

Model Summarization
Foreground Inference and Part Segmentation: Accurate Segmentation, Better Representation
Hierarchical Structure Learning: Discovering High-level Semantic Parts
Geometric Phrase Pooling: Capturing Geometric Information

Dataset and Annotations
Caltech-UCSD Birds-200-2011 Dataset
  200 Bird Categories
  11788 Images (at least 55 per Category)
  Accuracy by Category (5, 10, 20, 30 trainings)
Manual Annotation by Web Users
  At Most 15 Landmarks per Image
  Beak, Crown, Forehead, Nape, Throat, Left/Right Eyes; Belly, Breast, Back, Tail, Left/Right Wings/Legs.

Part Segmentation (accuracy, %)
#training     5      10     20     30
Baseline      13.64  20.25  28.36  33.63
+FG Inf.      19.25  27.66  37.08  43.06
+Part Seg.    28.55  40.46  52.52  58.09

Hierarchical Structure Learning (accuracy, %)
#training     5      10     20     30
No Struct.    28.55  40.46  52.52  58.09
Struct. #1    29.29  41.62  53.36  59.24
Struct. #2    29.75  42.03  53.55  59.32
Struct. #3    30.33  42.66  53.94  59.86
Struct. #4    27.38  38.64  50.22  56.11

Hierarchical Structure Learning
μ: controlling the complexity of the model! (figure label: best choice)

Geometric Phrase Pooling (accuracy, %)
#training     5      10     20     30
No Phrase     30.33  42.66  53.94  59.86
GPP (5,5)     31.69  43.80  55.26  60.80
GPP (5,10)    32.23  45.10  56.11  61.93
GPP (5,20)    34.13  47.29  58.60  64.01
GPP (5,40)    36.09  48.87  60.56  65.62

Comparison (accuracy, %)
#training              5      10     20     30
Wang et al., CVPR10    13.64  20.25  28.36  33.63
Xie et al., ACMMM12    15.34  22.91  31.01  36.17
Ours                   36.09  48.87  60.56  65.62
Single reported values: Wah et al., TechRep11: 10.05; Zhang et al., CVPR12: 24.21

Summarization
All the Components Help!
The HUGE Improvement Mainly Comes from Part Segmentation
Comparison Directly with Previous Methods without Using Landmarks is NOT Fair
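The claim that part segmentation contributes the most can be checked against the reported numbers. The short calculation below uses the accuracies for 30 training images per category from the result slides, taking the best setting of each stage (Struct. #3 and GPP (5,40)), and computes the gain added by each component. The variable names are illustrative.

```python
# Accuracy (%) at 30 training images per category, from the result slides.
stages = [
    ("Baseline", 33.63),
    ("+FG Inf.", 43.06),
    ("+Part Seg.", 58.09),
    ("+Struct. #3", 59.86),
    ("+GPP (5,40)", 65.62),
]

# Gain of each stage over the previous one, in percentage points.
gains = {
    name: round(acc - prev_acc, 2)
    for (_, prev_acc), (name, acc) in zip(stages, stages[1:])
}
largest = max(gains, key=gains.get)
total_gain = round(stages[-1][1] - stages[0][1], 2)

for name, gain in gains.items():
    print(f"{name:<12} +{gain:.2f}")
print(f"largest: {largest}, total: +{total_gain:.2f}")
```

Part segmentation accounts for 15.03 of the 31.99 points gained over the baseline, followed by foreground inference (9.43), geometric phrase pooling (5.76), and structure learning (1.77).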

Updates after Publication
Baseline Performance might be Much Better
  Using Fisher Kernels [Perronnin, ECCV10]
  Using Deep Features [Donahue, ICML14]
Automatically Detected Parts Work Well
  Landmark Detection [Berg, CVPR13]
  Symbiotic Localization [Chai, ICCV13]
  Geometric Segmentation [Gavves, ICCV13]
State-of-the-Art ~80%/~70% w/o using Landmarks (30 trainings).

Main Contributions
An Important Clue to Fine-Grained Problem
  Part Information is Crucial
  Automatic Annotation is Required
A Complete Flowchart for Part Representation
  Starting from Landmarks
  Foreground Inference and Segmentation
  Hierarchical Structure Learning
  Transplantable to Other Datasets

Conclusions and Future Work
Fine-Grained Classification is More Difficult
  Surprising Inter-class Similarity
Discovering Parts is Very Important!
  BUT, Annotation is still Expensive and Unrealistic
Alternative Methods?
  Template Matching [Yang, NIPS12]
  Co-Segmentation and Localization [Chai, ICCV13]
  Geometric Shape Alignment [Gavves, ICCV13]

Thank you! Questions please?