Zhuode Liu 2016/2/13 University of Texas at Austin CS 381V: Visual Recognition Discovering the Spatial Extent of Relative Attributes Xiao and Lee, ICCV.

Presentation transcript:

Zhuode Liu 2016/2/13 University of Texas at Austin CS 381V: Visual Recognition Discovering the Spatial Extent of Relative Attributes Xiao and Lee, ICCV 2015

Outline: Problem statement, Approach, Evaluation, Discussion points

Problem statement: Discover which part of the image is related to the attribute (called the “spatial extent” in the paper). Rank images w.r.t. this attribute. Example attribute: Smile [Xiao and Lee 2015]

Problem statement: Training: many pairs of images, each labeled with which image shows the attribute more strongly, or as equal. Testing: given a new pair, rank the images in it. Attribute: Smile [Xiao and Lee 2015]
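The training and testing setup on this slide can be read as learning a scoring function from ordered pairs. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes features are already extracted, uses only ordered pairs (equal pairs are ignored), and the names `train_pairwise_ranker` and `rank_pair` are hypothetical.

```python
import numpy as np

def train_pairwise_ranker(X, ordered_pairs, epochs=200, lr=0.1, reg=1e-3):
    """Learn w so that w @ X[i] > w @ X[j] for each (i, j) in ordered_pairs.

    Assumption: a linear scorer trained by subgradient descent on a
    pairwise hinge loss; (i, j) means image i shows the attribute more.
    """
    w = np.zeros(X.shape[1])
    diffs = np.array([X[i] - X[j] for i, j in ordered_pairs])
    for _ in range(epochs):
        violated = (diffs @ w) < 1.0          # pairs inside the margin
        grad = reg * w - diffs[violated].sum(axis=0) / len(diffs)
        w -= lr * grad
    return w

def rank_pair(w, x_a, x_b):
    """Return 0 if the first image is scored at least as strong, else 1."""
    return 0 if w @ x_a >= w @ x_b else 1

# Synthetic usage: attribute strength is the first feature coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
pairs = [(i, j) for i in range(40) for j in range(40) if X[i, 0] > X[j, 0] + 0.5]
w = train_pairwise_ranker(X, pairs)
print(rank_pair(w, X[pairs[0][0]], X[pairs[0][1]]))
```

At test time only `rank_pair` is needed: the learned weights score each image of the new pair and the higher score is taken as the stronger attribute.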

Why do we need the spatial extent? Many attributes are local, so focusing on the relevant local parts can give better results. It is also difficult, because the region is not given in the training data. Examples: Mountainous, Pointy

Approach [Xiao and Lee 2015]: Key idea: the “visual chain” (Strong to Weak). If I rank these images correctly… Observation: adjacent images are locally smooth in appearance. Therefore: gradually discover the useful information (the spatial extent of the attribute).

Initializing chains: Train a ranker with global features [Parikh & Grauman, 2011]. Select (say) the 5 top-ranked images. (Figure axis: Strong ... Weak) Slide Credit: Xiao and Lee

Initializing chains: Search for locally similar-looking patches. Slide Credit: Xiao and Lee

Initializing chains: Search for locally similar-looking patches. Solve for [equation]. Slide Credit: Xiao and Lee

Initializing chains: Search for locally similar-looking patches. Solve for [equation]; solve with dynamic programming. Slide Credit: Xiao and Lee
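The dynamic programming step can be illustrated with a Viterbi-style search that picks one patch per image. This is only a sketch under a stated assumption: the objective on the slide is not preserved in the transcript, so the code simply minimizes the sum of Euclidean distances between the features of consecutively chosen patches. The paper's objective may contain further terms, and `best_chain` is a hypothetical name.

```python
import numpy as np

def best_chain(candidates):
    """Pick one patch per image so adjacent patches look alike.

    candidates: list of arrays, one per image in ranked order, each of
    shape (n_patches, dim). Returns (patch index per image, total cost).
    Assumed cost: sum of Euclidean distances between consecutive patches.
    """
    cost = np.zeros(len(candidates[0]))   # best cost of ending at each patch
    back = []                             # back-pointers, one array per step
    for prev, cur in zip(candidates[:-1], candidates[1:]):
        # pairwise[p, q] = distance from patch p (previous) to patch q (current)
        pairwise = np.linalg.norm(prev[:, None, :] - cur[None, :, :], axis=2)
        total = cost[:, None] + pairwise
        back.append(total.argmin(axis=0))
        cost = total.min(axis=0)
    # Trace the cheapest path backwards from the last image.
    path = [int(cost.argmin())]
    for pointers in reversed(back):
        path.append(int(pointers[path[-1]]))
    return path[::-1], float(cost.min())

# Toy usage: three images, two candidate patches each, 1-D features.
cands = [np.array([[0.0], [10.0]]),
         np.array([[10.5], [3.0]]),
         np.array([[4.0], [11.0]])]
print(best_chain(cands))
```

The search is exact for this chain-structured cost and runs in time linear in the number of images and quadratic in the number of candidate patches per image.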

Initializing chains: Compute multiple initial chains. Slide Credit: Xiao and Lee

Iteratively growing visual chains: Train a detector for the chain. (Figure label: Learn detector) Slide Credit: Xiao and Lee

Iteratively growing visual chains: Select only a subset of the images, not all of them, because the model (SVM detector and SVM ranker) is still weak. (Figure: Predicted Attribute Strength, Strong to Weak; iter 1) Slide Credit: Xiao and Lee

Add the selected images to a new training set. (Figure label: Initial image set) Slide Credit: Xiao and Lee

Add the selected images to a new training set. Search for patches again: solve for [equation]. (Figure label: Initial image set) Slide Credit: Xiao and Lee

Search for patches: Solve for [equation]; solve with dynamic programming. Slide Credit: Xiao and Lee

Update the detector. (Figure labels: Learn detector, Initial chain) Slide Credit: Xiao and Lee

Iterative growing of visual chains: Select the image subset based on the ranking. (Figure: Predicted Attribute Strength, Strong to Weak; iter 1, iter 2, iter 3) Slide Credit: Xiao and Lee
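The subset selection across iterations might be sketched as below. This is a rough illustration, not the paper's exact schedule: it assumes the subset at each iteration is a growing prefix of the images sorted by predicted attribute strength, and `growing_subsets`, `seed_size` and `growth` are hypothetical names and parameters.

```python
def growing_subsets(scores, seed_size=5, growth=2.0, n_iters=3):
    """Return the image indices used at each growing iteration.

    scores: predicted attribute strength per image (higher = stronger).
    Assumption: each iteration keeps a longer prefix of the ranked list,
    starting from the seed_size strongest images.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    subsets, size = [], float(seed_size)
    for _ in range(n_iters):
        subsets.append(order[:min(int(size), len(order))])
        size *= growth
    return subsets

# Toy usage: six images, seed of two, doubling each iteration.
for it, subset in enumerate(growing_subsets([0.1, 0.9, 0.5, 0.7, 0.3, 0.8],
                                            seed_size=2), start=1):
    # In the full loop, each subset would be used to search for patches
    # again and to update the detector before the next iteration.
    print(f"iter {it}: images {subset}")
```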

Creating a chain ensemble Slide Credit: Xiao and Lee

Train an SVM ranker for each chain, then rank the validation set. (Example: Validation set, Attribute: Smile, Score: 3/4) Slide Credit: Xiao and Lee
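A score such as 3/4 can be read as the fraction of validation pairs that a chain's ranker orders correctly. The sketch below shows that computation and a simple way to keep the best chains. It assumes each chain already provides one score per validation image, and `pair_accuracy` and `select_top_chains` are hypothetical names.

```python
def pair_accuracy(scores, ordered_pairs):
    """Fraction of pairs (i, j) with scores[i] > scores[j].

    Each pair (i, j) states that image i shows the attribute more than j.
    """
    correct = sum(1 for i, j in ordered_pairs if scores[i] > scores[j])
    return correct / len(ordered_pairs)

def select_top_chains(chain_scores, ordered_pairs, keep):
    """Rank chains by validation pair accuracy and keep the best ones."""
    accs = [pair_accuracy(s, ordered_pairs) for s in chain_scores]
    ranked = sorted(range(len(accs)), key=lambda c: accs[c], reverse=True)
    return ranked[:keep], accs

# Toy usage: four validation images, four labeled pairs.
pairs = [(0, 1), (0, 2), (2, 3), (1, 2)]
print(pair_accuracy([0.9, 0.4, 0.6, 0.1], pairs))  # 0.75, i.e. 3 of 4 pairs
```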

Creating a chain ensemble. (Figure axis: Scores, High to Low) Slide Credit: Xiao and Lee

Creating a chain ensemble: Learn the final image-level SVM ranker [Parikh & Grauman 2011]. Features: Dense SIFT or Pool5 activations of AlexNet. Slide Credit: Xiao and Lee

Evaluation

Dataset: LFW-10. Attributes shown (Strong to Weak): Smile, Visible teeth, Bald head, Dark hair. The method finds that the mouth region is related to the first two attributes and the head region to the last two. Slide Credit: Xiao and Lee

Dataset: UTZAP50K. Attributes shown (Strong to Weak): Pointy, Sporty, Comfort, Open. For pointy shoes, the method discovered not only the toe but also the heel, because pointy shoes are often high-heeled. Slide Credit: Xiao and Lee

Results: Dataset: LFW-10. Dataset: UTZAP50K. Slide annotations: not much gain; Global: 73.7% -> This: 83.5%; Global: 74.6% -> This: 84.6%. Slide Credit: Xiao and Lee

Discussions
1. Drawback: the method relies on good initialization.
   – Every chain is grown using the initial top (say) 5 images as the seed.
   – Whether the algorithm used to pick these first 5 images gives a good ranking is therefore very important: if the “local smoothness” does not hold for these 5 images, then dynamic programming cannot find good patches.
2. Is there a reason the authors only tested on humans and shoes?
3. Given that the approach densely samples many features from many candidate patches, how well does the algorithm scale to large datasets where the key features are much harder to localize than in ideal face and shoe views?