Recognition Using Visual Phrases

Recognition Using Visual Phrases (CVPR 2011 Best Student Paper)

Outline
- Introduction
- Related Work
- Approach: Phrasal Recognition, Decoding Multiple Detections
- Results
- Discussion

Introduction
- Visual phrases
- Traditional approach: detect objects (person, dog, horse…), then relate the objects
- NMS (non-maximum suppression)
- PASCAL, other
- Disadvantage

Introduction: Contributions
- Introducing visual phrases as categories for recognition
- Introducing a novel dataset for phrasal recognition
- Comparing against state-of-the-art methods of modeling interactions
- A decoding algorithm
- Performance results in multi-class object recognition

Related Work: Object Recognition
- Deformable templates [IEEE2001, CVPR1998]
- Part-based models [CVPR2005, CVPR2003]
- Detectors: deformable part-based model [IEEE2010]

Related Work: Object Interactions
- Spatial relations: left, right, top, down
- Focus on relations [ECCV2008]
- Person with object [CVPR 2010]
- Objects [ECCV2010]
- Relations of objects [ICCV2010]
- Label weight, confidence

Related Work: Scene Understanding
- Represent scenes with global features that take into account general information about images [Vision2001, CVPR2006]
- Cluster [ECCV2008]

Related Work: Machine Translation
- Statistical translation methods [Press2010]
- Translation model
- Language model
- A decoding algorithm
- Output: a query sentence
- Allows many-to-many translation

Phrasal Recognition: Dataset
- Select 8 object classes from Pascal VOC 2008: person, bike, car, dog, horse, bottle, sofa, chair
- A list of 17 visual phrases plus a background class
- Examples: dog jumping, horse jumping, person riding horse…

Phrasal Recognition: Datasets
- 2769 images (822 negative images)
- 120 examples per class on average
- 5067 bounding boxes (1796 phrases, 3271 objects)
- As the complexity of visual phrases increases, the number of training examples decreases

Phrasal Recognition: Appearance Models
- Deformable part models
- Models for the 17 phrases in our dataset are trained using the provided bounding boxes
- For objects, models of the 8 Pascal categories are used

Decoding Multiple Detections
- NMS decoding: assumes perfect detectors with excellent, tightly tuned models
- A natural decoding strategy that does better than NMS on interactions
- Greedily search the space of labels
- Well-designed features (nearby)
- Decoding: from all detector responses to the final outcome
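For reference, the NMS baseline mentioned on this slide can be sketched as follows. This is a generic greedy non-maximum suppression, not code from the paper; the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 IoU threshold are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

NMS looks only at overlap and score within one detector's output, which is why it cannot exploit interactions between different categories such as a person and the horse being ridden.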

Decoding Multiple Detections: Decoding Process
- We compare our decoding algorithm with that of [2] on our phrase dataset
- Step 1: construct the features
- Step 2: run the algorithm to learn a set of weights that rescore the confidences of the bounding boxes based on interactions
- Step 3: rescore again until optimal

Discriminative models for multi-class object layout

Decoding Multiple Detections
- x_i: a bounding box in an image
- An image is represented as a collection of overlapping bounding boxes X = {x_i : i = 1, …, M}
- M is the total number of bounding boxes
- K is the number of different categories
- A score is defined for image X with labeling Y
- Each bounding box has a set of weights that corresponds to its class

Decoding Multiple Detections: Representation
- Image = bounding boxes
- Feature types: confidence, overlap, size ratio
- Relations: above, below, overlapping
- Indexed by window, category, and spatial bin
- The representation has K*3*3+1 dimensions
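One way to read the K*3*3+1 count is: for each of K categories and each of 3 spatial bins, store 3 values (confidence, overlap, size ratio), plus 1 entry for the box's own confidence. The sketch below builds such a vector under that reading. The `Detection` type, the bin assignment by box centre, the choice of the highest-confidence neighbour per slot, and the slot ordering are all assumptions for illustration, not details taken from the slides.

```python
from typing import Dict, List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward
BINS = ("above", "below", "overlapping")
TYPES_PER_BIN = 3  # confidence, overlap, size ratio


class Detection(NamedTuple):  # hypothetical container for one detector response
    box: Box
    category: int
    confidence: float


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def spatial_bin(ref: Box, other: Box) -> str:
    """Assumed rule: overlapping if the boxes intersect, else compare centres."""
    if intersection(ref, other) > 0:
        return "overlapping"
    return "above" if (other[1] + other[3]) / 2 < (ref[1] + ref[3]) / 2 else "below"


def feature_vector(i: int, detections: List[Detection], num_categories: int) -> List[float]:
    """Feature of detection i: K*3*3 context entries plus its own confidence."""
    ref = detections[i]
    feat = [0.0] * (num_categories * len(BINS) * TYPES_PER_BIN + 1)
    feat[-1] = ref.confidence
    # Keep the most confident neighbour for every (category, bin) slot.
    best: Dict[Tuple[int, str], Detection] = {}
    for j, det in enumerate(detections):
        if j == i:
            continue
        key = (det.category, spatial_bin(ref.box, det.box))
        if key not in best or det.confidence > best[key].confidence:
            best[key] = det
    for (cat, bin_name), det in best.items():
        slot = (cat * len(BINS) + BINS.index(bin_name)) * TYPES_PER_BIN
        inter = intersection(ref.box, det.box)
        union = area(ref.box) + area(det.box) - inter
        feat[slot] = det.confidence
        feat[slot + 1] = inter / union if union > 0 else 0.0
        feat[slot + 2] = area(det.box) / area(ref.box) if area(ref.box) > 0 else 0.0
    return feat
```

If all 8 object detectors and 17 phrase detectors were counted as categories (an assumption), K would be 25 and the vector would have 25*3*3+1 = 226 entries.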

Decoding Multiple Detections: Inference
- Assume bounding boxes are independent given their features
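The practical consequence of the independence assumption is that the best labeling can be found one box at a time. The sketch below assumes a linear score per box (the dot product of the class-specific weights with the box feature) and keeps a box when that score is positive. The slide's own equation did not survive, so the scoring form and the names `decode` and `weights` are assumptions.

```python
from typing import Dict, List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def decode(
    features: List[Sequence[float]],
    categories: List[int],
    weights: Dict[int, Sequence[float]],
) -> List[int]:
    """Label each box independently: 1 (keep) if its class-specific linear
    score is positive, else 0 (discard). Because the total score is a sum of
    per-box terms, this per-box choice maximizes the sum."""
    return [1 if dot(weights[c], x) > 0 else 0 for x, c in zip(features, categories)]
```

No search over joint labelings is needed here, since the interactions are already encoded in each box's feature vector.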

Decoding Multiple Detections: Learning
- A form of max-margin structured learning

Decoding Multiple Detections
- Our inner maximization is exact and very fast
- We solve this optimization problem by the subgradient descent method
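A minimal sketch of how such a learner could look, assuming a per-box linear score, a Hamming loss between labelings, and L2 regularization. None of these choices is confirmed by the slides, and the names `train`, `lr`, and `reg` are hypothetical. Under these assumptions the loss-augmented maximization splits into one comparison per box, which is one way the inner maximization can be both exact and fast.

```python
from typing import Dict, List, Tuple

# One training image: (box features, box categories, ground-truth 0/1 labels).
Example = Tuple[List[List[float]], List[int], List[int]]


def dot(w: List[float], x: List[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def loss_augmented_labels(
    features: List[List[float]],
    categories: List[int],
    labels: List[int],
    weights: Dict[int, List[float]],
) -> List[int]:
    """Most violating labeling under Hamming loss, solved exactly per box."""
    out = []
    for x, c, y in zip(features, categories, labels):
        s = dot(weights[c], x)
        # Choosing 1 is worth s + [y != 1]; choosing 0 is worth [y != 0].
        out.append(1 if s + (1 - y) > y else 0)
    return out


def train(
    examples: List[Example],
    num_categories: int,
    dim: int,
    epochs: int = 50,
    lr: float = 0.1,
    reg: float = 0.01,
) -> Dict[int, List[float]]:
    """Subgradient descent on the regularized structured hinge loss."""
    weights = {c: [0.0] * dim for c in range(num_categories)}
    for _ in range(epochs):
        for features, categories, labels in examples:
            worst = loss_augmented_labels(features, categories, labels, weights)
            # Subgradient of the L2 regularizer: shrink every weight vector.
            for c in weights:
                weights[c] = [(1 - lr * reg) * v for v in weights[c]]
            # Subgradient of the hinge term: (worst_i - y_i) * x_i per box.
            for x, c, y, h in zip(features, categories, labels, worst):
                if h != y:
                    step = lr * (h - y)
                    weights[c] = [wv - step * xv for wv, xv in zip(weights[c], x)]
    return weights
```

On a toy problem with one true box and one false box, the learned weights score the true box above zero and the false one below it.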

Result: Single Category Detection
- Deformable part models for the 17 visual phrases
- Existing trained models for the objects
- PASCAL dataset: 50 positive and 150 negative examples
- Precision-recall (PR) curves are shown
- These detectors are trained with at most 50 positive examples
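As a reminder of how a PR curve and an AP number are obtained from ranked detections, here is a small sketch. It computes the plain, non-interpolated average precision; the PASCAL evaluation protocol uses its own interpolated variant, so this illustrates the idea and may not match the exact metric behind the reported numbers. Function and parameter names are assumptions.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], is_true_positive: List[bool], num_positives: int
) -> List[Tuple[float, float]]:
    """(recall, precision) after each detection, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    points = []
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            tp += 1
        points.append((tp / num_positives, tp / rank))
    return points


def average_precision(
    scores: List[float], is_true_positive: List[bool], num_positives: int
) -> float:
    """Non-interpolated AP: mean of the precision at each true positive,
    with missed positives contributing zero."""
    if num_positives == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            tp += 1
            total += tp / rank
    return total / num_positives
```

Whether a detection counts as a true positive is decided beforehand, typically by an overlap test against unmatched ground-truth boxes.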

Result: Decoding

                    Paper decoding   [2]     NMS
Overall AP          0.319            0.313   0.308
Mean per class AP   0.495            0.493   0.491

[2] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. In ICCV, 2009.

Discussion
- Introduced visual phrases, a phrasal recognition dataset, and a decoding algorithm
- The dimensionality of our features grows with the number of categories
- Future work: the relations between attributes and objects, parts and objects, visual phrases and scenes, and objects and visual phrases mirror one another

Discussion: Experience
- Low complexity
- Uses less data for detection
- Features grow with the number of categories (exponential, 2^n)
- But we don’t need to consider all of the categories when we model the interactions
- Building long enough phrase tables is still a challenge