Discovering Objects and their Location in Images
Josef Sivic 1, Bryan C. Russell 2, Alexei A. Efros 3, Andrew Zisserman 1 and William T. Freeman 2


Affiliations: 1 Oxford University, 2 MIT, 3 Carnegie Mellon University

Goal:
Discover visual object categories and their segmentation given a collection of unlabelled images.

Introduction

Approach:
1) Represent an image as a collection of visual words
2) Apply topic discovery models from statistical text analysis

Overview:
Find visual words, form histograms, discover topics.

Image representation

Represent an image as a histogram of “visual words”:
- Detect affine covariant regions
- Represent each region by a SIFT descriptor
- Build visual vocabulary by k-means clustering (K~1,000)
- Assign each region to the nearest cluster centre

Mikolajczyk and Schmid’02, Schaffalitzky and Zisserman’02, Matas et al.’02, Lowe’99, Sivic and Zisserman’03

[Figure: All detected visual words]
[Figure: Histogram of visual words]

Examples of visual words:
[Figure: Five samples from an ‘airplane’ visual word]
[Figure: Five samples from a ‘motorbike’ visual word]

Visual Polysemy. Single visual word occurring on different (but locally similar) parts on different object categories.
Visual Synonyms. Two different visual words representing a similar part of an object (wheel of a motorbike).

The topic discovery models

Probabilistic Latent Semantic Analysis (pLSA) [Hofmann’99]
[Figure: pLSA graphical model]
w … visual words
d … documents (images)
z … topics (‘objects’)
P(z|d) and P(w|z) are multinomial distributions.

pLSA model fitting:
Find topic vectors P(w|z) common to all documents and mixture coefficients P(z|d) specific to each document. Fit model by maximizing likelihood of data using EM.

Latent Dirichlet Allocation (LDA) [Blei et al.’03]
[Figure: LDA graphical model]
Treat multinomial weights over topics as random variables. Fit model using Gibbs sampling [Griffiths and Steyvers’04].

Results shown only for pLSA. LDA had very similar performance.

Results

Experiment I: Caltech Dataset

Four object categories: faces, motorbikes, airplanes and cars rear (total of 3,190 images) and 900 background images.

Image Classification:
Assign each image to a topic with the highest P(z|d). Learn K = (5,6,7) topics.
[Figure: Confusion tables (K=5,6,7) learned topics. Labels: Faces, Motorbikes, Airplanes, Cars, Background I, Background II, Background III]
- Background is better modelled by multiple topics
- Pre-learning background topics on a separate bg dataset improves results
- Performance on novel images is comparable with weakly supervised method of [Fergus et al.’03]

Segmentation:
For a given word w_i in document d_j examine posterior probability over topics. Visual words colour coded according to the topic with the highest probability.
[Figure: Example motorbike segmentation]
[Figure: Example airplane segmentation]
[Figure: Example face segmentation]

Experiment II: MIT dataset

[…] images, learn 10 topics.
4 of the 10 learned topics shown by the 5 most probable images for each topic: “Buildings”, “Trees / Grass”, “Bookshelves”, “Computers”.
[Figure: Example images with multiple objects]

Improving localization using doublets

Form a new vocabulary from pairs of locally co-occurring regions.
[Figure: Doublet formation]
[Figure: Doublet example I]
[Figure: Doublet example II]
[Figure: All detected visual words, Singlet segmentation, Doublet segmentation]

Experiment III: Application to image retrieval

Learn topic vectors on Caltech database. Represent new query image in terms of learned topic vectors.

Retrieve images within Caltech database:
[Figure: Query image]
[Figure: Retrieved images using visual word histograms]
[Figure: Retrieved images using pLSA ‘object’ coefficients P(z|d)]
[Figure: Precision-Recall plot. Curves: pLSA, Raw word histograms]

Retrieve images in movie Pretty Woman:
Represent each keyframe using topic vectors learned on Caltech database. Pretty Woman (6,641 keyframes).
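
The pLSA fitting step named in the transcript (find P(w|z) and P(z|d) by maximizing the data likelihood with EM, then assign each image to the topic with the highest P(z|d)) could be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function name `fit_plsa`, the random initialisation, the fixed iteration count and the toy count matrix are all assumptions made for the example. It also builds a dense (documents x topics x words) array, so it only suits small vocabularies.

```python
import numpy as np

def fit_plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA with EM (hypothetical helper, for illustration only).

    counts: (n_docs, n_words) array of visual-word counts n(d, w).
    Returns P(w|z) with shape (n_topics, n_words) and
            P(z|d) with shape (n_docs, n_topics).
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random starting point, normalised so each row is a multinomial.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate both multinomials from expected counts.
        expected = counts[:, None, :] * post  # n(d,w) * P(z|d,w)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_w_z, p_z_d

# Toy data (assumed): two groups of images using disjoint visual words.
toy_counts = np.array([
    [5, 4, 6, 0, 0, 0],
    [6, 5, 4, 0, 0, 0],
    [4, 6, 5, 0, 0, 0],
    [0, 0, 0, 5, 6, 4],
    [0, 0, 0, 4, 5, 6],
    [0, 0, 0, 6, 4, 5],
])
p_w_z, p_z_d = fit_plsa(toy_counts, n_topics=2)

# Classification as on the poster: topic with the highest P(z|d) per image.
labels = p_z_d.argmax(axis=1)
print(labels)
```

The same posterior computed in the E-step, P(z|d,w), is the quantity the segmentation step examines for a word w_i in document d_j. EM only finds a local optimum, so in practice several random restarts may be worth trying.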