The topic discovery models

Discovering Objects and their Location in Images
Josef Sivic (1), Bryan C. Russell (2), Alexei A. Efros (3), Andrew Zisserman (1) and William T. Freeman (2)
(1) Oxford University, (2) MIT, (3) Carnegie Mellon University

Introduction
Goal: Discover visual object categories and their segmentation given a collection of unlabelled images.
Approach:
1) Represent an image as a collection of visual words
2) Apply topic discovery models from statistical text analysis

Image representation
Represent an image as a histogram of “visual words”.

The topic discovery models
Probabilistic Latent Semantic Analysis (pLSA) [Hofmann’99]
w … visual words
d … documents (images)
z … topics (‘objects’)
[Figure: pLSA graphical model]
P(z|d) and P(w|z) are multinomial distributions.
pLSA model fitting: Find topic vectors P(w|z) common to all documents and mixture coefficients P(z|d) specific to each document. Fit model by maximizing likelihood of data using EM.

Latent Dirichlet Allocation (LDA) [Blei et al.’03] (see paper for more details)
[Figure: LDA graphical model]
Treat multinomial weights over topics as random variables. Fit model using Gibbs sampling [Griffiths and Steyvers’04].

Improving localization using doublets
Form a new vocabulary from pairs of locally co-occurring regions.
[Figures: doublet formation; doublet example I; doublet example II; all detected visual words; singlet segmentation; doublet segmentation]
Bounding box overlap score for faces, singlets: 0.49, doublets: 0.61

Experiment II: MIT dataset
2873 images, learn 10 topics.
4 of the 10 learned topics shown by the 5 most probable images for each topic.
[Figure panels: “Buildings”, “Trees / Grass”]

Results
Results shown only for pLSA. LDA had very similar performance.
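The pLSA fitting step named above (find P(w|z) and P(z|d) by maximizing the data likelihood with EM) can be illustrated with a small sketch. This is not the authors' code: the function names `fit_plsa` and `log_likelihood`, the random initialisation, the fixed iteration count and the toy count matrix are all assumptions made for illustration. The E-step computes the posterior over topics for each (image, word) pair, proportional to P(w|z) P(z|d); the M-step renormalises the expected counts.

```python
import math
import random


def _normalise(row):
    """Scale a list of non-negative numbers to sum to one (uniform if all zero)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def fit_plsa(counts, n_topics, n_iters=50, seed=0):
    """Fit pLSA by EM. counts[d][w] = occurrences of visual word w in image d.

    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    Initialisation and iteration count are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_w_z = [_normalise([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [_normalise([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iters):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior over topics, proportional to P(w|z) P(z|d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0:
                    continue
                for z in range(n_topics):
                    expected = n * post[z] / norm
                    new_w_z[z][w] += expected
                    new_z_d[d][z] += expected
        # M-step: renormalise expected counts into multinomial distributions.
        p_w_z = [_normalise(row) for row in new_w_z]
        p_z_d = [_normalise(row) for row in new_z_d]
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Sum over (d, w) of n(d, w) * log sum_z P(w|z) P(z|d)."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n > 0:
                p = sum(p_w_z[z][w] * p_z_d[d][z] for z in range(len(p_w_z)))
                total += n * math.log(p)
    return total


# Toy data (hypothetical): 4 images over a 4-word vocabulary.
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 1, 2, 5]]
topics, mixtures = fit_plsa(toy_counts, n_topics=2, n_iters=30, seed=1)
# Assign each image to the topic with the highest P(z|d).
labels = [max(range(2), key=lambda z: row[z]) for row in mixtures]
```

The same posterior computed in the E-step is what the poster inspects per word to colour-code visual words by their most probable topic.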
Histogram of visual words
Detect affine covariant regions.
Represent each region by a SIFT descriptor.
Build visual vocabulary by k-means clustering (K~1,000).
Assign each region to the nearest cluster centre.
[Mikolajczyk and Schmid’02, Schaffalitzky and Zisserman’02, Matas et al.’02, Lowe’99, Sivic and Zisserman’03]

Examples of visual words
Five samples from an ‘airplane’ visual word.
Five samples from a ‘motorbike’ visual word.
Visual Polysemy. Single visual word occurring on different (but locally similar) parts on different object categories.
Visual Synonyms. Two different visual words representing a similar part of an object (wheel of a motorbike).

Experiment I: Caltech dataset
Four object categories: faces, motorbikes, airplanes and cars rear (total of 3,190 images) and 900 background images.
Learn K = (5,6,7) topics.

Image classification
Assign each image to a topic with the highest P(z|d).
[Figure: confusion tables for (K=5,6,7) learned topics]
Background is better modelled by multiple topics.
Pre-learning background topics on a separate bg dataset improves results.
Performance on novel images is comparable with weakly supervised method of [Fergus et al.’03].

Segmentation
For a given word w_i in document d_j examine posterior probability over topics.
Visual words colour coded according to the topic with the highest probability.
[Figure: example images with multiple objects]

Experiment III: Application to image retrieval
Learn topic vectors on Caltech database.
Represent new query image in terms of learned topic vectors.
Retrieve images within Caltech database.
[Compared representations: pLSA; raw word histograms]

[Figure panels: “Computers”, “Bookshelves”]
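The last two vocabulary steps (assign each region to the nearest cluster centre, then count words per image) can be sketched as below. The sketch assumes the cluster centres have already been produced by k-means and uses squared Euclidean distance, which the poster does not state explicitly. The names `nearest_centre` and `word_histogram` and the 2-D toy descriptors are hypothetical; real SIFT descriptors are 128-dimensional and the vocabulary has roughly 1,000 centres.

```python
def nearest_centre(descriptor, centres):
    """Index of the centre closest to the descriptor (squared Euclidean, assumed)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centres)), key=lambda k: sq_dist(descriptor, centres[k]))


def word_histogram(descriptors, centres):
    """Bag-of-visual-words histogram: one bin per cluster centre."""
    hist = [0] * len(centres)
    for desc in descriptors:
        hist[nearest_centre(desc, centres)] += 1
    return hist


# Toy example with three 2-D "centres" and four region descriptors.
toy_centres = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
toy_descriptors = [[1.0, 1.0], [9.0, 9.0], [0.5, 0.0], [1.0, 9.0]]
toy_hist = word_histogram(toy_descriptors, toy_centres)  # [2, 1, 1]
```

Stacking one such histogram per image gives the word-by-document count table that the topic models take as input.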
Overview
Find visual words.
Form histograms.
Discover topics.

[Figure panel labels: Faces, Motorbikes, Airplanes, Cars, Background I, Background II, Background III]
[Figures: example face segmentation; example motorbike segmentation; example airplane segmentation]

Retrieve images in movie Pretty Woman
Pretty Woman (6,641 keyframes).
Represent each keyframe using topic vectors learned on Caltech database.
[Figures: query image; retrieved images using pLSA ‘object’ coefficients P(z|d); retrieved images using visual word histograms; precision-recall plot]
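Retrieval with topic coefficients can be sketched in two steps: estimate P(z|d) for a new image while keeping the learned P(w|z) fixed, then rank database images by similarity of their coefficient vectors. Both choices here are assumptions for illustration rather than details given on the poster: the fixed-topic EM update (often called fold-in) and the cosine similarity used for ranking. The names `fold_in`, `cosine_similarity` and `rank_images` are hypothetical.

```python
import math


def fold_in(hist, p_w_z, n_iters=30):
    """Estimate P(z|d) for a new word histogram with P(w|z) held fixed."""
    n_topics = len(p_w_z)
    p_z = [1.0 / n_topics] * n_topics
    for _ in range(n_iters):
        new_p_z = [0.0] * n_topics
        for w, n in enumerate(hist):
            if n == 0:
                continue
            # Posterior over topics for this word, proportional to P(w|z) P(z|d).
            post = [p_w_z[z][w] * p_z[z] for z in range(n_topics)]
            norm = sum(post)
            if norm == 0:
                continue
            for z in range(n_topics):
                new_p_z[z] += n * post[z] / norm
        total = sum(new_p_z)
        if total == 0:
            break
        p_z = [x / total for x in new_p_z]
    return p_z


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_vec, database_vecs):
    """Database indices sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(database_vecs)]
    return [i for _, i in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy example: two learned topics over a 4-word vocabulary (made-up values).
toy_topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
query = fold_in([3, 2, 0, 0], toy_topics)
ranking = rank_images(query, [[0.0, 1.0], [1.0, 0.0], [0.6, 0.4]])
```

Passing raw word histograms to `rank_images` instead of folded-in coefficients gives the baseline that the poster compares against.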