The topic discovery models


Presentation transcript:

Discovering Objects and their Location in Images
Josef Sivic (1), Bryan C. Russell (2), Alexei A. Efros (3), Andrew Zisserman (1) and William T. Freeman (2)
(1) Oxford University, (2) MIT, (3) Carnegie Mellon University

Introduction
Goal: Discover visual object categories and their segmentation given a collection of unlabelled images.
Approach:
1) Represent an image as a collection of visual words.
2) Apply topic discovery models from statistical text analysis.

Image representation
Represent an image as a histogram of "visual words".

The topic discovery models
Notation: w ... visual words, d ... documents (images), z ... topics ('objects').

Probabilistic Latent Semantic Analysis (pLSA) [Hofmann'99]
[Figure: pLSA graphical model]
P(z|d) and P(w|z) are multinomial distributions.
pLSA model fitting: find topic vectors P(w|z) common to all documents and mixture coefficients P(z|d) specific to each document. Fit the model by maximizing the likelihood of the data using EM.

Latent Dirichlet Allocation (LDA) [Blei et al.'03] (see paper for more details)
[Figure: LDA graphical model]
Treat multinomial weights over topics as random variables. Fit the model using Gibbs sampling [Griffiths and Steyvers'04].
Results are shown only for pLSA. LDA had very similar performance.

Improving localization using doublets
Form a new vocabulary from pairs of locally co-occurring regions.
[Figures: Doublet formation; Doublet example I; Doublet example II]
[Figures: All detected visual words; Singlet segmentation; Doublet segmentation]
Bounding box overlap score for faces: singlets 0.49, doublets 0.61.

Experiment II: MIT dataset
2873 images, learn 10 topics.
Results: 4 of the 10 learned topics, each shown by its 5 most probable images.
[Figure panels: "Buildings", "Trees / Grass"]
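The pLSA fitting step described above (EM over topic vectors P(w|z) and per-document mixture coefficients P(z|d)) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `plsa_em`, the toy count matrix, the random initialisation and the fixed iteration count are all hypothetical choices made for the example.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. counts[d, w] is the number of times word w occurs in document d."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero and log(0)
    # Random initial multinomials (an assumption; any valid initialisation works).
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    log_liks = []
    for _ in range(n_iter):
        # Mixture model: P(w|d) = sum_z P(z|d) P(w|z), shape (n_docs, n_words).
        p_w_d = p_z_d @ p_w_z
        log_liks.append(float((counts * np.log(p_w_d + eps)).sum()))
        # E-step: posterior P(z|d,w), shape (n_docs, n_topics, n_words).
        post = p_z_d[:, :, None] * p_w_z[None, :, :] / (p_w_d[:, None, :] + eps)
        # M-step: accumulate expected counts n(d,w) P(z|d,w), then renormalise.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_w_z, p_z_d, log_liks

# Toy data (hypothetical): 4 "images" over a 4-word vocabulary.
counts = np.array([[5, 4, 0, 0],
                   [4, 6, 1, 0],
                   [0, 1, 5, 5],
                   [0, 0, 6, 4]], dtype=float)
p_w_z, p_z_d, log_liks = plsa_em(counts, n_topics=2)
print(np.round(p_z_d, 2))
```

The dense (documents x topics x words) posterior array keeps the code short but would not scale to a vocabulary of about 1,000 words and thousands of images; a practical version would iterate over non-zero counts only.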
[Figure panels: learned topics "Computers" and "Bookshelves"]

Histogram of visual words
1) Detect affine covariant regions.
2) Represent each region by a SIFT descriptor.
3) Build a visual vocabulary by k-means clustering (K~1,000).
4) Assign each region to the nearest cluster centre.
References: Mikolajczyk and Schmid'02, Schaffalitzky and Zisserman'02, Matas et al.'02, Lowe'99, Sivic and Zisserman'03.

Examples of visual words
[Figure: Five samples from a 'motorbike' visual word]
[Figure: Five samples from an 'airplane' visual word]
Visual polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual synonyms: two different visual words representing a similar part of an object (wheel of a motorbike).

Experiment I: Caltech dataset
Four object categories: faces, motorbikes, airplanes and cars rear (total of 3,190 images), and 900 background images.
Learn K = (5, 6, 7) topics.
Image classification: assign each image to the topic with the highest P(z|d).
[Figure: Confusion tables for K = 5, 6, 7 learned topics]
Background is better modelled by multiple topics.
Pre-learning background topics on a separate background dataset improves results.
Performance on novel images is comparable with the weakly supervised method of [Fergus et al.'03].

Segmentation
For a given word w_i in document d_j, examine the posterior probability over topics.
[Figure: Example images with multiple objects; visual words colour coded according to the topic with the highest probability]

Experiment III: Application to image retrieval
Learn topic vectors on the Caltech database.
Represent a new query image in terms of the learned topic vectors.
Retrieve images within the Caltech database.
[Figure legend: pLSA; raw word histograms]
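The visual-word pipeline listed above (cluster descriptors into a vocabulary, assign each region to its nearest cluster centre, count words per image) can be sketched as follows. This is an illustration under assumptions: random 128-dimensional vectors stand in for SIFT descriptors of detected regions, the vocabulary has 8 words rather than about 1,000, and the helper names `assign_words`, `kmeans` and `word_histogram` are hypothetical.

```python
import numpy as np

def assign_words(descriptors, centres):
    """Index of the nearest cluster centre (visual word) for each descriptor."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of cluster centres."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[start].copy()
    for _ in range(n_iter):
        labels = assign_words(descriptors, centres)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres

def word_histogram(descriptors, centres):
    """Bag-of-words histogram: how often each visual word occurs in one image."""
    labels = assign_words(descriptors, centres)
    return np.bincount(labels, minlength=len(centres))

# Stand-in data (assumption): 3 images with 40 random 128-D descriptors each.
rng = np.random.default_rng(1)
images = [rng.random((40, 128)) for _ in range(3)]
centres = kmeans(np.vstack(images), k=8)
histograms = np.array([word_histogram(im, centres) for im in images])
print(histograms)
```

Stacking the per-image histograms gives the document-by-word count matrix that a topic model such as pLSA takes as input.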
Overview
Find visual words, form histograms, discover topics.

[Figure panel labels: Faces, Motorbikes, Airplanes, Cars, Background I, Background II, Background III]
[Figures: Example face segmentation; Example motorbike segmentation; Example airplane segmentation]
[Figure: Query image; precision-recall plot]

Retrieve images in the movie Pretty Woman (6,641 keyframes)
Represent each keyframe using topic vectors learned on the Caltech database.
[Figure: Retrieved images using pLSA 'object' coefficients P(z|d)]
[Figure: Retrieved images using visual word histograms]
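Representing a new image with previously learned topic vectors, as in the retrieval step above, is commonly done by running EM with P(w|z) held fixed and updating only P(z|d). The sketch below illustrates that idea under assumptions: the poster does not state how images are compared, so cosine similarity between P(z|d) vectors is used purely for illustration, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def fold_in(word_counts, p_w_z, n_iter=50):
    """Estimate P(z|d) for a new image while keeping topic vectors P(w|z) fixed."""
    n_topics = p_w_z.shape[0]
    eps = 1e-12
    p_z_d = np.full(n_topics, 1.0 / n_topics)  # start from a uniform mixture
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) P(w|z); shape (n_topics, n_words).
        joint = p_z_d[:, None] * p_w_z
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)
        # M-step: re-estimate only the mixture coefficients.
        p_z_d = (post * word_counts[None, :]).sum(axis=1)
        p_z_d /= p_z_d.sum() + eps
    return p_z_d

def rank_by_topic_similarity(query_topics, database_topics):
    """Database indices sorted from most to least similar (cosine similarity, an assumption)."""
    q = query_topics / (np.linalg.norm(query_topics) + 1e-12)
    norms = np.linalg.norm(database_topics, axis=1, keepdims=True) + 1e-12
    return np.argsort(-((database_topics / norms) @ q))

# Toy topic vectors (hypothetical): 2 topics over a 4-word vocabulary.
p_w_z = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5]])
query_counts = np.array([3.0, 2.0, 0.0, 0.0])
query_topics = fold_in(query_counts, p_w_z)

# P(z|d) for three database images (hypothetical values).
database_topics = np.array([[0.1, 0.9],
                            [0.9, 0.1],
                            [0.5, 0.5]])
ranking = rank_by_topic_similarity(query_topics, database_topics)
print(ranking.tolist())
```

Because the comparison happens in the low-dimensional topic space rather than on raw word histograms, two images can match even when they share few individual visual words.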