Presentation transcript:

OPTIMOL: automatic Object Picture collecTion via Incremental MOdel Learning

a chicken and egg problem…

…among users, researchers, and data

Users of web search engines would like better image results; developers of these search engines would like more robust visual models to improve those results; computer vision researchers are developing the visual models and algorithms for this purpose; but to do so, they need large and diverse object image datasets for training and evaluation — which brings us back to the same problem the users face. Currently there is no good solution to this problem: researchers have to select the desired images manually. Well-known datasets such as Caltech101, LabelMe, and LHI were collected this way.

Framework Dataset Category model Classification Keyword: accordion

The intuition: since the web is vast and open, we use it as our resource in an iterative fashion that scales well while accumulating knowledge. Starting from a very small number of seed images of an object class (provided either by a human or automatically), our algorithm learns a category model that best describes the class. Serving as a classifier, this model then classifies the images downloaded from the internet for the keyword, and the good ones are appended to the dataset. For the next iteration, a subset of the newly incorporated images is used to update the category model. With the updated model, the algorithm goes back to the downloaded images and pulls in more relevant ones. This iterative process continuously gathers a highly accurate image dataset while learning a more and more robust object model. Li, Wang & Fei-Fei, CVPR 2007
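The iterative framework in these notes can be sketched in a few lines of Python. This is a minimal abstraction, not the paper's implementation: `learn` and `classify` stand in for the HDP category model and likelihood-ratio classifier described on the later slides, and all names here are hypothetical.

```python
# Minimal sketch of the OPTIMOL loop: classify the web pool with the current
# model, append accepted images to the dataset, and retrain on the newly
# accepted images. `learn` and `classify` are placeholders for the paper's
# HDP model and likelihood-ratio classifier.

def optimol_loop(seed_images, web_images, n_iters, learn, classify):
    """Iteratively grow a dataset and refine a category model.

    seed_images: small set of seed images for the category
    web_images:  pool of images downloaded for the keyword
    learn:       fits/updates a category model from a set of images
    classify:    returns the subset of the pool the model accepts
    """
    dataset = list(seed_images)
    model = learn(dataset)                 # initial model from the seeds
    pool = list(web_images)
    for _ in range(n_iters):
        accepted = classify(model, pool)   # pull relevant images from the pool
        if not accepted:
            break
        dataset.extend(accepted)           # grow the collected dataset
        pool = [im for im in pool if im not in accepted]
        model = learn(accepted)            # update the model incrementally
    return dataset, model
```

As a toy illustration, treating "images" as numbers, a mean model with a distance-threshold classifier will absorb nearby points and reject outliers across iterations.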

Framework Dataset Category model Classification Keyword: accordion

Now let's talk about the category model. Li, Wang & Fei-Fei, CVPR 2007

Image representation … Kadir & Brady interest point detector Codewords representation

How do we obtain the "bag of words" visual representation? The Kadir & Brady detector finds salient regions in each image; a SIFT descriptor [Lowe'99] is computed for each region, and the descriptors are quantized into codewords from a visual dictionary.
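The quantization step above can be sketched as follows. This is a hedged illustration, not the paper's code: the detector/descriptor stage (Kadir & Brady regions, 128-d SIFT) is abstracted as an array of descriptors already extracted from one image, and the codebook is assumed to be a fixed set of cluster centers (e.g., from k-means on training descriptors).

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Quantize each descriptor to its nearest codeword; return indices and a histogram.

    descriptors: (n_patches, d) array of local descriptors for one image
    codebook:    (n_words, d) array of codeword centers
    """
    # squared Euclidean distance between every descriptor and every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                        # nearest codeword per patch
    hist = np.bincount(words, minlength=len(codebook))  # bag-of-words histogram
    return words, hist
```

With a 2-word codebook at (0,0) and (10,10), descriptors near the origin map to word 0 and descriptors near (10,10) map to word 1.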

Nonparametric topic model - Hierarchical Dirichlet Process (HDP)

Latent topic models are widely used for unsupervised learning since they offer natural clustering: topics discover subsets of the data with common attributes. There are many latent topic models, including pLSA, LDA, and HDP. In the scenario of incremental learning we do not know the number of topics in advance, so we choose a nonparametric topic model — specifically, the Hierarchical Dirichlet Process — as our visual model. Some notation for the graphical model: starting from the inner plate, a patch x is the basic unit of an image, and each patch is assigned a codeword from the visual dictionary; codewords with common attributes share the same topic. z is the topic index of a particular patch. An image is a collection of N patches, forming the inner plate. π is the mixture proportion of the image. The M images in a corpus (the outer plate) have mixture proportions sampled from the same parameters γ and α: β determines the average weight of the topics (E[π_jk] = β_k), while α controls the variability of topic weights across images. H is the prior distribution for the θs, which are the parameters of the topics shared among different images and therefore determine the distribution of the patches. We have one such model per category. Teh, et al. 2004; Sudderth et al. CVPR 2006; Wang, Zhang & Fei-Fei, CVPR 2006
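In symbols, the generative process these notes describe is the standard HDP formulation (notation follows the notes; F denotes the codeword distribution of a topic):

```latex
\begin{align*}
\beta \mid \gamma &\sim \mathrm{GEM}(\gamma) && \text{global topic weights, } \mathbb{E}[\pi_{jk}] = \beta_k\\
\pi_j \mid \alpha, \beta &\sim \mathrm{DP}(\alpha, \beta) && \text{mixture proportions of image } j\\
\theta_k \mid H &\sim H && \text{parameters of topic } k\\
z_{ji} \mid \pi_j &\sim \pi_j && \text{topic index of patch } i \text{ in image } j\\
x_{ji} \mid z_{ji}, \{\theta_k\} &\sim F(\theta_{z_{ji}}) && \text{observed patch (codeword)}
\end{align*}
```

Because β has infinitely many components under the stick-breaking (GEM) prior, the number of topics does not need to be fixed in advance, which is what makes the model suitable for incremental learning.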

Nonparametric topic model - Hierarchical Dirichlet Process (HDP) N M Teh, et al. 2004; Sudderth et al. CVPR 2006; Wang, Zhang & Fei-Fei, CVPR 2006

Classification Category likelihood for I; likelihood ratio for decision

During data collection we perform binary classification using the likelihood ratio between the foreground object model and a background model learned from unrelated images. For a dataset-collection approach, incorporating a bad image into the dataset (a false positive) is far worse than missing a good image (a false negative), so a risk function R is introduced that penalizes false positives more heavily. Li, Wang & Fei-Fei, CVPR 2007
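The slide's equations were lost in transcription; a hedged reconstruction of the decision rule the notes describe is: with foreground model $\mathcal{O}_{\mathrm{fg}}$ and background model $\mathcal{O}_{\mathrm{bg}}$, an image $I$ is accepted when its likelihood ratio clears a threshold $T$, with $T$ set via the risk function so that false positives cost more than false negatives:

```latex
R(I) \;=\; \frac{p(I \mid \mathcal{O}_{\mathrm{fg}})}{p(I \mid \mathcal{O}_{\mathrm{bg}})} \;>\; T
```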

Annotation

With the object model we can retrieve a large number of clean images with few mistakes. Furthermore, we can produce meaningful annotations by integrating out the topic parameters to find the most likely local patches given the object category. Li, Wang & Fei-Fei, CVPR 2007

Pitfall #1: model drift … … Object Model Object Model

Model updating based on a few strong cues is likely to be biased. If we always update the object model using very similar images like these two, the collected images will in turn be very similar to them, while good images with lower likelihood ratios are missed. Li, Wang & Fei-Fei, CVPR 2007

Pitfall #2: model diversity Object Model …

We can also collect bad images, because we use local patches and only appearance is considered in learning and classification, without any global information: the kid in the strawberry costume has local patches very similar to those of the training images, and therefore a high likelihood ratio. Good Images Bad Images Li, Wang & Fei-Fei, CVPR 2007

The “cache set”

Hence, a cache set is designed into our approach. First, images with high likelihood ratios are accepted. Each accepted image is then measured by the entropy of its topic posterior, H(z|I) = -Σ_z p(z|I) ln p(z|I). High-entropy images, which indicate new topics, go to the cache, while low-entropy images, about which we are very sure, are appended to the dataset. Incremental learning is conducted only on the cache set. Li, Wang & Fei-Fei, CVPR 2007
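The two-stage gate described above can be sketched as a small routing function. This is an illustrative sketch only: the threshold values are made up for the example, not taken from the paper, and the likelihood ratio and topic posterior are assumed to come from the HDP classifier.

```python
import math

def route_image(likelihood_ratio, topic_posterior,
                accept_thresh=1.0, entropy_thresh=0.5):
    """Route one candidate image: 'reject', 'dataset', or 'cache'.

    likelihood_ratio: foreground/background likelihood ratio for the image
    topic_posterior:  p(z|I), the per-topic posterior for the image
    Thresholds are illustrative placeholders, not values from the paper.
    """
    # Stage 1: accept only images with a high enough likelihood ratio.
    if likelihood_ratio <= accept_thresh:
        return "reject"
    # Stage 2: H(z|I) = -sum_z p(z|I) ln p(z|I) over the topic posterior.
    entropy = -sum(p * math.log(p) for p in topic_posterior if p > 0)
    # High entropy suggests new topics -> cache (used for incremental learning);
    # low entropy means we are confident -> permanent dataset.
    return "cache" if entropy > entropy_thresh else "dataset"
```

For example, a confident image with posterior concentrated on one topic has zero entropy and goes straight to the dataset, while a uniform posterior over two topics has entropy ln 2 ≈ 0.69 and is cached.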

Raw image dataset → classification with the category model → accepted images split by entropy: low-entropy images → enlarged dataset; high-entropy images → cache → incremental learning → updated category model

Result

Here we use accordion as an example of the dataset-collection and annotation results. The four images on the left are annotation results: our approach can precisely locate the accordion even against a very cluttered background. On the right, the bar plot compares the number of images collected by OPTIMOL against existing datasets; the y-axis is the number of images. The blue bar is the LabelMe dataset, which has no accordion images, so the bar is invisible. The yellow bar is the manually selected images from the Caltech101 raw dataset. The red bar is the images collected by OPTIMOL from that same Caltech101 raw dataset; here OPTIMOL is comparable to a human, with only a few mistakes, represented by the darker red part at the top. We emphasize that these two datasets are extracted from exactly the same raw data. The green bar is the number of clean images retrieved from our own raw web images. The figure shows that OPTIMOL collects many more images than the existing datasets, with few mistakes. Li, Wang & Fei-Fei, CVPR 2007

Given that, as shown in the previous slide, humans also make mistakes when collecting datasets, our results are reasonable. Li, Wang & Fei-Fei, CVPR 2007

OPTIMOL also learns good models Li, Wang & Fei-Fei, CVPR 2007

Team OPTIMOL (UIUC-Princeton): 1st Place in the Software League