Tamara Berg Object Recognition – BoF models Recognizing People, Objects, & Actions 1
Topic Presentations Hopefully you have met your topic presentation group members? Group 1 – see me to run through slides this week or Monday at the latest (I’m traveling Thurs/Friday). Send me links to 2-3 papers for the class to read. Sign up for the class google group ( ). To find the group, go to groups.google.com and search for (sorted by date). Use this to post/answer questions related to the class. 2
Object Bag of ‘features’ Bag-of-features models source: Svetlana Lazebnik 3
Exchangeability De Finetti's theorem of exchangeability (the bag of words theorem): the joint probability distribution underlying the data is invariant to permutation. 4
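Concretely, for word variables w_1, …, w_n, exchangeability can be stated as (standard form, not reproduced from the slide):

\[ P(w_1, w_2, \dots, w_n) = P(w_{\pi(1)}, w_{\pi(2)}, \dots, w_{\pi(n)}) \quad \text{for every permutation } \pi \text{ of } \{1, \dots, n\}, \]

so only the counts of the words matter, not their order.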
Origin 2: Bag-of-words models US Presidential Speeches Tag Cloud Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) source: Svetlana Lazebnik 5
Bag of words for text Represent documents as “bags of words” 6
Example Doc1 = “the quick brown fox jumped” Doc2 = “brown quick jumped fox the” Would a bag of words model represent these two documents differently? 7
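A minimal Python sketch (variable names are illustrative) showing that a bag-of-words model gives both documents exactly the same representation:

```python
from collections import Counter

doc1 = "the quick brown fox jumped"
doc2 = "brown quick jumped fox the"

# Bag of words: keep only word counts, discard word order.
bow1 = Counter(doc1.split())
bow2 = Counter(doc2.split())

print(bow1 == bow2)  # True -- the two documents are indistinguishable
```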
Bag of words for images Represent images as a “bag of features” 8
Bag of features: outline 1.Extract features source: Svetlana Lazebnik 9
Bag of features: outline 1.Extract features 2.Learn “visual vocabulary” source: Svetlana Lazebnik 10
Bag of features: outline 1.Extract features 2.Learn “visual vocabulary” 3.Represent images by frequencies of “visual words” source: Svetlana Lazebnik 11
2. Learning the visual vocabulary Clustering … Slide credit: Josef Sivic 12
2. Learning the visual vocabulary Clustering … Slide credit: Josef Sivic Visual vocabulary 13
K-means clustering (reminder) Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k Algorithm: Randomly initialize K cluster centers Iterate until convergence: Assign each data point to the nearest center Recompute each cluster center as the mean of all points assigned to it source: Svetlana Lazebnik 14
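A minimal NumPy sketch of the algorithm just described (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Plain k-means: reduces the sum of squared distances between
    points and their nearest cluster centers."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers from the data points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        centers_new = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(centers_new, centers):
            break  # converged
        centers = centers_new
    return centers, labels
```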
Example visual vocabulary Fei-Fei et al
Image Representation For a query image: extract features, associate each feature with the nearest cluster center (visual word), and accumulate visual word frequencies over the image. [figure: image features plotted against the visual vocabulary]
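A small sketch of the assignment and accumulation steps (assumes the descriptors and the vocabulary are NumPy arrays; names are illustrative):

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """descriptors: (n_features, d) local descriptors from one image (e.g. SIFT).
    vocabulary:  (n_words, d) cluster centers (visual words).
    Returns a length-n_words vector of visual word frequencies."""
    # Associate each descriptor with the nearest visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Accumulate visual word counts over the image.
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalize to frequencies
```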
3. Image representation [figure: histogram of codeword (visual word) frequencies] source: Svetlana Lazebnik 17
4. Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them? [figure: codeword frequency histogram for a CAR image] source: Svetlana Lazebnik 18
Image Categorization Choose from many categories What is this? helicopter
Image Categorization Choose from many categories What is this? Example approaches: SVM/NB – Csurka et al. (Caltech 4/7); Nearest Neighbor – Berg et al. (Caltech 101); Kernel + SVM – Grauman et al. (Caltech 101); Multiple Kernel Learning + SVMs – Varma et al. (Caltech 101); …
Visual Categorization with Bags of Keypoints Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray 21
Data Images in 7 classes: faces, buildings, trees, cars, phones, bikes, books Caltech 4 dataset: faces, airplanes, cars (rear and side), motorbikes, background 22
Method Steps: – Detect and describe image patches. – Assign patch descriptors to a set of predetermined clusters (a visual vocabulary). – Construct a bag of keypoints, which counts the number of patches assigned to each cluster. – Apply a classifier (SVM or Naïve Bayes), treating the bag of keypoints as the feature vector – Determine which category or categories to assign to the image. 23
Bag-of-Keypoints Approach Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier 24 Slide credit: Yun-hsueh Liu
25 SIFT Descriptors Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu
26 Bag of Keypoints (1) Construction of a vocabulary – K-means clustering finds “centroids” (over all the descriptors extracted from all the training images) – Define a “vocabulary” as the set of “centroids”, where every centroid represents a “word”. Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu
27 Bag of Keypoints (2) Histogram – Counts the number of occurrences of different visual words in each image Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu
28 Multi-class Classifier In this paper, classification is based on conventional machine learning approaches – Support Vector Machine (SVM) – Naïve Bayes Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu
SVM 29
Reminder: Linear SVM [figure: decision boundary w^T x + b = 0 with margin hyperplanes w^T x + b = ±1, positive/negative examples, and the support vectors on the margin] Slide credit: Jinwei Gu
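The slide's truncated "s.t." refers to the margin-maximization problem; in its standard hard-margin form (not copied from the original deck) it reads:

\[ \min_{w, b} \ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 \ \ \text{for all } i, \]

i.e. maximizing the margin 2/\|w\| is equivalent to minimizing \|w\|^2 while keeping every training point on the correct side of its margin hyperplane.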
Nonlinear SVMs: The Kernel Trick With this mapping, our discriminant function becomes: No need to know this mapping explicitly, because we only use the dot product of feature vectors in both the training and test. A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space: Slide credit: Jinwei Gu 31
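Filling in the standard definitions the slide alludes to (with φ denoting the feature mapping):

\[ K(x_i, x_j) = \varphi(x_i)^\top \varphi(x_j), \qquad f(x) = \sum_i \alpha_i\, y_i\, K(x_i, x) + b, \]

so both training and testing only ever need kernel evaluations, never the mapping φ itself.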
Nonlinear SVMs: The Kernel Trick Linear kernel: Examples of commonly-used kernel functions: Polynomial kernel: Gaussian (Radial-Basis Function (RBF) ) kernel: Sigmoid: Slide credit: Jinwei Gu 32
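One common parameterization of the kernels named on the slide (the deck's exact constants are not shown, so treat these as representative forms):

\[ \text{Linear: } K(x_i, x_j) = x_i^\top x_j, \qquad \text{Polynomial: } K(x_i, x_j) = (x_i^\top x_j + 1)^p, \]
\[ \text{Gaussian (RBF): } K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \qquad \text{Sigmoid: } K(x_i, x_j) = \tanh(\beta_0\, x_i^\top x_j + \beta_1). \]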
Reminder: Support Vector Machine 1. Choose a kernel function 2. Choose a value for C and any other parameters (e.g. σ) 3. Solve the quadratic programming problem (many software packages available) 4. Classify held out validation instances using the learned model 5. Select the best learned model based on validation accuracy 6. Classify test instances using the final selected model
SVM for image classification Train k binary 1-vs-all SVMs (one per class) For a test instance, evaluate with each classifier Assign the instance to the class with the largest SVM output 34
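A hedged scikit-learn sketch of this recipe (the data here are random placeholders; LinearSVC trains one binary 1-vs-rest SVM per class internally, which matches the scheme described above):

```python
import numpy as np
from sklearn.svm import LinearSVC

# X: (n_images, n_visual_words) bag-of-features histograms, y: class labels.
X_train = np.random.rand(100, 500)            # placeholder training histograms
y_train = np.random.randint(0, 7, size=100)   # placeholder labels for 7 classes

clf = LinearSVC(C=1.0)        # one binary 1-vs-all SVM per class
clf.fit(X_train, y_train)

X_test = np.random.rand(10, 500)
scores = clf.decision_function(X_test)  # (n_test, n_classes) SVM outputs
pred = scores.argmax(axis=1)            # assign to the class with the largest output
# (equivalent to clf.predict(X_test) when labels are 0..n_classes-1)
```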
Naïve Bayes 35
Naïve Bayes Model C – Class F - Features We only specify (parameters): prior over class labels how each feature depends on the class 36
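Written out, the standard Naïve Bayes factorization behind those two quantities is:

\[ P(C, F_1, \dots, F_n) = P(C) \prod_{i=1}^{n} P(F_i \mid C), \]

i.e. the class prior P(C) and the per-feature conditionals P(F_i | C) are the only parameters, because the features are assumed conditionally independent given the class.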
Slide from Dan Klein 37 Example:
Slide from Dan Klein 38
Slide from Dan Klein 39
Percentage of documents in training set labeled as spam/ham Slide from Dan Klein 40
In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein 41
In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein 42
Classification The class that maximizes: 43
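The elided expression is the usual MAP decision rule:

\[ c^{*} = \arg\max_{c} \ P(c \mid f_1, \dots, f_n) \ =\ \arg\max_{c} \ P(c) \prod_{i} P(f_i \mid c). \]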
Classification In practice 44
Classification In practice – Multiplying lots of small probabilities can result in floating point underflow 45
Classification In practice – Multiplying lots of small probabilities can result in floating point underflow – Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. 46
Classification In practice – Multiplying lots of small probabilities can result in floating point underflow – Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. – Since log is a monotonic function, the class with the highest score does not change. 47
Classification In practice – Multiplying lots of small probabilities can result in floating point underflow – Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. – Since log is a monotonic function, the class with the highest score does not change. – So, what we usually compute in practice is: 48
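So the quantity actually computed is the standard log-space form of the rule above:

\[ c^{*} = \arg\max_{c} \ \Big[ \log P(c) + \sum_{i} \log P(f_i \mid c) \Big]. \]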
Naïve Bayes on images 49
Naïve Bayes C – Class F - Features We only specify (parameters): prior over class labels how each feature depends on the class 50
Naive Bayes Parameters Problem: Categorize images as one of k object classes using a Naïve Bayes classifier: – Classes: object categories (face, car, bicycle, etc.) – Features: images represented as a histogram of visual words; the features F_i are visual words. – The class prior P(C) is treated as uniform. – P(F_i | C), the probability of a visual word given an image category, is learned from training data (images labeled with category). 51
52 Multi-class classifier – Naïve Bayes (1) Let V = {v_t}, t = 1,…,N, be a visual vocabulary, in which each v_t represents a visual word (cluster center) from the feature space. Let I = {I_i} be a set of labeled images, and let C_j denote our classes, j = 1,…,M. N(t,i) = number of times v_t occurs in image I_i. Compute P(C_j | I_i): Slide credit: Yun-hsueh Liu
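The computed quantity is the standard multinomial Naïve Bayes posterior, written here from the definitions above (this is the form used by Csurka et al.):

\[ P(C_j \mid I_i) \ \propto\ P(C_j)\, P(I_i \mid C_j) \ =\ P(C_j) \prod_{t=1}^{N} P(v_t \mid C_j)^{N(t,i)}. \]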
53 Multi-class Classifier – Naïve Bayes (2) Goal – Find the maximum probability class C_j: In order to avoid zero probabilities, use Laplace smoothing: Slide credit: Yun-hsueh Liu
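Filling in the two elided formulas with their standard forms (symbols follow the previous slide; the smoothed estimate is the usual Laplace, add-one correction):

\[ C^{*} = \arg\max_{j} \ P(C_j) \prod_{t=1}^{N} P(v_t \mid C_j)^{N(t,i)}, \qquad P(v_t \mid C_j) = \frac{1 + \sum_{I_i \in C_j} N(t,i)}{N + \sum_{s=1}^{N} \sum_{I_i \in C_j} N(s,i)}. \]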
Results
55
Results 56
Results 57 Results on Dataset 2
Results 58
Results 59
Results 60
Thoughts? Pros? Cons?
Related BoF models pLSA, LDA, … 62
pLSA 63 [graphical model with nodes: document, topic, word]
pLSA 64
Marginalizing over topics determines the conditional probability: Joint Probability: 65
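The two elided pLSA equations, in their standard form (d a document, z a topic, w a word):

\[ P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d), \qquad P(w, d) = P(d) \sum_{z} P(w \mid z)\, P(z \mid d) = P(d)\, P(w \mid d). \]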
Fitting the model Need to: Determine the topic vectors common to all documents. Determine the mixture components specific to each document. Goal: a model that gives high probability to the words that appear in the corpus. Maximum likelihood estimation of the parameters is obtained by maximizing the objective function: 66
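The standard pLSA (log-)likelihood objective being maximized, where n(w,d) is the count of word w in document d, the topic vectors are P(w|z), and the document-specific mixture components are P(z|d):

\[ \mathcal{L} = \prod_{d} \prod_{w} P(w \mid d)^{\,n(w,d)} \quad\Longleftrightarrow\quad \log \mathcal{L} = \sum_{d} \sum_{w} n(w,d)\, \log \sum_{z} P(w \mid z)\, P(z \mid d). \]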
pLSA on images 67
Discovering objects and their location in images Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman Documents – Images Words – visual words (vector quantized SIFT descriptors) Topics – object categories Images are modeled as a mixture of topics (objects). 68
Goals They investigate three areas: – (i) topic discovery, where categories are discovered by pLSA clustering on all available images. – (ii) classification of unseen images, where topics corresponding to object categories are learnt on one set of images, and then used to determine the object categories present in another set. – (iii) object detection, where the goal is to determine the location and approximate segmentation of the object(s) in each image. 69
(i) Topic Discovery Most likely words for 4 learnt topics (face, motorbike, airplane, car) 70
(ii) Image Classification Confusion table for unseen test images against pLSA trained on images containing four object categories, but no background images. 71
(ii) Image Classification Confusion table for unseen test images against pLSA trained on images containing four object categories, and background images. Performance is not quite as good. 72
(iii) Topic Segmentation 73
(iii) Topic Segmentation 74
(iii) Topic Segmentation 75