
1 Instructor: Mircea Nicolescu Lecture 17
CS 485 / 685 Computer Vision, Instructor: Mircea Nicolescu, Lecture 17

2 Object Recognition Using SIFT Features
1. Match individual SIFT features from an image to a database of SIFT features from known objects (i.e., find nearest neighbors).
2. Find clusters of SIFT features belonging to a single object (hypothesis generation).

3 Object Recognition Using SIFT Features
3. Estimate the object pose (i.e., recover the transformation that the model has undergone) using at least three matches.
4. Verify that additional features agree on the object pose.

4 Nearest Neighbor Search
Linear search: too slow for a large database.
kD-trees: become slow when k > 10.

5 Nearest Neighbor Search
Approximate nearest neighbor search: best-bin-first [Beis et al. 97], a modification of the kD-tree algorithm:
Examine only the N closest bins of the kD-tree.
Use a heap to identify bins in order of their distance from the query.
Can give a speedup by a factor of 1000 while still finding the nearest neighbor 95% of the time.
FLANN - Fast Library for Approximate Nearest Neighbors: Marius Muja and David G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration", International Conference on Computer Vision Theory and Applications, 2009.
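
For concreteness, here is a minimal sketch of SIFT matching with FLANN through OpenCV's Python bindings; the image paths and the ratio-test threshold are illustrative, not taken from the slides:

```python
import cv2

# Load a model image and a scene image (paths are placeholders).
model = cv2.imread("model.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_m, des_m = sift.detectAndCompute(model, None)
kp_s, des_s = sift.detectAndCompute(scene, None)

# FLANN with randomized kd-trees; 'checks' caps how many leaves the
# search visits, trading accuracy for speed (best-bin-first style).
index_params = dict(algorithm=1, trees=5)   # 1 = FLANN_INDEX_KDTREE
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)

# Lowe's ratio test: keep a match only if it is clearly better than
# the second-best candidate.
matches = flann.knnMatch(des_m, des_s, k=2)
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
```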

6 Estimate Object Pose
Now, given feature matches:
Find clusters of features corresponding to a single object.
Solve for the transformation (e.g., an affine transformation).

7 Estimate Object Pose
Need to consider clusters of size ≥ 3. How do we find three “good” (true) matches?

8 Estimate Object Pose
Pose clustering: each SIFT feature is associated with four parameters (2D location, scale, orientation). For every model-scene match (mi, sj), estimate the similarity transformation (tx, ty, s, θ) between mi and sj, and cast a vote for it in transformation space.

9 Estimate Object Pose
Transformation space is 4D: (tx, ty, s, θ).
[Figure: votes accumulating in bins of the 4D transformation space]

10 Estimate Object Pose
Partial voting: vote for neighboring bins as well, and use large bin sizes, to better tolerate errors.
Transformations that accumulate at least three votes are selected (hypothesis generation).
Using the model-scene matches, compute the object pose (i.e., an affine transformation) and apply verification.
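
A minimal sketch of this voting scheme, assuming each feature carries (x, y, scale, orientation); the Feat record, bin widths, and function name are assumptions, and the partial voting into neighboring bins described above is omitted for brevity:

```python
from collections import defaultdict, namedtuple
import numpy as np

# Hypothetical feature record (not OpenCV's KeyPoint): position,
# scale, and orientation of a SIFT keypoint.
Feat = namedtuple("Feat", "x y scale theta")

def pose_votes(matches, tx_bin=32.0, ty_bin=32.0, theta_bin=np.pi / 6):
    """Hough-style voting over (tx, ty, s, theta).

    'matches' is a list of (model_feat, scene_feat) pairs. Each pair
    predicts one similarity transform; bins that collect at least
    three votes become pose hypotheses.
    """
    votes = defaultdict(list)
    for m, s in matches:
        d_theta = s.theta - m.theta                 # rotation
        d_scale = s.scale / m.scale                 # scale
        c, sn = np.cos(d_theta), np.sin(d_theta)
        tx = s.x - d_scale * (c * m.x - sn * m.y)   # translation that maps
        ty = s.y - d_scale * (sn * m.x + c * m.y)   # the model point onto the scene
        key = (int(tx // tx_bin), int(ty // ty_bin),
               int(np.floor(np.log2(d_scale))),     # one scale bin per octave
               int(d_theta // theta_bin))
        votes[key].append((m, s))
    return [ms for ms in votes.values() if len(ms) >= 3]
```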

11 Verification
Back-project the model onto the scene and look for additional matches.
Discard outliers (incorrect matches) by imposing stricter matching constraints (e.g., half the error).
Find additional matches by refining the computed transformation (i.e., iterative affine refinements).
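
The sketch below illustrates one way to implement this refinement loop: a least-squares affine fit, then repeated re-fitting on the matches that survive a progressively halved error tolerance. The function names and the starting tolerance are assumptions, not from the lecture:

```python
import numpy as np

def fit_affine(model_pts, scene_pts):
    """Least-squares affine transform mapping model -> scene.

    Solves for the 6 affine parameters; needs >= 3 correspondences.
    Returns a 3x2 matrix P so that [x, y, 1] @ P = [x', y'].
    """
    X = np.asarray(model_pts, float)
    Y = np.asarray(scene_pts, float)
    A = np.hstack([X, np.ones((len(X), 1))])        # rows: [x, y, 1]
    P, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return P

def refine(model_pts, scene_pts, tol=5.0, iters=5):
    """Iteratively re-fit, discarding matches whose back-projection
    error exceeds a tolerance that is halved each round."""
    m = np.asarray(model_pts, float)
    s = np.asarray(scene_pts, float)
    for _ in range(iters):
        P = fit_affine(m, s)
        proj = np.hstack([m, np.ones((len(m), 1))]) @ P
        err = np.linalg.norm(proj - s, axis=1)
        keep = err < tol
        if keep.all() or keep.sum() < 3:
            break
        m, s, tol = m[keep], s[keep], tol / 2       # stricter constraint
    return P, m, s
```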

12 Verification
Evaluate the probability that the match is correct: use a Bayesian (probabilistic) model to estimate the probability that a model is present, based on the actual number of matching features. The Bayesian model takes into account:
Object size in the image
Textured regions
Model feature count in the database
Accuracy of fit
Lowe, D.G., "Local feature view clustering for 3D object recognition", IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, 2001, pp. 682–688.

13 Planar Recognition Training images (models)

14 Planar Recognition Reliably recognized at a rotation of 60° away from the camera. Affine fit approximates perspective projection. Only 3 points are needed for recognition.

15 3D Object Recognition Training images

16 3D Object Recognition Only 3 keypoints are needed for recognition; extra keypoints provide robustness. Affine model is no longer as accurate.

17 Recognition Under Occlusion

18 Illumination Invariance

19 Object Categorization

20 Bag-of-Features (BoF) Models
Good for object categorization.

21 Origin 1: Texture Recognition
Texture is characterized by the repetition of basic elements, or textons. Often it is the identity of the textons, not their spatial arrangement, that matters.

22 Origin 1: Texture Recognition
[Figure: histograms of texton frequencies over a universal texton dictionary]

23 Origin 2: Document Retrieval
Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983).

24 BoF for Object Categorization
Need a “visual” dictionary! G. Csurka et al., "Visual Categorization with Bags of Keypoints", European Conference on Computer Vision, Czech Republic, 2004.

25 BoF: Main Steps
Characterize objects in terms of parts or local features.

26 BoF: Main Steps
Step 1: Feature extraction (e.g., SIFT features).

27 BoF: Main Steps (cont’d)
Step 2: Learn a “visual” vocabulary.
[Figure: feature extraction & clustering yield the “visual” vocabulary]

28 BoF: Main Steps
[Figure: extracted features in descriptor space]

29 BoF: Main Steps
[Figure: clustering of the extracted features]

30 BoF: Main Steps
“Visual” vocabulary: the cluster centers.
[Figure: cluster centers obtained by clustering]

31 Example: K-Means Clustering
Algorithm:
Randomly initialize K cluster centers.
Iterate until convergence:
Assign each data point to the nearest center.
Re-compute each cluster center as the mean of all points assigned to it.
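
A minimal sketch of exactly this algorithm in numpy (real systems would typically use a library implementation such as OpenCV's or scikit-learn's):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means on the rows of X (an (N, d) array)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to the nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Re-compute each center as the mean of its assigned points
        # (keep the old center if a cluster goes empty).
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break                      # converged
        centers = new
    return centers, labels
```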

32 BoF: Main Steps
Step 3: Quantize features using the “visual” vocabulary (i.e., represent each feature by the closest cluster center).

33 BoF: Main Steps
Step 4: Represent images by frequencies of “visual words” (i.e., bags of features).
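
Steps 3 and 4 together fit in a few lines; a sketch, assuming 'vocabulary' is the (k, d) matrix of cluster centers learned in step 2:

```python
import numpy as np

def bof_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its closest cluster center and
    count visual-word frequencies."""
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :],
                       axis=2)
    words = d.argmin(axis=1)           # visual-word index per feature
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()           # normalize by the feature count
```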

34 BoF: Main Steps

35 BoF Object Categorization
How do we use BoF for object categorization?

36 BoF Object Categorization
Nearest Neighbor (NN) Classifier

37 BoF Object Categorization
K-Nearest Neighbor (KNN) Classifier:
Find the k closest points from the training data.
The labels of the k points “vote” to classify.
Works well provided there is lots of data and the distance function is good.
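
A sketch of KNN voting on BoF histograms; Euclidean distance is used here for simplicity, while histogram-specific distances are the topic of the next slide:

```python
import numpy as np

def knn_classify(query_hist, train_hists, train_labels, k=5):
    """Classify a BoF histogram by majority vote among its k nearest
    training histograms."""
    d = np.linalg.norm(train_hists - query_hist, axis=1)
    nearest = np.argsort(d)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)   # majority label
```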

38 BoF Object Categorization
Functions for comparing histograms
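
The transcript does not preserve which functions the slide listed; two standard choices for comparing BoF histograms are histogram intersection (a similarity) and the chi-squared distance:

```python
import numpy as np

def intersection(h1, h2):
    """Histogram intersection: higher = more similar
    (equals 1 for identical normalized histograms)."""
    return np.minimum(h1, h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance: lower = more similar."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```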

39 BoF Object Categorization
SVM classifier
[Figure: SVM decision boundary]
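
Any off-the-shelf classifier can be trained on the BoF histograms. A minimal sketch with scikit-learn's SVC (a library choice the lecture does not prescribe), reusing the train_hists / train_labels arrays assumed in the KNN example:

```python
from sklearn.svm import SVC

# train_hists: (N, K) array of BoF histograms; train_labels: N class
# labels; test_hists: (M, K) histograms -- all assumed to exist.
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(train_hists, train_labels)
predictions = clf.predict(test_hists)
```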

40 Example

41 Example
Dictionary quality and size are very important parameters!

42 Appearance-Based Recognition
Represent an object by the set of its possible appearances (i.e., under all possible viewpoints and illumination conditions). Identifying an object implies finding the closest stored image.

43 Appearance-Based Recognition
In practice, a subset of all possible appearances is used. Images are highly correlated, so “compress” them into a low-dimensional space that captures key appearance characteristics, e.g., using Principal Component Analysis (PCA).
M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
H. Murase and S. Nayar, "Visual Learning and Recognition of 3D Objects from Appearance", International Journal of Computer Vision, vol. 14, pp. 5–24, 1995.
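
A sketch of the PCA compression step via the SVD; the function names are illustrative, and recognition would then be nearest-neighbor search among the projected training images:

```python
import numpy as np

def pca_subspace(images, n_components):
    """'images' is an (N, P) array of flattened, equal-size images.
    Returns the mean image and the top principal components
    (the eigen-images)."""
    mean = images.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(image, mean, components):
    """Low-dimensional code used for nearest-neighbor recognition."""
    return components @ (image.ravel() - mean)
```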

44 Image Segmentation: Goals and Difficulties
The goal of segmentation is to partition an image into regions (e.g., separate objects from the background).
The results of segmentation are very important in determining the eventual success or failure of image analysis.
Segmentation is a very difficult problem in general!

45 Image Segmentation: Increasing Accuracy and Robustness
Introduce enough knowledge about the application domain.
Assume control over the environment (e.g., in industrial applications).
Select the type of sensors to enhance the objects of interest (e.g., use infrared imaging for target recognition applications).

46 Image Segmentation: Segmentation Approaches
Edge-based approaches: use the boundaries of regions to segment the image; detect abrupt changes in intensity (discontinuities).
Region-based approaches: use similarity among pixels to find different regions.
Theoretically, both approaches should give identical results, but this is not true in practice.

47 Region Detection
A region is a group of connected pixels with similar properties. Region-based approaches use similarity and spatial proximity among pixels to find different regions. The goal is to divide the image into regions so that:
each region is homogeneous in some sense;
adjacent regions are not homogeneous if taken together, in the same sense.

48 Region Detection
Properties for region-based segmentation: partition an image R into sub-regions R1, R2, ..., Rn. Let P(Ri) be a logical predicate – a property that the pixel values of region Ri satisfy (e.g., intensity between 100 and 120). The following properties must hold:
1. The union of all Ri is R (every pixel belongs to some region).
2. Each Ri is a connected region.
3. Ri ∩ Rj = ∅ for all i ≠ j (the regions are disjoint).
4. P(Ri) = TRUE for every i.
5. P(Ri ∪ Rj) = FALSE for any pair of adjacent regions Ri and Rj.

49 Region Detection
Main approaches for region detection:
Thresholding (pixel classification)
Region growing (splitting and merging)
Relaxation

50 Thresholding
The simplest approach to image segmentation is thresholding:
if f(x,y) < T then f(x,y) = 0 else f(x,y) = 255
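
The same rule applied to a whole image at once in numpy; the function name is illustrative:

```python
import numpy as np

def threshold(f, T):
    """Pixels below T become 0, all others 255."""
    return np.where(f < T, 0, 255).astype(np.uint8)
```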

51 Thresholding: Automatic Thresholding
To make segmentation more robust, the threshold should be selected automatically by the system. Knowledge about the objects, the application, and the environment should be used to choose the threshold automatically:
Intensity characteristics of the objects
Sizes of the objects
Fractions of the image occupied by the objects
Number of different types of objects appearing in the image

52 Thresholding: Choosing the Threshold Using the Image Histogram
Regions with uniform intensity give rise to strong peaks in the histogram.
Multilevel thresholding is also possible.
In general, good thresholds can be selected if the histogram peaks are tall, narrow, symmetric, and separated by deep valleys.
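
The slide does not name a specific method, but the standard histogram-based automatic choice is Otsu's method, which picks T by maximizing the between-class variance of the histogram (effectively locating the valley between two peaks). A sketch with OpenCV, using a placeholder input path:

```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
# Passing THRESH_OTSU makes OpenCV ignore the given threshold (0) and
# compute the optimal T from the image histogram.
T, binary = cv2.threshold(img, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```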

