Attention Model Based SIFT Keypoints Filtration for Image Retrieval


Attention Model Based SIFT Keypoints Filtration for Image Retrieval
Ke Gao¹,², Shouxun Lin¹, Yongdong Zhang¹, Sheng Tang¹, Huamin Ren¹,³
¹Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100080
²Graduate University of the Chinese Academy of Sciences, Beijing, China, 100080
³Beijing University of Chinese Medicine
Seventh IEEE/ACIS International Conference on Computer and Information Science, 2008

Outline
Introduction
Review of Attention Model
SIFT Keypoints Filtration using Attention Model
  SIFT Keypoints Extraction
  Attention Model based Keypoints Filtration
Experiment Evaluation
Conclusion

Introduction
Local image descriptors are applied in object recognition and image retrieval [1], [2]
They are distinctive, robust, and do not require segmentation
[1] Mikolajczyk K., Schmid C., "A Performance Evaluation of Local Descriptors", IEEE Trans. Pattern Analysis and Machine Intelligence, 2005, 27(10), p1615-1630
[2] V. Ferrari, T. Tuytelaars, and L. Van Gool, "Simultaneous Object Recognition and Segmentation by Image Exploration", Proc. Eighth European Conf. Computer Vision, 2004, p40-54

Introduction
Two requirements when using local image descriptors:
Keypoints should be placed at local peaks in a scale-space search, so that they remain stable over transformations
The description of each keypoint must be distinctive, concise, and invariant over transformations caused by changes in camera pose and lighting

Introduction
[3] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", Int'l J. Computer Vision, vol. 60, no. 2, 2004, p91-110
[4] Abdel-Hakim A. E., Farag A. A., "CSIFT: A SIFT Descriptor with Color Invariant Characteristics", Computer Vision and Pattern Recognition, 2006, Vol. 2, p1978-1983
[5] T. Tuytelaars and L. Van Gool, "Matching Widely Separated Views Based on Affine Invariant Regions", Int'l J. Computer Vision, vol. 59, no. 1, 2004, p61-85

Introduction
Scale Invariant Feature Transform (SIFT): the most robust of the local invariant feature descriptors [1]
It combines a scale-invariant region detector with a descriptor based on the gradient distribution in the detected regions
The descriptor is represented by a 3D histogram of gradient locations and orientations
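As an illustration only (not the paper's code, which predates this API), a minimal sketch of SIFT extraction using OpenCV's built-in implementation; the image path is a placeholder:

```python
# Minimal sketch of SIFT keypoint/descriptor extraction with OpenCV
# (requires OpenCV >= 4.4). "query.jpg" is a placeholder path.
import cv2

img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each row of `descriptors` is a 128-dim SIFT vector: a 4x4 spatial
# grid of 8-bin gradient-orientation histograms over the local patch.
print(len(keypoints), descriptors.shape)
```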

Introduction
PCA-SIFT has been developed on top of the SIFT algorithm [6]
It applies Principal Components Analysis (PCA) to the normalized image gradient patch
It accelerates matching by reducing the descriptor dimension from 128 to 36 per patch
[6] Yan Ke, Rahul Sukthankar, "PCA-SIFT: A More Distinctive Representation for Local Image Descriptors", Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2004, Vol. 2, p506-513
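A hedged sketch of the PCA idea with scikit-learn; note that PCA-SIFT fits the projection on raw 3042-dim normalized gradient patches from a training corpus, so the random stand-in data below only illustrates the dimensionality reduction itself:

```python
# Illustrative PCA projection to 36 components (stand-in data only;
# PCA-SIFT learns the projection from real gradient patches).
import numpy as np
from sklearn.decomposition import PCA

patches = np.random.rand(10000, 3042).astype(np.float32)  # stand-in patches
pca = PCA(n_components=36).fit(patches)
reduced = pca.transform(patches)
print(reduced.shape)  # (10000, 36)
```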

Introduction
Shortcoming: on a typical image, SIFT returns a large number of features, especially when the object appears small in the image
This paper proposes a novel method to filter SIFT keypoints based on an attention model

Review of Attention Model
Attention lies at the nexus between cognition and perception: it selects a subset of the available sensory information before further processing
A number of computational attention models have been developed, such as those proposed in [7], [8]
[7] J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, et al., "Modeling visual attention via selective tuning", Artificial Intelligence, 1995, 78: p507-545
[8] Itti L., Gold C., Koch C., "Visual attention and target detection in cluttered natural scenes", Optical Engineering, 2001, 40(9), p1784-1793

Review of Attention Model
This work uses the saliency-based attention model for scene analysis [8]
A "saliency region" is a region that shows evident contrast with its surroundings
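The Itti-Koch model [8] builds multi-scale center-surround feature maps; the toy sketch below captures only the "contrast with surround" intuition, not the actual model:

```python
# Toy center-surround contrast map (NOT the Itti-Koch model itself):
# saliency approximated as each pixel's distance from a heavily
# blurred (surround) version of the image. Path is a placeholder.
import cv2
import numpy as np

img = cv2.imread("query.jpg").astype(np.float32) / 255.0
surround = cv2.GaussianBlur(img, (0, 0), sigmaX=16)
saliency = np.linalg.norm(img - surround, axis=2)
saliency_map = (255 * saliency / saliency.max()).astype(np.uint8)
```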

SIFT Keypoints Filtration using Attention Model
Content-based image retrieval can be viewed as the problem of transforming an image into a set of feature vectors
For good retrieval performance, the extracted features should satisfy two criteria:
Distinctiveness
Matching speed

SIFT Keypoints Filtration using Attention Model
SIFT descriptors are accurate enough, but too many keypoints are extracted from each image
Most of them are "noise points" coming from the background
This paper uses an attention model to filter SIFT keypoints

SIFT Keypoints Filtration using Attention Model
Two steps:
SIFT Keypoints Extraction
Attention Model based Keypoints Filtration

SIFT Keypoints Extraction
Four major stages of SIFT:
1. Scale-space peak selection
2. Keypoint localization
3. Orientation assignment
4. Keypoint descriptor

SIFT Keypoints Extraction
In the 1st stage, potential interest points are identified by scanning over all possible scales and image locations
The only possible scale-space kernel is the Gaussian function, so the scale space of an image is defined as the function L(x, y, σ) = G(x, y, σ) * I(x, y), where G is a variable-scale Gaussian and I is the input image

SIFT Keypoints Extraction
To efficiently detect stable keypoint locations in scale space, a series of difference-of-Gaussian (DoG) images D(x, y, σ) = L(x, y, kσ) - L(x, y, σ) is established
The DoG function provides a close approximation to the scale-normalized Laplacian of Gaussian, and its maxima and minima produce more stable image features than other candidate functions, e.g. the gradient, Hessian, or Harris corner function
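A sketch of one DoG octave under Lowe's standard parameters (base σ = 1.6, k = 2^(1/3), from [3]); the code itself is illustrative, not the paper's implementation:

```python
# One octave of a difference-of-Gaussian (DoG) pyramid.
import cv2
import numpy as np

img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
sigma, k = 1.6, 2 ** (1.0 / 3.0)
levels = [cv2.GaussianBlur(img, (0, 0), sigma * k ** i) for i in range(5)]
dog = [levels[i + 1] - levels[i] for i in range(4)]
# Candidate keypoints are pixels that are extrema among their 26
# neighbours in space and in the adjacent DoG levels.
```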

SIFT Keypoints Extraction
In the 2nd stage, candidate keypoints are localized to sub-pixel accuracy and eliminated if found to be unstable
The 3rd stage identifies the dominant orientations for each keypoint based on its local image patch
The final stage builds a local image descriptor for each keypoint, based on the image gradients in its local neighborhood

SIFT Keypoints Extraction
The dimension of the standard SIFT descriptor is 128 per keypoint; PCA-SIFT reduces it to 36
This work keeps the first three stages and then uses an attention model to filter the keypoints
This provides benefits in both retrieval accuracy and matching speed

Attention Model based Keypoints Filtration
After SIFT keypoint extraction, the attention model is used to generate a saliency map
Fuzzy growing [9] is performed to find all of the saliency regions of the original image
Considering the computational complexity, the number of saliency regions per image is limited to 3
[9] Ma Y. F., Zhang H. J., "Contrast-based image attention analysis by using fuzzy growing", Proceedings of the 11th ACM International Conference on Multimedia, Berkeley, CA, USA: ACM, 2003, p374-381

Attention Model based Keypoints Filtration
Saliency regions (SRs) in the saliency map can have arbitrary shapes; rectangles are used for simplicity
It is assumed that no two rectangles overlap
An SR is defined by two components: a point representing its center and a value denoting the size of the SR
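A hypothetical data structure matching this description (a center plus a size); the field names are assumptions, not the paper's notation:

```python
# Hypothetical rectangular saliency region: center + size, with a
# region saliency weight attached later.
from dataclasses import dataclass

@dataclass
class SaliencyRegion:
    cx: float          # center x
    cy: float          # center y
    w: float           # width
    h: float           # height
    w_sr: float = 0.0  # region saliency weight

    def contains(self, x: float, y: float) -> bool:
        # True if the keypoint (x, y) lies inside this rectangle.
        return abs(x - self.cx) <= self.w / 2 and abs(y - self.cy) <= self.h / 2
```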

Attention Model based Keypoints Filtration Based on the definition of SR, each SIFT keypoint is attached with a saliency weight is 1 if (x,y) is in the center 0 if (x,y) is NOT subject to any SR

Attention Model based Keypoints Filtration
Each saliency region also carries its own saliency weight
Observe that the importance of a detected region is usually reflected by its area weight and position weight
If a region is too small to provide any useful information, it is not considered: only regions bigger than 5% of the image are ranked, and only the top 3 regions are reserved as SRs

Attention Model based Keypoints Filtration
The area weight of the current SR is calculated as a function of its area (the formula itself does not survive in this transcript)
Since people often pay more attention to the region near the image center, a normalized Gaussian template is used to assign the position weight
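Because the exact formulas are omitted here, the sketch below is an assumed reading: an area weight proportional to the region's share of the image, and a normalized Gaussian position weight centered on the image center:

```python
# Assumed area and position weights (the paper's exact formulas are
# not reproduced in this transcript; both functions are illustrative).
import math

def area_weight(w, h, img_w, img_h):
    # Larger regions matter more; regions under 5% of the image
    # were already discarded upstream.
    return (w * h) / (img_w * img_h)

def position_weight(cx, cy, img_w, img_h, sigma_frac=0.25):
    # Normalized Gaussian template: regions near the image center
    # receive higher weight. sigma_frac is an assumed parameter.
    dx = (cx - img_w / 2) / (sigma_frac * img_w)
    dy = (cy - img_h / 2) / (sigma_frac * img_h)
    return math.exp(-0.5 * (dx * dx + dy * dy))
```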

Attention Model based Keypoints Filtration
The saliency weight of each SIFT keypoint is generated as above
All keypoints in an image are ranked by their saliency weights, and only the top N keypoints are reserved for SIFT descriptor extraction
N is determined as a tradeoff between retrieval accuracy and speed
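The filtration step itself then reduces to a top-N selection by weight, e.g.:

```python
# Keep only the N highest-weighted keypoints; N trades retrieval
# accuracy against matching speed.
import numpy as np

def filter_keypoints(keypoints, weights, n_keep):
    order = np.argsort(weights)[::-1]   # indices by descending weight
    return [keypoints[i] for i in order[:n_keep]]
```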

Experiment Evaluation
The data set consists of three categories:
The same object with different backgrounds or under different viewpoints
Video frames extracted from some movies
Ordinary images of varying size and content
Most of the original photos are downloaded from ALOI (http://staff.science.uva.nl/~aloi/) and Caltech (http://vision.caltech.edu/Image_Datasets/Caltech256/)

Experiment Evaluation
Geometric and photometric transformations were applied to evaluate the algorithm under different conditions
By object, the data set is divided into about 50 classes, each with more than 20 relevant images
In total: 6,000 images and 7,240,000 standard SIFT keypoints

Experiment Evaluation
Evaluation Metrics
The Bag of Words method proposed in [10] is used: it vector quantizes the SIFT descriptors into clusters with k-means and represents an image as a bag of "words"
Using term frequency as the standard weighting, all of the images are organized as an inverted file
[10] J. Sivic, A. Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos", Proceedings of the International Conference on Computer Vision, 2003, p1470-1477
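A minimal bag-of-visual-words sketch in the spirit of [10]; the vocabulary size and the stand-in training data are assumptions:

```python
# Quantize SIFT descriptors into a 1000-word vocabulary and build
# per-image term-frequency vectors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train = rng.random((50000, 128), dtype=np.float32)  # stand-in for stacked SIFT descriptors
vocab = KMeans(n_clusters=1000, n_init=1, random_state=0).fit(train)

def bow_vector(descriptors):
    words = vocab.predict(descriptors)
    tf = np.bincount(words, minlength=vocab.n_clusters).astype(np.float64)
    return tf / max(tf.sum(), 1.0)  # term-frequency weighting
```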

Experiment Evaluation
Evaluation Metrics
Image matching is based on the cosine similarity between these quantized vectors
This method ensures in-time retrieval and has proven to be very useful
If the cosine similarity between two image vectors is larger than a chosen threshold, the pair of images is called a match; all images are then ranked by matching degree
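Matching then reduces to a thresholded cosine between BoW vectors; the threshold value below is illustrative:

```python
# Cosine similarity between two quantized image vectors; a pair is a
# match when the similarity exceeds a chosen threshold.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

a = np.array([0.2, 0.5, 0.0, 0.3])  # toy BoW vectors
b = np.array([0.1, 0.6, 0.1, 0.2])
is_match = cosine(a, b) > 0.3       # threshold is illustrative
```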

Experiment Evaluation
Evaluation Metrics
To describe the ranking sequence produced by image retrieval on this data set, average retrieval precision is adopted, computed over the query image and each image of the ranking result, with n = 20
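Under one plausible reading of the formula omitted from this transcript (precision over the top n = 20 results), the metric might be computed as:

```python
# Assumed form of average retrieval precision: the fraction of
# relevant images among the top-n ranked results (n = 20 in the paper).
def average_retrieval_precision(ranked_ids, relevant_ids, n=20):
    hits = sum(1 for img_id in ranked_ids[:n] if img_id in relevant_ids)
    return hits / n
```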

Experiment Evaluation
Experimental Results and Discussion
The attention model based SIFT keypoints filtration algorithm (AF-SIFT) is compared to standard SIFT and PCA-SIFT
The dimension of standard SIFT is 128: a 4×4 array of histograms, each with 8 orientation bins
The PCA-SIFT descriptor dimension for each keypoint is 36

Experiment Evaluation
Experimental Results and Discussion
Two variants of AF-SIFT are used to compare performance:
AF-SIFT1 uses 128-dimension descriptors in the standard way
AF-SIFT2 uses a 2×2 array with 8 orientation bins, for a dimension of 32

Experiment Evaluation
Experimental Results and Discussion
The filtering step is somewhat time-consuming, but the processing is completed off-line
Because it effectively reduces background features, it in fact decreases the overall computation time

Experiment Evaluation
Experimental Results and Discussion
ALOI has little background confusion, but varying illumination and viewpoint
Movie frames show little obvious difference between foreground and background
Corel Gallery contains nature photos with confusing backgrounds

Experiment Evaluation
Experimental Results and Discussion
Filtrated SIFT keypoints carry information about the saliency regions
The ranking of keypoints is based on the global distribution, not only on local patches, so the most distinctive keypoints are reserved
This avoids the influence of background features and makes the clustering result more exact

Experiment Evaluation
Experimental Results and Discussion
Fig. 9 shows how matching reliability varies as a function of N, where N denotes the number of SIFT keypoints left after the filtration
A good tradeoff between accuracy and speed should be achieved

Conclusion
AF-SIFT provides an effective alternative to standard SIFT
Future work: region-based image retrieval, and applying this idea to large-scale image database retrieval