Content-Based Image Retrieval

Content-Based Image Retrieval Rong Jin

Content-based Image Retrieval: retrieval by text. Label the database images with text tags, which turns image retrieval into text retrieval: images for a textual query are found with a standard text search engine.

Example: Flickr.com. Con: requires manual labeling.

Image Labeling by Human Computing: the ESP game (http://www.gwap.com/gwap/gamesPreview/espgame) collects annotations for web images via a game.

Content-based Image Retrieval: retrieval based on visual content. Represent images by their visual content. Each query is an image; search for images whose visual content is similar to that of the query image.

Content-based Image Retrieval: given a query image, try to find visually similar images in an image database. [Figure: Query, Image Database, Answer.]

Example: www.like.com

CBIR Challenges. First, how to represent the visual content of images: what is “visual content”? It may be colors, shapes, textures, objects, or metadata (e.g., tags) derived from images. Which type of “visual content” should be used to represent an image? It is also difficult to understand the information needs of a user from a query image. Second, how to retrieve images efficiently: a linear scan of the entire database should be avoided.

Image Representation, in increasing degree of difficulty:
Similar color distribution: histogram matching
Similar texture pattern: texture analysis
Similar shape/pattern: image segmentation, pattern recognition
Similar real content: life-time goal :-)

Vector-based Image Representation: represent an image by a vector with a fixed number of elements. Color histogram: discretize the color space and count the pixels in each discretized color bin. Texture: Gabor filters -> texture features. …

Vector-based Image Representation: an example with three-bin (R, G, B) histograms.
Vq = (0.3, 0.5, 0.2)
V1 = (0.4, 0.5, 0.1)
V2 = (0.5, 0.1, 0.4)
|V1 - Vq| < |V2 - Vq|
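
The comparison on this slide can be sketched in a few lines. This is only an illustration under assumptions: the numbers are read as Vq = (0.3, 0.5, 0.2), V1 = (0.4, 0.5, 0.1), V2 = (0.5, 0.1, 0.4), the distance is taken to be L1 (the slide does not name a norm), and `l1_distance` and `color_histogram` are hypothetical helper names.

```python
def l1_distance(u, v):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def color_histogram(pixels, bins_per_channel=2):
    """Normalized joint color histogram of 8-bit (r, g, b) pixels."""
    def bin_of(value):
        # Discretize a 0..255 channel value into bins_per_channel bins.
        return value * bins_per_channel // 256

    counts = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        counts[index] += 1
    return [c / len(pixels) for c in counts]


# Assumed reading of the figure: histograms of the query and of two images.
vq = [0.3, 0.5, 0.2]
v1 = [0.4, 0.5, 0.1]
v2 = [0.5, 0.1, 0.4]
d1 = l1_distance(v1, vq)
d2 = l1_distance(v2, vq)
closest = "V1" if d1 < d2 else "V2"
```

With an L1 distance the two values come out near 0.2 and 0.8, so the image with V1 ranks above the image with V2. Any other vector norm could be substituted without changing the structure.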

Images with Similar Colors

Images with Similar Shapes

Images with Similar Content

Challenges in CBIR: You get drunk, REALLY drunk, are hit over the head, and are kidnapped to another city in a country on the other side of the world. When you wake up, you try to figure out what city you are in and what is going on. That’s what it’s like to be a CBIR system!

Near Duplicate Image Retrieval: given a query image, identify gallery images with high visual similarity.

Appearance-based Image Matching: a parts-based image representation, i.e. parts (appearance) + shape (spatial relation). Parts: local features found by an interesting point operator. Shape: graphical models or neighborhood relationships.

Interesting Point Detection: local features have been shown to be effective for representing images. They are image patterns that differ from their immediate neighborhood; they could be points, edges, or small patches. We call local features the key points or interesting points of an image.

Interesting Point Detection: an example image with key points detected by a corner detector.

Interesting Point Detection: the detection of interesting points needs to be robust to various geometric transformations. [Figure panels: Original; Scaling + Rotation + Translation; Projection.]

Interesting Point Detection: the detection of interesting points needs to be robust to imaging conditions, e.g. lighting and blurring.

Descriptor: a representation of each detected key point. Take measurements (e.g. texture, shape, …) from a region centered on an interesting point. Each descriptor is a vector of fixed length; e.g. the SIFT descriptor is a vector of 128 dimensions.

Descriptor: the descriptor should also be robust under different image transformations. Corresponding key points in the transformed images should have similar descriptors.

Image Representation: bag-of-features representation, an example in which each descriptor has 5 dimensions. [Figure panels: Original image; Detected key points; Descriptors of the key points.]

Retrieval: how to measure similarity? [Figure: the key point descriptors of two images.]

Retrieval: count the number of matches!

Retrieval: if the distance between two descriptor vectors is smaller than a threshold, we count one match.
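
The matching rule can be sketched as follows. This is a hedged illustration, not the course's reference code: it assumes Euclidean distance, counts each query key point at most once, and uses a hypothetical function name `count_matches` with toy 2-D descriptors instead of 128-D SIFT vectors.

```python
import math


def count_matches(query_descriptors, image_descriptors, threshold):
    """Count query key points that lie within `threshold` of some image key point.

    Assumption: Euclidean distance, and each query key point contributes
    at most one match.
    """
    matches = 0
    for q in query_descriptors:
        if any(math.dist(q, d) < threshold for d in image_descriptors):
            matches += 1
    return matches


# Toy 2-D descriptors (hypothetical data).
query = [(0, 0), (10, 10), (5, 5)]
image_one = [(0, 1), (20, 20)]
image_two = [(0, 1), (10, 11), (5, 4)]
score_one = count_matches(query, image_one, threshold=2.0)
score_two = count_matches(query, image_two, threshold=2.0)
```

Ranking the database images by this score gives the retrieval result; note that every database image has to be visited for every query.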

Retrieval: [Figure: two example comparisons, one with 1 matched point and one with 5 matched points.]

Problems: matching is computationally expensive, because it requires a linear scan of the entire database. Example: match a query image against a database of 1 million images. At 0.1 second to compute the match between two images, it takes more than one day to answer a single query.
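
The "more than one day" figure follows directly from the two numbers on the slide; the short calculation below just spells it out (variable names are illustrative).

```python
num_images = 1_000_000       # database size from the slide
seconds_per_match = 0.1      # cost of matching one image pair, from the slide

total_seconds = num_images * seconds_per_match
total_hours = total_seconds / 3600
total_days = total_hours / 24
```

That is roughly 100,000 seconds, or a little under 28 hours per query, which is why the linear scan has to be replaced by an index.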

Bag-of-words Model: compare with the bag-of-words representation in text retrieval. A document is a collection of the words in the document; an image is a collection of the key points of the image. What is the difference?

Bag-of-words: a document is a collection of the words in the document; an image is a collection of the key points of the image. What is the difference? The same word appears in many documents. There is no “same key point”, but “similar key points” appear in many images that have similar “visual content”. So group the “similar key points” of different images into “visual words”.

Bag-of-words Model: group key points into visual words, then represent images by histograms of visual words. [Figure: key points assigned to visual words b1 … b8 and the resulting histograms.]
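
A histogram of visual words is just a count per word. The sketch below assumes the key points of an image have already been mapped to word indices 1..8 (like b1 … b8 in the figure); the assignment list and the name `word_histogram` are hypothetical.

```python
from collections import Counter


def word_histogram(word_ids, vocabulary_size):
    """Count how often each visual word 1..vocabulary_size occurs in one image."""
    counts = Counter(word_ids)
    return [counts.get(word, 0) for word in range(1, vocabulary_size + 1)]


# Hypothetical assignment of one image's key points to visual words b1..b8.
image_words = [1, 2, 3, 1, 4, 4, 8]
histogram = word_histogram(image_words, vocabulary_size=8)
```

Two images can then be compared through their histograms instead of through their raw key points.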

Bag-of-words: the “grouping” is usually done by clustering. Cluster the key points of all images into a number of clusters (e.g. 100,000 clusters). Each cluster center is called a “visual word”, and the collection of all cluster centers is called the “visual vocabulary”.

Retrieval by Bag-of-words Model: generate the “visual vocabulary”; represent each key point by its nearest “visual word”; represent an image by “a bag of visual words”. Text retrieval techniques can then be applied directly.
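
Generating a visual vocabulary can be sketched with plain Lloyd's k-means. This is a toy illustration under assumptions: it uses 2-D points instead of 128-D descriptors, seeds the centers with the first k points, and is not the FLANN clustering that the project provides; `nearest_center` and `kmeans` are hypothetical names.

```python
import math


def nearest_center(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centers."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster went empty
                centers[i] = [sum(column) / len(group) for column in zip(*group)]
    return centers


# Toy 2-D "key points"; real SIFT descriptors have 128 dimensions.
key_points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
vocabulary = kmeans(key_points, k=2)  # each center is one "visual word"
```

For millions of 128-D key points and thousands of clusters, an approximate method is needed in practice; the exhaustive loop here only shows the idea.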

Project: build a system for near duplicate image retrieval over a database with 10,000 images. Construct a bag-of-words model for each database image (offline); construct a bag-of-words model for a query image; retrieve the 10 visually most “similar” images from the database for the given query.

Step 1: Dataset. 10,000 color images are under the folder ‘./img’. The key points of each image have already been extracted, and the key points of all images are saved in a single file, ‘./feature/esp.feature’. Each line corresponds to one key point with 128 attributes; the attributes in each line are separated by tabs.

Step 1: Dataset. To locate the key points of individual images, two other files are needed: ‘./imglist.txt’, which gives the order of the images in which their key points were saved, and ‘./feature/esp.size’, which gives the number of key points each image has.

Step 1: Dataset. Example: three images imgA, imgB, imgC, where imgA has 2 key points, imgB has 3 key points, and imgC has 2 key points.

imglist.txt    esp.size    esp.feature
imgB.jpg       3           imgB-key point 1
                           imgB-key point 2
                           imgB-key point 3
imgC.jpg       2           imgC-key point 1
                           imgC-key point 2
imgA.jpg       2           imgA-key point 1
                           imgA-key point 2
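
Putting the three files together might look like the sketch below. It assumes one image name per line in imglist.txt, one integer per line in esp.size, and whitespace (tab) separated attributes in esp.feature; the function name `group_key_points` is hypothetical, and the toy rows have 3 attributes instead of 128.

```python
from itertools import islice


def group_key_points(image_lines, size_lines, feature_lines):
    """Yield (image_name, key_points) in the order given by imglist.txt.

    Assumption: the i-th line of esp.size is the key point count of the
    i-th image in imglist.txt, and esp.feature lists key points in that order.
    """
    features = iter(feature_lines)
    for name, size in zip(image_lines, size_lines):
        rows = islice(features, int(size))
        yield name.strip(), [[float(x) for x in row.split()] for row in rows]


# Toy stand-ins for the three files.
toy_images = ["imgB.jpg\n", "imgC.jpg\n", "imgA.jpg\n"]
toy_sizes = ["3\n", "2\n", "2\n"]
toy_features = ["%d\t%d\t%d\n" % (i, i + 1, i + 2) for i in range(7)]
grouped = list(group_key_points(toy_images, toy_sizes, toy_features))
```

Because the function only needs iterables of lines, the real files can be passed in directly as open file objects, e.g. `open('./imglist.txt')`, `open('./feature/esp.size')` and `open('./feature/esp.feature')`, without loading all key points into memory at once.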

Step 2: Key Point Quantization. Represent each image by a bag of visual words. First, construct the visual vocabulary: cluster all the key points into 10,000 clusters; each cluster center is a visual word. Then map each key point to a visual word: find the nearest cluster center for each key point (nearest neighbor search).

Step 2: Key Point Quantization. In the example with imgA, imgB, and imgC, cluster the 7 key points into 3 clusters. The cluster centers are cnt1, cnt2, cnt3, and each center is a visual word: w1, w2, w3. Then find the nearest center to each key point.

Step 2: Key Point Quantization. Nearest visual word for each key point:
imgA.jpg: 1st key point -> w2, 2nd key point -> w1
imgB.jpg: 1st key point -> w3, 2nd key point -> w3, 3rd key point -> w2
imgC.jpg: 1st key point -> w3, 2nd key point -> w2

Bag-of-words Rep.
imgA.jpg: w2 w1
imgB.jpg: w3 w3 w2
imgC.jpg: w3 w2
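
The mapping from key points to visual words, and from there to one bag per image, can be sketched as below. The coordinates of the centers and key points are invented so that the outcome matches the example; the search is an exhaustive nearest-neighbor scan rather than FLANN's index, and `quantize` and `split_by_image` are hypothetical names.

```python
import math


def quantize(key_points, centers):
    """Map every key point to the 1-based index of its nearest center."""
    return [
        1 + min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in key_points
    ]


def split_by_image(word_ids, image_names, sizes):
    """Cut the flat word list into one bag per image, following esp.size."""
    bags, start = {}, 0
    for name, size in zip(image_names, sizes):
        bags[name] = ["w%d" % w for w in word_ids[start:start + size]]
        start += size
    return bags


# Hypothetical 2-D data laid out in file order: imgB, imgC, imgA.
centers = [(0, 0), (5, 5), (10, 10)]  # cnt1, cnt2, cnt3
key_points = [(9, 10), (10, 9), (5, 4), (11, 10), (4, 5), (5, 6), (1, 0)]
word_ids = quantize(key_points, centers)
bags = split_by_image(word_ids, ["imgB.jpg", "imgC.jpg", "imgA.jpg"], [3, 2, 2])
```

The same `quantize` step is what a query image goes through later, using the cluster centers saved from this step.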

Step 2: Key Point Quantization. We provide the FLANN library for clustering and nearest neighbor search. For clustering, use

    flann_compute_cluster_centers(
        float* dataset,   // your key points
        int rows,         // number of key points
        int cols,         // 128, dim of a key point
        int clusters,     // number of clusters
        float* result,    // cluster centers
        struct IndexParameters* index_params,
        struct FLANN…

Step 2: Key Point Quantization. For nearest neighbor search, first build an index for the cluster centers:

    flann_build_index(
        float* dataset,   // your cluster centers
        int rows,
        int cols,
        float* speedup,
        struct IndexParameters* index_params,
        struct FLANNParameters* flann_params);

Then, for each key point, search for the nearest cluster center:

    flann_find_nearest_neighbors_index(
        FLANN_INDEX index_id,   // your index above
        float* testset,         // your key points
        int trows,
        int* result,
        int nn,
        int checks,
        struct FLANNParameters* flann_params);

Step 2: Key Point Quantization. In this step, you need to save:
the cluster centers, to a file (you will use them later for quantizing the key points of query images);
the bag-of-words representation of each image, in “trec” format.

Bag-of-words Rep.
imgA.jpg: w2 w1
imgB.jpg: w3 w3 w2
imgC.jpg: w3 w2

The same representation in “trec” format:
<DOC>
<DOCNO>imgB</DOCNO>
<TEXT>
w3 w3 w2
</TEXT>
</DOC>
<DOC>
<DOCNO>imgA</DOCNO>
<TEXT>
w2 w1
</TEXT>
</DOC>
<DOC>
<DOCNO>imgC</DOCNO>
<TEXT>
w3 w2
</TEXT>
</DOC>
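
Writing the bags out in this layout is a small formatting job. The sketch below mirrors the example records; the function name `to_trec` is hypothetical, and dropping the file extension for the DOCNO is an assumption taken from the example (imgB.jpg becomes imgB).

```python
def to_trec(bags):
    """Render {image_name: [visual words]} as TREC-style <DOC> records."""
    records = []
    for name, words in bags.items():
        docno = name.rsplit(".", 1)[0]  # assumption: "imgB.jpg" -> "imgB"
        records.append(
            "<DOC>\n<DOCNO>%s</DOCNO>\n<TEXT>\n%s\n</TEXT>\n</DOC>\n"
            % (docno, " ".join(words))
        )
    return "".join(records)


example_bags = {
    "imgB.jpg": ["w3", "w3", "w2"],
    "imgA.jpg": ["w2", "w1"],
    "imgC.jpg": ["w3", "w2"],
}
trec_text = to_trec(example_bags)
```

The resulting string can be written to a single file and handed to the indexer in the next step.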

Step 3: Build the index using Lemur. This is the same as what we did in the previous homework: use the “KeyfileIncIndex” index, with no stemming and no stop words.

Step 4: Extract key points for a query. Three sample query images are under ‘./sample query/’; the query images are in .pgm format. The extraction tool is under ‘./sift tool/’: for Windows, use “siftW32.exe”; for Linux, use “sift”. Example: issue the command

    sift < input.pgm > output.keypoints
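
Reading the extractor's output back in might look like the sketch below. It assumes the usual layout of Lowe's SIFT demo output: a header with the number of key points and the descriptor length, then for each key point four floats (row, column, scale, orientation) followed by the descriptor integers. Check this against the actual output file before relying on it; `parse_keypoints` is a hypothetical name, and the toy input uses 3-dimensional descriptors for brevity.

```python
def parse_keypoints(text):
    """Return the descriptor vectors from a SIFT key point file.

    Assumed layout: "<count> <dim>", then per key point 4 floats
    (row, column, scale, orientation) followed by <dim> integers.
    """
    tokens = text.split()
    count, dim = int(tokens[0]), int(tokens[1])
    descriptors, pos = [], 2
    for _ in range(count):
        pos += 4  # skip row, column, scale, orientation
        descriptors.append([int(t) for t in tokens[pos:pos + dim]])
        pos += dim
    return descriptors


toy_output = "2 3\n1.5 2.5 1.0 0.1\n 1 2 3\n4.0 5.0 2.0 -0.3\n 4 5 6\n"
toy_descriptors = parse_keypoints(toy_output)
```

Each returned vector is one key point of the query, ready to be mapped to its nearest visual word.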

Step 5: Generate a bag-of-words model for a query. Map each key point of the given query to a visual word, using the cluster center file generated in Step 2: build an index for the cluster centers using flann_build_index(), then, for each key point, search for the nearest cluster center using flann_find_nearest_neighbors_index().

Step 5: Generate a bag-of-words model for a query. Write the bag-of-words model for the query image in the Lemur format:

<DOC 1>
The mapped cluster ID for the 1st key point
The mapped cluster ID for the 2nd key point
…
</DOC>
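
A query file in this layout can be produced as sketched below. The function name `to_lemur_query` is hypothetical, and writing each cluster ID as a token such as w3 is an assumption made so that the query terms match the terms stored in the index during Step 2.

```python
def to_lemur_query(word_ids, query_id=1):
    """Render the visual words of one query image as a <DOC n> ... </DOC> block."""
    lines = ["<DOC %s>" % query_id]
    # Assumption: the same "w<ID>" tokens as in the indexed documents.
    lines.extend("w%d" % w for w in word_ids)
    lines.append("</DOC>")
    return "\n".join(lines) + "\n"


query_text = to_lemur_query([3, 2, 2])
```

One cluster ID goes on each line, in the order of the query's key points.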

Step 6: Image Retrieval by Lemur. Use the Lemur command ‘RetEval’ as:

    RetEval <parameter_file>

An example of a parameter file:

<parameters>
<index>/home/user1/myindex/myindex.key</index>
<retModel>tfidf</retModel>
<textQuery>/home/user1/query/q1.query</textQuery>
<resultFile>/home/user1/result/ret.result</resultFile>
<TRECResultFormat>1</TRECResultFormat>
<resultCount>10</resultCount>
</parameters>
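
Reading the result file back, for example to show the top images in a GUI, could be sketched as follows. It assumes the standard six-column TREC result layout (query ID, the literal Q0, document ID, rank, score, run ID); verify this against the file RetEval actually writes. `parse_trec_results` and the sample lines are hypothetical.

```python
def parse_trec_results(text, top_k=10):
    """Return [(doc_id, score), ...] for the best-ranked documents.

    Assumed line layout: "<query_id> Q0 <doc_id> <rank> <score> <run_id>".
    """
    ranked = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        _query_id, _q0, doc_id, rank, score, _run_id = fields
        ranked.append((int(rank), doc_id, float(score)))
    ranked.sort()
    return [(doc_id, score) for _rank, doc_id, score in ranked[:top_k]]


sample_result = "1 Q0 imgB 1 12.5 Exp\n1 Q0 imgC 2 9.25 Exp\n1 Q0 imgA 3 4.0 Exp\n"
top_images = parse_trec_results(sample_result, top_k=2)
```

The document IDs map back to image files in ‘./img’ when the DOCNO was derived from the file name.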

Step 7: Graphical User Interface. Build a GUI for the image retrieval system that can:
Browse the image database.
Select an image from the database to query the database, and display the top 10 retrieved results. For each query, extract the bag-of-words representation of the query, write it into a file in the format specified in Step 5, and run the “RetEval” command for retrieval.
Load an external query image, search the images in the database, and display the top 10 retrieved results.

Step 8: Evaluation. Demo your system in class during the last week. We will provide a number of test query images; run your GUI, load each test query image, and display the ten most similar images from the database.