An Empirical Study on Large-Scale Content-Based Image Retrieval. Group Meeting, presented by Wyman, 23-1-2007.

Similar presentations
Google News Personalization: Scalable Online Collaborative Filtering

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Memory.
Relevance Feedback and User Interaction for CBIR Hai Le Supervisor: Dr. Sid Ray.
VisualRank: Applying PageRank to Large-Scale Image Search Yushi Jing, Member, IEEE, and Shumeet Baluja, Member, IEEE.
Similarity Search in High Dimensions via Hashing
Multimedia DBs. Multimedia dbs A multimedia database stores text, strings and images Similarity queries (content based retrieval) Given an image find.
A Novel Scheme for Video Similarity Detection Chu-Hong Hoi, Steven March 5, 2003.
Relevance Feedback Content-Based Image Retrieval Using Query Distribution Estimation Based on Maximum Entropy Principle Irwin King and Zhong Jin Nov
Multimedia Indexing and Retrieval Kowshik Shashank Project Advisor: Dr. C.V. Jawahar.
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Coherency Sensitive Hashing (CSH) Simon Korman and Shai Avidan Dept. of Electrical Engineering Tel Aviv University ICCV2011 | 13th International Conference.
Young Deok Chun, Nam Chul Kim, Member, IEEE, and Ick Hoon Jang, Member, IEEE IEEE TRANSACTIONS ON MULTIMEDIA,OCTOBER 2008.
Fast and Compact Retrieval Methods in Computer Vision Part II A. Torralba, R. Fergus and Y. Weiss. Small Codes and Large Image Databases for Recognition.
Image Search Presented by: Samantha Mahindrakar Diti Gandhi.
Improving Lookup Performance over a Widely-Deployed DHT Daniel Stutzbach Reza Rejaie The ION P2P Project University of.
Oral Examination Presented by Wyman Wong
ACM Multimedia th Annual Conference, October , 2004
Expectation Maximization Method Effective Image Retrieval Based on Hidden Concept Discovery in Image Database By Sanket Korgaonkar Masters Computer Science.
Scalable and Distributed Similarity Search in Metric Spaces Michal Batko Claudio Gennaro Pavel Zezula.
1 Integrating User Feedback Log into Relevance Feedback by Coupled SVM for Content-Based Image Retrieval 9-April, 2005 Steven C. H. Hoi *, Michael R. Lyu.
1998/5/21 by Chang I-Ning ImageRover: A Content-Based Image Browser for the World Wide Web Introduction Approach Image Collection Subsystem Image Query.
Visual Querying By Color Perceptive Regions Alberto del Bimbo, M. Mugnaini, P. Pala, and F. Turco University of Florence, Italy Pattern Recognition, 1998.
Scaling Personalized Web Search Glen Jeh, Jennifer Widom Stanford University Presented by Li-Tal Mashiach Search Engine Technology course (236620) Technion.
Similarity Search in High Dimensions via Hashing Aristides Gionis, Piotr Indyk and Rajeev Motwani Department of Computer Science Stanford University presented.
Presentation in IJCNN 2004 Biased Support Vector Machine for Relevance Feedback in Image Retrieval Hoi, Chu-Hong Steven Department of Computer Science.
Presented by Zeehasham Rasheed
Nearest Neighbor Retrieval Using Distance-Based Hashing Michalis Potamias and Panagiotis Papapetrou supervised by Prof George Kollios A method is proposed.
Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.
Optimizing Learning with SVM Constraint for Content-based Image Retrieval* Steven C.H. Hoi 1th March, 2004 *Note: The copyright of the presentation material.
FLANN Fast Library for Approximate Nearest Neighbors
Indexing Techniques Mei-Chen Yeh.
Content-Based Image Retrieval
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,
An Efficient Approach to Clustering in Large Multimedia Databases with Noise Alexander Hinneburg and Daniel A. Keim.
Glasgow 02/02/04 NN k networks for content-based image retrieval Daniel Heesch.
Nearest Neighbor Paul Hsiung March 16, Quick Review of NN Set of points P Query point q Distance metric d Find p in P such that d(p,q) < d(p’,q)
HPCLatAm 2013 Permutation Index and GPU to Solve efficiently Many Queries. Authors: Mariela Lopresti, Natalia Miranda, Fabiana Piccoli.
Online Learning for Collaborative Filtering
IEEE Int'l Symposium on Signal Processing and its Applications 1 An Unsupervised Learning Approach to Content-Based Image Retrieval Yixin Chen & James.
The Anatomy of a Large-Scale Hypertextual Web Search Engine S. Brin, L. Page Presenter: Abhishek Taneja.
NEAREST NEIGHBORS ALGORITHM Lecturer: Yishay Mansour Presentation: Adi Haviv and Guy Lev 1.
A Model for Learning the Semantics of Pictures V. Lavrenko, R. Manmatha, J. Jeon Center for Intelligent Information Retrieval Computer Science Department,
Non-Photorealistic Rendering and Content-Based Image Retrieval Yuan-Hao Lai Pacific Graphics (2003)
CS 347Notes101 CS 347 Parallel and Distributed Data Processing Distributed Information Retrieval Hector Garcia-Molina Zoltan Gyongyi.
Similarity Searching in High Dimensions via Hashing Paper by: Aristides Gionis, Piotr Indyk, Rajeev Motwani.
2005/12/02 Content-Based Image Retrieval Using Grey Relational Analysis Dept. of Computer Engineering Tatung University Presenter: Tienwei Tsai ( 蔡殿偉.
Competence Centre on Information Extraction and Image Understanding for Earth Observation 29th March 2007 Category - based Semantic Search Engine 1 Mihai.
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
2005/12/02 Fast Image Retrieval Using Low Frequency DCT Coefficients Dept. of Computer Engineering Tatung University Presenter: Yo-Ping Huang ( 黃有評 )
A Compact Feature Representation and Image Indexing in Content-Based Image Retrieval A presentation by Gita Das PhD Candidate 29 Nov 2005 Supervisor:
PROBING THE LOCAL-FEATURE SPACE OF INTEREST POINTS Wei-Ting Lee, Hwann-Tzong Chen Department of Computer Science National Tsing Hua University,
Automatic Video Tagging using Content Redundancy Stefan Siersdorfer 1, Jose San Pedro 2, Mark Sanderson 2 1 L3S Research Center, Germany 2 University of.
Chittampally Vasanth Raja 10IT05F vasanthexperiments.wordpress.com.
Chittampally Vasanth Raja vasanthexperiments.wordpress.com.
Content Based Color Image Retrieval via Wavelet Transformations Information Retrieval Class Presentation May 2, 2012 Author: Mrs. Y.M. Latha Presenter:
Ohio State University Department of Computer Science and Engineering Servicing Range Queries on Multidimensional Datasets with Partial Replicas Li Weng,
Relevance Feedback in Image Retrieval System: A Survey Tao Huang Lin Luo Chengcui Zhang.
Similarity Measurement and Detection of Video Sequences Chu-Hong HOI Supervisor: Prof. Michael R. LYU Marker: Prof. Yiu Sang MOON 25 April, 2003 Dept.
Content-Based Image Retrieval Using Color Space Transformation and Wavelet Transform Presented by Tienwei Tsai Department of Information Management Chihlee.
Author : Lynn Choi, Hyogon Kim, Sunil Kim, Moon Hae Kim Publisher/Conf : IEEE/ACM TRANSACTIONS ON NETWORKING Speaker : De yu Chen Data :
INFORMATION RETRIEVAL MEASUREMENT OF RELEVANCE EFFECTIVENESS 1Adrienn Skrop.
1 Double-Patterning Aware DSA Template Guided Cut Redistribution for Advanced 1-D Gridded Designs Zhi-Wen Lin and Yao-Wen Chang National Taiwan University.
Cross-modal Hashing Through Ranking Subspace Learning
Efficient Image Classification on Vertically Decomposed Data
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Invariant Local Feature for Image Matching
Locality Sensitive Hashing
Minwise Hashing and Efficient Search
Presentation transcript:

1 An Empirical Study on Large-Scale Content-Based Image Retrieval Group Meeting Presented by Wyman

2 Introduction Content-based Image Retrieval (CBIR) –The process of searching for digital images in large databases based on their visual contents –In general, it consists of four modules: data acquisition and processing, feature representation, data indexing, and query and feedback processing

3 Introduction Content-based Image Retrieval (CBIR) –Extensively studied in both academia and industry –Yet, image contents are represented in a high-dimensional feature space, and traditional data indexing techniques do not scale to high-dimensional data –Thus, traditional CBIR systems are not efficient on large databases

4 Contribution Propose a scalable CBIR scheme by applying locality-sensitive hashing (LSH) –LSH is well suited to indexing high-dimensional data Conduct a comprehensive empirical evaluation of CBIR on half a million images –Very few empirical studies exist on CBIR systems of that scale Address some challenges in building scalable CBIR systems on large-scale data –Some innovative ideas are suggested for tackling these issues

5 Hashing Algorithm Hashing is commonly employed for indexing large-scale databases –Due to its fast lookup capability: O(1) time on average However, conventional hashing is not designed for similarity indexing –It builds an index for finding identical copies of the query key, not for finding near neighbors
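The limitation above can be seen with an ordinary hash table. In this minimal Python sketch, the image IDs and the 3-dimensional vectors are hypothetical: an exact copy of a stored vector is found in O(1) time on average, while a vector that differs by only 0.01 in one dimension produces a different key and is not found at all.

```python
def make_key(vec, ndigits=6):
    # Round so the key is a hashable tuple; identical vectors give identical keys.
    return tuple(round(x, ndigits) for x in vec)

# Hypothetical image IDs and feature vectors.
database = {
    "img_001": [0.10, 0.52, 0.33],
    "img_002": [0.90, 0.11, 0.47],
}
index = {make_key(vec): image_id for image_id, vec in database.items()}

def exact_lookup(query):
    """Return the ID of an identical vector, or None (O(1) on average)."""
    return index.get(make_key(query))

exact_copy = [0.10, 0.52, 0.33]
near_copy = [0.10, 0.52, 0.34]   # differs by 0.01 in one dimension

print(exact_lookup(exact_copy))  # img_001
print(exact_lookup(near_copy))   # None: a tiny change gives an unrelated key
```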

6 Locality-Sensitive Hashing An emerging indexing algorithm –Proposed to solve the high-dimensional near-neighbor search problem in Euclidean (l2) space –It can answer queries in sublinear time –Each near neighbor is reported with a fixed probability Principles –Two close points most likely share the same hash value –By looking into the hash bucket of the query point, we obtain many near neighbors of the query point –A large fraction of the data points is never processed
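The first principle can be quantified for the Gaussian hash family that E2LSH is built on, where h(v) = ⌊(a · v + b) / w⌋. The sketch below computes, in closed form, the probability that two points at Euclidean distance c receive the same value of a single h, and the probability that they share a bucket in at least one of L tables when each table concatenates k such values. The bucket width w = 4.0 in the loop is an illustrative assumption; the slides do not state the value used.

```python
import math

def collision_probability(c, w):
    """P[h(u) = h(v)] for ||u - v|| = c, where h(v) = floor((a . v + b) / w),
    a has i.i.d. N(0, 1) entries and b is uniform in [0, w]."""
    r = w / c
    phi_neg_r = 0.5 * (1.0 + math.erf(-r / math.sqrt(2.0)))  # standard normal CDF at -r
    return (1.0 - 2.0 * phi_neg_r
            - 2.0 / (math.sqrt(2.0 * math.pi) * r) * (1.0 - math.exp(-r * r / 2.0)))

def bucket_hit_probability(c, w, k, L):
    """Probability that the two points share a bucket in at least one of L
    tables, each keyed by k independent hash values."""
    p = collision_probability(c, w)
    return 1.0 - (1.0 - p ** k) ** L

# Closer pairs collide more often: the probability falls as c grows.
for c in (0.5, 1.0, 2.0, 4.0):
    print(f"distance {c}: P[same h] = {collision_probability(c, w=4.0):.3f}")
```

Raising k widens the gap between near and far pairs (the factor p ** k), and raising L gives near pairs more chances to collide, which is the trade-off between the two E2LSH parameters.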

7 Locality-Sensitive Hashing We employ the E2LSH (Exact Euclidean LSH) package The probability of finding near neighbors can be controlled by two parameters, L and k –L: the number of composite hash functions g(v) = (h_1(v), …, h_k(v)), one per hash table; a larger L increases the probability of finding all R-near neighbors –k: the number of elementary hash functions h concatenated in each g; a larger k reduces the chance of hitting data points that are not R-near neighbors Each h(v) = ⌊(a · v + b) / w⌋ is controlled by two parameters, a and b –a: a d-dimensional vector with entries chosen independently from a Gaussian distribution –b: a real number chosen uniformly from the range [0, w] Two points u and v get the same h value if a · u + b and a · v + b fall within the same interval [(n-1)w, nw) for some integer n
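The index structure described above can be sketched in pure Python. This is not the E2LSH package itself; the class name EuclideanLSH, the default bucket width w = 4.0, and the final re-ranking of candidates by true Euclidean distance are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

class EuclideanLSH:
    """In-memory LSH index in the style of E2LSH (illustrative sketch).

    Table j is keyed by g_j(v) = (h_1(v), ..., h_k(v)), where each
    h(v) = floor((a . v + b) / w), a has i.i.d. N(0, 1) entries and
    b is drawn uniformly from [0, w].
    """

    def __init__(self, dim, k, L, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        # funcs[j] holds the k (a, b) pairs that make up g_j.
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = []

    def _g(self, j, v):
        # Concatenate k elementary hash values into one bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[j]
        )

    def insert(self, v):
        pid = len(self.points)
        self.points.append(v)
        for j, table in enumerate(self.tables):
            table[self._g(j, v)].append(pid)
        return pid

    def query(self, q, top_n):
        # Collect every point sharing a bucket with q in any of the L tables.
        candidates = set()
        for j, table in enumerate(self.tables):
            candidates.update(table.get(self._g(j, q), ()))
        # Only the candidates are compared with q; the rest are never touched.
        scored = sorted((math.dist(q, self.points[pid]), pid) for pid in candidates)
        return scored[:top_n]

# Usage: index 1,000 random 8-dimensional vectors and query with a stored one.
rng = random.Random(1)
index = EuclideanLSH(dim=8, k=4, L=10)
for _ in range(1000):
    index.insert([rng.uniform(0.0, 10.0) for _ in range(8)])
results = index.query(index.points[0], top_n=5)
print(results)  # the query point itself comes first, at distance 0.0
```

The experiments use L = 550 and k = 34 on 238-dimensional features; the demo uses much smaller values so that it runs quickly in pure Python.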

8 Main problem of E2LSH One main problem is that E2LSH is a memory-based implementation –All the data points and the data structures are stored in main memory –The maximum database size is limited by the amount of free main memory available
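A back-of-the-envelope estimate shows why memory becomes the bottleneck at this scale. The sketch assumes 4-byte floating-point coordinates and 4-byte point IDs and ignores bucket and hash-table overhead, so the real footprint is larger; the inputs are the largest database size (500,000 images), the 238-dimensional features, and L = 550 tables from the experimental setup.

```python
def e2lsh_memory_bytes(n_points, dim, L, bytes_per_coord=4, bytes_per_id=4):
    """Rough lower bound on main memory for an in-memory LSH index:
    the raw feature vectors plus one point ID per point in each of the
    L tables. Bucket headers and hash-table overhead are ignored."""
    data = n_points * dim * bytes_per_coord
    tables = L * n_points * bytes_per_id
    return data, tables, data + tables

data, tables, total = e2lsh_memory_bytes(500_000, 238, L=550)
print(f"vectors: {data / 1e6:.0f} MB, tables: {tables / 1e6:.0f} MB, total: {total / 1e9:.2f} GB")
# vectors: 476 MB, tables: 1100 MB, total: 1.58 GB
```

Even this lower bound consumes most of the 2 GB of memory on the test machine.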

9 Our Scalable Implementation We propose a multi-partition indexing approach –We divide the whole database into multiple partitions –Each partition is small enough to be loaded into main memory –Queries are then processed on each partition in turn

10 The Detailed Procedure 1. Divide the database into n partitions, where n = ⌈database size / maximum partition size⌉; 2. Run E2LSH indexing on each partition to create its hash tables; 3. Given a query q, load the pre-built hash tables T of one partition into main memory; 4. Run an E2LSH query on T to retrieve the top k ranking images with respect to q (here k is the number of results, not the LSH parameter k); 5. Repeat steps 3 and 4 until all partitions are processed; 6. Collect the results from all partitions and return the top k ranking results with respect to q.
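The six steps can be sketched as follows. Partitions are held in a Python list here, whereas the actual system loads each partition's pre-built hash tables from disk; search_partition is a hypothetical stand-in for an E2LSH query and uses exact search so the sketch stays self-contained.

```python
import heapq
import math

def num_partitions(db_size, max_partition_size):
    # Step 1: n = ceil(database size / maximum partition size)
    return math.ceil(db_size / max_partition_size)

def make_partitions(vectors, max_partition_size):
    # Keep global image IDs so results from different partitions can be merged.
    items = list(enumerate(vectors))
    n = num_partitions(len(items), max_partition_size)
    return [items[i * max_partition_size:(i + 1) * max_partition_size] for i in range(n)]

def search_partition(partition, q, top_k):
    # Hypothetical stand-in for step 4 (an E2LSH query on one partition).
    return heapq.nsmallest(top_k, ((math.dist(q, v), gid) for gid, v in partition))

def multi_partition_query(partitions, q, top_k):
    candidates = []
    for partition in partitions:      # steps 3-5: one partition in memory at a time
        candidates.extend(search_partition(partition, q, top_k))
    return heapq.nsmallest(top_k, candidates)  # step 6: global top k

# Usage: 10 vectors split into partitions of at most 4 (so n = 3).
vectors = [[float(i), float(i % 3)] for i in range(10)]
partitions = make_partitions(vectors, max_partition_size=4)
top3 = multi_partition_query(partitions, [4.2, 1.0], top_k=3)
```

Keeping only the top k from each partition is enough, because any image in the global top k must also be in the top k of its own partition; with an approximate per-partition search, the merged list is only as accurate as the per-partition results.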

11 Some Critical Issues Disk-access overhead for loading the hash tables into main memory –We may consider parallel solutions in future work to overcome this overhead and speed up the overall system
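One parallel option, offered only as an illustration of the future work mentioned above, is to hide disk latency by loading the next partition in a background thread while the current one is being searched. load_partition and search_partition are hypothetical caller-supplied functions; a thread helps here because blocking file reads release Python's GIL.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def pipelined_query(partition_ids, load_partition, search_partition, q, top_k):
    """Search partitions in order while prefetching the next one.

    load_partition(pid) -> in-memory index of one partition (e.g. read from disk)
    search_partition(index, q, top_k) -> list of (distance, image_id)
    """
    candidates = []
    if not partition_ids:
        return candidates
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(load_partition, partition_ids[0])
        for i in range(len(partition_ids)):
            index = pending.result()  # blocks only if the load is not finished yet
            if i + 1 < len(partition_ids):
                # Start reading the next partition before searching this one.
                pending = io_pool.submit(load_partition, partition_ids[i + 1])
            candidates.extend(search_partition(index, q, top_k))
    return heapq.nsmallest(top_k, candidates)

# Usage with in-memory stand-ins for disk-resident partitions of 1-D features.
fake_disk = {0: [("a", 1.0), ("b", 5.0)], 1: [("c", 2.6)], 2: [("d", 3.5)]}

def search(part, q, k):
    return heapq.nsmallest(k, ((abs(q - x), img) for img, x in part))

result = pipelined_query([0, 1, 2], fake_disk.get, search, q=3.0, top_k=2)
```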

12 Feature Representation We represent an image with three types of visual features: color, shape, and texture For color, we use grid color moments For shape, we employ an edge direction histogram For texture, we adopt Gabor features In total, a 238-dimensional feature vector represents each image in our image database
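As an example of the first feature type, the sketch below computes color moments on a regular grid: for every cell and every color channel, the mean, the standard deviation, and the cube root of the third central moment. The 3 × 3 grid and the input format (rows of 3-channel pixel tuples) are assumptions, since the slides give neither the grid size, the color space, nor how the 238 dimensions are split among the three feature types; the edge direction histogram and Gabor features are not shown.

```python
import math

def color_moments(pixels):
    """Mean, standard deviation, and cube root of the third central moment
    of one channel's pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    third = sum((p - mean) ** 3 for p in pixels) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]

def grid_color_moments(image, grid=3):
    """image: H x W list of (c1, c2, c3) pixel tuples, with H, W >= grid.
    Concatenates 3 moments x 3 channels for every cell of a grid x grid layout."""
    h, w = len(image), len(image[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            for ch in range(3):
                cell = [image[y][x][ch] for y in rows for x in cols]
                features.extend(color_moments(cell))
    return features

# Usage: a uniform 6 x 6 image gives 3 x 3 cells x 3 channels x 3 moments = 81 values.
flat = [[(10, 20, 30)] * 6 for _ in range(6)]
vec = grid_color_moments(flat)
```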

13 Empirical Evaluation Experimental Testbed –A testbed containing 500,000 images crawled from the Web –5,000 images from the COREL image data set serve as the labeled query pool: they cover 50 categories, each category consists of exactly 100 randomly selected images, and every category represents a different semantic topic, such as butterfly, car, cat, dog, etc.

14 Performance Metrics Two standard performance metrics –Precision and recall –An image is considered relevant if it is one of the 100 COREL images in the database that share the query image's category –Efficiency is evaluated by the average CPU time elapsed per query
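Under this relevance definition, precision and recall at a cutoff n reduce to counting same-category COREL images among the top n results. In this sketch, category_of is a hypothetical mapping from COREL image IDs to categories (web images are simply absent from it); whether the query image itself is counted when it is retrieved is left to the caller, since the slides do not say.

```python
def precision_recall_at(ranked_ids, query_category, category_of, n, relevant_total=100):
    """Precision and recall over the top-n results of one query."""
    hits = sum(1 for img in ranked_ids[:n] if category_of.get(img) == query_category)
    return hits / n, hits / relevant_total

# Usage: 5 same-category COREL images among the top 20 results.
category_of = {f"dog_{i}": "dog" for i in range(100)}
ranked = [f"dog_{i}" for i in range(5)] + [f"web_{i}" for i in range(15)]
p20, r20 = precision_recall_at(ranked, "dog", category_of, n=20)
print(p20, r20)  # 0.25 0.05
```

Averaging these values over all queries gives the average precision and recall figures.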

15 Experimental Setup Form a query set of 1,000 image examples by randomly selecting 20 images from each of the 50 COREL image categories Prepare 10 image databases of different sizes, ranging from 50,000 to 500,000 images A database of size N contains the following: –the 5,000 COREL images –N - 5,000 other images selected from our testbed Extract the image features using the techniques described above
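The setup can be reproduced in outline as below. The category names and image IDs are placeholders, and the even 50,000-image spacing of the ten database sizes is an assumption (the slide gives only the range).

```python
import random

def build_query_set(corel_by_category, per_category=20, seed=0):
    """Randomly select per_category query images from every COREL category."""
    rng = random.Random(seed)
    queries = []
    for category in sorted(corel_by_category):
        for img in rng.sample(corel_by_category[category], per_category):
            queries.append((img, category))
    return queries

def database_composition(n_total, n_corel=5000):
    """A database of size N holds all 5,000 COREL images plus N - 5,000 web images."""
    return {"corel": n_corel, "web": n_total - n_corel}

# Placeholder COREL layout: 50 categories of 100 images each.
corel = {f"cat{c:02d}": [f"cat{c:02d}_{i:03d}" for i in range(100)] for c in range(50)}
queries = build_query_set(corel)
sizes = [50_000 * i for i in range(1, 11)]  # assumed even spacing: 50,000 ... 500,000
layouts = {n: database_composition(n) for n in sizes}
```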

16 Experimental Setup Perform a series of experiments using our CBIR system with LSH indexing on all 10 databases –LSH parameters: L = 550 and k = 34 –Retrieve the top 20 and top 50 ranking images for each query image –Calculate recall and precision Simulate two rounds of relevance feedback –Re-query the database with the relevant examples for the given query Environment –3.2 GHz Intel Pentium 4 PC with 2 GB memory running Linux kernel 2.6 –All implementations are written in C++ The same experiments are repeated with exhaustive search (ES) indexing
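The slides do not specify how the relevant examples are combined when re-querying. The sketch below uses one simple option, moving the query point to the mean of the original query and the relevant results (a positive-only, Rocchio-style update), with ground-truth labels standing in for the user and a brute-force exhaustive search standing in for the index. The function names and the toy 1-D data are hypothetical.

```python
import math

def exhaustive_search(features, q, top_n):
    """ES baseline: compare q with every image and keep the top_n closest."""
    return sorted((math.dist(q, v), img) for img, v in features.items())[:top_n]

def feedback_round(features, q, is_relevant, top_n):
    """One simulated feedback round; returns (new results, new query vector)."""
    results = exhaustive_search(features, q, top_n)
    relevant = [features[img] for _, img in results if is_relevant(img)]
    if not relevant:
        return results, q  # nothing to learn from; keep the query as is
    examples = [q] + relevant
    new_q = [sum(col) / len(examples) for col in zip(*examples)]
    return exhaustive_search(features, new_q, top_n), new_q

# Toy 1-D example: "x" is irrelevant, the other images are relevant.
features = {"a": [0.0], "x": [-0.8], "b": [1.0], "c": [1.1]}
first = exhaustive_search(features, [0.0], top_n=3)
second, new_q = feedback_round(features, [0.0], lambda img: img != "x", top_n=3)
print([img for _, img in first])   # ['a', 'x', 'b']
print([img for _, img in second])  # ['a', 'b', 'c']
```

A second round would call feedback_round again, starting from new_q.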

17 Average Precision of TOP20 The results of LSH are very close to the ES results Their maximum difference is no more than 5% at any database size [Figure: Average Precision of TOP20]

18 Average Recall of TOP50 Average recall and average precision of our CBIR system decrease as the database size grows, yet the rate of decrease diminishes as the database gets larger [Figure: Average Recall of TOP50]

19 Average Query Time The query time of ES grows linearly with the database size, while that of LSH grows sublinearly
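Whether measured query times grow linearly or sublinearly can be checked with a log-log fit: if time ≈ c · N^α, the least-squares slope of log(time) against log(N) estimates α, with α ≈ 1 for linear growth and α < 1 for sublinear growth. The measured timings are not reproduced in this transcript, so the functions expect the caller's own per-database measurements.

```python
import math

def growth_exponent(sizes, times):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def speedups(es_times, lsh_times):
    """Per-database speedup of LSH over exhaustive search (ES)."""
    return [es / lsh for es, lsh in zip(es_times, lsh_times)]
```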

20 Time Performance of LSH over ES on Different Databases The LSH approach is much faster than the ES solution, with an average speedup of more than 4 times The gap in time performance between them widens as the database size increases

21 Conclusion and Future Work Proposed a scalable CBIR scheme based on LSH, a fast high-dimensional indexing technique Conducted extensive empirical evaluations on a large testbed of half a million images –Our scalable CBIR system is more efficient than traditional exhaustive linear search methods –Our system scales to large-scale CBIR applications Addressed some limitations and challenges of our current solution Will consider parallel solutions in future work

22 Q & A