Content-Based Image Clustering and Image Retrieval Using Multiple Instance Learning. Xin Chen. Advisor: Chengcui Zhang. Department of Computer and Information Sciences, University of Alabama at Birmingham.

Presentation transcript:

CBIR Procedure
Preprocessing: images are segmented into sets of image instances, and features are extracted for each instance.
Clustering: image instances are clustered to reduce the search space of the learning algorithm.
User Query: the user provides a query image.
SVM Learning: One-Class SVM is the similarity-calculation algorithm; it learns from the user's query and feedback and returns results to the user.
Return Results: in each iteration, the user gives feedback on the returned results, and the SVM refines the retrieval results accordingly in the next iteration.

Outline: System Overview; Image Segmentation and Feature Extraction; Clustering with Genetic Algorithm: Algorithm Design; One-Class Support Vector Machine (SVM); Experimental Results.

Image Segmentation and Feature Extraction
Segmentation: we use the Blobworld [1] automatic image segmentation method to partition all images in the database into several segments. Each segment represents an individual semantic region of the original image, e.g. grass, lake, deer, or tree.
Feature Extraction: we then extract features for each image segment. In our experiments, 8 features are used: 3 color features, 3 texture features, and 2 shape features.

Clustering with Genetic Algorithm: Overview
Motivation: the amount of image data involved is very large, so we use clustering to preprocess the data and reduce the search space of image retrieval, and we choose a Genetic Algorithm (GA) in the hope that the clustering result reaches a global optimum. The objects being clustered are image segments; if a segment belongs to a cluster, the image containing that segment also belongs to the cluster. In this way we reduce the number of images searched in the final retrieval step.

The genetic loop works as follows. Candidate solutions to the real-world problem are evaluated, and each is given a "score". The solutions are encoded as chromosomes, which serve as the parents of the t-th generation. Genetic operators (Selection, Crossover, and Mutation) manipulate these chromosomes to generate the offspring, i.e. the (t+1)-th generation; the mechanism guarantees that the higher a chromosome's score, the higher its chance of being inherited. Offspring are decoded back into real-world solutions, which are evaluated again, re-encoded, and fed back for another round. Iteration stops when a termination criterion is satisfied.

Key Points in Design: the encoding, the objective function, and the genetic operators (selection, recombination, and mutation), each sketched in code after this section.

Encoding: for example, to partition 1,000 objects into 10 clusters, give each object an ID: 1, 2, ..., 1000. A feasible solution is a chromosome of 10 integers, where each integer (gene) in the allele is the ID of a centroid.

Objective Function: clustering is reduced to finding the optimal combination of centroids, i.e. the subset $R$ of the object set $P$ with $k$ items that minimizes
$$\sum_{p \in P} \min_{r \in R} d(p, r),$$
where $d$ is the Euclidean distance.

Selection Operator: for each chromosome, compute its fitness as the inverse of the objective value: $f_1, f_2, \ldots, f_n$. The total fitness is $F = \sum_{i=1}^{n} f_i$, and the relative fitness of chromosome $i$ is $f_i / F$, which is its probability of being inherited into the next generation. We simulate a roulette wheel to choose the pairs of chromosomes that will generate the next generation: each chromosome occupies a slice of the wheel equal to its relative fitness, e.g. chromosome C1 with a fitness share of 30% occupies 30% of the wheel (C1: 30%, C4: 40%, C3: 20%, C2: 10%).

Recombination Operator: unlike the crossover operator of the simple genetic algorithm, recombination guarantees that no gene is repeated within a chromosome, i.e. the centroids in one solution are all distinct. Given parents C1 and C2, the operator iterates k times; in each step it randomly generates a number and compares it against a threshold `cut` to decide which parent donates the next gene, skipping genes the offspring already contains. Experiments show cut = 2/3 works well; adjust it according to the specific problem.

Mutation: at a very low frequency, a randomly chosen gene (the mutation point) is changed to an object that is not yet included in the chromosome.
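To make the encoding and objective function concrete, here is a minimal Python sketch. It is not the authors' implementation: the NumPy data layout, 0-based object IDs, and function names are assumptions.

```python
import numpy as np

def make_chromosome(rng, num_objects, k):
    # A chromosome is k distinct object IDs; the named objects act as
    # cluster centroids (0-based IDs here; the slides use 1-based).
    return rng.choice(num_objects, size=k, replace=False)

def objective(chromosome, P):
    """Sum, over every object in P, of the Euclidean distance to the
    nearest chosen centroid -- the quantity the GA minimizes."""
    centroids = P[chromosome]                                   # (k, d)
    dists = np.linalg.norm(P[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1).sum()

def fitness(chromosome, P):
    # The slides define fitness as the inverse of the objective value.
    return 1.0 / objective(chromosome, P)
```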
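And a sketch of the three operators, continuing the code above. The roulette wheel and the cut = 2/3 threshold come from the slides; the exact gene-picking order in recombination is an assumption, since the slide's flowchart survives only partially.

```python
import numpy as np

def roulette_select(rng, population, fits):
    """Roulette-wheel selection: chromosome i occupies a slice of the
    wheel proportional to its relative fitness f_i / F."""
    probs = np.asarray(fits) / np.sum(fits)
    return population[rng.choice(len(population), p=probs)]

def recombine(rng, c1, c2, cut=2.0 / 3.0):
    """Duplicate-free recombination: k times, prefer a gene from c1 with
    probability `cut` (else from c2), skipping genes the offspring
    already holds so that all centroids stay distinct."""
    offspring = []
    for _ in range(len(c1)):
        donor = c1 if rng.random() < cut else c2
        # A parent has k distinct genes and the offspring holds < k,
        # so an unused gene always exists in the donor.
        offspring.append(next(g for g in donor if g not in offspring))
    return np.array(offspring)

def mutate(rng, chromosome, num_objects, rate=0.01):
    """With low probability, replace a random gene (the mutation point)
    with an object ID not already present in the chromosome."""
    c = chromosome.copy()
    if rng.random() < rate:
        point = rng.integers(len(c))
        outside = np.setdiff1d(np.arange(num_objects), c)
        c[point] = rng.choice(outside)
    return c

# Illustrative single generation (1,000 objects with 8 features, k = 10):
rng = np.random.default_rng(0)
P = rng.random((1000, 8))
pop = [make_chromosome(rng, 1000, 10) for _ in range(50)]
fits = [fitness(c, P) for c in pop]
p1, p2 = (roulette_select(rng, pop, fits) for _ in range(2))
child = mutate(rng, recombine(rng, p1, p2), num_objects=1000)
```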
Wrap-up: at the end of clustering we have k clusters, so given a query image region, all other image regions in its cluster can be located. In some cases the query region is also very close to image regions in another cluster, so we choose the three clusters whose centroids are closest to the query region, then group all images that have at least one semantic region falling into those three clusters and take that set as the reduced search space.

Multiple Instance Learning
Bag: each image is a bag that holds a certain number of instances (its segments). In MIL, a training sample is a labeled bag in which the labels of the instances are unknown. In each iteration, the user provides feedback on the returned images: a relevant image is labeled positive, otherwise negative. Thus the bag label (the image as a whole) is known while the instance labels (the image segments) are unknown, but the relationship between the two kinds of labels is known: if the bag label is positive, at least one instance in the bag is positive; if the bag label is negative, all instances are negative. Writing $Y_i$ for the label of bag $B_i$ and $y_{ij}$ for the label of its $j$-th instance, this is
$$Y_i = +1 \;\Rightarrow\; \exists j : y_{ij} = +1, \qquad Y_i = -1 \;\Rightarrow\; y_{ij} = -1 \;\; \forall j.$$

One-Class Support Vector Machine
Motivation: good image regions are good in the same way, but bad image regions are bad in their own ways.
Basic Idea: model the dense region as a "ball", i.e. a hypersphere. In the MIL problem, positive instances lie inside the ball and negative instances outside.

Learning Procedure: we apply the strategy of Schölkopf's One-Class SVM [2]. First, the data are mapped by $\theta$ into a feature space $F$ corresponding to the kernel $K(u, v) = \theta(u) \cdot \theta(v)$, where $u$ and $v$ are two data points; we choose the Radial Basis Function (RBF) kernel $K(u, v) = e^{-\|u - v\|^2 / \sigma^2}$. Mathematically, One-Class SVM solves the quadratic program
$$\min_{w, \xi, \rho} \; \frac{1}{2}\|w\|^2 + \frac{1}{\alpha \ell} \sum_{i=1}^{\ell} \xi_i - \rho \quad \text{s.t.} \quad w \cdot \theta(x_i) \ge \rho - \xi_i, \;\; \xi_i \ge 0,$$
where the $\xi_i$ are slack variables, $\ell$ is the number of training instances, and $\alpha \in (0, 1)$ is a parameter that controls the trade-off between maximizing the distance from the origin and containing most of the data in the region created by the hypersphere; it corresponds to the ratio of "outliers" in the training dataset. When applied to the MIL problem, the program is additionally subject to the MIL label constraints above. If $w$ and $\rho$ solve this problem, the decision function is
$$f(x) = \operatorname{sign}(w \cdot \theta(x) - \rho),$$
which is 1 for most examples $x_i$ contained in the training set.

Initial Query: given a query image, the user first identifies a semantic region of interest. Since no training sample is available at this point, we simply compute the Euclidean distances between the query semantic region and all other semantic regions in the image database, and set the similarity score of each image to the inverse of the minimum distance between its regions and the query region. The training sample set is then constructed according to the user's feedback.
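The initial-query scoring admits a small sketch. The flat region array, the `region_to_image` index map, and the epsilon guard are assumed data-layout details, not from the slides.

```python
import numpy as np

def initial_scores(query_region, db_regions, region_to_image, num_images):
    """Initial query: each image's similarity is the inverse of the
    minimum Euclidean distance between its regions and the query region."""
    d = np.linalg.norm(db_regions - query_region, axis=1)
    best = np.full(num_images, np.inf)
    np.minimum.at(best, region_to_image, d)   # per-image minimum distance
    return 1.0 / (best + 1e-12)               # epsilon guards exact zeros
```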
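As for the learning step itself, scikit-learn ships an implementation of Schölkopf's one-class formulation; the following is a rough sketch (not the authors' code), with `nu` playing the role of the slides' α and an assumed RBF width `gamma`:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_one_class_svm(train_regions, alpha, gamma=0.1):
    # `alpha` is the outlier ratio from the slides, passed as nu;
    # gamma (the RBF kernel width) is an assumed value, tuned in practice.
    model = OneClassSVM(kernel="rbf", nu=alpha, gamma=gamma)
    model.fit(train_regions)
    return model

def region_scores(model, regions):
    # Signed value of w . theta(x) - rho: the higher the score, the more
    # likely the region falls inside the hypersphere, i.e. is positive.
    return model.decision_function(regions)

# Example: 200 training regions and 50 test regions with 8 features each.
rng = np.random.default_rng(0)
model = train_one_class_svm(rng.random((200, 8)), alpha=0.3)
scores = region_scores(model, rng.random((50, 8)))
```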
Constructing the Training Set: if an image is labeled positive, the semantic region of it that is least distant from the query region is labeled positive; in this way most of the positive regions can be identified. For some images, Blobworld may "over-segment", so that one semantic region is split into two or more "blobs"; in addition, some images may actually contain more than one positive region. Suppose the number of positive images is h and the number of semantic regions in the training set is H. The ratio of "outliers" α in the training set is then computed from h and H, adjusted by a small number z to alleviate the problems mentioned above. The training set and the parameter α are fed into One-Class SVM to obtain w and ρ, which are used to evaluate the decision function on the test data. Each image region is assigned a "score" by the decision function: the higher the score, the more likely the region belongs to the positive class. The similarity score of each image is then set to the highest score among all its regions. (Sketches of the labeling heuristic and of a full feedback round follow below.)

Experimental Results: 9,800 images with 82,556 instances are used as test data, and we tested 65 query images; the search space is reduced to 28.6% of the database on average. In the initial query, the system takes the feature vector of the query region, compares it against all other image regions in the database using Euclidean distance, and returns the top thirty images to the user. After the initial query, the user gives feedback on the retrieved images, and our One-Class SVM based algorithm learns from this feedback and starts another round of retrieval. Through each iteration, the number of positive images increases steadily.
[Figures: Retrieved Segments; Third Query Result]
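The feedback-labeling heuristic can be sketched as follows, under the same assumed flat-array layout as the earlier sketches:

```python
import numpy as np

def label_positive_regions(pos_images, query_region, db_regions, region_to_image):
    """For each image the user marked positive, label its region that is
    least distant from the query region as positive; by the MIL rule,
    every region of a negative image is negative."""
    d = np.linalg.norm(db_regions - query_region, axis=1)
    picked = []
    for img in pos_images:
        regions = np.flatnonzero(region_to_image == img)
        picked.append(regions[np.argmin(d[regions])])
    return np.array(picked)
```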
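Putting the pieces together, one feedback round might look like this sketch (top-30 return as in the experiments; function names are assumptions):

```python
import numpy as np

def image_scores(r_scores, region_to_image, num_images):
    # An image's similarity score is the highest decision-function
    # score among all of its regions.
    best = np.full(num_images, -np.inf)
    np.maximum.at(best, region_to_image, r_scores)
    return best

def feedback_round(model, db_regions, region_to_image, num_images, top=30):
    """Score every region with the trained One-Class SVM, lift region
    scores to image scores, and return the top-ranked image indices."""
    r_scores = model.decision_function(db_regions)
    i_scores = image_scores(r_scores, region_to_image, num_images)
    return np.argsort(i_scores)[::-1][:top]
```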