Learning Non-Redundant Codebooks for Classifying Complex Objects
Wei Zhang, Akshat Surve, Xiaoli Fern, Thomas Dietterich

Contents: Learning codebooks for object classification; Learning non-redundant codebooks (Framework, Boost-Resampling algorithm, Boost-Reweighting algorithm); Experiments; Conclusions and future work

Contents: Learning codebooks for object classification; Learning non-redundant codebooks (Framework, Boost-Resampling algorithm, Boost-Reweighting algorithm); Experiments; Conclusions and future work

Problem 1: Stonefly Recognition. Nine stonefly taxa (images in the original slide): Cal, Dor, Hes, Iso, Mos, Pte, Swe, Yor, Zap.

Visual Codebook for Object Recognition. Pipeline, applied to both training and testing images: Interest Region Detector → Region Descriptors → Visual Codebook → Image Attribute Vector (Term Frequency) → Classifier.
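Below is a minimal sketch of this pipeline, assuming region descriptors have already been extracted for each image; the function names, library choice (scikit-learn k-means), and parameters are illustrative, not the exact setup used in the slides.

```python
# Illustrative sketch: build a visual codebook by clustering pooled region
# descriptors, then encode each image as a term-frequency vector.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_sets, K=100, seed=0):
    """descriptor_sets: list of (n_i, d) arrays, one per training image."""
    all_descriptors = np.vstack(descriptor_sets)      # pool descriptors across images
    return KMeans(n_clusters=K, random_state=seed, n_init=10).fit(all_descriptors)

def term_frequency_vector(descriptors, codebook):
    """Map one image's descriptors to a K-dimensional visual-word histogram."""
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=codebook.n_clusters).astype(float)
```

The resulting attribute vectors can then be fed to any standard classifier.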

Problem 2: Document Classification. Example document: "Through the first half of the 20th century, most of the scientific community believed dinosaurs to have been slow, unintelligent cold-blooded animals. Most research conducted since the 1970s, however, has supported the view that dinosaurs were active animals with elevated metabolisms and numerous adaptations for social interaction. The resulting transformation in the scientific understanding of dinosaurs has gradually filtered ..." The variable-length document is mapped to a fixed-length bag-of-words vector of word counts, e.g. absent: 0, active: 1, animal: 2, believe: 1, dinosaur: 3, social: 1, ...

Codebook for Document Classification. Cluster the words of the training corpus to form code-words, e.g. cluster 1 = {dog, canine, hound, ...}, cluster 2 = {car, automobile, vehicle, ...}, ..., cluster K. The codebook maps an input document to a fixed-length vector of code-word (cluster) counts, which is passed to the classifier.
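A minimal sketch of the encoding step, assuming a word-to-cluster mapping has already been produced by some clustering method; the toy mapping below is purely illustrative.

```python
# Illustrative sketch: encode a document as a vector of code-word (cluster) counts.
import re

def encode_document(text, word_to_cluster, num_clusters):
    counts = [0] * num_clusters
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in word_to_cluster:
            counts[word_to_cluster[word]] += 1
    return counts

# Toy codebook: cluster 0 = dog words, cluster 1 = car words.
word_to_cluster = {"dog": 0, "canine": 0, "hound": 0,
                   "car": 1, "automobile": 1, "vehicle": 1}
print(encode_document("The dog chased the car.", word_to_cluster, num_clusters=2))  # [1, 1]
```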

Contents: Learning codebooks for object classification; Learning non-redundant codebooks (Framework, Boost-Resampling algorithm, Boost-Reweighting algorithm); Experiments; Conclusions and future work

Learning Non-Redundant Codebooks. Motivation: improve the discriminative performance of any codebook and classifier learning approach by encouraging non-redundancy in the learning process. Approach: learn multiple codebooks and classifiers by wrapping the codebook and classifier learning process inside a boosting procedure [1]. Codebook approaches: k-means, Gaussian mixture modeling, Information Bottleneck, vocabulary trees, spatial pyramids, ... [1] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. ICML.

Non-Redundant Codebook and Classifier Learning Framework (diagram in the original slide). At each boosting iteration t = 1, ..., T: cluster the features X based on the current boosting weights W_t(B) to form codebook D_t; train classifier C_t on the resulting attribute vectors; obtain predictions L_t; update the boosting weights. The final prediction L combines the per-iteration predictions L_1, ..., L_T.
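A minimal sketch of the boosting wrapper, using a standard multi-class AdaBoost-style (SAMME) weight update as a stand-in; the exact update rule and the `learn_codebook`/`learn_classifier`/`encode` interfaces are assumptions for illustration.

```python
# Illustrative sketch: wrap codebook + classifier learning inside boosting.
import numpy as np

def boost_codebooks(examples, labels, T, learn_codebook, learn_classifier):
    labels = np.asarray(labels)
    n = len(examples)
    w = np.full(n, 1.0 / n)                      # boosting weights over training examples
    n_classes = len(np.unique(labels))
    codebooks, classifiers, alphas = [], [], []
    for t in range(T):
        D_t = learn_codebook(examples, w)        # weights steer clustering toward hard examples
        X_t = np.array([D_t.encode(x) for x in examples])
        C_t = learn_classifier(X_t, labels, w)
        miss = (C_t.predict(X_t) != labels)
        err = np.clip(np.dot(w, miss) / w.sum(), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(n_classes - 1)
        w *= np.exp(alpha * miss)                # up-weight misclassified examples
        w /= w.sum()
        codebooks.append(D_t); classifiers.append(C_t); alphas.append(alpha)
    return codebooks, classifiers, alphas        # final prediction: alpha-weighted vote
```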

Instantiations of the Framework. Boost-Reweighting (discrete feature space): supervised clustering of the features X based on the joint distribution table P_t(X, Y), where Y denotes the class labels; this table is updated at each iteration from the new boosting weights. Boost-Resampling (continuous feature space): generate a non-redundant clustering set by sampling the training examples according to the updated boosting weights; the codebook is constructed by clustering the features in this clustering set.
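A minimal sketch of the resampling step for Boost-Resampling, assuming one descriptor set per training example and scikit-learn k-means for the clustering; function names and parameters are illustrative.

```python
# Illustrative sketch: weighted resampling of training examples, then cluster
# only the descriptors from the sampled examples to build the next codebook.
import numpy as np
from sklearn.cluster import KMeans

def resample_and_cluster(descriptor_sets, boosting_weights, K=100, seed=0):
    rng = np.random.default_rng(seed)
    p = np.asarray(boosting_weights, dtype=float)
    p /= p.sum()
    n = len(descriptor_sets)
    sampled = rng.choice(n, size=n, replace=True, p=p)   # hard examples drawn more often
    clustering_set = np.vstack([descriptor_sets[i] for i in sampled])
    return KMeans(n_clusters=K, random_state=seed, n_init=10).fit(clustering_set)
```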

Codebook Learning and Classification Algorithms. Documents: codebook learning with the Information Bottleneck (IB) [1], objective L = I(X; X') − β I(X'; Y); classification with Naïve Bayes. Objects: codebook learning with k-means; classification with bagged decision trees. [1] Bekkerman, R., El-Yaniv, R., Tishby, N., Winter, Y., Guyon, I. and Elisseeff, A. (2003). Distributional word clusters vs. words for text categorization. JMLR.
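For concreteness, scikit-learn stand-ins for the two classifier choices named on the slide (not the authors' implementation):

```python
# Illustrative stand-ins: Naive Bayes over code-word counts for documents,
# bagged decision trees over image attribute vectors for objects.
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

doc_clf = MultinomialNB()
obj_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)
```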

Image Attributes: tf-idf Weights. Replace raw term frequencies with term frequency-inverse document frequency (tf-idf) weights [1], where "document" = image and "term" = instance of a visual word. Pipeline: Interest Regions → Region Descriptors → Visual Codebook → Image Attribute Vector (tf-idf) → Classifier. [1] Salton, G. and Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management.
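A minimal sketch of tf-idf weighting over a matrix of visual-word counts, using the standard formulation (the exact tf-idf variant used in the slides is not specified here):

```python
# Illustrative sketch: tf-idf weighting of visual-word counts.
import numpy as np

def tf_idf(counts):
    """counts: (n_images, K) matrix; entry [i, k] = occurrences of visual word k in image i."""
    counts = np.asarray(counts, dtype=float)
    n_images = counts.shape[0]
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    df = np.maximum((counts > 0).sum(axis=0), 1.0)    # number of images containing each word
    idf = np.log(n_images / df)
    return tf * idf
```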

Contents: Learning codebooks for object classification; Learning non-redundant codebooks (Framework, Boost-Resampling algorithm, Boost-Reweighting algorithm); Experiments; Conclusions and future work

Experimental Results: Stonefly Recognition. (Table in the original slide) Comparison of classification accuracy: Boost vs. Larios et al. [1] vs. Opelt et al. [2] on the STONEFLY datasets. Settings: 3-fold cross-validation experiments; size of each codebook K = 100; number of boosting iterations T = 50. [1] Larios, N., Deng, H., Zhang, W., Sarpola, M., Yuen, J., Paasch, R., Moldenke, A., Lytle, D., Ruiz Correa, S., Mortensen, E., Shapiro, L. and Dietterich, T. (2008). Automated insect identification through concatenated histograms of local appearance features. Machine Vision and Applications. [2] Opelt, A., Pinz, A., Fussenegger, M. and Auer, P. (2006). Generic object recognition with boosting. PAMI.

Experimental Results: Stonefly Recognition (cont.). (Table in the original slide) Comparison of Boost, Single, and Random on the STONEFLY datasets. Single: learns only a single codebook of size K×T = 5000. Random: weighted sampling is replaced with uniform random sampling that ignores the boosting weights. Boost achieves a 77% error reduction compared with Single on STONEFLY9.
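For reference, the error-reduction figure is presumably computed as the relative reduction in error rate (a standard definition; the underlying error rates are in the slide's table and not reproduced here):

```python
# Illustrative: relative error reduction between two classifiers.
def error_reduction(err_single, err_boost):
    return (err_single - err_boost) / err_single   # 0.77 corresponds to a 77% reduction
```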

Experimental Results: Stonefly Recognition (cont.). (Figure in the original slide.)

Experimental Results: Document Classification. S1000: learns a single codebook of size 1000. S100: learns a single codebook of size 100. Random: 10 bagged samples of the original training corpus are used to estimate the joint distribution table P_t(X, Y). (Table in the original slide) Comparison of Boost, Random, S1000, and S100 on the NG and ENRON datasets.

Experimental Results: Document Classification (cont.). [TODO]: add Figure 5 in a similar format as Figure 4.

Contents: Learning codebooks for object classification; Learning non-redundant codebooks (Framework, Boost-Resampling algorithm, Boost-Reweighting algorithm); Experiments; Conclusions and future work

Conclusions and Future Work. Conclusions: non-redundant learning is a simple and general framework that effectively improves the performance of codebooks. Future work: explore the underlying reasons for the effectiveness of non-redundant codebooks (discriminative analysis, non-redundancy tests); more comparison experiments on well-established datasets.

Acknowledgements. Supported by the Oregon State University insect ID project and by NSF under grant number IIS. Thank you!