Multiple Instance Learning for Sparse Positive Bags. Razvan C. Bunescu, Machine Learning Group, Department of Computer Sciences, University of Texas at Austin.

Presentation transcript:

Multiple Instance Learning for Sparse Positive Bags
Razvan C. Bunescu and Raymond J. Mooney
Machine Learning Group, Department of Computer Sciences, University of Texas at Austin

Two Types of Supervision
Single Instance Learning (SIL):
– the traditional type of supervision in machine learning.
– a dataset of positive and negative training instances.
Multiple Instance Learning (MIL):
– a dataset of positive and negative training bags of instances.
– a bag is positive if at least one instance in the bag is positive.
– a bag is negative if all instances in the bag are negative.
– the instance labels inside the bags are hidden.
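To make the bag-level labeling concrete, here is a minimal Python sketch (an illustration, not code from the talk; bag_label is a hypothetical helper):

def bag_label(instance_labels):
    # A bag is positive (+1) if at least one instance is positive,
    # and negative (-1) only if every instance is negative.
    return 1 if any(y == 1 for y in instance_labels) else -1

# In MIL, only the bag_label(...) values are observed during training;
# the per-instance labels passed in here are hidden from the learner.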

MIL Background: Domains
– Originally introduced to solve a drug activity prediction problem in biochemistry [Dietterich et al., 1997].
– Content-Based Image Retrieval [Zhang et al., 2002].
– Text categorization [Andrews et al., 2003], [Ray et al., 2005].

MIL Background: Algorithms
– Axis-Parallel Rectangles [Dietterich et al., 1997].
– Diverse Density [Maron, 1998].
– Multiple Instance Logistic Regression [Ray & Craven, 2005].
– Multi-instance SVM kernels of [Gartner et al., 2002]: the Normalized Set Kernel and the Statistic Kernel.

Outline
– Introduction
– MIL as SIL with one-sided noise
– The Normalized Set Kernel (NSK)
– Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
– Experimental Results
– Future Work & Conclusion

SIL Approach to MIL
Apply the bag label to all bag instances and formulate as an SVM problem:

minimize: \frac{1}{2}\|w\|^2 + C \sum_{x \in X_n} \xi_x + C \sum_{x \in X_p} \xi_x
subject to:
  -(w \cdot \phi(x) + b) \ge 1 - \xi_x, \forall x \in X_n  (instances from negative bags)
  w \cdot \phi(x) + b \ge 1 - \xi_x, \forall x \in X_p  (instances from positive bags)
  \xi_x \ge 0

Here X_n is the set of instances from negative bags and X_p the set of instances from positive bags; \frac{1}{2}\|w\|^2 is the regularization term, and the two slack sums are the error on negative bags and the error on positive bags, respectively.
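A minimal sketch of this reduction (assuming instances are feature vectors; scikit-learn's LinearSVC stands in for the SVM solver, and sil_flatten is a hypothetical helper):

import numpy as np
from sklearn.svm import LinearSVC

def sil_flatten(bags, bag_labels):
    # The SIL reduction: every instance inherits its bag's label.
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(bag), label)
                        for bag, label in zip(bags, bag_labels)])
    return X, y

# Usage: bags is a list of (n_i, d) arrays, bag_labels in {-1, +1}.
# X, y = sil_flatten(bags, bag_labels)
# clf = LinearSVC(C=1.0).fit(X, y)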

Outline
– Introduction
– MIL as SIL with one-sided noise
– The Normalized Set Kernel (NSK)
– Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
– Experimental Results
– Future Work & Conclusion

From SIL to the Normalized Set Kernel
Apply the bag label to all bag instances and formulate as an SVM problem. For a positive bag X, summing the instance-level constraints

  w \cdot \phi(x) + b \ge 1 - \xi_x, \forall x \in X

over the bag and dividing by |X| yields a single bag-level constraint on the normalized sum \phi(X) = \sum_{x \in X} \phi(x):

  w \cdot \frac{\phi(X)}{|X|} + b \ge 1 - \xi_X

This bag-level representation is exactly the one used by the Normalized Set Kernel.

The Normalized Set Kernel [Gartner et al., 2002]
A bag is represented as the normalized sum of its instances, and bags are used as examples in an SVM formulation:

minimize: \frac{1}{2}\|w\|^2 + C \sum_{X} \xi_X
subject to:
  Y_X \left( w \cdot \frac{\phi(X)}{|X|} + b \right) \ge 1 - \xi_X, \quad \xi_X \ge 0
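Computationally, this bag representation corresponds to a simple kernel between bags. The sketch below uses cardinality normalization, which matches \phi(X)/|X| above (Gartner et al. also consider normalizing by the feature-space norm; normalized_set_kernel is a hypothetical helper):

import numpy as np

def normalized_set_kernel(bag_a, bag_b, k=np.dot):
    # k_NSK(X, Y) = <phi(X)/|X|, phi(Y)/|Y|>, i.e., the sum of pairwise
    # instance kernels divided by the product of the bag sizes.
    total = sum(k(x, y) for x in bag_a for y in bag_b)
    return total / (len(bag_a) * len(bag_b))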

The Normalized Set Kernel (NSK)
A positive bag is represented as the normalized sum of its instances; positive bags and negative instances are used as examples:

minimize: \frac{1}{2}\|w\|^2 + C \sum_{x \in X_n} \xi_x + C \sum_{X \in \mathcal{X}_p} \xi_X
subject to:
  -(w \cdot \phi(x) + b) \ge 1 - \xi_x, \forall x \in X_n
  w \cdot \frac{\phi(X)}{|X|} + b \ge 1 - \xi_X, \forall X \in \mathcal{X}_p
  \xi \ge 0

where X_n is the set of instances from negative bags and \mathcal{X}_p is the set of positive bags.

Outline
– Introduction
– MIL as SIL with one-sided noise
– The Normalized Set Kernel (NSK)
– Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
– Experimental Results
– Future Work & Conclusion

The Normalized Set Kernel (NSK)
A positive bag is represented as the normalized sum of its instances; positive bags and negative instances are used as examples:

minimize: \frac{1}{2}\|w\|^2 + C \sum_{x \in X_n} \xi_x + C \sum_{X \in \mathcal{X}_p} \xi_X
subject to:
  -(w \cdot \phi(x) + b) \ge 1 - \xi_x, \forall x \in X_n
  w \cdot \frac{\phi(X)}{|X|} + b \ge 1 - \xi_X, \forall X \in \mathcal{X}_p
  \xi \ge 0

The positive-bag constraint is too strong, especially when positive bags are sparse.

Inequality Constraints for Positive Bags
The NSK constraint

  w \cdot \frac{\phi(X)}{|X|} + b \ge 1 - \xi_X

implicitly assumes that all instances inside the bag X are positive.

Inequality Constraints for Positive Bags
We instead want the constraint to express only that at least one instance in the bag X is positive. This yields the sparse MIL constraint:

  w \cdot \frac{\phi(X)}{|X|} + b \ge \frac{2 - |X|}{|X|} - \xi_X
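The right-hand side follows from a worst-case count (a reconstruction of the reasoning behind the slide, not stated verbatim on it): with one instance at or above the positive margin and the other |X| − 1 instances at the negative margin,

\sum_{x \in X} (w \cdot \phi(x) + b) \;\ge\; 1 - (|X| - 1) \;=\; 2 - |X|
\quad\Longrightarrow\quad
w \cdot \frac{\phi(X)}{|X|} + b \;\ge\; \frac{2 - |X|}{|X|}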

The Sparse MIL (sMIL)

minimize: \frac{1}{2}\|w\|^2 + C \sum_{x \in X_n} \xi_x + C \sum_{X \in \mathcal{X}_p} \xi_X
subject to:
  -(w \cdot \phi(x) + b) \ge 1 - \xi_x, \forall x \in X_n
  w \cdot \frac{\phi(X)}{|X|} + b \ge \frac{2 - |X|}{|X|} - \xi_X, \forall X \in \mathcal{X}_p
  \xi \ge 0

The threshold (2 − |X|)/|X| is larger for smaller bags: small positive bags are more informative than large positive bags.
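A minimal sketch of the sMIL quadratic program for a linear kernel, using cvxpy as the QP solver. The uniform weighting of all slacks by a single C is an assumption (the transcript does not show how the slack terms are normalized), and solve_smil is a hypothetical helper name:

import numpy as np
import cvxpy as cp

def solve_smil(neg_instances, pos_bags, C=1.0):
    # neg_instances: (n, d) array of instances from negative bags.
    # pos_bags: list of (n_i, d) arrays, one per positive bag.
    d = neg_instances.shape[1]
    w, b = cp.Variable(d), cp.Variable()
    xi_n = cp.Variable(neg_instances.shape[0], nonneg=True)
    xi_p = cp.Variable(len(pos_bags), nonneg=True)
    # Negative instances must score at most -1 (up to slack).
    constraints = [neg_instances @ w + b <= -1 + xi_n]
    # Sparse MIL constraint: bag-average score >= (2 - |X|)/|X| (up to slack).
    for i, bag in enumerate(pos_bags):
        n = bag.shape[0]
        constraints.append(cp.sum(bag @ w) / n + b >= (2 - n) / n - xi_p[i])
    objective = cp.Minimize(0.5 * cp.sum_squares(w)
                            + C * (cp.sum(xi_n) + cp.sum(xi_p)))
    cp.Problem(objective, constraints).solve()
    return w.value, b.value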

Outline
– Introduction
– MIL as SIL with one-sided noise
– The Normalized Set Kernel (NSK)
– Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
– Experimental Results
– Future Work & Conclusion

Inequality Constraints for Positive Bags
sMIL comes closer than the NSK to expressing the constraint that at least one instance from a positive bag is positive. However, sMIL still does not guarantee that at least one instance is positive:
– Problem: the sparse MIL constraint may be satisfied even when all instances in the bag have negative scores that are very close to zero.
– Solution: force all negative instances to have scores \le -1 + \xi_x, using the transductive constraint |w \cdot \phi(x) + b| \ge 1 - \xi_x.

Inequality Constraints for Positive Bags
Combining the sparse MIL constraint with the transductive constraint using shared slacks does guarantee that at least one instance in the bag is positive, but it leads to a mixed integer programming problem.

Inequality Constraints for Positive Bags
Using independent slacks for the sparse MIL constraint and the transductive constraint still pushes at least one instance in the bag to be positive, and yields an easier problem that can be solved with CCCP [Yuille et al., 2002].

The Sparse Transductive MIL (stMIL)

minimize: \frac{1}{2}\|w\|^2 + C \sum_{x \in X_n} \xi_x + C \sum_{X \in \mathcal{X}_p} \xi_X + C \sum_{x \in X,\, X \in \mathcal{X}_p} \xi_x
subject to:
  -(w \cdot \phi(x) + b) \ge 1 - \xi_x, \forall x \in X_n
  w \cdot \frac{\phi(X)}{|X|} + b \ge \frac{2 - |X|}{|X|} - \xi_X, \forall X \in \mathcal{X}_p
  |w \cdot \phi(x) + b| \ge 1 - \xi_x, \forall x \in X, X \in \mathcal{X}_p
  \xi \ge 0

Solved with CCCP, as in [Collobert et al., 2006].
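One simple way to realize the CCCP-style alternation is to fix the signs of the positive-bag instance scores from the previous iterate, which turns the non-convex absolute-value constraint into a linear one. This is a simplified reading of the approach (the exact update in [Collobert et al., 2006] differs), reusing solve_smil from the sketch above; solve_stmil is a hypothetical helper:

def solve_stmil(neg_instances, pos_bags, C=1.0, n_iters=5):
    w, b = solve_smil(neg_instances, pos_bags, C)  # convex initialization
    pos_instances = np.vstack(pos_bags)
    d = neg_instances.shape[1]
    for _ in range(n_iters):
        # Fix the signs from the previous iterate: |f(x)| >= 1 - xi becomes
        # sign(f_prev(x)) * f(x) >= 1 - xi, which is linear in (w, b).
        signs = np.sign(pos_instances @ w + b)
        signs[signs == 0] = 1.0
        wv, bv = cp.Variable(d), cp.Variable()
        xi_n = cp.Variable(neg_instances.shape[0], nonneg=True)
        xi_b = cp.Variable(len(pos_bags), nonneg=True)
        xi_t = cp.Variable(pos_instances.shape[0], nonneg=True)
        cons = [neg_instances @ wv + bv <= -1 + xi_n,
                cp.multiply(signs, pos_instances @ wv + bv) >= 1 - xi_t]
        for i, bag in enumerate(pos_bags):
            n = bag.shape[0]
            cons.append(cp.sum(bag @ wv) / n + bv >= (2 - n) / n - xi_b[i])
        obj = cp.Minimize(0.5 * cp.sum_squares(wv)
                          + C * (cp.sum(xi_n) + cp.sum(xi_b) + cp.sum(xi_t)))
        cp.Problem(obj, cons).solve()
        w, b = wv.value, bv.value
    return w, b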

Outline
– Introduction
– MIL as SIL with one-sided noise
– The Normalized Set Kernel (NSK)
– Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
– Experimental Results
– Future Work & Conclusion

A Balanced SVM Approach to MIL
SIL is ideal when bags are dense in positive instances; sMIL is ideal when bags are sparse in positive instances. If the expected density η of positive instances is known, design a method that:
– converges to SIL when η → 1.
– converges to sMIL when η → 0.
If η is unknown, it can be set using cross-validation.

The Balanced MIL (sbMIL)
Input:
– Training negative bags \mathcal{X}_n; define X_n = {x | x ∈ X, X ∈ \mathcal{X}_n}.
– Training positive bags \mathcal{X}_p; define X_p = {x | x ∈ X, X ∈ \mathcal{X}_p}.
– Features φ(x), or kernel K(x, y).
– Capacity parameter C ≥ 0 and balance parameter η ∈ [0, 1].
Output:
– Decision function f(x) = w·φ(x) + b.
Algorithm:
1. (w, b) ← solve_sMIL(\mathcal{X}_n, \mathcal{X}_p, φ, C).
2. Order all instances x ∈ X_p by f(x).
3. Label the top η·|X_p| instances of X_p as positive and the remaining (1 − η)·|X_p| as negative.
4. (w, b) ← solve_SIL(X_n, X_p, φ, C).
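A runnable sketch of the two-stage sbMIL procedure under the same assumptions as the sMIL snippet above (solve_smil reused; scikit-learn's LinearSVC as the SIL solver; solve_sbmil is a hypothetical helper):

import numpy as np
from sklearn.svm import LinearSVC

def solve_sbmil(neg_instances, pos_bags, eta, C=1.0):
    # Stage 1: solve sMIL, then score all instances from positive bags.
    w, b = solve_smil(neg_instances, pos_bags, C)
    pos_instances = np.vstack(pos_bags)
    scores = pos_instances @ w + b
    # Relabel so that a fraction eta of positive-bag instances are positive.
    k = int(round(eta * len(pos_instances)))
    labels = -np.ones(len(pos_instances))
    labels[np.argsort(-scores)[:k]] = 1.0
    # Stage 2: retrain with plain SIL on the relabeled instances.
    X = np.vstack([neg_instances, pos_instances])
    y = np.concatenate([-np.ones(len(neg_instances)), labels])
    return LinearSVC(C=C).fit(X, y)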

Outline
– Introduction
– MIL as SIL with one-sided noise
– The Normalized Set Kernel (NSK)
– Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
– Experimental Results
– Future Work & Conclusion

Experimental Results: Datasets
[AIMed] An artificial, maximally sparse dataset:
– Created from AIMed [Bunescu et al., 2005], a dataset of documents annotated for protein interactions. A sentence example contains a pair of proteins; the sentence is positive iff it asserts an interaction between the two proteins.
– Positive bags of sentences: choose the bag size randomly between S_min and S_max; start with exactly one positive instance, then randomly add negative instances.
– Negative bags of sentences: choose the bag size randomly between S_min and S_max; randomly add negative instances.
Uses the subsequence kernel from [Bunescu & Mooney, 2005].
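A small sketch of this bag construction (make_bags is a hypothetical helper; pos_pool and neg_pool stand for the pools of positive and negative sentences):

import numpy as np

def make_bags(pos_pool, neg_pool, s_min, s_max, n_bags, seed=0):
    # Each positive bag holds exactly one positive instance plus random
    # negatives; negative bags hold only negatives. Bag sizes are drawn
    # uniformly from [s_min, s_max].
    rng = np.random.default_rng(seed)
    pos_bags, neg_bags = [], []
    for _ in range(n_bags):
        size = int(rng.integers(s_min, s_max + 1))
        bag = [pos_pool[rng.integers(len(pos_pool))]]
        bag += [neg_pool[rng.integers(len(neg_pool))] for _ in range(size - 1)]
        pos_bags.append(bag)
        neg_bags.append([neg_pool[rng.integers(len(neg_pool))]
                         for _ in range(size)])
    return pos_bags, neg_bags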

Experimental Results: Datasets
[CBIR] Content-Based Image Retrieval:
– Categorize images by whether they contain an object of interest.
– An image is a bag of image regions.
– The number of regions varies widely between images.
– For every image, relatively few regions are expected to contain the object of interest, so positive bags are naturally sparse.
– Evaluated on the [Tiger], [Elephant], and [Fox] datasets from [Andrews et al., 2003].
Uses a quadratic kernel with the original feature vectors.

Experimental Results: Datasets
[TST] Text categorization datasets:
– Medline articles are bags of overlapping text passages.
– Articles are annotated with MeSH terms, which are used as classes.
– Uses [TST1] and [TST2] from [Andrews et al., 2003].
[MUSK] Drug activity prediction:
– Bags of low-energy 3D conformations for every molecule.
– A bag is positive if at least one conformation binds to the target, i.e., if the molecule smells "musky".
– Uses the [MUSK1] and [MUSK2] datasets from [Dietterich et al., 1997].
Uses a quadratic kernel with the original feature vectors.

Experimental Results: Systems
[SIL] MIL as SIL with one-sided noise.
[NSK] The Normalized Set Kernel.
[STK] The Statistic Kernel.
[sMIL] The SVM approach to sparse MIL.
[stMIL] The transductive SVM approach to sparse MIL.
[sbMIL] The balanced SVM approach to MIL.

Experimental Results
[Table: results for SIL, NSK, STK, sMIL, sbMIL, and stMIL on the AIMed, AIMed½, Tiger, Elephant, Fox, MUSK1, MUSK2, TST1, and TST2 datasets. The numeric entries did not survive the transcript; the two AIMed rows each contain an N/A entry.]

Future Work
Capture distribution imbalance in the MIL model:
– instances belonging to the same bag are, in general, more similar than instances belonging to different bags.
Incorporate estimates of bag-level density in the MIL model:
– in some applications, estimates of the density of positive instances are available for every bag.

Conclusion
Proposed an SVM approach to MIL that is particularly effective when bags are sparse in positive instances. Modeling a global density of positive instances in positive bags further improves accuracy. Treating instances from positive bags as unlabeled data in a transductive setting is useful when negative instances in positive and negative bags come from the same distribution.

Questions?