Theoretical Analysis of Multi-Instance Learning. 张敏灵 (Min-Ling Zhang), 周志华 (Zhi-Hua Zhou), National Key Laboratory for Novel Software Technology, Nanjing University. 2002.10.11.



Outline
- Introduction
- Theoretical analysis
  - PAC learning model
  - PAC learnability of APR
  - Real-valued multi-instance learning
- Future work

Introduction
Origin: Multi-instance learning originated from the problem of "drug activity prediction", and was first formalized by T. G. Dietterich et al. in their seminal paper "Solving the multiple-instance problem with axis-parallel rectangles" (1997). Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of "multi-instance learning" to "multi-part learning", and pointed out that many previously studied problems are "multi-part" problems rather than "multi-instance" ones.

Introduction (cont'd)
Comparisons: the drug activity prediction problem.
[Fig. 1. The shape of a molecule changes as it rotates its bonds]
[Fig. 2. Classical and multi-instance learning frameworks]
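
As background for Fig. 2 (a minimal sketch, not taken from the slides): under the standard multi-instance assumption, a bag is labeled positive if and only if at least one of its instances satisfies the underlying instance-level concept. The predicate `instance_is_positive` below is a hypothetical placeholder for that concept.

```python
from typing import Callable, Sequence

def bag_label(bag: Sequence[Sequence[float]],
              instance_is_positive: Callable[[Sequence[float]], bool]) -> bool:
    """Standard multi-instance assumption: a bag is positive iff
    at least one of its instances is positive."""
    return any(instance_is_positive(x) for x in bag)

# Example: each instance is a feature vector; the (hypothetical) instance-level
# concept is "first feature exceeds 0.5".
concept = lambda x: x[0] > 0.5
print(bag_label([[0.1, 0.2], [0.7, 0.3]], concept))  # True  (one qualifying instance)
print(bag_label([[0.1, 0.2], [0.4, 0.3]], concept))  # False (no qualifying instance)
```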

Introduction (cont'd)
Experiment data: the musk1 and musk2 data sets, described by #dim, #bags, #pos bags, #neg bags, #instances, and #instances/bag (max / min / ave).
APR (Axis-Parallel Rectangles) algorithms [Fig. 3. APR algorithms]:
- GFS elim-count APR (standard)
- GFS elim-kde APR (outside-in)
- Iterated discrim APR (inside-out): musk1: 92.4%, musk2: 89.2%
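
The three APR algorithms above are heuristic; as a rough illustration only (a simplified sketch, not Dietterich et al.'s actual GFS or iterated-discrim procedures), the simplest "all-positive" APR is the bounding box of every instance from the positive bags, and a bag is predicted positive if any of its instances falls inside it.

```python
import numpy as np

def fit_all_positive_apr(positive_bags):
    """Bounding box (axis-parallel rectangle) of every instance
    occurring in a positive bag -- the simplest 'all-positive' APR."""
    all_pos = np.vstack([np.asarray(bag) for bag in positive_bags])
    return all_pos.min(axis=0), all_pos.max(axis=0)   # (lower, upper) corners

def predict_bag(bag, lower, upper):
    """A bag is positive iff at least one instance lies inside the APR."""
    bag = np.asarray(bag)
    inside = np.all((bag >= lower) & (bag <= upper), axis=1)
    return bool(inside.any())

# Tiny synthetic example (2-D instances, hypothetical data).
pos_bags = [[[0.2, 0.3], [0.9, 0.8]], [[0.4, 0.5]]]
lower, upper = fit_all_positive_apr(pos_bags)
print(predict_bag([[0.3, 0.4]], lower, upper))   # True
print(predict_bag([[2.0, 2.0]], lower, upper))   # False
```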

Introduction (cont'd)
Various algorithms:
- APR (T. G. Dietterich et al. 1997)
- MULTINST (P. Auer 1997)
- Diverse Density (O. Maron 1998)
- Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
- Relic (G. Ruffo 2000)
- EM-DD (Q. Zhang & S. A. Goldman 2001)
- ...

Introduction (cont'd)
Comparison on benchmark data sets [Fig. 4. A comparison of several multi-instance learning algorithms]: accuracy (%correct) on Musk1 and Musk2 for iterated-discrim APR, Citation-kNN, Diverse Density, RELIC, MULTINST, BP, and C4.5.

Introduction (cont'd)
Application areas:
- Drug activity prediction (T. G. Dietterich et al. 1997)
- Stock prediction (O. Maron 1998)
- Learning a simple description of a person from a series of images (O. Maron 1998)
- Natural scene classification (O. Maron & A. L. Ratan 1998)
- Event prediction (G. M. Weiss & H. Hirsh 1998)
- Data mining and computer security (G. Ruffo 2000)
- ...
Multi-instance learning has been regarded as the fourth machine learning framework, parallel to supervised learning, unsupervised learning, and reinforcement learning.

Theoretical analysis
- PAC learning model
  - Definition and its properties
  - VC dimension
- PAC learnability of APR
- Real-valued multi-instance learning

Theoretical Analysis: PAC model
Computational learning theory
- L. G. Valiant (1984), "A theory of the learnable"
- Deductive learning
- Used for constructing a mathematical model of a cognitive process.
[Fig. 5. Diagram of a framework for learning: W, P, actual example, M, coded example, 0/1]

PAC model (cont'd)
Definition of PAC learning
We say that a learning algorithm L is a pac (probably approximately correct) learning algorithm for the hypothesis space H if, given
- a confidence parameter δ (0 < δ < 1),
- an accuracy parameter ε (0 < ε < 1),
there is a positive integer m_L = m_L(δ, ε) such that, for any target concept t ∈ H and any probability distribution µ on X, whenever m ≥ m_L,
µ^m { s ∈ S(m, t) | er_µ(L(s), t) < ε } > 1 - δ
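
To make the definition concrete, a small simulation sketch (a background illustration, not from the slides): a consistent learner for threshold concepts t_θ(x) = [x ≥ θ] on X = [0, 1], checked empirically to return a hypothesis with error below ε with probability at least 1 - δ once m is large enough.

```python
import random

def consistent_threshold(sample):
    """A consistent learner for thresholds t_theta(x) = [x >= theta] on [0,1]:
    output the smallest positive example seen (or 1.0 if none)."""
    positives = [x for x, b in sample if b == 1]
    return min(positives) if positives else 1.0

def one_trial(theta, m, n_test=20_000):
    target = lambda x: 1 if x >= theta else 0
    sample = [(x, target(x)) for x in (random.random() for _ in range(m))]
    h = consistent_threshold(sample)
    # True error of the learned threshold, estimated on fresh test points.
    mistakes = sum((1 if x >= h else 0) != target(x)
                   for x in (random.random() for _ in range(n_test)))
    return mistakes / n_test

eps, delta, theta, m = 0.05, 0.05, 0.3, 200
good = sum(one_trial(theta, m) < eps for _ in range(100))
print(f"fraction of trials with error < {eps}: {good / 100:.2f} (want >= {1 - delta})")
```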

PAC model (cont'd)
Properties of a pac learning algorithm
- It is probable that a useful training sample is presented.
- One can only expect that the output hypothesis is approximately correct.
- m_L depends upon δ and ε, but not on t or µ.
- If there is a pac learning algorithm for a hypothesis space H, then we say that H is pac-learnable.
Efficient pac learning algorithm
- If the running time of a pac learning algorithm L is polynomial in 1/δ and 1/ε, then L is said to be efficient.
- It is usually necessary to require a pac learning algorithm to be efficient.
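
One standard sufficient sample size for a consistent learner over a finite hypothesis space H is m ≥ (1/ε)(ln|H| + ln(1/δ)); this well-known bound is background rather than part of the slides, and the sketch below simply evaluates it.

```python
import math

def pac_sample_size_finite(h_size: int, eps: float, delta: float) -> int:
    """Sufficient m for a consistent learner over a finite hypothesis space:
    m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# Example: |H| = 2**20 hypotheses, eps = delta = 0.05.
print(pac_sample_size_finite(2**20, 0.05, 0.05))   # 338
```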

PAC model (cont'd)
VC dimension
- The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis (1971), and was introduced into computational learning theory by Blumer et al. (1986).
- The VC dimension of H, denoted VCdim(H), describes the 'expressive power' of H in a sense. Generally, the greater VCdim(H) is, the greater the 'expressive power' of H, and the more difficult H is to learn.
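
To connect VC dimension with APR (a background sketch, not from the slides): axis-parallel rectangles in the plane have VC dimension 4, and shattering can be checked by brute force, asking whether for every labeling of a point set some rectangle contains exactly the points labeled positive.

```python
from itertools import product

def rect_realizes(points, labels):
    """Is there an axis-parallel rectangle containing exactly the points labeled 1?
    If so, the bounding box of the positives is such a rectangle."""
    pos = [p for p, b in zip(points, labels) if b]
    if not pos:
        return True   # the empty rectangle works
    x0, x1 = min(p[0] for p in pos), max(p[0] for p in pos)
    y0, y1 = min(p[1] for p in pos), max(p[1] for p in pos)
    inside = lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1
    return all(inside(p) == bool(b) for p, b in zip(points, labels))

def shattered(points):
    """True iff axis-parallel rectangles realize every labeling of `points`."""
    return all(rect_realizes(points, labels)
               for labels in product([0, 1], repeat=len(points)))

diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # 4 points in "diamond" position
print(shattered(diamond))                      # True: shattered, so VCdim >= 4
print(shattered(diamond + [(0, 0)]))           # False: no 5-point set is shattered
```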

PAC model (cont'd)
Consistency
- If for any target concept t ∈ H and any training sample s = ((x_1, b_1), (x_2, b_2), ..., (x_m, b_m)) for t, the corresponding hypothesis L(s) ∈ H agrees with s, i.e. L(s)(x_i) = t(x_i) = b_i for all i, then we say that L is a consistent algorithm.
VC dimension and pac learnability
- If L is a consistent learning algorithm for H and H has finite VC dimension, then H is pac-learnable.
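
A standard quantitative form of this result (the bound of Blumer et al., quoted here as background, not from the slides) gives a sufficient sample size for a consistent learner in terms of VCdim(H): m ≥ max((4/ε) log2(2/δ), (8·VCdim(H)/ε) log2(13/ε)).

```python
import math

def pac_sample_size_vc(vcdim: int, eps: float, delta: float) -> int:
    """Sufficient m for a consistent learner (Blumer et al. style bound):
    m >= max( (4/eps)*log2(2/delta), (8*vcdim/eps)*log2(13/eps) )."""
    a = (4.0 / eps) * math.log2(2.0 / delta)
    b = (8.0 * vcdim / eps) * math.log2(13.0 / eps)
    return math.ceil(max(a, b))

# Axis-parallel rectangles in the plane have VC dimension 4.
print(pac_sample_size_vc(vcdim=4, eps=0.1, delta=0.05))   # on the order of a few thousand
```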

Theoretical Analysis: PAC learning of APR
Early work
While T. G. Dietterich et al. proposed three heuristic APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) gave a theoretical analysis of the pac learnability of APR and showed that if
- each instance in a bag is drawn from a product distribution, and
- all instances in a bag are drawn independently,
then APR is pac learnable under the multi-instance learning framework, with explicit sample complexity and time complexity bounds.

PAC learning of APR (cont'd)
A hardness result
Via an analysis of the VC dimension, P. Auer et al. (1998) gave a much more efficient pac learning algorithm, with lower sample complexity and time complexity than that of Long & Tan. More importantly, they proved that if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, an NP-complete problem.

PAC learning of APR (cont'd)
A further reduction
A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that
- if H is pac learnable from 1-sided (or 2-sided) random classification noise, then H is pac learnable from multi-instance examples;
- via a reduction to the "Statistical Query" model (M. Kearns 1993), APR is pac learnable from multi-instance examples with further improved sample complexity and time complexity.

PAC learning of APR (cont'd)
Summary: the three analyses are compared on sample complexity, time complexity, constraints, and theoretical tools [Fig. 6. A comparison of the three theoretical algorithms].
- P. M. Long et al.: constraints: product distribution, independent instances; theoretical tools: p-concept, VC dimension
- P. Auer et al.: constraints: independent instances; theoretical tools: VC dimension
- A. Blum et al.: constraints: independent instances; theoretical tools: statistical query model, VC dimension

Theoretical Analysis: Real-valued multi-instance learning
It is worth noting that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.
S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-complete; furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.
At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN, and Diverse Density algorithms to real-valued multi-instance learning; they also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of the modified algorithms on them.

Future work
- Further theoretical analysis of multi-instance learning.
- Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms.
- Explore more issues that can be translated into multi-instance learning problems.
- Design appropriate bag generating methods.
- ...

Thanks