Theoretical Analysis of Multi-Instance Learning
Min-Ling Zhang   Zhi-Hua Zhou
National Key Laboratory for Novel Software Technology, Nanjing University
Outline
Introduction
Theoretical analysis
  PAC learning model
  PAC learnability of APR
  Real-valued multi-instance learning
Future work
Introduction
Origin
Multi-instance learning originated from the problem of "drug activity prediction", and was first formalized by T. G. Dietterich et al. in their seminal paper "Solving the multiple-instance problem with axis-parallel rectangles" (1997).
Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of "multi-instance learning" to "multi-part learning", and pointed out that many previously studied problems are "multi-part" problems rather than "multi-instance" ones.
Introduction (cont'd)
Comparisons: the drug activity prediction problem
Fig.1. The shape of a molecule changes as it rotates its bonds
Fig.2. Classical and multi-instance learning frameworks
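To make the framework concrete, here is a minimal sketch (in Python, with hypothetical names) of the standard multi-instance assumption: a bag receives a positive label iff at least one of its instances is positive, just as a molecule is active if at least one of its conformations binds to the receptor.

```python
def bag_label(instance_labels):
    """Standard multi-instance assumption: a bag is positive
    iff at least one of its instances is positive."""
    return any(instance_labels)

# A molecule (bag) is active if at least one conformation (instance) binds:
print(bag_label([False, True, False]))  # True  -> active molecule
print(bag_label([False, False]))        # False -> inactive molecule
```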
Introduction (cont'd)
Experiment data (the Musk data of T. G. Dietterich et al.):

Data set  #dim  #bags  #pos bags  #neg bags  #instances  #instances/bag (max/min/ave)
Musk1     166    92      47         45          476        40 / 2 / 5.2
Musk2     166   102      39         63        6,598        1,044 / 1 / 64.7

APR (Axis-Parallel Rectangles) algorithms:
GFS elim-count APR (standard)
GFS elim-kde APR (outside-in)
Iterated discrim APR (inside-out)
Fig.3. APR algorithms
Best result (iterated discrim APR): musk1: 92.4%, musk2: 89.2%
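The APR idea itself is easy to sketch. The toy classifier below (hypothetical names; a simplification, not Dietterich et al.'s actual elimination and expansion procedures) builds the smallest rectangle covering every instance of the positive training bags, and calls a test bag positive iff any of its instances falls inside:

```python
import numpy as np

class ToyAPR:
    """Smallest axis-parallel rectangle covering every instance of the
    positive training bags -- a simplified sketch of the APR idea."""

    def fit(self, bags, labels):
        pos = np.vstack([bag for bag, y in zip(bags, labels) if y == 1])
        self.lo, self.hi = pos.min(axis=0), pos.max(axis=0)
        return self

    def predict(self, bag):
        # A bag is positive iff at least one instance lies inside the APR.
        inside = np.all((bag >= self.lo) & (bag <= self.hi), axis=1)
        return int(inside.any())

# bags: each a (n_instances, n_features) array of conformation features
train = [np.array([[1., 2.], [3., 4.]]), np.array([[9., 9.]])]
apr = ToyAPR().fit(train, [1, 0])
print(apr.predict(np.array([[2., 3.], [8., 8.]])))  # 1: first instance inside
```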
Introduction (cont'd)
Various algorithms
APR (T. G. Dietterich et al. 1997)
MULTINST (P. Auer 1997)
Diverse Density (O. Maron 1998)
Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
Relic (G. Ruffo 2000)
EM-DD (Q. Zhang & S. A. Goldman 2001)
…
Introduction (cont'd)
Comparison on benchmark data sets:

Algorithm             Musk1 (%correct)  Musk2 (%correct)
iterated-discrim APR  92.4              89.2
Citation-kNN          92.4              86.3
Diverse Density       88.9              82.5
RELIC                 83.7              87.3
MULTINST              76.7              84.0
BP                    75.0              67.7
C4.5                  68.5              58.8

Fig.4. A comparison of several multi-instance learning algorithms
Introduction (cont'd)
Application areas
Drug activity prediction (T. G. Dietterich et al. 1997)
Stock prediction (O. Maron 1998)
Learning a simple description of a person from a series of images (O. Maron 1998)
Natural scene classification (O. Maron & A. L. Ratan 1998)
Event prediction (G. M. Weiss & H. Hirsh 1998)
Data mining and computer security (G. Ruffo 2000)
…
Multi-instance learning has been regarded as the fourth machine learning framework, parallel to supervised learning, unsupervised learning, and reinforcement learning.
Theoretical analysis
PAC learning model
  Definition and its properties
  VC dimension
PAC learnability of APR
Real-valued multi-instance learning
Theoretical Analysis - PAC model
Computational learning theory
L. G. Valiant (1984), "A theory of the learnable"
Inductive learning
Used for constructing a mathematical model of a cognitive process.
Fig.5. Diagram of a framework for learning: actual examples from the world W are coded and presented to the learning machine M, which outputs 0/1
PAC model (cont'd)
Definition of PAC learning
We say that a learning algorithm L is a pac (probably approximately correct) learning algorithm for the hypothesis space H if, given
  a confidence parameter δ (0 < δ < 1), and
  an accuracy parameter ε (0 < ε < 1),
there is a positive integer m_L = m_L(δ, ε) such that
  for any target concept t ∈ H, and
  for any probability distribution µ on X,
whenever m ≥ m_L:  µ^m { s ∈ S(m,t) | er_µ(L(s), t) < ε } > 1 − δ
PAC model (cont'd)
Properties of a pac learning algorithm
It is probable that a useful training sample is presented.
One can only expect that the output hypothesis is approximately correct.
m_L depends upon δ and ε, but not on t or µ.
If there is a pac learning algorithm for a hypothesis space H, then we say that H is pac-learnable.
Efficient pac learning algorithm
If the running time of a pac learning algorithm L is polynomial in 1/δ and 1/ε, then L is said to be efficient. It is usually necessary to require a pac learning algorithm to be efficient.
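The independence of m_L from t and µ is what makes the definition useful. As an illustration (a standard textbook bound for finite hypothesis spaces, not a result specific to this talk), a consistent learner over a finite H is pac with m_L(δ, ε) = ⌈(1/ε)(ln|H| + ln(1/δ))⌉:

```python
import math

def m_L(h_size, epsilon, delta):
    """Sample complexity of a consistent learner over a finite
    hypothesis space H: m >= (1/eps) * (ln|H| + ln(1/delta)).
    Note it depends only on |H|, epsilon and delta -- not on the
    target concept t or the distribution mu."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

print(m_L(2**20, 0.1, 0.05))  # 169 examples suffice for this H
```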
PAC model (cont'd)
VC dimension
The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis (1971), and was introduced into computational learning theory by Blumer et al. (1986).
The VC dimension of H, denoted VCdim(H), describes the 'expressive power' of H in a certain sense. Generally, the greater VCdim(H) is, the greater the 'expressive power' of H, and hence the more difficult H is to learn.
PAC model (cont'd)
Consistency
If for any target concept t ∈ H and any training sample s = ((x_1, b_1), (x_2, b_2), ..., (x_m, b_m)) for t, the corresponding hypothesis L(s) ∈ H agrees with s, i.e. L(s)(x_i) = t(x_i) = b_i for all i, then we say that L is a consistent algorithm.
VC dimension and pac learnability
If L is a consistent learning algorithm for H, and H has finite VC dimension, then H is pac-learnable (with L as a pac learning algorithm).
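For the hypothesis space at the heart of this talk, both quantities are concrete. A sketch (the VC dimension fact is standard; the constants follow the classical Blumer et al. 1989 statement):

```latex
% Axis-parallel rectangles in R^n have VC dimension 2n:
% 2n suitably placed points (one extreme point per face) are shattered,
% while for any 2n+1 points the rectangle spanned by the coordinate-wise
% extreme points already contains some remaining point.
\mathrm{VCdim}(\mathrm{APR}_n) = 2n

% Blumer et al. (1989): any consistent learning algorithm L for a
% hypothesis space H with d = VCdim(H) < \infty is pac, with
m_L(\delta, \varepsilon)
  \le \max\!\left( \frac{4}{\varepsilon} \log_2 \frac{2}{\delta},\;
                   \frac{8d}{\varepsilon} \log_2 \frac{13}{\varepsilon} \right)
```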
Theoretical Analysis - PAC learning of APR
Early work
While T. G. Dietterich et al. proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) gave the first theoretical analysis of the pac learnability of APR and showed that if
  each instance in a bag is drawn from a product distribution, and
  all instances in a bag are drawn independently,
then APR is pac learnable under the multi-instance learning framework with polynomial sample complexity and time complexity.
PAC learning of APR (cont'd)
A hardness result
Via an analysis of the VC dimension, P. Auer et al. (1998) gave a much more efficient pac learning algorithm than that of Long & Tan, with lower sample complexity and time complexity. More importantly, they proved that if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, a problem widely believed to be intractable.
PAC learning of APR (cont'd)
A further reduction
A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that:
  If H is pac learnable from 1-sided (or 2-sided) random classification noise, then H is pac learnable from multi-instance examples.
  Via a reduction to the "Statistical Query" model (M. Kearns 1993), APR is pac learnable from multi-instance examples with further improved sample complexity and time complexity.
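The direction of this reduction is easy to visualize. A minimal sketch (hypothetical names; it illustrates only the noise structure, not Blum & Kalai's full statistical-query argument): label one randomly drawn instance per bag with the bag's label. Every instance from a negative bag is then correctly labeled, while an instance drawn from a positive bag may itself be non-positive, so the resulting single-instance sample carries one-sided classification noise.

```python
import random

def one_instance_per_bag(bags, bag_labels, rng=random):
    """Turn multi-instance examples into single-instance ones by
    drawing one instance per bag and attaching the bag's label.
    Negative bags yield correctly labeled negatives; positive bags
    may yield truly-negative instances labeled positive, i.e.
    one-sided random classification noise."""
    return [(rng.choice(bag), y) for bag, y in zip(bags, bag_labels)]
```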
PAC learning of APR (cont'd)
Summary

Work               Constraints                                   Theoretical tools
P. M. Long et al.  product distribution, independent instances   p-concept, VC dimension
P. Auer et al.     independent instances                         VC dimension
A. Blum et al.     independent instances                         statistical query model, VC dimension

Each successive result improves on the sample complexity and time complexity of its predecessor.
Fig.6. A comparison of the three theoretical algorithms
Theoretical Analysis - Real-valued multi-instance learning
Real-valued multi-instance learning
It is worthwhile to note that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.
S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-complete; furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.
At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN, and Diverse Density algorithms to real-valued multi-instance learning. They also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of the modified algorithms on them.
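As a flavor of how the lazy-learning algorithms carry over, here is a hypothetical sketch (not Amar et al.'s exact procedure): compare bags with the minimal Hausdorff distance used by Citation-kNN, and predict the mean real-valued label of the nearest training bags.

```python
import numpy as np

def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: the distance between the closest
    pair of instances drawn from the two bags."""
    return min(np.linalg.norm(a - b) for a in bag_a for b in bag_b)

def knn_regress(train_bags, train_y, query_bag, k=3):
    """Predict a real-valued bag label as the mean label of the
    k nearest training bags (a kNN-style sketch, not Amar et al.'s
    exact real-valued Citation-kNN)."""
    dists = [min_hausdorff(query_bag, bag) for bag in train_bags]
    nearest = np.argsort(dists)[:k]
    return float(np.mean([train_y[i] for i in nearest]))
```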
Future work
Further theoretical analysis of multi-instance learning.
Designing multi-instance versions of neural networks, decision trees, and other popular machine learning algorithms.
Exploring more problems that can be formulated as multi-instance learning problems.
Designing appropriate bag generation methods.
…
Thanks