ECML 2001
A Framework for Learning Rules from Multi-Instance Data
Yann Chevaleyre and Jean-Daniel Zucker
University of Paris VI – LIP6 - CNRS

ECML 2001 Motivations
The choice of a good representation is a central issue in ML tasks.
- Attribute/value representation (atomic description): + tractable, - low expressivity
- Relational representation (global description): + high expressivity, - intractable unless strong biases are used
- The MI representation lies between these two extremes.
Most available MI learners use numerical data and generate hypotheses that are not easily interpretable.
Our goal: design efficient MI learners that handle numeric and symbolic data and generate interpretable hypotheses, such as decision trees or rule sets.

ECML 2001 Outline
1) Multiple-instance learning: the multiple-instance representation, where MI data can be found, the MI learning problem
2) Extending a propositional algorithm to handle MI data: the method, extending the Ripper rule learner
3) Analysis of the multiple-instance extension of Ripper: misleading literals, irrelevant literals, the literal selection problem
4) Experiments and applications
Conclusion and future work

ECML 2001 The multiple-instance representation: definition
Standard A/V representation: example i is represented by a single A/V vector x_i together with a {0,1}-valued label l_i.
Multiple-instance representation: example i is a bag of instances, i.e. a set of A/V vectors x_i,1, x_i,2, ..., x_i,r, together with a single {0,1}-valued label l_i.
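As a concrete illustration (not from the paper), here is a minimal sketch of how the two representations might look as Python data structures; the type names are hypothetical:

```python
from typing import List, Tuple

# Standard attribute/value example: one feature vector x_i with a {0,1} label l_i.
AVExample = Tuple[List[float], int]

# Multiple-instance example: a bag of feature vectors x_i,1 ... x_i,r sharing one label l_i.
MIExample = Tuple[List[List[float]], int]

# A positive bag containing three 2-dimensional instances.
bag: MIExample = ([[1.0, 2.0], [0.5, 3.1], [4.2, 0.9]], 1)
```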

ECML 2001 Where can we find MI data?
- Many complex objects, such as images or molecules, can easily be represented as bags of instances.
- Relational databases may also be represented this way (a one-to-many relation maps each object to a bag of related tuples).
- More complex representations, such as Datalog facts, may be MI-propositionalized [Zucker 98], [Alphonse and Rouveirol 99].

ECML 2001 Representing time series as MI data
[Figure: a signal s(t); each window starting at t_k yields the sub-sequence (s(t_k), s(t_k+Δ), ..., s(t_k+n·Δ)).]
- By encoding each sub-sequence (s(t_k), ..., s(t_k+n·Δ)) as an instance, the representation becomes invariant under translation.
- Windows of various sizes can be chosen to make the representation invariant under rescaling.
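A minimal sketch (assumed, not taken from the slides) of this windowing idea in Python; the window length and step are free parameters:

```python
def series_to_bag(s, window, step=1):
    """Encode a time series s as a bag of overlapping sub-sequences.

    Each window (s[k], ..., s[k+window-1]) becomes one instance, so a
    pattern occurring anywhere in the series can be matched by at least
    one instance (translation invariance). Using several window lengths
    would give a coarse form of rescaling invariance.
    """
    return [s[k:k + window] for k in range(0, len(s) - window + 1, step)]

bag = series_to_bag([0.1, 0.4, 0.9, 0.7, 0.2, 0.0], window=3)
# -> [[0.1, 0.4, 0.9], [0.4, 0.9, 0.7], [0.9, 0.7, 0.2], [0.7, 0.2, 0.0]]
```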

ECML 2001 The multiple-instance learning problem
From B+ and B-, sets of positive (resp. negative) bags, find a consistent hypothesis H.
- Unbiased multiple-instance learning problem: there exists a function f such that lab(b) = 1 iff ∃x ∈ b with f(x) = 1.
- Single-tuple bias (multi-instance learning [Dietterich 97]): find a function h covering at least one instance per positive bag and no instance from any negative bag.
Note: the domain of h is the instance space, not the bag space.
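The single-tuple bias can be stated directly as a consistency check. Below is a small illustrative sketch in Python, where h is any instance-level predicate (an assumed interface, not code from the paper):

```python
def single_tuple_consistent(h, positive_bags, negative_bags):
    """Check the single-tuple bias: the instance-level predicate h must
    cover at least one instance of every positive bag and no instance of
    any negative bag."""
    covers = lambda bag: any(h(x) for x in bag)
    return (all(covers(b) for b in positive_bags)
            and not any(covers(b) for b in negative_bags))
```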

ECML 2001 Extending a propositional learner
- We need to represent the bags of instances as a single set of vectors: add a bag-id and the bag's label to each instance (e.g. bags b1+ and b2- become rows tagged with their bag-id and label).
- Single-tuple coverage measure: measure the degree of multiple-instance consistency of the hypothesis being refined. Instead of measuring p(r) and n(r), the number of vectors covered by a rule r, compute p*(r) and n*(r), the number of bags for which r covers at least one instance.
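A sketch of the bag-level counts p*(r) and n*(r) computed on the flattened table; the (bag_id, label, x) triple layout is an assumption made for illustration:

```python
def bag_coverage(rule, instances):
    """Bag-level coverage counts used in place of instance-level counts.

    `instances` is the flattened table: (bag_id, label, x) triples, i.e. the
    bags written as a single set of vectors with bag-id and label attached.
    Returns p*(r), n*(r): the number of positive / negative bags in which
    the rule covers at least one instance.
    """
    pos_covered, neg_covered = set(), set()
    for bag_id, label, x in instances:
        if rule(x):
            (pos_covered if label == 1 else neg_covered).add(bag_id)
    return len(pos_covered), len(neg_covered)
```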

ECML 2001 Extending the Ripper algorithm (Cohen 95)
- Ripper (Cohen 95) is a fast and efficient top-down rule learner; it is comparable to C4.5 in accuracy while being much faster.
- Naive-RipperMi is the MI extension of Ripper.
- Naive-RipperMi was tested on the musk tasks (Dietterich 97). On musk1 (5.2 instances per bag on average) it achieved good accuracy; on musk2 (65 instances per bag on average), only 77% accuracy.

ECML 2001 Empirical analysis of Naive-RipperMi
- Goal: analyse pathologies linked to the MI problem and to the Naive-RipperMi algorithm: misleading literals, irrelevant literals, the literal selection problem.
- We analyse the behaviour of Naive-RipperMi on a simple dataset: 5 positive bags and 5 negative bags of points in the (X, Y) plane, each bag drawn with its own marker (white/black triangles, squares, ...).

ECML 2001 Analysing Naive-RipperMi
- Learning task: induce rules covering at least one instance of each positive bag.
- Target concept: X > 5 & X < 9 & Y > 3.

ECML 2001 Analysing Naive-RipperMi: misleading literals
- Target concept: X > 5 & X < 9 & Y > 3.
- 1st step: Naive-RipperMi induces the rule X > 11 & Y < 5, built from misleading literals.

ECML 2001 Analysing Naive-RipperMi: misleading literals
- 2nd step: Naive-RipperMi removes the covered bag(s) and induces another rule...

ECML 2001 Analysing Naive-RipperMi: misleading literals
- Misleading literals: literals bringing information gain while contradicting the target concept.
- This is a multiple-instance specific phenomenon: unlike other, single-instance pathologies (overfitting, the attribute selection problem), increasing the number of examples will not help.
- The « cover-and-differentiate » algorithm reduces the chance of finding the target concept.
- If l is a misleading literal, then ¬l is not. It is thus sufficient, when the literal l has been induced, to examine ¬l at the same time => partition the instance space.

ECML 2001 Analysing Naive-RipperMi: misleading literals
- Build a partition of the instance space.
- Extract the best possible rule: X < 9 & X > 5 & Y > 3.
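A rough sketch of the partitioning idea: each selected test splits the space into the cell satisfying l and the cell satisfying ¬l, so a misleading literal never hides its complement. This is only an illustration of the principle, not the algorithm from the paper:

```python
def partition(instances, tests):
    """Split the instance space by each test and its negation.

    Returns a dict mapping each cell's signature (the tuple of test
    outcomes) to the instances that fall into that cell.
    """
    cells = {(): list(instances)}
    for t in tests:
        new_cells = {}
        for signature, xs in cells.items():
            new_cells[signature + (True,)] = [x for x in xs if t(x)]
            new_cells[signature + (False,)] = [x for x in xs if not t(x)]
        cells = new_cells
    return cells

# e.g. partition(points, [lambda p: p[0] > 11, lambda p: p[1] < 5])
```

A rule can then be extracted from the cell that best satisfies the bag-level coverage criterion.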

ECML 2001 Analysing Naive-RipperMi: irrelevant literals
- In multiple-instance learning, irrelevant literals can occur anywhere in a rule, instead of mainly at the end of a rule as in the single-instance case.
- Remedy: use global pruning.
- Example rule on the toy dataset: Y > 3 & X > 5 & X < 9.

ECML 2001 Analysing Naive-RipperMi: literal selection problem
- When the number of instances per bag increases, any literal covers any bag. Thus, we lack information to select good literals.
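A back-of-the-envelope illustration of why this happens: under the single-distribution model discussed on the next slide, a literal that covers a fraction p of all instances covers a whole bag of r i.i.d. instances with probability 1 - (1 - p)^r, which approaches 1 as r grows. The numbers below are purely illustrative:

```python
def chance_coverage(p_instance, r):
    """Under the single-distribution model (r instances drawn i.i.d.),
    probability that a literal covering a fraction p_instance of all
    instances covers at least one instance of a bag purely by chance."""
    return 1.0 - (1.0 - p_instance) ** r

# With 65 instances per bag (as in musk2), even a literal covering only
# 5% of all instances covers almost every bag by chance:
print(chance_coverage(0.05, 5))   # ~0.23
print(chance_coverage(0.05, 65))  # ~0.96
```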

ECML 2001 Analysing Naive-RipperMi: literal selection problem
- We must take into account the number of covered instances.
- Making an assumption on the distribution of instances can lead to a formal coverage measure.
- The single-distribution model: a bag is made of r instances drawn i.i.d. from a unique distribution D.
  + widely studied in MI learning [Blum 98, Auer 97, ...]; + simple coverage measure and good learnability properties; - very unrealistic.
- The two-distribution model: a positive (resp. negative) bag is made of r instances drawn i.i.d. from D+ (resp. D-), with at least one (resp. none) covered by f.
  + more realistic; - complex formal measure, useful for a small number of instances (log #bags).
- Goal: design algorithms or measures which « work well » with these models.

ECML 2001 Analysing Naive-RipperMi: literal selection problem
- Compute, for each positive bag, Pr(at least one of the k covered instances belongs to the target concept).
- [Figure: toy dataset in the (X, Y) plane with target concept Y > 5.]
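The slides do not spell out the refined measure, but a toy version of the quantity above can be computed under a strong simplifying assumption: if a positive bag of r instances contains exactly one target-concept instance and the literal covers k of the r instances independently of which one that is, then by symmetry Pr(the target instance is among the k covered) = k/r. The sketch below is this simplified illustration, not the paper's formula:

```python
def bag_weight(k, r):
    """Toy refined weight for a covered positive bag: probability that the
    bag's (assumed unique) target-concept instance is among the k covered
    instances, when coverage is independent of which instance is the target
    one. Counting bags with this weight instead of 1 distinguishes literals
    that cover many instances of a positive bag from those covering few."""
    return k / r

# A literal covering 1 of 65 instances is much weaker evidence than one
# covering 1 of 5 instances:
print(bag_weight(1, 5))    # 0.2
print(bag_weight(1, 65))   # ~0.015
```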

ECML 2001 Analysis of RipperMi: experiments
- Artificial datasets of 100 bags with a variable number of instances per bag; target concept: monomials (hard to learn with 2 instances per bag [Haussler 89]).
- [Plot: error rate (%) as a function of the number of instances per bag.]
- On the mutagenesis problem: NaiveRipperMi 78%, RipperMi-refined-cov 82%.

ECML 2001 Application: anchoring symbols [with Bredeche]
- [Figure: a robot perceives a scene ("What is all this? I see a door"); the image is segmented and labelled (lab = door), and rules such as IF Color = blue AND size > 53 THEN DOOR are learned.]
- Early experiments with NaiveRipperMi reached 80% accuracy.

ECML 2001 Conclusion & future work
- Many problems which existed in relational learning appear clearly within the multiple-instance framework.
- The algorithms presented here are aimed at solving these problems; they were tested on artificial datasets.
- Future work: other, more realistic models leading to better heuristics; instance selection and attribute selection; MI-propositionalization; applying multiple-instance learning to data-mining tasks.
- Many applications are ongoing...