1/23 Learning from positive examples Main ideas and the particular case of CProgol4.2 Daniel Fredouille, CIG talk,11/2005

2/23 What is it all about?
Symbolic machine learning: learning from positive examples only, instead of positive and negative examples.
The talk contains two parts:
1. General ideas and tactics for learning from positives.
2. How the particular ILP system CProgol4.2 of S. Muggleton (1997) deals with positive-only learning.

3/23 Disclaimer
This talk is not extracted from a survey or any article in particular: it is more of a patchwork of my experiences in the domain and how I interpret them.
Feel free to criticise: I would like feedback on these ideas, since I have never shared them before. I would especially appreciate comments on the slides marked with the ? sign.

4/23 Definitions
[Slide figure: the concept space and the instance space, showing the target concept C, an inferred concept, and positive/negative examples of C.]
Ordering: "is more general than" / "is less specific than". The concept space is usually partially ordered by this relation.

5/23 Positive and Negative Learning
Possibility 1: discrimination of classes.
- Characterise the difference between the positive and negative examples.
- No model of the positive concept! ?

6/23 Positive and Negative Learning
Possibility 2: characterisation of a class.
- Use negative examples to prevent over-generalisation.
- Needs negative examples close to the concept border. ?

7/23 Positive Only Learning
Aim: characterisation of a class.
Which of the candidate concepts covering the positives to choose? ?

8/23 Positive Only Learning
Two strategies:
1. Bias in the search space: choose a space with a (very) strong structure.
2. Bias in the evaluation function: choose a concept with a compromise between:
   - generality/specificity of the concept,
   - coverage of the positives by the concept,
   - complexity of the hypothesis representing the concept. ?

9/23 Search space bias approach
Main idea: consider strongly organised concept spaces.
Possible inference algorithm:
- Select the least general concept covering all the examples.
- The constraints on the search space ensure that there is only one such concept.
Trivial example (generally not useful): a tree-organised concept space.
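A toy illustration of the idea (my own hypothetical example, not from any system): if concepts are integer intervals, the space is so strongly structured that the least general concept covering a set of positives is unique and computable in one pass.

```python
# Hypothetical strongly structured concept space: concepts are closed
# integer intervals [lo, hi]; an interval covers an example x iff
# lo <= x <= hi.  The least general interval covering all the
# positives is unique: [min(E), max(E)].

def least_general_interval(examples):
    """Return the unique least general interval covering all positives."""
    if not examples:
        raise ValueError("need at least one positive example")
    return (min(examples), max(examples))

def covers(interval, x):
    lo, hi = interval
    return lo <= x <= hi

h = least_general_interval([3, 7, 5])
assert h == (3, 7)
assert covers(h, 4) and not covers(h, 8)
```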

10/23 Search space bias approach
Advantages:
- Strong theoretical convergence results are possible.
- Can lead to (very) fast inference algorithms.
Drawback:
- Not available for all concept spaces!
- Theorem: super-finite classes of concepts are not inferable in the limit this way (Gold 67). Super-finite = the class contains all concepts covering a finite number of examples, plus at least one concept covering an infinity of them.

11/23 Heuristic Approach
Score making a compromise between:
1. specificity of the concept,
2. coverage of the positives by the concept,
3. complexity of the concept.
Implementations:
- Ad-hoc measures of points 1, 2, 3 combined in a formula, e.g.: Score = Coverage + Specificity - Complexity.
- Minimum Message Length ideas (~MDL). ?

12/23 Heuristic Approach: ad-hoc implementation
Elements of the score:
- Coverage: count of covered instances.
- Specificity: measure of the proportion of the instance space covered.
- Complexity: size of the concept representation (e.g., number of rules).
Advantages:
- Usually easy to implement.
- Usually provides parameters to tune the compromise.
Disadvantages:
- No theory.
- The bias is not always clear.
- How to combine coverage/specificity/complexity? ?
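A minimal sketch of such an ad-hoc score (the function name, weights and the "x is even" concept are my own illustration, not taken from any system): coverage counts covered positives, specificity rewards covering a small fraction of a sampled instance space, and complexity is a rule count.

```python
# Sketch of an ad-hoc positive-only score (hypothetical weights).
# coverage: positives covered; specificity: 1 - fraction of a sampled
# instance space covered; complexity: size of the representation.

def adhoc_score(hyp_covers, positives, instance_sample, n_rules,
                w_cov=1.0, w_spec=1.0, w_comp=0.1):
    coverage = sum(1 for e in positives if hyp_covers(e))
    covered_frac = sum(1 for i in instance_sample if hyp_covers(i)) / len(instance_sample)
    specificity = 1.0 - covered_frac
    return w_cov * coverage + w_spec * specificity - w_comp * n_rules

# Example: the hypothesis "x is even" on a small sampled space.
is_even = lambda x: x % 2 == 0
score = adhoc_score(is_even, positives=[2, 4, 6],
                    instance_sample=list(range(10)), n_rules=1)
```

The weights make explicit the slide's point that the compromise is tunable but arbitrary: there is no principled way to choose w_cov, w_spec and w_comp.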

13/23 Heuristic Approach: MML implementation
MML for discrimination: the channel transmits the hypothesis Hyp, then the classes of the examples encoded given Hyp (the examples themselves are known to the receiver).
MML for characterisation: the channel transmits Hyp, then the examples and their classes encoded given Hyp.
Gain = number of bits needed to send the message without compression - number of bits needed to send the message with compression. ?
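The gain on this slide can be sketched numerically (the bit counts below are made up purely for illustration):

```python
# MML gain: bits to send the data raw versus bits to send a hypothesis
# plus the data encoded given that hypothesis (illustrative numbers).

def mml_gain(bits_raw, bits_hypothesis, bits_data_given_h):
    """Gain = bits without compression - bits with compression."""
    return bits_raw - (bits_hypothesis + bits_data_given_h)

# e.g. 100 examples at 8 bits each when sent raw, versus a 50-bit
# hypothesis that lets each example be sent in 3 bits:
gain = mml_gain(bits_raw=100 * 8, bits_hypothesis=50, bits_data_given_h=100 * 3)
# A positive gain means the hypothesis compresses the data, so it is preferred.
```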

14/23 Heuristic Approach: MML implementation
Advantages:
- Some theoretical justifications in the works of Kolmogorov / Solomonoff / Ockham / Bayes / Chaitin.
- Absolute and meaningful score.
Disadvantages:
- Limit of the theory: the optimal code can NOT be computed!
- Difficult implementation: the choices of encoding create the inference biases, which is not very intuitive.

15/23 Positive only learning in ILP with CProgol4.2

16/23 Positive only learning in ILP
The following is not a survey! It is what I have already encountered; I have not looked for further references.
MML implementations:
- Muggleton [88]
- Srinivasan, Muggleton, Bain [93]
- Stahl [96]
Other implementations:
- Muggleton, CProgol4.2 [97]: a heuristic ad-hoc method, somehow based on MML, but the implementation details make it quite different.

17/23 CProgol4.2 uses Bayes
Distributions: D_H over the hypothesis space H, D_I over the instance space I, and D_{I|h} over the instances covered by a hypothesis h.
Score: P(h|E) = P(h) * P(E|h) / P(E)
Fixing the distributions allows computing P(h), P(E|h) and P(E).

18/23 Assumptions for the distributions
P(h) = e^(-size(h))
- Large theories are less probable than small ones.
- size(h) = sum, over the rules c_i of h, of the number of literals in the body of c_i.
P(E|h) = Π_{e in E} D_{I|h}(e) = Π_{e in E} D_I(e) / D_I(h)
- Assumption that D_I and D_H give D_{I|h}.
- Independence assumption between the examples.

19/23 Replacing in Bayes
P(h|E) = e^(-size(h)) * [ Π_{e in E} D_I(e) / D_I(h) ] / P(E)
As we only want to compare hypotheses:
P(h|E) = [ e^(-size(h)) / D_I(h)^|E| ] * Const1
Taking the log:
ln(P(h|E)) = -size(h) + |E| * ln(1/D_I(h)) + Const2
We still have to compute D_I(h)...
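Under the two assumptions above, the log-posterior (up to its additive constant) can be computed directly; a sketch, with D_I(h) passed in as a plain number rather than estimated:

```python
from math import log

# ln P(h|E) up to an additive constant, following the slide:
#   ln P(h|E) = -size(h) + |E| * ln(1 / D_I(h)) + Const
def log_posterior(size_h, n_examples, d_i_h):
    return -size_h + n_examples * log(1.0 / d_i_h)

# A more general hypothesis covers more of the instance space (larger
# D_I(h)), so it loses on the |E| * ln(1/D_I(h)) term; a bigger theory
# loses on the -size(h) term.  Here the extra specificity outweighs
# the extra size:
general = log_posterior(size_h=2, n_examples=20, d_i_h=0.5)
specific = log_posterior(size_h=4, n_examples=20, d_i_h=0.1)
assert specific > general
```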

20/23 D_I(h): weight of h in the instance set
Computing D_I:
- Using a stochastic logic program S trained with the BK to model D_I (not included in the talk).
Computing D_I(h):
- Generate R instances from D_I.
- h covers r of them.
- D_I(h) = (r+1) / (R+2)
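The sampling step above is a Laplace-corrected Monte Carlo coverage estimate; a sketch (the uniform sampler stands in for the stochastic logic program, which is not covered in the talk):

```python
import random

# Estimate D_I(h) by sampling R instances from D_I and applying the
# Laplace correction (r+1)/(R+2), so the estimate is never 0 or 1
# even when h covers none or all of the sample.
def estimate_d_i(hyp_covers, sample_instance, R=1000, rng=random):
    r = sum(1 for _ in range(R) if hyp_covers(sample_instance(rng)))
    return (r + 1) / (R + 2)

# Example: instances uniform on 0..9, hypothesis "x < 3",
# so the estimate should be close to 0.3.
est = estimate_d_i(lambda x: x < 3, lambda rng: rng.randrange(10))
```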

21/23 Formula for a whole theory covering E
ln(P(h|E)) = -size(h) - |E| * ln((r+1)/(R+2)) + C2
(terms: complexity = size(h); specificity = ln((r+1)/(R+2)); coverage = |E|)
Estimation of the final theory score from a partially inferred theory covering p positives:
ln(P(h|E)) = |E|/p * size(h) - |E| * ln( |E|/p * (r+1)/(R+2) ) + C3

22/23 Final evaluation
Suppressing |E| and C2:
f(h) = size(h)/p + ln(p) - ln( |E| * (r+1)/(R+2) )
(terms: complexity = size(h)/p; specificity = (r+1)/(R+2); coverage = p)
Possible boost of the positives with a factor k:
f(h) = size(h)/(k*p) + ln(k*p) - ln( |E| * (r+1)/(R+2) )
This formula is not written anywhere (the one above is my best guess!). The papers are hard to understand, but it seems to work...
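My reading of the slide's f(h) as code (again a best guess mirroring the formula above, including the optional boost factor k; it is explicitly not a documented formula):

```python
from math import log

# CProgol4.2-style positive-only evaluation as written on the slide
# (the speaker's reconstruction, not a documented formula):
#   f(h) = size(h)/(k*p) + ln(k*p) - ln( |E| * (r+1)/(R+2) )
def f(size_h, p, n_examples, r, R, k=1):
    """p: positives covered by h; r of R sampled instances covered;
    k: optional boost factor on the positives (k=1 gives the base formula)."""
    kp = k * p
    return size_h / kp + log(kp) - log(n_examples * (r + 1) / (R + 2))

# Illustrative values: a 4-literal theory covering 10 of 100 positives
# and 5 of 100 sampled instances.
val = f(size_h=4, p=10, n_examples=100, r=5, R=100)
```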

23/23 Conclusion
Learning from positives only is a real challenge, and methods designed for positive and negative examples can hardly be adapted to it.
Some nice theoretical frameworks exist.
When it comes to implementing heuristic frameworks:
- The theory is often lost in approximations and implementation choices.
- Useful systems can be created, but tuning and understanding the biases have to be treated as very important stages of inference.