Descriptive ILP for Mathematical Discovery Simon Colton Computational Bioinformatics Lab Department of Computing Imperial College, London.

Overview
- Inductive logic programming
- Predictive versus descriptive induction
- The HR program
- Applications to mathematics
- An all-new application to vision data
- Future work

Logic Programs
- Representation language: a subset of first-order logic
- A conjunction of literals (the body) implies a single literal (the head)
- E.g., p(X) ∧ q(Y,X) → r(X)
- Huge amount of theoretical work
- Many knowledge bases
- Even a programming language (Prolog)
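As a minimal sketch (not HR's actual representation), the example rule p(X) ∧ q(Y,X) → r(X) can be evaluated extensionally over a small set of ground facts; the facts themselves are hypothetical:

```python
# Hypothetical ground facts for the body literals.
p = {1, 2}                    # p(X) facts
q = {("a", 1), ("b", 3)}      # q(Y,X) facts

# The head r(X) is derivable for every X with p(X) and some Y with q(Y,X).
r = {x for x in p if any(pair[1] == x for pair in q)}
print(r)  # only r(1) is derivable here
```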

Inductive Logic Programming
- Machine learning: improve with respect to a task
  - Usually classification/prediction via concept/rule learning
- In light of experience
  - Usually data/background knowledge
- Given: background logic programs B, positive examples E+ and negative examples E-
- Learn: a hypothesis logic program H such that (B ∧ H) ⊨ E+ and (B ∧ H) ⊭ E-
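The success condition can be illustrated with a toy sketch that treats "B ∧ H entails e" as simple membership in the set of examples the hypothesis covers (the example sets here are illustrative, not from any real ILP run):

```python
def is_solution(covered, pos, neg):
    """H is acceptable iff it covers every positive and no negative."""
    return pos <= covered and not (covered & neg)

pos = {2, 3, 5, 7}            # hypothetical positive examples
neg = {4, 6, 8, 9}            # hypothetical negative examples
# Candidate hypothesis H: "n is prime", evaluated extensionally.
covered = {n for n in range(2, 10)
           if all(n % d for d in range(2, n))}
print(is_solution(covered, pos, neg))  # True
```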

Predictive Induction
- AI went down the problem-solving path
  - Everything shoe-horned into this paradigm
- Set up a problem as follows:
  - Given a set of positives and a set of negatives
  - Learn a reason why the positives are positive and the negatives are negative
  - Reasons are rulesets/concepts/mathematical formulae
- Positives only: manage the same task without negatives
- Reasons allow prediction of the class of unseen examples
- Predictive ILP programs: Progol, FOIL, …

Descriptive Induction
- Much broader remit: given the same background and data, find something interesting about the data
- The interesting thing may be:
  - A particular example
  - A concept which categorises examples
  - A hypothesis which relates concepts
  - An explanation of a hypothesis
- Descriptive ILP systems: Claudien, Warmr

Descriptive versus Predictive
- Predictive: "You know what you're looking for, but you don't know what it looks like"
- Descriptive: "You don't know what you're looking for"
- Future: "You don't even know you're looking for something"
  - E.g., adding an entry to a database

Example – the Animals
- Classic toy problem for ILP
- Given details of a set of animals
  - E.g., covering, milk, homeothermic, legs
- Learn reasons for categorisation into mammals, reptiles, birds, fish
- Descriptive induction finds the same rules, but as conjectures, not as solutions to a problem
- It also finds other things of interest:
  - E.g., "Did you see that the duck-billed platypus is the only mammal which lays eggs?"
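The platypus observation can be sketched as a descriptive scan for rules with exactly one exception; the tiny dataset and attribute names below are illustrative, not HR's actual input files:

```python
# Hypothetical animal data: class label plus one attribute of interest.
animals = {
    "dog":      {"class": "mammal", "lays_eggs": False},
    "cat":      {"class": "mammal", "lays_eggs": False},
    "platypus": {"class": "mammal", "lays_eggs": True},
    "eagle":    {"class": "bird",   "lays_eggs": True},
}

# Descriptive scan: does "mammal -> does not lay eggs" hold with one exception?
mammals = [a for a, f in animals.items() if f["class"] == "mammal"]
exceptions = [a for a in mammals if animals[a]["lays_eggs"]]
if len(exceptions) == 1:
    print(f"{exceptions[0]} is the only mammal which lays eggs")
```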

The HR Program
- Performs descriptive induction
- Developed since 1997
  - Edinburgh, York, Imperial
  - PhD (Prolog), postdoc (Java), beyond (projects)
- Mainly applied to mathematics, but developed as a general-purpose ML program
- Also applied to AI tasks: predictive learning, theorem proving, CSPs

Automated Theory Formation
- A theory consists of at least: examples, concepts, conjectures, proofs
- Given background knowledge: examples, concepts, axioms
  - E.g., groups, multiplication, identity axiom
- Theory formation proceeds by a cycle of:
  - Concept formation
  - Conjecture making
  - Proof/disproof attempts
  - Assessment of concepts

Concept Formation
- 15 production rules take old concept(s) and produce new ones
- Most PRs are generic:
  - Exists, forall, compose, negate, split, size, match, equal, disjunct, linear constraint
- A few are more domain-specific:
  - Record, arithmetic, embed graph, …

Example Construction: odd prime numbers
- Start: [a,b] : b|a (divisors)
- Size: [a,n] : n = |{b : b|a}| (number of divisors)
- Split: [a] : 2 = |{b : b|a}| (primes)
- Split: [a] : 2|a (evens)
- Negate: [a] : not 2|a (odds)
- Compose: [a] : 2 = |{b : b|a}| & not 2|a (odd prime numbers)
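The construction above can be sketched with concepts as Python predicates over a finite domain; this is only an extensional caricature of HR's production rules, which operate on concept definitions rather than code:

```python
N = range(1, 100)  # finite domain of examples

def divisors(a):                       # [a,b] : b|a
    return {b for b in N if a % b == 0}

def num_divs(a):                       # size PR: [a,n] : n = |{b : b|a}|
    return len(divisors(a))

def is_prime(a):                       # split PR (n = 2)
    return num_divs(a) == 2

def is_even(a):                        # split PR (b = 2)
    return a % 2 == 0

def is_odd(a):                         # negate PR
    return not is_even(a)

def odd_prime(a):                      # compose PR
    return is_prime(a) and is_odd(a)

print([a for a in N if odd_prime(a)][:5])  # [3, 5, 7, 11, 13]
```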

Conjecture Making
- Conjectures are made empirically at each attempt to make a new concept:
  - New concept has the same success set as an old one → an equivalence conjecture is made
  - New concept has an empty success set → a non-existence conjecture is made
  - Success set is a proper subset/superset → an implication conjecture is made
- More succinct results are extracted from these: implications, implicates and prime implicates
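The empirical comparison of success sets can be sketched directly with Python set operations (a simplification of HR's actual bookkeeping):

```python
def conjecture(new, old):
    """Classify the relationship between two success sets."""
    if not new:
        return "non-existence"
    if new == old:
        return "equivalence"
    if new < old or old < new:   # proper subset either way
        return "implication"
    return None

examples = set(range(1, 50))
primes = {n for n in examples
          if n > 1 and all(n % d for d in range(2, n))}
odds = {n for n in examples if n % 2}
print(conjecture(primes - {2}, odds))  # odd primes ⊂ odds: "implication"
```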

Explanation Handling
- User supplies some axioms of the domain
  - E.g., the three axioms of group theory
- HR appeals to third-party ATP software:
  - Uses Otter to try to prove that each conjecture follows from the axioms
  - If unsuccessful, uses Mace to try to find a counterexample
  - Any new example found is added to the theory
- Other reasoning software also used (MathWeb)

Interestingness
- HR has many types of search: BFS, DFS, random, tiered, reactive, heuristic
  - Usually with a depth (complexity) limit
- Heuristic search: measure the "interestingness" of each concept and build new concepts from the best old ones
- Intrinsic measures: comprehensibility, applicability, variety, …
- Relative measures: novelty, child/parent, interestingness of conjectures
- Utilitarian measures: coverage, highlight, invariance, discrimination, …

Handling Uncertain Data
- Given a genuinely new concept, HR tries to make conjectures opportunistically
- Near-implications
  - One concept's examples are nearly a subset of another's
  - User specifies the minimum percentage match, e.g., 20%
  - E.g., prime numbers are odd (only one exception)
- Near-equivalences
  - Two concepts have nearly the same success sets
  - Often choose to look at the positives only
- Near-nonexistence
  - New concept has very few examples
  - E.g., even prime number if and only if equal to 2
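A near-implication check with a user-supplied minimum percentage match can be sketched as follows, using the "prime numbers are odd" example (again a simplification of HR's mechanism):

```python
def near_implication(left, right, min_pct):
    """Does left -> right hold for at least min_pct% of left's examples?"""
    pct = 100 * len(left & right) / len(left)
    return pct >= min_pct, pct

primes = {n for n in range(2, 100) if all(n % d for d in range(2, n))}
odds = {n for n in range(100) if n % 2}
ok, pct = near_implication(primes, odds, 90)
print(ok, round(pct, 1))  # holds for every prime below 100 except 2
```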

Mathematical Discovery
- HR has been applied to number theory, graph theory, and algebraic domains, e.g., group theory, ring theory
- Early approaches were entirely descriptive: interesting concepts and conjectures were looked for in the theories HR formed
- Later approaches were largely driven by particular applications, for mathematical reasons and for AI reasons

Number Theory
- First big success for HR
- Encyclopedia of Integer Sequences
  - Coming up to 100,000 sequences, such as primes, squares, odds, evens, Fibonacci
- HR invented 20 new sequences which are now in the encyclopedia
  - All supplied with reasons why they are interesting
- Some nice examples:
  - Odd refactorable numbers are perfect squares
  - Sum of divisors is prime → number of divisors is prime
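The refactorable-number result is easy to check empirically: n is refactorable when the number of divisors of n divides n, and the conjecture says the odd refactorables are all perfect squares. A brute-force sketch over a small range:

```python
import math

def tau(n):
    """Number of divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

refactorable = [n for n in range(1, 2000) if n % tau(n) == 0]
odd_ref = [n for n in refactorable if n % 2]
# Every odd refactorable in range is a perfect square (1, 9, 225, ...).
print(all(math.isqrt(n) ** 2 == n for n in odd_ref))  # True
```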

Spin-off Systems
- NumbersWithNames
  - Data-mines a subset of the encyclopedia
  - E.g., perfect numbers are pernicious
- HOMER (available online soon)
  - An HR, Otter and Maple combination: Maple supplies the background file, HR forms conjectures, Otter acts as a filter (provable theorems in the bin)
  - Interacts with the user
- Demonstration

Graph Theory
- Siemion Fajtlowicz & Ermelinda Delavina: the Graffiti program and the writing on the wall
  - Scores of papers written about the conjectures, including by Paul Erdős
- MSc project of Noor Mohamadali: use HR to re-discover Graffiti's conjectures
  - Very successful: found all the ones we expected it to, and some others which Graffiti hasn't found
  - Currently being proved/disproved by the AGX program (Pierre Hansen and Gilles Caporossi)

Algebraic Domains
- Otter/Mace are pretty good in these domains
- HR can be completely bootstrapped: start from the axioms of the domain alone
- Building of classification trees
  - HR used in a predictive-induction way: find classifying concepts (distinguishing pairs of algebras)
  - Part of a larger system which verifies the results (IJCAR04)

Example Classification
- Groups of size eight can be classified using their self-inverse elements (x⁻¹ = x)
- "They will either have (i) all self-inverse elements, (ii) an element which squares to give a non-self-inverse element, (iii) no self-inverse elements which aren't also commutators, (iv) a self-inverse element which can be expressed as the product of two non-commutative elements, or (v) none of these properties."
- A classification tree was also produced for the isomorphism classes of loops of a given size

AI Applications
- Machine learning: used for bioinformatics datasets
- Constraint solving: reformulation of CSPs (implied constraints)
  - E.g., QG3-quasigroups are anti-Abelian
- Automated reasoning: producing theorems for the TPTP library
  - 40,000 possible theorems in 10 minutes; around 100 accepted into the library
  - "Proving" non-theorems

The TM System
- Inspired by Lakatos' philosophy of maths
- PhD project of Alison Pease
- Starts with a non-theorem, then:
  - Uses Mace to find counterexamples and supporting examples
  - Uses HR to find a concept which is true of a subset of the counterexamples or supporting examples
  - Adds appropriate axioms
  - Uses Otter to prove the reformulated theorem

Example from TM
- Non-theorem from the TPTP library: in ring theory, the following identity is true:
  ∀ w,x ((((w*w)*x)*(w*w)) = id)
- Mace found 7 supporting examples and 6 falsifying examples
- HR found a single concept true of 3 positives: ¬(∃ b (b*b=b & ¬(b+b=b)))
- "In rings for which, for all b, b*b=b implies that b+b=b, the identity is true"
- Proved by Otter; hand proof by Roy McCasland

Vision Application
- Work with Paulo Santos and Derek Magee
- Vision data: learning the rules of a dice game
  - Context of autonomous agents: learning/adapting from observations alone
  - Ask Paulo et al. about the vision side!
- First serious application of HR's handling of noisy, incomplete data
- First quantitative comparison of HR and Progol (a predictive ILP system)

Progol Setup
- State and successor data transformed to trans/4 predicates
  - E.g., trans([c1],[c1,c2],t101,t103)
- Mode declarations chosen wisely: an application of a language bias
- Rules extracted from the answer to a positives-only learning problem: find a set of rules to cover the trans/4 predicate
- Also a meta-level application of Progol; see the Santos et al. ECAI paper

HR Setup
- State and successor data transformed to state0/1, state1/2 and state2/2 predicates (HR cannot handle lists)
- Various different setups tried with HR:
  - Using the match, compose, exists and split PRs
  - Using near-implications at 50%, 70%, 100%
  - Using different search strategies: BFS, split-first, compose-last
- Rules extracted from the theory produced, i.e., no explicit problem to solve

Experiments
- 8 different sessions from the vision system, plus one with all the data
- Looking for 18 rules of the game:
  - Empty becomes 1-state
  - 1-state becomes 2-state
  - 2 equal dice faces are both removed
  - Otherwise, the smaller of the two is removed (15 instantiations of this rule)
- Recorded how many rules each system found
- Also recorded the coverage of the 15 rules found in the data

Sensitivity Results

HR Results
- First application of HR with uncertain data; wanted to assess its performance
- Altering parameters affected sensitivity in entirely expected ways:
  - Increasing the number of steps increases sensitivity
  - Increasing the complexity limit increases sensitivity (but not too much)
  - Decreasing the percentage match increases sensitivity
- HR was able to pick up conjectures which are only 65% true
- Altering the search strategy had little effect

Drawbacks to Approaches
- Predictive approach:
  - The problem may be solved without finding some game rules, due to over-generalisation with little data
  - Need to know the mode declarations in advance
  - In general, rules will require the solution of many different prediction problems
- Descriptive approach:
  - Can take a long time to find rules (10x slower)
  - In general, rules will not be so simple to find
  - Can produce a glut of information from which it's difficult to extract the pertinent rules
- Descriptive is best in theory, predictive slightly better in practice (at least for this data set, but we'll see…)

Proposed Combination
- Can we get the best of both worlds? Idea:
  - Use HR for an initial theory-formation exploration
  - Look at some rules found and extract a general format for them
  - Turn the general rules into mode declarations
  - Use Progol to solve the individual problems
- In practice, HR may take too long to work, but the process would be entirely automatic

Conclusions
- HR has shown good abilities for discovery in pure mathematics and in AI applications
  - Domains characterised by lack of noise and concrete axioms
- Not clear that the application to discovery in noisy, incomplete data sets will be as good

Future Work
- Applying for grants:
  - To continue the application to CSP reformulation (with Ian Miguel)
  - To continue the Lakatos approach, possibly with applications to software verification (with Alison Pease)
- My focus is on the integration of learning, constraint solving and automated reasoning
- Application to creative tasks in mathematics, bioinformatics and vision!