Slide 1: Descriptive ILP for Mathematical Discovery
Simon Colton
Computational Bioinformatics Lab, Department of Computing, Imperial College, London
Slide 2: Overview
- Inductive logic programming
- Predictive versus descriptive induction
- The HR program
- Applications to mathematics
- A brand-new application to vision data
- Future work
Slide 3: Logic Programs
- Representation language: a subset of first-order logic
- A conjunction of literals (the body) implies a single literal (the head), e.g., q(Y,X) ∧ r(X) → p(X)
- A huge amount of theoretical work, many knowledge bases, and even a programming language (Prolog)
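To make the clause concrete, here is a minimal sketch (the predicates q, r and the constants are hypothetical, purely for illustration) of one forward-chaining step for the clause p(X) :- q(Y, X), r(X) over a small fact base:

```python
# Facts stored as (predicate, args...) tuples; all names are illustrative only.
facts = {("q", "a", "b"), ("q", "c", "d"), ("r", "b")}

def derive_p(facts):
    """One forward-chaining step for p(X) :- q(Y, X), r(X)."""
    xs_with_q = {f[2] for f in facts if f[0] == "q"}  # X values with some q(Y, X)
    xs_with_r = {f[1] for f in facts if f[0] == "r"}  # X values with r(X)
    return {("p", x) for x in xs_with_q & xs_with_r}

print(derive_p(facts))  # {('p', 'b')}
```

Only "b" satisfies both body literals, so only p(b) is derived.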
Slide 4: Inductive Logic Programming
- Machine learning: improve with respect to a task (usually classification/prediction via concept/rule learning) in the light of experience (usually data/background knowledge)
- Given: background logic programs B, positive examples E+ and negative examples E-
- Learn: a hypothesis logic program H such that (B ∧ H) ⊨ E+ and (B ∧ H) ⊭ E-
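The two coverage conditions can be checked empirically once the consequences of B ∧ H have been computed. A minimal sketch, assuming `derived` stands for the set of facts entailed by B ∧ H (computed elsewhere) and using made-up example atoms:

```python
def acceptable(derived, e_plus, e_minus):
    """An ILP hypothesis is acceptable when B ∧ H entails every positive
    example (E+ is a subset of the derived facts) and entails no
    negative example (E- is disjoint from the derived facts)."""
    return e_plus <= derived and not (e_minus & derived)

derived = {"mammal(dog)", "mammal(cat)"}
print(acceptable(derived, {"mammal(dog)"}, {"mammal(trout)"}))  # True
print(acceptable(derived, {"mammal(trout)"}, set()))            # False
```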
Slide 5: Predictive Induction
- AI went down the problem-solving path: everything was shoe-horned into this paradigm
- A problem is set up as follows: given a set of positives and a set of negatives, learn a reason why the positives are positive and the negatives are negative
- Reasons are rule sets, concepts or mathematical formulae
- Positives-only learning: manage the same task without negatives
- Reasons allow prediction of the class of unseen examples
- Predictive ILP programs: Progol, FOIL, …
Slide 6: Descriptive Induction
- A much broader remit: given the same background and data, find something interesting about the data
- The interesting thing may be: a particular example; a concept which categorises examples; a hypothesis which relates concepts; an explanation of a hypothesis
- Descriptive ILP systems: Claudien, Warmr
Slide 7: Descriptive versus Predictive
- Predictive: "You know what you're looking for, but you don't know what it looks like"
- Descriptive: "You don't know what you're looking for"
- Future: "You don't even know you're looking for something", e.g., adding an entry to a database
Slide 8: Example: the Animals Dataset
- A classic toy problem for ILP
- Given details of a set of animals (e.g., covering, milk, homeothermic, legs), learn reasons for their categorisation into mammals, reptiles, birds and fish
- Descriptive induction finds the same rules, but as conjectures rather than as solutions to a problem
- It also finds other things of interest, e.g., "Did you see that the duck-billed platypus is the only mammal which lays eggs?"
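The platypus observation is the kind of thing a simple anomaly check can surface: look for examples that are the only members of a class satisfying some concept. A toy sketch (the animal data and the two predicates are invented for illustration, not HR's actual representation):

```python
# Hypothetical feature table for a handful of animals.
animals = {
    "dog":      {"class": "mammal",  "eggs": False},
    "platypus": {"class": "mammal",  "eggs": True},
    "eagle":    {"class": "bird",    "eggs": True},
    "snake":    {"class": "reptile", "eggs": True},
}

def lone_exceptions(animals, concept, within):
    """Return the examples satisfying `concept` if there is exactly one
    such example among the members of `within`, else an empty list."""
    members = [a for a, feats in animals.items() if within(feats)]
    hits = [a for a in members if concept(animals[a])]
    return hits if len(hits) == 1 else []

print(lone_exceptions(animals,
                      concept=lambda f: f["eggs"],
                      within=lambda f: f["class"] == "mammal"))
# ['platypus']
```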
Slide 9: The HR Program
- Performs descriptive induction
- Developed since 1997 at Edinburgh, York and Imperial: PhD (Prolog), postdoc (Java) and beyond (projects)
- Mainly applied to mathematics, but developed as a general-purpose ML program
- Also applied to AI tasks: predictive learning, theorem proving, CSPs
Slide 10: Automated Theory Formation
- A theory consists of at least: examples, concepts, conjectures and proofs
- Given background knowledge: examples, concepts and axioms, e.g., groups, multiplication, the identity axiom
- Theory formation proceeds by a cycle of: concept formation; conjecture making; proof/disproof attempts; assessment of concepts
Slide 11: Concept Formation
- 15 production rules take old concept(s) and produce new ones
- Most PRs are generic: exists, forall, compose, negate, split, size, match, equal, disjunct, linear constraint
- A few are more domain-specific: record, arithmetic, embed graph, …
Slide 12: Example Construction (odd prime numbers)
- Start: [a,b] : b|a (the divisor concept)
- size: [a,n] : n = |{b : b|a}| (number of divisors)
- split: [a] : 2 = |{b : b|a}| (prime numbers)
- split: [a] : 2|a (even numbers); negate: [a] : ¬(2|a) (odd numbers)
- compose: [a] : 2 = |{b : b|a}| ∧ ¬(2|a) (odd prime numbers)
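The construction of odd primes from the divisor concept can be replayed concretely. A minimal sketch over the numbers 1 to 50, mimicking the size, split, negate and compose steps (the set comprehensions are stand-ins for HR's production rules, not its implementation):

```python
def divisors(a):
    """Success set of the divisor concept [a, b] : b | a, for fixed a."""
    return {b for b in range(1, a + 1) if a % b == 0}

numbers = range(1, 51)

# size PR: [a, n] : n = |{b : b|a}|; split PR at n = 2 gives the primes
primes = {a for a in numbers if len(divisors(a)) == 2}

# split PR at b = 2 gives the evens; negate PR gives the odds
odds = {a for a in numbers if a % 2 != 0}

# compose PR: odd prime numbers
odd_primes = primes & odds
print(sorted(odd_primes)[:5])  # [3, 5, 7, 11, 13]
```

Note that 2 is prime but not odd, so it is correctly excluded by the final compose step.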
Slide 13: Conjecture Making
- Conjectures are made empirically at each attempt to make a new concept:
- If the new concept has the same success set as an old one, an equivalence conjecture is made
- If the new concept has an empty success set, a non-existence conjecture is made
- If one success set is a proper subset/superset of the other, an implication conjecture is made
- More succinct results are extracted from these: implications, implicates and prime implicates
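The empirical checks above amount to comparing two success sets. A sketch of that classification step (the function name and return labels are invented for illustration):

```python
def empirical_conjecture(old, new):
    """Classify the relation between an old concept's success set and a
    new concept's success set, as in HR's empirical conjecture making."""
    if not new:
        return "non-existence"
    if new == old:
        return "equivalence"
    if new < old:                      # proper subset
        return "implication (new -> old)"
    if new > old:                      # proper superset
        return "implication (old -> new)"
    return None  # incomparable success sets: no conjecture from this pair

print(empirical_conjecture({1, 2, 3}, {1, 2}))  # implication (new -> old)
```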
Slide 14: Explanation Handling
- The user supplies some axioms of the domain, e.g., the three axioms of group theory
- HR appeals to third-party ATP software: it uses Otter to try to prove that each conjecture follows from the axioms
- If unsuccessful, it uses MACE to try to find a counterexample; the new example is added to the theory
- Other reasoning software is also used (MathWeb)
Slide 15: Interestingness
- HR has many types of search: BFS, DFS, random, tiered, reactive, heuristic; usually with a depth (complexity) limit
- Heuristic search: measure the "interestingness" of each concept and build new concepts from the best old ones
- Intrinsic measures: comprehensibility, applicability, variety, …
- Relative measures: novelty, child/parent, interestingness of conjectures
- Utilitarian measures: coverage, highlight, invariance, discrimination, …
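The heuristic search can be pictured as best-first expansion over a priority queue of concepts. A minimal sketch, where `expand` stands in for HR's production rules and `score` for its interestingness measures (both hypothetical here, and concepts are reduced to plain values):

```python
import heapq

def best_first(seeds, expand, score, steps):
    """Best-first theory formation: always expand the most interesting
    concept next, adding its children back into the agenda."""
    heap = [(-score(c), i, c) for i, c in enumerate(seeds)]
    heapq.heapify(heap)
    theory = list(seeds)
    counter = len(seeds)  # tie-breaker so the heap never compares concepts
    for _ in range(steps):
        if not heap:
            break
        _, _, best = heapq.heappop(heap)
        for child in expand(best):
            theory.append(child)
            heapq.heappush(heap, (-score(child), counter, child))
            counter += 1
    return theory

# Toy run: concepts are numbers, expansion doubles and triples them,
# and smaller numbers count as more interesting.
print(best_first([1], expand=lambda n: [2 * n, 3 * n],
                 score=lambda c: -c, steps=2))  # [1, 2, 3, 4, 6]
```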
Slide 16: Handling Uncertain Data
- Given a genuinely new concept, HR tries to make conjectures opportunistically
- Near-implications: one concept has nearly a subset of the examples of another; the user specifies the minimum percentage match, e.g., 20% (e.g., prime numbers are odd, with only one exception)
- Near-equivalences: two concepts have nearly the same success sets; often we choose to look at the positives only
- Near-non-existence: the new concept has very few examples, e.g., a prime number is even if and only if it equals 2
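The percentage-match test behind near-implications is just the fraction of one success set contained in the other. A sketch using the primes-are-odd example (function name and threshold are illustrative, not HR's actual interface):

```python
def near_implies(left, right, min_match=0.9):
    """left -> right as a near-implication: the fraction of left's
    success set also in right's must reach the minimum percentage match."""
    return bool(left) and len(left & right) / len(left) >= min_match

def divisors(a):
    return {b for b in range(1, a + 1) if a % b == 0}

primes = {a for a in range(1, 101) if len(divisors(a)) == 2}
odds = {a for a in range(1, 101) if a % 2 != 0}

# 24 of the 25 primes up to 100 are odd (a 96% match), so the
# near-implication "primes are odd" is accepted at a 90% threshold.
print(near_implies(primes, odds))  # True
print(near_implies(odds, primes))  # False
```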
Slide 17: Mathematical Discovery
- HR has been applied to: number theory, graph theory and algebraic domains, e.g., group theory and ring theory
- Early approaches were entirely descriptive: interesting concepts and conjectures were looked for in the theories HR formed
- Later approaches were largely driven by particular applications, for both mathematical and AI reasons
Slide 18: Number Theory
- First big success for HR: the Encyclopedia of Integer Sequences
- Coming up to 100,000 sequences, such as primes, squares, odds, evens and Fibonacci
- HR invented 20 new sequences, which are now in the encyclopedia, all supplied with reasons why they are interesting
- Some nice examples: odd refactorable numbers are perfect squares; if the sum of divisors is prime, then the number of divisors is prime
Slide 19: Spin-off Systems
- NumbersWithNames (www.machine-creativity.com/programs/nwn): data-mines a subset of the encyclopedia, e.g., perfect numbers are pernicious
- HOMER (available online soon): a combination of HR, Otter and Maple; Maple supplies the background file, HR forms conjectures, and Otter acts as a filter (provable theorems go in the bin); interacts with the user
- Demonstration
Slide 20: Graph Theory
- Siemion Fajtlowicz and Ermelinda DeLaVina: the Graffiti program and the "writing on the wall"
- Scores of papers have been written about its conjectures, including by Paul Erdős
- MSc project of Noor Mohamadali: use HR to re-discover Graffiti's conjectures
- Very successful: it found all the ones we expected it to, and some others which Graffiti hasn't found
- These are currently being proved/disproved by the AGX program (Pierre Hansen and Gilles Caporossi)
Slide 21: Algebraic Domains
- Otter/MACE are pretty good in these domains
- HR can be completely bootstrapping: start from the axioms of the domain alone
- Building classification trees: HR used in a predictive-induction way to find classifying concepts (distinguishing pairs of algebras)
- Part of a larger system which verifies the results (IJCAR 2004)
Slide 22: Example Classification
- Groups of size eight can be classified using their self-inverse elements (x⁻¹ = id):
- "They will either have (i) all self-inverse elements, (ii) an element which squares to give a non-self-inverse element, (iii) no self-inverse elements which aren't also commutators, (iv) a self-inverse element which can be expressed as the product of two non-commutative elements, or (v) none of these properties."
- A classification tree was also produced for loops of size 6 (109 isomorphism classes)
Slide 23: AI Applications
- Machine learning: used on bioinformatics datasets
- Constraint solving: reformulation of CSPs (implied constraints), e.g., QG3 quasigroups are anti-Abelian
- Automated reasoning: producing theorems for the TPTP library; 40,000 possible theorems in 10 minutes, around 100 accepted into the library
- "Proving" non-theorems
Slide 24: The TM System
- Inspired by Lakatos' philosophy of mathematics; the PhD project of Alison Pease
- Starts with a non-theorem
- Uses MACE to find counterexamples and supporting examples
- Uses HR to find a concept which is true of a subset of the counterexamples or supporting examples
- Adds appropriate axioms
- Uses Otter to prove the reformulated theorem
Slide 25: Example from TM
- A non-theorem from the TPTP library: in ring theory, the following identity is true: ∀w ∀x ((((w*w)*x)*(w*w)) = id)
- MACE found 7 supporting examples and 6 falsifying examples
- HR found a single concept true of 3 positives: ¬(∃b (b*b = b ∧ ¬(b+b = b)))
- "In rings for which, for all b, b*b = b implies that b+b = b, the identity is true"
- Proved by Otter; hand proof by Roy McCasland
Slide 26: Vision Application
- Work with Paulo Santos and Derek Magee
- Vision data: learning the rules of a dice game
- Context of autonomous agents: learning/adapting from observations alone (ask Paulo et al. about the vision side!)
- First serious application of HR's handling of noisy, incomplete data
- First quantitative comparison of HR and Progol (a predictive ILP system)
Slide 27: Progol Setup
- State and successor data transformed into trans/4 predicates, e.g., trans([c1],[c1,c2],t101,t103)
- Mode declarations chosen wisely: an application of a language bias
- Rules extracted from the answer to a positives-only learning problem: find a set of rules to cover the trans/4 predicate
- Also a meta-level application of Progol; see the Santos et al. ECAI paper
Slide 28: HR Setup
- State and successor data transformed into state0/1, state1/2 and state2/2 predicates (HR cannot handle lists)
- Various setups tried with HR: using the match, compose, exists and split PRs; using near-implications at 50%, 70% and 100%; using different search strategies (BFS, split-first, compose-last)
- Rules extracted from the theory produced, i.e., no explicit problem to solve
Slide 29: Experiments
- 8 different sessions from the vision system, plus one with all the data
- Looking for 18 rules of the game: empty becomes 1-state; 1-state becomes 2-state; 2 equal dice faces are both removed; otherwise, the smaller of the two is removed (15 instantiations of this rule)
- Recorded how many rules each system found, and also the coverage of the 15 rules found in the data
Slide 30: Sensitivity Results
Slide 31: HR Results
- First application of HR to uncertain data; we wanted to assess its performance
- Altering the parameters affected sensitivity in entirely expected ways
- Increasing these increases sensitivity: the number of steps; the complexity limit (but not by too much)
- Decreasing the percentage match increases sensitivity: HR is able to pick up conjectures which are only 65% true
- Altering the search strategy had little effect
Slide 32: Drawbacks of the Approaches
- Predictive approach: the problem may be solved without finding some game rules, due to over-generalisation with little data; the mode declarations must be known in advance; in general, the rules require the solution of many different prediction problems
- Descriptive approach: can take a long time to find rules (10x slower); in general, rules are not so simple to find; can produce a glut of information from which it is difficult to extract the pertinent rules
- Descriptive is best in theory; predictive is slightly better in practice (at least for this data set, but we'll see…)
Slide 33: Proposed Combination
- Can we get the best of both worlds?
- Idea: use HR for an initial theory-formation exploration; look at some of the rules found; extract a general format for them; turn the general rules into mode declarations; use Progol to solve the individual problems
- In practice, HR may take too long to work, but the process would be entirely automatic
Slide 34: Conclusions
- HR has shown good abilities for discovery in pure mathematics and in AI applications, domains characterised by a lack of noise and by concrete axioms
- It is not yet clear that the application to discovery in noisy, incomplete data sets will be as good
Slide 35: Future Work
- Applying for grants: to continue the application to CSP reformulation (with Ian Miguel); to continue the Lakatos approach, possibly with applications to software verification (with Alison Pease)
- My focus is on the integration of learning, constraint solving and automated reasoning
- Application to creative tasks in mathematics, bioinformatics and vision!