
Simplicity, Induction, and Scientific Discovery. School of Computing Science, Simon Fraser University, Vancouver, Canada.


1 Simplicity, Induction, and Scientific Discovery
School of Computing Science, Simon Fraser University, Vancouver, Canada

2 Outline
- Mind Change Minimization and Simplicity
- Introductory Examples: an existence inquiry; the Riddle of Induction
- Necessary and Sufficient Condition for Mind Change Bounds: connection with point-set topology
- Advanced Examples: learning conservation laws in particle physics; constraint-based learning of Bayes nets

3 Learning With Mind Change Bounds

4 Simplicity and Steady Progress
Both simple and complex routes may get us to the same place, but favoring simplicity gets us there more efficiently.
Kelly, K.T. "How Simplicity Helps You Find the Truth Without Pointing at It." In Induction, Algorithmic Learning Theory, and Philosophy, Springer, 2007.

5 Learning and Steady Convergence
Minimizing mind changes can be seen as an objective function for learning, like a loss function.
- Standard loss function: loss(hypothesis, true model).
- Mind changes: loss(sequence of hypotheses).
[Figure: output hypothesis plotted against learning time, contrasting a good trajectory (few changes) with a worse one (many changes).]

6 Mind Change Bound Example
Is a certain reaction possible? E.g. r = n + n → p + p + e⁻ + e⁻.
Rules:
- Learner makes conjecture "yes" or "no".
- Adversary shows experimental outcomes ("observed" or not).
- Learner pays for abandoning "yes" or "no".
[Game tree: at each stage either r is observed or it is not, continuing indefinitely.]
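
A minimal sketch (my own illustration, not code from the talk) of the obvious strategy in this game, assuming the data simply record whether r has been observed yet: conjecture "no" until r is observed, then switch to "yes" and never change again, so at most one mind change is paid on any data sequence.

```python
# Sketch of the "conjecture no until observed" learner for the existence game.
# Assumes observations arrive as booleans: True = reaction r observed at that stage.

def existence_learner(observations):
    conjectures = []
    seen = False
    for observed in observations:
        seen = seen or observed
        conjectures.append("yes" if seen else "no")   # at most one mind change
    return conjectures

print(existence_learner([False, False, True, False]))  # ['no', 'no', 'yes', 'yes']
```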

7 The New Riddle of Induction
Goodman (1983): "Grue applies to all things examined before t just in case they are green but to other things just in case they are blue."
Rules:
- Learner projects a generalization (e.g. "all green").
- Adversary chooses the color of the next emerald.
- Learner pays for mistaken predictions.
[Hypothesis space: all green; all grue_1; all grue_2; ...; all blue.]
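
For concreteness, here is a small sketch (my own illustration, using the convention from the quoted definition that grue_t things are green before time t and blue from t on) of the learner the talk later describes: project "all green" until a non-green emerald appears at some time t, then switch to "all grue_t".

```python
# Sketch of the "project green until refuted" learner for the grue game.
# Convention assumed here: grue_t = green before time t, blue from time t onward.

def grue_learner(colors):
    """colors: observed emerald colors per time step, e.g. 'green' or 'blue'."""
    conjectures = []
    switch_time = None
    for t, color in enumerate(colors, start=1):
        if switch_time is None and color != "green":
            switch_time = t                    # time of the first non-green observation
        conjectures.append("all green" if switch_time is None else f"all grue_{switch_time}")
    return conjectures

print(grue_learner(["green", "green", "blue", "blue"]))
# ['all green', 'all green', 'all grue_3', 'all grue_3']
```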

8 Description Length, Green and Grue
One reason philosophers are interested in the Riddle of Induction is that it illustrates how descriptive simplicity can depend on the choice of vocabulary.
- With basic predicates green, blue: "all green" has a short definition, while "all grue_t" needs the long definition "green up to time t, blue thereafter."
- With basic predicates grue_t, bleen_t: "all grue_t" has a short definition, while "all green" needs the long definition "grue_t up to time t, bleen_t thereafter."
(grue_t = green up to time t, blue thereafter; bleen_t = blue up to time t, green thereafter.)

9 Topology and Mind Change Bounds

10 Convergence in the Limit
A learning problem consists of:
- a hypothesis space H,
- a space D of possible complete data sequences,
- a correctness notion that specifies which hypothesis in H is correct for which data sequence in D.
A learner outputs a hypothesis on every finite (partial) data sequence; it may also output ? for "no conclusion yet".
A learner converges to a correct hypothesis in the limit if, on every data sequence D in D, after some finite time the learner's conjecture is always correct for D.
[Figure: the hypothesis space "all green", "all grue_1", "all grue_2", ..., and a learner's output hypothesis over learning time.]
Putnam, H. (1963). "Degree of Confirmation and Inductive Logic", in The Philosophy of Rudolf Carnap.
Gold, E. (1967). "Language Identification in the Limit", Information and Control, 10.
Kelly, K. (1996). The Logic of Reliable Inquiry, Oxford: Oxford University Press.

11 Mind Change Bounds
A learner makes a mind change on a data sequence at time m+1 if its conjecture at time m is in H and is different from its conjecture at time m+1.
A learning problem (H, D) is solvable with at most k mind changes if there is a learner that converges to a correct hypothesis and changes its mind at most k times before convergence.
[Diagram: mind-change-bounded learners form a subclass of the convergent (reliable) learners.]
Putnam, H. (1965). "Trial and Error Predicates and the Solution to a Problem of Mostowski", The Journal of Symbolic Logic, 30(1): 49-57.
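
A small sketch of this definition (my illustration, not code from the paper): count the stages at which the previous conjecture is an actual hypothesis (not ?) and the next conjecture differs from it.

```python
# Count mind changes in a sequence of conjectures, per the definition above:
# a change at time m+1 requires the conjecture at time m to be in H (not "?")
# and to differ from the conjecture at time m+1.

def mind_changes(conjectures):
    return sum(
        1
        for prev, cur in zip(conjectures, conjectures[1:])
        if prev != "?" and cur != prev
    )

print(mind_changes(["?", "all green", "all green", "all grue_3"]))  # 1
```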

12 Topology on a Hypothesis Space
A hypothesis H is an isolated point in hypothesis space H if there is a finite data sequence such that H is the only hypothesis in H consistent with the data. Write H' for the set of isolated points of H.
Successively eliminate isolated points:
1. H_0 = H - H'.
2. H_{i+1} = H_i - H_i'.
The accumulation order of H is the least index i such that H_{i+1} = H_i.
G. Cantor, Grundlagen einer allgemeinen Mannigfaltigkeitslehre, 1883.
Apsitis, K. (1994). "Derived Sets and Inductive Inference", ALT 1994.
Luo, W. and Schulte, O. (2005). "Mind Change Efficient Learning", COLT 2005.
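
The elimination process can be made concrete for a finite toy hypothesis space. The sketch below is my own illustration, assuming an oracle isolated(h, live) that says whether some finite data sequence is consistent with h but with no other still-live hypothesis; the grue-style example truncates the grue family at a cutoff T so that the space is finite.

```python
# Iteratively eliminate isolated points; the stage at which a hypothesis is
# removed is its simplicity rank (higher rank = inductively simpler).

def simplicity_ranks(hypotheses, isolated):
    ranks, live, stage = {}, set(hypotheses), 0
    while live:
        removed = {h for h in live if isolated(h, live)}
        if not removed:                     # no isolated points left: elimination stalls
            break
        for h in removed:
            ranks[h] = stage
        live -= removed
        stage += 1
    return ranks

# Finite toy version of the grue example (grue times cut off at T).
T = 3
hyps = ["all green"] + [f"all grue_{t}" for t in range(1, T + 1)]

def isolated(h, live):
    if h.startswith("all grue"):
        return True                         # a blue emerald at its switch time isolates it
    return live == {"all green"}            # "all green" is isolated only once it stands alone

print(simplicity_ranks(hyps, isolated))     # grue hypotheses get rank 0, "all green" rank 1
```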

13 Examples
[Diagrams: the existence-inquiry game tree ("r observed" vs. "no r" at each stage) and the grue hypothesis space ("all grue_1", "all grue_2", ..., "all green", "all blue").]

14 Topology and Mind-Change Bounds
Theorem (Luo and Schulte 2005). A learning problem (H, D) is solvable with k mind changes if and only if
1. the accumulation order of H is at most k, and
2. H_k is the empty set.

15 Topology and Inductive Simplicity
The simplicity rank of a hypothesis H is the last stage at which H is eliminated. Greater simplicity rank → greater inductive simplicity.
Example: each "all grue_t" has rank 0; "all green" has rank 1.

16 Mind-Change Optimality
Suppose we add convergence-time admissibility (Gold 1967) to mind-change optimality. Then there is a unique mind-change optimal learner (Schulte, Luo, Greiner 2007, 2010).
[Flowchart: is there a unique simplest hypothesis consistent with the data? If yes, output the simplest hypothesis; if no, output ? (no conclusion).]
[Diagram: time-admissible learners are a subclass of mind-change-bounded learners, which are a subclass of convergent learners.]
Schulte, O., Luo, W., and Greiner, R. (2010). "Mind-change optimal learning of Bayes net structure from dependency and independency data", Information and Computation, 208: 63-82.
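
A compact sketch of the decision rule in the flowchart (my illustration; the consistent and ranks arguments are assumed inputs, not anything from the paper): output the uniquely simplest hypothesis consistent with the data, otherwise output ?.

```python
# Output the topologically simplest consistent hypothesis when it is unique,
# otherwise "?". Higher rank = simpler, per the previous slides.

def mind_change_optimal_conjecture(data, hypotheses, consistent, ranks):
    live = [h for h in hypotheses if consistent(h, data)]
    if not live:
        return "?"                                        # nothing fits the data
    top = max(ranks[h] for h in live)
    simplest = [h for h in live if ranks[h] == top]
    return simplest[0] if len(simplest) == 1 else "?"     # unique simplest, or no conclusion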

17 Related Work and Extensions
Mind change bounds are related to a generalized notion of mistake bounds in learning theory (Jain and Sharma 1999).
Mind-change solvability is defined via logical (in)consistency of hypotheses with the data; Kelly relaxes this requirement, including for statistical applications (Kelly and Mayo-Wilson 2010).
Simplicity rank is a kind of degree of falsifiability.
Jain, S. and Sharma, A. (1999). "On a Generalized Notion of Mistake Bounds", COLT 1999.
Kelly, K.T. and Mayo-Wilson, C. (2010). "Causal Conclusions that Flip Repeatedly and Their Justification", UAI 2010.

18 Learning Conservation Laws in Particle Physics

19 Example: Particle Physics
Reactions and quantities represented as vectors (Aris 1969; Valdés-Pérez 1994, 1996).
For entities i = 1, ..., n: r(i) = (# of entity i among reagents) - (# of entity i among products).
A quantity is conserved in a reaction if and only if the corresponding vectors are orthogonal.
A reaction is "possible" iff it conserves all quantities.
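
To make the vector representation concrete, here is a toy illustration (my own example with a hypothetical four-particle list, not the talk's dataset): neutron decay as a reaction vector and baryon number as a quantity vector, conserved exactly when their dot product is zero.

```python
# Reaction vector: entry i = (# of particle i among reagents) - (# among products).
# A quantity vector q is conserved in reaction r iff r . q = 0.
import numpy as np

particles = ["p", "n", "e-", "nu"]           # hypothetical small particle list
r = np.array([-1, 1, -1, -1])                # n -> p + e- + nu (reagents minus products)
baryon_number = np.array([1, 1, 0, 0])       # assumed baryon-number assignment
print(np.dot(r, baryon_number) == 0)         # True: baryon number is conserved here
```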

20 Conserved Quantities in the Standard Model
The Standard Model is based on Gell-Mann's quark model (1964). Full set of particles: n = 193.
[Table: conserved quantities and the particle family (cluster) associated with each.]

21 The Learning Task (Toy Example)
Given:
1. A fixed list of known detectable particles.
2. Input reactions, collected as the rows of a reaction matrix R.
Learning output: a quantity matrix Q whose columns are conserved quantities, so RQ = 0.
Not given:
1. The number of quantities.
2. The interpretation of the quantities.

22 Inductive Simplicity of Conservation Laws
Simplicity rank of a set of conserved quantities = number of independent quantities (in the sense of linear algebra).
Therefore the mind-change optimal method chooses a maximum-rank conservation matrix consistent with the data (Schulte 2001). It can be computed as a basis for the nullspace of the observed reaction matrix.
Least generalization: this rules out as many unobserved reactions as possible.
[Diagram: the observed reactions R sit inside their smallest generalization, the linear span of R; larger generalizations allow further unobserved reactions.]
Schulte, O. (2001). "Inferring Conservation Principles in Particle Physics: A Case Study in the Problem of Induction", The British Journal for the Philosophy of Science, 51: 771-806.

23 System for Finding a Maximally Strict Set of Selection Rules
1. Read in observed reactions (from a database).
2. Convert them to a list of vectors R (using a conversion utility).
3. Compute a basis Q for the nullspace of R (Maple function nullspace).
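
For readers without Maple, a minimal Python sketch of the same pipeline (my illustration; the single observed reaction below is a toy stand-in for the real reaction database):

```python
# Steps 1-2: observed reactions as rows of a matrix R (reagents minus products).
# Step 3: a basis Q for the nullspace of R; every column q satisfies R @ q = 0,
# i.e. q is conserved in all observed reactions.
import numpy as np
from scipy.linalg import null_space

particles = ["p", "n", "e-", "nu"]           # assumed detectable-particle list
R = np.array([
    [-1, 1, -1, -1],                         # toy observed reaction: n -> p + e- + nu
])
Q = null_space(R)
print(Q.shape)                               # (4, 3): three independent conserved quantities
print(np.allclose(R @ Q, 0))                 # True
```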

24 Comparison with Standard Model
Dataset: the complete set of 193 particles (antiparticles listed separately). See Excel.
Included the most probable decay for each unstable particle, giving 182 reactions; some others from textbooks for a total of 205 reactions. See Demo.
Matches the Standard Model!

25 Extensions
Mind-change optimal learning for simultaneous discovery of conservation laws and hidden particles (neutrinos), Schulte 2009.
The same algorithm can be used to find the molecular structure of chemical substances (e.g., water is H2O), Schulte and Drew 2010.
Schulte, O. (2009). "Simultaneous Discovery of Conservation Laws and Hidden Particles With Smith Matrix Decomposition", IJCAI 2009.
Schulte, O. and Drew, M.S. (2010). "Learning Conservation Laws Via Matrix Search", Discovery Science 2010.

26 Learning Simple Conservation Laws in Particle Physics

27 Empirically Equivalent Conservation Matrices
There are many bases for the nullspace of an observed reaction set. All are empirically equivalent: consistent with exactly the same reactions (non-identifiability). Hence all have the same inductive simplicity rank.
How to choose?
1. Minimize description length / maximize parsimony.
2. Choose conservation matrices with a simpler ontology.

28 Description Length / Parsimony
L1-norm |M| of a matrix M = sum of the absolute values of its entries.
Prefer conservation matrices with smaller L1-norm (Valdés-Pérez and Erdmann 1994).
Valdés-Pérez, R. and Erdmann, M. (1994). "Systematic induction and parsimony of phenomenological conservation laws", Computer Physics Communications 83.

29 Ontological Simplicity
Recall that conserved quantities define groups or families of particles (the carriers of a quantity: the particles on which it is nonzero).
Prefer quantities that induce the smallest number of disjoint families.
- The fewer kinds of things a theory introduces, the ontologically simpler it is (homogeneity).
- The less overlap between kinds, the greater the ontological simplicity.
[Figure: ontological simplicity vs. number of kinds.]

30 Parsimony Meets Ontology
Theorem (Schulte 2008). Let R be a reaction data matrix. If there is a nullspace-basis conservation matrix Q with disjoint entity clusters, then
1. the clusters (families) are uniquely determined, and
2. there is a unique nullspace basis Q* that minimizes the L1-norm (up to sign).
[Table: the quantities Baryon#, Electron#, Muon#, Tau# and their disjoint carrier particles; any alternative set of 4 quantities with disjoint carriers determines the same families.]
Schulte, O. (2008). "The Co-Discovery of Conservation Laws and Particle Families", Studies in History and Philosophy of Modern Physics.

31 Implementation
The theorem implies that minimizing the L1-norm will discover the unique set of particle families determined by the data.
Minimization problem: minimize the L1-norm |Q|, subject to the nonlinear constraint that the columns of Q form a basis for the nullspace of R. Algorithm by Schulte and Drew (2010).
If electric charge is fixed as input, this recovers exactly the laws of the Standard Model!
Schulte, O. and Drew, M.S. (2010). "Learning Conservation Laws Via Matrix Search", Discovery Science 2010.
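
Below is only a crude random-search illustration of the stated minimization problem, not the Schulte-Drew matrix-search algorithm; the column normalization is an added assumption to keep the toy problem from being trivially solved by rescaling columns toward zero.

```python
# Toy search: any conservation basis can be written as N @ B with B invertible,
# so sample B at random and keep the candidate basis with the smallest L1-norm.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
R = np.array([[-1, 1, -1, -1]])              # toy reaction matrix (n -> p + e- + nu)
N = null_space(R)                            # some nullspace basis

best_Q, best_norm = N, np.abs(N).sum()
for _ in range(5000):
    B = rng.normal(size=(N.shape[1], N.shape[1]))
    if abs(np.linalg.det(B)) < 1e-6:         # skip near-singular changes of basis
        continue
    Q = N @ B
    Q = Q / np.abs(Q).max(axis=0)            # normalize each column (assumed convention)
    if np.abs(Q).sum() < best_norm:
        best_Q, best_norm = Q, np.abs(Q).sum()
print(round(best_norm, 2))                   # L1-norm of the best basis found
```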

32 Big Picture: Simplicity in Learning Conservation Laws
In the particle physics problem:
1. Maximize topological/inductive simplicity first.
2. Maximize ontological simplicity and parsimony to break ties.
[Diagram: simplicity comprises mind changes/topology, parsimony/description length, and ontology.]

33 Another Application: Learning Bayes Nets
Learn Bayes nets from observed correlations (constraint-based).
Simplicity rank of a Bayes net G = number of edges not in G.
Is there a unique minimum-edge graph for a given set of observed correlations? NP-hard.
[Diagram: two graphs over Measles, Allergy, Spots; the sparser one is simpler, the denser one more complex.]
Schulte, O., Luo, W., and Greiner, R. (2007). "Mind Change Optimal Learning of Bayes Net Structure", COLT 2007.
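
A tiny sketch of the rank notion on this slide (my reading: count the node pairs left unconnected, so sparser graphs get higher rank), using the Measles/Allergy/Spots example:

```python
# Simplicity rank of a graph G over `nodes` = number of possible edges not in G.
from itertools import combinations

def simplicity_rank(nodes, edges):
    all_pairs = {frozenset(p) for p in combinations(nodes, 2)}
    present = {frozenset(e) for e in edges}
    return len(all_pairs - present)

nodes = ["Measles", "Allergy", "Spots"]
sparse = {("Measles", "Spots"), ("Allergy", "Spots")}      # simpler graph
dense = sparse | {("Measles", "Allergy")}                  # more complex graph
print(simplicity_rank(nodes, sparse), simplicity_rank(nodes, dense))   # 1 0
```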

34 Summary: Theory
- Mind-change optimal learning: converge to a correct hypothesis with a minimum number of theory changes.
- Mind-change complexity is characterized by the topological concept of accumulation order. This also defines a topological simplicity rank for each hypothesis.
- There is a mind-change optimal method that conjectures the uniquely topologically simplest hypothesis if there is one, and otherwise outputs ? for no conclusion.
- Topological simplicity does not indicate truth, but maximizing it leads to efficient convergence.

35 Summary: Examples
Examples of the mind-change optimal method:
- Existence problem: conjecture "reaction not possible" until it is observed.
- Riddle of Induction: conjecture "all emeralds are green" until a blue one is observed.
- Learning conservation laws: conjecture the maximum-rank conservation matrix consistent with the data. Matches the predictions of the particle physics Standard Model.
- Refine the selection using the L1-norm: matches the quantities in the Standard Model exactly and recovers the particle families (ontology).

36 The End. Thank you!

