Learning abductive reasoning using random examples
Brendan Juba
Washington University in St. Louis
Outline
1. Models of abductive reasoning
2. An abductive reasoning algorithm
3. “Not-easiness” of abducing conjunctions
Abductive reasoning: making plausible guesses
Abductive reasoning: given a conclusion c, find a “plausible” h that implies/leads to/… c.
Two varieties of “plausibility” are in common use:
– Logical plausibility: a small h from which c follows
– Bayesian plausibility: an h with large posterior probability given c; in symbols, Pr[h | c true] > … This requires a prior distribution over representations…
Why might we want a new model?
Existing models are only tractable in simple cases
– E.g., Horn rules (a ⋀ b ⋀ c ⇒ d; no negations) or “nice” (conjugate) priors
The choice of prior distribution really matters
– And it’s difficult to specify by hand
New model: abductive reasoning from random examples
Fix a set of attributes (propositional variables x_1, x_2, …, x_n).
An environment is modeled by an arbitrary, unknown distribution D over examples, i.e., settings of the n propositional variables.
Task: for a conclusion c, find an h such that
1. Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ (for some given μ)
2. h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1−ε
All probabilities are over examples drawn from D.
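As a concrete illustration of these two conditions (not part of the talk), one might estimate them from a finite sample roughly as follows; the dict encoding of examples and the function name are assumptions made only for this sketch.

```python
# Minimal sketch: empirically estimating the two conditions of the task on a
# finite sample from D. Examples are dicts {attribute: 0/1}; h and c are
# Boolean predicates over such dicts. All names here are illustrative.

def looks_plausible_and_almost_entails(h, c, samples, mu, eps):
    h_true = [x for x in samples if h(x) == 1]
    if len(h_true) < mu * len(samples):            # empirical Pr[h(x)=1] >= mu?
        return False
    agree = sum(1 for x in h_true if c(x) == 1)    # empirical Pr[c(x)=1 | h(x)=1]
    return agree >= (1 - eps) * len(h_true)        # ... at least 1 - eps?
```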
Example: identifying a subgoal
Consider a blocks world over time steps t = 1, 2, …, T
– Propositional state vars. (“fluents”): ON_t(A,B), ON_t(A,TABLE), ON_t(C,A), etc.
– Actions are also encoded by propositional vars.: PUT_t(B,A), PUT_t(C,TABLE), etc.
Given many examples of interaction…
Our goal c: ON_T(A,TABLE) ⋀ ON_T(B,A) ⋀ ON_T(C,B)
A perhaps plausibly good “subgoal” h: [ON_{T−1}(B,A) ⋀ PUT_T(C,B)] ⋁ [PUT_{T−1}(B,A) ⋀ PUT_T(C,B)]
Or, even given by examples, not explicitly formulated…
Formally: abductive reasoning from random examples for a class H
Fix a class of Boolean representations H.
Given a Boolean formula c; ε, δ, μ ∈ (0,1); and independent examples x^(1), …, x^(m) drawn from D,
suppose that there exists an h* ∈ H such that
1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1
Find an h (ideally in H) such that, with probability 1−δ,
1. Plausibility: Pr_{x∈D}[h(x)=1] ≥ 1/poly(1/μ, 1/(1−ε), n)
2. h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1−ε
In pictures: over the example space x ∈ {0,1}^n, the set {x : h(x)=1} should lie (almost) entirely inside the set {x : c(x)=1}; c is the goal/observation, h is the explanation/solution.
Outline
1. Models of abductive reasoning
2. An abductive reasoning algorithm
3. “Not-easiness” of abducing conjunctions
Theorem 1. If there is a k-DNF h* such that
1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1
then using m = O((1/με)(n^k + log(1/δ))) examples, in time O(mn^k) we can find a k-DNF h such that, with probability 1−δ,
1. Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ
2. h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1−ε
k-DNF: an OR of “terms of size k”, i.e., ANDs of at most k “literals” (attributes or their negations)
Algorithm for k-DNF abduction
Start with h as an OR over all terms of size at most k.
For each example x^(1), …, x^(m):
– If c(x^(i)) = 0, delete all terms T from h such that T(x^(i)) = 1
Return h.
A simple algorithm, first proposed by J. S. Mill (1843). The running time is clearly O(mn^k).
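A minimal Python sketch of this elimination procedure, under an assumed encoding (an example is a 0/1 tuple, a literal is an (index, value) pair, and a term is a frozenset of literals); the function and variable names are illustrative, not from the paper.

```python
from itertools import combinations, product

def term_true(term, x):
    """A term (AND of literals) holds iff every literal agrees with x."""
    return all(x[i] == val for (i, val) in term)

def kdnf_abduce(examples, c, n, k):
    """examples: iterable of 0/1 tuples drawn from D; c: goal predicate."""
    # Start with h = OR of all terms of at most k literals over n variables.
    terms = set()
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for vals in product((0, 1), repeat=size):
                terms.add(frozenset(zip(idxs, vals)))
    # Delete every term that is true on some example where the goal c fails.
    for x in examples:
        if c(x) == 0:
            terms = {t for t in terms if not term_true(t, x)}
    return terms  # surviving terms; h(x) = OR over these terms
```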
Analysis, part 1: Pr_{x∈D}[h(x)=1] ≥ μ
We are given that some k-DNF h* has
1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1
Initially, every term of h* is in h.
By 2, terms of h* are never true when c(x)=0.
☞ Every term of h* remains in h.
☞ h* implies h, so Pr_x[h(x)=1] ≥ Pr_x[h*(x)=1] ≥ μ.
Analysis, part 2: Pr_x[c(x)=1 | h(x)=1] ≥ 1−ε
Rewriting the conditional probability, it suffices that Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ ε·Pr_{x∈D}[h(x)=1].
We’ll show: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ εμ (which is ≤ ε·Pr_{x∈D}[h(x)=1] by part 1).
Consider any h’ such that Pr_x[c(x)=0 ⋀ h’(x)=1] > εμ.
– Since each x^(i) is drawn independently from D, Pr[no i has c(x^(i))=0 ⋀ h’(x^(i))=1] < (1−εμ)^m
– If some example x^(i) has c(x^(i))=0 and h’(x^(i))=1, then a term of h’ that is true on x^(i) is deleted, so the output cannot be h’
– So h’ can only be output with probability < (1−εμ)^m
Analysis, part 2, continued: Pr_x[c(x)=1 | h(x)=1] ≥ 1−ε
We’ll show: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ εμ.
Consider any h’ such that Pr_x[c(x)=0 ⋀ h’(x)=1] > εμ.
– h’ can only be output with probability < (1−εμ)^m
There are only 2^{O(n^k)} possible k-DNFs h’.
Since (1 − 1/x)^x ≤ 1/e, m = O((1/με)(n^k + log(1/δ))) examples suffice to guarantee that each such h’ can only be output with probability < δ/2^{O(n^k)}.
☞ By a union bound, with probability > 1−δ, our output h has Pr_x[c(x)=0 ⋀ h(x)=1] ≤ εμ.
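To spell out the counting step, here is the union bound behind the stated sample size (a worked version of the argument above, with constants absorbed into the O(·)):

```latex
% Union bound over the at most 2^{O(n^k)} "bad" k-DNFs h' with
% Pr[c(x)=0 \wedge h'(x)=1] > \varepsilon\mu: each survives all m examples
% with probability < (1-\varepsilon\mu)^m \le e^{-\varepsilon\mu m}, so
\[
  2^{O(n^k)}\,(1-\varepsilon\mu)^m \;\le\; 2^{O(n^k)}\,e^{-\varepsilon\mu m} \;\le\; \delta
  \quad\text{whenever}\quad
  m \;\ge\; \frac{a}{\varepsilon\mu}\left(n^k + \log\frac{1}{\delta}\right)
\]
% for a suitable constant a, matching the stated bound
% m = O\bigl(\tfrac{1}{\mu\varepsilon}(n^k + \log\tfrac{1}{\delta})\bigr).
```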
Theorem 1 (restated). If there is a k-DNF h* such that
1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1
then using m = O((1/με)(n^k + log(1/δ))) examples, in time O(mn^k) we can find a k-DNF h such that, with probability 1−δ,
1. Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ
2. h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1−ε
k-DNF: an OR of “terms of size k”, i.e., ANDs of at most k “literals” (attributes or their negations)
A version that tolerates exceptions is also possible; see the paper.
Outline
1. Models of abductive reasoning
2. An abductive reasoning algorithm
3. “Not-easiness” of abducing conjunctions
Theorem 2. Suppose that a polynomial-time algorithm exists for learning abduction from random examples for conjunctions. Then there is a polynomial-time algorithm for PAC-learning DNF.
PAC Learning (in pictures): for some unknown c ∈ C, the learner receives labeled examples (x^(1), c(x^(1))), (x^(2), c(x^(2))), …, (x^(m), c(x^(m))) drawn from D and outputs a hypothesis f; with probability 1−δ over the examples, f(x’) = c(x’) with probability 1−ε over a fresh x’ = (x’_1, x’_2, …, x’_n) drawn from D.
Theorem 2. Suppose that a polynomial-time algorithm exists for learning abduction from random examples for conjunctions. Then there is a polynomial-time algorithm for PAC-learning DNF.
Theorem (Daniely & Shalev-Shwartz ’14). If there is a polynomial-time algorithm for PAC-learning DNF, then for every f(k) → ∞ there is a polynomial-time algorithm for refuting random k-SAT formulas with n^{f(k)} clauses.
☞ This is a new hardness assumption. Use at your discretion.
Key learning technique: Boosting (Schapire, 1990)
Suppose that there is a polynomial-time algorithm that, given examples of c ∈ C, with probability 1−δ produces a circuit f such that Pr_x[f(x)=c(x)] > ½ + 1/poly(n). Then there is a polynomial-time PAC-learning algorithm for the class C.
– i.e., using the ability to produce such f’s, we produce a g for which Pr_x[g(x)=c(x)] is “boosted” to any 1−ε we require.
Sketch of learning DNF using conjunctive abduction
If c is a DNF and Pr[c(x)=1] > ¼, then some term T of c has Pr[T(x)=1] > 1/(4|c|)
– …and otherwise f ≡ 0 satisfies Pr[f(x)=c(x)] > ½ + ¼
Note: Pr[c(x)=1 | T(x)=1] = 1, and T is a conjunction.
☞ Abductive reasoning finds some h such that Pr[h(x)=1] > 1/poly(n) and Pr[c(x)=1 | h(x)=1] > ¾.
Return f defined by: f(x) = 1 whenever h(x) = 1; if h(x) = 0, f(x) = maj_{x: h(x)=0}{c(x)}. Then Pr[f(x)=c(x)] > ½ + 1/(4·poly(n)).
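Assuming a hypothetical abduce_conjunction routine for the conjunctive abduction task (its name, signature, and the sample encoding are illustrative, not from the paper), the weak learner in this sketch could look roughly like the following; boosting then lifts it to a full PAC learner.

```python
# A minimal sketch (assumed interfaces) of the weak learner built from a
# conjunctive-abduction oracle, following the reduction sketched above.

def weak_learner(labeled_samples, abduce_conjunction, eps=0.25):
    """labeled_samples: list of (x, c_x) pairs with c_x in {0, 1}."""
    frac_pos = sum(cx for _, cx in labeled_samples) / len(labeled_samples)
    if frac_pos <= 0.25:
        return lambda x: 0              # f = 0 already agrees with c w.p. > 1/2 + 1/4

    # Otherwise ask the (hypothetical) abduction algorithm for an h with
    # Pr[h=1] > 1/poly(n) and Pr[c=1 | h=1] > 3/4.
    h = abduce_conjunction(labeled_samples, eps=eps)

    # Majority label of c on the region where h(x)=0.
    outside = [cx for x, cx in labeled_samples if h(x) == 0]
    maj = 1 if sum(outside) * 2 >= len(outside) else 0

    def f(x):
        return 1 if h(x) == 1 else maj  # predict 1 inside h, majority outside
    return f
```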
Recap: learning abduction from random examples
For a goal condition c, if there is an h* such that Pr_{x∈D}[h*(x)=1] ≥ μ and Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1, then, using examples x from D, find an h such that Pr_{x∈D}[h(x)=1] ≥ μ’ and Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1−ε.
Thm 1. An efficient algorithm exists for k-DNF h.
Thm 2. Unless DNF is PAC-learnable, there is no efficient algorithm for conjunctions.