Published by Stanley Price. Modified over 8 years ago.
1
Inductive Learning (2/2): Version Space and PAC Learning
Russell and Norvig: Chapter 18, Sections 18.5 through 18.7; Chapter 18, Section 18.5; Chapter 19, Sections 19.1 through 19.3
CS121 – Winter 2003
2
Contents
Introduction to inductive learning
Logic-based inductive learning: decision tree method, version space method
Function-based inductive learning: neural nets
+ PAC learning
3
Inductive Learning Scheme
[Figure: example set X = {[A, B, …, CONCEPT]}, drawn as positive (+) and negative (-) examples; a training set taken from X; hypothesis space H = {[CONCEPT(x) ⇔ S(A,B, …)]}; the learner returns an inductive hypothesis h]
4
Predicate-Learning Methods
Decision tree
Version space
Explicit representation of hypothesis space H
Need to provide H with some “structure”
5
Version Space Method
V is the version space
1. V ← H
2. For every example x in the training set do
   a. Eliminate from V every hypothesis that does not agree with x
   b. If V is empty then return failure
3. Return V
Compared to the decision tree method, this algorithm is incremental and least-commitment
But the size of V is enormous!
Idea: Define a partial ordering on the hypotheses in H and represent only the upper and lower bounds of V for this ordering
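The three steps above can be sketched directly on the rewarded-card domain introduced on the following slides. This is a minimal illustration, not the course's own code: the ASCII suit letters `S`, `C`, `D`, `H`, the helper names `covers` and `version_space`, and the suits of the sample cards are all assumptions made for the example.

```python
from itertools import product

FACES = {"j", "q", "k"}
RANKS = [str(i) for i in range(1, 11)] + ["j", "q", "k"]
SUITS = ["S", "C", "D", "H"]  # spades, clubs, diamonds, hearts (assumed encoding)

# Hypothesis space H: every pair (rank concept, suit concept).
RANK_CONCEPTS = ["a", "n", "f"] + RANKS  # any rank, numbered, face, or one rank
SUIT_CONCEPTS = ["a", "b", "r"] + SUITS  # any suit, black, red, or one suit
H = set(product(RANK_CONCEPTS, SUIT_CONCEPTS))


def covers(h, card):
    """True if card = (rank, suit) lies in the extension of hypothesis h."""
    (rc, sc), (r, s) = h, card
    rank_ok = (rc == "a" or (rc == "n" and r not in FACES)
               or (rc == "f" and r in FACES) or rc == r)
    suit_ok = (sc == "a" or (sc == "b" and s in "SC")
               or (sc == "r" and s in "DH") or sc == s)
    return rank_ok and suit_ok


def version_space(examples):
    V = set(H)                                   # 1. V <- H
    for card, label in examples:                 # 2. for every example x
        # 2a. keep only the hypotheses that agree with x
        V = {h for h in V if covers(h, card) == label}
        if not V:                                # 2b. empty V means failure
            return None
    return V                                     # 3. return V


# Hypothetical training set: (card, is_rewarded)
examples = [(("4", "C"), True), (("7", "C"), True), (("5", "H"), False),
            (("2", "S"), True), (("j", "S"), False)]
print(version_space(examples))
```

Here H has only 112 hypotheses, so storing V explicitly is cheap; the boundary representation discussed next matters when H is too large to enumerate.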
6
Rewarded Card Example
(r=1) ∨ … ∨ (r=10) ∨ (r=J) ∨ (r=Q) ∨ (r=K) ⇔ ANY-RANK(r)
(r=1) ∨ … ∨ (r=10) ⇔ NUM(r)
(r=J) ∨ (r=Q) ∨ (r=K) ⇔ FACE(r)
(s=♠) ∨ (s=♣) ∨ (s=♦) ∨ (s=♥) ⇔ ANY-SUIT(s)
(s=♠) ∨ (s=♣) ⇔ BLACK(s)
(s=♦) ∨ (s=♥) ⇔ RED(s)
An hypothesis is any sentence of the form: R(r) ∧ S(s) ⇔ REWARD([r,s]), where:
R(r) is ANY-RANK(r), NUM(r), FACE(r), or (r=j)
S(s) is ANY-SUIT(s), BLACK(s), RED(s), or (s=k)
7
Simplified Representation
For simplicity, we represent a concept by rs, with:
r = a, n, f, 1, …, 10, j, q, k
s = a, b, r, ♠, ♣, ♦, ♥
For example:
n♠ represents: NUM(r) ∧ (s=♠) ⇔ REWARD([r,s])
aa represents: ANY-RANK(r) ∧ ANY-SUIT(s) ⇔ REWARD([r,s])
8
Extension of an Hypothesis
The extension of an hypothesis h is the set of objects that verify h
Examples:
The extension of f♥ is: {j♥, q♥, k♥}
The extension of aa is the set of all cards
9
More General/Specific Relation
Let h1 and h2 be two hypotheses in H
h1 is more general than h2 iff the extension of h1 is a proper superset of the extension of h2
Examples:
aa is more general than f♥
f♥ is more general than q♥
fr and nr are not comparable
10
More General/Specific Relation (continued)
The inverse of the “more general” relation is the “more specific” relation
The “more general” relation defines a partial ordering on the hypotheses in H
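The definitions of extension and of the “more general” relation translate almost literally into set operations. The sketch below assumes an ASCII encoding of suits (`S`, `C`, `D`, `H`) and hypothetical helper names; the suit chosen for the face-card examples is arbitrary.

```python
from itertools import product

FACES = {"j", "q", "k"}
RANKS = [str(i) for i in range(1, 11)] + ["j", "q", "k"]
SUITS = ["S", "C", "D", "H"]  # spades, clubs, diamonds, hearts (assumed encoding)
CARDS = list(product(RANKS, SUITS))


def covers(h, card):
    """True if card = (rank, suit) verifies hypothesis h = (rank concept, suit concept)."""
    (rc, sc), (r, s) = h, card
    rank_ok = (rc == "a" or (rc == "n" and r not in FACES)
               or (rc == "f" and r in FACES) or rc == r)
    suit_ok = (sc == "a" or (sc == "b" and s in "SC")
               or (sc == "r" and s in "DH") or sc == s)
    return rank_ok and suit_ok


def extension(h):
    """Set of all cards that verify h."""
    return frozenset(c for c in CARDS if covers(h, c))


def more_general(h1, h2):
    """h1 is more general than h2 iff ext(h1) is a proper superset of ext(h2)."""
    return extension(h1) > extension(h2)


def comparable(h1, h2):
    return h1 == h2 or more_general(h1, h2) or more_general(h2, h1)


print(sorted(extension(("f", "H"))))            # the three face cards of hearts
print(more_general(("a", "a"), ("f", "H")))     # aa vs. face-of-hearts
print(comparable(("f", "r"), ("n", "r")))       # fr vs. nr
```

Because `frozenset` comparison with `>` tests for a proper superset, the relation is irreflexive and transitive, which is what makes it a (strict) partial order.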
11
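Example: Subset of Partial Order
[Figure: lattice of the nine hypotheses aa, na, ab, nb, n♣, 4♣, 4b, a♣, 4a, ordered from the most general (aa, top) to the most specific (4♣, bottom)]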
12
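Construction of Ordering Relation
[Figure: two generalization trees. Ranks: 1, …, 10 generalize to n; j, …, k generalize to f; n and f generalize to a. Suits: the black suits generalize to b and the red suits to r; b and r generalize to a]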
14
G-Boundary / S-Boundary of V
An hypothesis in V is most general iff no hypothesis in V is more general
G-boundary G of V: Set of most general hypotheses in V
An hypothesis in V is most specific iff no hypothesis in V is more specific
S-boundary S of V: Set of most specific hypotheses in V
15
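Example: G-/S-Boundaries of V
[Figure: initially G = {aa} and S is the set of single-card hypotheses (4♣, …); below, the lattice of the nine hypotheses that cover 4♣: aa, na, ab, nb, n♣, 4♣, 4b, a♣, 4a]
Now suppose that 4♣ is given as a positive example
We replace every hypothesis in S whose extension does not contain 4♣ by its generalization set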
16
Example: G-/S-Boundaries of V
[Figure: lattice of the nine hypotheses aa, na, ab, nb, n♣, 4♣, 4b, a♣, 4a, with G = {aa} and S = {4♣}]
Here, both G and S have size 1. This is not the case in general!
17
Example: G-/S-Boundaries of V
[Figure: lattice of the nine hypotheses aa, na, ab, nb, n♣, 4♣, 4b, a♣, 4a, with the generalization set of 4♣ highlighted]
Let 7♣ be the next (positive) example
The generalization set of an hypothesis h is the set of the hypotheses that are immediately more general than h
18
Example: G-/S-Boundaries of V
[Figure: lattice of the nine hypotheses aa, na, ab, nb, n♣, 4♣, 4b, a♣, 4a]
Let 7♣ be the next (positive) example
19
Example: G-/S-Boundaries of V
[Figure: lattice of the six remaining hypotheses aa, na, ab, nb, n♣, a♣, with the specialization set of aa highlighted]
Let 5♥ be the next (negative) example
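Generalization and specialization sets can be computed by brute force from the extensions: a hypothesis is immediately more general than h if it is more general than h and nothing lies strictly between the two. The sketch below works over the whole hypothesis space H rather than the current version space; the suit letters and function names are assumptions made for illustration.

```python
from itertools import product

FACES = {"j", "q", "k"}
RANKS = [str(i) for i in range(1, 11)] + ["j", "q", "k"]
SUITS = ["S", "C", "D", "H"]  # spades, clubs, diamonds, hearts (assumed encoding)
CARDS = list(product(RANKS, SUITS))
H = list(product(["a", "n", "f"] + RANKS, ["a", "b", "r"] + SUITS))


def covers(h, card):
    (rc, sc), (r, s) = h, card
    rank_ok = (rc == "a" or (rc == "n" and r not in FACES)
               or (rc == "f" and r in FACES) or rc == r)
    suit_ok = (sc == "a" or (sc == "b" and s in "SC")
               or (sc == "r" and s in "DH") or sc == s)
    return rank_ok and suit_ok


# Precompute the extension of every hypothesis once.
EXT = {h: frozenset(c for c in CARDS if covers(h, c)) for h in H}


def generalization_set(h):
    """Hypotheses immediately more general than h."""
    above = [g for g in H if EXT[g] > EXT[h]]
    # keep g only if no other hypothesis above h is strictly below g
    return {g for g in above if not any(EXT[m] < EXT[g] for m in above)}


def specialization_set(h):
    """Hypotheses immediately more specific than h."""
    below = [g for g in H if EXT[g] < EXT[h]]
    # keep g only if no other hypothesis below h is strictly above g
    return {g for g in below if not any(EXT[m] > EXT[g] for m in below)}


print(generalization_set(("4", "C")))   # one step up from the 4 of clubs
print(specialization_set(("a", "a")))   # one step down from aa
```

In the slides the specialization set is then intersected with the current version space, which is why only part of it survives an update.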
20
Example: G-/S-Boundaries of V
[Figure: the four remaining hypotheses ab, nb, n♣, a♣]
G and S, and all hypotheses in between, form exactly the version space
1. If an hypothesis between G and S disagreed with an example x, then an hypothesis in G or S would also disagree with x, hence would have been removed
21
Example: G-/S-Boundaries of V
[Figure: the four remaining hypotheses ab, nb, n♣, a♣]
G and S, and all hypotheses in between, form exactly the version space
2. If there were an hypothesis not in this set which agreed with all examples, then it would have to be either no more specific than any member of G (but then it would be in G) or no more general than some member of S (but then it would be in S)
22
Example: G-/S-Boundaries of V
[Figure: the four remaining hypotheses ab, nb, n♣, a♣]
At this stage, do 8♣, 6♦, j♠ satisfy CONCEPT?
Yes, no, and maybe, respectively
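The yes/no/maybe answer follows from a vote over the version space: a card is certainly rewarded if every remaining hypothesis covers it, certainly not rewarded if none does, and undecided otherwise. A small sketch, assuming ASCII suit letters and a version space consisting of ab, nb, numbered clubs, and any club (the club suit is an assumption, since the suit symbols are not legible in the slide text):

```python
FACES = {"j", "q", "k"}


def covers(h, card):
    """True if card = (rank, suit) verifies h = (rank concept, suit concept)."""
    (rc, sc), (r, s) = h, card
    rank_ok = (rc == "a" or (rc == "n" and r not in FACES)
               or (rc == "f" and r in FACES) or rc == r)
    suit_ok = (sc == "a" or (sc == "b" and s in "SC")
               or (sc == "r" and s in "DH") or sc == s)
    return rank_ok and suit_ok


def classify(V, card):
    """Return 'yes', 'no', or 'maybe' depending on how the version space votes."""
    votes = {covers(h, card) for h in V}
    if votes == {True}:
        return "yes"
    if votes == {False}:
        return "no"
    return "maybe"


# Assumed version space after two positive clubs and one negative red card.
V = {("a", "b"), ("n", "b"), ("n", "C"), ("a", "C")}

for card in [("8", "C"), ("6", "D"), ("j", "S")]:
    print(card, classify(V, card))
```

In practice only the boundaries need to be consulted: a card covered by every member of S is a "yes", and a card covered by no member of G is a "no".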
23
Example: G-/S-Boundaries of V
[Figure: the four remaining hypotheses ab, nb, n♣, a♣]
Let 2♠ be the next (positive) example
24
Example: G-/S-Boundaries of V
[Figure: the two remaining hypotheses ab, nb]
Let j♠ be the next (negative) example
25
Example: G-/S-Boundaries of V
Remaining hypothesis: nb, i.e. NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])
Positive examples (+): 4♣, 7♣, 2♠
Negative examples (-): 5♥, j♠
26
Example: G-/S-Boundaries of V
[Figure: the four hypotheses ab, nb, n♣, a♣]
Let us return to the version space … and let 8♣ be the next (negative) example
The only most specific hypothesis disagrees with this example, hence no hypothesis in H agrees with all examples
27
Example: G-/S-Boundaries of V
[Figure: the four hypotheses ab, nb, n♣, a♣]
Let us return to the version space … and let j♥ be the next (positive) example
The only most general hypothesis disagrees with this example, hence no hypothesis in H agrees with all examples
28
Version Space Update
1. x ← new example
2. If x is positive then (G,S) ← POSITIVE-UPDATE(G,S,x)
3. Else (G,S) ← NEGATIVE-UPDATE(G,S,x)
4. If G or S is empty then return failure
29
POSITIVE-UPDATE(G,S,x)
1. Eliminate all hypotheses in G that do not agree with x
2. Minimally generalize all hypotheses in S until they are consistent with x (using the generalization sets of the hypotheses)
3. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G (this step was not needed in the card example)
4. Remove from S every hypothesis that is more general than another hypothesis in S
5. Return (G,S)
33
NEGATIVE-UPDATE(G,S,x)
1. Eliminate all hypotheses in S that do not agree with x
2. Minimally specialize all hypotheses in G until they are consistent with x
3. Remove from G every hypothesis that is neither more general than nor equal to a hypothesis in S
4. Remove from G every hypothesis that is more specific than another hypothesis in G
5. Return (G,S)
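The two update procedures can be sketched on the card domain as follows. This is a brute-force illustration under stated assumptions, not the slides' implementation: minimal generalizations and specializations are found by scanning all of H and comparing extensions instead of walking generalization sets step by step, suits are encoded as `S`, `C`, `D`, `H`, and the suits of the example cards are assumed.

```python
from itertools import product

FACES = {"j", "q", "k"}
RANKS = [str(i) for i in range(1, 11)] + ["j", "q", "k"]
SUITS = ["S", "C", "D", "H"]  # spades, clubs, diamonds, hearts (assumed encoding)
CARDS = list(product(RANKS, SUITS))
H = list(product(["a", "n", "f"] + RANKS, ["a", "b", "r"] + SUITS))


def covers(h, card):
    (rc, sc), (r, s) = h, card
    rank_ok = (rc == "a" or (rc == "n" and r not in FACES)
               or (rc == "f" and r in FACES) or rc == r)
    suit_ok = (sc == "a" or (sc == "b" and s in "SC")
               or (sc == "r" and s in "DH") or sc == s)
    return rank_ok and suit_ok


EXT = {h: frozenset(c for c in CARDS if covers(h, c)) for h in H}


def geq(h1, h2):
    """h1 is more general than or equal to h2."""
    return EXT[h1] >= EXT[h2]


def minimal(hs):
    """Members of hs with no strictly more specific member in hs."""
    return {h for h in hs if not any(EXT[o] < EXT[h] for o in hs)}


def maximal(hs):
    """Members of hs with no strictly more general member in hs."""
    return {h for h in hs if not any(EXT[o] > EXT[h] for o in hs)}


def positive_update(G, S, x):
    G = {g for g in G if covers(g, x)}                                     # step 1
    S = {h for s in S
         for h in minimal({c for c in H if geq(c, s) and covers(c, x)})}   # step 2
    S = {s for s in S if any(geq(g, s) for g in G)}                        # step 3
    return G, minimal(S)                                                   # steps 4-5


def negative_update(G, S, x):
    S = {s for s in S if not covers(s, x)}                                 # step 1
    G = {h for g in G
         for h in maximal({c for c in H if geq(g, c) and not covers(c, x)})}  # step 2
    G = {g for g in G if any(geq(g, s) for s in S)}                        # step 3
    return maximal(G), S                                                   # steps 4-5


def update(G, S, x, positive):
    G, S = positive_update(G, S, x) if positive else negative_update(G, S, x)
    if not G or not S:
        raise ValueError("version space collapsed: no hypothesis fits all examples")
    return G, S


def run(examples):
    G = {("a", "a")}                                  # most general hypothesis
    S = {(r, s) for r in RANKS for s in SUITS}        # single-card hypotheses
    for card, label in examples:
        G, S = update(G, S, card, label)
    return G, S


examples = [(("4", "C"), True), (("7", "C"), True), (("5", "H"), False),
            (("2", "S"), True), (("j", "S"), False)]
print(run(examples))
```

Scanning H is only viable because this hypothesis space is tiny; with a large H one would generate generalizations and specializations locally from the ordering, as the slides describe.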
34
Example-Selection Strategy
Suppose that at each step the learning procedure can select the object (card) of the next example
Let it pick the object such that, whether the example turns out to be positive or negative, one-half of the remaining hypotheses are eliminated
Then a single hypothesis will be isolated in O(log |H|) steps
35
Example
[Figure: the six hypotheses aa, na, ab, nb, n♣, a♣]
Candidate cards: 9?, j?, j?
36
Example-Selection Strategy (continued)
But picking the object that eliminates half the version space may be expensive
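One way to read the selection rule is: for each candidate card, count how many hypotheses in the version space cover it, and choose the card whose count is closest to half. The sketch below does this by exhaustive search, which also shows where the cost comes from (every card is tested against every hypothesis). The version space used here, the suit letters, and the name `best_query` are assumptions for illustration.

```python
from itertools import product

FACES = {"j", "q", "k"}
RANKS = [str(i) for i in range(1, 11)] + ["j", "q", "k"]
SUITS = ["S", "C", "D", "H"]  # spades, clubs, diamonds, hearts (assumed encoding)
CARDS = list(product(RANKS, SUITS))


def covers(h, card):
    (rc, sc), (r, s) = h, card
    rank_ok = (rc == "a" or (rc == "n" and r not in FACES)
               or (rc == "f" and r in FACES) or rc == r)
    suit_ok = (sc == "a" or (sc == "b" and s in "SC")
               or (sc == "r" and s in "DH") or sc == s)
    return rank_ok and suit_ok


def best_query(V):
    """Card whose label splits V most evenly (ties go to the first card found)."""
    def imbalance(card):
        yes = sum(covers(h, card) for h in V)   # hypotheses predicting "rewarded"
        return abs(2 * yes - len(V))            # 0 means a perfect half/half split
    return min(CARDS, key=imbalance)


# Assumed version space of six hypotheses: {a, n} x {a, b, C}.
V = set(product(["a", "n"], ["a", "b", "C"]))
card = best_query(V)
yes = sum(covers(h, card) for h in V)
print(card, yes, len(V) - yes)
```

Whatever label the chosen card receives, the hypotheses on the losing side of the split are eliminated, so a perfect split removes exactly half of V.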
37
Noise
If some examples are misclassified, the version space may collapse
Possible solution: Maintain several G- and S-boundaries, e.g., consistent with all examples, with all examples but one, etc.
(Exercise: Develop this idea!)
38
Current-Best-Hypothesis Search
Keep one hypothesis at each step
Generalize or specialize the hypothesis at each new example
Details left as an exercise…
39
VSL vs DTL
Decision tree learning (DTL) is more efficient if all examples are given in advance; otherwise, it may produce successive hypotheses, each poorly related to the previous one
Version space learning (VSL) is incremental
DTL can produce simplified hypotheses that do not agree with all examples
DTL has been more widely used in practice
40
Can Inductive Learning Work?
[Figure: example set X with positive (+) and negative (-) examples; training set of size m; hypothesis space H of size |H|; inductive hypothesis h]
f: correct hypothesis
p(x): probability that example x is picked from X
41
Approximately Correct Hypothesis
$h \in H$ is approximately correct (AC) with accuracy $\epsilon$ iff: $\Pr[h(x) \neq f(x)] \leq \epsilon$
where x is an example picked with probability distribution p from X
42
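PAC Learning
Procedure L is Probably Approximately Correct (PAC) with confidence $\delta$ iff: $\Pr[\,\Pr[h(x) \neq f(x)] > \epsilon\,] \leq \delta$
That is, the probability that L returns a hypothesis h that is not approximately correct is at most $\delta$
Can L be PAC?
If yes, how big should the size m of the training set be?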
44
Can L Be PAC?
Let g be an arbitrary element of H that is not approximately correct
Since g is not AC, we have: $\Pr[g(x) \neq f(x)] > \epsilon$
So, since the m training examples are drawn independently, the probability that g is consistent with all the examples in the training set is at most $(1-\epsilon)^m$ …
… and, summing over the at most |H| such hypotheses, the probability that there exists a non-AC hypothesis matching all the examples in the training set is at most $|H|(1-\epsilon)^m$
Therefore, L is PAC if the size m of the training set verifies: $|H|(1-\epsilon)^m \leq \delta$
45
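Size of Training Set
From $|H|(1-\epsilon)^m \leq \delta$ we derive: $m \geq \ln(\delta/|H|) / \ln(1-\epsilon)$
Since $\epsilon < -\ln(1-\epsilon)$ for $0 < \epsilon < 1$, it suffices that: $m \geq \ln(\delta/|H|) / (-\epsilon)$, i.e. $m \geq \ln(|H|/\delta) / \epsilon$
So, m increases logarithmically with the size of the hypothesis space
But how big is |H|?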
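As a numeric illustration of the two bounds, the sketch below evaluates both the exact condition and the simplified sufficient one. The function names and the sample values (|H| = 112 for the card domain, ε = 0.1, δ = 0.05) are assumptions chosen for the example.

```python
import math


def sample_size_exact(h_size, eps, delta):
    """Smallest integer m with |H| * (1 - eps)**m <= delta."""
    return math.ceil(math.log(delta / h_size) / math.log(1 - eps))


def sample_size_simple(h_size, eps, delta):
    """Simplified sufficient bound: m >= ln(|H| / delta) / eps."""
    return math.ceil(math.log(h_size / delta) / eps)


h_size, eps, delta = 112, 0.1, 0.05   # hypothetical values
m_exact = sample_size_exact(h_size, eps, delta)
m_simple = sample_size_simple(h_size, eps, delta)
print(m_exact, m_simple)

# Squaring |H| only doubles the ln|H| term of the simplified bound.
print(sample_size_simple(h_size ** 2, eps, delta))
```

The simplified bound is always at least as large as the exact one, because ε < -ln(1-ε); it is preferred only because it is easier to reason about.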
46
Importance of KIS Bias
If H is the set of all logical sentences with n base predicates, then $|H| = 2^{2^n}$ (one sentence, up to equivalence, for each Boolean function of the $2^n$ truth assignments), and m is exponential in n
If H is the set of all conjunctions of k << n base predicates picked among n predicates, then $|H| = O(n^k)$ and m is logarithmic in n
Importance of choosing a “good” KIS bias
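The two growth rates can be checked by substituting each |H| into the sufficient condition $m \geq \ln(|H|/\delta)/\epsilon$. The worked step below is a sketch; the constant c stands for the factor hidden in the $O(n^k)$ notation and is an assumption introduced only for this derivation.

```latex
\begin{align*}
|H| = 2^{2^n}
  &\;\Longrightarrow\;
  m \ge \frac{1}{\epsilon}\left(2^n \ln 2 + \ln\frac{1}{\delta}\right)
  \text{ suffices (exponential in } n\text{)} \\
|H| \le c\, n^k
  &\;\Longrightarrow\;
  m \ge \frac{1}{\epsilon}\left(k \ln n + \ln c + \ln\frac{1}{\delta}\right)
  \text{ suffices (logarithmic in } n\text{)}
\end{align*}
```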
47
Explanation-Based Learning
KB: background knowledge
Observed knowledge: observations that KB alone does not entail
Inductive learning: find h such that KB and h are consistent, and KB together with h entails the observations
Explanation-based learning: find h such that KB = KB1, KB2, where KB1 ⊨ h, and KB2 together with h entails the observations
Example: Derivatives of functions
KB1 is the general theory
The observed knowledge consists of examples
h defines the derivatives of usual functions
KB2 gives simplification rules
Nothing really new is learnt!
48
Summary
Version space method
Structure of hypothesis space
Generalization/specialization of hypothesis
PAC learning
Explanation-based learning