1
Inductive Learning (2/2) Version Space and PAC Learning
Russell and Norvig: Chapter 18, Sections 18.5 through 18.7; Chapter 19, Sections 19.1 through 19.3. CS121 – Winter 2003
2
Contents
Introduction to inductive learning
Logic-based inductive learning: decision tree method, version space method
Function-based inductive learning: neural nets
PAC learning
3
Inductive Learning Scheme
Example set X: {[A, B, …, CONCEPT]} → training set D (positive and negative examples) → inductive hypothesis h, drawn from hypothesis space H: {[CONCEPT(x) ⇔ S(A, B, …)]}
4
Predicate-Learning Methods
Two predicate-learning methods: decision tree and version space. Both need to provide H with some “structure”; the version space method works with an explicit representation of the hypothesis space H.
5
Version Space Method: V is the version space, initialized to the whole hypothesis space H (V ← H)
For every example x in training set D do:
  Eliminate from V every hypothesis that does not agree with x
  If V is empty then return failure
Return V
But the size of V is enormous!
Idea: define a partial ordering on the hypotheses in H and represent only the upper and lower bounds of V for this ordering.
Compared to the decision tree method, this algorithm is: incremental, least-commitment.
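The naive filtering loop above can be sketched in Python. This is an illustrative sketch, not code from the course: the encoding of hypotheses as (rank-predicate, suit-predicate) pairs and all names (`holds`, `version_space`, the spelled-out suit strings) are assumptions; the hypothesis space is the 16 × 7 = 112-sentence space of the card example introduced on the next slide.

```python
from itertools import product

RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANK_PREDS = ["a", "n", "f"] + [str(r) for r in RANKS]   # ANY-RANK, NUM, FACE, fixed ranks
SUIT_PREDS = ["a", "b", "r"] + [s[0] for s in SUITS]     # ANY-SUIT, BLACK, RED, fixed suits

def holds(h, card):
    """Does hypothesis h = (rank_pred, suit_pred) predict REWARD for the card?"""
    (rp, sp), (r, s) = h, card
    rank_ok = {"a": True, "n": r in RANKS[:10],
               "f": r in ("j", "q", "k")}.get(rp, rp == str(r))
    suit_ok = {"a": True, "b": s in ("spades", "clubs"),
               "r": s in ("hearts", "diamonds")}.get(sp, sp == s[0])
    return rank_ok and suit_ok

def version_space(examples):
    """Naive version-space method: V <- H, then drop every hypothesis
    that disagrees with some example; fail if V empties out."""
    V = set(product(RANK_PREDS, SUIT_PREDS))   # V <- H, |H| = 16 * 7 = 112
    for card, label in examples:
        V = {h for h in V if holds(h, card) == label}
        if not V:
            raise ValueError("no hypothesis in H agrees with all examples")
    return V

# The example sequence used later in the deck: + 4♣, 7♣, 2♠ and - 5♥, j♠.
examples = [((4, "clubs"), True), ((7, "clubs"), True), ((2, "spades"), True),
            ((5, "hearts"), False), (("j", "spades"), False)]
V_final = version_space(examples)
```

On this example sequence the loop isolates the single hypothesis ("n", "b"), i.e. NUM ∧ BLACK, matching the result obtained later with the boundary representation.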
6
Version Space and PAC Learning
Rewarded Card Example
(r=1) v … v (r=10) v (r=J) v (r=Q) v (r=K) ⇔ ANY-RANK(r)
(r=1) v … v (r=10) ⇔ NUM(r)
(r=J) v (r=Q) v (r=K) ⇔ FACE(r)
(s=♠) v (s=♥) v (s=♦) v (s=♣) ⇔ ANY-SUIT(s)
(s=♠) v (s=♣) ⇔ BLACK(s)
(s=♥) v (s=♦) ⇔ RED(s)
An hypothesis is any sentence of the form: R(r) ∧ S(s) ⇒ REWARD([r,s]), where R(r) is ANY-RANK(r), NUM(r), FACE(r), or (r=j), and S(s) is ANY-SUIT(s), BLACK(s), RED(s), or (s=k)
7
Simplified Representation
For simplicity, we represent a concept by rs, with: r = a, n, f, 1, …, 10, j, q, k and s = a, b, r, ♠, ♥, ♦, ♣. For example: n♣ represents NUM(r) ∧ (s=♣) ⇒ REWARD([r,s]), and aa represents ANY-RANK(r) ∧ ANY-SUIT(s) ⇒ REWARD([r,s])
8
Extension of an Hypothesis
The extension of an hypothesis h is the set of objects that verify h. Examples: the extension of f♣ is {j♣, q♣, k♣}; the extension of aa is the set of all cards.
9
More General/Specific Relation
Let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of the extension of h2. Examples: aa is more general than f♣; f♣ is more general than q♣; fr and nr are not comparable.
10
More General/Specific Relation
Let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of the extension of h2. The inverse of the “more general” relation is the “more specific” relation. The “more general” relation defines a partial ordering on the hypotheses in H.
11
Example: Subset of Partial Order
Subset shown: aa, na, ab, nb, a♣, n♣, 4a, 4b, 4♣, ordered by the “more general” relation, with aa at the top and 4♣ at the bottom.
12
Construction of Ordering Relation
Ranks: 1, …, 10 are below n; j, …, k are below f; n and f are below a. Suits: ♠ and ♣ are below b; ♥ and ♦ are below r; b and r are below a.
13
G-Boundary / S-Boundary of V
An hypothesis in V is most general iff no hypothesis in V is more general. G-boundary G of V: set of most general hypotheses in V.
14
G-Boundary / S-Boundary of V
An hypothesis in V is most general iff no hypothesis in V is more general. G-boundary G of V: set of most general hypotheses in V. An hypothesis in V is most specific iff no hypothesis in V is more specific. S-boundary S of V: set of most specific hypotheses in V.
15
Example: G-/S-Boundaries of V
Initially, G = {aa} and S contains all the most specific hypotheses: 1♠, …, k♦. Now suppose that 4♣ is given as a positive example. We replace every hypothesis in S whose extension does not contain 4♣ by its generalization set.
16
Example: G-/S-Boundaries of V
After this update, G = {aa} and S = {4♣}. Here, both G and S have size 1. This is not the case in general! (The hypotheses between them include na, ab, nb, a♣, n♣, 4a, 4b.)
17
Example: G-/S-Boundaries of V
The generalization set of an hypothesis h is the set of the hypotheses that are immediately more general than h. For example, the generalization set of 4♣ is {n♣, 4b}. Let 7♣ be the next (positive) example.
18
Example: G-/S-Boundaries of V
With the positive example 7♣, 4♣ is minimally generalized: neither 4b nor 4a contains 7♣, so S becomes {n♣}.
19
Example: G-/S-Boundaries of V
Let 5♥ be the next (negative) example. G = {aa} must now be specialized: the specialization set of aa contains the hypotheses immediately more specific than aa. The minimal specialization that excludes 5♥ and is still more general than S = {n♣} is ab, so G becomes {ab} and V = {ab, nb, a♣, n♣}.
20
Example: G-/S-Boundaries of V
G and S, and all hypotheses in between, form exactly the version space (here V = {ab, nb, a♣, n♣}): 1. If an hypothesis between G and S disagreed with an example x, then an hypothesis in G or S would also disagree with x, and hence would have been removed.
21
Example: G-/S-Boundaries of V
G and S, and all hypotheses in between, form exactly the version space: 2. If there were an hypothesis not in this set which agreed with all examples, then it would have to be either no more specific than any member of G (but then it would be in G) or no more general than some member of S (but then it would be in S).
22
Example: G-/S-Boundaries of V
At this stage, V = {ab, nb, a♣, n♣}. Do 8♥, 6♣, j♠ satisfy CONCEPT? 8♥: No (every hypothesis in V excludes it). 6♣: Yes (every hypothesis in V includes it). j♠: Maybe (ab includes it, the others do not).
23
Example: G-/S-Boundaries of V
V is still {ab, nb, a♣, n♣}. Let 2♠ be the next (positive) example.
24
Example: G-/S-Boundaries of V
After 2♠, a♣ and n♣ are eliminated and S becomes {nb}, so V = {ab, nb}. Let j♠ be the next (negative) example.
25
Example: G-/S-Boundaries of V
Positive examples: 4♣, 7♣, 2♠. Negative examples: 5♥, j♠. The version space converges to the single hypothesis nb: NUM(r) ∧ BLACK(s) ⇒ REWARD([r,s])
26
Example: G-/S-Boundaries of V
Let us return to the version space {ab, nb, a♣, n♣} … and let 8♣ be the next (negative) example. The only most specific hypothesis (n♣) disagrees with this example, hence no hypothesis in H agrees with all examples.
27
Example: G-/S-Boundaries of V
Let us return to the version space {ab, nb, a♣, n♣} … and let j♥ be the next (positive) example. The only most general hypothesis (ab) disagrees with this example, hence no hypothesis in H agrees with all examples.
28
Version Space Update
x ← new example
If x is positive then (G,S) ← POSITIVE-UPDATE(G,S,x)
Else (G,S) ← NEGATIVE-UPDATE(G,S,x)
If G or S is empty then return failure
29
POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x
30
POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x. Minimally generalize all hypotheses in S until they are consistent with x, using the generalization sets of the hypotheses.
31
POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x. Minimally generalize all hypotheses in S until they are consistent with x. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G. (This step was not needed in the card example.)
32
POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x. Minimally generalize all hypotheses in S until they are consistent with x. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G. Remove from S every hypothesis that is more general than another hypothesis in S. Return (G,S).
33
NEGATIVE-UPDATE(G,S,x)
Eliminate all hypotheses in S that do not agree with x. Minimally specialize all hypotheses in G until they are consistent with x. Remove from G every hypothesis that is neither more general than nor equal to a hypothesis in S. Remove from G every hypothesis that is more specific than another hypothesis in G. Return (G,S).
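The update cycle above can be simulated end to end on the card example. This sketch deliberately sidesteps the symbolic generalization/specialization sets: because the toy hypothesis space is small enough to enumerate, each update simply filters the enumerated V and re-derives G and S from it, which yields the same boundaries. All names and the (rank-predicate, suit-predicate) encoding are illustrative assumptions.

```python
from functools import lru_cache
from itertools import product

RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANK_PREDS = ["a", "n", "f"] + [str(r) for r in RANKS]   # ANY-RANK, NUM, FACE, fixed ranks
SUIT_PREDS = ["a", "b", "r"] + [s[0] for s in SUITS]     # ANY-SUIT, BLACK, RED, fixed suits

def holds(h, card):
    (rp, sp), (r, s) = h, card
    rank_ok = {"a": True, "n": r in RANKS[:10],
               "f": r in ("j", "q", "k")}.get(rp, rp == str(r))
    suit_ok = {"a": True, "b": s in ("spades", "clubs"),
               "r": s in ("hearts", "diamonds")}.get(sp, sp == s[0])
    return rank_ok and suit_ok

@lru_cache(maxsize=None)
def extension(h):
    return frozenset((r, s) for r in RANKS for s in SUITS if holds(h, (r, s)))

def update(V, card, label):
    """One positive/negative update, done by filtering the enumerated V and
    re-deriving the boundaries G (most general) and S (most specific)."""
    V = {h for h in V if holds(h, card) == label}
    if not V:
        raise ValueError("failure: G or S would become empty")
    G = {h for h in V if not any(extension(g) > extension(h) for g in V)}
    S = {h for h in V if not any(extension(h) > extension(s2) for s2 in V)}
    return V, G, S

V = set(product(RANK_PREDS, SUIT_PREDS))   # V <- H, |H| = 112
history = []
for card, label in [((4, "clubs"), True), ((7, "clubs"), True),
                    ((5, "hearts"), False), ((2, "spades"), True),
                    (("j", "spades"), False)]:
    V, G, S = update(V, card, label)
    history.append((G, S))
```

Running it reproduces the boundaries traced in the slides: after 4♣, G = {aa} and S = {4♣}; after 5♥, G = {ab} and S = {n♣}; after all five examples, G = S = {nb}.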
34
Example-Selection Strategy
Suppose that at each step the learning procedure can select the object (card) of the next example. Let it pick the object such that, whether the example turns out positive or negative, it will eliminate one-half of the remaining hypotheses. Then a single hypothesis will be isolated in O(log |H|) steps.
35
Example: with V = {aa, na, ab, nb, a♣, n♣}, a well-chosen query such as “j♣?” splits V exactly in half whichever way it is answered.
36
Example-Selection Strategy
Suppose that at each step the learning procedure can select the object (card) of the next example. Let it pick the object such that, whether the example turns out positive or negative, it will eliminate one-half of the remaining hypotheses. Then a single hypothesis will be isolated in O(log |H|) steps. But picking the object that eliminates half the version space may be expensive.
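The greedy form of this strategy (pick the card whose worst-case elimination is largest) is easy to sketch by brute force, which also makes the cost concern concrete: every candidate card is scored against every hypothesis in V. Names and the hypothesis encoding are illustrative assumptions, as in the earlier sketches.

```python
from itertools import product

RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]

def holds(h, card):
    (rp, sp), (r, s) = h, card
    rank_ok = {"a": True, "n": r in RANKS[:10],
               "f": r in ("j", "q", "k")}.get(rp, rp == str(r))
    suit_ok = {"a": True, "b": s in ("spades", "clubs"),
               "r": s in ("hearts", "diamonds")}.get(sp, sp == s[0])
    return rank_ok and suit_ok

def worst_case_eliminated(V, card):
    """Hypotheses guaranteed to be eliminated by querying this card,
    whichever way the example turns out."""
    pos = sum(holds(h, card) for h in V)
    return min(pos, len(V) - pos)

def best_query(V):
    """Greedy halving: scan all 52 cards for the best worst-case split."""
    return max(product(RANKS, SUITS), key=lambda c: worst_case_eliminated(V, c))
```

For V = {ab, a♣, nb, n♣}, the 9♠ query splits V into two halves (ab and nb predict reward; a♣ and n♣ do not), while 5♥ eliminates nothing in the worst case.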
37
Noise: if some examples are misclassified, the version space may collapse. Possible solution: maintain several G- and S-boundaries, e.g., consistent with all examples, with all examples but one, etc. (Exercise: develop this idea!)
38
Current-Best-Hypothesis Search
Keep one hypothesis at each step. Generalize or specialize the hypothesis at each new example. Details left as an exercise…
39
VSL vs DTL: Decision tree learning (DTL) is more efficient if all examples are given in advance; used incrementally, it may produce successive hypotheses, each poorly related to the previous one. Version space learning (VSL) is incremental. DTL can produce simplified hypotheses that do not agree with all examples. DTL has been more widely used in practice.
40
Can Inductive Learning Work?
Example set X, with p(x) = probability that example x is picked from X; f: correct hypothesis; training set D of size m (positive and negative examples); hypothesis space H of size |H|, from which the inductive hypothesis h is drawn
41
Approximately Correct Hypothesis
h ∈ H is approximately correct (AC) with accuracy ε iff: Pr[h(x) ≠ f(x)] ≤ ε, where x is an example picked with probability distribution p from X
42
PAC Learning Procedure
L is Provably Approximately Correct (PAC) with confidence δ iff: Pr[ Pr[h(x) ≠ f(x)] > ε ] ≤ δ. Can L be PAC? If yes, how big should the size m of the training set D be?
43
Can L Be PAC? Let g be an arbitrary element of H that is not approximately correct. Since g is not AC, we have: Pr[g(x) ≠ f(x)] > ε. So, the probability that g is consistent with all the examples in D is at most (1−ε)^m … … and the probability that there exists a non-AC hypothesis matching all the examples in D is at most |H|(1−ε)^m.
44
Can L Be PAC? Let g be an arbitrary element of H that is not approximately correct. Since g is not AC, we have: Pr[g(x) ≠ f(x)] > ε. So, the probability that g is consistent with all the examples in D is at most (1−ε)^m … … and the probability that there exists a non-AC hypothesis matching all the examples in D is at most |H|(1−ε)^m. Therefore, L is PAC if the size m of the training set verifies: |H|(1−ε)^m ≤ δ.
45
Size of Training Set: from |H|(1−ε)^m ≤ δ we derive: m ≥ ln(δ/|H|) / ln(1−ε). Since ε < −ln(1−ε) for 0 < ε < 1, it suffices that: m ≥ ln(δ/|H|) / (−ε) = ln(|H|/δ) / ε. So, m increases logarithmically with the size of the hypothesis space. But how big is |H|?
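The sufficient bound m ≥ ln(|H|/δ)/ε is easy to evaluate numerically. A small sketch (the function name and the choice of ε = 0.1, δ = 0.05 are illustrative assumptions; |H| = 112 is the size of the card hypothesis space from slide 6):

```python
from math import ceil, log

def pac_sample_bound(h_size, eps, delta):
    """Sufficient training-set size from the bound m >= ln(|H|/delta) / eps."""
    return ceil(log(h_size / delta) / eps)

# Card example: |H| = 16 rank predicates * 7 suit predicates = 112 hypotheses,
# accuracy eps = 0.1, confidence delta = 0.05.
m = pac_sample_bound(112, 0.1, 0.05)
```

With these numbers, m = 78 examples suffice, and one can check directly that the original requirement |H|(1−ε)^m ≤ δ is then met.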
46
Importance of KIS Bias: if H is the set of all logical sentences with n base predicates, then |H| = 2^(2^n), and m is exponential in n. If H is the set of all conjunctions of k << n base predicates picked among the n predicates, then |H| = O(n^k) and m is logarithmic in n. Hence the importance of choosing a “good” KIS (keep it simple) bias.
47
Explanation-Based Learning
KB: background knowledge. D: observed knowledge, such that KB ⊭ D. Inductive learning: find h such that KB and h are consistent and KB,h ⊨ D. Explanation-based learning: find h such that KB = KB1,KB2 with KB1 ⊨ h and KB2,h ⊨ D. Example: derivatives of functions. KB1 is the general theory; D consists of examples; h defines the derivatives of usual functions; KB2 gives simplification rules. Nothing really new is learnt!
48
Version Space and PAC Learning
Summary
Version space method
Structure of hypothesis space
Generalization/specialization of hypotheses
PAC learning
Explanation-based learning