Slide 1: Can Inductive Learning Work?

[Figure: the example set X, containing positive (+) and negative (−) examples.]

- Hypothesis space H, of size |H|
- Training set D ⊆ X, of size m
- Inductive hypothesis h: a hypothesis that agrees with all examples in the training set D
- p(x): probability that example x is picked from X
- L: the learning algorithm
Slide 2: Approximately Correct Hypothesis

h ∈ H is approximately correct (AC) with accuracy ε iff:

  Pr[h(x) correct] > 1 − ε

where x is an example picked with probability distribution p from X.
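The AC definition can be checked empirically by drawing a sample from p and comparing the observed accuracy against 1 − ε. A minimal sketch, in which the target concept, the hypothesis h, and the uniform distribution on X = [0, 1) are all hypothetical stand-ins, not anything from the slides:

```python
import random

def seems_approximately_correct(h, target, sample, eps):
    """Estimate Pr[h(x) correct] from a sample drawn from p,
    and compare the estimate against the 1 - eps threshold."""
    correct = sum(1 for x in sample if h(x) == target(x))
    return correct / len(sample) > 1 - eps

# Hypothetical toy concept on X = [0, 1): h disagrees with the target
# only on (0.5, 0.55], a region of probability mass 0.05 under p.
target = lambda x: x > 0.5
h = lambda x: x > 0.55

random.seed(0)
sample = [random.random() for _ in range(10_000)]
print(seems_approximately_correct(h, target, sample, eps=0.1))   # error ~0.05 < 0.1
print(seems_approximately_correct(h, target, sample, eps=0.01))  # error ~0.05 > 0.01
```

The same h is AC for ε = 0.1 but not for ε = 0.01, which illustrates that approximate correctness is always relative to the chosen accuracy ε.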
Slide 3: PAC Learning Algorithm

A learning algorithm L is Provably Approximately Correct (PAC) with confidence 1 − δ iff the probability that it generates a non-AC hypothesis h is at most δ:

  Pr[h is non-AC] ≤ δ

Can L be PAC if the size m of the training set is large enough? If yes, how big should m be?
Slide 4: Intuition

If m is large enough and g ∈ H is not AC, it is unlikely that g agrees with all examples in the training set D. So, if m is large enough, there should be few non-AC hypotheses that agree with all examples in D. Hence, it is unlikely that L will pick one.
Slide 5: Can L Be PAC?

- Let g be an arbitrary hypothesis in H that is not approximately correct.
- Since g is not AC, we have: Pr[g(x) correct] ≤ 1 − ε
- The probability that g is consistent with all m examples in D is therefore at most (1 − ε)^m.
- The probability that there exists a non-AC hypothesis matching all examples in D is at most |H|(1 − ε)^m.
- Therefore, L is PAC if m satisfies: |H|(1 − ε)^m ≤ δ

(Recall: L is PAC iff Pr[h is non-AC] ≤ δ, and h ∈ H is AC iff Pr[h(x) correct] > 1 − ε.)
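The bound |H|(1 − ε)^m on the probability that some non-AC hypothesis survives all m examples can be sanity-checked with a small Monte Carlo simulation. This is an illustrative sketch under an extra assumption the slides do not make: the bad hypotheses err independently of one another, each with error exactly ε (the worst case for a non-AC hypothesis). All parameter values are made up:

```python
import random

random.seed(1)
eps, m, num_bad, trials = 0.2, 20, 50, 2000

def some_bad_hypothesis_survives():
    """Does at least one of num_bad hypotheses, each wrong with
    probability eps on each example, agree with all m examples?"""
    return any(
        all(random.random() > eps for _ in range(m))
        for _ in range(num_bad)
    )

empirical = sum(some_bad_hypothesis_survives() for _ in range(trials)) / trials
union_bound = num_bad * (1 - eps) ** m
print(f"empirical: {empirical:.3f}  union bound: {union_bound:.3f}")
```

The empirical frequency stays below |H|(1 − ε)^m, as the union bound guarantees regardless of whether the hypotheses' errors are independent.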
Slide 6: Calculus

H = {h1, h2, …, h|H|}

  Pr(hi is non-AC and agrees with D) ≤ (1 − ε)^m

  Pr(h1, or h2, …, is non-AC and agrees with D)
    ≤ Σ_{i=1,…,|H|} Pr(hi is non-AC and agrees with D)
    ≤ |H| (1 − ε)^m
Slide 7: Size of Training Set

From |H|(1 − ε)^m ≤ δ we derive:

  m ≥ ln(δ/|H|) / ln(1 − ε)

Since ε < −ln(1 − ε) for 0 < ε < 1, it is sufficient that:

  m ≥ ln(δ/|H|) / (−ε) = ln(|H|/δ) / ε

So m increases only logarithmically with the size of the hypothesis space. But how big is |H|?
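The bound m ≥ ln(|H|/δ)/ε translates directly into a sample-size calculator. A minimal sketch; the numeric example is illustrative, not from the slides:

```python
import math

def pac_sample_size(hypothesis_space_size, eps, delta):
    """Smallest integer m with m >= ln(|H|/delta) / eps, which is
    sufficient for L to be PAC with accuracy eps and confidence 1 - delta."""
    return math.ceil(math.log(hypothesis_space_size / delta) / eps)

# e.g. |H| = 10^6 hypotheses, accuracy eps = 0.05, confidence 99% (delta = 0.01)
print(pac_sample_size(10**6, eps=0.05, delta=0.01))  # 369
```

Note how forgiving the dependence on |H| and δ is: squaring |H| or halving δ only adds a constant number of examples per unit of 1/ε.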
Slide 8: Importance of KIS Bias

- If H is the set of all logical sentences over n observable predicates, then |H| = 2^(2^n), and m is exponential in n.
- If H is the set of all conjunctions of k << n observable predicates picked among the n predicates, then |H| = O(n^k) and m is logarithmic in n.
- Hence the importance of choosing a "good" KIS (Keep It Simple) bias.
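The contrast between the two hypothesis spaces can be made concrete by plugging ln|H| into the bound m ≥ (ln|H| + ln(1/δ))/ε (working with ln|H| avoids ever forming the astronomically large number 2^(2^n)). The choices ε = 0.1, δ = 0.05, and k = 3 are illustrative:

```python
import math

eps, delta, k = 0.1, 0.05, 3

def m_bound(ln_H):
    """Sufficient sample size: m >= (ln|H| + ln(1/delta)) / eps."""
    return math.ceil((ln_H + math.log(1 / delta)) / eps)

for n in (5, 10, 20):
    ln_all_sentences = (2 ** n) * math.log(2)  # |H| = 2^(2^n): ln|H| = 2^n ln 2
    ln_conjunctions = k * math.log(n)          # |H| = O(n^k):  ln|H| ~ k ln n
    print(n, m_bound(ln_all_sentences), m_bound(ln_conjunctions))
```

Already at n = 20, the unrestricted space demands millions of examples while the conjunctive space needs on the order of a hundred, which is the quantitative content of the KIS bias.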