1
Computational Learning Theory
In the Name of God
Machine Learning: Computational Learning Theory
Mohammad Ali Keyvanrad
Thanks to: M. Soleymani (Sharif University of Technology), Tom Mitchell (Carnegie Mellon University)
Fall 1392
2
Outline
  Computational Learning Theory
  PAC learning theorem
  VC dimension
3
Computational Learning Theory
We want a theory that relates:
  Number of training examples
  Complexity of the hypothesis space
  Accuracy to which the target function is approximated
  Probability that the learner outputs a successful hypothesis
4
Learning scenarios
Learner proposes instances as queries to the teacher?
  Learner proposes $x$, teacher provides $c(x)$
Teacher (who knows $c(x)$) proposes training examples?
  Teacher proposes the sequence $\{(x_1, c(x_1)), \ldots, (x_m, c(x_m))\}$
Instances drawn according to $P(X)$
5
Sample Complexity
How good is the classifier, really?
How much data do I need to make it "good enough"?
6
Problem settings
Set of all instances $X$
Set of hypotheses $H$
Set of possible target functions $C = \{c : X \to \{0,1\}\}$
Sequence of $m$ training instances $D = \{(x_i, c(x_i))\}_{i=1}^{m}$
  $x$ drawn at random from an unknown distribution $P(X)$
  Teacher provides a noise-free label $c(x)$ for it
Learner observes a set of training examples $D$ for target function $c$ and outputs a hypothesis $h \in H$ estimating $c$
Goal: with high probability ("probably"), the selected function will have low generalization error ("approximately correct")
7
True error of a hypothesis
True error of $h$: the probability that it will misclassify an example drawn at random from $P(X)$:
$\mathrm{error}_{\mathrm{true}}(h) = \Pr_{x \sim P(X)}[h(x) \neq c(x)]$
8
Two notions of error
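To make the two notions concrete, the sketch below compares the error measured on a small training sample with a Monte Carlo estimate of the true error. Everything in it is an illustrative assumption rather than material from the slides: the target `c` (threshold at 0.5), the hypothesis `h` (threshold at 0.6), and the uniform distribution on [0, 1] standing in for $P(X)$. Under these assumptions the true error is exactly 0.1, the probability mass of the interval [0.5, 0.6).

```python
import random


def c(x):
    """Hypothetical target concept: positive when x >= 0.5."""
    return x >= 0.5


def h(x):
    """Hypothetical learned hypothesis: positive when x >= 0.6."""
    return x >= 0.6


def error_rate(hyp, target, xs):
    """Fraction of the instances in xs on which hyp disagrees with target."""
    return sum(hyp(x) != target(x) for x in xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Training error: measured on a small sample D of m = 20 instances.
train_xs = [rng.random() for _ in range(20)]
train_err = error_rate(h, c, train_xs)

# True error: approximated with a very large fresh sample from the same
# (assumed uniform) distribution.
fresh_xs = [rng.random() for _ in range(200_000)]
true_err_estimate = error_rate(h, c, fresh_xs)

print("training error:", train_err)
print("estimated true error:", true_err_estimate)
```

On only 20 points the training error can differ noticeably from 0.1, and that gap is what the bounds on the later slides try to control.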
9
Overfitting
Consider a hypothesis $h$ and its
  Error rate over training data: $\mathrm{error}_{\mathrm{train}}(h)$
  True error rate over all data: $\mathrm{error}_{\mathrm{true}}(h)$
We say $h$ overfits the training data if $\mathrm{error}_{\mathrm{true}}(h) > \mathrm{error}_{\mathrm{train}}(h)$
Amount of overfitting: $\mathrm{error}_{\mathrm{true}}(h) - \mathrm{error}_{\mathrm{train}}(h)$
Can we bound $\mathrm{error}_{\mathrm{true}}(h)$ in terms of $\mathrm{error}_{\mathrm{train}}(h)$?
10
Problem setting
Classification
  $D$: $m$ i.i.d. data points that are labeled
  Finite number of possible hypotheses (e.g., decision trees of depth $d_0$)
  A learner finds a hypothesis $h$ that is consistent with the training data: $\mathrm{error}_{\mathrm{train}}(h) = 0$
What is the probability that the true error of $h$ will be more than $\epsilon$? $\mathrm{error}_{\mathrm{true}}(h) \ge \epsilon$
11
How likely is a learner to pick a bad hypothesis?
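Bound on the probability that any consistent learner will output $h$ with $\mathrm{error}_{\mathrm{true}}(h) > \epsilon$
Theorem [Haussler, 1988]: For target concept $c$ and any $0 \le \epsilon \le 1$, if $H$ is finite and $D$ contains $m \ge 1$ independent random samples, then
$P\big(\exists h \in H : \mathrm{error}_{\mathrm{train}}(h) = 0 \wedge \mathrm{error}_{\mathrm{true}}(h) > \epsilon\big) \le |H| e^{-\epsilon m}$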
12
Proof
13
Proof (Cont'd)
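The body of the proof slides is not reproduced in this transcript. The standard argument for a bound of this kind, sketched here as a reminder rather than as the slides' exact wording, fixes one bad hypothesis and then applies a union bound over all of them. The symbol $k$ (the number of hypotheses in $H$ with true error above $\epsilon$) is introduced only for this sketch.

```latex
\begin{align*}
&\text{Fix } h \in H \text{ with } \mathrm{error}_{\mathrm{true}}(h) > \epsilon. \\
&P(h \text{ is consistent with one random example})
   = 1 - \mathrm{error}_{\mathrm{true}}(h) < 1 - \epsilon \\
&P(h \text{ is consistent with } m \text{ i.i.d. examples})
   \le (1 - \epsilon)^m \\
&P(\text{some bad } h \text{ is consistent with } D)
   \le k\,(1 - \epsilon)^m          && \text{(union bound over the } k \text{ bad hypotheses)} \\
&\qquad \le |H|\,(1 - \epsilon)^m   && (k \le |H|) \\
&\qquad \le |H|\,e^{-\epsilon m}    && (1 - \epsilon \le e^{-\epsilon})
\end{align*}
```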
14
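PAC Bound
Theorem [Haussler '88]: Consider a finite hypothesis space $H$, a training set $D$ with $m$ i.i.d. samples, and $0 < \epsilon < 1$. For any learned hypothesis $h \in H$ that is consistent on the training set $D$:
$P(\mathrm{error}_{\mathrm{true}}(h) > \epsilon) \le |H| e^{-\epsilon m}$
If $\mathrm{error}_{\mathrm{train}}(h) = 0$, then with probability at least $(1 - \delta)$:
$\mathrm{error}_{\mathrm{true}}(h) \le \frac{1}{m}\left(\ln|H| + \ln\frac{1}{\delta}\right)$
PAC: Probably Approximately Correct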
15
PAC Bound Sample Complexity
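How many training examples suffice? Requiring $|H| e^{-\epsilon m} \le \delta$ and solving:
Given $\epsilon$ and $\delta$, yields sample complexity: $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$
Given $m$ and $\delta$, yields error bound: $\mathrm{error}_{\mathrm{true}}(h) \le \frac{1}{m}\left(\ln|H| + \ln\frac{1}{\delta}\right)$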
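The two readings of the bound can be turned into a small calculator. This is a minimal sketch assuming the standard finite-hypothesis-space bound for consistent learners, $|H| e^{-\epsilon m} \le \delta$; the function names `pac_sample_complexity` and `pac_error_bound` are made up for illustration.

```python
import math


def pac_sample_complexity(h_size, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta)).

    Assumes a finite hypothesis space of size h_size and a learner that
    outputs a hypothesis consistent with the training data.
    """
    if h_size < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need |H| >= 1, 0 < epsilon < 1, 0 < delta < 1")
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)


def pac_error_bound(h_size, m, delta):
    """Error level epsilon guaranteed with probability >= 1 - delta after m samples."""
    if h_size < 1 or m < 1 or not 0 < delta < 1:
        raise ValueError("need |H| >= 1, m >= 1, 0 < delta < 1")
    return (math.log(h_size) + math.log(1.0 / delta)) / m


# Example with made-up numbers: |H| = 1000, epsilon = 0.1, delta = 0.05.
m_needed = pac_sample_complexity(1000, 0.1, 0.05)
eps_at_m = pac_error_bound(1000, m_needed, 0.05)
print(m_needed, eps_at_m)
```

Note that the required sample size grows only logarithmically in $|H|$ and $1/\delta$, but linearly in $1/\epsilon$.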
16
Example: Conjunction of up to $d$ Boolean literals
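The slide body for this example is not reproduced here. A common way to work it, assumed in the sketch below, is to count $|H| = 3^d$ (each of the $d$ Boolean variables appears as a positive literal, as a negated literal, or not at all) and plug that into the sample complexity bound for consistent learners. The function name and the numbers in the example call are illustrative.

```python
import math


def conjunction_sample_complexity(d, epsilon, delta):
    """Samples sufficient to PAC-learn conjunctions over d Boolean variables.

    Assumes |H| = 3**d, so ln|H| = d * ln(3), and uses
    m >= (1/epsilon) * (ln|H| + ln(1/delta)).
    """
    if d < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need d >= 1, 0 < epsilon < 1, 0 < delta < 1")
    return math.ceil((d * math.log(3) + math.log(1.0 / delta)) / epsilon)


# Illustrative numbers: d = 10 variables, epsilon = 0.1, delta = 0.05.
m_example = conjunction_sample_complexity(10, 0.1, 0.05)
print(m_example)
```

Under this count the bound is linear in $d$, even though the hypothesis space itself is exponentially large.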
17
Agnostic learning
18
Hoeffding bounds: Agnostic learning
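The slide body is not reproduced in this transcript. As a hedged reminder, the usual one-sided form of Hoeffding's inequality in this setting, for a single fixed hypothesis $h$ evaluated on $m$ i.i.d. examples, is:

```latex
P\big(\mathrm{error}_{\mathrm{true}}(h) > \mathrm{error}_{\mathrm{train}}(h) + \epsilon\big)
  \le e^{-2 m \epsilon^{2}}
```

Unlike the consistent-learner case, the exponent depends on $\epsilon^2$ rather than $\epsilon$, because the training error is no longer assumed to be zero.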
19
PAC bound: Agnostic learning
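The slide body is not reproduced here. A minimal sketch of the usual agnostic bound for a finite hypothesis space follows, assuming Hoeffding's inequality combined with a union bound over $H$, that is $|H| e^{-2m\epsilon^2} \le \delta$. Here $\epsilon$ bounds the gap between true and training error rather than the true error itself; the function names are illustrative.

```python
import math


def agnostic_sample_complexity(h_size, epsilon, delta):
    """Smallest integer m with m >= (1 / (2 * epsilon**2)) * (ln|H| + ln(1/delta)).

    With this many i.i.d. samples, every h in a finite H satisfies
    error_true(h) <= error_train(h) + epsilon with probability >= 1 - delta
    (Hoeffding plus union bound; training error need not be zero).
    """
    if h_size < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need |H| >= 1, 0 < epsilon < 1, 0 < delta < 1")
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / (2 * epsilon ** 2))


def agnostic_gap_bound(h_size, m, delta):
    """Bound on error_true(h) - error_train(h) holding with probability >= 1 - delta."""
    if h_size < 1 or m < 1 or not 0 < delta < 1:
        raise ValueError("need |H| >= 1, m >= 1, 0 < delta < 1")
    return math.sqrt((math.log(h_size) + math.log(1.0 / delta)) / (2 * m))


# Illustrative numbers: |H| = 1000, epsilon = 0.1, delta = 0.05.
m_agnostic = agnostic_sample_complexity(1000, 0.1, 0.05)
gap = agnostic_gap_bound(1000, m_agnostic, 0.05)
print(m_agnostic, gap)
```

For the same $|H|$, $\epsilon$, and $\delta$, the agnostic requirement is larger than the consistent-learner one by a factor of roughly $1/(2\epsilon)$.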
20
Limitation of the bounds
21
Shattering a set of instances
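The slide body is not reproduced here. Under the usual definition, a set of instances is shattered by $H$ if every possible labeling of the set is produced by some hypothesis in $H$. The sketch below checks this by brute force for a small, finite hypothesis class; the class used (closed intervals on the real line with endpoints on a grid) and all names are assumptions chosen for illustration.

```python
def shatters(hypotheses, points):
    """Return True if every labeling of `points` is realized by some hypothesis.

    `points` must contain distinct instances; each hypothesis maps an
    instance to True or False.
    """
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)


def make_interval(a, b):
    """Hypothesis that labels x positive exactly when a <= x <= b."""
    return lambda x: a <= x <= b


# Hypothetical class: intervals [a, b] with endpoints on a grid from 0 to 5.
# Pairs with a > b give the empty interval (everything labeled negative).
grid = [0.5 * k for k in range(11)]
intervals = [make_interval(a, b) for a in grid for b in grid]

two_points_shattered = shatters(intervals, [1.0, 2.0])
three_points_shattered = shatters(intervals, [1.0, 2.0, 3.0])
print(two_points_shattered, three_points_shattered)
```

Two points can be shattered by intervals, but three cannot, because no single interval labels the outer two points positive and the middle one negative.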
22
Vapnik-Chervonenkis (VC) dimension
23
PAC bound using VC
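The slide body is not reproduced here, so the exact form the deck uses is unknown. One commonly cited sample complexity bound in terms of the VC dimension, given in Tom Mitchell's textbook after Blumer et al. (1989), is $m \ge \frac{1}{\epsilon}\left(4 \log_2\frac{2}{\delta} + 8\,VC(H) \log_2\frac{13}{\epsilon}\right)$. The sketch below evaluates that form; treat it as an assumption about which bound is meant, and the function name as illustrative.

```python
import math


def vc_sample_complexity(vc_dim, epsilon, delta):
    """Smallest integer m with
    m >= (1/epsilon) * (4 * log2(2/delta) + 8 * vc_dim * log2(13/epsilon)).

    This is the form quoted in Mitchell's textbook; other references state
    VC-based bounds with different constants.
    """
    if vc_dim < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need vc_dim >= 1, 0 < epsilon < 1, 0 < delta < 1")
    total = 4 * math.log2(2.0 / delta) + 8 * vc_dim * math.log2(13.0 / epsilon)
    return math.ceil(total / epsilon)


# Illustrative numbers: VC dimension 3 (e.g., linear classifiers in the plane),
# epsilon = 0.1, delta = 0.05.
m_vc = vc_sample_complexity(3, 0.1, 0.05)
print(m_vc)
```

The point of a bound like this is that it applies even when $H$ is infinite, where $\ln|H|$ is not available.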
24
VC dimension: linear classifier in a 2-D space
25
VC dimension: linear classifier
26
Summary of PAC bounds