Computational Learning Theory

Presentation on theme: "Computational Learning Theory". Presentation transcript:

1 Computational Learning Theory
In the Name of God. Machine Learning: Computational Learning Theory. Mohammad Ali Keyvanrad. Thanks to: M. Soleymani (Sharif University of Technology), Tom Mitchell (Carnegie Mellon University).

2 Outline
Computational Learning Theory; PAC learning theorem; VC dimension

3 Computational Learning Theory
We want a theory that relates: the number of training examples; the complexity of the hypothesis space; the accuracy to which the target function is approximated; and the probability that the learner outputs a successful hypothesis.

4 Learning scenarios
Learner proposes instances as queries to the teacher? The learner proposes $\mathbf{x}$, the teacher provides $c(\mathbf{x})$.
Teacher (who knows $c(\mathbf{x})$) proposes training examples? The teacher proposes the sequence $\{(\mathbf{x}_1, c(\mathbf{x}_1)), \dots, (\mathbf{x}_n, c(\mathbf{x}_n))\}$.
Instances drawn according to $P(\mathbf{x})$.

5 Sample Complexity
How good is the classifier, really? How much data do I need to make it "good enough"?

6 Problem settings
Set of all instances $X$. Set of hypotheses $H$. Set of possible target functions $C = \{c : X \to \{0, 1\}\}$.
Sequence of $m$ training instances $D = \{(\mathbf{x}_i, c(\mathbf{x}_i))\}_{i=1}^{m}$: each $\mathbf{x}$ is drawn at random from an unknown distribution $P(\mathbf{x})$, and the teacher provides a noise-free label $c(\mathbf{x})$ for it.
The learner observes a set of training examples $D$ for the target function $c$ and outputs a hypothesis $h \in H$ estimating $c$.
Goal: with high probability ("probably"), the selected function will have low generalization error ("approximately correct").

7 True error of a hypothesis
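True error of $h$: the probability that it will misclassify an example drawn at random from $P(\mathbf{x})$, that is, $\mathrm{error}_{\mathrm{true}}(h) = \Pr_{\mathbf{x} \sim P}[h(\mathbf{x}) \ne c(\mathbf{x})]$.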

8 Two notions of error

9 Overfitting
Consider a hypothesis $h$ and its error rate over the training data, $\mathrm{error}_{\mathrm{train}}(h)$, and its true error rate over all data, $\mathrm{error}_{\mathrm{true}}(h)$.
We say $h$ overfits the training data if $\mathrm{error}_{\mathrm{true}}(h) > \mathrm{error}_{\mathrm{train}}(h)$.
Amount of overfitting: $\mathrm{error}_{\mathrm{true}}(h) - \mathrm{error}_{\mathrm{train}}(h)$.
Can we bound $\mathrm{error}_{\mathrm{true}}(h)$ in terms of $\mathrm{error}_{\mathrm{train}}(h)$?
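The two error rates and the overfitting gap can be illustrated numerically. The following is a minimal sketch, not part of the original slides: the uniform distribution on [0, 1], the threshold target `c`, and the memorising hypothesis `h` are all hypothetical choices, and the true error is only estimated from a large fresh sample.

```python
import random

def error_rate(h, examples):
    """Fraction of (x, label) pairs that hypothesis h misclassifies."""
    return sum(h(x) != y for x, y in examples) / len(examples)

def overfitting_gap(h, train, fresh):
    """Estimated error_true (on fresh i.i.d. samples) minus error_train."""
    return error_rate(h, fresh) - error_rate(h, train)

# Hypothetical setup: x uniform on [0, 1], target c(x) = 1 iff x >= 0.5.
rng = random.Random(0)
c = lambda x: int(x >= 0.5)
train = [(x, c(x)) for x in (rng.random() for _ in range(20))]
fresh = [(x, c(x)) for x in (rng.random() for _ in range(100_000))]

# A hypothesis that memorises the training set and predicts 0 everywhere else.
memory = dict(train)
h = lambda x: memory.get(x, 0)

print(error_rate(h, train))              # 0.0: consistent with the training data
print(overfitting_gap(h, train, fresh))  # close to 0.5: every unseen positive is missed
```

The memorising hypothesis has zero training error yet misclassifies about half of the fresh examples, which is the gap the bounds in the following slides try to control.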

10 Problem setting
Classification. $D$: $m$ i.i.d. labeled data points. Finite number of possible hypotheses (e.g., decision trees of depth $d_0$). A learner finds a hypothesis $h$ that is consistent with the training data: $\mathrm{error}_{\mathrm{train}}(h) = 0$. What is the probability that the true error of $h$ will be at least $\epsilon$, i.e. $\mathrm{error}_{\mathrm{true}}(h) \ge \epsilon$?

11 How likely is a learner to pick a bad hypothesis?
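Bound on the probability that any consistent learner will output $h$ with $\mathrm{error}_{\mathrm{true}}(h) > \epsilon$. Theorem [Haussler, 1988]: for target concept $c$ and $\forall\, 0 \le \epsilon \le 1$, if $H$ is finite and $D$ contains $m \ge 1$ independent random samples, then $P(\exists h \in H : \mathrm{error}_{\mathrm{train}}(h) = 0 \wedge \mathrm{error}_{\mathrm{true}}(h) > \epsilon) \le |H| e^{-\epsilon m}$.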
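To get a feel for how fast this probability shrinks with $m$, the standard form of Haussler's bound, $|H| e^{-\epsilon m}$, can be evaluated directly. This is a small sketch under that assumption; the function name `haussler_bound` and the example numbers are illustrative only.

```python
import math

def haussler_bound(h_size, epsilon, m):
    """Upper bound |H| * exp(-epsilon * m) on the probability that some
    hypothesis consistent with m i.i.d. samples has true error > epsilon.
    The value is capped at 1 because larger values carry no information."""
    return min(1.0, h_size * math.exp(-epsilon * m))

# Hypothetical numbers: 1000 hypotheses, error tolerance 0.1.
for m in (10, 50, 100, 200):
    print(m, haussler_bound(1000, 0.1, m))
```

For small $m$ the bound is vacuous (it is capped at 1); it only becomes informative once $m$ exceeds $\ln|H| / \epsilon$.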

12 Proof

13 Proof (Cont'd)

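14 PAC Bound
Theorem [Haussler'88]: Consider a finite hypothesis space $H$, a training set $D$ with $m$ i.i.d. samples, and $0 < \epsilon < 1$. For any learned hypothesis $h \in H$ that is consistent on the training set $D$: $P(\mathrm{error}_{\mathrm{true}}(h) > \epsilon) \le |H| e^{-m\epsilon}$. Equivalently, if $\mathrm{error}_{\mathrm{train}}(h) = 0$, then with probability at least $1 - \delta$: $\mathrm{error}_{\mathrm{true}}(h) \le \frac{1}{m}\left(\ln|H| + \ln\frac{1}{\delta}\right)$.
PAC: Probably Approximately Correct.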

15 PAC Bound Sample Complexity
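How many training examples suffice? Requiring $|H| e^{-m\epsilon} \le \delta$ and solving: given $\epsilon$ and $\delta$, this yields the sample complexity $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$; given $m$ and $\delta$, it yields the error bound $\mathrm{error}_{\mathrm{true}}(h) \le \frac{1}{m}\left(\ln|H| + \ln\frac{1}{\delta}\right)$.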
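Both directions of the bound are one-line computations. The sketch below assumes the standard consistent-learner form obtained by solving $|H| e^{-m\epsilon} \le \delta$; the function names and the example values are illustrative, not taken from the slides.

```python
import math

def sample_complexity(h_size, epsilon, delta):
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

def error_bound(h_size, m, delta):
    """With probability >= 1 - delta, a consistent h has
    true error at most (ln|H| + ln(1/delta)) / m."""
    return (math.log(h_size) + math.log(1 / delta)) / m

# Hypothetical numbers: |H| = 1000, epsilon = 0.1, delta = 0.05.
m = sample_complexity(1000, 0.1, 0.05)
print(m)                            # 100
print(error_bound(1000, m, 0.05))   # slightly below 0.1
```

Note that $m$ grows only logarithmically in $|H|$ and in $1/\delta$, but linearly in $1/\epsilon$.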

16 Example: Conjunction of up to 𝑑 Boolean literals
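The body of this slide is not in the transcript. As a hedged illustration of the example in its title: for conjunctions over $d$ Boolean variables each variable is either included, included negated, or left out, so $|H| = 3^d$ and the consistent-learner bound gives $m \ge \frac{1}{\epsilon}\left(d \ln 3 + \ln\frac{1}{\delta}\right)$. The sketch below pairs that count with one simple consistent learner (keep every literal that no positive example contradicts); the function names and the toy data are assumptions made for illustration.

```python
import math

def learn_conjunction(examples, d):
    """Most specific conjunction consistent with the positive examples.
    A literal (i, v) means 'feature i must equal v'."""
    literals = {(i, v) for i in range(d) for v in (0, 1)}
    for x, label in examples:
        if label == 1:
            # Drop every literal that this positive example violates.
            literals -= {(i, 1 - x[i]) for i in range(d)}
    return literals

def predict(literals, x):
    """1 if x satisfies every literal of the conjunction, else 0."""
    return int(all(x[i] == v for i, v in literals))

def conjunction_sample_complexity(d, epsilon, delta):
    """m >= (d ln 3 + ln(1/delta)) / epsilon, using |H| = 3^d."""
    return math.ceil((d * math.log(3) + math.log(1 / delta)) / epsilon)

# Toy data for the hypothetical target x0 AND (NOT x2) over d = 3 features.
examples = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
h = learn_conjunction(examples, 3)
print(sorted(h))                                   # [(0, 1), (2, 0)]
print(conjunction_sample_complexity(10, 0.1, 0.05))  # 140
```

With $d = 10$, $\epsilon = 0.1$ and $\delta = 0.05$, 140 examples suffice, even though the instance space has $2^{10}$ points.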

17 Agnostic learning

18 Hoeffding bounds: Agnostic learning

19 PAC bound: Agnostic learning
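The formulas for this slide are not in the transcript. A common way to state the agnostic result, assumed here, follows from the Hoeffding inequality plus a union bound over a finite $H$: $P(\exists h \in H : \mathrm{error}_{\mathrm{true}}(h) > \mathrm{error}_{\mathrm{train}}(h) + \epsilon) \le |H| e^{-2m\epsilon^2}$, which gives $m \ge \frac{1}{2\epsilon^2}\left(\ln|H| + \ln\frac{1}{\delta}\right)$. The sketch below evaluates both directions; the function names and numbers are illustrative.

```python
import math

def agnostic_sample_complexity(h_size, epsilon, delta):
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / (2 * epsilon^2)."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / (2 * epsilon ** 2))

def agnostic_error_gap(h_size, m, delta):
    """With probability >= 1 - delta, every h in H satisfies
    error_true(h) <= error_train(h) + this value."""
    return math.sqrt((math.log(h_size) + math.log(1 / delta)) / (2 * m))

# Hypothetical numbers: |H| = 1000, epsilon = 0.1, delta = 0.05.
m = agnostic_sample_complexity(1000, 0.1, 0.05)
print(m)                                  # 496
print(agnostic_error_gap(1000, m, 0.05))  # slightly below 0.1
```

Compared with the consistent-learner case, the dependence on $\epsilon$ worsens from $1/\epsilon$ to $1/\epsilon^2$, so the same $|H|$, $\epsilon$ and $\delta$ need roughly five times as many examples here.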

20 Limitation of the bounds

21 Shattering a set of instances

22 Vapnik-Chervonenkis (VC) dimension
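The definitions on these slides are not in the transcript. Under the usual definitions, a set of instances is shattered by $H$ if every possible labeling of the set is produced by some $h \in H$, and $VC(H)$ is the size of the largest set that $H$ shatters. The brute-force check below is a sketch under those definitions; the class of closed intervals on a small grid is a hypothetical example chosen because it is easy to enumerate.

```python
from itertools import product

def shatters(hypotheses, points):
    """True if all 2^n labelings of the n distinct points are realised
    by at least one hypothesis in the (finite) collection."""
    realised = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realised) == 2 ** len(points)

# Hypothetical class: intervals [a, b] with endpoints on a grid;
# h(x) = 1 iff a <= x <= b.
grid = [i / 2 for i in range(13)]  # 0.0, 0.5, ..., 6.0
intervals = [
    (lambda x, a=a, b=b: int(a <= x <= b))
    for a, b in product(grid, repeat=2)
    if a <= b
]

print(shatters(intervals, [1.0, 3.0]))       # True
print(shatters(intervals, [1.0, 3.0, 5.0]))  # False: (1, 0, 1) is unreachable
```

An interval can never include the two outer points of a triple while excluding the middle one, so no three points are shattered and the VC dimension of intervals on the line is 2.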

23 PAC bound using VC

24 VC dimension: linear classifier in a 2-D space

25 VC dimension: linear classifier

26 Summary of PAC bounds

27

