Computational Learning Theory


Computational Learning Theory. Outline: Introduction; The PAC Learning Framework; Finite Hypothesis Spaces; Examples of PAC-Learnable Concepts.

Introduction. Computational learning theory (or CLT): provides a theoretical analysis of learning; shows when a learning algorithm can be expected to succeed; shows when learning may be impossible.

Introduction (continued). CLT typically comprises three areas. Sample Complexity: how many examples do we need to find a good hypothesis? Computational Complexity: how much computational power do we need to find a good hypothesis? Mistake Bound: how many mistakes will we make before finding a good hypothesis?

Next: The PAC Learning Framework.

The PAC Learning Framework. Let's start with a simple problem: assume a two-dimensional space containing positive and negative examples. Our goal is to find a rectangle that includes the positive examples but not the negative ones (the input space is R2). [Figure: labeled points in the plane, with the true concept drawn as a rectangle around the positives.]

Definitions. Distribution D: assume instances are generated at random from a distribution D. Class of concepts C: let C be the class of concepts we wish to learn; in our example, C is the family of all rectangles in R2. Class of hypotheses H: the hypotheses our algorithm considers while learning the target concept. True error of a hypothesis h: errorD(h) = Pr_{x ~ D}[ c(x) ≠ h(x) ].
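To make the true-error definition concrete, here is a minimal Python sketch (our own illustration, not from the slides): the Rect class, the helper names, and the choice of a uniform distribution D on the unit square are assumptions made for the example.

import random

class Rect:
    # Axis-aligned rectangle concept: labels a point positive iff it lies inside.
    def __init__(self, x1, y1, x2, y2):
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2
    def label(self, point):
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

def estimate_true_error(c, h, draw, m=100000):
    # Monte Carlo estimate of error_D(h) = Pr_{x ~ D}[c(x) != h(x)].
    disagree = 0
    for _ in range(m):
        x = draw()                    # one instance drawn from D
        if c.label(x) != h.label(x):  # c and h disagree on x
            disagree += 1
    return disagree / m

# Assumed distribution D: uniform on the unit square.
draw = lambda: (random.random(), random.random())
c = Rect(0.2, 0.2, 0.8, 0.8)  # true concept
h = Rect(0.3, 0.3, 0.8, 0.8)  # some hypothesis
print(estimate_true_error(c, h, draw))  # approximately the probability mass where c and h disagree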

The true error. [Figure: the true concept c and a hypothesis h drawn as overlapping rectangles; the regions where they disagree are labeled Region A and Region B.] The true error is the probability, under D, of regions A and B. Region A: false negatives. Region B: false positives.

Considerations. Two considerations: (1) We do not require a hypothesis with zero error; some error is acceptable as long as it is small (bounded by a constant ε). (2) We do not require a good hypothesis for every possible sample of examples; the probability of failure need only be bounded by a constant δ.

The PAC Learning Framework. PAC stands for probably approximately correct. Roughly, a class of concepts C (defined over an input space with examples of size n) is PAC learnable by a learning algorithm L if, for arbitrarily small δ and ε, for every concept c in C, and for every distribution D over the input space, with probability at least 1-δ the hypothesis h selected from the space H by L is approximately correct (has true error less than ε). Moreover, L must run in time polynomial in 1/ε, 1/δ, n, and the size of c.

Example. [Figure: the true concept c and the most specific hypothesis h, the tightest rectangle enclosing the positive examples.] Assume a learning algorithm L that outputs the most specific rectangle: the smallest rectangle whose border touches the outermost positive examples. Question: is this class of problems (rectangles in R2) PAC learnable by L?
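A minimal sketch of such a most-specific learner (our own code; the function name is an assumption for this illustration):

def most_specific_rectangle(examples):
    # examples: list of ((x, y), label) pairs, with label True for positive examples.
    # Returns the tightest rectangle enclosing the positives, as
    # (x_min, y_min, x_max, y_max), or None if no positive example has been seen yet.
    positives = [point for point, label in examples if label]
    if not positives:
        return None
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), min(ys), max(xs), max(ys))

# Usage: h = most_specific_rectangle(training_data) gives the hypothesis analyzed below.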

Example (continued). [Figure: the error region lies between the most specific hypothesis h and the true concept c.] The error is the probability mass of the area between h and the true target rectangle c. How many examples do we need to make this error less than ε?

Example (continued). If the error region has probability mass at least ε, then the probability that m independent examples have NOT fallen within the error region is at most (1 - ε)^m, and we want this to be less than δ.

Example (continued). In other words, we want (1 - ε)^m ≤ δ. Since (1 - x) ≤ e^(-x), it suffices that e^(-εm) ≤ δ, which gives m ≥ (1/ε) ln(1/δ). The required number of examples grows linearly in 1/ε and logarithmically in 1/δ.
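As a quick sanity check of this bound, here is a small sketch (the function name is ours):

import math

def pac_sample_size(eps, delta):
    # Smallest integer m satisfying m >= (1/eps) * ln(1/delta).
    return math.ceil((1.0 / eps) * math.log(1.0 / delta))

print(pac_sample_size(0.1, 0.05))  # 30 examples suffice for eps = 0.1, delta = 0.05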

Example (continued). [Figure: the error region between c and h decomposed into four stripes, one along each side of the true rectangle.] The analysis above can be applied by considering each of the four stripes of the error region. Can you finish the analysis by treating each stripe separately and then combining their probabilities?
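One standard way to finish the argument (this is the usual textbook treatment, sketched here rather than taken from the slide): along each side of the true rectangle c, mark off a stripe of probability mass ε/4; if the m examples hit all four stripes, the most specific hypothesis h has error at most ε. The probability that all m examples miss one fixed stripe is at most (1 - ε/4)^m ≤ e^(-εm/4), so by the union bound the probability of failure is at most 4 e^(-εm/4); requiring this to be at most δ gives m ≥ (4/ε) ln(4/δ). A quick computation:

import math

def rectangle_sample_size(eps, delta):
    # m >= (4/eps) * ln(4/delta), from the four-stripe union-bound analysis.
    return math.ceil((4.0 / eps) * math.log(4.0 / delta))

print(rectangle_sample_size(0.1, 0.05))  # 176 examples for eps = 0.1, delta = 0.05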

Next: Finite Hypothesis Spaces.

Sample Complexity for Finite Hypothesis Spaces. Definition: a consistent learner outputs a hypothesis h in H that perfectly fits the training examples (whenever such a hypothesis exists). How many examples do we need so that a hypothesis output by a consistent learner is, with high probability, approximately correct (i.e., has low true error)?

Version Space ε-Exhausted. This is the same as asking how many examples are needed before the version space contains no hypothesis with true error greater than ε. When no hypothesis in a version space VS has error greater than ε, we say the version space is ε-exhausted. How many examples do we need to make a version space VS ε-exhausted?

Probability of the Version Space Being ε-Exhausted. The version space fails to be ε-exhausted after m examples only if some hypothesis with true error greater than ε is still consistent with all m examples. Any single such hypothesis survives with probability at most (1 - ε)^m ≤ e^(-εm), and there are at most |H| such hypotheses, so the probability of failure is at most |H| e^(-εm). Making this at most δ gives m ≥ (1/ε)(ln |H| + ln(1/δ)).
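A small sketch of this bound as a reusable helper (the names are ours):

import math

def finite_H_sample_size(H_size, eps, delta):
    # m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice for a consistent learner.
    return math.ceil((1.0 / eps) * (math.log(H_size) + math.log(1.0 / delta)))

print(finite_H_sample_size(2**20, 0.05, 0.01))  # 370 examples for |H| = 2^20, eps = 0.05, delta = 0.01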

Agnostic Learning. What happens if our hypothesis space H does not contain the target concept c? Then, in general, we cannot find a hypothesis h with zero error. In this setting we want an algorithm that simply outputs the hypothesis in H with minimum training error.
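For reference, a standard companion result (it appears in Mitchell's Chapter 7 but is not derived on these slides) is the agnostic bound obtained from Hoeffding's inequality: m ≥ (1/(2ε^2))(ln |H| + ln(1/δ)) examples guarantee, with probability at least 1-δ, that no hypothesis in H has true error more than ε above its training error, so in particular the hypothesis with minimum training error has true error at most its training error plus ε. A quick sketch:

import math

def agnostic_sample_size(H_size, eps, delta):
    # m >= (1 / (2 * eps**2)) * (ln|H| + ln(1/delta)), from Hoeffding's inequality.
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / (2.0 * eps**2))

print(agnostic_sample_size(2**20, 0.05, 0.01))  # 3694 examples for |H| = 2^20, eps = 0.05, delta = 0.01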

Next: Examples of PAC-Learnable Concepts.

Examples of PAC-Learnable Concepts. Can we say that concepts described by conjunctions of Boolean literals are PAC learnable? First, how large is the hypothesis space when we have n Boolean attributes? Answer: |H| = 3^n, since each attribute can appear in the conjunction as a positive literal, as a negated literal, or not at all.

Examples of PAC-Learnable Concepts (continued). If we substitute this into our sample-complexity analysis for finite hypothesis spaces, we obtain: m ≥ (1/ε)(n ln 3 + ln(1/δ)). This is polynomial in the relevant parameters, and a consistent conjunction can be found efficiently, so the class of conjunctions of Boolean literals is PAC learnable.
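As an illustration of how a consistent conjunction can be found efficiently, here is a minimal sketch (our own code) of the standard elimination learner: start with all 2n literals and drop every literal contradicted by a positive example.

def learn_conjunction(examples, n):
    # examples: list of (x, label) pairs, where x is a tuple of n Booleans.
    # Start with all 2n literals; positive examples eliminate contradicted literals.
    literals = {(i, v) for i in range(n) for v in (True, False)}
    for x, label in examples:
        if label:
            literals = {(i, v) for (i, v) in literals if x[i] == v}
    return literals  # hypothesis: conjunction of constraints "attribute i has value v"

def predict(literals, x):
    return all(x[i] == v for (i, v) in literals)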

k-Term DNF Is Not PAC Learnable. Consider now the class of k-term DNF expressions. These are expressions of the form T1 ∨ T2 ∨ … ∨ Tk, where ∨ denotes disjunction and each term Ti is a conjunction of literals over the n Boolean attributes.

k-Term DNF Is Not PAC Learnable (continued). The size of the hypothesis space is |H| ≤ 3^(nk), since each of the k terms is one of at most 3^n conjunctions. Using the equation for the sample complexity of finite hypothesis spaces: m ≥ (1/ε)(nk ln 3 + ln(1/δ)). So although the sample complexity is polynomial in the main parameters, the computational problem of finding a k-term DNF expression consistent with the training examples is NP-hard, which is why this class is not (efficiently) PAC learnable using k-term DNF hypotheses.

k-CNF Is PAC Learnable. It is interesting, however, that another family of functions, the class of k-CNF expressions (conjunctions of arbitrarily many clauses, each a disjunction of at most k literals), is PAC learnable. This is interesting because the class of k-CNF expressions is strictly larger than the class of k-term DNF expressions: every k-term DNF expression can be rewritten as an equivalent k-CNF expression.
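A hedged sketch of the standard reduction behind this result (our own code, not shown on the slide): map each example to one Boolean feature per clause of at most k literals, and then learn an ordinary conjunction over those features, e.g. with the learn_conjunction sketch above.

from itertools import combinations, product

def clause_features(x, k):
    # Map an n-bit example x to one Boolean feature per clause of at most k literals;
    # a feature is True iff that clause (a disjunction of literals) is satisfied by x.
    n = len(x)
    features = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product((True, False), repeat=size):
                features.append(any(x[i] == s for i, s in zip(idxs, signs)))
    return tuple(features)

# Learning a k-CNF formula then reduces to learning a plain conjunction over these
# roughly (2n)^k clause-features, which we already know is PAC learnable.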