
PAC Learning 8/5/2005

Purpose
Effort to understand the negative selection algorithm from totally different aspects:
– Statistics
– Machine learning
What is machine learning, in a very informal way? We are looking for a mathematical tool to describe, analyze, and evaluate either a learning algorithm or a learning problem.

Background
The PAC learning framework is a branch of computational learning theory. Computational learning theory is a mathematical field concerned with the analysis of machine learning algorithms; it can also be considered a field of statistics. Machine learning algorithms take a training set, form hypotheses or models, and make predictions about the future. Because the training set is finite and the future is uncertain, learning theory usually does not yield absolute guarantees on the performance of the algorithms. Instead, probabilistic bounds on the performance of machine learning algorithms are quite common.

More about computational learning theory
In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time.

More about computational learning theory
There are several different approaches to computational learning theory, which are often mathematically incompatible. This incompatibility arises from:
– using different inference principles: principles which tell you how to generalize from limited data;
– differing definitions of probability (frequency probability, Bayesian probability).

More about computational learning theory
The different approaches include:
– Probably approximately correct learning (PAC learning), proposed by Leslie Valiant;
– VC theory, proposed by Vladimir Vapnik;
– Bayesian inference, arising from work first done by Thomas Bayes;
– Algorithmic learning theory, from the work of E. M. Gold.
Computational learning theory has led to practical algorithms. For example, PAC theory inspired boosting.

What is this for?
The PAC framework allows a precise mathematical analysis of learning.

Basic facts of PAC learning
Probably approximately correct learning (PAC learning) is a framework for learning that was proposed by Leslie Valiant in his paper "A theory of the learnable". In this framework the learner receives samples that are classified according to a function from a certain class. The aim of the learner is to find, with high probability, a good approximation of that function. We require the learner to be able to learn the concept for any approximation ratio, probability of success, and distribution of the samples.
How does negative selection fit in? We only deal with a very special distribution of the samples: one-class samples. Is it a PAC learning algorithm?

The intent of the PAC model is that successful learning of an unknown target concept should entail obtaining, with high probability, a hypothesis that is a good approximation of it. We can view this target concept as an unknown function, e.g. f: {0,1}^n → {0,1}; the result to pursue is an approximation of f, called a hypothesis here. The purpose of the discussion of PAC is to decide whether an algorithm for finding the approximation is (1) good enough and (2) feasible.
"If we wish to define a model of learning from (random) samples, a crucial point is to formulate correctly the notion of success." (quoted, with corrections and highlighting)

To keep the discussion simple, let us use the simple setup f: {0,1}^n → {0,1}:
– Instance space: {0,1}^n

Given a probability distribution D defined on {0,1}^n, the error of a hypothesis h with respect to a fixed target concept c is defined as

  error(h) = Pr_{x ~ D} [ x ∈ h Δ c ]

where Δ denotes the symmetric difference (identifying h and c with the subsets of {0,1}^n they map to 1). error(h) is the probability that h and c will disagree on an instance drawn according to D. The hypothesis h is a good approximation of the target concept c if error(h) is small. (Note that error(h) depends on D.)
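As a concrete illustration (not from the slides), error(h) can be estimated by sampling from D. This is a minimal sketch; the function names and the toy target/hypothesis are hypothetical.

```python
import random

def estimate_error(h, c, sample_from_D, num_samples=10000, seed=0):
    """Monte Carlo estimate of error(h) = Pr_{x ~ D}[ h(x) != c(x) ]."""
    rng = random.Random(seed)
    disagreements = sum(
        h(x) != c(x) for x in (sample_from_D(rng) for _ in range(num_samples))
    )
    return disagreements / num_samples

# Toy example with n = 5 and D = uniform distribution on {0,1}^5
n = 5
uniform = lambda rng: tuple(rng.randint(0, 1) for _ in range(n))
c = lambda x: x[0] & x[1]   # target concept: x1 AND x2
h = lambda x: x[0]          # hypothesis: x1 alone
print(estimate_error(h, c, uniform))   # close to 0.25 under this D
```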

Definition of PAC Learnability
This definition is the centerpiece of the PAC learning model. We define when the concept class C is:
– PAC learnable by the hypothesis space H
– Properly PAC learnable
– PAC learnable

What is the concept class C? C = {C_n}, n ≥ 1, where C_n is the set of target concepts over {0,1}^n.
What is the hypothesis space H? H = {H_n}, n ≥ 1, where H_n is the set of hypotheses over {0,1}^n.
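For instance (an illustration not on the slide), C_n could be the class of monotone conjunctions over {0,1}^n; each concept is identified by the subset of variables it conjoins.

```python
from itertools import combinations

def conjunction(variable_indices):
    """The concept 'AND of the chosen variables' over {0,1}^n (empty set = always 1)."""
    return lambda x: int(all(x[i] == 1 for i in variable_indices))

def concept_class_Cn(n):
    """C_n: all monotone conjunctions over {0,1}^n -- 2^n concepts in total."""
    return [conjunction(subset)
            for k in range(n + 1)
            for subset in combinations(range(n), k)]

print(len(concept_class_Cn(3)))            # 8 concepts for n = 3
print(concept_class_Cn(3)[-1]((1, 1, 1)))  # the full conjunction x1 AND x2 AND x3 -> 1
```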

Definition of PAC learnable by the hypothesis space H:
– The concept class C is PAC learnable by the hypothesis space H if there exist a polynomial-time algorithm A and a polynomial p(·,·,·) such that for all n ≥ 1, all target concepts c ∈ C_n, all probability distributions D on the instance space {0,1}^n, and all ε and δ with 0 < ε, δ < 1, the following holds: if the algorithm A is given at least p(n, 1/ε, 1/δ) independent random examples of c drawn according to D, then with probability at least 1 − δ, A returns a hypothesis h ∈ H_n with error(h) ≤ ε.
Note: this talks about the existence of A, not what exactly A is.
The smallest such polynomial p is called the sample complexity of the learning algorithm.
– This is as essential to a learning algorithm as time complexity is to a general algorithm.
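To make the quantifiers concrete, here is a sketch of one standard choice of p(n, 1/ε, 1/δ): the classical bound m ≥ (1/ε)(ln|H_n| + ln(1/δ)) for a consistent learner over a finite hypothesis space. Both assumptions (finiteness, consistency) go beyond the general definition above.

```python
import math

def sample_size_bound(hypothesis_space_size, epsilon, delta):
    """m >= (1/epsilon) * (ln|H_n| + ln(1/delta)) examples suffice for a consistent
    learner over a finite hypothesis space H_n to be probably approximately correct."""
    return math.ceil((math.log(hypothesis_space_size) + math.log(1.0 / delta)) / epsilon)

# Monotone conjunctions over {0,1}^n give |H_n| = 2^n, so the bound is
# (1/epsilon) * (n * ln 2 + ln(1/delta)): polynomial in n, 1/epsilon and 1/delta.
print(sample_size_bound(2 ** 20, epsilon=0.05, delta=0.01))   # 370 examples
```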

Definition of properly PAC learnable:
– If C = H.
Definition of PAC learnable:
– If C is a concept class and there exists some hypothesis space H such that hypotheses in H can be evaluated on given instances in polynomial time and such that C is PAC learnable by H.
This extension goes from a given H to the existence of some H. If C is properly PAC learnable, it is obviously PAC learnable (assuming hypotheses in C can be evaluated on given instances in polynomial time).

There are many variants of the basic definition. It can be shown that they are equivalent. The model can also be extended in various directions.

We ask for a single algorithm A that works for all distributions, not that for every distribution D there exists an algorithm designed for that specific distribution D. That means the algorithm A does not know the distribution.

A key part of PAC learning, and of the potential link to the negative selection algorithm that we are trying to make (if it exists at all), is the probability distribution D. The error probability is measured with respect to the same distribution according to which the random examples are chosen. If the learning algorithm gets random examples from a distribution that provides only samples whose first bit is 0, and the error is measured with respect to a distribution on strings whose first bit is 1, then clearly the learning algorithm has no chance to do much. The NSA, at least my method, seems to be attempting exactly the "no chance to do much" situation described above, with a little help from the magic self threshold (or self radius).
– Is the NSA's notion of success not well defined?
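A tiny sketch of that mismatch (a hypothetical setup, not from the slides): training examples come from a distribution whose strings start with 0, while error is measured on strings that start with 1.

```python
import random

rng = random.Random(0)
n = 8
target = lambda x: x[-1]   # an arbitrary target concept: the last bit

def draw(first_bit):
    """Draw x in {0,1}^n with a fixed first bit; the remaining bits are uniform."""
    return (first_bit,) + tuple(rng.randint(0, 1) for _ in range(n - 1))

# The learner only ever sees strings whose first bit is 0 ...
train = [(x, target(x)) for x in (draw(0) for _ in range(1000))]

# ... so a hypothesis that fits the training data perfectly (here: memorization,
# defaulting to 0 on unseen strings) can still be useless where error is measured.
memory = {x: y for x, y in train}
h = lambda x: memory.get(x, 0)

test = [draw(1) for _ in range(1000)]   # error measured on strings whose first bit is 1
error = sum(h(x) != target(x) for x in test) / len(test)
print(error)   # roughly 0.5 -- no better than guessing
```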

What does it mean to say that L is a PAC learning algorithm?
– For any given ε, δ > 0, there is a sample size m_0 such that for all computable target functions t and all probability distributions P, we have:
  m ≥ m_0  ⇒  P^m( error(L(s), t) > ε ) < δ
  where s is a training sample of m examples drawn independently according to P.
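This condition can be checked empirically for a fixed (ε, δ, m): estimate P^m(error(L(s), t) > ε) by drawing many independent samples s of size m. The sketch below is hedged; L, t, and the distribution used in the toy usage are hypothetical placeholders, not part of the slides.

```python
import random

def pac_failure_rate(L, t, draw_x, m, epsilon, trials=200, test_size=2000, seed=0):
    """Estimate P^m( error(L(s), t) > epsilon ) by drawing many samples s of size m.

    L      : learner, maps a list of (x, t(x)) pairs to a hypothesis h
    t      : target function
    draw_x : draws one instance according to the distribution P
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        s = [(x, t(x)) for x in (draw_x(rng) for _ in range(m))]
        h = L(s)
        test = [draw_x(rng) for _ in range(test_size)]
        err = sum(h(x) != t(x) for x in test) / test_size
        failures += err > epsilon
    return failures / trials   # the PAC condition asks for this to stay below delta

# Toy usage: uniform P on {0,1}^6, target = first bit,
# learner = "predict the single bit that best matches the labels"
n = 6
uniform = lambda rng: tuple(rng.randint(0, 1) for _ in range(n))
t = lambda x: x[0]
def L(s):
    best = min(range(n), key=lambda i: sum(x[i] != y for x, y in s))
    return lambda x: x[best]

print(pac_failure_rate(L, t, uniform, m=50, epsilon=0.1))   # should come out well below delta = 0.05
```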

Unanswered questions
– How does the negative selection algorithm fit into the model of PAC learning?
– Does the NSA count as a learning process or algorithm at all?

References
D. Haussler. Probably approximately correct learning. In AAAI-90: Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA. American Association for Artificial Intelligence, 1990.
ml ely_correct_learning ng_theory... …