Can Inductive Learning Work?

Presentation transcript:

Can Inductive Learning Work?
[Figure: example set X, hypothesis space H (size |H|), training set D (size m), learning algorithm L, inductive hypothesis h]
h: hypothesis that agrees with all examples in D
p(x): probability that example x is picked from X
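A minimal sketch of these ingredients in code may help fix ideas. Everything concrete below (uniform examples on [0, 1), a threshold target concept, a small set of threshold hypotheses) is an illustrative assumption, not something specified on the slide:

```python
import random

# Toy instantiation of the setup (illustrative assumptions, not from the slides):
# X = [0, 1), p = uniform, target concept = "x < 0.5", H = threshold hypotheses,
# D = m labeled examples, L = a learner returning any hypothesis consistent with D.

def target(x):
    return x < 0.5                                   # the unknown concept

H = [lambda x, t=t: x < t for t in [i / 20 for i in range(21)]]   # hypothesis space

def draw_training_set(m, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(m)]            # examples drawn with p from X
    return [(x, target(x)) for x in xs]              # training set D of size m

def L(D):
    """Learning algorithm: return some h in H that agrees with all examples in D."""
    for h in H:
        if all(h(x) == y for x, y in D):
            return h
    return None

h = L(draw_training_set(m=30))
```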

Approximately Correct Hypothesis
h ∈ H is approximately correct (AC) with accuracy ε iff: Pr[h(x) correct] > 1 − ε, where x is an example picked with probability distribution p from X.
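As a concrete illustration, the AC condition can be checked empirically by sampling from p. This is only a sketch; the target concept, the distribution, and the candidate hypothesis are assumed for the example:

```python
import random

def target(x):                 # assumed target concept (not from the slides)
    return x < 0.5

def h(x):                      # a candidate hypothesis: threshold at 0.45
    return x < 0.45

def estimate_accuracy(h, n_samples=200_000, seed=1):
    """Monte Carlo estimate of Pr[h(x) correct], with x drawn uniformly (our p)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_samples):
        x = rng.random()
        correct += (h(x) == target(x))
    return correct / n_samples

eps = 0.1
acc = estimate_accuracy(h)
print(f"Pr[h(x) correct] ≈ {acc:.3f}; approximately correct with eps={eps}? {acc > 1 - eps}")
```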

PAC Learning Algorithm
A learning algorithm L is Probably Approximately Correct (PAC) with confidence 1 − δ iff the probability that it generates a non-AC hypothesis h is ≤ δ: Pr[h is non-AC] ≤ δ.
Can L be PAC if the size m of the training set D is large enough? If yes, how big should m be?

Intuition
If m is large enough and g ∈ H is not AC, it is unlikely that g agrees with all examples in the training set D.
So, if m is large enough, there should be few non-AC hypotheses that agree with all examples in D.
Hence, it is unlikely that L will pick one.

Can L Be PAC?
Let g be an arbitrary hypothesis in H that is not approximately correct.
Since g is not AC, we have: Pr[g(x) correct] ≤ 1 − ε.
The probability that g is consistent with all the examples in D is at most (1 − ε)^m.
The probability that there exists a non-AC hypothesis matching all examples in D is at most |H|(1 − ε)^m.
Therefore, L is PAC if m satisfies: |H|(1 − ε)^m ≤ δ.
[Recall: L is PAC if Pr[h is non-AC] ≤ δ; h ∈ H is AC iff Pr[h(x) correct] > 1 − ε.]

Calculation (union bound)
H = {h1, h2, …, h|H|}
Pr(hi is non-AC and agrees with D) ≤ (1 − ε)^m
Pr(h1 or h2 or … is non-AC and agrees with D) ≤ Σi=1,…,|H| Pr(hi is non-AC and agrees with D) ≤ |H|(1 − ε)^m
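The bound can also be checked numerically. The sketch below uses an assumed toy concept class (threshold hypotheses over uniform examples, as before) and compares the empirical frequency with which some non-AC hypothesis survives m examples against the union bound |H|(1 − ε)^m:

```python
import random

eps, m, trials = 0.1, 50, 10_000
rng = random.Random(0)
thresholds = [i / 20 for i in range(21)]              # H: 21 threshold hypotheses

def target(x):
    return x < 0.5

# Non-AC hypotheses: thresholds whose error |t - 0.5| is at least eps.
bad = [t for t in thresholds if abs(t - 0.5) >= eps]

hits = 0
for _ in range(trials):
    D = [(x, target(x)) for x in (rng.random() for _ in range(m))]
    if any(all((x < t) == y for x, y in D) for t in bad):
        hits += 1                                     # some non-AC h agreed with all of D

print(f"empirical Pr[some non-AC h consistent with D] ≈ {hits / trials:.4f}")
print(f"union bound |H|(1 - eps)^m = {len(thresholds) * (1 - eps) ** m:.4f}")
```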

Size of Training Set
From |H|(1 − ε)^m ≤ δ we derive: m ≥ ln(δ/|H|) / ln(1 − ε)
Since ε < −ln(1 − ε) for 0 < ε < 1, it suffices to take: m ≥ ln(δ/|H|) / (−ε) = ln(|H|/δ) / ε
So, m increases only logarithmically with the size of the hypothesis space.
But how big is |H|?
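In code, the sufficient sample size is a one-liner; the numbers in the example call are illustrative:

```python
from math import ceil, log

def pac_sample_size(H_size, eps, delta):
    """Smallest integer m satisfying m >= ln(|H|/delta) / eps."""
    return ceil(log(H_size / delta) / eps)

# e.g. |H| = 2^20, accuracy eps = 0.05, confidence 1 - delta = 0.99
print(pac_sample_size(H_size=2**20, eps=0.05, delta=0.01))    # 370
```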

Importance of KIS Bias
If H is the set of all logical sentences with n observable predicates, then |H| = 2^(2^n), and m is exponential in n.
If H is the set of all conjunctions of k << n observable predicates picked among the n predicates, then |H| = O(n^k) and m is logarithmic in n.
Hence the importance of choosing a "good" KIS (Keep It Simple) bias.
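The contrast between the two hypothesis spaces is easy to tabulate from the bound m ≥ ln(|H|/δ)/ε, working with ln|H| to avoid astronomically large numbers. The values of ε, δ, and k below, and dropping the constant in O(n^k), are illustrative assumptions:

```python
from math import ceil, log

eps, delta, k = 0.1, 0.05, 3

def m_bound(log_H):
    """Sample size from m >= (ln|H| + ln(1/delta)) / eps."""
    return ceil((log_H + log(1 / delta)) / eps)

for n in (5, 10, 20, 40):
    log_H_all = (2 ** n) * log(2)        # |H| = 2^(2^n): all logical sentences
    log_H_conj = k * log(n)              # |H| = O(n^k): conjunctions of k predicates
    print(f"n={n:2d}  m_all={m_bound(log_H_all):.3g}  m_conj={m_bound(log_H_conj)}")
```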