1 CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman Machine Learning: The Theory of Learning R&N 18.5

2 Machine Learning Theory Central question in machine learning: How can we be sure that the hypothesis produced by a learning algorithm will give the correct answer on previously unseen examples? The question is, in some sense, too vague… It leads to the general problem of inductive inference: How can we generalize from data? Major advance: Computational Learning Theory (Valiant 1984; Turing Award 2010).

6 Probabilities to the rescue: certain “bad” events (e.g. learner learns the wrong hypothesis) become exponentially rare with enough data.

8 (or higher.)

9 S = # of “heads”

10 Wow! S = # of “heads”. So, when flipping a fair coin 10,000 times, you’ll “never, ever” see more than 5,500 heads. And no one ever will… I.e., with enough data, we can be very certain that if the coin is fair, we won’t see a “large” deviation between #heads and #tails.
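To see how strong this is, here is a small check (my own illustration, not from the slides) using the standard Hoeffding bound for N fair coin flips, P(S/N − 1/2 ≥ ε) ≤ exp(−2Nε²), plus a quick simulation:

```python
# Illustration: how unlikely is S >= 5,500 heads in N = 10,000 fair flips?
# Hoeffding: P(S/N - 1/2 >= eps) <= exp(-2 * N * eps**2).
import math, random

N, eps = 10_000, 0.05                       # 5,500 heads = a deviation of eps = 0.05
bound = math.exp(-2 * N * eps ** 2)
print(f"Hoeffding bound on P(S >= 5500): {bound:.2e}")    # about 2e-22

# A quick simulation never comes close to 5,500 heads:
trials = [sum(random.random() < 0.5 for _ in range(N)) for _ in range(100)]
print(max(trials))                          # typically within ~150 of 5,000
```

So the “bad” event is not impossible, just so improbable (about 10⁻²²) that it will never be observed in practice.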

14 Also, makes interesting learning algorithms possible!

16 Two issues: (1) (Re: high probability.) We may get a “bad” sequence of training examples (e.g., many “relatively short men”). A small risk, but still a risk of learning the wrong hypothesis. (2) We only use a small subset of all possible examples (otherwise it is not real learning). So, we may not get the hypothesis exactly correct. Finally, we want efficient learning --- polytime!!

17 Valiant’s genius was to focus on the “simplest” models that still capture all the key aspects we want in a “learning machine.” The role of PAC learning has been profound, even though mainly by shining a “theoretical” light.

20 For our learning algorithm, we simply use a method that keeps the hypothesis consistent with all examples seen so far. We can start with a hypothesis that says “No” to all examples. Then, when the first positive example comes in, minimally modify the hypothesis to make it consistent with that example. Proceed the same way for every new positive example.
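As a concrete sketch of such a consistent learner, assume the target concept is a conjunction of literals over boolean attributes; the function names and representation below are mine, not from the slides:

```python
# Sketch of a "keep the hypothesis consistent" learner for conjunctions of literals.
# Assumption: the target concept really is such a conjunction.

def learn_conjunction(examples):
    """examples: list of (x, label), x a tuple of 0/1 values, label True/False."""
    n = len(examples[0][0])
    # Start maximally specific: require every attribute to be both 0 and 1,
    # so the initial hypothesis says "No" to every example.
    hyp = {(i, v) for i in range(n) for v in (0, 1)}
    for x, label in examples:
        if label:
            # Minimally generalize: drop only the literals this positive example violates.
            hyp = {(i, v) for (i, v) in hyp if x[i] == v}
    return hyp

def predict(hyp, x):
    return all(x[i] == v for (i, v) in hyp)

# Target here is "x0 AND NOT x2" over 3 attributes:
data = [((1, 0, 0), True), ((1, 1, 0), True), ((0, 1, 0), False), ((1, 1, 1), False)]
h = learn_conjunction(data)
print(all(predict(h, x) == y for x, y in data))   # True: consistent with all examples seen
```

Because the hypothesis only generalizes as far as the positive examples force it to, it remains at least as specific as the target concept and therefore also stays consistent with the negative examples.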

21 So, h_b could mistakenly be learned! Note: we want to make it likely that all consistent hypotheses are approximately correct, so that no “bad” consistent hypothesis occurs at all.
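A standard way to make this precise (the derivation below follows R&N 18.5; here ε is the error tolerance, δ the allowed failure probability, m the number of training examples, and H the hypothesis space): a “bad” hypothesis h_b has error greater than ε, so it agrees with a single random example with probability at most 1 − ε, and by the union bound

```latex
P(\text{some } h_b \in H_{\text{bad}} \text{ is consistent with all } m \text{ examples})
   \;\le\; |H_{\text{bad}}|\,(1-\epsilon)^m
   \;\le\; |H|\,(1-\epsilon)^m
   \;\le\; |H|\, e^{-\epsilon m}.
```

Requiring this to be at most δ gives m ≥ (1/ε)(ln|H| + ln(1/δ)), which is why the size of H matters on the next slide.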

22 ε and δ are assumed given (set as desired). Keep the size of the hypothesis class H down. Another instance of Ockham’s razor!
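For a finite hypothesis class, the trade-off can be computed directly. A small helper (the function name is mine; the bound is the standard one stated above):

```python
# Sample size sufficient for a consistent learner over a finite hypothesis class H:
# m >= (1/eps) * (ln|H| + ln(1/delta)).
import math

def pac_sample_size(hyp_space_size, eps, delta):
    return math.ceil((math.log(hyp_space_size) + math.log(1.0 / delta)) / eps)

# All boolean functions on 10 attributes: |H| = 2**(2**10), and the bound explodes.
print(pac_sample_size(2 ** (2 ** 10), eps=0.05, delta=0.01))   # 14288
# A restricted class, e.g. conjunctions over 10 attributes (|H| = 3**10), needs far fewer.
print(pac_sample_size(3 ** 10, eps=0.05, delta=0.01))          # 312
```

The smaller the hypothesis class, the fewer examples are needed, which is exactly the Ockham's-razor point above.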

23 So, in this setting the requirements on our learning algorithm are quite minimal. We do also want polynomial time, though.

24 Aside: Shannon already noted that the vast majority of Boolean functions on N letters look “random” and cannot be compressed. (No structure!)

32 We still need a polynomial-time learning algorithm to generate a decision list consistent with the m examples.
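A sketch of the usual greedy construction (in the spirit of R&N's decision-list learner); for brevity the tests here are single literals, whereas k-DL would enumerate conjunctions of up to k literals, and the helper names are my own:

```python
# Greedily build a decision list consistent with the training examples.
from itertools import product

def learn_decision_list(examples, n_attrs):
    """examples: list of (x, label), x a tuple of booleans; returns [(attr, value, label), ...]."""
    remaining = list(examples)
    rules = []
    while remaining:
        for i, v in product(range(n_attrs), (True, False)):
            matched = [y for x, y in remaining if x[i] == v]
            if matched and len(set(matched)) == 1:      # nonempty, and all one label
                rules.append((i, v, matched[0]))
                remaining = [(x, y) for x, y in remaining if x[i] != v]
                break
        else:
            return None   # no single-literal test works on what is left
    return rules

def classify(rules, x, default=False):
    for i, v, label in rules:
        if x[i] == v:
            return label
    return default
```

Each pass either removes at least one remaining example or gives up, so the construction runs in time polynomial in the number of examples and attributes.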

40 Some more PAC learnability examples.
