CSCI 5582 Fall 2006 CSCI 5582 Artificial Intelligence Lecture 19 Jim Martin.


CSCI 5582 Fall 2006 CSCI 5582 Artificial Intelligence Lecture 19 Jim Martin

CSCI 5582 Fall 2006 Today 11/13
Decision Lists
Break
–Quiz review
–New HWs
Boosting

CSCI 5582 Fall 2006 Decision Lists Each element in the list is a test that an object can pass or fail. If it passes, emit the label associated with the test. If it fails, move to the next test. If an object fails all the tests emit a default answer. The tests are propositional logic statements where the feature/value combinations are atomic propositions.
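To make the list structure concrete, here is a minimal sketch of a decision list classifier in Python; the dict-based test representation and the function names are my own illustration, not from the slides:

```python
# A decision list: an ordered list of (test, label) pairs plus a default label.
# A test is a dict of feature -> required value; an example passes the test if
# it matches every feature/value pair (a conjunction of atomic propositions).

def passes(test, example):
    return all(example.get(f) == v for f, v in test.items())

def classify(rules, default, example):
    for test, label in rules:
        if passes(test, example):
            return label   # first matching test wins
    return default         # the object failed every test

# Usage with the list built later in the lecture:
rules = [({"F1": "In"}, "Yes"), ({"F2": "Veg"}, "No"), ({"F3": "Green"}, "Yes")]
print(classify(rules, "No", {"F1": "Out", "F2": "Meat", "F3": "Red"}))  # -> No
```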

CSCI 5582 Fall 2006 Decision Lists

CSCI 5582 Fall 2006 Decision Lists
Key parameters:
–Maximum allowable length of the list
–Maximum number of elements in a test
–Logical connectives allowed in the test
The longer the lists, and the more complex the tests, the larger the hypothesis space.

CSCI 5582 Fall 2006 Decision List Learning

CSCI 5582 Fall 2006 Training Data

 #  F1 (In/Out)  F2 (Meat/Veg)  F3 (Red/Green/Blue)  Label
 1  In           Veg            Red                  Yes
 2  Out          Meat           Green                Yes
 3  In           Veg            Red                  Yes
 4  In           Meat           Red                  Yes
 5  In           Veg            Red                  Yes
 6  Out          Meat           Green                Yes
 7  Out          Meat           Red                  No
 8  Out          Veg            Green                No
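For the sketches in this transcript, the same training set can be transcribed directly into Python:

```python
# The eight training examples from the table above, as (features, label) pairs.
training_data = [
    ({"F1": "In",  "F2": "Veg",  "F3": "Red"},   "Yes"),
    ({"F1": "Out", "F2": "Meat", "F3": "Green"}, "Yes"),
    ({"F1": "In",  "F2": "Veg",  "F3": "Red"},   "Yes"),
    ({"F1": "In",  "F2": "Meat", "F3": "Red"},   "Yes"),
    ({"F1": "In",  "F2": "Veg",  "F3": "Red"},   "Yes"),
    ({"F1": "Out", "F2": "Meat", "F3": "Green"}, "Yes"),
    ({"F1": "Out", "F2": "Meat", "F3": "Red"},   "No"),
    ({"F1": "Out", "F2": "Veg",  "F3": "Green"}, "No"),
]
```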

CSCI 5582 Fall 2006 Decision Lists
Let's try [F1 = In] → Yes


CSCI 5582 Fall 2006 Decision Lists
[F1 = In] → Yes
[F2 = Veg] → No


CSCI 5582 Fall 2006 Decision Lists
[F1 = In] → Yes
[F2 = Veg] → No
[F3 = Green] → Yes


CSCI 5582 Fall 2006 Decision Lists
[F1 = In] → Yes
[F2 = Veg] → No
[F3 = Green] → Yes
Otherwise → No (the default answer)

CSCI 5582 Fall 2006 Covering and Splitting
The decision tree learning algorithm is a splitting approach.
–The training set is split apart according to the results of a test
–Until all the splits are uniform
Decision list learning is a covering algorithm.
–Tests are generated that uniformly cover a subset of the training set
–Until all the data are covered
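Here is a minimal sketch of that greedy covering loop, assuming tests are single feature/value pairs; the "largest uniformly labeled subset" heuristic is one reasonable choice rather than anything prescribed on the slides, and training_data refers to the list transcribed earlier:

```python
def learn_decision_list(data, default="No"):
    """Greedy covering: pick a test whose covered examples all share one label,
    emit (test, label), drop the covered examples, repeat until data is covered."""
    rules, remaining = [], list(data)
    while remaining:
        best = None
        # Candidate tests: every feature/value pair that appears in the remaining data.
        for f in remaining[0][0]:
            for v in {ex[f] for ex, _ in remaining}:
                covered = [(ex, lab) for ex, lab in remaining if ex[f] == v]
                labels = {lab for _, lab in covered}
                if len(labels) == 1 and (best is None or len(covered) > len(best[2])):
                    best = ({f: v}, labels.pop(), covered)
        if best is None:
            break  # no uniform test left; remaining cases fall through to the default
        test, label, covered = best
        rules.append((test, label))
        remaining = [pair for pair in remaining if pair not in covered]
    return rules, default

# On training_data this yields [F1=In -> Yes, F2=Veg -> No, F3=Green -> Yes, F1=Out -> No],
# essentially the list built on the slides, with the leftover example getting its own rule.
```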

CSCI 5582 Fall 2006 Choosing a Test What tests should be put at the front of the list? –Tests that are simple? –Tests that uniformly cover large numbers of examples? –Both?

CSCI 5582 Fall 2006 Choosing a Test What about choosing tests that only cover small numbers of examples? –Would that ever be a good idea? Sure: suppose you have a large heterogeneous group with one label and a small homogeneous group with a different label. You don't need to characterize the big group, just the small one.

CSCI 5582 Fall 2006 Decision Lists The flexibility in defining the tests and the length of the lists is a big advantage of decision lists. –(Decision trees can end up being a bit unwieldy)

CSCI 5582 Fall 2006 What Does Matter? I said that in practical applications the choice of ML technique doesn't really matter: they will all result in roughly the same error rate (give or take). So what does matter?

CSCI 5582 Fall 2006 What Matters
Having the right set of features in the training set
Having enough training data

CSCI 5582 Fall 2006 Break
The next quiz will be on 11/28. It will cover the ML material and the probabilistic sequence material. The readings for this quiz are:
–Chapter 18
–Chapter 19
–Chapter 20:
–HMM chapter posted on the web

CSCI 5582 Fall 2006 Quiz
1. True
2. Soundness: All inferred sentences are entailed.
3. Stench and Wumpus
4. Probabilities and Wumpus
5. Belief nets.

CSCI 5582 Fall 2006 Wumpus

CSCI 5582 Fall 2006 Wumpus What do you know about the presence or absence of a wumpus in [2,3] before the game even begins? What do you know about it after you first detect a stench in [1,3]?

CSCI 5582 Fall 2006 Wumpus (Q 3)
a) ~S22 => (~W23 ^ ~W12 ^ ~W32 ^ ~W21)
b) By Modus Ponens from ~S22 and the rule above: ~W23 ^ ~W12 ^ ~W32 ^ ~W21. By And-elimination: ~W23.
c) We know ~Wx,y for every square except [3,3]. We also know that there has to be one wumpus (W11 v W12 v W13 v ...). Successive resolutions will result in W33.

CSCI 5582 Fall 2006 Wumpus (Q4)
P(W|S) = P(S|W) P(W) / P(S)
       = P(W) / P(S)          [since P(S|W) = 1]
       = .25 / P(S)
       = .25 / (P(S,W) + P(S,~W))
       = .25 / (P(S|W)P(W) + P(S|~W)P(~W))
       = .25 / (.25 + P(S|~W) * .75)
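As a quick numeric check (a sketch, not part of the quiz): the derivation leaves P(S|~W) unspecified, so the helper below takes it as a parameter; the 0.2 in the example call is an arbitrary illustration.

```python
def p_wumpus_given_stench(p_s_given_no_w, p_w=0.25):
    """P(W|S) = P(W) / (P(S|W)P(W) + P(S|~W)P(~W)), assuming P(S|W) = 1."""
    p_s = 1.0 * p_w + p_s_given_no_w * (1 - p_w)
    return p_w / p_s

print(p_wumpus_given_stench(0.2))  # .25 / (.25 + .2 * .75) = 0.625
```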

CSCI 5582 Fall 2006 Q 5
[Belief net figure: C and L are root nodes, S has parents C and L, X has parent L. CPTs: P(C), P(L), P(S|C,L), P(X|L)]

CSCI 5582 Fall 2006 Q5
b) P(L|S) = P(L,S) / P(S)
c) P(C|S,~X) = P(C,S,~X) / P(S,~X)
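To show how these queries reduce to sums over the joint distribution, here is a sketch that answers (b) by enumeration over the network sketched above; the CPT numbers are made-up placeholders, since the quiz doesn't supply them:

```python
from itertools import product

# Hypothetical CPTs for the network C, L -> S and L -> X (placeholder values).
P_C = {True: 0.3, False: 0.7}                     # P(C)
P_L = {True: 0.4, False: 0.6}                     # P(L)
P_S = {(True, True): 0.9, (True, False): 0.6,
       (False, True): 0.5, (False, False): 0.1}   # P(S=true | C, L)
P_X = {True: 0.8, False: 0.2}                     # P(X=true | L)

def joint(c, l, s, x):
    ps = P_S[(c, l)] if s else 1 - P_S[(c, l)]
    px = P_X[l] if x else 1 - P_X[l]
    return P_C[c] * P_L[l] * ps * px

# P(L|S) = P(L,S) / P(S): sum the joint over the unobserved variables C and X.
p_ls = sum(joint(c, True, True, x) for c, x in product([True, False], repeat=2))
p_s  = sum(joint(c, l, True, x) for c, l, x in product([True, False], repeat=3))
print(p_ls / p_s)
```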

CSCI 5582 Fall 2006 HWs
We'll have two remaining HWs. The next one will be due 12/5; the second is due on 12/14.
Basic idea for assignment 1:
–I give you two bodies of text by two authors (labeled). You train a system to recognize the work of each author.
–To test, I give you new texts by each author.
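One standard way to attack an authorship task like this (just a sketch; the assignment doesn't prescribe a technique) is a bag-of-words naive Bayes classifier:

```python
import math
from collections import Counter

def train(texts_by_author):
    """texts_by_author: dict mapping author name -> list of training documents."""
    counts = {a: Counter(w for doc in docs for w in doc.lower().split())
              for a, docs in texts_by_author.items()}
    vocab = {w for c in counts.values() for w in c}
    return counts, vocab

def classify(text, counts, vocab):
    """Return the author whose word distribution best explains the text."""
    best_author, best_score = None, float("-inf")
    for author, c in counts.items():
        total = sum(c.values())
        # Log-likelihood with add-one smoothing; the extra +1 in the denominator
        # reserves a little mass for unseen words. Uniform prior over authors.
        score = sum(math.log((c[w] + 1) / (total + len(vocab) + 1))
                    for w in text.lower().split())
        if score > best_score:
            best_author, best_score = author, score
    return best_author
```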

CSCI 5582 Fall 2006 Colloquium I’m giving the colloquium on Thursday on natural language processing research here at CU. Your chance to heckle me in public. –Ask me where the HWs are.

CSCI 5582 Fall 2006 Computational Learning Theory
The big elements of this are:
–|H|, the size of the hypothesis space (for lists, the number of possible lists)
–The number of training instances
–The acceptable error rate of a hypothesis
–The probability that a given hypothesis is a good one (has an error rate in the acceptable range)

CSCI 5582 Fall 2006 CLT
First, an exercise with a coin...
–A bunch of folks get identical copies of a coin. Their job is to say whether it's a normal coin or a two-headed coin, by getting the results of flips (without examining both sides).
–Let's say that you go about this by assuming one hypothesis and trying to disprove that hypothesis via flips.

CSCI 5582 Fall 2006 Coin Flipping
OK, given this framework, what's a good hypothesis to start with (fair vs. fake)?
–Fake. Fake can be disproved by a single flip (tails); fair can't be logically disproved by any number of flips.

CSCI 5582 Fall 2006 Coin Flipping
You let these people flip five times. The lucky folks will encounter a tails and report the coin is fair. The unlucky folks will get 5 heads in a row and... report that they think it's fake.
–How many? 1/32 of them, since (1/2)^5 = 1/32.

CSCI 5582 Fall 2006 Coin Flipping
Say there are 320 flippers... How many unlucky folks will there be?
–10
OK... now you decide you're going to ask a random person what they think about this coin, and that's the answer you're going to go with. What's the probability that you'll stumble over an unlucky flipper?
–1/32
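A tiny simulation of the thought experiment (a sketch; the 320 flippers and 5 flips are from the slides, the rest is my framing):

```python
import random

def unlucky_count(n_flippers=320, n_flips=5):
    """How many flippers of a genuinely fair coin see all heads and so
    (wrongly) report that the coin is two-headed?"""
    return sum(all(random.random() < 0.5 for _ in range(n_flips))
               for _ in range(n_flippers))

print(unlucky_count())  # about 10, i.e. 320 * (1/32), give or take
```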

CSCI 5582 Fall 2006 CLT Back to learning… –Learning is viewed as candidate hypothesis elimination. –Each training example can be seen as a filter on the space of possible hypotheses –Hypotheses inconsistent with a training example are filtered out leaving the ones that are consistent (give the right answer) –What do we know about those?

CSCI 5582 Fall 2006 CLT Ok what about the ones that are consistent… two kinds –Hypotheses that are flat out wrong, but just coincidentally give the right answer –Hypotheses that are basically right (and got the right answer because they’re normally right)

CSCI 5582 Fall 2006 CLT So run the training data as a filter on the hypotheses When the data runs out pick a random hypothesis from among the ones still left standing (remember we don’t know what the right answer is). Can we say something about the probability of being unlucky and picking a hypothesis that is really wrong?

CSCI 5582 Fall 2006 CLT
Yup... it's clearly based on the size of the hypothesis space and on just how long a bad hypothesis can keep giving the right answers (i.e., the size of the training set).

CSCI 5582 Fall 2006 Hypothesis Space

CSCI 5582 Fall 2006 Bad Hypotheses Say we’re happy with any hypothesis that has an error rate no more than 5%. So any hypothesis with an error rate greater than 5% is in H_bad.

CSCI 5582 Fall 2006 Bad Hypotheses
Look at one with an error rate of 20%. The probability of it being correct on any given example? .8
Probability of being correct on n examples? .8^n
How many such bad-but-still-consistent hypotheses can we expect? At most .8^n * |H_bad|, which is at most .8^n * |H|
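To see how quickly training data squeezes out hypotheses this bad, a quick sketch plugging numbers into that bound (the |H| = 1000 is an arbitrary illustration):

```python
def surviving_bad_bound(h_size, error_rate, n_examples):
    """Upper bound on the expected number of bad hypotheses (each with at least
    this error rate) still consistent after n training examples."""
    return h_size * (1 - error_rate) ** n_examples

for n in (10, 30, 50):
    print(n, surviving_bad_bound(1000, 0.20, n))  # roughly 107, 1.2, 0.014
```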

CSCI 5582 Fall 2006 So…
The name of the game is to say that if I want to be X% sure that I'm going to have a solution with an error rate no worse than Y%, then I have to either:
–Reduce the number of surviving bad hypotheses (more training examples)
–Or reduce |H| (restrict the hypothesis space by restricting the expressiveness of the possible answers)
–Or provide a bias for how to select from among surviving hypotheses (Occam)
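The usual way to package this argument is the PAC sample-complexity bound (this is the textbook form from R&N 18.5, supplied here rather than taken from the slides): with error tolerance $\epsilon$ and failure probability $\delta$, we want

$$|H|\,(1-\epsilon)^m \le \delta,$$

and since $(1-\epsilon)^m \le e^{-\epsilon m}$, this is guaranteed whenever

$$m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right).$$

More examples, a smaller |H|, or looser $\epsilon$ and $\delta$ all make this easier to satisfy, which is exactly the trade-off listed above.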

CSCI 5582 Fall 2006 Next
Thursday
–Ensembles (Sec 18.4)
–SVMs and NNs (20.5 and 20.6)
Next week
–Chapter 19: learning with knowledge