Evaluating Hypotheses

Outline Empirically evaluating the accuracy of hypotheses is fundamental to machine learning – How well does an accuracy estimate over a sample carry over to additional examples? – When data is limited, what is the best way to use this data to both learn a hypothesis and estimate its accuracy?

Motivation Evaluate the performance of learned hypotheses as precisely as possible – To decide whether to use the hypothesis – Evaluating hypotheses is an integral component of many learning methods Estimate future accuracy given only a limited set of data – Bias in the estimate – Variance in the estimate

Estimating Hypothesis Accuracy There is some space of possible instances X over which various target functions may be defined. A convenient way to model this is to assume there is some unknown probability distribution D that defines the probability of encountering each instance in X. The learning task is to learn the target concept or target function f by considering a space H of possible hypotheses.

Estimating Hypothesis Accuracy: Problem Given a hypothesis h and a data sample containing n examples drawn at random according to the distribution D: What is the best estimate of the accuracy of h over future instances drawn from the same distribution? What is the probable error in this accuracy estimate?

Sample Error and True Error The sample error error_S(h) of a hypothesis h with respect to target function f and data sample S is the fraction of S that h misclassifies: error_S(h) = (1/n) Σ_{x∈S} δ(f(x) ≠ h(x)), where δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x) and 0 otherwise. The true error error_D(h) of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D: error_D(h) = Pr_{x∈D}[f(x) ≠ h(x)].
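As a concrete illustration, the sample error is computed by simple counting. A minimal sketch in Python (the target f, hypothesis h, and uniform distribution D below are hypothetical, chosen only for illustration):

```python
import random

def sample_error(h, f, S):
    """error_S(h): the fraction of sample S that h misclassifies
    relative to the target function f."""
    return sum(1 for x in S if h(x) != f(x)) / len(S)

# Hypothetical target concept and learned hypothesis over instances 0..99;
# they disagree only on the five instances 50..54.
f = lambda x: x >= 50
h = lambda x: x >= 55

random.seed(0)
S = [random.randrange(100) for _ in range(1000)]  # drawn from D = uniform on 0..99
print(sample_error(h, f, S))  # close to the true error error_D(h) = 5/100 = 0.05
```

Here the true error is known exactly (0.05) because we chose D ourselves; the measured sample error only approximates it, which is exactly the gap the following slides quantify.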

Sample Error and True Error We want to know the true error error_D(h) of the hypothesis, because this is the error we can expect when applying the hypothesis to future examples. The error we can actually measure is error_S(h). How good an estimate of error_D(h) is provided by error_S(h)?

Confidence Intervals for Discrete-Valued Hypotheses Suppose – h is a discrete-valued hypothesis – The sample S contains n examples drawn independently of one another, and independently of h, according to the probability distribution D – n >= 30 – Hypothesis h commits r errors over these n examples Under these conditions, statistical theory allows us to make the following assertions – Given no other information, the most probable value of error_D(h) is error_S(h) – With approximately 95% probability, the true error error_D(h) lies in the interval error_S(h) ± 1.96 sqrt(error_S(h)(1 − error_S(h))/n)

Example Suppose the data sample S contains n=40 examples and that hypothesis h commits r=12 errors over this data. In this case, error_S(h) = 12/40 = 0.30. Given no other information, the best estimate of the true error is error_D(h) = 0.30. If we were to collect a second sample S' containing 40 new randomly drawn examples, we might expect the sample error error_S'(h) to vary slightly from error_S(h). If we repeated this experiment over and over, each time drawing a new sample containing 40 new examples, we would find that for approximately 95% of these experiments the calculated interval would contain the true error. We call this interval the 95% confidence interval estimate for error_D(h): 0.30 ± (1.96 × 0.07).

Confidence Intervals for Discrete-Valued Hypotheses We can calculate the 68% confidence interval in this case to be 0.30 ± (1.00 × 0.07). It makes intuitive sense that the 68% confidence interval is smaller than the 95% confidence interval. Values of the constant z_N for a two-sided N% confidence interval:
Confidence level N%:  50%   68%   80%   90%   95%   98%   99%
Constant z_N:         0.67  1.00  1.28  1.64  1.96  2.33  2.58
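The interval computation above is mechanical once r, n, and the table constant are known; a minimal sketch applying the formula (the function name is mine):

```python
import math

def confidence_interval(r, n, z=1.96):
    """Two-sided N% confidence interval for error_D(h),
    given r errors over n test examples; z is the table constant z_N."""
    e = r / n                              # error_S(h)
    sd = math.sqrt(e * (1 - e) / n)        # approximate standard deviation
    return e - z * sd, e + z * sd

lo, hi = confidence_interval(12, 40)       # the running example: n=40, r=12
print(f"95% CI: ({lo:.2f}, {hi:.2f})")     # approximately (0.16, 0.44)
```

Passing z=1.00 instead reproduces the narrower 68% interval from the slide.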

Basics of Sampling Theory Concepts reviewed in the following slides: a random variable, a probability distribution, the expected value, the variance of a random variable, the standard deviation, the Binomial distribution, the Normal distribution, the estimation bias of an estimator Y, and an N% confidence interval.

Random Variables A random variable is a function from the sample space of an experiment to the set of real numbers; that is, a random variable assigns a real number to each possible outcome. Suppose a coin is flipped 3 times. Let X(t) be the random variable that equals the number of heads that appear when t is the outcome: X(HHH)=3 X(HHT)=X(HTH)=X(THH)=2 X(TTH)=X(THT)=X(HTT)=1 X(TTT)=0

Distribution of a Random Variable The distribution of a random variable X on a sample space S is the set of pairs (r, p(X=r)), where p(X=r) is the probability that X takes the value r. A distribution is usually described by specifying p(X=r) for each r. For the three coin flips above: p(X=3)=1/8 p(X=2)=3/8 p(X=1)=3/8 p(X=0)=1/8
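The coin-flip distribution can be verified by enumerating the sample space; a small sketch:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# All 8 equally likely outcomes of three coin flips; X counts heads.
outcomes = [''.join(t) for t in product('HT', repeat=3)]
counts = Counter(t.count('H') for t in outcomes)

# Map each value r of X to p(X = r).
dist = {r: Fraction(c, len(outcomes)) for r, c in counts.items()}
print(dist)  # p(X=3)=1/8, p(X=2)=3/8, p(X=1)=3/8, p(X=0)=1/8
```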

Expected Values The expected value of the random variable X(s) on the sample space S is E(X) = Σ_{s∈S} p(s) X(s). Example: let X be the number that comes up when a fair die is rolled. What is the expected value of X? E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 7/2.
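The die example can be checked exactly with rational arithmetic:

```python
from fractions import Fraction

# E(X) = sum over outcomes s of p(s) * X(s); each face of a fair die has p = 1/6.
E = sum(Fraction(1, 6) * x for x in range(1, 7))
print(E)  # 7/2, i.e. 3.5
```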

Variance and Standard Deviation Let X be a random variable on a sample space S. The variance of X, denoted by V(X), is V(X) = E((X − E(X))^2). The standard deviation of X, denoted by δ(X), is defined to be δ(X) = sqrt(V(X)). If X is a random variable on a sample space S, then V(X) = E(X^2) − E(X)^2. If X and Y are two independent random variables on a sample space S, then V(X+Y) = V(X) + V(Y). Furthermore, if X_i (i = 1, 2, …, n) are pairwise independent random variables on S, then V(X_1 + X_2 + … + X_n) = V(X_1) + V(X_2) + … + V(X_n).
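Both the identity V(X) = E(X^2) − E(X)^2 and the additivity for independent variables can be verified exactly for the fair die; a small sketch:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)

E  = sum(p * x for x in faces)                  # E(X) = 7/2
E2 = sum(p * x * x for x in faces)              # E(X^2) = 91/6
V_def   = sum(p * (x - E) ** 2 for x in faces)  # variance from the definition
V_ident = E2 - E ** 2                           # V(X) = E(X^2) - E(X)^2
print(V_def, V_ident)                           # both 35/12

# V(X+Y) = V(X) + V(Y) for two independent fair dice:
pxy = Fraction(1, 36)
Exy = sum(pxy * (x + y) for x, y in product(faces, faces))
Vxy = sum(pxy * (x + y - Exy) ** 2 for x, y in product(faces, faces))
print(Vxy == 2 * V_def)  # True
```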

Bernoulli Trials and the Binomial Distribution Each performance of an experiment with two possible outcomes is called a Bernoulli trial; the trials are assumed to be mutually independent. Theorem: the probability of exactly r successes in n independent Bernoulli trials, with probability of success p and probability of failure q = 1 − p, is C(n, r) p^r q^(n−r). This is the Binomial distribution: b(r; n, p) = C(n, r) p^r q^(n−r). Example: a coin is biased so that the probability of heads is 2/3. What is the probability that exactly 4 heads come up when the coin is flipped 7 times, assuming the flips are independent? C(7,4) (2/3)^4 (1/3)^3 = 560/2187.
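The biased-coin calculation follows directly from the formula; computed exactly:

```python
from fractions import Fraction
from math import comb

def binom(r, n, p):
    """b(r; n, p) = C(n, r) p^r (1-p)^(n-r)"""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Exactly 4 heads in 7 independent flips of a coin with p(heads) = 2/3:
print(binom(4, 7, Fraction(2, 3)))  # 560/2187
```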

The Binomial Distribution The general setting to which the Binomial distribution applies is – There is a base, or underlying, experiment whose outcome can be described by a random variable, say Y – The probability that Y = 1 on any single trial of the underlying experiment is given by some constant p, independent of the outcome of any other trial – A series of n independent trials of the underlying experiment is performed; let R denote the number of trials for which Y_i = 1 in this series of n experiments – The probability that the random variable R will take on a specific value r is given by the Binomial distribution: P(R = r) = C(n, r) p^r (1 − p)^(n−r)

The Binomial Distribution Expected value – E[X] = np Variance – Var(X) = np(1 − p) Standard deviation – sqrt(np(1 − p))
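These moments can be verified by summing over the Binomial probabilities directly; a check for the earlier example n = 7, p = 2/3:

```python
from fractions import Fraction
from math import comb

n, p = 7, Fraction(2, 3)
b = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

mean = sum(r * b[r] for r in range(n + 1))            # should equal np
var  = sum((r - mean) ** 2 * b[r] for r in range(n + 1))  # should equal np(1-p)
print(mean == n * p, var == n * p * (1 - p))          # True True
```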

Error Estimation and Estimating Binomial Proportions Imagine that we run k such random experiments, measuring the random variables error_S1(h), error_S2(h), …, error_Sk(h). As we allow k to grow, the histogram of these values would approach the Binomial distribution.

The Binomial Distribution Estimating p from a random sample of coin tosses is equivalent to estimating error_D(h): the probability p that a single random coin toss will turn up heads corresponds to the probability that a single instance drawn at random will be misclassified (p corresponds to error_D(h)). The Binomial distribution depends on the specific sample size n and the specific probability p, i.e., error_D(h).

Error Estimation and Estimating Binomial Proportions Measuring the sample error amounts to performing an experiment with a random outcome: collect a random sample S of n independently drawn instances from the distribution D, and then measure the sample error error_S(h). If we repeat this experiment many times, each error_Si(h) is a random variable.
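This repeated-sampling experiment is easy to simulate; a sketch with an assumed true error error_D(h) = 0.3 and samples of n = 40 instances:

```python
import random
from collections import Counter

random.seed(1)
p, n, k = 0.3, 40, 10_000   # true error, sample size, number of experiments

# Each experiment draws n instances; each is misclassified with probability p.
# error_Si(h) = r_i / n, where r_i is the error count in experiment i.
rs = [sum(random.random() < p for _ in range(n)) for _ in range(k)]
hist = Counter(rs)

print(sum(rs) / k)          # sample mean of r, close to np = 12
print(hist.most_common(3))  # the most frequent error counts cluster around 12
```

The histogram `hist` is an empirical version of the Binomial(40, 0.3) distribution described above.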

Estimators, Bias, and Variance Given that the random variable error_S(h) obeys a Binomial distribution, what is the likely difference between error_S(h) and error_D(h)? We have – error_S(h) = r/n – error_D(h) = p Statisticians call error_S(h) an estimator for the true error error_D(h). The first question about an estimator is whether, on average, it gives the right estimate.

Estimators, Bias, and Variance The estimation bias of an estimator Y for an arbitrary parameter p is E[Y] − p. If the estimation bias is zero, we say that Y is an unbiased estimator of p: the average of many random values of Y generated by repeated random experiments converges toward p. error_S(h) obeys a Binomial distribution with mean error_D(h); thus error_S(h) is an unbiased estimator for error_D(h). Note that in order for error_S(h) to give an unbiased estimate of error_D(h), the hypothesis h and the sample S must be chosen independently.

Estimators, Bias, and Variance Example – n = 40 – r = 12 – The standard deviation of error_S(h) is approximately 2.9/40 ≈ 0.07 In general, given r errors in a sample of n independently drawn test examples, the standard deviation of error_S(h) is given approximately by sqrt(error_S(h)(1 − error_S(h))/n).

Confidence Intervals One common way to describe the uncertainty associated with an estimate is to give an interval within which the true value is expected to fall, along with the probability with which it is expected to fall into this interval. An N% confidence interval for some parameter p is an interval that is expected with probability N% to contain p. error_S(h) follows a Binomial probability distribution, so to derive a 95% confidence interval we need only find the interval centered around the mean value error_D(h) that is wide enough to contain 95% of the total probability mass.

Normal Distribution It is difficult to find the size of the interval that contains N% of the probability mass for the Binomial distribution. For sufficiently large sample sizes, however, the Binomial distribution can be closely approximated by the Normal distribution. The Normal distribution is a bell-shaped continuous distribution widely used in statistical inference. A random variable X with mean μ and standard deviation σ is normally distributed if its probability density function is given by p(x) = (1/sqrt(2πσ^2)) e^(−(x−μ)^2 / (2σ^2)).

The probability density function determines the probability that X will fall into an interval (a, b): Pr(a < X < b) = ∫_a^b p(x) dx. The expected value of X is E[X] = μ, and the variance of X is Var(X) = σ^2.

The 68-95-99.7 Rule For a Normal distribution, about 68% of the data lies within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
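These percentages follow from the Normal cumulative distribution and can be confirmed with the error function from the standard library:

```python
from math import erf, sqrt

def within(k):
    """Probability that a Normal random variable lies within
    k standard deviations of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {within(k):.4f}")  # 0.6827, 0.9545, 0.9973
```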

Confidence Intervals If a random variable Y obeys a Normal distribution with mean μ and standard deviation σ, then the measured random value y of Y will fall into the interval μ ± z_N σ N% of the time. Equivalently, the mean μ will fall into the interval y ± z_N σ N% of the time.

Confidence Intervals With 95% confidence, the value of a standard Normal random variable will lie in the two-sided interval [−1.96, 1.96]; note that z_0.95 = 1.96. Two approximations are involved in this derivation: 1. In estimating the standard deviation σ of error_S(h), we have approximated error_D(h) by error_S(h). 2. The Binomial distribution has been approximated by the Normal distribution.
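The quality of these two approximations can be probed by computing the exact Binomial probability mass that lands inside the nominal 95% interval; a sketch for the running example, assuming error_D(h) = 0.3 and n = 40:

```python
from math import comb, sqrt

p, n = 0.3, 40
sd = sqrt(n * p * (1 - p))             # standard deviation of the error count r
lo, hi = n * p - 1.96 * sd, n * p + 1.96 * sd

# Exact probability that the error count r falls inside the nominal 95% interval:
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r)
            for r in range(n + 1) if lo <= r <= hi)
print(f"{exact:.3f}")  # near, but not exactly, 0.95
```

The small gap between this exact value and 0.95 is the price of the Normal approximation at n = 40; it shrinks as n grows.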