Evaluating Hypotheses

Outline Empirically evaluating the accuracy of hypotheses is fundamental to machine learning – How well does an accuracy estimate over a sample carry over to additional examples? – When data is limited, what is the best way to use this data to both learn a hypothesis and estimate its accuracy?

Motivation Evaluate the performance of learned hypotheses as precisely as possible – To decide whether to use the hypothesis – Evaluating hypotheses is an integral component of many learning methods Estimate future accuracy given only a limited set of data – Bias in the estimate – Variance in the estimate

Estimating Hypothesis Accuracy There is some space of possible instances X over which various target functions may be defined. A convenient way to model this is to assume there is some unknown probability distribution D that defines the probability of encountering each instance in X. The learning task is to learn the target concept or target function f by considering a space H of possible hypotheses.

Estimating Hypothesis Accuracy: Problem Given a hypothesis h and a data sample containing n examples drawn at random according to the distribution D: What is the best estimate of the accuracy of h over future instances drawn from the same distribution? What is the probable error in this accuracy estimate?

Sample Error and True Error The sample error error_S(h) of a hypothesis h with respect to target function f and data sample S is the fraction of S that h misclassifies: error_S(h) = (1/n) Σ_{x∈S} δ(f(x) ≠ h(x)), where δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x) and 0 otherwise. The true error error_D(h) of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D: error_D(h) = Pr_{x∈D}[f(x) ≠ h(x)].
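As a concrete illustration, the sample error is computed by simple counting. A minimal sketch in Python (the target f, hypothesis h, and uniform distribution D below are hypothetical, chosen only for illustration):

```python
import random

def sample_error(h, f, S):
    """error_S(h): the fraction of sample S that h misclassifies
    relative to the target function f."""
    return sum(1 for x in S if h(x) != f(x)) / len(S)

# Hypothetical target concept and learned hypothesis over instances 0..99;
# they disagree only on the five instances 50..54.
f = lambda x: x >= 50
h = lambda x: x >= 55

random.seed(0)
S = [random.randrange(100) for _ in range(1000)]  # drawn from D = uniform on 0..99
print(sample_error(h, f, S))  # close to the true error error_D(h) = 5/100 = 0.05
```

Here the true error is known exactly (0.05) because we chose D ourselves; the measured sample error only approximates it, which is exactly the gap the following slides quantify.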

Sample Error and True Error We want to know the true error error_D(h) of the hypothesis, because this is the error we can expect when applying the hypothesis to future examples. The error we can actually measure is error_S(h). How good an estimate of error_D(h) is provided by error_S(h)?

Confidence Intervals for Discrete-Valued Hypotheses Suppose – h is a discrete-valued hypothesis – The sample S contains n examples drawn independently of one another, and independently of h, according to the probability distribution D – n >= 30 – Hypothesis h commits r errors over these n examples Under these conditions, statistical theory allows us to make the following assertions – Given no other information, the most probable value of error_D(h) is error_S(h) – With approximately 95% probability, the true error error_D(h) lies in the interval error_S(h) ± 1.96 sqrt(error_S(h)(1 − error_S(h))/n)

Example Suppose the data sample S contains n=40 examples and that hypothesis h commits r=12 errors over this data. In this case, error_S(h) = 12/40 = 0.30. Given no other information, the best estimate of the true error is error_D(h) = 0.30. If we were to collect a second sample S' containing 40 new randomly drawn examples, we might expect the sample error error_S'(h) to vary slightly from error_S(h). If we repeated this experiment over and over, each time drawing a new sample containing 40 new examples, we would find that for approximately 95% of these experiments the calculated interval would contain the true error. We call this interval the 95% confidence interval estimate for error_D(h): 0.30 ± (1.96 × 0.07).

Confidence Intervals for Discrete-Valued Hypotheses We can calculate the 68% confidence interval in this case to be 0.30 ± (1.00 × 0.07). It makes intuitive sense that the 68% confidence interval is smaller than the 95% confidence interval. Values of the constant z_N for a two-sided N% confidence interval:
Confidence level N%:  50%   68%   80%   90%   95%   98%   99%
Constant z_N:         0.67  1.00  1.28  1.64  1.96  2.33  2.58
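The interval computation above is mechanical once r, n, and the table constant are known; a minimal sketch applying the formula (the function name is mine):

```python
import math

def confidence_interval(r, n, z=1.96):
    """Two-sided N% confidence interval for error_D(h),
    given r errors over n test examples; z is the table constant z_N."""
    e = r / n                              # error_S(h)
    sd = math.sqrt(e * (1 - e) / n)        # approximate standard deviation
    return e - z * sd, e + z * sd

lo, hi = confidence_interval(12, 40)       # the running example: n=40, r=12
print(f"95% CI: ({lo:.2f}, {hi:.2f})")     # approximately (0.16, 0.44)
```

Passing z=1.00 instead reproduces the narrower 68% interval from the slide.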

Basics of Sampling Theory Concepts reviewed in the following slides: a random variable, a probability distribution, the expected value, the variance of a random variable, the standard deviation, the Binomial distribution, the Normal distribution, the estimation bias of an estimator Y, and an N% confidence interval.

Random Variables A random variable is a function from the sample space of an experiment to the set of real numbers; that is, a random variable assigns a real number to each possible outcome. Suppose a coin is flipped 3 times. Let X(t) be the random variable that equals the number of heads that appear when t is the outcome: X(HHH)=3 X(HHT)=X(HTH)=X(THH)=2 X(TTH)=X(THT)=X(HTT)=1 X(TTT)=0

Distribution of a Random Variable The distribution of a random variable X on a sample space S is the set of pairs (r, p(X=r)), where p(X=r) is the probability that X takes the value r. A distribution is usually described by specifying p(X=r) for each r. For the three coin flips above: p(X=3)=1/8 p(X=2)=3/8 p(X=1)=3/8 p(X=0)=1/8
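The coin-flip distribution can be verified by enumerating the sample space; a small sketch:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# All 8 equally likely outcomes of three coin flips; X counts heads.
outcomes = [''.join(t) for t in product('HT', repeat=3)]
counts = Counter(t.count('H') for t in outcomes)

# Map each value r of X to p(X = r).
dist = {r: Fraction(c, len(outcomes)) for r, c in counts.items()}
print(dist)  # p(X=3)=1/8, p(X=2)=3/8, p(X=1)=3/8, p(X=0)=1/8
```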

Expected Values The expected value of the random variable X(s) on the sample space S is E(X) = Σ_{s∈S} p(s) X(s). Example: let X be the number that comes up when a fair die is rolled. What is the expected value of X? E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 7/2.
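The die example can be checked exactly with rational arithmetic:

```python
from fractions import Fraction

# E(X) = sum over outcomes s of p(s) * X(s); each face of a fair die has p = 1/6.
E = sum(Fraction(1, 6) * x for x in range(1, 7))
print(E)  # 7/2, i.e. 3.5
```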

Variance and Standard Deviation Let X be a random variable on a sample space S. The variance of X, denoted by V(X), is V(X) = E((X − E(X))^2). The standard deviation of X, denoted by δ(X), is defined to be δ(X) = sqrt(V(X)). If X is a random variable on a sample space S, then V(X) = E(X^2) − E(X)^2. If X and Y are two independent random variables on a sample space S, then V(X+Y) = V(X) + V(Y). Furthermore, if X_i (i = 1, 2, …, n) are pairwise independent random variables on S, then V(X_1 + X_2 + … + X_n) = V(X_1) + V(X_2) + … + V(X_n).
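Both the identity V(X) = E(X^2) − E(X)^2 and the additivity for independent variables can be verified exactly for the fair die; a small sketch:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)

E  = sum(p * x for x in faces)                  # E(X) = 7/2
E2 = sum(p * x * x for x in faces)              # E(X^2) = 91/6
V_def   = sum(p * (x - E) ** 2 for x in faces)  # variance from the definition
V_ident = E2 - E ** 2                           # V(X) = E(X^2) - E(X)^2
print(V_def, V_ident)                           # both 35/12

# V(X+Y) = V(X) + V(Y) for two independent fair dice:
pxy = Fraction(1, 36)
Exy = sum(pxy * (x + y) for x, y in product(faces, faces))
Vxy = sum(pxy * (x + y - Exy) ** 2 for x, y in product(faces, faces))
print(Vxy == 2 * V_def)  # True
```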

Bernoulli Trials and the Binomial Distribution Each performance of an experiment with two possible outcomes is called a Bernoulli trial; the trials are assumed to be mutually independent. Theorem: the probability of exactly r successes in n independent Bernoulli trials, with probability of success p and probability of failure q = 1 − p, is C(n, r) p^r q^(n−r). This is the Binomial distribution: b(r; n, p) = C(n, r) p^r q^(n−r). Example: a coin is biased so that the probability of heads is 2/3. What is the probability that exactly 4 heads come up when the coin is flipped 7 times, assuming the flips are independent? C(7,4) (2/3)^4 (1/3)^3 = 560/2187.
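The biased-coin calculation follows directly from the formula; computed exactly:

```python
from fractions import Fraction
from math import comb

def binom(r, n, p):
    """b(r; n, p) = C(n, r) p^r (1-p)^(n-r)"""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Exactly 4 heads in 7 independent flips of a coin with p(heads) = 2/3:
print(binom(4, 7, Fraction(2, 3)))  # 560/2187
```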

The Binomial Distribution The general setting to which the Binomial distribution applies is – There is a base, or underlying, experiment whose outcome can be described by a random variable, say Y – The probability that Y = 1 on any single trial of the underlying experiment is given by some constant p, independent of the outcome of any other trial – A series of n independent trials of the underlying experiment is performed; let R denote the number of trials for which Y_i = 1 in this series of n experiments – The probability that the random variable R will take on a specific value r is given by the Binomial distribution: P(R = r) = C(n, r) p^r (1 − p)^(n−r)

The Binomial Distribution Expected value – E[X] = np Variance – Var(X) = np(1 − p) Standard deviation – sqrt(np(1 − p))
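These moments can be verified by summing over the Binomial probabilities directly; a check for the earlier example n = 7, p = 2/3:

```python
from fractions import Fraction
from math import comb

n, p = 7, Fraction(2, 3)
b = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

mean = sum(r * b[r] for r in range(n + 1))            # should equal np
var  = sum((r - mean) ** 2 * b[r] for r in range(n + 1))  # should equal np(1-p)
print(mean == n * p, var == n * p * (1 - p))          # True True
```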

Error Estimation and Estimating Binomial Proportions Imagine that we run k such random experiments, measuring the random variables error_S1(h), error_S2(h), …, error_Sk(h). As we allow k to grow, the histogram of these values would approach the Binomial distribution.

The Binomial Distribution Estimating p from a random sample of coin tosses is equivalent to estimating error_D(h): the probability p that a single random coin toss will turn up heads corresponds to the probability that a single instance drawn at random will be misclassified (p corresponds to error_D(h)). The Binomial distribution depends on the specific sample size n and the specific probability p, i.e., error_D(h).

Error Estimation and Estimating Binomial Proportions Measuring the sample error amounts to performing an experiment with a random outcome: collect a random sample S of n independently drawn instances from the distribution D, and then measure the sample error error_S(h). If we repeat this experiment many times, each error_Si(h) is a random variable.
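This repeated-sampling experiment is easy to simulate; a sketch with an assumed true error error_D(h) = 0.3 and samples of n = 40 instances:

```python
import random
from collections import Counter

random.seed(1)
p, n, k = 0.3, 40, 10_000   # true error, sample size, number of experiments

# Each experiment draws n instances; each is misclassified with probability p.
# error_Si(h) = r_i / n, where r_i is the error count in experiment i.
rs = [sum(random.random() < p for _ in range(n)) for _ in range(k)]
hist = Counter(rs)

print(sum(rs) / k)          # sample mean of r, close to np = 12
print(hist.most_common(3))  # the most frequent error counts cluster around 12
```

The histogram `hist` is an empirical version of the Binomial(40, 0.3) distribution described above.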

Estimators, Bias, and Variance Given that the random variable error_S(h) obeys a Binomial distribution, what is the likely difference between error_S(h) and error_D(h)? We have – error_S(h) = r/n – error_D(h) = p Statisticians call error_S(h) an estimator for the true error error_D(h). The first question about an estimator is whether, on average, it gives the right estimate.

Estimators, Bias, and Variance The estimation bias of an estimator Y for an arbitrary parameter p is E[Y] − p. If the estimation bias is zero, we say that Y is an unbiased estimator of p: the average of many random values of Y generated by repeated random experiments converges toward p. error_S(h) obeys a Binomial distribution with mean error_D(h); thus error_S(h) is an unbiased estimator for error_D(h). Note that in order for error_S(h) to give an unbiased estimate of error_D(h), the hypothesis h and the sample S must be chosen independently.

Estimators, Bias, and Variance Example – n = 40 – r = 12 – The standard deviation of error_S(h) is approximately 2.9/40 ≈ 0.07 In general, given r errors in a sample of n independently drawn test examples, the standard deviation of error_S(h) is given approximately by sqrt(error_S(h)(1 − error_S(h))/n).

Confidence Intervals One common way to describe the uncertainty associated with an estimate is to give an interval within which the true value is expected to fall, along with the probability with which it is expected to fall into this interval. An N% confidence interval for some parameter p is an interval that is expected with probability N% to contain p. error_S(h) follows a Binomial probability distribution, so to derive a 95% confidence interval we need only find the interval centered around the mean value error_D(h) that is wide enough to contain 95% of the total probability mass.

Normal Distribution It is difficult to find the size of the interval that contains N% of the probability mass for the Binomial distribution. For sufficiently large sample sizes, however, the Binomial distribution can be closely approximated by the Normal distribution. The Normal distribution is a bell-shaped continuous distribution widely used in statistical inference. A random variable X with mean μ and standard deviation σ is normally distributed if its probability density function is given by p(x) = (1/sqrt(2πσ^2)) e^(−(x−μ)^2 / (2σ^2)).

The probability density function determines the probability that X will fall into an interval (a, b): Pr(a < X < b) = ∫_a^b p(x) dx. The expected value of X is E[X] = μ, and the variance of X is Var(X) = σ^2.

The 68-95-99.7 Rule For a Normal distribution, about 68% of the data lies within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
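These percentages follow from the Normal cumulative distribution and can be confirmed with the error function from the standard library:

```python
from math import erf, sqrt

def within(k):
    """Probability that a Normal random variable lies within
    k standard deviations of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {within(k):.4f}")  # 0.6827, 0.9545, 0.9973
```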

Confidence Intervals If a random variable Y obeys a Normal distribution with mean μ and standard deviation σ, then the measured random value y of Y will fall into the interval μ ± z_N σ N% of the time. Equivalently, the mean μ will fall into the interval y ± z_N σ N% of the time.

Confidence Intervals With 95% confidence, the value of a standard Normal random variable will lie in the two-sided interval [−1.96, 1.96]; note that z_0.95 = 1.96. Two approximations are involved in this derivation: 1. In estimating the standard deviation σ of error_S(h), we have approximated error_D(h) by error_S(h). 2. The Binomial distribution has been approximated by the Normal distribution.
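The quality of these two approximations can be probed by computing the exact Binomial probability mass that lands inside the nominal 95% interval; a sketch for the running example, assuming error_D(h) = 0.3 and n = 40:

```python
from math import comb, sqrt

p, n = 0.3, 40
sd = sqrt(n * p * (1 - p))             # standard deviation of the error count r
lo, hi = n * p - 1.96 * sd, n * p + 1.96 * sd

# Exact probability that the error count r falls inside the nominal 95% interval:
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r)
            for r in range(n + 1) if lo <= r <= hi)
print(f"{exact:.3f}")  # near, but not exactly, 0.95
```

The small gap between this exact value and 0.95 is the price of the Normal approximation at n = 40; it shrinks as n grows.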