Evaluating Hypothesis 자연언어처리연구실 장 정 호
개요 Evaluating the accuracy of hypotheses is fundamental to ML. - to decide whether to use this hypothesis - integral component of many learning system Difficulty from limited set of data - Bias in the estimate - Variance in the estimate
1. Contents Methods for evaluating learned hypotheses Methods for comparing the accuracy of two hypotheses Methods for comparing the accuracy of two learning algorithms when limited set of data is available
2. Estimating Hypothesis Accuracy Two Interests 1. Given a hypothesis h and a data sample, what is the best estimate of the accuracy of h over unseen data? 2. What is probable error in accuracy estimate?
2. Evaluating… (Cont’d) Two Definitions of Error 1. Sample Error with respect to target function f and data sample S, 2. True Error with respect to target function f and distribution D, How good an estimate of errorD(h) is provided by errorS(h)?
2. Evaluating… (Cont’d) Problems Causing Estimating Error 1. Bias : if S is training set, errorS(h) is optimistically biased estimation bias = E[errorS(h)] - errorD(h) For unbiased estimate, h and S must be chosen independently 2. Variance : Even with unbiased S, errorS(h) may vary from errorD(h)
2. Evaluating… (Cont’d) Estimators Experiment : 1. Choose sample S of size n according to distribution D 2. Measure errorS(h) errorS(h) is a random variable errorS(h) is an unbiased estimator for errorD(h) Given observed errorS(h) what can we conclude about errorD(h) ?
2. Evaluating… (Cont’d) Confidence Interval if 1. S contains n examples, drawn independently of h and each other 2. n >= 30 then with approximately N% probability, errorD(h) lies in interval
2. Evaluating… (Cont’d) Normal Distribution Approximates Binomial Distribution errorS(h) follows a Binomial distribution, with Approximate this by a Normal distribution with
2. Evaluating… (Cont’d) More Correct Confidence Interval if 1. S contains N examples, drawn independently of h and each other 2. N>= 30 then with approximately 95% probability, errorS(h) lies in interval equivalently, errorS(h) lies in interval which is approximately
2. Evaluating… (Cont’d) Two-sided and One-sided bounds 1. Two-sided What is the probability that errorD(h) is between L and U? 2. One-sided What is the probability that errorD(h) is at most U? 100(1-a)% confidence interval in Two-sided implies 100(1-a/2)% in One-sided.
3. General Confidence Interval Consider a set of independent, identically distributed random variables Y1…Yn, all governed by an arbitrary probability distribution with mean and variance 2. Define sample mean, Central Limit Theorem As n, the distribution governing approaches a Normal distribution, with mean and variance 2 /n.
3. General Confidence Interval (Cont’d) 1. Pick parameter p to estimate errorD(h) 2. Choose an estimator errorS(h) 3. Determine probability distribution that governs estimator errorS(h) governed by Binomial distribution, approximated by Normal distribution when n>=30 4. Find interval (L, U) such that N% of probability mass falls in the interval
4. Difference in Error of Two Hypothesis Assumption - two hypothesis h1, h2. - h1 is tested on sample S1 containing n1 random examples. h2 is tested on sample S2 containing n2 ramdom examples. Object - get difference between two true errors. where, d = errorD(h1) - errorD(h2)
4. Difference in Error of Two Hypothesis(Cont’d) Procedure 1. Choose an estimator for d 2. Determine probability distribution that governs estimator 3. Find interval (L, U) such that N% of probability mass falls in the interval
4. Difference in Error of Two Hypothesis(Cont’d) Hypothesis Test Ex) size of S1, S2 is 100 error s1(h1)=0.30, errors2(h2) = 0.20 What is the probability that errorD(h1) > errorD(h2)?
4. Difference in Error of Two Hypothesis(Cont’d) Solution 1. The problem is equivalent to getting the probability of the following 2. From former expression, 3. Table of Normal distribution shows that associated confidence level for two-sided interval is 90%, so for one-sided interval, it is 95%
5. Comparing Two Learning Algorithms What we’d like to estimate: where L(S) is the hypothesis output by learner L using training set S But, given limited data D0, what is a good estimator? Could partition D0 into training set S and test set T0, and measure errorT0(LA(S0)) - errorT0(LB(S0)) Even better, repeat this many times and average the results
5. Comparing Two Learning Algorithms(Cont’d) 1. Partition data D0 into k disjoint test sets T1, T2, …, Tk of equal size, where this size if at least 30. 2. For 1 <= i <=k, do use Ti for the test set, and the remaining data for training set Si Si = {D0 - Ti}, hA= LA(Si), hB= LB(Si) 3. Return the value i, where
5. Comparing Two Learning Algorithms(Cont’d) 4. Now, use paired t test on to obtain a confidence interval The result is… N% confidence interval estimate for :