
1 Empirical Research Methods in Computer Science Lecture 2, Part 1 October 19, 2005 Noah Smith

2 Some tips
  - Perl scripts can be named encode instead of encode.pl.
  - encode foo ≢ encode < foo (passing foo as an argument is not the same as feeding it on standard input).
  - chmod u+x encode
  - Instead of making us run java Encode, write a shell script:
      #!/bin/sh
      cd `dirname $0`
      java Encode
  - Check that it works on (say) ugrad10.

3 Assignment 1
  - If you didn't turn in a first version yesterday, don't bother; just turn in the final version.
  - Final version due Tuesday 10/25, 8pm.
  - We will post a few exercises soon.
  - Questions?

4 Today
  - Standard error
  - Bootstrap for standard error
  - Confidence intervals
  - Hypothesis testing

5 Notation
  - P is a population.
  - S = [s_1, s_2, ..., s_n] is a sample from P.
  - Let X = [x_1, x_2, ..., x_n] be some numerical measurement on the s_i, distributed over P according to an unknown distribution F.
  - We may use Y, Z for other measurements.

6 Mean
  What does "mean" mean?
  - μ_x is the population mean of x (it depends on F).
  - μ_x is in general unknown.
  How do we estimate the mean? With the sample mean, x̄.
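
In symbols (a standard definition, matching the estimator the slide names), the sample mean of the measurements x_1, ..., x_n is

    \bar{x} \;=\; \frac{1}{n} \sum_{i=1}^{n} x_i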

7 Gzip compression rate: usually < 1, but not always.

8 Gzip compression rate

9 Accuracy
  How good an estimate is the sample mean?
  Standard error (se) of a statistic:
  - We picked one S from P.
  - How would the statistic (here, the sample mean) vary if we picked a lot of samples from P?
  - There is some "true" se value!
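
As an illustration of "how would the sample mean vary" (not from the slides; the population here is an arbitrary made-up choice), a small simulation that draws many samples and looks at the spread of their means:

    import random
    import statistics

    random.seed(0)
    n = 100              # size of each sample
    num_samples = 1000   # how many samples we pretend to draw from P

    # Pretend P is an exponential population with mean 1 (an arbitrary, illustrative choice).
    sample_means = []
    for _ in range(num_samples):
        sample = [random.expovariate(1.0) for _ in range(n)]
        sample_means.append(sum(sample) / n)

    # The spread of the sample means over repeated samples is the "true" standard error.
    print("sd of the sample means:", statistics.stdev(sample_means))
    print("sigma / sqrt(n)       :", 1.0 / n ** 0.5)   # exponential(1) has sigma = 1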

10 Extreme cases: n → ∞ and n = 1.

11 Standard error (of the sample mean)
  "Standard error" = the standard deviation of a statistic.
  Known closed form: se(x̄) = σ_x / √n, where σ_x is the true standard deviation of x under F.
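
Why the √n (a standard one-line derivation, assuming independent draws):

    \operatorname{Var}(\bar{x}) \;=\; \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i) \;=\; \frac{\sigma_x^2}{n}
    \qquad\Longrightarrow\qquad se(\bar{x}) \;=\; \frac{\sigma_x}{\sqrt{n}}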

12 Gzip compression rate

13 Central Limit Theorem The sampling distribution of the sample mean approaches a normal distribution as n increases.

14 How to estimate σ_x?
  "Plug-in principle": use the standard deviation of the sample,
    σ̂_x = √( (1/n) Σ_i (x_i − x̄)² ).
  Therefore the estimated standard error of the sample mean is σ̂_x / √n.
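
A minimal sketch of the plug-in estimate in code (illustrative; the function name is mine, and the toy data are the five values from the bootstrap-sample slide below):

    import math

    def plugin_se_of_mean(xs):
        """Plug-in estimate of the standard error of the sample mean."""
        n = len(xs)
        mean = sum(xs) / n
        var_hat = sum((x - mean) ** 2 for x in xs) / n   # variance of the empirical distribution
        return math.sqrt(var_hat) / math.sqrt(n)

    print(plugin_se_of_mean([3.0, 2.8, 3.7, 3.4, 3.5]))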

15 Plug-in principle
  - We don't have (and can't get) P; we don't know F, the true distribution over X.
  - We do have S (the sample); we do know F̂, the sample (empirical) distribution over X.
  - Estimating a statistic: use F̂ in place of F.

16 Good and Bad News
  Good news: we have a formula to estimate the standard error of the sample mean!
  Bad news: we have a formula only for the standard error of the sample mean. There is no comparably simple, general formula for the standard error of, e.g.:
  - the variance
  - the median
  - a trimmed mean
  - the ratio of the means of x and y
  - the correlation between x and y

18 Bootstrap world
  Real world: unknown distribution F → observed random sample X → statistic of interest θ̂ = s(X).
  Bootstrap world: empirical distribution F̂ → bootstrap random sample X* → bootstrap replication θ̂* = s(X*).
  The replications give us statistics about the estimate (e.g., its standard error).

19 Bootstrap sample
  X = [3.0, 2.8, 3.7, 3.4, 3.5]
  X* could be:
  - [2.8, 3.4, 3.7, 3.4, 3.5]
  - [3.5, 3.0, 3.4, 2.8, 3.7]
  - [3.5, 3.5, 3.4, 3.0, 2.8]
  - ...
  Draw n elements with replacement.
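
One way to draw such a bootstrap sample (a sketch, not from the lecture):

    import random

    X = [3.0, 2.8, 3.7, 3.4, 3.5]

    # Draw len(X) elements from X with replacement.
    X_star = random.choices(X, k=len(X))
    print(X_star)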

20 Reflection Imagine doing this with a pencil and paper. The bootstrap was born in 1979. Typically, sampling is costly and computation is cheap. In (empirical) CS, sampling isn’t even necessarily all that costly.

21 Bootstrap estimate of se
  Let s(·) be a function for computing an estimate, so θ̂ = s(X).
  True value of the standard error: se_F(θ̂).
  Ideal bootstrap estimate: se_F̂(θ̂*).
  Bootstrap estimate with B bootstrap samples:
    se_B = √( (1/(B−1)) Σ_{b=1}^{B} ( θ̂*(b) − θ̂*(·) )² ),  where θ̂*(·) = (1/B) Σ_b θ̂*(b).
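
A minimal sketch of the B-sample estimate (function and variable names are mine; shown here for the sample mean):

    import math
    import random

    def bootstrap_se(xs, stat, B=200, seed=0):
        """Bootstrap estimate of the standard error of stat(xs), using B resamples."""
        rng = random.Random(seed)
        reps = []
        for _ in range(B):
            resample = rng.choices(xs, k=len(xs))   # n draws with replacement
            reps.append(stat(resample))
        mean_rep = sum(reps) / B
        return math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (B - 1))

    X = [3.0, 2.8, 3.7, 3.4, 3.5]
    print(bootstrap_se(X, stat=lambda v: sum(v) / len(v)))   # se of the sample mean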

22 Bootstrap estimate of se

23 Bootstrap, intuitively
  - We don't know F.
  - We would like lots of samples from P, but we only have one (S).
  - We approximate F by F̂ (plug-in principle!).
  - It is easy to generate lots of "samples" from F̂.

24 B = 25 (mean compression)

25 B = 50 (mean compression)

26 B = 200 (mean compression)

27 Correlation (another statistic)
  - Population P, sample S.
  - Two values, x_i and y_i, for each element of the sample.
  - Population correlation coefficient: ρ.
  - Sample correlation coefficient:
      r = Σ_i (x_i − x̄)(y_i − ȳ) / √( Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² )
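
A sketch of the sample correlation coefficient in code (the helper name and toy data are mine):

    import math

    def pearson_r(xs, ys):
        """Sample (Pearson) correlation coefficient of paired measurements."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / math.sqrt(sxx * syy)

    # Made-up paired data, just to exercise the function.
    print(pearson_r([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.1, 3.8, 5.2]))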

28 Example: gzip compression r = 0.9616

29 Accuracy of r
  No general closed form for se(r).
  If we assume x and y are bivariate Gaussian, there is a normal-theory approximation (next slide).

30 se_normal(r) = (1 − r²) / √(n − 3)
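
The same formula in code (the sample size below is made up, only to exercise the function; the lecture's actual n is not given here):

    import math

    def se_normal(r, n):
        """Normal-theory standard error of the sample correlation coefficient."""
        return (1.0 - r ** 2) / math.sqrt(n - 3)

    print(se_normal(0.9616, 1000))   # n = 1000 is a hypothetical sample size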

31 Normality Why assume the data are Gaussian? Alternative: bootstrap estimate of the standard error of r
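
A sketch of that alternative (names are mine; note that (x, y) pairs must be resampled together):

    import math
    import random

    def pearson_r(xs, ys):                      # same helper as in the sketch after slide 27
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / math.sqrt(sxx * syy)

    def bootstrap_se_r(pairs, B=200, seed=0):
        """Bootstrap se of the correlation: resample whole (x, y) pairs with replacement."""
        rng = random.Random(seed)
        reps = []
        for _ in range(B):
            resample = rng.choices(pairs, k=len(pairs))
            reps.append(pearson_r([p[0] for p in resample], [p[1] for p in resample]))
        mean_rep = sum(reps) / B
        return math.sqrt(sum((v - mean_rep) ** 2 for v in reps) / (B - 1))

    # Made-up paired data, just to exercise the function.
    data = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.2),
            (6.0, 6.1), (7.0, 6.8), (8.0, 8.3), (9.0, 9.1), (10.0, 9.7)]
    print(bootstrap_se_r(data, B=200))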

32 Example: gzip compression. r = 0.9616, se_normal(r) = 0.0024.

33 Bootstrap estimate with B = 200: se_200(r) = 0.0298.

34 se bootstrap advice
  - Plot the data.
  - Runtime?
  - Efron and Tibshirani: B = 25 is informative; B = 50 is often enough; seldom need B > 200 (for se).

35 Summary so far
  - A statistic is a "true fact" about the distribution F.
  - We don't know F.
  - For some parameter θ, we want:
      an estimate θ̂ ("theta hat"), and
      the accuracy of that estimate (e.g., its standard error).
  - For the mean, μ, we have a closed form.
  - For other θ, the bootstrap will help!

