Empirical Research Methods in Computer Science Lecture 4 November 2, 2005 Noah Smith.


1 Empirical Research Methods in Computer Science Lecture 4 November 2, 2005 Noah Smith

2 Today Review bootstrap estimate of se (from homework). Review sign and permutation tests for paired samples. Lots of examples of hypothesis tests.

3 Recall... There is a true value of the statistic, but we don't know it. We can compute the sample statistic. We know sample means are normally distributed (as n gets big): x̄ ≈ N(μ, σ²/n).

4 But we don’t know anything about the distribution of other sample statistics (medians, correlations, etc.)!

5 Bootstrap world
Real world: unknown distribution F → observed random sample X → statistic of interest.
Bootstrap world: empirical distribution → bootstrap random sample X* → bootstrap replication → statistics about the estimate (e.g., standard error).

6 Bootstrap estimate of se Run B bootstrap replicates, and compute the statistic each time: θ*[1], θ*[2], θ*[3], ..., θ*[B]. Then:
θ̄* = (1/B) Σ_b θ*[b] (mean of θ* across replications)
se ≈ √( Σ_b (θ*[b] − θ̄*)² / (B − 1) ) (sample standard deviation of θ* across replications)
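
The recipe above can be sketched in Python (illustrative, not from the slides; the choice of B, the sample data, and the median as the statistic of interest are all arbitrary):

```python
import random
import statistics

def bootstrap_se(sample, statistic, B=2000, seed=0):
    """Bootstrap estimate of the standard error of `statistic` (slide 6)."""
    rng = random.Random(seed)
    n = len(sample)
    # theta*[1..B]: the statistic recomputed on B resamples drawn with replacement
    replications = [statistic([sample[rng.randrange(n)] for _ in range(n)])
                    for _ in range(B)]
    mean_rep = sum(replications) / B
    # sample standard deviation of theta* across the B replications
    return (sum((t - mean_rep) ** 2 for t in replications) / (B - 1)) ** 0.5

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(100)]       # invented sample
se_median = bootstrap_se(data, statistics.median)  # se of the sample median
```

This is exactly the point of slide 4: we have no closed-form sampling distribution for the median, but the bootstrap gives us a standard error anyway.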

7 Paired-Sample Design pairs (xᵢ, yᵢ), where x ~ distribution F and y ~ distribution G. How do F and G differ?

8 Sign Test H₀: F and G have the same median: median(F) − median(G) = 0, so Pr(x > y) = 0.5. Then sign(x − y) follows a binomial distribution: compute the tail probability of bin(n, 0.5) at N₊, the number of positive signs.

9 Sign Test nonparametric (no assumptions about the data); closed form (no random sampling)

10 Example: gzip speed Build gzip with -O2 or with -O0. On about 650 files out of 1000, gzip -O2 was faster. Binomial distribution, p = 0.5, n = 1000: p < 3 × 10⁻²⁴
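
The p-value on this slide is a binomial tail. A sketch of the computation (the 650-of-1000 figure is the slide's; the exact tail below will differ somewhat from the slide's p since "about 650" is rounded):

```python
from math import comb

def sign_test_p(n_plus, n):
    """One-sided exact binomial tail Pr(X >= n_plus) for X ~ bin(n, 0.5)."""
    return sum(comb(n, k) for k in range(n_plus, n + 1)) / 2 ** n

p = sign_test_p(650, 1000)   # gzip -O2 faster on ~650 of 1000 files
```

Python's exact integer arithmetic makes this tail sum trivial; for large n a normal approximation to the binomial would also do.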

11 Permutation Test H₀: F = G. Suppose the difference in sample means is d. How likely is this difference (or a greater one) under H₀? For i = 1 to P: randomly permute within each pair (xᵢ, yᵢ), then compute the difference in sample means. The p-value is the fraction of permutations whose difference is at least as extreme as d.
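
The loop above can be sketched as follows (illustrative Python; the paired data are invented to show a clear effect, and P = 10000 is an arbitrary choice):

```python
import random

def paired_permutation_test(xs, ys, P=10000, seed=0):
    """Randomized paired permutation test on the difference in sample means."""
    rng = random.Random(seed)
    n = len(xs)
    observed = abs(sum(xs) / n - sum(ys) / n)
    extreme = 0
    for _ in range(P):
        total = 0.0
        for x, y in zip(xs, ys):
            if rng.random() < 0.5:     # randomly permute within the pair
                x, y = y, x
            total += x - y
        if abs(total / n) >= observed:
            extreme += 1
    return extreme / P                  # fraction at least as extreme as observed

# invented paired data: each x exceeds its paired y by about 1
xs = [2.1, 1.9, 2.2, 2.0, 1.8, 2.3, 2.1, 1.7, 2.0, 1.9]
ys = [1.0, 0.9, 1.1, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.2]
p = paired_permutation_test(xs, ys)
```

Under H₀: F = G, the labels within a pair are exchangeable, which is why a coin flip per pair gives the null sampling distribution.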

12 Permutation Test nonparametric (no assumptions about the data); randomized test

13 Example: gzip speed 1000 permutations: the difference of sample means under H₀ is centered on 0. The observed difference, −1579, is very extreme; p ≈ 0.

14 Comparing speed is tricky! It is very difficult to control for everything that could affect runtime. Solution 1: do the best you can. Solution 2: many runs, and then do ANOVA tests (or their nonparametric equivalents). “Is there more variance between conditions than within conditions?”
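
To make "more variance between conditions than within conditions" concrete, here is a minimal one-way ANOVA F statistic (a sketch, not the slides' code; the runtimes are invented):

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-condition variance over
    within-condition variance. `groups` is one list of runtimes per condition."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# two conditions with clearly different means and small within-condition noise
f = anova_f([[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]])
```

A large F says the conditions differ by more than run-to-run noise can explain; the nonparametric analogue mentioned later is Kruskal-Wallis.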

15 Sampling method 1
for r = 1 to 10:
  for each file f:
    for each program p:
      time p on f

16 Result (gzip first) student 2’s program faster than gzip!

17 Result (student first) student 2’s program is slower than gzip!

18 Sampling method 1
for r = 1 to 10:
  for each file f:
    for each program p:
      time p on f

19 Order effects Well-known in psychology. What the subject does at time t will affect what she does at time t+1.

20 Sampling method 2
for r = 1 to 10:
  for each program p:
    for each file f:
      time p on f

21 Result gzip wins

22 Sign and Permutation Tests [Venn diagram: the space of all distribution pairs (F, G); the region median(F) ≠ median(G) sits inside the region F ≠ G.]

23 Sign and Permutation Tests [Diagram, continued: the sign test can reject H₀ in the region median(F) ≠ median(G).]

24 Sign and Permutation Tests [Diagram, continued: the permutation test can reject H₀ in the larger region F ≠ G.]

25 Sign and Permutation Tests [Diagram, both regions shaded: the permutation test can reject anywhere F ≠ G; the sign test only where the medians differ.]

26 There are other tests! We have chosen two that are nonparametric and easy to implement. Others include the Wilcoxon Signed Rank Test and Kruskal-Wallis (a nonparametric "ANOVA").
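
A sketch of the Wilcoxon signed-rank test mentioned above, using the large-sample normal approximation (illustrative only; the data are invented, and in practice a library routine such as scipy.stats.wilcoxon would be the usual choice):

```python
from math import erfc, sqrt

def wilcoxon_signed_rank_p(xs, ys):
    """Two-sided p-value for paired samples, normal approximation.
    Drops zero differences; averages ranks of tied |differences|."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    # rank the absolute differences (1-based), averaging ranks within tie groups
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return erfc(abs(z) / sqrt(2))      # two-sided tail of the normal

xs = list(range(1, 21))
ys = [x + 0.5 for x in xs]            # y is consistently larger than its pair
p = wilcoxon_signed_rank_p(xs, ys)
```

Unlike the sign test, this uses the magnitudes of the differences (via their ranks), so it typically has more power while remaining nonparametric.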

27 Pre-increment? Conventional wisdom: “Better to use ++x than to use x++.” Really, with a modern compiler?

28 Two (toy) programs for(i = 0; i < (1 << 30); ++i) j = ++k; versus for(i = 0; i < (1 << 30); i++) j = k++; Ran each 200 times (interleaved). Mean runtimes were 2.835 and 2.735; the difference was significant well below 0.05.

29 What?
leal -8(%ebp), %eax
incl (%eax)
movl -8(%ebp), %eax

leal -8(%ebp), %edx
incl (%edx)

%edx is not used anywhere else

30 Conclusion Compile with -O and the assembly code is identical!

31 Why was this a dumb experiment?

32 Pre-increment, take 2 Take gzip source code. Replace all post-increments with pre-increments, in places where semantics won’t change. Run on 1000 files, 10 times each. Compare average runtime by file.

33 Sign test p = 8.5 × 10⁻⁸

34 Permutation test

35 Conclusion Pre-incrementing is faster!... but what about -O? sign test: p = 0.197; permutation test: p = 0.672. Pre-increment matters only without an optimizing compiler.

36 Joke.

37 Your programs... 8 students had a working program both weeks. 6 people changed their code; 1 person changed nothing; 1 person changed to -O3. 3 people's programs were lossy in week 1. Everyone's was lossy in week 2!

38 Your programs! Was there an improvement in compression between the two versions? H₀: No. Find the sampling distribution of the difference in means, using permutations.

39 Student 1 (lossless week 1)

40 Compression < 1?

41 Student 2: worse compression

42 Compression < 1?

43 Student 3

44 Student 4 (lossless week 1)

45 Student 5 (lossless week 1)

46 Student 6

47 Student 7

48 Student 8

49 Homework Assignment 2 6 experiments: 1. Does your program compress text or images better? 2. What about variance of compression? 3. What about gzip’s compression? 4. Variance of gzip’s compression? 5. Was there a change in the compression of your program from week 1 to week 2? 6. In the runtime?

50 Remainder of the course 11/9: EDA 11/16: Regression and learning 11/23: Happy Thanksgiving! 11/30: Statistical debugging 12/7: Review, Q&A Saturday 12/17, 2-5pm: Exam

