Published by Martin Jacobs. Modified over 9 years ago.
1
Empirical Research Methods in Computer Science Lecture 4 November 2, 2005 Noah Smith
2
Today Review bootstrap estimate of se (from homework). Review sign and permutation tests for paired samples. Lots of examples of hypothesis tests.
3
Recall... There is a true value of the statistic, but we don’t know it. We can compute the sample statistic. We know sample means are approximately normally distributed (as n gets big): $\bar{x} \sim \mathcal{N}(\mu, \sigma^2 / n)$
4
But we don’t know anything about the distribution of other sample statistics (medians, correlations, etc.)!
5
Bootstrap world: unknown distribution F → observed random sample X → statistic of interest; empirical distribution → bootstrap random sample X* → bootstrap replication → statistics about the estimate (e.g., standard error)
6
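Bootstrap estimate of se: Run B bootstrap replicates, and compute the statistic each time: θ*[1], θ*[2], θ*[3], ..., θ*[B]. $\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \theta^*[b]$ (mean of θ* across replications); $\widehat{se}_B = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\theta^*[b] - \bar{\theta}^*\right)^2}$ (sample standard deviation of θ* across replications)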
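The recipe on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name `bootstrap_se`, the default `B=1000`, the fixed seed, and the sample data are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_se(sample, statistic, B=1000, seed=0):
    """Bootstrap estimate of the standard error of `statistic` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Bootstrap sample X*: n draws from the observed sample, with replacement.
        resample = [rng.choice(sample) for _ in range(n)]
        replicates.append(statistic(resample))
    # Sample standard deviation (B - 1 denominator) of theta* across replications.
    return statistics.stdev(replicates)

# Made-up data, purely for illustration.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
print(bootstrap_se(data, statistics.median))
```

Any function of a list can be passed as `statistic`, which is the point of the bootstrap: the same loop works for medians, correlations, and other statistics whose sampling distribution is not known in closed form.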
7
Paired-Sample Design: pairs (x_i, y_i), with x ~ distribution F and y ~ distribution G. How do F and G differ?
8
Sign Test. H_0: F and G have the same median, i.e., median(F) - median(G) = 0, so Pr(x > y) = 0.5. sign(x - y) ~ binomial distribution; compute bin(N_+, 0.5)
9
Sign Test: nonparametric (no assumptions about the data); closed form (no random sampling)
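Because the sign test is closed form, it can be computed exactly with a binomial tail sum. The sketch below is one possible reading of the slide: it is one-sided (it asks how likely it is to see at least N_+ positive signs), and it drops tied pairs, which is a common convention but an assumption here. The name `sign_test_p` is hypothetical.

```python
from math import comb

def sign_test_p(xs, ys):
    """One-sided sign test p-value: Pr(at least N+ positive signs) under H0."""
    # Assumption: tied pairs carry no sign information and are dropped.
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    n_plus = sum(d > 0 for d in diffs)
    # Upper tail of Binomial(n, 0.5), computed with exact integers.
    return sum(comb(n, k) for k in range(n_plus, n + 1)) / 2 ** n

# Made-up paired measurements, purely for illustration.
xs = [3, 5, 4, 6, 7, 5, 8, 6]
ys = [2, 4, 4, 5, 5, 6, 6, 5]
print(sign_test_p(xs, ys))
```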
10
Example: gzip speed. Build gzip with -O2 or with -O0. On about 650 files out of 1000, gzip-O2 was faster. Binomial distribution, p = 0.5, n = 1000: p-value < 3 × 10^-24
11
Permutation Test
H_0: F = G
Suppose the difference in sample means is d. How likely is this difference (or a greater one) under H_0?
For i = 1 to P
    Randomly permute each (x_i, y_i)
    Compute difference in sample means
12
Permutation Test: nonparametric (no assumptions about the data); randomized test
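A minimal sketch of the paired permutation test follows. "Randomly permute each pair" is read here as swapping x_i and y_i with probability 0.5, and the p-value is taken two-sided on the absolute difference in means; both choices, along with the name `paired_permutation_p` and the defaults, are assumptions for illustration.

```python
import random

def paired_permutation_p(xs, ys, P=1000, seed=0):
    """Estimate Pr(|difference in means| >= |observed|) under H0: F = G."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n = len(xs)
    observed = (sum(xs) - sum(ys)) / n
    extreme = 0
    for _ in range(P):
        total = 0.0
        for x, y in zip(xs, ys):
            # Under H0 the two labels within a pair are exchangeable,
            # so swap them with probability 0.5.
            if rng.random() < 0.5:
                x, y = y, x
            total += x - y
        if abs(total / n) >= abs(observed):
            extreme += 1
    return extreme / P

# Made-up paired measurements, purely for illustration.
xs = [10 + i for i in range(20)]
ys = [i for i in range(20)]
print(paired_permutation_p(xs, ys))
```

Because the test is randomized, the estimate changes with the seed and with P; a larger P gives a finer-grained p-value.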
13
Example: gzip speed. 1000 permutations: the difference of sample means under H_0 is centered on 0. The observed difference, -1579, is very extreme; p ≈ 0
14
Comparing speed is tricky! It is very difficult to control for everything that could affect runtime. Solution 1: do the best you can. Solution 2: many runs, and then do ANOVA tests (or their nonparametric equivalents). “Is there more variance between conditions than within conditions?”
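The question "is there more variance between conditions than within conditions?" is what the one-way ANOVA F statistic measures. The sketch below computes only that ratio, not a p-value; the function name `f_statistic` and the timing numbers are assumptions for illustration.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-condition variance over within-condition variance."""
    k = len(groups)                   # number of conditions
    n = sum(len(g) for g in groups)   # total number of runs
    grand = mean([v for g in groups for v in g])
    # Variation of the condition means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of individual runs around their own condition mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up runtimes for two conditions, purely for illustration.
print(f_statistic([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]))
```

A ratio near 1 suggests the conditions differ by no more than run-to-run noise; a large ratio suggests a real difference between conditions.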
15
Sampling method 1
for r = 1 to 10
    for each file f
        for each program p
            time p on f
16
Result (gzip first): student 2’s program is faster than gzip!
17
Result (student first): student 2’s program is slower than gzip!
18
Sampling method 1
for r = 1 to 10
    for each file f
        for each program p
            time p on f
19
Order effects Well-known in psychology. What the subject does at time t will affect what she does at time t+1.
20
Sampling method 2
for r = 1 to 10
    for each program p
        for each file f
            time p on f
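The two sampling methods differ only in how the loops are nested, which changes what runs immediately before each timed run. The sketch below makes that order explicit by generating the schedule of (program, file) pairs; the function names, the `time_call` helper, and the example program and file names are all hypothetical.

```python
import time

def schedule_method_1(programs, files, runs=10):
    """Method 1: for each file, time every program back to back."""
    return [(p, f) for _ in range(runs) for f in files for p in programs]

def schedule_method_2(programs, files, runs=10):
    """Method 2: for each program, time it on every file before switching."""
    return [(p, f) for _ in range(runs) for p in programs for f in files]

def time_call(fn, *args):
    """Wall-clock time of one call, in seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

# Hypothetical program and file names, purely for illustration.
print(schedule_method_1(["gzip", "student"], ["a.txt", "b.txt"], runs=1))
print(schedule_method_2(["gzip", "student"], ["a.txt", "b.txt"], runs=1))
```

Under method 1 the second program always runs on a file the first program has just read, which is one way an order effect can arise.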
21
Result: gzip wins
22
Sign and Permutation Tests (Figure: all distribution pairs (F, G), with regions for median(F) ≠ median(G) and F ≠ G.)
23
Sign and Permutation Tests (Figure: all distribution pairs (F, G), with regions for median(F) ≠ median(G) and F ≠ G; highlighted: sign test rejects H_0.)
24
Sign and Permutation Tests (Figure: all distribution pairs (F, G), with regions for median(F) ≠ median(G) and F ≠ G; highlighted: permutation test rejects H_0.)
25
Sign and Permutation Tests (Figure: all distribution pairs (F, G), with regions for median(F) ≠ median(G) and F ≠ G; highlighted: permutation test rejects H_0, sign test rejects H_0.)
26
There are other tests! We have chosen two that are nonparametric and easy to implement. Others include: the Wilcoxon signed-rank test and Kruskal-Wallis (nonparametric “ANOVA”).
27
Pre-increment? Conventional wisdom: “Better to use ++x than to use x++.” Really, with a modern compiler?
28
Two (toy) programs
for(i = 0; i < (1 << 30); ++i) j = ++k;
for(i = 0; i < (1 << 30); i++) j = k++;
Ran each 200 times (interleaved). Mean runtimes were 2.835 and 2.735; the difference was significant (p well below .05).
29
What?
leal -8(%ebp), %eax
incl (%eax)
movl -8(%ebp), %eax
leal -8(%ebp), %edx
incl (%edx)
%edx is not used anywhere else
30
Conclusion: Compile with -O and the assembly code is identical!
31
Why was this a dumb experiment?
32
Pre-increment, take 2 Take gzip source code. Replace all post-increments with pre-increments, in places where semantics won’t change. Run on 1000 files, 10 times each. Compare average runtime by file.
33
Sign test: p = 8.5 × 10^-8
34
Permutation test
35
Conclusion: Preincrementing is faster! ... but what about -O? Sign test: p = 0.197; permutation test: p = 0.672. Preincrement matters without an optimizing compiler.
36
Joke.
37
Your programs... 8 students had a working program both weeks. 6 people changed their code. 1 person changed nothing. 1 person changed to -O3. 3 people lossy in week 1. Everyone lossy in week 2!
38
Your programs! Was there an improvement in compression between the two versions? H_0: No. Find the sampling distribution of the difference in means, using permutations.
39
Student 1 (lossless week 1)
40
Compression < 1?
41
Student 2: worse compression
42
Compression < 1?
43
Student 3
44
Student 4 (lossless week 1)
45
Student 5 (lossless week 1)
46
Student 6
47
Student 7
48
Student 8
49
Homework Assignment 2
6 experiments:
1. Does your program compress text or images better?
2. What about variance of compression?
3. What about gzip’s compression?
4. Variance of gzip’s compression?
5. Was there a change in the compression of your program from week 1 to week 2?
6. In the runtime?
50
Remainder of the course
11/9: EDA
11/16: Regression and learning
11/23: Happy Thanksgiving!
11/30: Statistical debugging
12/7: Review, Q&A
Saturday 12/17, 2-5pm: Exam