Empirical Research Methods in Computer Science Lecture 1, Part 1 October 12, 2005 Noah Smith

1 Empirical Research Methods in Computer Science Lecture 1, Part 1 October 12, 2005 Noah Smith http://nlp.cs.jhu.edu/~nasmith/erm

2 Empiricism empeiros: experienced (peira = trial or test) cf. rationalism

3 Exploration & Experiment Exploratory Data Analysis (lecture ≈5): explore, visualize, summarize, model. Hypothesis Testing (lectures 1,2): experiment, confirm, yes/no?

4 Computer What? Theory: Algorithms, Computation. Practice: Software Engineering, Application Areas. Systems: OS, Architecture.

5 Who cares? 1. anyone who wants to do research 2. anyone who wants to follow research (i.e., read papers) 3. anyone who wants to be able to make smart decisions / draw conclusions 4. anyone who likes thinking critically

6 Basic Research Questions

7 int foo() {... }

8 Why bother? int foo() {... } int foo() {... } int foo() {... } int foo() {... } int foo() {... } int foo() {... }

9 Variation → Statistics int foo() {... } determinism isn’t good enough any more!

10 Statistics, in this Course Nonparametric tests Sampling Later: Parametric tests (when and why)

11 Warning Theory (complexity analysis, etc.) is important, too! Many phenomena aren’t surprising if you know your math.

12 Goals Know how to look for the interesting experiments Know how to construct experiments Know how to analyze the results Be critical of all claims Develop an aesthetic for good empirical work!

13 Empiricism is FUN! Especially in computer science!

14 Basic Course Information instructors: Noah and David {n,d}asmith@cs.jhu.edu Wednesdays 4-5:15 pm no class Thanksgiving week homeworks (65%); final exam (30%)

15 About Us Combined 19 years of experience in CS; 36 years programming Autodidact empiricists Research interests in statistical modeling and machine learning (Eisner/Yarowsky lab) NEB 332

16 Plan Hypothesis testing, statistics (2) Case study: runtime (2) Exploratory data analysis (1) Parametric testing, modeling (1-2) Statistical analysis of computer programs (1)

17 MO Come to class. Send us feedback anytime: What do you want to know? Bring us papers.

18 Empirical Research Methods in Computer Science Lecture 1, Part 2 October 12, 2005 David Smith

19 Terminological Prelude Populations: population distributions (“All possible files”. How big?). Samples: sampling distributions (“Files on my system”). Statistics: functions of data (“Size of my files”). Models: parameters.

20 And now for some data

21 Abnormality

22

23 The Bootstrap Simulates the sampling distribution. Proposed by Efron in 1979 (anticipated by permutation tests, jackknife, cross-validation). From the original sample of size n, draw B samples of size n with replacement and calculate the statistic on each.
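
The resampling procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecturers' code: the function name `bootstrap`, the default B, the seed, and the file sizes are all made up for the example.

```python
import random
import statistics

def bootstrap(sample, statistic, B=1000, seed=0):
    """Return B replicates of `statistic`, each computed on a resample
    of size n drawn with replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(B)]

# Hypothetical file sizes in bytes (invented, deliberately heavy-tailed).
file_sizes = [120, 340, 560, 800, 1500, 2300, 4100, 9800, 25000, 130000]

# The replicates approximate the sampling distribution of the median.
medians = bootstrap(file_sizes, statistics.median, B=2000)
print("mean of bootstrap medians:", statistics.fmean(medians))
print("bootstrap standard error: ", statistics.stdev(medians))
```

Any function of the data can be passed as `statistic`; nothing in the loop depends on the statistic having a known closed-form distribution.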

24 Sampling Distributions μ μ μ μ μ

25 Bootstrapping the Mean

26 What’s Going On? Why is bootstrap distribution normal? Remember, this is a mean Linearity of Expectation Central Limit Theorem Closed form standard error for means
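
For the mean, the closed-form standard error is s/√n, where s is the sample standard deviation, so the bootstrap estimate can be checked against it. The sketch below does that on synthetic skewed data; the exponential sample, the sizes, and the seed are assumptions chosen only for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)

# Synthetic, skewed data (an assumption for this example, not course data).
sample = [rng.expovariate(1.0) for _ in range(200)]
n = len(sample)

# Closed-form standard error of the mean: s / sqrt(n).
closed_form_se = statistics.stdev(sample) / math.sqrt(n)

# Bootstrap standard error: spread of the means of B resamples.
B = 2000
boot_means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(B)]
bootstrap_se = statistics.stdev(boot_means)

print("closed form:", closed_form_se)
print("bootstrap:  ", bootstrap_se)
```

The two numbers should agree closely even though the underlying data are far from normal, which is the Central Limit Theorem at work on the mean.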

27 More Heavy Tails

28 Sampling Still Normal

29 Bivariate Data

30 Compression Performance

31 Bootstrapping Correlation
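
For bivariate data, one common approach is to resample whole (x, y) pairs so that the dependence between the two variables is preserved. The sketch below assumes Pearson correlation and uses invented (original size, compressed size) pairs; the slide does not say which correlation measure or data were used.

```python
import math
import random

def pearson(pairs):
    """Pearson correlation of (x, y) pairs, or None if a variance is zero."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def bootstrap_correlation(pairs, B=2000, seed=0):
    """Resample pairs with replacement; collect B correlation replicates."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    while len(replicates) < B:
        r = pearson(rng.choices(pairs, k=n))
        if r is not None:  # skip degenerate resamples with no variation
            replicates.append(r)
    return replicates

# Hypothetical (original size, compressed size) pairs in kilobytes.
pairs = [(10, 4), (25, 9), (40, 22), (55, 20), (80, 41), (120, 50),
         (200, 95), (310, 160), (450, 170), (600, 330), (900, 380), (1500, 700)]

rs = sorted(bootstrap_correlation(pairs))
print("observed r:", pearson(pairs))
print("middle 95% of bootstrap r:", rs[50], rs[-51])
```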

32 Error, Confidence, Testing Standard error from sampling distribution Confidence intervals: bounding error probability (e.g. to 5%) Hypothesis testing: how likely is a particular statistic under our assumptions?
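
A confidence interval can be read off a bootstrap distribution in several ways. The sketch below uses the simple percentile method, which is an assumption here (the slide does not name a method); the function name `percentile_ci` and the compression ratios are hypothetical.

```python
import math
import random
import statistics

def percentile_ci(sample, statistic, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: cut alpha/2 of the replicates
    from each tail of the sorted bootstrap distribution."""
    rng = random.Random(seed)
    n = len(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    k = math.floor(alpha / 2 * B)  # replicates dropped from each tail
    return reps[k], reps[B - 1 - k]

# Hypothetical compression ratios (compressed size / original size).
ratios = [0.31, 0.42, 0.28, 0.55, 0.47, 0.39, 0.61, 0.33, 0.45, 0.52,
          0.36, 0.48, 0.29, 0.58, 0.41]

low, high = percentile_ci(ratios, statistics.fmean)
print("95% interval for the mean ratio:", low, high)
```

With alpha = 0.05 the interval bounds the error probability at roughly 5%, under the usual caveat that the original sample is representative of the population.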

33 Hypothesis Testing One-sample: “Are these data normal/Poisson/…?” Two-sample: “Are these two samples from the same distribution?” Paired-sample: “Is this technique better than that?”
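
As an illustration of the two-sample question, here is a minimal permutation test on the difference in means. It is one nonparametric option among several and is not taken from the lecture; the function name, the number of shuffles, and the timing data are assumptions.

```python
import random
import statistics

def permutation_test(a, b, B=5000, seed=0):
    """Two-sided p-value for the null hypothesis that a and b come from
    the same distribution, using |mean(a) - mean(b)| as the statistic."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(B):
        rng.shuffle(pooled)  # relabel the observations at random
        diff = abs(statistics.fmean(pooled[:len(a)]) -
                   statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Count the observed labelling itself so the p-value is never zero.
    return (hits + 1) / (B + 1)

# Hypothetical runtimes in milliseconds for two implementations.
old_ms = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.8, 12.2]
new_ms = [11.2, 11.5, 10.9, 11.8, 11.1, 11.4, 11.0, 11.6]
print("p-value:", permutation_test(old_ms, new_ms))
```

A small p-value says the observed difference would be unusual if the labels "old" and "new" were interchangeable; it does not by itself say how large or how important the difference is.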

34 Your First Assignment Data compression. Three-way tradeoff: compression, speed, loss. Degenerate cases (cat, echo ‘’, …). Unknown distribution of input.

