IS 4800 Empirical Research Methods for Information Science Class Notes March 13 and 15, 2012 Instructor: Prof. Carole Hafner, 446 WVH


1 IS 4800 Empirical Research Methods for Information Science Class Notes March 13 and 15, 2012 Instructor: Prof. Carole Hafner, 446 WVH hafner@ccs.neu.edu Tel: 617-373-5116 Course Web site: www.ccs.neu.edu/course/is4800sp12/

2 Parametric Statistics (numeric variables) Assumes a (near-enough-to) normal population distribution, so these parameters make sense: $\mu$ = the population mean (unknown); $\sigma^2$ = the population variance; $\sigma$ = the population standard deviation. Samples of size N are used to estimate these parameters. M is the sample mean, used to estimate $\mu$. Calculate: $M = \frac{\sum X}{N}$

3 Relationship Between Population and Samples When a Treatment Had No Effect

4 Relationship Between Population and Samples When a Treatment Had An Effect

5 What we must decide: Which one of these diagrams to believe? How to express belief in the first diagram. How to express belief in the second diagram. How do we make that decision? How far apart do the sample means need to be? We calculate this relative to information about the variance, using a criterion alpha, which is our tolerance for being wrong.

6 Estimating population variance
$SS = \sum (X - M)^2$ (“Sum of Squares”)
$SD^2 = \frac{\sum (X - M)^2}{N}$ (sample variance)
$S^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS}{N - 1}$ (estimated population variance)
$\sigma^2_M = \frac{\sigma^2}{N}$ (true variance of the sample means; unknown)
$S^2_M = \frac{S^2}{N}$ (estimated variance of the sample means)
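
The estimates on this slide can be traced step by step in a few lines of Python. This is only a sketch: the function name `variance_estimates` and the five scores are made up for illustration, chosen so the arithmetic is easy to check by hand.

```python
from math import sqrt

def variance_estimates(scores):
    """Compute the slide's quantities for one sample of numeric scores."""
    n = len(scores)
    m = sum(scores) / n                     # sample mean M
    ss = sum((x - m) ** 2 for x in scores)  # sum of squares SS
    sd2 = ss / n                            # sample variance SD^2 (divide by N)
    s2 = ss / (n - 1)                       # estimated population variance S^2
    s2_m = s2 / n                           # estimated variance of sample means S^2_M
    return {"M": m, "SS": ss, "SD2": sd2, "S2": s2,
            "S2_M": s2_m, "S_M": sqrt(s2_m)}

# Hypothetical scores (not from the course materials).
example = variance_estimates([2, 4, 4, 6, 9])
print(example)
```

For these scores M = 5 and SS = 28, so dividing by N gives 5.6 while dividing by N - 1 gives 7; the N - 1 version is the one carried forward into S²_M.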

7 Why do we care about the variance of the sample means? Sampling Distribution –The distribution of the means of every possible sample of size N taken from a population. Sampling Error –The difference between a sample mean and the population mean: M - μ –The standard error of the mean is a measure of sampling error (the standard deviation of the distribution of means)

8 Understanding numeric measures Sources of variance –IV –Other uncontrolled factors (“error variance”) If (many) independent, random variables with the same distribution are added, the result is approximately a normal curve –The Central Limit Theorem
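
A small simulation can make the sampling distribution and the Central Limit Theorem concrete. The sketch below assumes a uniform population on [0, 1] (mean 0.5, variance 1/12), which is an illustrative choice and not something from the slides; the function name and the sample sizes are likewise hypothetical.

```python
import random
from statistics import mean, pvariance

def simulate_sample_means(n, trials, seed=0):
    """Draw `trials` samples of size `n` from Uniform(0, 1); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [mean(rng.uniform(0, 1) for _ in range(n)) for _ in range(trials)]

means = simulate_sample_means(n=30, trials=5000)

population_mean = 0.5
population_variance = 1 / 12

# The mean of the sample means should sit close to the population mean,
# and their variance should be close to sigma^2 / N.
print(mean(means), population_mean)
print(pvariance(means), population_variance / 30)
```

Even though the population here is flat rather than bell-shaped, a histogram of `means` would look approximately normal, and its spread shrinks as N grows.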

9 The most important parts of the normal curve (for testing): one tail, Z = 1.65, with 5% of the area beyond it.

10 The most important parts of the normal curve (for testing): two tails, Z = 1.96 and Z = -1.96, with 2.5% of the area beyond each.
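
The tail areas quoted on these two slides can be checked with the standard library's `statistics.NormalDist`. The variable names below are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area beyond the cutoffs shown on the slides.
upper_tail_165 = 1 - z.cdf(1.65)   # one-tailed test, roughly 5%
upper_tail_196 = 1 - z.cdf(1.96)   # upper half of a two-tailed test, roughly 2.5%
lower_tail_196 = z.cdf(-1.96)      # lower half of a two-tailed test, roughly 2.5%
print(round(upper_tail_165, 4), round(upper_tail_196, 4), round(lower_tail_196, 4))

# Going the other way: the Z score that leaves a given area in the tail.
one_tailed_cutoff = z.inv_cdf(0.95)    # close to the slide's rounded 1.65
two_tailed_cutoff = z.inv_cdf(0.975)   # close to 1.96
print(round(one_tailed_cutoff, 3), round(two_tailed_cutoff, 3))
```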

11 Hypothesis testing – two tailed Hypothesis: a sample (of 1) will be significantly different from a known population distribution. Example – WizziWord experiment: –H1: $\mu_{WizziWord} \neq \mu_{Word}$ –$\alpha = 0.05$ (two-tailed) –Population (Word users): $\mu_{Word} = 150$, $\sigma = 25$ –What level of performance do we need to see before we can accept H1?

12 Hypothesis testing – two tailed Hypothesis: a sample (of 1) will be significantly different from a known population distribution. Example – WizziWord experiment: –H1: $\mu_{WizziWord} \neq \mu_{Word}$ –$\alpha = 0.05$ (two-tailed) –Population (Word users): $\mu_{Word} = 150$, $\sigma = 25$ –What level of performance do we need to see before we can accept H1? We must see performance more than 1.96 standard deviations above the mean (above 199), BUT we will also reject H0 if performance is more than 1.96 standard deviations below the mean (below 101).
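
The decision rule on this slide can be sketched as follows. The population mean of 150 and standard deviation of 25 are assumptions inferred from the slide's cutoffs of 199 and 101 (150 ± 1.96 × 25), and the function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_cutoffs(mu, sigma, alpha=0.05):
    """Return the (low, high) raw-score cutoffs for a two-tailed Z test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return mu - z_crit * sigma, mu + z_crit * sigma

def reject_h0(score, mu, sigma, alpha=0.05):
    """True if a single observed score falls in either rejection region."""
    low, high = two_tailed_cutoffs(mu, sigma, alpha)
    return score < low or score > high

# Assumed population of Word users: mean 150, standard deviation 25.
low, high = two_tailed_cutoffs(mu=150, sigma=25)
print(round(low), round(high))

print(reject_h0(205, mu=150, sigma=25))  # beyond the upper cutoff
print(reject_h0(160, mu=150, sigma=25))  # inside the cutoffs
```

Because the test is two-tailed, a score far below the mean rejects H0 just as a score far above it does.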

13 Standard testing criteria for experiments: two-tailed

14 Don’t try this at home You would never do a study this way. Why? –Can’t control extraneous variables through randomization. –Usually don’t know population statistics. –Can’t generalize from one individual.

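15 Population ($\mu$, $\sigma^2$): Mean? Variance? Sampling gives a sample of size N with mean M. Mean values from all possible samples of size N, aka the “distribution of means”: mean $\mu_M = \mu$, variance $\sigma^2_M = \sigma^2 / N$. $Z = \frac{M - \mu}{\sigma_M}$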

16 Z tests and t-tests t is like Z: $Z = \frac{M - \mu}{\sigma_M}$, $t = \frac{M - 0}{S_M}$. We use a stricter criterion (t) instead of Z because $S_M$ is based on an estimate of the population variance, while $\sigma_M$ is based on a known population variance.

17 T-test with paired samples Given info about the population of change scores and the sample size we will be using (N), we can compute the distribution of means. Now, given a particular sample of change scores of size N, we compute its mean and finally determine the probability that this mean occurred by chance. $\mu = 0$; $S^2$ = estimate of $\sigma^2$ from the sample = $SS/df$; $df = N - 1$; $S^2_M = S^2/N$
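
A paired-samples t test following these steps might look like the sketch below. The before/after scores are invented for illustration, `paired_t` is a hypothetical helper, and the cutoff 2.776 is the standard two-tailed t-table value for df = 4 at the .05 level.

```python
from math import sqrt

def paired_t(before, after):
    """Return (t, df) for a paired-samples t test on change scores."""
    changes = [a - b for b, a in zip(before, after)]  # one change score per subject
    n = len(changes)
    m = sum(changes) / n                      # mean change M
    ss = sum((c - m) ** 2 for c in changes)   # sum of squares of change scores
    df = n - 1
    s2 = ss / df                              # estimated population variance S^2
    s_m = sqrt(s2 / n)                        # standard error S_M
    t = (m - 0) / s_m                         # null hypothesis: mean change is 0
    return t, df

# Hypothetical scores for five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
t, df = paired_t(before, after)

T_CUTOFF_DF4 = 2.776  # two-tailed, alpha = .05, df = 4 (from a t table)
print(round(t, 3), df, abs(t) > T_CUTOFF_DF4)
```

Here the change scores are 2, 3, 1, 3, 1, so M = 2, S² = 1, and t is about 4.47, which is beyond the cutoff.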

18 t test for independent samples Given two samples Estimate population variances (assume same) Estimate variances of distributions of means Estimate variance of differences between means (mean = 0) This is now your comparison distribution

19 Estimating the Population Variance $S^2$ is an estimate of $\sigma^2$. $S^2 = SS/(N - 1)$ for one sample (take the square root for S). For two independent samples, use the “pooled estimate”: $S^2 = \frac{df_1}{df_{Total}} S_1^2 + \frac{df_2}{df_{Total}} S_2^2$, where $df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$. From this, calculate the variance of the sample means, $S^2_M = S^2/N$, which is needed to compute the t statistic.

20 t test for independent samples, continued The distribution of differences between means is your comparison distribution. It is NOT normal; it is a ‘t’ distribution. Its shape changes depending on df, where $df = (N_1 - 1) + (N_2 - 1)$. Compute $t = (M_1 - M_2) / S_{Difference}$. Determine if it is beyond the cutoff score for the test parameters (df, sig, tails) from a lookup table.
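
The independent-samples procedure can be sketched end to end as below. Two things are assumptions rather than content from the slides: the variance of the difference between means is taken as the sum of the two variances of the sample means (the usual textbook step), and the data and function names are made up for illustration.

```python
from math import sqrt

def sum_of_squares(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def independent_t(sample1, sample2):
    """Return (t, df_total) using the pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    s2_1 = sum_of_squares(sample1) / df1
    s2_2 = sum_of_squares(sample2) / df2
    # Pooled estimate: weight each sample's estimate by its share of the df.
    s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2
    s2_m1 = s2_pooled / n1  # variance of the distribution of means, group 1
    s2_m2 = s2_pooled / n2  # variance of the distribution of means, group 2
    # Assumed step: variances add for the difference between two means.
    s_difference = sqrt(s2_m1 + s2_m2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    return (m1 - m2) / s_difference, df_total

# Hypothetical scores for two groups of five.
t, df_total = independent_t([5, 7, 6, 8, 9], [3, 4, 5, 4, 4])
print(round(t, 3), df_total)
```

With df = 8, a t table gives a two-tailed .05 cutoff of 2.306, so the t of about 3.87 from these made-up scores would count as significant.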

21 Effect size The amount of change in the DVs seen. Can have statistically significant test but small effect size.
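
To see how a test can be significant while the effect is small, the sketch below assumes the effect size is measured as a standardized mean difference d = (M1 - M2) / S (Cohen's d; the slide does not name a specific measure) and two independent groups of equal size n, in which case t = d × sqrt(n / 2). The function name is hypothetical.

```python
from math import sqrt

def t_from_effect_size(d, n_per_group):
    """t statistic implied by standardized effect size d with equal group sizes.

    Follows from t = (M1 - M2) / sqrt(S^2/n + S^2/n) and d = (M1 - M2) / S.
    """
    return d * sqrt(n_per_group / 2)

# The same small effect (d = 0.1) with two different sample sizes.
t_small_sample = t_from_effect_size(0.1, 20)
t_large_sample = t_from_effect_size(0.1, 1000)
print(round(t_small_sample, 3), round(t_large_sample, 3))
```

With 20 subjects per group the small effect yields a t well under 2, but with 1000 per group the same effect yields a t above 2, large enough to pass a two-tailed .05 test.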

22 Power Analysis Power –Increases with effect size –Increases with sample size –Increases with alpha (a stricter, smaller alpha lowers power) Should determine the number of subjects you need ahead of time by doing a ‘power analysis’ Standard procedure: –Fix alpha and beta (power = 1 - beta) –Estimate effect size from prior studies Categorize based on Table 13-8 in Aron (sm/med/lg) –Determine number of subjects you need –For Chi-square, see Table 13-10 in Aron reading
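
The standard procedure can be approximated in code. The sketch below uses the common normal-approximation formula for a two-tailed, two-group comparison, n per group ≈ 2 × ((z for alpha/2 + z for power) / d)², and Cohen's conventional small/medium/large values of 0.2, 0.5, and 0.8. Both are assumptions here and are not taken from the Aron tables, whose values may differ slightly because they are based on the t distribution.

```python
from math import ceil
from statistics import NormalDist

def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (two-tailed, two independent groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the chosen alpha
    z_power = z.inv_cdf(power)          # z score matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Assumed effect-size conventions (Cohen): small, medium, large.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(label, subjects_per_group(d))
```

The required sample size falls quickly as the expected effect grows, and it rises if alpha is made stricter or the desired power is raised.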

