INF397C Introduction to Research in Information Studies
Fall 2005, Days 10 & 11
R. G. Bias, School of Information, SZB 562BB, 512 471 7046, rbias@ischool.utexas.edu
Context
Where we've been:
- Descriptive statistics: frequency distributions, graphs, types of scales, probability, measures of central tendency and spread, z scores
- Experimental design: the scientific method, operational definitions, IV, DV, controls, counterbalancing, confounds, validity, reliability, within- and between-subject designs
- Qualitative research: Gracy, Rice Lively
Context (cont'd.)
Where we're going:
- More descriptive statistics: correlation
- Inferential statistics: confidence intervals; hypothesis testing, Type I and II errors, significance level; t-tests; ANOVA
- Which method when?
- Cumulative final
First, correcting a lie

Parameters (for populations):
- Mean: µ = ΣX / N
- Standard deviation: σ = SQRT[ (ΣX² - (ΣX)²/N) / N ]

Statistics (for samples):
- Mean: M = ΣX / N
- Standard deviation: s = SQRT[ (ΣX² - (ΣX)²/N) / (N - 1) ]
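As a quick illustration of the N versus N - 1 distinction, here is a minimal Python sketch (the scores are made up for illustration, not from the lecture) that computes both versions of the standard deviation from the computational formula above.

```python
import math

scores = [4, 8, 6, 5, 7]   # hypothetical sample data
N = len(scores)
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)

# Computational formula from the slide: SS = ΣX² - (ΣX)²/N
ss = sum_x2 - sum_x ** 2 / N

sigma = math.sqrt(ss / N)        # population formula (divide by N)
s = math.sqrt(ss / (N - 1))      # sample estimate (divide by N - 1)

print(f"M = {sum_x / N:.2f}, population-style SD = {sigma:.3f}, s = {s:.3f}")
```

With only five scores the two values differ noticeably (about 1.414 vs. 1.581 here); as N grows, the difference shrinks.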
Degrees of Freedom Demo
Standard Error of the Mean
So far, we've computed a sample mean (M, X-bar) and used it to estimate the population mean (µ). One thing we've become convinced of (I hope) is that larger sample sizes are better.
- Think about it: what if I asked just ONE of you what school you are a student in, versus asking 10 of you?
Standard Error (cont'd.)
Well, instead of picking ONE sample and using that mean to estimate the population mean, what if we sampled a BUNCH of samples? If we sampled ALL possible samples, the mean of the sample means would equal the population mean (written µ_M). Here are some other things we know:
- As we take more samples, the mean of the sample means gets closer to the population mean.
- The distribution of sample means tends to be normal.
- We can use the z table to find the probability of a mean of a certain value.
- And most importantly...
Standard Error (cont'd.)
We can easily work out the standard deviation of the distribution of sample means:
SE = s_M = s / SQRT(N)
So the standard error of the mean is the standard distance between a sample mean and the population mean. Thus the SE tells us how good an estimate our sample mean is of the population mean. Note that as N gets larger, the SE gets smaller, and the sample mean becomes a better estimate of the population mean. Hold on to this; we'll use SE later.
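A minimal sketch of the SE formula (again with made-up scores, not lecture data):

```python
import math
import statistics

scores = [4, 8, 6, 5, 7]          # hypothetical data
s = statistics.stdev(scores)      # sample SD (divides by N - 1)
se = s / math.sqrt(len(scores))   # SE = s / SQRT(N)
print(f"s = {s:.3f}, SE = {se:.3f}")
# With the same spread but a much larger N, SE shrinks,
# so the sample mean pins down µ more tightly.
```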
Four Questions
1. Does an iSchool IT-provided online tutorial lead to better learning than a face-to-face class?
Two methods of making statistical inferences
Null hypothesis testing:
- Assume the IV has no effect on the DV; the differences we obtain are just due to chance (error variance).
- If the difference is unlikely enough to have happened by chance (and "unlikely enough" usually means p < .05), then we say there's a true difference.
Confidence intervals:
- We compute a confidence interval for the "true" population mean from sample data (usually at the 95% level).
- If two groups' confidence intervals don't overlap, we say (we INFER) there's a true difference.
Remember...
Earlier I said that there are two ways for us to be confident that something is true:
- Statistical inference
- Replicability
Now I'm saying there are two avenues of statistical inference:
- Hypothesis testing
- Confidence intervals
t-tests
Remember z scores:
- z = (X - µ) / σ
- It is often the case that we want to know, "What percentage of the scores is above (or below) a certain other score?"
- Asked another way, "What is the area under the curve beyond a certain point?"
- THIS is why we calculate a z score, and the way we answer the question is with the z table, on p. 306 of Hinton.
Problem: we RARELY truly know µ or σ.
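To make the "area under the curve beyond a point" idea concrete, here is a small sketch that uses SciPy's normal distribution in place of the printed z table (the score, µ, and σ below are made-up numbers):

```python
from scipy.stats import norm

x, mu, sigma = 130, 100, 15        # hypothetical score and population parameters
z = (x - mu) / sigma               # z = (X - µ)/σ, here 2.0
p_below = norm.cdf(z)              # proportion of scores below X
p_above = 1 - norm.cdf(z)          # area under the curve beyond X
print(f"z = {z:.2f}, below = {p_below:.4f}, above = {p_above:.4f}")
```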
t-tests (cont'd.)
So, typically, we use M to estimate µ and s to estimate σ. (Duh.) (Note: when we estimate σ with s, we divide by N - 1, the degrees of freedom.) Then, instead of z, we calculate t. Hinton's example on p. 64 is a t-test for when you have a null-hypothesis population mean (µ0), that is, when you want to test whether your observed sample mean differs from some value. Hinton then offers examples in Chapter 8 of related (dependent, within-subjects) and independent (unrelated, between-subjects) t-tests. S, Z, & Z's example on p. 409 is a t-test comparing independent means.
Formulae
For a single mean (compared with µ0):
- t = (M - µ0) / (s / SQRT(n))
For related (within-subjects) groups (see Hinton, p. 83):
- t = (M1 - M2) / s_(M1-M2)
- where s_(M1-M2) = s_(X1-X2) / SQRT(n)
For independent groups (from S, Z, & Z, p. 409, and Hinton, p. 87; Will's correction):
- t = (M1 - M2) / s_(M1-M2)
- where s_(M1-M2) = SQRT[ (s1²/n1) + (s2²/n2) ]
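As a sketch of these three formulas, here are small helper functions (my own naming, not from Hinton or S, Z, & Z) that return the t statistic from summary values:

```python
import math

def t_single(M, mu0, s, n):
    """t for a single mean against a null value µ0: t = (M - µ0) / (s/√n)."""
    return (M - mu0) / (s / math.sqrt(n))

def t_related(M1, M2, s_diff, n):
    """t for related (within-subjects) groups, where s_diff is the SD of the
    difference scores: t = (M1 - M2) / (s_diff/√n)."""
    return (M1 - M2) / (s_diff / math.sqrt(n))

def t_independent(M1, M2, s1, s2, n1, n2):
    """t for independent groups: t = (M1 - M2) / SQRT(s1²/n1 + s2²/n2)."""
    return (M1 - M2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
```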
Steps
For a t test for a single sample (a worked sketch follows this list):
- Restate the question as a research hypothesis and a null hypothesis about the populations.
- Determine the characteristics of the comparison distribution.
  - The mean is the known population mean.
  - Compute the standard deviation by:
    - calculating the estimated population variance (S² = SS/df),
    - calculating the variance of the distribution of means (S²/n),
    - taking the square root to get SE.
  - Note: we're calculating t with N - 1 df.
- Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
  - Decide on an alpha and one-tailed vs. two-tailed.
  - Look up the critical value in the table.
- Determine your sample's t score: t = (M - µ) / SE.
- Decide whether or not to reject the null hypothesis. (If the observed value of t exceeds the table value, reject.)
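Here is a worked sketch of those steps (the scores and null-hypothesis mean are made up; SciPy stands in for the printed table):

```python
import math
from scipy.stats import t as t_dist

scores = [52, 48, 55, 50, 53, 49, 51, 54]   # hypothetical sample
mu0 = 47.0                                   # hypothetical null-hypothesis mean
n = len(scores)
M = sum(scores) / n
ss = sum((x - M) ** 2 for x in scores)
S2 = ss / (n - 1)                 # estimated population variance, S² = SS/df
se = math.sqrt(S2 / n)            # SE = √(S²/n)

t_obs = (M - mu0) / se
t_crit = t_dist.ppf(1 - 0.05 / 2, df=n - 1)  # two-tailed cutoff at α = .05

print(f"t({n - 1}) = {t_obs:.3f}, critical value = {t_crit:.3f}")
print("reject H0" if abs(t_obs) > t_crit else "fail to reject H0")
```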
Steps
For a t test for dependent means (see the sketch after this list):
- Restate the question as a research hypothesis and a null hypothesis about the populations.
- Determine the characteristics of the comparison distribution.
  - Make each person's score into a difference score. From here on out, use difference scores.
  - Compute the mean of the difference scores.
  - Assume a population mean of 0: µ = 0.
  - Compute the standard deviation of the difference scores by:
    - calculating the estimated population variance (S² = SS/df),
    - calculating the variance of the distribution of means (S²/n),
    - taking the square root to get SE.
  - Note: we're calculating t with N - 1 df.
- Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
  - Decide on an alpha and one-tailed vs. two-tailed.
  - Look up the critical value in the table.
- Determine your sample's t score: t = (M - µ) / SE.
- Decide whether or not to reject the null hypothesis. (If the observed value of t exceeds the table value, reject.)
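A sketch of the same procedure on made-up before/after scores, where everything runs on the difference scores:

```python
import math
from scipy.stats import t as t_dist

before = [12, 15, 11, 14, 13, 16]   # hypothetical paired data
after  = [14, 18, 12, 17, 15, 17]
diffs = [a - b for a, b in zip(after, before)]   # difference scores

n = len(diffs)
M_d = sum(diffs) / n
S2 = sum((d - M_d) ** 2 for d in diffs) / (n - 1)   # S² = SS/df
se = math.sqrt(S2 / n)

t_obs = (M_d - 0) / se            # null hypothesis: µ of the differences is 0
t_crit = t_dist.ppf(1 - 0.05 / 2, df=n - 1)
print(f"t({n - 1}) = {t_obs:.3f} vs. critical {t_crit:.3f}")
```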
Steps
For a t test for independent means (sketch below):
- Same as for dependent means, except the value for SE comes from that squirrely formula in Hinton, p. 87.
- Basically, here's the point: when you're comparing DEPENDENT (within-subject, related) means, you can assume both sets of scores come from the same distribution and thus have the same standard deviation. But when you're comparing independent (between-subject, unrelated) means, you've basically got to average the variability of the two distributions.
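And a sketch for independent groups, using the SE formula given on the Formulae slide (the data are made up):

```python
import math
import statistics
from scipy.stats import t as t_dist

group1 = [23, 27, 25, 30, 26, 24]     # hypothetical between-subjects data
group2 = [20, 22, 25, 19, 21, 23]

M1, M2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # the "squirrely" SE from Hinton, p. 87
t_obs = (M1 - M2) / se
t_crit = t_dist.ppf(1 - 0.05 / 2, df=n1 + n2 - 2)
print(f"t({n1 + n2 - 2}) = {t_obs:.3f} vs. critical {t_crit:.3f}")
```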
Three points
df:
- Four people, take your choice of candy.
- One df is used up calculating the mean.
One or two tails:
- Be VERY careful when choosing to do a one-tailed test.
Comparing the z and t tables:
- Check out the .05 t table values for infinity df (1.96 for a two-tailed test, 1.645 for one-tailed).
- Now find the commensurate values in the z table.
Let's work some examples
Confidence Intervals
We calculate a confidence interval for a population parameter. The mean of a random sample from a population is a point estimate of the population mean. But there's variability! (SE tells us how much.) What is the range of scores between which we're 95% confident that the population mean falls? Think about it: the larger the interval we select, the greater the likelihood that it will "capture" the true (population) mean.
CI = M +/- (t.05)(SE)
See Box 12.2 on "margin of error." NOTE: In the box they arrive at 95% confidence that the poll has a margin of error of 5%. It is just a coincidence that these two numbers add up to 100%.
CI about a mean: an example
CI = M +/- (t.05)(SE)
- Establish the level of α (two-tailed) for the CI: .05.
- M = 15.0, s = 5.0, N = 25.
- Use Table A.2 to find the critical value associated with the df: t.05(24) = 2.064.
- CI = 15.0 +/- 2.064 (5.0 / SQRT(25)) = 15.0 +/- 2.064 = 12.936 to 17.064.
"The odds are 95 out of 100 that the population mean falls between 12.936 and 17.064." (NOTE: this is NOT the same as saying "95% of the scores fall within this range"!)
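The slide's numbers check out in a few lines; the 2.064 comes from Table A.2, or equivalently from SciPy:

```python
import math
from scipy.stats import t as t_dist

M, s, N = 15.0, 5.0, 25            # values from the slide
se = s / math.sqrt(N)              # 5.0 / 5 = 1.0
t_crit = t_dist.ppf(1 - 0.05 / 2, df=N - 1)   # about 2.064
lower, upper = M - t_crit * se, M + t_crit * se
print(f"95% CI: {lower:.3f} to {upper:.3f}")   # about 12.936 to 17.064
```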
Another CI example
Hinton, p. 89: the t-test is not significant. What if we approached the same data via confidence intervals?
Significance Level
Remember, there are two ways to test statistical significance: hypothesis tests and confidence intervals. With confidence intervals, if two groups yield data and their confidence intervals don't overlap, then we conclude there is a significant difference. In hypothesis testing, if the probability of finding our differences by chance is smaller than our chosen alpha, then we say we have a significant difference. We select alpha (α) by tradition. Statistical significance isn't the same thing as practical significance.
Type I and Type II Errors

Our decision                          World: null hypothesis is false    World: null hypothesis is true
Reject the null hypothesis            Correct decision (power, 1 - β)    Type I error (α)
Fail to reject the null hypothesis    Type II error (β)                  Correct decision
Power of the Test
The power of a statistical test refers to its ability to find a difference in distributions when there really is one there. Things that influence the power of a test (a simulation sketch follows):
- Size of the effect
- Sample size
- Variability
- Alpha level
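A rough simulation sketch of how those factors play out; the effect size, SD, n, and α below are arbitrary settings you can turn like knobs:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
effect, sd, n, alpha, runs = 0.5, 1.0, 30, 0.05, 2000   # arbitrary settings

hits = 0
for _ in range(runs):
    a = rng.normal(0.0, sd, n)          # control group
    b = rng.normal(effect, sd, n)       # treatment group; true difference = effect
    if ttest_ind(a, b).pvalue < alpha:  # did the test detect the real difference?
        hits += 1

print(f"estimated power is roughly {hits / runs:.2f}")
# Raising n or the effect, or lowering the SD, pushes this toward 1.
```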
Correlation
With correlation, we return to DESCRIPTIVE statistics. (This is counterintuitive. To me.) We are describing the strength and direction of the relationship between two variables, and how much one variable predicts the other.
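A minimal sketch of measuring the strength and direction of a relationship with Pearson's r (the paired data are made up):

```python
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5, 6])      # hypothetical variables
exam_score    = np.array([55, 60, 64, 70, 72, 80])

r = np.corrcoef(hours_studied, exam_score)[0, 1]  # Pearson correlation coefficient
print(f"r = {r:.3f}")   # close to +1: a strong positive relationship
```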
Homework
Continue reading. Optional: for next Monday, send me a link to an online research paper that describes (or mentions!) a t test, a confidence interval, an ANOVA, or a correlation.
Effect Size
How big an effect does the IV have on the DV? Remember, two things that make it hard to find a difference are:
- There's a small actual difference.
- There's a lot of within-group variability (error variance).
(Demonstrate with moving distributions.)
Effect Size (cont'd.)
From S, Z, & Z: "To be able to observe the effect of the IV, given large within-group variability, the difference between the two group means must be large."
Cohen's d = (µ1 - µ2) / σ
"Because effect sizes are presented in standard deviation units, they can be used to make meaningful comparisons of effect sizes across experiments using different DVs."
Effect Size (cont'd.)
When σ isn't known, it's obtained by pooling the within-group variability across groups and dividing by the total number of scores in both groups:
σ = SQRT{ [ (n1 - 1)s1² + (n2 - 1)s2² ] / N }
And, by convention:
- d of .20 is considered small
- d of .50 is considered medium
- d of .80 is considered large
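A sketch of Cohen's d using the pooling formula as written on the slide (note that some texts divide the pooled sum by n1 + n2 - 2 rather than N; this follows the slide; the scores are made up):

```python
import math
import statistics

group1 = [69, 71, 68, 72, 70]      # hypothetical scores
group2 = [64, 66, 63, 65, 67]

M1, M2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)
N = n1 + n2

# Pooled SD per the slide: SQRT{[(n1 - 1)s1² + (n2 - 1)s2²] / N}
sigma_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / N)
d = (M1 - M2) / sigma_pooled
print(f"d = {d:.2f}")   # compare against the .20 / .50 / .80 conventions
```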
Effect Size example
Let's look at the heights of men and women. Just for grins, intuitively, what would you say: a small, medium, or large difference?
µ_women = 64.6 in., µ_men = 69.8 in., σ = 2.8 in.
d = (µ1 - µ2) / σ = (69.8 - 64.6) / 2.8 = 1.86
So, a very large difference. Indeed, one that everyone is aware of.