Reasoning in Psychology Using Statistics 2017
Inferential statistics
Hypothesis testing: testing claims about populations (and the effect of variables) based on data collected from samples.
Estimation: using sample statistics to estimate the population parameters.
Population to sample: sampling to make data collection manageable.
Sample to population: inferential statistics used to generalize back.
Inferential statistics (population and sample): see the Cumming (2012) website.
Describe the typical college student
If we can’t measure the entire population of students, how do we get the population means (μ = ?) for these variables: hours of studying per week, age, hours of sleep per night, pizza consumption?
Estimation
If we can’t measure the entire population of students, how do we get the population means (μ = ?) for these variables? Estimate them based on what we do know: information from a sample ($\bar{X}$).
Two kinds of estimation:
Point estimates: a single score.
Interval estimates: a range of scores.
Describe the typical college student
Hours of studying per week: point estimate “12 hrs”; interval estimate “2 to 21 hrs”.
Age: point estimate “19 yrs”; interval estimate “17 to 21 yrs”.
Hours of sleep per night: point estimate “8 hrs”; interval estimate “4 to 10 hrs”.
Pizza consumption: point estimate “1 per wk”; interval estimate “0 to 8 per wk”.
Estimation
Estimate the number of people attending lecture today. How confident are you that your estimate is correct?
“50 students”: “Not real confident, maybe 20%”.
“Somewhere between 20 and 80 students”: “Fairly confident, maybe 90%”.
Estimation
How confident are you that your estimate is correct? Two kinds of estimation:
Point estimates (a single score, e.g., “50 students”). Advantage: feels precise. Disadvantage: low “confidence” in the estimate. Why? Because we recognize sampling error.
Interval estimates (a range of scores, e.g., “Somewhere between 20 and 80 students”, or “The # is 50 ± 30”). Advantage: high “confidence” in the estimate; it is bound to be there somewhere. Disadvantage: feels wishy-washy.
Estimation
Both kinds of estimates use the same basic procedure. The formula is a variation of the test statistic formulas (we’ll start with the z-score):
$z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
Multiply both sides by the standard error $\sigma_{\bar{X}}$, then do some adding/subtracting to isolate μ, which is what we want to estimate:
$\mu = \bar{X} \pm (z)(\sigma_{\bar{X}})$
Why the sample mean $\bar{X}$? It is often the only piece of evidence that we have, so it is our best guess. Most sample means will be pretty close to the population mean, so we have a good chance that our sample mean is close.
The standard error $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ (the difference that you’d expect by chance) is based on sample size (n) and population standard deviation (σ). Note: how you compute your standard error will depend on your design.
The z is a test statistic value (e.g., a z-score) based on design (z or t) and level of confidence.
Margin of error: $(z)(\sigma_{\bar{X}})$, the test statistic value times the standard error.
Estimation
The test statistic value (e.g., a z-score) is based on design (z or t) and level of confidence.
Distribution of the test statistic: 95% of the sample means fall in the middle, with 2.5% in each tail.
The confidence interval uses the z_crit values that identify the top and bottom tails (the upper and lower 2.5%). A 95% CI is like using a “two-tailed” z-test with α = 0.05.
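As a minimal sketch of how the critical z follows from the confidence level, the snippet below puts half of α in each tail of the standard normal distribution. The helper name `z_critical` is an assumption made for this illustration; the slides read the value from a z-table instead.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-tailed critical z for a confidence level such as 0.95."""
    alpha = 1.0 - confidence  # total proportion left in both tails
    # Half of alpha sits in each tail, so look up the upper cutoff.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z_crit = ±{z_critical(level):.3f}")
```

The exact 90% value is about 1.645; the table lookup used in the worked examples rounds it to 1.65.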
Estimation: finding the right test statistic for your estimate (μ = ?)
You begin by making a reasonable estimation of what the z (or t) value should be for your estimate. For a point estimation, you want what? z (or t) = 0, right in the middle.
(Figure: raw scores centered on μ = ? transform to z scores centered on z = 0.)
Estimation: finding the right test statistic (z or t)
You begin by making a reasonable estimation of what the z (or t) value should be for your estimate. For a point estimation, you want z (or t) = 0, right in the middle. For an interval, your values will depend on how confident you want to be in your estimate.
What do I mean by “confident”? 90% confidence means that 90% of confidence interval estimates of this sample size will include the actual population mean μ: 9 out of 10 intervals contain μ.
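The meaning of “90% confident” can be checked with a small simulation. This is only a sketch: the population values (μ = 100, σ = 15), the sample size, and the function name `coverage` are assumptions chosen for illustration, not values from the lecture.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=100.0, sigma=15.0, n=25, confidence=0.90, reps=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / n ** 0.5  # same margin for every sample
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and build its interval around the mean.
        sample_mean = mean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / reps


print(f"Proportion of intervals containing mu: {coverage():.3f}")
```

The proportion should land near 0.90, which is the sense in which roughly 9 out of 10 intervals contain μ.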
Estimation: finding the right test statistic (z or t)
You begin by making a reasonable estimation of what the z (or t) value should be for your estimate. For a point estimation, you want z (or t) = 0, right in the middle. For an interval, your values will depend on how confident you want to be in your estimate.
Computing the point estimate or the confidence interval:
Step 1: Take your “reasonable” estimate for your test statistic.
Step 2: Put it into the formula.
Step 3: Solve for the unknown population parameter.
Estimates with z-scores
Make a point estimate of the population mean given a sample with $\bar{X} = 85$, n = 25, and a population σ = 5.
For a point estimate, z (or t) = 0, right in the middle, so $\mu = \bar{X} \pm (0)(\sigma_{\bar{X}}) = 85$. The point estimate is the sample mean; the sample mean serves as the center.
Estimates with z-scores
Make an interval estimate with 95% confidence of the population mean given a sample with $\bar{X} = 85$, n = 25, and a population σ = 5.
What two z-scores do 95% of the data lie between? From the table: z = 1.96 leaves a proportion of .0250 (2.5%) in each tail, so z_crit = ±1.96.
Standard error: $\sigma_{\bar{X}} = 5/\sqrt{25} = 1$.
So the 95% confidence interval is 85 ± (1.96)(1) = 85 ± 1.96, or 83.04 to 86.96.
Estimates with z-scores
Make an interval estimate with 90% confidence of the population mean given a sample with $\bar{X} = 85$, n = 25, and a population σ = 5.
What two z-scores do 90% of the data lie between? From the table: z = 1.65 leaves a proportion of .0500 (5%) in each tail, so z_crit = ±1.65.
Standard error: $\sigma_{\bar{X}} = 5/\sqrt{25} = 1$.
So the 90% confidence interval is 85 ± (1.65)(1) = 85 ± 1.65, or 83.35 to 86.65.
Estimates with z-scores
Make an interval estimate with 90% confidence of the population mean given a sample with $\bar{X} = 85$, n = 4, and a population σ = 5.
What two z-scores do 90% of the data lie between? From the table: z = 1.65 leaves a proportion of .0500 (5%) in each tail, so z_crit = ±1.65.
Standard error: $\sigma_{\bar{X}} = 5/\sqrt{4} = 2.5$.
So the 90% confidence interval is 85 ± (1.65)(2.5) = 85 ± 4.13, or 80.88 to 89.13.
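The three z-score examples can be reproduced with a short sketch. The function name `z_interval` is an assumption for this illustration, and the critical values are the table-rounded ones used in the examples (1.96 and 1.65) rather than exact quantiles.

```python
from math import sqrt


def z_interval(sample_mean, sigma, n, z_crit):
    """Return (lower, upper, margin) for mu = sample_mean ± z_crit * sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)  # difference expected by chance
    margin = z_crit * standard_error  # margin of error
    return sample_mean - margin, sample_mean + margin, margin


# (n, table z_crit, confidence label) for the three worked examples
examples = [(25, 1.96, "95%"), (25, 1.65, "90%"), (4, 1.65, "90%")]
for n, z_crit, label in examples:
    low, high, margin = z_interval(85, 5, n, z_crit)
    print(f"n = {n}, {label}: margin {margin:.3f}, interval {low:.3f} to {high:.3f}")
```

Passing the critical value in as an argument keeps the sketch aligned with the table lookup; the same function works with any confidence level once its z_crit is known.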
Estimation
The size of the margin of error is related to:
Sample size: as n increases, the margin of error gets narrower (changes the standard error).
Level of confidence: as confidence increases (e.g., 90% -> 95%), the margin of error gets wider (changes the critical test statistic values).
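Both effects can be seen side by side in a small sketch. The function name `margin_of_error` and the particular sample sizes are assumptions for illustration; σ = 5 is borrowed from the z-score examples.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Margin of error for a z-based interval: z_crit * sigma / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_crit * sigma / sqrt(n)


sigma = 5
for n in (4, 25, 100):
    # One row per sample size, one entry per confidence level.
    row = ", ".join(
        f"{c:.0%}: ±{margin_of_error(sigma, n, c):.2f}" for c in (0.90, 0.95, 0.99)
    )
    print(f"n = {n:>3} -> {row}")
```

Reading down a column shows the margin shrinking as n grows; reading across a row shows it widening as confidence increases.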
Estimation in other designs
Two kinds of estimates that use the same basic procedure. The formula is a variation of the test statistic formulas.
Different designs: estimating the mean of the population from one or two samples, but we don’t know the σ.
Center/point estimate: depends on the design (what is being estimated).
Critical value. How do we find this? Use the t-table and your confidence level.
Standard error. How do we find this? Depends on the design.
Estimates with t-scores
Confidence intervals always involve ± a margin of error. This is similar to a two-tailed test, so in the t-table, always use the “proportion in two tails” heading, and select the α-level corresponding to (1 − confidence level).
What is the t_crit needed for a 95% confidence interval? 95% in the middle leaves two tails with 2.5% in each: 2.5% + 2.5% = 5%, or α = 0.05, so look in the α = 0.05 column.
Estimation in other designs
Estimating the population mean from a sample mean when the population standard deviation is not known.
Confidence interval: $\mu = \bar{X} \pm (t_{crit})(s_{\bar{X}})$
Difference expected by chance (estimated standard error): $s_{\bar{X}} = s/\sqrt{n}$
Estimation in one sample t-design
Make an interval estimate with 95% confidence of the population mean given a sample with $\bar{X} = 85$, n = 25, and a sample s = 5.
What two critical t-scores do 95% of the data lie between? From the table (df = n − 1 = 24): t_crit = ±2.064, with 2.5% in each tail.
Estimated standard error: $s_{\bar{X}} = 5/\sqrt{25} = 1$.
So the confidence interval is 85 ± (2.064)(1) = 85 ± 2.064, or 82.94 to 87.06.
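A minimal sketch of the one-sample t interval follows. The function name `t_interval` is an assumption for illustration, and the critical value is passed in from the t-table (2.064 for df = 24, two tails, α = 0.05) because the Python standard library has no t distribution.

```python
from math import sqrt


def t_interval(sample_mean, s, n, t_crit):
    """Return (lower, upper) for mu = sample_mean ± t_crit * s / sqrt(n)."""
    estimated_se = s / sqrt(n)  # estimated standard error
    margin = t_crit * estimated_se
    return sample_mean - margin, sample_mean + margin


low, high = t_interval(85, 5, 25, t_crit=2.064)
print(f"95% CI: {low:.2f} to {high:.2f}")  # prints "95% CI: 82.94 to 87.06"
```

Compared with the z-based 95% interval for the same numbers (83.04 to 86.96), the t-based interval is slightly wider because the critical value is larger when σ has to be estimated from the sample.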
Estimation in related samples design
Estimating the difference between two population means based on two related samples.
Confidence interval: $\mu_D = \bar{D} \pm (t_{crit})(s_{\bar{D}})$
Difference expected by chance (estimated standard error): $s_{\bar{D}} = s_D/\sqrt{n_D}$
Estimation in related samples design
Dr. S. Beach reported on the effectiveness of cognitive-behavioral therapy as a treatment for anorexia. He examined 12 patients, weighing each of them before and after the treatment. Estimate the average population weight gain for those undergoing the treatment with 90% confidence.
Differences (post-treatment − pre-treatment weights): 10, 6, 3, 23, 18, 17, 0, 4, 21, 10, -2, 10
Related samples estimation, confidence level 90%: $\bar{D} = 10$, $s_D = 8.25$, $s_{\bar{D}} = 8.25/\sqrt{12} = 2.38$, and t_crit = ±1.796 (df = 11), so the margin of error is (1.796)(2.38) = 4.28.
CI(90%) = 5.72 to 14.28
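The related-samples interval can be recomputed directly from the difference scores. This is a sketch only: the constant name is made up for the illustration, and the critical value 1.796 is the standard t-table entry for df = 11 with 10% in two tails, supplied by hand.

```python
from math import sqrt
from statistics import mean, stdev

# Post-treatment minus pre-treatment weights for the 12 patients
differences = [10, 6, 3, 23, 18, 17, 0, 4, 21, 10, -2, 10]
T_CRIT_DF11_90 = 1.796  # t-table value: df = 11, two tails, alpha = 0.10

n = len(differences)
mean_diff = mean(differences)            # point estimate of the mean gain
se_diff = stdev(differences) / sqrt(n)   # estimated standard error (sample SD uses n - 1)
margin = T_CRIT_DF11_90 * se_diff
low, high = mean_diff - margin, mean_diff + margin

print(f"Mean difference = {mean_diff}, SE = {se_diff:.2f}")
print(f"90% CI: {low:.2f} to {high:.2f}")
```

The result agrees with the reported interval of 5.72 to 14.28.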
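Estimation in independent samples design
Estimating the difference between two population means based on two independent samples.
Confidence interval: $\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm (t_{crit})(s_{\bar{X}_1 - \bar{X}_2})$
Difference expected by chance (estimated standard error, using the pooled variance $s_p^2$): $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2/n_1 + s_p^2/n_2}$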
Estimation in independent samples design
Dr. Mnemonic develops a new treatment for patients with a memory disorder. He randomly assigns each of 8 patients to one of two samples. He then gives one sample (A) the new treatment but not the other (B), and then tests both groups with a memory test. Estimate the population difference between the two groups with 95% confidence.
Independent samples t-test situation, confidence level 95%.
CI(95%) = -8.73 to 19.73
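The slide text does not list the memory scores, so the sketch below uses illustrative scores for two groups of 4 (an assumption, not the lecture's data) to show the pooled-variance procedure. The function name `pooled_t_interval` is also an assumption, and the critical value 2.447 is the t-table entry for df = 6 with 5% in two tails.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_interval(group_a, group_b, t_crit):
    """CI for mu_A - mu_B using the pooled variance of two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weight each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var / n_a + pooled_var / n_b)  # difference expected by chance
    diff = mean(group_a) - mean(group_b)            # point estimate
    margin = t_crit * se
    return diff - margin, diff + margin


group_a = [45, 55, 40, 60]  # illustrative scores, NOT taken from the slide
group_b = [43, 49, 35, 51]  # illustrative scores, NOT taken from the slide
low, high = pooled_t_interval(group_a, group_b, t_crit=2.447)
print(f"95% CI for the difference: {low:.2f} to {high:.2f}")
```

Because the inputs are assumed, the output need not match the interval reported on the slide; the point is the sequence of steps: pooled variance, standard error, then point estimate ± margin.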
Relating estimates to hypothesis tests
CI(95%) = -8.73 to 19.73. Notice that this interval includes zero.
If we had instead done a hypothesis test with α = 0.05, what would you expect our conclusion to be? H0: “there is no difference between the groups”. Fail to reject the H0.
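Estimation Summary
Design: estimation; (estimated) standard error
One sample, σ known: $\mu = \bar{X} \pm (z_{crit})(\sigma_{\bar{X}})$; $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
One sample, σ unknown: $\mu = \bar{X} \pm (t_{crit})(s_{\bar{X}})$; $s_{\bar{X}} = s/\sqrt{n}$
Two related samples, σ unknown: $\mu_D = \bar{D} \pm (t_{crit})(s_{\bar{D}})$; $s_{\bar{D}} = s_D/\sqrt{n_D}$
Two independent samples, σ unknown: $\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm (t_{crit})(s_{\bar{X}_1 - \bar{X}_2})$; $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2/n_1 + s_p^2/n_2}$
Things to note:
The design drives the formula used.
The standard error differs depending on the design (kind of comparison).
Sample size plays a role in the SE formula and in the df of the critical value.
Level of confidence comes in with the critical value used.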
Error bars
Two types typically:
Standard error (SE): the difference expected by chance.
Confidence intervals (CI): a range of plausible estimates of the population mean. CI: $\mu = \bar{X} \pm (t_{crit})(\text{diff by chance})$
Note: Make sure that you label your graphs; let the reader know what your error bars are.
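To see how different the two kinds of error bars can be for the same data, here is a small sketch. The function name `error_bar_half_widths` is an assumption for illustration, and the inputs reuse the one-sample t example (s = 5, n = 25, table t_crit = 2.064).

```python
from math import sqrt


def error_bar_half_widths(s, n, t_crit):
    """Half-width of an SE bar and of a CI bar for one sample mean."""
    se = s / sqrt(n)  # difference expected by chance
    return {"SE": se, "CI": t_crit * se}


bars = error_bar_half_widths(s=5, n=25, t_crit=2.064)
print(f"SE bar: mean ± {bars['SE']:.3f}")
print(f"95% CI bar: mean ± {bars['CI']:.3f}")
```

With these numbers the CI bar is about twice as long as the SE bar, which is why an unlabeled graph can be badly misread.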
Error Bars: Reporting CIs
Important point! CIs can be reported:
In text (APA style), for example: M = 30.5 cm, 95% CI [18.0, 42.0]
In graphs, as error bars.
In tables (see more examples in the APA manual).
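A tiny helper can keep in-text reports consistent with the pattern shown on the slide. The function `apa_ci` and its parameters are assumptions for illustration; it only mirrors the example's layout and is not an official APA tool.

```python
def apa_ci(m, lower, upper, confidence=95, unit=""):
    """Format a mean and its confidence interval like the in-text example."""
    unit_part = f" {unit}" if unit else ""
    return f"M = {m:.1f}{unit_part}, {confidence}% CI [{lower:.1f}, {upper:.1f}]"


print(apa_ci(30.5, 18.0, 42.0, unit="cm"))  # M = 30.5 cm, 95% CI [18.0, 42.0]
```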
Group Differences: 95% CI Rule of Thumb
Error bars can be informative about group differences, but you have to know what to look for.
Rule of thumb for 95% CIs*:
If the overlap is about half of one one-sided error bar, the difference is significant at ~ p < .05.
If the error bars just abut, the difference is significant at ~ p < .01.
*Works if n > 10 and the error bars don’t differ by more than a factor of 2 (Cumming & Finch, 2005).
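The rule of thumb can be sketched as a rough classifier. Everything here is an assumption for illustration: the function name `overlap_rule`, the choice to measure overlap against the average of the two one-sided bars, and the example numbers. It is a reading aid, not a substitute for the actual test.

```python
def overlap_rule(mean_1, margin_1, mean_2, margin_2):
    """Rough reading of two independent 95% CIs (means ± one-sided margins)."""
    if max(margin_1, margin_2) > 2 * min(margin_1, margin_2):
        return "rule not applicable (bars differ by more than a factor of 2)"
    # Order the groups so the overlap is measured between facing bar ends.
    (lo_mean, lo_margin), (hi_mean, hi_margin) = sorted(
        [(mean_1, margin_1), (mean_2, margin_2)]
    )
    overlap = (lo_mean + lo_margin) - (hi_mean - hi_margin)  # negative means a gap
    proportion = overlap / ((margin_1 + margin_2) / 2)
    if proportion <= 0:
        return "roughly p < .01"
    if proportion <= 0.5:
        return "roughly p < .05"
    return "not clearly significant"


print(overlap_rule(10, 3, 14, 3))  # bars overlap by 2/3 of a one-sided bar
print(overlap_rule(10, 3, 15, 3))  # bars overlap by 1/3 of a one-sided bar
print(overlap_rule(10, 3, 16, 3))  # bars just abut
```

The sample-size condition (n > 10) is not checked by the function and has to be verified separately.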
Hypothesis testing with CIs
If we had instead done a hypothesis test on 2 independent samples with α = 0.05, what would you expect our conclusion to be? H0: “there is no difference between the groups”.
MD = 2.23, 95% CI [-1.4, 5.9]: the interval includes zero, so fail to reject the H0 (MD = 2.23, t(34) = 1.25, p = 0.22).
MD = 3.61, 95% CI [0.6, 6.6]: the interval excludes zero, so reject the H0 (MD = 3.61, t(42) = 2.43, p = 0.02).
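The link between a 95% CI and a two-tailed test at α = 0.05 reduces to checking whether the interval contains the null value. The sketch below applies that check to the two intervals above; the function name `reject_null_from_ci` is an assumption for illustration.

```python
def reject_null_from_ci(lower, upper, null_value=0.0):
    """Reject H0 when the null value falls outside the confidence interval."""
    return not (lower <= null_value <= upper)


intervals = {"MD = 2.23": (-1.4, 5.9), "MD = 3.61": (0.6, 6.6)}
for label, (lower, upper) in intervals.items():
    decision = "reject H0" if reject_null_from_ci(lower, upper) else "fail to reject H0"
    print(f"{label}, 95% CI [{lower}, {upper}]: {decision}")
```

The decisions match the reported p-values (0.22 and 0.02), while the intervals additionally show how large the difference might plausibly be.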
Estimation: Why? Because CIs are more informative than p-values.
Hypothesis testing & p-values:
Dichotomous thinking: Yes/No reject H0 (remember H0 is “no effect”); Neyman-Pearson approach.
Strength of evidence: Fisher approach.
Confidence intervals:
Give plausible estimates of the population parameter (values outside are implausible).
Provide information about both level and variability.
Wide intervals can indicate low power.
Good for emphasizing comparisons across studies (e.g., meta-analytic thinking).
Can also be used for Yes/No reject H0.
In labs
Practice computing and interpreting confidence intervals.
Understanding CI: https://www.youtube.com/watch?v=tFWsuO9f74o
Calculating CI: https://www.youtube.com/watch?v=s4SRdaTycaw
Khan Academy:
CI and sample size: https://www.youtube.com/watch?v=K4KDLWENXm0
CI and t-test: https://www.youtube.com/watch?v=hV4pdjHCKuA
CI for Ind Samp: https://www.youtube.com/watch?v=hxZ6uooEJOk (pt 2)
CI and margin of error: https://www.youtube.com/watch?v=UogOJHgJDqs
HT and CI: https://www.youtube.com/watch?v=k1at8VukIbw
HT vs. CI rap: https://www.youtube.com/watch?v=C88fUKAHPn0
CIs by Geoff Cumming:
Introduction to: https://www.youtube.com/watch?v=OK6DXfXv8BM
Workshop (6 part series)