Stat 217 – Day 28 Review
Last Time – Paired data: When we have paired observations (rather than independent samples), we must analyze them as paired data. Statistic: mean difference in the sample. Parameter: mean difference in the population. H0: μdiff = 0. Theory-based approach: one-sample t-test on the differences, valid when there are at least 20 differences or the differences follow a normal distribution.
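To make the theory-based approach concrete, here is a minimal Python sketch of the one-sample t statistic computed on the differences, t = (mean difference - 0) / (SD of differences / sqrt(n)). The helper name `paired_t_statistic` and the example differences are hypothetical (made up for illustration); the p-value itself would still come from the Theory-Based Inference applet, which this sketch does not reproduce.

```python
import math
import statistics


def paired_t_statistic(diffs, null_value=0):
    """One-sample t statistic on paired differences (hypothetical helper).

    Returns (mean difference, standard error, t, number of differences,
    whether the "at least 20 differences" rule of thumb is met).
    """
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # SD of differences / sqrt(n)
    t = (mean_diff - null_value) / se            # H0: mu_diff = null_value
    return mean_diff, se, t, n, n >= 20


# Made-up differences for 8 pairs, for illustration only
mean_diff, se, t, n, big_enough = paired_t_statistic([2, 3, -1, 4, 0, 2, 5, 1])
print(mean_diff, round(t, 3), big_enough)  # 2.0 2.828 False
```

With only 8 differences the size condition fails, so here you would need the differences to look normal (or fall back on simulation).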
Last Time – Paired Data: Why might I choose to conduct a paired-data study? I hope there is less variation in the differences than in the response variable itself (e.g., ages at marriage vs. age differences at marriage). When can't I conduct a paired-data study? When I don't have access to a 2nd observation, or when there may be no natural pairings (e.g., students vs. faculty).
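A quick way to see the payoff of pairing is to compare the spread of each list with the spread of the differences. This sketch uses made-up ages at marriage for six couples (hypothetical numbers, not real data) and a hypothetical helper `compare_spread`.

```python
import statistics


def compare_spread(first, second):
    """Return (SD of first, SD of second, SD of the paired differences)."""
    diffs = [x - y for x, y in zip(first, second)]
    return statistics.stdev(first), statistics.stdev(second), statistics.stdev(diffs)


# Made-up ages at marriage for six couples (husband, wife), illustration only
husbands = [22, 35, 28, 41, 30, 26]
wives = [20, 33, 27, 38, 29, 25]
sd_h, sd_w, sd_d = compare_spread(husbands, wives)
print(round(sd_h, 2), round(sd_w, 2), round(sd_d, 2))
```

Because ages within a couple move together, the differences vary far less than either list of ages, which is exactly the situation where a paired design helps.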
Since first exam: Comparing 2 Groups
Categorical response. Statistic: difference in sample proportions. Parameter: difference in population proportions/probabilities. Test of significance: simulation (Two Proportions applet) or normal distribution (TBI). Confidence interval: Theory-Based applet.
Quantitative response. Statistic: difference in sample means. Parameter: difference in population means, μ1 - μ2. Test of significance: simulation (Multiple Means applet) or t distribution (TBI). Confidence interval: Theory-Based applet.
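The simulation for two proportions rests on one idea: if H0 is true, group membership does not matter, so we can pool the outcomes, re-randomize them to the two groups, and see how often the difference in sample proportions is as large as the one observed. The sketch below mimics that idea in Python; it is not the Two Proportions applet's code, and the function name and counts (30/50 vs. 20/50) are hypothetical.

```python
import random


def two_prop_shuffle_test(successes1, n1, successes2, n2, reps=10_000, seed=1):
    """Approximate one-sided p-value for Ha: (group 1 - group 2) > 0 by shuffling."""
    rng = random.Random(seed)
    observed = successes1 / n1 - successes2 / n2
    # Pool all outcomes (1 = success, 0 = failure), as if labels don't matter (H0)
    total_successes = successes1 + successes2
    pooled = [1] * total_successes + [0] * (n1 + n2 - total_successes)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-randomize outcomes to the two groups
        diff = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2
        if diff >= observed:
            count += 1
    return observed, count / reps


obs_diff, p_value = two_prop_shuffle_test(30, 50, 20, 50)
print(round(obs_diff, 2), p_value)
```

The proportion of shuffles at or beyond the observed difference approximates the p-value; the theory-based (normal) approach is a shortcut for the same question.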
Including Paired Data
Matched pairs, categorical. Statistic: sample proportion (without ties). Parameter: population proportion π (of the pairs that differ). Test of significance: simulation (One Proportion applet) or normal distribution (Theory-Based applet). Confidence interval: Theory-Based applet.
Matched pairs, quantitative. Statistic: sample mean difference. Parameter: population mean difference (μdiff). Test of significance: simulation (Matched Pairs Randomization, Lab 6) or paired t test (one-sample t test on the differences). Confidence interval: Theory-Based applet.
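For quantitative matched pairs, one way to simulate under H0 is to note that the two responses within a pair are interchangeable, so each difference is equally likely to be positive or negative. The sketch below flips signs at random; it is an illustration of that randomization idea under this assumption, not the Lab 6 applet itself, and the function name and differences are hypothetical.

```python
import random


def matched_pairs_flip_test(diffs, reps=10_000, seed=1):
    """Approximate one-sided p-value for H0: mu_diff = 0 vs Ha: mu_diff > 0.

    Under H0 each difference is equally likely to carry either sign.
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = sum(diffs) / n
    count = 0
    for _ in range(reps):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / n >= observed:  # at least as large as what we saw
            count += 1
    return observed, count / reps


# Made-up differences for 8 pairs, illustration only
obs_mean, p_value = matched_pairs_flip_test([2, 3, -1, 4, 0, 2, 5, 1])
print(obs_mean, p_value)
```

Each repetition produces one simulated mean difference, so the collection of repetitions plays the role of the null distribution.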
Format of Exam: You can bring two (two-sided) pages of notes, either ones I gave you or ones you produced yourself. Prepare as if it were closed book. Work efficiently. Still bring a calculator. Be able to read computer output. Be able to carry out and interpret simulations in the Two Proportions and Multiple Means applets. Be able to describe the simulation process. Be able to carry out and interpret theory-based procedures in the Theory-Based Inference applet. See the blue notes on the review handout.
Choice of procedure: Is the question asking for a test of significance (“is there evidence of…”) or a confidence interval (“estimate the…”)? Is the question dealing with quantitative data (e.g., means) or categorical data (e.g., proportions)? How many (independent) samples do I have?
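The three questions above work like a small lookup table. This hypothetical helper encodes the two summary tables from this review (two independent groups vs. matched pairs); the labels `"independent"` and `"paired"` and the function name are assumptions for illustration, and the table is only as complete as those two slides.

```python
# Keys: (response type, design). Values: (parameter, simulation-based test,
# theory-based test), following the summary tables in this review.
PROCEDURES = {
    ("categorical", "independent"): (
        "difference in population proportions",
        "Two Proportions applet",
        "normal distribution (Theory-Based Inference applet)",
    ),
    ("quantitative", "independent"): (
        "difference in population means, mu1 - mu2",
        "Multiple Means applet",
        "t distribution (Theory-Based Inference applet)",
    ),
    ("categorical", "paired"): (
        "population proportion pi (of the pairs that differ)",
        "One Proportion applet",
        "normal distribution (Theory-Based Inference applet)",
    ),
    ("quantitative", "paired"): (
        "population mean difference, mu_diff",
        "Matched Pairs Randomization",
        "paired t test (one-sample t test on the differences)",
    ),
}


def choose_procedure(goal, response, design):
    """goal: "test" or "interval"; response: "categorical" or "quantitative";
    design: "independent" (two groups) or "paired"."""
    parameter, simulation, theory = PROCEDURES[(response, design)]
    if goal == "test":
        return f"Test about the {parameter}: simulate with {simulation} or use {theory}"
    if goal == "interval":
        return f"Estimate the {parameter}: Theory-Based Inference applet (or 2SD shortcut)"
    raise ValueError('goal must be "test" or "interval"')


print(choose_procedure("test", "quantitative", "paired"))
```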
Big Ideas: Is the “difference” statistically significant? (Is the p-value small?) How large is the difference? (Estimation: confidence intervals.) Can we draw a cause-and-effect conclusion? (Types of studies, confounding variables: was there random assignment?) Can we generalize to a larger population? (Was there random sampling?)
Big Ideas: Interpretations. What we mean by “parameter”; what factors affect the test statistic, p-value, and confidence interval; what we mean by “p-value”; interpretation vs. evaluation; what we mean by “confidence”; study design issues (why…).
What is “statistical inference”? We need to be able to assess how much sample-to-sample variability there is by chance alone (from random sampling or random assignment, assuming the null hypothesis is true) to decide whether an observed difference is “real”. Is an observed sample result surprising, assuming some claim about the population? (Estimate a p-value.) How far might our sample result be from the population value? (Estimate the population value.)
Test of Significance Steps: Define the parameter in words; state the H0 and Ha hypotheses about the parameter. Calculate and interpret the standardized statistic; use technology to determine the p-value. When you are allowed to use theory-based methods, they are a shortcut to running the simulation. Make a conclusion: evaluate the p-value, reject or fail to reject H0, and answer the research question in context.
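Once a simulation has produced a null distribution of the statistic, the standardized statistic and the p-value are both short calculations. A minimal sketch, assuming the simulated statistics are already in a list; the function names are hypothetical, and the two-sided rule shown is one common convention rather than the only one.

```python
import statistics


def standardized_statistic(observed, null_stats):
    """How many null-distribution SDs the observed statistic is from the null center."""
    return (observed - statistics.fmean(null_stats)) / statistics.stdev(null_stats)


def simulated_p_value(observed, null_stats, alternative="greater"):
    """Proportion of simulated statistics at least as extreme as the observed one."""
    if alternative == "greater":
        extreme = sum(1 for s in null_stats if s >= observed)
    elif alternative == "less":
        extreme = sum(1 for s in null_stats if s <= observed)
    elif alternative == "two-sided":
        # One common convention: at least as far from the null center, either direction
        center = statistics.fmean(null_stats)
        extreme = sum(1 for s in null_stats if abs(s - center) >= abs(observed - center))
    else:
        raise ValueError('alternative must be "greater", "less", or "two-sided"')
    return extreme / len(null_stats)
```

In practice `null_stats` would hold thousands of simulated statistics (for example, from shuffling group labels); a standardized statistic beyond about 2 in absolute value usually goes with a small p-value.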
Confidence Interval Steps: Define the parameter in words. Use technology to determine the interval (Theory-Based applet? Two SD shortcut?). Interpret the interval: “I'm 90% confident that…”. If asked, interpret “confidence” without using the word “confidence”. Maybe: interpret the level, e.g., 90% of intervals constructed this way would capture the population parameter.
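The phrase “90% of intervals constructed this way” can be checked by simulation: draw many samples from a population whose mean we know, build an interval from each, and count how many capture that mean. This sketch assumes a hypothetical normal population with a known SD and uses a z interval for simplicity (not the t interval the applet would give); all parameter values are made up.

```python
import random
import statistics


def coverage_rate(true_mean=50.0, sigma=10.0, n=30, level=0.90, reps=2000, seed=1):
    """Fraction of z intervals (sigma treated as known) that capture true_mean."""
    rng = random.Random(seed)
    z_star = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    margin = z_star * sigma / n ** 0.5
    captured = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - margin <= true_mean <= xbar + margin:
            captured += 1
    return captured / reps


print(coverage_rate())  # should land near 0.90
```

Any single interval either captures the parameter or it does not; the 90% describes the long-run success rate of the method.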
Types of Interpretation Problems: Population vs. sample vs. null distributions (the null distribution is a distribution of the statistic). Interpret a probability, confidence interval, p-value, or standardized statistic…, including “what if” questions. Identify appropriate procedures. Scope of conclusions (generalizability, causation, significance, estimation). Why? (confidence level, …)
Common Mistakes: Quantitative vs. categorical. Symbols: π vs. μ (be able to define each in words). Technical conditions. z-procedures go with proportions, t-procedures with means; don't combine them (there is no “mean proportion”). Stating hypotheses about a statistic, or about nothing at all (e.g., H0: = .5, which names no parameter). Sample size checks vs. population checks. What's “normal”? What changes when you increase the sample size… Parameter vs. statistic; sample vs. null distribution. Stopping at the “reject H0” decision…
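One answer to “what changes when you increase the sample size” is that the sampling distribution of the statistic gets narrower. This sketch simulates sample proportions from a hypothetical process with success probability 0.5 and compares their SD with the standard formula sqrt(π(1 - π)/n); the function name and settings are assumptions for illustration.

```python
import random
import statistics


def sd_of_sample_proportion(pi, n, reps=5000, seed=1):
    """Simulate `reps` samples of size n from a process with success probability pi
    and return the SD of the resulting sample proportions."""
    rng = random.Random(seed)
    props = [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]
    return statistics.stdev(props)


for n in (25, 100):
    simulated = sd_of_sample_proportion(0.5, n)
    formula = (0.5 * 0.5 / n) ** 0.5  # sqrt(pi * (1 - pi) / n)
    print(n, round(simulated, 4), round(formula, 4))
```

Quadrupling n roughly halves the SD, which makes the same observed difference more surprising under H0 (larger standardized statistic, smaller p-value) and makes confidence intervals narrower.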
Advice: Work problems, don't just read them, using only your notes pages and a calculator. Start with the observational units and variable(s). See if you can identify the problem type. Think about what each step means. Banish the word “it” (proof, data, group). Add the phrases “of what” and “for what”. Learn from, and seek clarification of, grader comments (labs, quizzes, investigations, etc.). Self-check: what went wrong? Self-tests, tables, computer…
Quantitative data: penny ages [Figure, shown twice: population, one sample (n = 30), and the sampling distribution. In the sampling distribution, obs unit = sample and variable = sample mean.]
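The shift in observational unit (from one penny to one sample of 30 pennies) is easy to see in a simulation. We don't have the class's penny data, so this sketch builds a hypothetical right-skewed population of ages as a stand-in (not real data), then records the mean of each random sample of n = 30; the function name is also hypothetical.

```python
import random
import statistics


def sampling_distribution_of_mean(population, n=30, reps=1000, seed=1):
    """Each observational unit is one sample; the variable recorded is its mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical right-skewed population standing in for penny ages (not real data)
pop_rng = random.Random(0)
population = [round(pop_rng.expovariate(1 / 12)) for _ in range(5000)]

means = sampling_distribution_of_mean(population)
print(statistics.fmean(population), statistics.fmean(means))   # centers are close
print(statistics.pstdev(population), statistics.stdev(means))  # means vary far less
```

The sampling distribution is centered near the population mean, is much less spread out, and looks more symmetric than the skewed population.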
Memory Study 1. Describe the sample: For our class, the average number of letters remembered was higher for those receiving the JFK sequence (15.07 letters) than for those receiving the JFKC sequence (10.4 letters), a difference of 4.67 letters. Multiples of three for JFK; lots of overlap between the groups.
Memory Study 2. State hypotheses: Let μJFK represent the long-run mean number of letters that would be remembered with the JFK sequence in the population (similarly μJFKC for the JFKC sequence). H0: μJFK - μJFKC = 0 vs. Ha: μJFK - μJFKC > 0
Memory Study 3. p-value: Theory-based: the sample sizes are not that large and the sample distributions do not look very normal, so the theory-based approach is questionable. We could use a simulation-based approach instead.
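A simulation-based alternative re-randomizes the letter counts to the two sequences, as if the sequence made no difference (H0), and records the difference in group means each time. The class data are not reproduced here, so the scores below are made-up numbers for illustration only; the function name is hypothetical, and this mimics the idea behind the Multiple Means applet rather than its code.

```python
import random
import statistics


def shuffle_test_diff_means(group1, group2, reps=10_000, seed=1):
    """Approximate one-sided p-value for Ha: mu1 - mu2 > 0 by re-randomizing labels."""
    rng = random.Random(seed)
    observed = statistics.fmean(group1) - statistics.fmean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign scores to the two sequences at random
        diff = statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:])
        if diff >= observed:
            count += 1
    return observed, count / reps


# Made-up letter counts, NOT the class data
jfk = [18, 15, 21, 12, 9, 24, 15, 18]
jfkc = [9, 12, 6, 15, 10, 8, 13, 11]
obs_diff, p_value = shuffle_test_diff_means(jfk, jfkc)
print(obs_diff, p_value)
```

This approach avoids the normality and sample-size conditions that made the theory-based p-value questionable.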
Finding p-value: Theory-based approach
Confidence interval (2SD shortcut): 4.67 ± 2(2.694) = (-0.718, 10.06)
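The 2SD arithmetic is simple enough to check in a couple of lines. This sketch takes the statistic (4.67) and the SD (2.694, presumably the SD of the simulated distribution) from the slide; the helper name `two_sd_interval` is hypothetical.

```python
def two_sd_interval(statistic, sd):
    """2SD shortcut: statistic +/- 2 * SD (roughly a 95% interval)."""
    margin = 2 * sd
    return statistic - margin, statistic + margin


low, high = two_sd_interval(4.67, 2.694)
print(round(low, 3), round(high, 3))  # -0.718 10.058
```

Because the interval includes 0, this rough 95% interval does not rule out “no difference”, consistent with the evidence being only moderate.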
Interpreting confidence interval: I'm 95% confident that, in the population, the average JFK score is up to 10.12 letters more than the average JFKC score, but it could also be as much as 0.78 letters smaller. I'm 90% confident that, in the population, the average JFK score is 0.157 to 9.17 letters larger than the average JFKC score.
Summarizing conclusions: We have some evidence, though not very strong, that the JFK sequence leads to higher scores on average (theory-based p-value < .05). We can say “leads to” because this was a randomized, double-blind experiment (with an independent evaluator of the results). In fact, we are 90% confident that the JFK average is 0.16 to 9.2 letters higher than the JFKC average. But we should be cautious about the population we generalize to, since this was a convenience sample.
Factors [Figure: original data vs. new data]
Questions? Office hours: Monday 11-12, Tuesday 2-3 in 33-220A. Optional review session: Wednesday 6pm (Library-111B).