Stat 217 – Day 17 Review
Last Time – Confidence interval for μ. Lab 2: goal is to estimate, on average, how long after the start of the party people tend to arrive. Expect the population mean to be close to the sample mean. Need to know how much the sample mean might wander off, by random sampling chance alone, from the population mean (s/√n). Could also consider the sample median.
In general – Confidence interval. Quantitative or categorical data? If categorical, can find a confidence interval for p (95% 2SD method: SE(p-hat); one-sample z-interval). If quantitative, can find a confidence interval for μ (95% 2SD method: SE(x-bar); one-sample t-interval). Need to have a decent sample size for these methods (e.g., 10S/10F for proportions, 20/normal for means).
Example 3.2: Calculating and interpreting a “one-sample z-interval”. Observed sample proportion: 713/1034 = .69. Standard error: √(.69 × (1 − .69)/1034) ≈ .014. “Margin of error” = 1.96 × .014 ≈ .028, or about 2.8 percentage points. Interval: .69 ± .028 = (.662, .718). I’m 95% confident that between 66.2% and 71.8% of the population will claim to have felt an impact from the Affordable Care Act, assuming the sample was representative and there were no nonsampling errors.
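The arithmetic in Example 3.2 can be checked with a short Python sketch. The function name and its return layout are just illustrative choices, not something from the course applets. It uses the unrounded sample proportion, so the last digit can differ slightly from a hand calculation that starts from .69. Note that the standard error is about .014 and the margin of error is 1.96 times that.

```python
import math

def one_sample_z_interval(successes, n, z_star=1.96):
    """Return (p_hat, standard error, margin of error, lower, upper)."""
    p_hat = successes / n
    # Standard error of the sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Margin of error = multiplier x standard error
    moe = z_star * se
    return p_hat, se, moe, p_hat - moe, p_hat + moe

p_hat, se, moe, lower, upper = one_sample_z_interval(713, 1034)
print(f"p-hat = {p_hat:.3f}, SE = {se:.3f}, margin of error = {moe:.3f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```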
Body Temperatures: Body temps for 130 healthy adults. n = 130, mean = 98.249 °F, s = .733 °F. Symmetric distribution.
Body Temperatures: Calculating and interpreting a “one-sample t-interval”. Observed sample mean: 98.249 degrees, with sample standard deviation s = .733 degrees. 98.249 ± 2ish × (.733/√130) = 98.249 ± .129, so the “margin of error” is .129 degrees and the interval is (98.12, 98.38). I’m 95% confident that the population mean healthy body temperature is between 98.12 degrees and 98.38 degrees Fahrenheit.
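A minimal Python sketch of the same body temperature calculation, assuming the “2ish” multiplier is taken to be exactly 2 (the exact t multiplier for 129 degrees of freedom is slightly below 2, so the result barely changes). The function name is hypothetical.

```python
import math

def mean_interval(x_bar, s, n, multiplier=2.0):
    """Return (standard error, margin of error, lower, upper) for a mean."""
    # Standard error of the sample mean: s / sqrt(n)
    se = s / math.sqrt(n)
    moe = multiplier * se
    return se, moe, x_bar - moe, x_bar + moe

se, moe, lower, upper = mean_interval(x_bar=98.249, s=0.733, n=130)
print(f"SE = {se:.4f}, margin of error = {moe:.3f}")
print(f"interval: ({lower:.2f}, {upper:.2f})")
```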
Body Temperatures Notice this interval is not very wide: It’s an interval for the population mean, not one person
Body Temperatures: Notice that this interval, (98.12, 98.38), does not contain 98.6; we would not consider 98.6 to be a plausible value for the population mean body temperature. Of course, it is not terribly far away. Would still like to know more about how this sample was selected before deciding what population I think it is representative of.
Section 3.4: Factors that impact the width of a confidence interval. Larger sample size: narrower interval. Larger confidence level: wider interval. (Proportion closer to 0.5: wider.) (Larger sample SD, s: wider.)
Section 3.5: These confidence interval procedures only “work” if you have a representative sample (vs. voluntary response bias, vs. bad sampling frame) and no “nonrandom” (nonsampling) errors: people change their minds; people don’t remember correctly; people lie/social expectation; leading questions; demeanor of interviewer.
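To see these width factors in numbers, here is a rough Python sketch that compares margins of error (the interval width is twice the margin). The helper names are made up for illustration, and for simplicity the mean version uses a normal multiplier rather than a t multiplier, which is an approximation for small samples.

```python
import math
from statistics import NormalDist

def z_multiplier(confidence):
    """Normal multiplier for a central interval, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def proportion_moe(p_hat, n, confidence=0.95):
    return z_multiplier(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)

def mean_moe(s, n, confidence=0.95):
    # Assumption: normal multiplier used in place of the t multiplier
    return z_multiplier(confidence) * s / math.sqrt(n)

print("n = 100 vs n = 400:", proportion_moe(0.5, 100), proportion_moe(0.5, 400))
print("95% vs 99%:", proportion_moe(0.5, 100, 0.95), proportion_moe(0.5, 100, 0.99))
print("p-hat = .5 vs .1:", proportion_moe(0.5, 100), proportion_moe(0.1, 100))
print("s = 1 vs s = 2:", mean_moe(1.0, 100), mean_moe(2.0, 100))
```

Quadrupling the sample size cuts the margin of error in half, since n sits under a square root.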
Gallup.com
Exam 1 May use one 8.5 x 11 (both sides) page of self-produced notes Mixture of multiple choice, short answer, longer questions (see quizzes, labs, investigations) Bring a calculator (not a cell phone) Access to the computer (e.g., applets) Probably a section of multiple choice questions on the computer
Exam 1 Resources Review handout Review questions/solutions Chapters 0-3 Self-check videos, self-tests, what went wrong Quiz solutions Access to pre-labs Grading comments on quizzes, investigations, labs
Exam 1 Advice: Review handout, problems online. Review labs. Work problems. Start with ideas that we have emphasized more often. Be ready to interpret and explain.
Some advice during exam: If you get stuck on a problem, move on to later parts, later problems. Try to hit the highlights in your answer (e.g., not all sources of bias, just the most serious). Be succinct (think before you write). Read the question carefully. Show all of your work and explain well (communication points); no “it”! Read the entire question before writing anything.
Some big, big ideas: Observational units, variable. Probability. What you see in the sample (descriptive) vs. saying something beyond the sample (inferential). Statistic vs. parameter. Interpretation of p-value, statistical significance. Estimation (confidence interval). Generalizability. Interpretations, reasoning. Properties, “what if” questions… How are you deciding this?
Main Topics: Sample from a random process (e.g., coin toss, dolphins, kissing couples). Parameter: p = probability of “success”. Statistic: sample proportion. Random sample from a finite large population (e.g., Gallup poll). Parameter: p = population proportion of “successes”. Statistic: sample proportion. Consider sampling and nonsampling biases.
Previously: When you have a random sampling method, the mean of the “could have been” statistics will be equal to the population parameter, so we will believe the sample is representative of the population. The variability of the distribution of “could have been” statistics will decrease if you increase the sample size (number of observational units per sample). Population size (assuming it’s pretty big to begin with) doesn’t really matter.
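These two properties of the “could have been” statistics can be illustrated with a small simulation. This is only a sketch: the long-run proportion of .69, the sample sizes, the number of repetitions, and the seed are all arbitrary choices for illustration, and sampling is modeled as a random process (in effect a very large population).

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps=2000, seed=217):
    """Draw `reps` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

results = {}
for n in (25, 100, 400):
    props = simulate_sample_proportions(p=0.69, n=n)
    results[n] = (statistics.mean(props), statistics.stdev(props))
    print(f"n = {n}: mean of p-hats = {results[n][0]:.3f}, SD = {results[n][1]:.3f}")
```

The mean of the simulated sample proportions should sit near .69 for every sample size, while the SD shrinks as n grows.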
Agree?
Main Topics: Sample from a random process or population with quantitative data. Parameter: μ = population mean. Statistic: sample mean.
Test of Significance: Test a conjecture about the parameter. Assume the null hypothesis is true. Look at the random distribution of the statistic when the null hypothesis is true: simulation (One Proportion applet) or normal model (Theory Based Inference applet). If the observed value is in the tail of the distribution (small p-value, large z), reject the null hypothesis. Otherwise “fail to reject” (FTR): not convincing evidence against H0.
Test of Significance – Lab 2: Sample mean = 6.9 hours < 8 hours. Three possible explanations: (1) The population mean is 8 hours and we just got an unlucky random sample (a small p-value discounts this explanation). (2) Our sample is not representative of the population (random sampling would discount this explanation). (3) The population mean is actually less than 8 hours.
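The two routes to a p-value (simulation and the normal model) can be sketched in Python. The data here are hypothetical, not from any lab: 60 successes in 100 trials, tested against a null value of p = .5 with a one-sided (greater than) alternative. The function names are made up, and this is not how the applets are implemented.

```python
import math
import random
from statistics import NormalDist

def simulated_p_value(observed_successes, n, null_p, reps=5000, seed=217):
    """Proportion of simulated samples (null true) at least as extreme as observed."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(reps):
        successes = sum(rng.random() < null_p for _ in range(n))
        if successes >= observed_successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps

def normal_model_p_value(observed_successes, n, null_p):
    """Return (z, one-sided p-value) using the null-based standard deviation."""
    p_hat = observed_successes / n
    sd_null = math.sqrt(null_p * (1 - null_p) / n)
    z = (p_hat - null_p) / sd_null
    return z, 1 - NormalDist().cdf(z)

# Hypothetical data: 60 successes out of 100, null hypothesis p = .5
sim_p = simulated_p_value(60, 100, 0.5)
z, normal_p = normal_model_p_value(60, 100, 0.5)
print(f"simulation p-value = {sim_p:.3f}")
print(f"z = {z:.2f}, normal-model p-value = {normal_p:.3f}")
```

Both p-values come out small here, so the observed result is in the tail of the null distribution and we would reject the null hypothesis.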
Make sure you recognize: Interpreting the p-value vs. evaluating the p-value. Interpreting: 3% of random samples … observed result … null hypothesis true. Evaluating: I find this p-value to be small, so I reject the null hypothesis. Interpreting the confidence interval vs. the confidence level. I’m 95% confident that…this method … .6225 and .7323. If we took thousands of random samples, roughly 95% of the resulting intervals should contain the parameter.
Confidence Interval: Want to estimate the parameter from the sample data. Could test all the possible values for the parameter and make an interval of the ones that are not rejected (not practical). Other ways to find a CI: estimate ± 2 standard deviations (get SD from simulation and/or from formula); normal-based inference (TBI applet), for larger sample sizes. Interpretation: I’m 95% confident that the parameter is between these two values. Procedure works 95% of the time.
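“Procedure works 95% of the time” can be checked with a simulation sketch: pick a known parameter, take many random samples, build a 95% interval from each, and count how often the interval captures the parameter. The values p = .69 and n = 1034 are borrowed from Example 3.2 purely for illustration, and the function name, number of repetitions, and seed are arbitrary.

```python
import math
import random

def coverage_rate(p, n, reps=2000, z_star=1.96, seed=217):
    """Fraction of simulated one-sample z-intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p <= p_hat + moe:
            hits += 1
    return hits / reps

rate = coverage_rate(p=0.69, n=1034)
print(f"share of intervals containing p: {rate:.3f}")
```

The share should land close to .95. Any single interval either contains the parameter or it does not; the 95% describes the long-run behavior of the method.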
Questions? Optional Review Session Tonight Building 38, Room 219 Starting at 7:40