Review Tests of Significance. Single Proportion.


Review Tests of Significance

Single Proportion

 Theory-based tests work well when the numbers of successes and failures are each at least 10.  A normal distribution is used to predict what the null distribution looks like. (The null distribution is centered at the proportion specified by the null hypothesis.)
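The normal-approximation test can be sketched in a few lines; the counts below (60 successes in 100 trials against a null proportion of 0.5) are made-up numbers for illustration.

```python
from math import sqrt, erfc

def one_prop_z_test(successes, n, p0):
    """Theory-based (normal approximation) test for a single proportion.
    Valid when successes and failures are both at least 10."""
    phat = successes / n
    se = sqrt(p0 * (1 - p0) / n)          # SD of the null distribution
    z = (phat - p0) / se                  # standardized statistic
    p_two_sided = erfc(abs(z) / sqrt(2))  # = 2 * P(Z > |z|)
    return z, p_two_sided

# Hypothetical example: 60 successes in 100 trials, null proportion 0.5
z, p = one_prop_z_test(60, 100, 0.5)
print(round(z, 2), round(p, 3))  # 2.0 0.046
```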

Comparing Two Proportions

 Our statistic is the observed difference in proportions: 0.67 − 0.20 = 0.47.

                    Dolphin group   Control group   Total
  Improved            10 (67%)        3 (20%)         13
  Did Not Improve        5              12            17
  Total                 15              15            30

Comparing Two Proportions  If the null hypothesis is true (dolphin therapy is not better), we would have 13 improvers and 17 non-improvers regardless of the group they were in.  Any differences we see between groups arise solely from the randomness in the assignment to the groups.  Randomly reassign the 13 improvers and 17 non-improvers to the two groups and recalculate the statistic many times.

Comparing Two Proportions  We did 1000 repetitions to develop a null distribution and found that just 13 out of 1000 results had a difference of 0.47 or higher (p-value = 0.013).

Comparing Two Proportions  Just like with a single proportion, the theory-based test works well when number of successes and failures are at least 10 in each group.  Again, a normal distribution is used to predict the shape of the null distribution. (These are always centered at 0.)

Comparing Two Means  Null hypothesis: There is no association between which bike is used and commute time Commute time is not affected by which bike is used. (µ carbon = µ steel OR µ carbon – µ steel = 0)  Alternative hypothesis: There is an association between which bike is used and commute time Commute time is affected by which bike is used. ( µ carbon ≠ µ steel OR µ carbon – µ steel ≠ 0)

Comparing Two Means

  Bike type      Sample size   Sample mean   Sample SD
  Carbon frame       26         108.34 min    6.25 min
  Steel frame        30         107.81 min    4.89 min

Our statistic is the observed difference in means: 108.34 − 107.81 = 0.53.

Comparing Two Means  Shuffling assumes the null hypothesis is true: the bike has no effect on commute times.  Calculate the simulated statistic after each shuffle.  Repeating this many times develops a null distribution.

Comparing Two Means Strength of Evidence  705 of 1000 repetitions are 0.53 or farther away from 0.  p-value = 0.705.
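The shuffling procedure for a difference in means can be sketched as follows. The commute times below are made-up values for illustration only (the slides give just the summary statistics), and the function name is mine.

```python
import random

def shuffle_test_diff_means(group1, group2, reps=5000, seed=2):
    """Two-sided randomization test for a difference in means:
    pool the values, reshuffle group labels, recompute the statistic."""
    observed = sum(group1)/len(group1) - sum(group2)/len(group2)
    pooled = group1 + group2
    n1 = len(group1)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        sim = (sum(pooled[:n1])/n1
               - sum(pooled[n1:])/(len(pooled) - n1))
        if abs(sim) >= abs(observed) - 1e-12:  # two-sided: either direction
            extreme += 1
    return observed, extreme / reps

# Hypothetical commute times in minutes (real study had 26 + 30 rides)
carbon = [108, 111, 104, 113, 107, 110, 106, 112]
steel  = [107, 109, 103, 112, 108, 105, 111, 104]
obs, p = shuffle_test_diff_means(carbon, steel)
```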

Comparing Two Means  A theory-based test works well here when the sample size is at least 20.  A t-distribution is used to predict the shape of the null distribution and it is centered on 0.

Matched Pairs

 In this type of test the data start off as two separate groups, but there is a natural pairing; in this case, the times for the same person running both paths.  So we need to look at the differences.

Matched Pairs  The data table lists, for each subject, the narrow-angle time, the wide-angle time, and the difference between the two.

Matched Pairs  The null basically says the running path doesn’t matter.  So we can randomly decide which time goes with the which path (Notice we don’t break our pairs.)  Each time we do this, compute a simulated difference in means.  We repeat this process many times to develop a null distribution. Subject narrow angle … wide angle … diff …

 A theory-based test works well when the sample size (the number of pairs) is at least 20.  Like comparing two means, a t-distribution is used to predict the null distribution.  The data used in this test are the differences, and this is the same test that is used for a single mean. (Except that in testing a single mean, we can compare the data to any hypothesized value, not just 0.)

Comparing Multiple Proportions

                      Single Vehicle   Lead Vehicle   Following Vehicle   Total
  Complete Stop         151 (85.8%)     38 (90.5%)       76 (77.6%)        265
  Not Complete Stop      25 (14.2%)      4 (9.5%)        22 (22.4%)         51
  Total                     176             42               98             316

Comparing Multiple Proportions  If there is no association between arrival pattern and whether or not a vehicle stops it basically means it doesn’t matter what the arrival pattern is. Some vehicles will stop no matter what the arrival pattern and some vehicles won’t.  We can model this by shuffling either the explanatory or response variables. (The applet will shuffle the response.) and recomputing the MAD statistic many times.

Comparing Multiple Proportions  Simulated values of the statistic for 1000 shuffles  P-value = 0.083

Comparing Multiple Proportions  Theory-based tests work well for multiple proportions if the number of successes and failures are at least 10. (Just like with all proportions.)  The MAD statistic is not used in theory- based, but the chi-squared statistic (and hence a chi-squared distribution) is.  This is test is called a chi-squared test of association.

Comparing Multiple Means  Null: There is no association between whether and when a picture was shown and comprehension of the passage (µ no picture = µ picture before = µ picture after)  Alternative: There is an association between whether and when a picture was shown and comprehension of the passage (At least one of the mean comprehension scores will be different.)

Comparing Multiple Means  The three group means were 3.21, 4.95, and 3.37.  MAD = (|3.21 − 4.95| + |3.21 − 3.37| + |4.95 − 3.37|)/3 = 1.16.
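The MAD calculation above is just the average of the three pairwise absolute differences, which a couple of lines can verify:

```python
means = [3.21, 4.95, 3.37]   # the three group means from the slide
mad = (abs(means[0] - means[1])
       + abs(means[0] - means[2])
       + abs(means[1] - means[2])) / 3
print(round(mad, 2))   # 1.16
```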

Comparing Multiple Means  Simulated values of the statistic for 5000 shuffles  P-value =

Comparing Multiple Means  Since we have a small p-value we can conclude at least one of the mean comprehension scores is different.  We can do pairwise confidence intervals to find which means are significantly different than the other means.

Comparing Multiple Means  Theory-based tests work well when we have at a sample size of least 20 in each group. (Like all tests with means.)  The MAD statistic is not used, but an F- statistic (and hence an F distribution) is.  Just like the MAD, the larger the F- statistic, the larger the strength of evidence (and hence a smaller p-value).  This test is called Analysis of Variance or ANOVA.

Correlation/Regression  Null: There is no association between heart rate and body temperature. (ρ = 0 or β = 0)  Alternative: There is a positive linear association between heart rate and body temperature. (ρ > 0 or β > 0)

Correlation/Regression  The data pair each subject's heart rate (HR) with body temperature (Tmp); the sample correlation is r = 0.378.

Correlation/Regression  If there is no association, we can break apart the temperatures and their corresponding heart rates by scrambling one of the variables. Just like we did in previous tests. We will do this by scrambling one of the variables.  After each scramble, we will compute the appropriate statistic, either correlation or the slope of the regression equation.  Repeat this many times to develop a null distribution.

Correlation/Regression  We found that 68/1000 times we had a simulated correlation greater than or equal to

Correlation/Regression  Theory-based test work well when the values of the response variable are normally distributed for each value of the explanatory variable and these normal distributions have similar variability.  We can use either the correlation or the slope of the regression line as the statistic.  A t-distribution, centered at 0, is used.

Review Confidence Intervals

 Tests of significance answer yes/no questions.  Is there strong evidence that Buzz is not just guessing?  Is there strong evidence that swimming with dolphins helps reduce depression symptoms?  Sometimes we might just want an estimate of a population parameter, e.g., what proportion of the voters will vote in the next election?

Confidence Intervals  Confidence intervals are interval estimates of a population parameter.  A population parameter is some fixed measurement for a population such as a proportion (or long-term probability), a difference in two proportions, a mean, a difference in means, or a slope of a regression equation.  These intervals give plausible (believable, credible) values for the parameter.

2SD Confidence Intervals  The observed statistic is used as the center of the interval.  We use 2 standard deviations of an appropriate null distribution as our margin of error, giving a 95% confidence interval: Observed statistic ± 2SD. Remember the observed statistic can be a single mean or proportion, a slope of a regression line, or a difference in two means or proportions.

Supersize Drinks  A survey found 46% of 1093 randomly selected NYC voters supported the ban on large soft drinks.  What is our estimate of the population proportion that supports the ban?  0.46 ± 2(0.015) or 0.46 ± 0.03  43% to 49%
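The interval above can be checked in a couple of lines; the standard error here is computed from the sample proportion, which reproduces the 0.015 the slide uses.

```python
from math import sqrt

phat, n = 0.46, 1093
se = sqrt(phat * (1 - phat) / n)        # about 0.015
lower, upper = phat - 2*se, phat + 2*se
print(round(lower, 2), round(upper, 2))  # 0.43 0.49
```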

Meaning of a confidence interval  What does 95% confidence mean?  If we resampled 1093 NYC voters over and over and each time produced 95% confidence intervals, 95% of the time we would capture the true proportion of all NYC voters that favor the ban.  The interval (like the observed proportion) is random. The population parameter is fixed.

Theory-based confidence intervals  Using theory-based techniques, confidence intervals can easily be found and the confidence levels can easily be adjusted.  The same validity conditions we use for tests of significance should also be used for confidence intervals.

What affects the width of a CI?  As the level of confidence increases, the width of the confidence interval increases. The wider the interval, the more confident we are that we captured the parameter. (The wider the net, the more confident we are that we caught the fish.)  As the sample size increases, the width of the confidence interval decreases. Larger sample sizes give us more information, so we can be more accurate.

Connecting confidence intervals and tests of significance  A parameter value inside a 95% confidence interval is one that a two-sided test would not reject at the 5% significance level; values outside the interval would be rejected.

Significance level and confidence level  A 95% confidence level corresponds to a 5% significance level for a two-sided test; in general, confidence level = 1 − α.

Review Big Ideas

Terminology  The population is the entire set of observational units we want to know something about.  The sample is the subgroup of the population on which we actually record data.  A statistic is a number calculated from the observed data.  A parameter is the same type of number as the statistic, but represents the underlying process or the population from which the sample was selected.

Terminology  Standard deviation (SD) is the most common measure of variability.  We can think of standard deviation as average distance values are from their mean.  A distribution is skewed to the right if the right side extends much farther than the left side.

Hypotheses and Null Distribution  The null hypothesis (H0) is the chance explanation. (=)  The alternative hypothesis (Ha) is what you are trying to show is true. (<, >, or ≠)  A null distribution is the distribution of simulated statistics that represents the chance outcome.

Significance and p-value  Results are statistically significant if they are unlikely to arise by random chance alone (that is, if the null hypothesis is true).  The p-value is the proportion of the simulated statistics in the null distribution that are at least as extreme as the value of the observed statistic.  The smaller the p-value, the stronger the evidence against the null.

Guidelines for evaluating strength of evidence from p-values  p-value >0.10, not much evidence against null hypothesis  0.05 < p-value < 0.10, moderate evidence against the null hypothesis  0.01 < p-value < 0.05, strong evidence against the null hypothesis  p-value < 0.01, very strong evidence against the null hypothesis

Three S Strategy  Statistic: Compute the statistic from the observed data.  Simulate: Identify a model that represents a chance explanation. Repeatedly simulate values of the statistic that could have happened when the chance model is true and form a distribution. (Null distribution)  Strength of evidence: Consider whether the value of the observed statistic is unlikely to occur when the chance model is true. (p-value)

Standardized statistics and 2-sided tests  The standardized statistic is the number of standard deviations the observed statistic is above (or below) the mean of the null distribution.  Two-sided tests increase the p-value (it approximately doubles in simulation-based tests and exactly doubles in theory-based tests).  Two-sided tests are said to be more conservative: more evidence is needed to conclude the alternative.
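A minimal sketch of the standardized statistic; the observed proportion, null mean, and null SD below are hypothetical values for illustration.

```python
def standardized_statistic(observed, null_mean, null_sd):
    """Number of SDs the observed statistic lies from the null mean."""
    return (observed - null_mean) / null_sd

# Hypothetical: observed proportion 0.60 against a null distribution
# centered at 0.50 with SD 0.05
z = standardized_statistic(0.60, 0.50, 0.05)
print(round(z, 2))   # 2.0
```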

Biased / Simple Random Sampling  A sampling method is biased if statistics from samples consistently over- or under-estimate the population parameter.  A simple random sample is the easiest way to ensure that your sample is unbiased. Taking SRSs allows us to infer our results to the population from which the sample was drawn.  Simple random sampling is a way of selecting members of a population so that every sample of a certain size from the population has the same chance of being chosen.

Types of Variables  When two variables are involved in a study, they are often classified as explanatory and response.  Explanatory variable (independent, predictor): the variable we think is "explaining" the change in the response variable.  Response variable (dependent): the variable we think is being impacted or changed by the explanatory variable.

Random assignment / Causation  Confounding variables are controlled in experiments due to the random assignment of subjects to treatment groups, since this tends to balance out all other variables between the groups.  Thus, cause-and-effect conclusions are possible in experiments through random assignment. (It must be a well-run experiment.)

Random vs. Random  With observational studies, random sampling is often done. This allows us to make inferences from the sample to the population from which the sample was drawn.  With experiments, random assignment is done. This allows us to conclude causation.

Overall Test  We use one overall test (a chi-squared test or ANOVA) when comparing more than two proportions or more than two means.  We do these overall tests because performing many individual tests increases the possibility of making a type I error (false positive, or false alarm).  If significance is found in an overall test, then we can follow up with individual tests or confidence intervals.

Correlation  Correlation measures the strength and direction of a linear association between two quantitative variables.  Correlation is a number between -1 and 1.  With positive correlation one variable increases as the other increases.  With negative correlation one variable decreases as the other increases.  The closer it is to either -1 or 1 the closer the points fit to a line.

Regression  The least-squares regression line is the most common way of getting a mathematical model (a linear equation) for an association between two quantitative variables.  The slope is the predicted change in the response variable for a one-unit change in the explanatory variable.
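The least-squares slope, intercept, and correlation can all be computed from the same sums of squares; a minimal sketch with a tiny made-up dataset:

```python
from math import sqrt

def least_squares(x, y):
    """Slope and intercept of the least-squares line, plus correlation r."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    sxy = sum((a - mx)*(b - my) for a, b in zip(x, y))
    sxx = sum((a - mx)**2 for a in x)
    syy = sum((b - my)**2 for b in y)
    slope = sxy / sxx                 # predicted change in y per unit of x
    intercept = my - slope * mx       # line passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)         # correlation, between -1 and 1
    return slope, intercept, r

# Tiny made-up dataset for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r = least_squares(x, y)
```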