STAT 312 Chapter 7 - Statistical Intervals Based on a Single Sample

Presentation transcript:

STAT 312
Chapter 7 - Statistical Intervals Based on a Single Sample
Chapter 8 - Tests of Hypotheses Based on a Single Sample
Chapter 9 - Inferences Based on Two Samples
Chapter 10 - Analysis of Variance
Chapter 11 - Multifactor Analysis of Variance
Chapter 12 - Simple Linear Regression and Correlation (see section 5.2)
Chapter 13 - Nonlinear and Multiple Regression
Chapter 14 - Goodness-of-Fit Tests and Categorical Data Analysis
Chapter 15 - Distribution-Free Procedures
Chapter 16 - Quality Control Methods


STAT 312
Chapter 7 - Statistical Intervals Based on a Single Sample
7.1 - Basic Properties of Confidence Intervals
7.2 - Large-Sample Confidence Intervals for a Population Mean and Proportion
7.3 - Intervals Based on a Normal Population Distribution
7.4 - Confidence Intervals for the Variance and Standard Deviation of a Normal Pop
Chapter 8 - Tests of Hypotheses Based on a Single Sample
8.1 - Hypotheses and Test Procedures
8.2 - Z-Tests for Hypotheses about a Population Mean
8.3 - The One-Sample T-Test
8.4 - Tests Concerning a Population Proportion
8.5 - Further Aspects of Hypothesis Testing



“Parameter Estimation” Example: One Mean. POPULATION: Women in U.S. who have given birth. “Random Variable” X = Age (years). Present: Assume that X follows a “normal distribution” in the population, with std dev σ = 1.5 yrs, but unknown mean μ = ? That is, X ~ N(μ, 1.5). Estimate the parameter value μ. Random Sample of size n = 400: {x1, x2, x3, x4, … , x400}. FORMULA: sample mean x̄ = (x1 + x2 + … + x400) / 400. This is referred to as a “point estimate” of μ from the sample. Improve this point estimate of μ to an “interval estimate” of μ, via the “Sampling Distribution of X̄.”

Population Distribution of X: X = Age of women in U.S. who have given birth, with mean μ and standard deviation σ = 1.5 yrs. Sampling Distribution of X̄: If X ~ N(μ, σ), then X̄ ~ N(μ, σ/√n) for any sample size n. The standard deviation σ/√n of X̄ is called the “standard error.” Here, s.e. = 1.5/√400 = .075 yrs.

Sampling Distribution of X̄, centered at μ, with “standard error” σ/√n. To obtain an “interval estimate” of μ we first ask the following general question: Suppose X̄ is any random sample mean. Find a “margin of error” (d) so that there is a 95% probability that the interval (X̄ – d, X̄ + d) contains μ.

Standardizing the sampling distribution of X̄ gives Z = (X̄ – μ) / (s.e.) ~ N(0, 1), the standard normal distribution, which has area 0.95 between –z.025 and +z.025 and area 0.025 in each tail. Hence d = (z.025)(s.e.) = (1.96)(.075 yrs) = 0.147 yrs.
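The margin-of-error arithmetic can be reproduced in a few lines of R. This is a minimal sketch using the slide's values (σ = 1.5 yrs, n = 400, 95% confidence); the variable names are illustrative, not part of the original slides.

```r
# Values taken from the slides; variable names are illustrative.
sigma <- 1.5    # known population standard deviation (yrs)
n     <- 400    # sample size
alpha <- 0.05   # 1 - alpha = 95% confidence

se     <- sigma / sqrt(n)        # standard error of the sample mean
z_crit <- qnorm(1 - alpha / 2)   # upper alpha/2 critical value of N(0, 1)
d      <- z_crit * se            # margin of error

print(round(c(se = se, z_crit = z_crit, d = d), 3))
```

Using `qnorm` rather than a hard-coded 1.96 means the same lines work for any other confidence level.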

IMPORTANT DEF’NS and FACTS. d = (z.025)(s.e.) = (1.96)(.075 yrs) = 0.147 yrs. d is called the “95% margin of error” and is equal to the product of the “.025 critical value” (i.e., z.025 = 1.96) times the “standard error” (i.e., σ/√n = .075 yrs). The “confidence level” is 95%. The “significance level” is 5%. For any random sample mean X̄, the “95% confidence interval” is (X̄ – d, X̄ + d). It contains μ with probability 95%. In this example, the 95% CI is (X̄ – 0.147, X̄ + 0.147). For instance, if a particular sample yields x̄ = 25.6 yrs, the 95% CI is (25.6 – 0.147, 25.6 + 0.147) = (25.453, 25.747) yrs. It contains μ with 95% “confidence.” [Figure: standard normal distribution N(0, 1), with area 0.95 between –z.025 and +z.025 and area 0.025 in each tail.]

IMPORTANT DEF’NS and FACTS (general case). d = (zα/2)(s.e.). d is called the “100(1 – α)% margin of error” and is equal to the product of the “α/2 critical value” (i.e., zα/2) times the “standard error” (i.e., σ/√n). The “confidence level” is 1 – α. The “significance level” is α. For any random sample mean X̄, the “100(1 – α)% confidence interval” is (X̄ – d, X̄ + d). It contains μ with probability 1 – α. The 95% case corresponds to α = .05, with α/2 = 0.025 in each tail and critical values ±z.025 = ±1.96. [Figure: standard normal distribution N(0, 1), with area 1 – α between –zα/2 and +zα/2 and area α/2 in each tail.]

What happens if we change α? Example: α = .10, 1 – α = .90, critical values ±1.645. Example: α = .05, 1 – α = .95, critical values ±1.96. Example: α = .01, 1 – α = .99, critical values ±2.575. Why not ask for α = 0, i.e., 1 – α = 1? Because then the critical values → ± ∞.
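The critical values for different choices of α can be looked up with R's `qnorm`. The following is a small sketch; the data frame layout is an illustrative choice. Note that `qnorm` returns about 2.576 for α = .01, which many tables round to 2.575.

```r
# Critical values z_{alpha/2} for several significance levels.
alphas <- c(0.10, 0.05, 0.01)
z_crit <- qnorm(1 - alphas / 2)

print(data.frame(alpha      = alphas,
                 confidence = 1 - alphas,
                 z_crit     = round(z_crit, 3)))

# Asking for 100% confidence (alpha = 0) pushes the critical value to infinity.
print(qnorm(1))
```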

IMPORTANT DEF’NS and FACTS. 95% margin of error = (z.025)(s.e.) = (1.96)(.075 yrs) = 0.147 yrs. In this example, the 95% CI is (X̄ – 0.147, X̄ + 0.147). For instance, if a particular sample yields x̄ = 25.6 yrs, the 95% CI is (25.6 – 0.147, 25.6 + 0.147) = (25.453, 25.747) yrs. It contains μ with 95% “confidence.” In principle, over the long run, the proportion of random intervals (one from each new random sample) that contain μ will approach 95%. BUT… in practice, only a single, fixed interval is generated from a single random sample, so technically, “probability” does not apply to it; this is why we speak of 95% “confidence” instead. NOW, let us introduce and test a specific hypothesis…
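The long-run statement can be checked by simulation. This is a sketch under stated assumptions: the "true" mean μ = 25.4 is chosen arbitrarily for the simulation, and sample means are drawn directly from their sampling distribution N(μ, σ/√n) rather than by averaging 400 simulated ages each time.

```r
set.seed(312)        # arbitrary seed, for reproducibility only

mu    <- 25.4        # assumed true mean (unknown in practice)
sigma <- 1.5
n     <- 400
reps  <- 10000       # number of simulated random samples

se <- sigma / sqrt(n)
d  <- qnorm(0.975) * se   # 95% margin of error

# One sample mean per simulated sample.
xbars <- rnorm(reps, mean = mu, sd = se)

# Does each interval (xbar - d, xbar + d) contain mu?
covers <- (xbars - d <= mu) & (mu <= xbars + d)

cat("Proportion of intervals containing mu:", mean(covers), "\n")
```

The printed proportion should land close to 0.95, although any single interval either contains μ or does not.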


Statistical Inference and Hypothesis Testing. Study Question: Has “age at first birth” of women in the U.S. changed over time? POPULATION: Women in U.S. who have given birth. “Random Variable” X = Age at first birth. Year 2010: Suppose we know that X follows a “normal distribution” (a.k.a. “bell curve”) in the population, with mean μ = 25.4 and standard deviation σ = 1.5. That is, X ~ N(25.4, 1.5). Present: Is μ = 25.4 still true? This is the “Null Hypothesis” H0: μ = 25.4. Or, given changes in public education, awareness programs, socioeconomic conditions, etc., is the “alternative hypothesis” HA: μ ≠ 25.4 true, i.e., either μ < 25.4 or μ > 25.4 (2-sided)? Random Sample {x1, x2, x3, x4, … , x400}. FORMULA: sample mean x̄ = (x1 + x2 + … + x400) / 400. Does the sample statistic x̄ tend to support H0, or refute H0 in favor of HA?

Objective: Hypothesis Testing… via Confidence Interval. We have now seen: the “point estimate” for μ is x̄ = 25.6, and the 95% CONFIDENCE INTERVAL FOR µ is (25.453, 25.747). BASED ON OUR SAMPLE DATA, the true value of μ today is between 25.453 and 25.747, with 95% “confidence.” FORMAL CONCLUSIONS: The 95% confidence interval corresponding to our sample mean does not contain the “null value” of the population mean, μ = 25.4. Based on our sample data, we may reject the null hypothesis H0: μ = 25.4 in favor of the two-sided alternative hypothesis HA: μ ≠ 25.4, at the α = .05 significance level. INTERPRETATION: According to the results of this study, there exists a statistically significant difference between the mean ages at first birth in 2010 (25.4 yrs) and today, at the 5% significance level. Moreover, the evidence from the sample data suggests that the population mean age today is older than in 2010, rather than younger. NOTE THAT THE CONFIDENCE INTERVAL ONLY DEPENDS ON THE SAMPLE, NOT A SPECIFIC NULL HYPOTHESIS!!!

Objective: Hypothesis Testing… via Confidence Interval. What if…? Suppose instead that the sample had yielded a “point estimate” for μ of x̄ = 25.2, so that the 95% CONFIDENCE INTERVAL FOR µ is (25.053, 25.347). BASED ON OUR SAMPLE DATA, the true value of μ today is between 25.053 and 25.347, with 95% “confidence.” FORMAL CONCLUSIONS: This 95% confidence interval also does not contain the “null value” of the population mean, μ = 25.4. Based on our sample data, we may reject the null hypothesis H0: μ = 25.4 in favor of the two-sided alternative hypothesis HA: μ ≠ 25.4, at the α = .05 significance level. INTERPRETATION: According to the results of this study, there exists a statistically significant difference between the mean ages at first birth in 2010 (25.4 yrs) and today, at the 5% significance level. In this case, however, the evidence from the sample data suggests that the population mean age today is younger than in 2010, rather than older. NOTE THAT THE CONFIDENCE INTERVAL ONLY DEPENDS ON THE SAMPLE, NOT A SPECIFIC NULL HYPOTHESIS!!!
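The confidence-interval version of the test can be sketched in R as follows. The helper `z_ci` is hypothetical (its name and arguments are not from the slides); it assumes a normal population with known σ, as in the example, and is applied to both sample means discussed above.

```r
# Hypothetical helper: two-sided z confidence interval for a mean,
# assuming the population standard deviation sigma is known.
z_ci <- function(xbar, sigma, n, alpha = 0.05) {
  d <- qnorm(1 - alpha / 2) * sigma / sqrt(n)   # margin of error
  c(lower = xbar - d, upper = xbar + d)
}

mu0 <- 25.4   # null value (2010 mean age at first birth)

for (xbar in c(25.6, 25.2)) {
  ci     <- z_ci(xbar, sigma = 1.5, n = 400)
  reject <- mu0 < ci[["lower"]] || mu0 > ci[["upper"]]
  cat(sprintf("xbar = %.1f: 95%% CI = (%.3f, %.3f); reject H0: %s\n",
              xbar, ci[["lower"]], ci[["upper"]], reject))
}
```

The interval is built from the sample alone; the null value enters only in the final comparison.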

Objective: Hypothesis Testing… via Acceptance Region. IF the null hypothesis H0: μ = 25.4 is indeed true, then the sampling distribution of X̄ (the “Null” Distribution) is centered at μ = 25.4 with “standard error” σ/√n = .075 yrs, and the 95% margin of error is (z.025)(s.e.) = (1.96)(.075 yrs) = 0.147 yrs. We would then expect a random sample mean to lie in the 95% ACCEPTANCE REGION FOR H0, (25.253, 25.547), with 95% probability, and outside it (0.025 in each tail) with 5% probability.

Objective: Hypothesis Testing… via Acceptance Region. We have now seen: the 95% ACCEPTANCE REGION FOR H0 is (25.253, 25.547). IF H0 is true, then we would expect a random sample mean to lie between 25.253 and 25.547, with 95% probability. Our data value x̄ = 25.6 lies in the 5% REJECTION REGION. FORMAL CONCLUSIONS: The 95% acceptance region for the null hypothesis does not contain the sample mean of x̄ = 25.6. Based on our sample data, we may reject the null hypothesis H0: μ = 25.4 in favor of the two-sided alternative hypothesis HA: μ ≠ 25.4, at the α = .05 significance level. INTERPRETATION: According to the results of this study, there exists a statistically significant difference between the mean ages at first birth in 2010 (25.4 yrs) and today, at the 5% significance level. Moreover, the evidence from the sample data suggests that the population mean age today is older than in 2010, rather than younger. NOTE THAT THE ACCEPTANCE REGION ONLY DEPENDS ON THE NULL HYPOTHESIS, NOT ON THE SAMPLE!!!
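The acceptance-region version of the test is the mirror image of the confidence-interval version: the interval is centered at the null value and the sample mean is checked against it. A minimal sketch in R, with illustrative variable names:

```r
mu0   <- 25.4   # null value
sigma <- 1.5
n     <- 400
alpha <- 0.05

d  <- qnorm(1 - alpha / 2) * sigma / sqrt(n)   # margin of error
ar <- c(lower = mu0 - d, upper = mu0 + d)      # acceptance region for H0

xbar  <- 25.6                                  # observed sample mean
in_ar <- xbar >= ar[["lower"]] && xbar <= ar[["upper"]]

print(round(ar, 3))
cat("Sample mean in acceptance region:", in_ar, "\n")   # FALSE means reject H0
```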

Relation between CI and AR: Approximately 95% of the sample mean values (Sample 1, Sample 2, Sample 3, Sample 4, Sample 5, etc…) are contained between μ – d and μ + d, where d is called the 95% margin of error.

But from the samples’ point of view… Approximately 95% of the intervals from x̄ – d to x̄ + d contain μ, and approx 5% do not. The two statements are equivalent: x̄ lies within d of μ exactly when μ lies within d of x̄.

Objective: Hypothesis Testing… via “p-value” - measures the strength of the rejection. IF the null hypothesis H0: μ = 25.4 is indeed true, then what is the probability of obtaining a random sample mean that is as, or more, extreme than the one actually obtained? i.e., 0.2 yrs OR MORE away from μ = 25.4, ON EITHER SIDE (since the alternative hypothesis is 2-sided)? The z-score is 0.2 / .075 = 2.6667 > 1.96, so the sample mean lies outside the 95% ACCEPTANCE REGION FOR H0, (25.253, 25.547). Any of the following R commands gives the area in one tail:
> 1-pnorm(2.6667)
> pnorm(-2.6667)
> pnorm(25.2, 25.4, .075)
[1] 0.003830001
The p-value is the sum of the two tail areas, 2(0.00383) = 0.00766 < .05: statistically significant.

Objective: Hypothesis Testing… via “p-value” - measures the strength of the rejection. In general, the 100(1 – α)% ACCEPTANCE REGION FOR H0 covers area 1 – α between –zα/2 and +zα/2, with area α/2 in each tail. If the z-score of the sample mean exceeds zα/2 in absolute value (here, 2.6667 > 1.96), then the p-value is less than α (here, 2(0.00383) = 0.00766 < .05), and the result is statistically significant.

Very informally, the p-value of a sample is the probability (hence a number between 0 and 1) that it “agrees” with the null hypothesis. More precisely, it is the probability, computed assuming that H0 is true, of obtaining a sample result as, or more, extreme than the one actually obtained. Hence a very small p-value indicates strong evidence against the null hypothesis. The smaller the p-value, the stronger the evidence, and the more “statistically significant” the finding (e.g., p < .0001).

If p-value < α (e.g., α = .05), then reject H0; significance! ... But interpret it correctly!

~ Summary of Hypothesis Testing for One Mean ~ Assume the population random variable is normally distributed, i.e., X ~ N(μ, σ). NULL HYPOTHESIS H0: μ = μ0 (“null value”). ALTERNATIVE HYPOTHESIS HA: μ ≠ μ0, i.e., either μ < μ0 or μ > μ0 (“two-sided”). Test null hypothesis at significance level α. CONFIDENCE INTERVAL: Compute the sample mean x̄. Compute the 100(1 – α)% “margin of error” = (critical value)(standard error) = (zα/2)(σ/√n). Then the 100(1 – α)% CI = (x̄ – margin of error, x̄ + margin of error). Formal Conclusion: Reject null hypothesis at level α if CI does not contain μ0. Statistical significance! Otherwise, retain it.

~ Summary of Hypothesis Testing for One Mean ~ Assume the population random variable is normally distributed, i.e., X ~ N(μ, σ). NULL HYPOTHESIS H0: μ = μ0 (“null value”). ALTERNATIVE HYPOTHESIS HA: μ ≠ μ0, i.e., either μ < μ0 or μ > μ0 (“two-sided”). Test null hypothesis at significance level α. ACCEPTANCE REGION: Compute the sample mean x̄. Compute the 100(1 – α)% “margin of error” = (critical value)(standard error) = (zα/2)(σ/√n). Then the 100(1 – α)% AR = (μ0 – margin of error, μ0 + margin of error). Formal Conclusion: Reject null hypothesis at level α if AR does not contain x̄. Statistical significance! Otherwise, retain it.

~ Summary of Hypothesis Testing for One Mean ~ Assume the population random variable is normally distributed, i.e., X ~ N(μ, σ). NULL HYPOTHESIS H0: μ = μ0 (“null value”). ALTERNATIVE HYPOTHESIS HA: μ ≠ μ0, i.e., either μ < μ0 or μ > μ0 (“two-sided”). Test null hypothesis at significance level α. p-value: Compute the sample mean x̄. Compute the z-score = (x̄ – μ0) / (σ/√n), where Z ~ N(0, 1). If the z-score is +, then the p-value = 2 P(Z ≥ z-score). If the z-score is –, then the p-value = 2 P(Z ≤ z-score). Formal Conclusion: Reject null hypothesis if p < α. Statistical significance! Otherwise, retain it. Remember: “The smaller the p-value, the stronger the rejection, and the more statistically significant the result.”

Objective: Hypothesis Testing… 1-sided tests. The alternative hypothesis usually reflects the investigator’s belief!

2-sided test: H0: μ = 25.4, HA: μ ≠ 25.4. In this case, α = .05 is split evenly between the two tails, left and right (0.025 each), and the 95% ACCEPTANCE REGION FOR H0 is (25.253, 25.547).

1-sided tests: “Right-tailed”: H0: μ ≤ 25.4, HA: μ > 25.4. Here, all of α = .05 is in the right tail. “Left-tailed”: H0: μ ≥ 25.4, HA: μ < 25.4. Here, all of α = .05 is in the left tail.

For the sample mean 25.6, the right-tailed p-value is .00383 < .05: statistically significant. For a sample mean of 25.2, the 2-sided p-value is also < .05 (statistically significant), but the right-tailed p-value is >> .05: “strong support of null hypothesis.” Use 1-sided tests sparingly!

STATBOT 312. Subject: basic calculation of p-values for z-test. Calculate, from H0, the Test Statistic “z-score” = (x̄ – μ0) / (σ/√n), and look up its table entry, i.e., the cumulative probability P(Z ≤ z-score). Then:
HA: μ < μ0: p-value = table entry.
HA: μ > μ0: p-value = 1 – table entry.
HA: μ ≠ μ0: check the sign of the z-score. If –, p-value = 2 × table entry. If +, p-value = 2 × (1 – table entry).
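The flowchart translates almost line for line into R, with `pnorm` playing the role of the table. The function `z_test_p` below is a hypothetical helper written for illustration (its name and the `alternative` labels are assumptions, not part of the slides); it assumes a normal population with known σ.

```r
# Hypothetical helper: p-value of a one-sample z-test with known sigma.
z_test_p <- function(xbar, mu0, sigma, n,
                     alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z           <- (xbar - mu0) / (sigma / sqrt(n))   # test statistic ("z-score")
  table_entry <- pnorm(z)                           # P(Z <= z)
  p <- switch(alternative,
    less      = table_entry,
    greater   = 1 - table_entry,
    two.sided = if (z < 0) 2 * table_entry else 2 * (1 - table_entry)
  )
  c(z = z, p_value = p)
}

# Sample mean 25.6 against the null value 25.4.
print(round(z_test_p(25.6, 25.4, 1.5, 400), 5))
print(round(z_test_p(25.6, 25.4, 1.5, 400, alternative = "greater"), 5))

# Sample mean 25.2 with a right-tailed alternative: p-value far above .05.
print(round(z_test_p(25.2, 25.4, 1.5, 400, alternative = "greater"), 5))
```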

Loose Ends… Normality check? σ unknown? n = ?

Given: X ~ N(μ, σ), a normally-distributed population random variable, with unknown mean, but known standard deviation (see Ch. 4.6 (311)). H0: μ = μ0, Null Hypothesis. HA: μ ≠ μ0, Alternative Hypothesis (2-sided). α: significance level (or equivalently, confidence level 1 – α). n: sample size. From this, we obtain the “standard error” s.e. = σ/√n and, from the data x1, x2,…, xn, the sample mean x̄, with which to test the null hypothesis (via CI, AR, p-value).

In practice however, it is far more common that the true population standard deviation σ is unknown. So we must estimate it from the sample! Recall that the sample standard deviation is s = √[Σ(xi – x̄)² / (n – 1)], so that s.e. ≈ s/√n (estimate). This introduces additional variability from one sample to another… PROBLEM??? Not if n is “large”… say, ≥ 30. But what if n < 30? T-test!
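When σ is unknown, the estimate s and the resulting standard error can be computed directly from the data. The sketch below uses a simulated small sample (n = 20, illustration only, not data from the slides) and compares the normal critical value with the t critical value on n – 1 degrees of freedom, which is the standard ingredient of the one-sample t procedure.

```r
set.seed(1)                              # arbitrary seed
x <- rnorm(20, mean = 25.4, sd = 1.5)    # simulated small sample

n      <- length(x)
xbar   <- mean(x)
s      <- sd(x)            # sample standard deviation (divisor n - 1)
se_hat <- s / sqrt(n)      # estimated standard error

z_crit <- qnorm(0.975)             # critical value if sigma were known
t_crit <- qt(0.975, df = n - 1)    # wider critical value, reflecting the
                                   # extra variability from estimating sigma

print(round(c(xbar = xbar, s = s, se_hat = se_hat,
              z_crit = z_crit, t_crit = t_crit), 3))

# The t critical value approaches the z critical value as n grows.
print(round(qt(0.975, df = c(9, 29, 399)), 3))
```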

How do we check that this assumption (normality) is reasonable, when all we have is a sample? And what do we do if it’s not, or we can’t tell? IF our data approximates a bell curve, then its quantiles should “line up” with those of N(0, 1). Since X = μ + σZ with Z ~ N(0, 1), X is a linear function of Z, so a plot of the sample quantiles against the N(0, 1) quantiles should look approximately like a straight line. This plot is called a Q-Q plot, Normal scores plot, or Normal probability plot. In R:
qqnorm(mysample)
qqline(mysample)
(R uses a slight variation to generate quantiles…) Formal statistical tests exist; see notes. Method can be extended to other models.
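The Q-Q idea can also be summarized numerically. The sketch below uses simulated stand-in data for `mysample`, measures how straight the Q-Q plot is via the correlation between sample and theoretical quantiles, and runs the Shapiro-Wilk test as one example of a formal test. The choice of Shapiro-Wilk is an assumption here; the slides do not say which tests the notes cover.

```r
set.seed(42)                                     # arbitrary seed
mysample <- rnorm(400, mean = 25.4, sd = 1.5)    # simulated stand-in data

# Q-Q plot coordinates without drawing: x = N(0, 1) quantiles, y = sample quantiles.
qq <- qqnorm(mysample, plot.it = FALSE)

# A near-straight Q-Q plot gives a correlation close to 1.
r <- cor(qq$x, qq$y)

# One formal test of normality (valid for sample sizes from 3 to 5000).
sw <- shapiro.test(mysample)

cat("Q-Q correlation:", round(r, 4), "\n")
cat("Shapiro-Wilk p-value:", round(sw$p.value, 4), "\n")
```

To draw the plot itself, call `qqnorm(mysample)` followed by `qqline(mysample)` as on the slide.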

How do we check that this assumption is reasonable, when all we have is a sample? And what do we do if it’s not, or we can’t tell? Use a mathematical “transformation” of the data (e.g., log, square root,…). If Y = log(X) is normally distributed, then X is said to be “log-normal.” Example in R, comparing the histograms and Q-Q plots of a skewed sample x and of its logarithm y:
x = rchisq(1000, 15)
hist(x)
y = log(x)
hist(y)
qqnorm(x, pch = 19, cex = .5)
qqline(x)
qqnorm(y, pch = 19, cex = .5)
qqline(y)
A second example shown on the slide is the “Cauchy distribution.” So then what????

“Statistical Inference” via… “Hypothesis Testing.” Study Question: Has “Mean (i.e., average) Age at First Birth” of women in the U.S. changed since 2010 (25.4 yrs old)? POPULATION, Present Day: Assume that X = “Age at First Birth” follows a normal distribution (i.e., “bell curve”) in the population. H0: pop mean age μ = 25.4 (i.e., no change since 2010). Random Sample of size n = 400 ages: x1, x2, x3, x4, x5, … etc… x400. The reasonableness of the normality assumption is empirically verifiable, and in fact formally testable from the sample data. If violated (e.g., skewed) or inconclusive (e.g., small sample size), then “distribution-free” nonparametric tests should be used instead… Examples: Sign Test, Wilcoxon Signed Rank Test. (The Mann-Whitney U Test is the same as the Wilcoxon Rank Sum Test for two independent samples, not the Signed Rank Test.)
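Both nonparametric tests named above are available in base R. The following is a sketch on simulated data (illustration only, not data from the slides): the sign test is carried out with `binom.test`, since under H0 the number of observations above the null value is Binomial(n, 1/2), and the signed rank test with `wilcox.test`. Note that these tests concern the population median (or center of symmetry) rather than the mean.

```r
set.seed(7)                              # arbitrary seed
x   <- rnorm(25, mean = 25.6, sd = 1.5)  # simulated small sample
mu0 <- 25.4                              # null value

# Sign test: count observations above mu0, ignoring any exactly equal to it.
n_above <- sum(x > mu0)
n_used  <- sum(x != mu0)
sign_p  <- binom.test(n_above, n_used, p = 0.5)$p.value

# Wilcoxon signed rank test (one sample, two-sided by default).
wilcox_p <- wilcox.test(x, mu = mu0)$p.value

cat("Sign test p-value:", round(sign_p, 4), "\n")
cat("Wilcoxon signed rank p-value:", round(wilcox_p, 4), "\n")
```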

“Statistical Inference” via… “Hypothesis Testing.” Study Question: Has “Mean (i.e., average) Age at First Birth” of women in the U.S. changed since 2010 (25.4 yrs old)? POPULATION, Present Day: Assume that X = “Age at First Birth” follows a normal distribution (i.e., “bell curve”) in the population. H0: pop mean age μ = 25.4 (i.e., no change since 2010). Random Sample of size n = 400 ages: x1, x2, x3, x4, x5, … etc… x400. Sample size n partially depends on the power of the test, i.e., the desired probability of correctly rejecting a false null hypothesis (80% or more). Coming up next!