Announcements
- On Thursday, John will provide information about the project.
- Next homework: John will announce it in class on Thursday, and it'll be due the following Thursday.
- Please hand in this week's homework before you leave.

Hypothesis Testing: 20,000-Foot View
1. Set up the hypothesis to test, H0, and collect data.
2. Assuming that the hypothesis is true, are the observed data likely? Data are deemed "unlikely" if the test statistic is in the extreme tail of its distribution when H0 is true.
3. If not, then the alternative to the hypothesis, HA, must be true.
4. The p-value describes how likely the observed data are assuming H0 is true (i.e., the answer to question 2 above). "Unlikely" means p-value < α (smaller p-value = less likely).

Large Sample Test for a Proportion: Taste Test Data
33 people drink two unlabeled cups of cola (one is Coke and one is Pepsi).
p̂ = proportion who correctly identify the drinks = 20/33 ≈ 61%
Question: is this statistically significantly different from 50% (random guessing) at α = 10%?

Large Sample Test for a Proportion: Taste Test Data
H0: p = 0.5
HA: p ≠ 0.5
Test statistic: z = |(p̂ − .5)/sqrt(p̂(1 − p̂)/n)| = |(.61 − .5)/sqrt(.61·.39/33)| = 1.25
Reject if z > z_{0.10/2} = 1.645. It isn't, so there's not enough evidence to reject H0.
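Not on the original slides, but here's a minimal Python sketch of that calculation (scipy assumed):

from math import sqrt
from scipy.stats import norm

n, successes = 33, 20
p_hat = successes / n                 # about 0.61
p0, alpha = 0.5, 0.10

# test statistic, with the sample proportion in the standard error
z = abs((p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n))   # about 1.25
z_crit = norm.ppf(1 - alpha / 2)                        # z_0.05 = 1.645
print(z > z_crit)                                       # False: fail to reject H0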

Large Sample Test for a Proportion: Taste Test Data
P-value = Pr( |(p̂ − p0)/sqrt(p̂(1 − p̂)/n)| > |observed z| when H0 is true )
= Pr( |(p̂ − 0.5)/sqrt(p̂(1 − p̂)/n)| > 1.25 when H0 is true )
= 2·Pr( Z > 1.25 ) where Z ~ N(0,1)
= 21%
i.e., "How likely is a test statistic as extreme as 1.25 when the true p = 50%?"
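The same tail probability in one line of Python (a sketch of the lookup, not from the slides):

from scipy.stats import norm
p_value = 2 * norm.sf(1.25)   # norm.sf(z) = Pr(Z > z); gives about 0.21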

An alternative form of the test
The test statistic is: z = |(p̂ − .5)/sqrt(.5(1 − .5)/n)| = |(.61 − .5)/sqrt(.25/33)| = 1.22
Note that since .25 ≥ p(1 − p) for any p, this is more conservative (larger denominator = smaller test statistic). Either way is fine.
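The conservative version only changes the standard error (a self-contained sketch):

from math import sqrt
from scipy.stats import norm

z = abs((20/33 - 0.5) / sqrt(0.5 * 0.5 / 33))   # about 1.22
p_value = 2 * norm.sf(z)                        # about 0.22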

Difference between two means
PCB Data
– Sample 1: treatment is to expose cells to a certain PCB (number 156)
– Sample 2: treatment is to expose the cells to PCB 156 and another chemical compound (estradiol)
Response = estrogen produced by cells
Question: Can we conclude that the average estrogen produced in sample 1 is different from the average in sample 2 (at α = 0.05)?

Form of the test
H0: μ1 − μ2 = 0
HA: μ1 − μ2 ≠ 0
Test statistic: |(Estimate − value under H0)/Std Dev(Estimate)|
z = (x̄1 − x̄2)/sqrt(s1²/n1 + s2²/n2)
Reject if |z| > z_{α/2}
P-value = 2·Pr[ Z > |(x̄1 − x̄2)/sqrt(s1²/n1 + s2²/n2)| ] where Z ~ N(0,1).
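As a sketch, the whole recipe fits in one small Python function (a hypothetical helper, not from the slides; the call at the end uses made-up summary statistics):

from math import sqrt
from scipy.stats import norm

def two_sample_z(xbar1, s1, n1, xbar2, s2, n2):
    """Large-sample two-sided z test of H0: mu1 - mu2 = 0."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)   # std dev of the estimate xbar1 - xbar2
    z = (xbar1 - xbar2) / se
    p_value = 2 * norm.sf(abs(z))        # Pr(|Z| > |z|) under H0
    return z, p_value

print(two_sample_z(1.50, 0.80, 50, 1.73, 0.90, 40))   # made-up numbers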

Data
[Summary table of n, x̄, and s for the PCB156 and PCB156+E samples; most values did not survive this transcript, but n2 = 64 and x̄1 − x̄2 = −0.229.]
|z| = |−0.229/sqrt(s1²/n1 + s2²/n2)| = |−1.41| = 1.41
z_{α/2} = z_{0.05/2} = z_{0.025} = 1.96
So don't reject.
P-value = 2·Pr(Z > 1.41) = 16% = Pr( |test statistic| > 1.41 when H0 is true )

In General, Large Sample 2-sided Tests
Test statistic: |z| = |(Estimate − assumed value under H0)/(Std Dev of the Estimator)|
(Note that the Std Dev of the estimator is the standard deviation of the individual data points that go into the estimator, divided by the square root of n.)
Reject if |z| > z_{α/2}
P-value = 2·Pr( Z > |z| ) where Z ~ N(0,1).
In the previous example, what was the estimator? What was its standard error?
Note the similarity to a confidence interval for the difference between two means.

Large Sample Hypothesis Tests: summary for means

Single mean
Hypotheses: H0: μ = k vs. HA: μ ≠ k
Test (level 0.05): reject H0 if |(x̄ − k)/(s/sqrt(n))| > 1.96
p-value: 2·Pr( Z > |(x̄ − k)/(s/sqrt(n))| ) where Z ~ N(0,1)

Difference between two means
Hypotheses: H0: μ1 − μ2 = D vs. HA: μ1 − μ2 ≠ D
Test (level 0.05): let d = x̄1 − x̄2 and SE = sqrt(s1²/n1 + s2²/n2)
Reject H0 if |(d − D)/SE| > 1.96
p-value: 2·Pr( Z > |(d − D)/SE| ) where Z ~ N(0,1)

Large Sample Hypothesis Tests: summary for proportions

Single proportion
Hypotheses: H0: true p = k vs. HA: p ≠ k
Test (level 0.05): reject H0 if |(p̂ − k)/sqrt(p̂(1 − p̂)/n)| > 1.96
p-value: 2·Pr( Z > |(p̂ − k)/sqrt(p̂(1 − p̂)/n)| ) where Z ~ N(0,1)

Difference between two proportions
Hypotheses: H0: p1 − p2 = 0 vs. HA: p1 − p2 ≠ 0
Test (level 0.05): let d = p̂1 − p̂2, let pooled p̂ = total "successes"/(n1 + n2), and let SE = sqrt(p̂(1 − p̂)/n1 + p̂(1 − p̂)/n2)
Reject H0 if |d/SE| > 1.96
p-value: 2·Pr( Z > |d/SE| ) where Z ~ N(0,1)
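A sketch of the two-proportion recipe above (the pooled-SE version; the counts in the example call are made up):

from math import sqrt
from scipy.stats import norm

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample two-sided z test of H0: p1 - p2 = 0, pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)      # total "successes" / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))       # test statistic and p-value

print(two_proportion_z(40, 100, 30, 100))   # hypothetical counts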

Hypothesis tests versus confidence intervals
The following is stated for tests / CIs for a single mean, but it's true for all the confidence intervals / tests we have done.
A two-sided level α hypothesis test, H0: μ = k vs. HA: μ ≠ k, is rejected if and only if k is not in a 1−α confidence interval for the mean.
A one-sided level α hypothesis test, H0: μ ≥ k vs. HA: μ < k, is rejected if and only if a level 1−2α confidence interval is completely to the left of k.
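A quick numerical illustration of the two-sided duality (the summary statistics are made up; the two decisions always agree):

from math import sqrt
from scipy.stats import norm

xbar, s, n, k, alpha = 105.0, 20.0, 50, 100.0, 0.05   # hypothetical data
se = s / sqrt(n)
z_crit = norm.ppf(1 - alpha / 2)

ci = (xbar - z_crit * se, xbar + z_crit * se)   # 1 - alpha CI for the mean
reject_test = abs((xbar - k) / se) > z_crit     # two-sided level-alpha test
reject_ci = not (ci[0] <= k <= ci[1])           # is k outside the CI?
print(reject_test == reject_ci)                 # True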

Hypothesis tests versus confidence intervals
The previous slide said that confidence intervals can be used to do hypothesis tests. CIs are "better" since they contain more information.
Fact: hypothesis tests and p-values are very commonly used by scientists who use statistics.
Advice:
1. Use confidence intervals to do hypothesis testing.
2. Know how to compute and interpret p-values.

Project and Exam 2
Work in groups (preferably 2 or 3 people).
Collect and analyze data.
– The analysis should include summary statistics and good graphical displays.
– It should also include at least one of the following: confidence intervals, hypothesis tests, analysis of variance, regression, contingency tables. In many cases power calculations will be required too (I'll tell you after reading your proposals).
– Write-up: a general description of what you are doing and the questions the statistics will address, then the statistics and interpretation (< 6 pages).
A one-page proposal (who and what) is due this Thursday. The project is due May 13th.
Exam 2 is optional. If you want to take it, contact me to schedule a time.

We talked about Type 1 and Type 2 errors generally before; now we'll compute some of the associated probabilities.

                      Truth
Action                H0 True         HA True
Fail to reject H0     correct         Type 2 error
Reject H0             Type 1 error    correct

Significance level = α = Pr(making a Type 1 error)
Power = 1 − Pr(making a Type 2 error)

Example: Dietary Folate
Data from the Framingham Heart Study: n = 333 elderly men, with sample mean x̄ and standard deviation s = 193.4.
Can we conclude that the mean is greater than 300 at 5% significance? (same as 95% confidence)

In terms of our folate example, suppose we repeated the experiment and sampled 333 new people.
Pr( Type 1 error ) = Pr( reject H0 when the mean is 300 )
= Pr( |Z| > 1.96 )
= Pr( Z > 1.96 ) + Pr( Z < −1.96 )
= 0.05 = α
When the mean is 300, Z, the test statistic, has a standard normal distribution. Note that the test is designed to have Type 1 error = α.
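You can check this by simulation: generate data with mean exactly 300 and count how often the test rejects. A sketch, assuming normal data with the slide's s playing the role of the true σ:

import numpy as np

rng = np.random.default_rng(0)
n, mu0, sigma, reps = 333, 300.0, 193.4, 10_000

x = rng.normal(mu0, sigma, size=(reps, n))    # data generated with H0 true
z = (x.mean(axis=1) - mu0) / (x.std(axis=1, ddof=1) / np.sqrt(n))
print((np.abs(z) > 1.96).mean())              # close to alpha = 0.05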

What's the power to detect an increase of at least 10 units of folate in a new experiment? (using a 2-sided test for a single mean)
Power = Pr( reject H0 when the mean is not 300 )
= Pr( reject H0 when the mean is 310 )
= Pr( |(X̄ − 300)/(193.4/sqrt(333))| > 1.96 )
= Pr( (X̄ − 300)/10.6 > 1.96 ) + Pr( (X̄ − 300)/10.6 < −1.96 )
= Pr( X̄ > 320.8 ) + Pr( X̄ < 279.2 )
= Pr( (X̄ − 310)/10.6 > (320.8 − 310)/10.6 ) + Pr( (X̄ − 310)/10.6 < (279.2 − 310)/10.6 )
= Pr( Z > 1.02 ) + Pr( Z < −2.91 ) where Z ~ N(0,1)
≈ 0.15
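The same arithmetic in Python (a sketch; it prints roughly 0.16, which matches the slide's 0.15 up to rounding of intermediate values):

from math import sqrt
from scipy.stats import norm

mu0, mu_true, s, n = 300.0, 310.0, 193.4, 333
se = s / sqrt(n)                # about 10.6
z_crit = norm.ppf(0.975)        # 1.96

# reject when xbar > mu0 + 1.96*se or xbar < mu0 - 1.96*se;
# evaluate both tail probabilities under the true mean of 310
power = (norm.sf((mu0 + z_crit * se - mu_true) / se)
         + norm.cdf((mu0 - z_crit * se - mu_true) / se))
print(power)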

In other words, if the true mean is 310 and the standard error is 10.6, then there's an 85% chance that we will not detect the difference in a new experiment at the 5% level. If 310 is a scientifically significant difference from 300, this means our experiment is likely to be wasted.

If all else is held constant:
– As n increases, power goes up.
– As the standard deviation of x decreases, power goes up.
– As α increases, power goes up.

Picture for Power
[Figure: power as a function of the true mean, for n = 333 and α = 0.05 — "Pr(reject H0 when it's false)".]
As n increases and/or α increases and/or the std dev decreases, these curves become steeper.

Power calculations are an integral part of planning any experiment.
Given:
– a certain level of α
– a preliminary estimate of the std dev (of the x's that go into x̄)
– the difference that is of interest
compute the required n in order for the power to be at least 85% (or some other percentage...).
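A sketch of that calculation for a single mean, using the standard large-sample approximation n ≈ ((z_{α/2} + z_power)·σ/Δ)², which ignores the tiny far-tail term:

from math import ceil
from scipy.stats import norm

def required_n(diff, sd, alpha=0.05, power=0.85):
    """Approximate n for a two-sided z test of one mean to detect `diff`."""
    z_a = norm.ppf(1 - alpha / 2)   # critical value of the test
    z_b = norm.ppf(power)           # quantile corresponding to target power
    return ceil(((z_a + z_b) * sd / diff) ** 2)

print(required_n(diff=10, sd=193.4))   # folate example: several thousand people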

"Post Hoc Power"
Note that it is nonsense to do a power calculation after you have collected the data and conclude that you "would've seen significant results if you'd had a larger sample size (or lower standard deviation, etc.)". The fallacy is assuming you would get the same x̄ if you collected more data. After the experiment, you only know that x̄ is likely to be in its confidence interval; you do not know where!

Power calculations are an integral part of planning any experiment:
Bad news: algebraically messy (but you should know how to do them).
Good news: Minitab can be used to do them: Menu: Stat: Power and Sample Size…
– Inputs: 1. required power 2. difference of interest
– Output: required sample size
– Options: change α, one-sided versus two-sided tests

Inference from Small Samples (Chapter 10)
Data from a manufacturer of children's pajamas.
We want to develop materials that take longer before they burn.
Run an experiment to compare four types of fabrics. (They considered other factors too, but we'll only consider the fabrics. Source: Matt Wand)

Data: Tried to light 4 samples of each of 4 different (unoccupied!) pajama fabrics on fire. A higher number means less flammable.
[Figure: burn time by fabric]
Fabric 1: mean = 16.85, std dev = 0.940
Fabric 2: mean = 10.95, std dev = 1.237
Fabric 3: mean = 10.50, std dev = 1.137
Fabric 4: mean = 11.00, std dev = 1.299

Confidence Intervals?
Suppose we want to make confidence intervals of the mean "burn time" for each fabric type.
Can I use x̄ +/- z_{α/2}·s/sqrt(n) for each one? Why or why not?

Answer: No — the sample size (n = 4) is too small to justify the central-limit-theorem-based normal approximation. More precisely:
– If the x_i are normal, then (x̄ − μ)/[σ/sqrt(n)] is normal for any n.
– Whether or not the x_i are normal, (x̄ − μ)/[s/sqrt(n)] is approximately normal for n > 30.
– New: suppose the x_i are approximately normal (and an independent sample). Then (x̄ − μ)/[s/sqrt(n)] ~ t_{n−1}.
t_{n−1} is the "t distribution" with n − 1 degrees of freedom (df); the parameter is (number of data points used to estimate s) − 1.

"Student" t-distribution (like a normal distribution, but with "heavier tails")
[Figure: density of the t distribution with 3 df vs. the normal distribution]
As df increases, t_{n−1} becomes the normal distribution; the two are indistinguishable for n > 30 or so.
Idea: estimating the std dev leads to "more variability". More variability = higher chance of an "extreme" observation.
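You can see the heavier tails directly in the critical values (a quick check with scipy, not from the slides):

from scipy.stats import norm, t

for df in (3, 10, 30, 100):
    print(df, round(t.ppf(0.975, df), 3))     # 3.182, 2.228, 2.042, 1.984
print("normal:", round(norm.ppf(0.975), 3))   # 1.96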

t-based confidence intervals
A 1−α level confidence interval for a mean: x̄ +/- t_{α/2,n−1}·s/sqrt(n), where t_{α/2,n−1} is the number such that Pr(T > t_{α/2,n−1}) = α/2 and T ~ t_{n−1} (see the table opposite the normal table inside the book cover…).

Back to the burn time example

Fabric    x̄        s        t_{0.025,3}    95% CI
1         16.85    0.940    3.182          (15.35, 18.35)
2         10.95    1.237    3.182          (8.98, 12.91)
3         10.50    1.137    3.182          (8.69, 12.31)
4         11.00    1.299    3.182          (8.93, 13.07)
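A sketch that reproduces this table from the summary statistics:

from math import sqrt
from scipy.stats import t

n = 4
fabrics = {1: (16.85, 0.940), 2: (10.95, 1.237), 3: (10.50, 1.137), 4: (11.00, 1.299)}
t_crit = t.ppf(0.975, n - 1)                  # t_{0.025,3} = 3.182

for fabric, (xbar, s) in fabrics.items():
    half = t_crit * s / sqrt(n)               # half-width of the 95% CI
    print(f"Fabric {fabric}: ({xbar - half:.2f}, {xbar + half:.2f})")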

t-based hypothesis test for a single mean
Mechanics: replace the z_{α/2} cutoff with t_{α/2,n−1}.
Example: fabric 1 burn time data
H0: mean is 15; HA: mean isn't 15
Test stat: |(16.85 − 15)/(0.94/sqrt(4))| = 3.94
Reject at α = 5% since 3.94 > t_{0.025,3} = 3.182
P-value = 2·Pr(T > 3.94) where T ~ t_3. This is between 2% and 5%, since t_{0.025,3} = 3.182 and t_{0.01,3} = 4.541 (p-value = 2·0.0146 ≈ 0.029 from software).
See Minitab: Basic Statistics: 1-sample t.
Idea: t-based tests are harder to pass than large-sample normal-based tests. Why does that make sense?
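The fabric 1 test, computed from its summary statistics (a sketch; with the raw burn times, scipy.stats.ttest_1samp would give the same answer):

from math import sqrt
from scipy.stats import t

xbar, s, n, mu0 = 16.85, 0.94, 4, 15.0
t_stat = abs(xbar - mu0) / (s / sqrt(n))   # about 3.94
p_value = 2 * t.sf(t_stat, n - 1)          # about 0.029
print(t_stat, p_value)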