HW: Project proposal due next Thursday. See web for more detail.

Inference from Small Samples (Chapter 10) Data from a manufacturer of children's pajamas. They want to develop materials that take longer before they burn. Run an experiment to compare four types of fabrics. (They considered other factors too, but we'll only consider the fabrics. Source: Matt Wand)

Fabric Data: [Plot: Burn Time by Fabric] Tried to light 4 samples of 4 different (unoccupied!) pajama fabrics on fire. A higher number means less flammable.
Fabric 1: mean = 16.85, std dev = 0.94
Fabric 2: mean = 10.95, std dev = 1.237
Fabric 3: mean = 10.50, std dev = 1.137
Fabric 4: mean = 11.00, std dev = 1.299

Confidence Intervals? Suppose we want to make confidence intervals of the mean "burn time" for each fabric type. Can I use x̄ +/- z_{α/2} s/sqrt(n) for each one? Why or why not?

Answer: The sample size (n = 4) is too small to justify a central limit theorem based normal approximation. More precisely:
– If x_i is normal, then (x̄ − μ)/[σ/sqrt(n)] is normal for any n.
– If x_i is normal, then (x̄ − μ)/[s/sqrt(n)] is approximately normal for n > 30.
– New: Suppose x_i is approximately normal (and an independent sample). Then (x̄ − μ)/[s/sqrt(n)] ~ t_{n−1}, where t_{n−1} is the "t distribution" with n − 1 degrees of freedom (df), and n − 1 = (number of data points used to estimate x̄) − 1.

What are degrees of freedom? Think of them as a parameter: the t-distribution has one parameter (df), while the normal distribution has 2 parameters (mean and variance).

"Student" t-distribution (like a normal distribution, but with "heavier tails")
[Plot: t distribution with 3 df vs. the normal distribution]
As df increases, t_{n−1} approaches the normal distribution; they are indistinguishable for n > 30 or so. Idea: estimating the std dev leads to "more variability", and more variability means a higher chance of an "extreme" observation.
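The heavier tails show up directly in the cutoff values. A quick sketch in Python (scipy assumed available; the course itself uses Minitab):

```python
from scipy import stats

# 97.5th percentile cutoffs: the t cutoff exceeds the normal one because
# estimating the std dev adds variability; they converge as df grows.
z = stats.norm.ppf(0.975)        # ~1.96
t3 = stats.t.ppf(0.975, df=3)    # ~3.182 (n = 4)
t30 = stats.t.ppf(0.975, df=30)  # ~2.04, already close to the normal cutoff
print(round(z, 2), round(t3, 3), round(t30, 2))
```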

t-based confidence intervals. A 1 − α level confidence interval for a mean: x̄ +/- t_{α/2, n−1} s/sqrt(n), where t_{α/2, n−1} is the number such that Pr(T > t_{α/2, n−1}) = α/2 with T ~ t_{n−1} (see the t table opposite the normal table inside the book cover…)
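As a concrete check of the formula, here is a minimal sketch in Python (scipy assumed available; the helper name `t_ci` is ours, not the book's):

```python
import math
from scipy import stats

def t_ci(xbar, s, n, conf=0.95):
    """1 - alpha CI for a mean: xbar +/- t_{alpha/2, n-1} * s / sqrt(n)."""
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    half = tcrit * s / math.sqrt(n)
    return (xbar - half, xbar + half)

# Fabric 1 summary statistics from the slides: xbar = 16.85, s = 0.94, n = 4
lo, hi = t_ci(16.85, 0.94, 4)
print(round(lo, 2), round(hi, 2))  # -> 15.35 18.35
```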

Back to burn time example

Fabric    x̄       s       t_{0.025,3}   95% CI
1         16.85   0.94    3.182         (15.35, 18.35)
2         10.95   1.237   3.182         (8.98, 12.91)
3         10.50   1.137   3.182         (8.69, 12.31)
4         11.00   1.299   3.182         (8.93, 13.07)

t-based hypothesis test for a single mean. Mechanics: replace the z_{α/2} cutoff with t_{α/2, n−1}. Ex: fabric 1 burn time data. H_0: mean is 15; H_A: mean isn't 15. Test stat: |(16.85 − 15)/(0.94/sqrt(4))| = 3.94. Reject at α = 5% since 3.94 > t_{0.025,3} = 3.182. P-value = 2·Pr(T > 3.94) where T ~ t_3. This is between 2% and 5% since t_{0.025,3} = 3.182 and t_{0.01,3} = 4.541 (p-value = 2·0.0146 = 0.029 from software). See Minitab: Basic Statistics: 1-Sample t. Idea: t-based tests are harder to pass than the large sample normal based test. Why does that make sense?
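The arithmetic above can be reproduced from the summary statistics alone; a sketch in Python (scipy assumed available):

```python
import math
from scipy import stats

# One-sample t-test from summary statistics: fabric 1, H0: mu = 15
xbar, s, n, mu0 = 16.85, 0.94, 4, 15.0
t_stat = abs(xbar - mu0) / (s / math.sqrt(n))
p_value = 2 * stats.t.sf(t_stat, df=n - 1)    # two-sided p-value
print(round(t_stat, 2))  # -> 3.94, so reject at alpha = 5% (3.94 > 3.182)
```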

Comparison of 2 means. Example:
– Is the mean burn time of fabric 2 different from the mean burn time of fabric 3?
– Why can't we answer this with the hypothesis test H_0: mean of fabric 2 = 10.5, H_A: mean of fabric 2 ≠ 10.5? (10.5 is x̄ for fabric 3: a sample mean, not a known constant.)
– What's the appropriate hypothesis test?

H_0: mean fab 2 − mean fab 3 = 0
H_A: mean fab 2 − mean fab 3 ≠ 0
Let's do this with a confidence interval (α = 0.05). The 95% large sample CI would be: (x̄_2 − x̄_3) +/- z_{α/2} sqrt[s_2^2/n_2 + s_3^2/n_3]. We can't use this because it will be "too narrow" (i.e., it claims to be a 95% CI but actually it's an 89% CI…).

The CI is based on the small sample distribution of the difference between means. That distribution depends on whether the variances of the two samples are approximately equal or not. Small sample CI:
– If var(fabric 2) is approximately = var(fabric 3), then just replace z_{α/2} with t_{α/2, n2+n3−2}; df = n2 + n3 − 2 = (n2 − 1) + (n3 − 1). This is called "pooling" the variances. Rule of thumb: pooling is OK if 1/3 < s_3^2/s_2^2 < 3.
– If not, then use software. (Software adjusts the degrees of freedom for an "approximate" confidence interval; this is more conservative.)
Read section 10.4.
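When pooling is appropriate, the small sample CI is the large sample formula with a pooled variance and a t cutoff on n2 + n3 − 2 df. A sketch in Python (scipy assumed available; `pooled_t_ci` is a hypothetical helper name):

```python
import math
from scipy import stats

def pooled_t_ci(x1, s1, n1, x2, s2, n2, conf=0.95):
    """CI for mu1 - mu2, assuming the two variances are approximately equal."""
    # Pool the variances, weighting each sample by its degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n1 + n2 - 2)
    return (x1 - x2 - tcrit * se, x1 - x2 + tcrit * se)

# Fabric 2 vs fabric 3 summary statistics from the slides
lo, hi = pooled_t_ci(10.95, 1.237, 4, 10.50, 1.137, 4)
print(round(lo, 3), round(hi, 3))  # -> -1.606 2.506
```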

Two-sample T for f2 vs f3

     N   Mean  StDev  SE Mean
f2   4  10.95   1.24     0.62
f3   4  10.50   1.14     0.57

Difference = mu f2 - mu f3
Estimate for difference: 0.45
95% CI for difference: (-1.606, 2.506)
T-Test of difference = 0 (vs not =): T-Value = 0.54  P-Value = 0.61  DF = 6
Both use Pooled StDev = 1.19

Minitab: Stat: Basic Statistics: 2-Sample t

Hypothesis test: comparison of 2 means. As in the 1-mean case, replace z_{α/2} with the appropriate t-based cutoff value. When σ_1^2 is approximately = σ_2^2, the test statistic is t = |x̄_1 − x̄_2| / sqrt(s_1^2/n_1 + s_2^2/n_2). Reject if t > t_{α/2, n1+n2−2}. P-value = 2·Pr(T > t) where T ~ t_{n1+n2−2}. For unequal variances, software adjusts the df on the cutoff.
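scipy can run the pooled two-sample test directly from summary statistics; a sketch (scipy assumed available):

```python
from scipy import stats

# Pooled two-sample t-test from summary statistics (fabric 2 vs fabric 3)
res = stats.ttest_ind_from_stats(mean1=10.95, std1=1.237, nobs1=4,
                                 mean2=10.50, std2=1.137, nobs2=4,
                                 equal_var=True)  # equal_var=True pools the variances
print(round(res.statistic, 2))  # -> 0.54, matching the Minitab T-Value
```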

"Paired T-test". In the previous comparison of two means, the data from sample 1 and sample 2 were unrelated. (Fabric 2 and Fabric 3 observations are independent.) Consider the following experiment:
– "Separated identical twins" (adoption) experiments.
15 sets of twins; 1 twin raised in the city and 1 raised in the country; measure the IQ of each twin. We want to compare the average IQ of people raised in cities versus people raised in the country. Since twins share a common genetic makeup, IQs within a pair of twins probably are not independent.

Data: One Way of Looking At It. [Table: 15 numbered rows, one per twin pair, with columns "country" and "city" IQ; the twins in each row are "linked". The individual IQ values did not survive transcription.] The city mean and country mean are the averages of the two columns.

[Table and plot: the same 15 twin pairs with a third column diff = country − city; the values did not survive transcription.] Mean difference = mean(country − city). Of course, mean difference = mean(country) − mean(city). If we want to test "difference = 0", we need the variance of the differences.

Paired t-test. One twin's observation is dependent on the other twin's observation, but the differences are independent across twins. So we estimate var(differences) with the sample variance of the differences; this is not the same as var(city) + var(country). We can then do an ordinary one-sample t-test on the differences. This is called a "paired t-test". When data naturally come in pairs and the pairs are related, a paired t-test is appropriate.
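Because the paired test is just a one-sample t-test on the differences, the two computations agree exactly. A sketch with made-up IQ numbers (the slide's actual twin data did not survive transcription; scipy assumed available):

```python
from scipy import stats

# Hypothetical paired observations: one row per twin pair (values invented)
country = [102, 94, 110, 98, 105]
city    = [ 99, 91, 112, 95, 101]

paired = stats.ttest_rel(country, city)        # paired t-test
diffs = [a - b for a, b in zip(country, city)]
one_sample = stats.ttest_1samp(diffs, 0.0)     # one-sample t-test on the differences

# The two test statistics are identical (up to floating point)
print(abs(paired.statistic - one_sample.statistic) < 1e-9)  # -> True
```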

"Paired T-test". Minitab: Basic Statistics: Paired t:

Paired T for Country - City

             N    Mean  StDev  SE Mean
Difference  15   -4.14   6.45     1.67

95% CI for mean difference: (-7.71, -0.56)
T-Test of mean difference = 0 (vs not = 0): T-Value = -2.48  P-Value = 0.027

Compare this to a 2-sample t-test

Compare "Paired T-test" vs "2-sample t-test"

Paired T for Country - City
95% CI for mean difference: (-7.71, -0.56)
T-Test of mean difference = 0 (vs not = 0): T-Value = -2.48  P-Value = 0.027

Two-sample T for Country vs City

          N   StDev  SE Mean
Country  15    31.0      8.0
City     15    31.7      8.2

Difference = mu Country - mu City
Estimate for difference: -4.1
95% CI for difference: (-27.6, 19.3)
T-Test of difference = 0 (vs not =): T-Value = -0.36  P-Value = 0.72  DF = 28
Both use Pooled StDev = 31.4

Compare "Paired T-test" vs "2-sample t-test". The estimate of the difference is the same,
– but the variance estimate is very different:
Paired: std dev(mean difference) = 1.67
2-sample: sqrt[(31.0^2/15) + (31.7^2/15)] = 11.45
– The "cutoff" (df) is different too:
t_{0.025,14} for paired
t_{0.025,28} for 2-sample

Where we've been. We can use data to address the following questions:
1. Question: Is a mean = some number?
a. Answer: If n > 30, large sample "Z" test and confidence interval for means (chapters 8 and 9)
b. Answer: If n <= 30 and the data are approximately normal, then "t" test and confidence intervals for means (chapter 10)
2. Question: Is a proportion = some percentage?
Answer: If n > 30, large sample "Z" test and confidence interval for proportions (chapters 8 and 9). If n <= 30, the t-test is not appropriate.

Where we've been (continued)
3. Question: Is a difference between two means = some number?
a. Answer: If n > 30 and samples are independent (not paired), large sample "Z" test and confidence interval for means (chapters 8 and 9)
b. Answer: If n <= 30 and samples are independent (not paired), small sample "t" test and confidence interval for means (chapter 10)
c. Answer: If samples are dependent, paired t-test (chapter 10)
4. Question: Is a difference between two proportions = some percentage?
Answer: If n > 30 and samples are independent, "Z" test for proportions (chapters 8 and 9) (no t-test…)