June 18, 2008 - Stat 111 - Lecture 11 - Confidence Intervals. Introduction to Inference: Sampling Distributions, Confidence Intervals and Hypothesis Testing.

Presentation transcript:

Slide 1 - Introduction to Inference
Sampling Distributions, Confidence Intervals and Hypothesis Testing. Statistics Lecture 11.

Slide 2 - Administrative Notes
No homework is due Monday. Homework 4 will be due next Wednesday, June 24.

Slide 3 - Sampling Distribution of Sample Mean
The sampling distribution is the distribution of values taken by a statistic in all possible samples of size n from the same population. Assume the observations are independent and sampled from a population with mean $\mu$ and variance $\sigma^2$ (the population parameters). [Diagram: samples 1 through 8, each of size n, drawn from the population.] What is the distribution of these values?

Slide 4 - Sampling Distribution of Sample Mean
The center of the sampling distribution of the sample mean is the population mean: $E(\bar{X}) = \mu$. Over all samples, the sample mean will, on average, be equal to the population mean (no guarantees for one sample!). The spread of the sampling distribution of the sample mean is $SD(\bar{X}) = \sigma/\sqrt{n}$. As the sample size increases, the variance of the sample mean decreases! Central Limit Theorem: if the sample size is large enough, then the sample mean has an approximately Normal distribution. This is true no matter what the shape of the distribution of the original data!
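The center, spread and Normal-shape claims can be checked by simulation. The following is a minimal sketch, not part of the lecture: it assumes an exponential population with mean 1 and standard deviation 1 (chosen only because it is clearly skewed), and the sample size, number of samples and seed are arbitrary illustrative choices.

```python
import random
import statistics


def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the mean of each one."""
    rng = random.Random(seed)
    # Assumed population: exponential with rate 1 (mean 1, SD 1, strongly skewed).
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


n = 50
means = simulate_sample_means(n=n, num_samples=5000)

center = statistics.fmean(means)   # compare with the population mean, 1
spread = statistics.stdev(means)   # compare with sigma / sqrt(n)
theory = 1 / n ** 0.5

print(f"mean of sample means: {center:.3f} (population mean: 1)")
print(f"SD of sample means:   {spread:.3f} (sigma / sqrt(n): {theory:.3f})")
```

A histogram of `means` would look roughly bell-shaped even though the population itself is skewed, which is the Central Limit Theorem at work.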

Slide 5 - Confidence Intervals
The sample mean $\bar{X}$ is the best estimate of $\mu$. However, we realize that the sample mean is probably not exactly equal to the population mean, and that we would get a different value of the sample mean in another sample. The solution is to use our sample mean as the center of an entire interval of likely values for our population mean $\mu$. [Diagram: sampling goes from the population (parameter: $\mu$) to the sample (statistic: $\bar{X}$); inference and estimation go from the sample back to the population.]

Slide 6 - Example: Chips Ahoy
Nabisco guarantees that each 18 oz bag of Chips Ahoy contains at least 1,000 chips. In the late 1990s, the company challenged students to confirm its claim. The Air Force Academy administered the study by sampling 42 bags from 275 sent by the company.

Slide 7 - Example: Chips Ahoy
Dataset: number of chips per bag in 42 sampled bags. What can we say about the average number of chips $\mu$ in the "population" of all bags produced? Our sample mean is 1261.5 chips per bag, but that is probably not exactly equal to our population mean. We need to use our sample mean to calculate an interval of likely values for our population mean.

Slide 8 - Sampling Dist. for Sample Mean
We assume that each observation is an independent sample from the population with unknown mean $\mu$. We also assume (for now) that the population variance $\sigma^2$ is known to be equal to the sample variance $s^2 = (117.5)^2$. From last class, we know that the sampling distribution of the sample mean is Normal: $\bar{X} \sim N(\mu, \sigma^2/n) = N(\mu, (117.5)^2/42)$, so $SD(\bar{X}) = 117.5/\sqrt{42} \approx 18.1$.

Slide 9 - Confidence Interval for $\mu$
From the 68-95-99.7 rule, we know that 95% of sample means should be within 2 SDs (36.2 chips) of the population mean $\mu$. So 95% of the time (or in 95% of the samples), the population mean $\mu$ should be contained in the interval $(\bar{X} - 36.2, \bar{X} + 36.2) = (1225.3, 1297.7)$. We call this a 95% confidence interval for $\mu$.
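The arithmetic behind this interval can be reproduced in a few lines. This is an illustrative sketch: the sample mean of 1261.5 is taken as the midpoint of the interval quoted above (an assumption), and the variable names are hypothetical.

```python
import math

sample_mean = 1261.5  # assumption: midpoint of the quoted interval (1225.3, 1297.7)
sigma = 117.5         # population SD, assumed known and equal to the sample SD
n = 42                # number of bags sampled

sd_of_mean = sigma / math.sqrt(n)  # SD of the sample mean, sigma / sqrt(n)
margin = 2 * sd_of_mean            # "within 2 SDs", the rounded critical value
interval = (sample_mean - margin, sample_mean + margin)

print(f"SD of sample mean: {sd_of_mean:.1f}")
print(f"margin of error:   {margin:.1f}")
print(f"95% interval:      ({interval[0]:.1f}, {interval[1]:.1f})")
```

The unrounded margin is slightly above 36.2, so the endpoints can differ from the quoted ones in the first decimal place.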

Slide 10 - General Confidence Intervals for $\mu$
The 95% is called the confidence level of the interval. More generally, we can make an interval with any confidence level we want. The general formula for a 100·C% confidence interval for the population mean $\mu$ is: $(\bar{X} - Z^* \cdot SD(\bar{X}), \bar{X} + Z^* \cdot SD(\bar{X}))$, where $SD(\bar{X}) = \sigma/\sqrt{n}$. $Z^*$ is called the critical value of the interval. It is the value from a standard normal table that gives you a tail probability of (1-C)/2.

Slide 11 - Example of Critical Values
A 95% interval means that C = 0.95, so 1-C = 0.05. So we need a value $Z^*$ that gives us a tail probability (area under the curve) of (1-C)/2 = 0.025. Looking at the Standard Normal Table, we see that $Z^*$ = 1.96. In the previous example, we rounded $Z^*$ to 2.
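Instead of a printed table, the critical value can be computed from the inverse of the standard normal CDF. A small sketch using Python's standard library (`statistics.NormalDist`, available from Python 3.8); the function name `critical_value` is hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Return Z* such that the upper-tail probability beyond Z* is (1 - C) / 2."""
    tail = (1 - confidence) / 2
    # inv_cdf takes a lower-tail probability, so pass 1 - tail.
    return NormalDist().inv_cdf(1 - tail)


for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}: Z* = {critical_value(c):.3f}")
```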

Slide 12 - Interpretation of confidence intervals
If many different samples from the same population were collected (each giving us a 95% confidence interval for $\mu$), then 95% of these intervals should contain the true population mean $\mu$. Note that the interval for any one sample may not contain $\mu$! All confidence intervals have the same form: Estimate ± Margin of Error. For population means, the estimate is the sample mean $\bar{X}$ and the margin of error is $Z^* \cdot SD(\bar{X})$, where the critical value $Z^*$ is determined by our confidence level.
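The "95% of intervals contain the true mean" interpretation can be illustrated by repeated sampling. The sketch below is a simulation under stated assumptions: a Normal population whose mean (1000) and SD (100) are made up for illustration, a known $\sigma$, and an arbitrary seed; `coverage` is a hypothetical helper.

```python
import math
import random
from statistics import fmean


def coverage(mu, sigma, n, num_samples, z_star=1.96, seed=1):
    """Fraction of simulated intervals (x-bar +/- margin) that contain mu."""
    rng = random.Random(seed)
    margin = z_star * sigma / math.sqrt(n)  # same margin for every sample
    hits = 0
    for _ in range(num_samples):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / num_samples


rate = coverage(mu=1000.0, sigma=100.0, n=42, num_samples=2000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The fraction should land near 0.95 but not exactly on it, since the simulation itself is a random sample of intervals.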

Slide 13 - Margin of Error and Sample Size
Before we sample our Chips Ahoy cookie bags, we want to decide the minimum number of bags needed for a certain margin of error (saves on cookies!). Confidence intervals for the population mean $\mu$ have a margin of error $m = Z^* \cdot \sigma/\sqrt{n}$, which means $n = (Z^* \cdot \sigma/m)^2$. If we want a confidence level of 95% (so $Z^*$ = 1.96) and we want a margin of error less than 100 chips, then $n > (1.96 \cdot 117.5/100)^2 \approx 5.3$, so we need to sample at least 6 bags.
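The sample-size calculation is a one-liner once the margin-of-error formula is solved for n. A minimal sketch with a hypothetical helper name; it rounds up because n must be a whole number of bags.

```python
import math


def min_sample_size(z_star, sigma, max_margin):
    """Smallest whole n with z_star * sigma / sqrt(n) <= max_margin."""
    # If the ratio were an exact integer, the margin would equal max_margin
    # rather than fall strictly below it; that does not happen in this example.
    return math.ceil((z_star * sigma / max_margin) ** 2)


n_bags = min_sample_size(z_star=1.96, sigma=117.5, max_margin=100)
print(n_bags)  # 6, matching the slide
```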

Slide 14 - Sampling Distribution for Proportion
We also want to calculate confidence intervals for a population proportion p. From last class, we know the sampling distribution of the sample proportion $\hat{p}$ is also (approximately) Normal: $\hat{p} \sim N(p, p(1-p)/n)$.

Slide 15 - Confidence Intervals for Proportions
Based on the sampling distribution, our confidence interval for the population proportion p is $\hat{p} \pm Z^* \sqrt{p(1-p)/n}$. However, this interval is not very useful, since it still depends on the unknown population proportion p. We fix this by using our sample proportion $\hat{p}$ in place of p, so our 100·C% confidence interval for p is: $\hat{p} \pm Z^* \sqrt{\hat{p}(1-\hat{p})/n}$.

Slide 16 - Example: Gallup Poll
A sample of 2000 people were asked if they will vote for McCain. The sample proportion was 0.51 for "yes." What is a 95% confidence interval for the true population proportion p of McCain voters? In this example, n = 2000, $\hat{p}$ = 0.51 and we use $Z^*$ = 1.96, so our 95% confidence interval for p is: $0.51 \pm 1.96 \sqrt{0.51 \cdot 0.49/2000} = 0.51 \pm 0.022 = (0.488, 0.532)$. The interval contains values on both sides of 0.5, so we are not confident that McCain will win the election!
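The poll interval can be verified directly from the plug-in formula, with the sample proportion standing in for p. A short sketch; `proportion_interval` is a hypothetical helper, and the inputs are the values given in the example.

```python
import math


def proportion_interval(p_hat, n, z_star):
    """Normal-approximation interval for a proportion, using p_hat in the SD."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(p_hat=0.51, n=2000, z_star=1.96)
print(f"95% interval for p: ({low:.3f}, {high:.3f})")
print("interval straddles 0.5:", low < 0.5 < high)
```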

Slide 17 - Note of Caution
All of these confidence intervals were calculated using the normal distribution, which is justified by the central limit theorem. However, by doing this, we are making the assumptions that the sample size is large and the population variance is known! We will see later how to calculate confidence intervals when these assumptions are not true.

Slide 18 - Hypothesis Testing
Now we use our sampling distribution results for a different type of inference: testing a specific hypothesis. In some problems, we are not interested in calculating a confidence interval, but rather we want to see whether our data confirm a specific hypothesis. This type of inference is sometimes called statistical decision making, but the more common term is hypothesis testing.

Slide 19 - Example: Blackout Baby Boom
New York City experienced a major blackout on November 9, 1965. Many people were trapped for hours in the dark, on subways, in elevators, etc. Nine months afterwards (August 10, 1966), the NY Times claimed that the number of births was way up. They attributed the increased births to the blackout, and this has since become urban legend! Do the data actually support the claim of the NY Times? Using data, we will test the hypothesis that the birth rate in August 1966 was different from the usual birth rate.

Slide 20 - Number of Births in NYC, August 1966
[Table: daily number of births for the first two weeks of August 1966, in columns Sun, Mon, Tue, Wed, Thu, Fri, Sat; the values are not preserved in this transcript.] We want to test these data against the usual birth rate in NYC, which is 430 births/day.

Slide 21 - Steps for Hypothesis Testing
1. Formulate your hypotheses: you need a Null Hypothesis and an Alternative Hypothesis.
2. Calculate the test statistic: the test statistic summarizes the difference between the data and your null hypothesis.
3. Find the p-value for the test statistic: how probable is your data if the null hypothesis is true?

Slide 22 - Null and Alternative Hypotheses
The Null Hypothesis ($H_0$) is (usually) an assumption that there is no effect or no change in the population. The Alternative Hypothesis ($H_a$) states that there is a real difference or real change in the population. If the null hypothesis is true, there should be little discrepancy between the observed data and the null hypothesis. If we find there is a large discrepancy, then we will reject the null hypothesis. Both hypotheses are expressed in terms of different values for population parameters.

Slide 23 - Example: NYC blackout and birth rates
Let $\mu$ be the mean birth rate in August 1966. Null Hypothesis: the blackout has no effect on the birth rate, so August 1966 should be the same as any other month. $H_0: \mu = 430$ (usual birth rate). Alternative Hypothesis: the blackout did have an effect on the birth rate. $H_a: \mu \neq 430$. This is a two-sided alternative, which means that we are considering a change in either direction. We could instead use a one-sided alternative that only considers changes in one direction, e.g. the only alternative is an increase in the birth rate: $H_a: \mu > 430$.

Slide 24 - Test Statistic
Now that we have a null hypothesis, we can calculate a test statistic. The test statistic measures the difference between the observed data and the null hypothesis. Specifically, the test statistic answers the question: "How many standard deviations is our observed sample value from the hypothesized value?" For our birth rate dataset, the observed sample mean is 433.6 and our hypothesized mean is 430. To calculate the test statistic, we need the standard deviation of our sample mean.

Slide 25 - Sampling Distribution of Sample Mean
The center of the sampling distribution of the sample mean is the population mean: $E(\bar{X}) = \mu$. The spread of the sampling distribution of the sample mean is $SD(\bar{X}) = \sigma/\sqrt{n}$. Central Limit Theorem: if the sample size is large enough, then the sample mean has an approximately Normal distribution.

Slide 26 - Test Statistic for Sample Mean
The sample mean has a standard deviation of $\sigma/\sqrt{n}$, so our test statistic T is: $T = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$. T is the number of standard deviations between our sample mean and the hypothesized mean. $\mu_0$ is the notation we use for our hypothesized mean. To calculate our test statistic T, we need to know the population standard deviation $\sigma$. For now we will make the assumption that $\sigma$ is the same as our sample standard deviation s. Later, we will correct this assumption!

Slide 27 - Test Statistic for Birth Rate Example
For our NYC births/day example, we have a sample mean of 433.6, a hypothesized mean of 430 and a sample standard deviation of 39.4 from n = 14 days. Our test statistic is: $T = (433.6 - 430)/(39.4/\sqrt{14}) = 0.342$. So, our sample mean is 0.342 standard deviations different from what it should be if there was no blackout effect. Is this difference statistically significant?
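The test statistic for the birth-rate example can be recomputed as follows. This is a sketch under one stated assumption: n = 14, one observation per day for the first two weeks of August, which is the sample size that reproduces T = 0.342. The function name `z_statistic` is hypothetical.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n):
    """Number of SDs of the sample mean between sample_mean and mu0."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


# n = 14 is an assumption: daily counts for the first two weeks of August 1966.
t_stat = z_statistic(sample_mean=433.6, mu0=430, sigma=39.4, n=14)
print(f"T = {t_stat:.3f}")
```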

Slide 28 - Probability values (p-values)
Assuming the null hypothesis is true, the p-value is the probability we get a value at least as far from the hypothesized value as our observed sample value. The smaller the p-value is, the more unrealistic our null hypothesis appears. For our NYC birth-rate example, T = 0.342. Assuming our population mean really is 430, what is the probability that we get a test statistic of 0.342 or greater?

Slide 29 - Calculating p-values
To calculate the p-value, we use the fact that the sample mean has a normal distribution. Under the null hypothesis, the sample mean has a normal distribution with mean $\mu_0$ and standard deviation $\sigma/\sqrt{n}$, so the test statistic $T = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ has a standard normal distribution!

Slide 30 - p-value for NYC dataset
If our alternative hypothesis was one-sided ($H_a: \mu > 430$), then our p-value would be 0.367. Since our alternative hypothesis was two-sided, our p-value is the sum of both tail probabilities: p-value = 0.367 + 0.367 = 0.734. [Figure: standard normal curve with tails beyond T = -0.342 and T = 0.342 shaded; each tail has prob = 0.367.]
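Both tail probabilities can be computed from the standard normal CDF rather than read from a table. A minimal sketch using `statistics.NormalDist` from the standard library; `p_values` is a hypothetical helper, and the comparison against 0.05 uses the most common $\alpha$-level as the threshold.

```python
from statistics import NormalDist


def p_values(t):
    """Return (one-sided p-value for H_a: mu > mu0, two-sided p-value)."""
    std_normal = NormalDist()
    one_sided = 1 - std_normal.cdf(t)             # upper tail beyond t
    two_sided = 2 * (1 - std_normal.cdf(abs(t)))  # both tails beyond |t|
    return one_sided, two_sided


one_sided, two_sided = p_values(0.342)
print(f"one-sided p-value: {one_sided:.3f}")
print(f"two-sided p-value: {two_sided:.3f}")
print("reject H0 at alpha = 0.05:", two_sided < 0.05)
```

The computed values can differ from table-based ones in the third decimal place, because a printed table rounds T to two decimals before the lookup.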

Slide 31 - Statistical Significance
If the p-value is smaller than $\alpha$, we say the data are statistically significant at level $\alpha$. The most common $\alpha$-level to use is $\alpha$ = 0.05. Next class, we will see that this relates to 95% confidence intervals! The $\alpha$-level is used as a threshold for rejecting the null hypothesis. If the p-value < $\alpha$, we reject the null hypothesis that there is no change or difference.

Slide 32 - Conclusions for NYC birth-rate data
The p-value = 0.734 for the NYC birth-rate data, so we clearly cannot reject the null hypothesis at an $\alpha$-level of 0.05. Another way of saying this is that the difference between the null hypothesis and our data is not statistically significant. So, we conclude that the data do not support the idea that there was a different birth rate than usual for the first two weeks of August 1966. No blackout baby boom effect!

Slide 33 - Next Class - Lecture 12
More Hypothesis Testing! Moore, McCabe and Craig: Section