Chapter 9: Means and Proportions as Random Variables
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc.



Understanding Dissimilarity Among Samples

Suppose we knew that most samples were likely to provide an answer within 10% of the population answer. Then we would also know that the population answer should be within 10% of whatever our specific sample gave. We would therefore have a good guess about the population value based on the sample value alone.

Key: We need to understand what kind of dissimilarity to expect among samples from the same population.

Statistics and Parameters

A statistic is a numerical value computed from a sample. Its value may differ from sample to sample. Examples: the sample mean \(\bar{x}\), the sample standard deviation \(s\), and the sample proportion \(\hat{p}\).

A parameter is a numerical value associated with a population. It is considered fixed and unchanging. Examples: the population mean \(\mu\), the population standard deviation \(\sigma\), and the population proportion \(p\).

Sampling Distributions

The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.

Each time a new sample is taken, the value of the sample statistic changes.

Many statistics of interest have sampling distributions that are approximately normal.

Example 9.1 Mean Hours of Sleep for College Students

Survey of n = 190 college students: “How many hours of sleep did you get last night?” Sample mean = 7.1 hours.

If we repeatedly took samples of 190 students and computed the sample mean each time, the resulting sample means would form a histogram like the one shown on the slide.

[Figure: histogram of sample means from repeated samples of n = 190]

Sampling Distributions for Sample Proportions

Suppose (unknown to us) 40% of a population carry the gene for a disease (p = 0.40). We will take a random sample of 25 people from this population and count X = the number with the gene.

Although we expect, on average, to find 10 people (40%) with the gene, we know the number will vary across different samples of n = 25. In this case, X is a binomial random variable with n = 25 and p = 0.4.

Many Possible Samples

Four possible random samples of 25 people:
Sample 1: X = 12, proportion with gene = 12/25 = 0.48 or 48%.
Sample 2: X = 9, proportion with gene = 9/25 = 0.36 or 36%.
Sample 3: X = 10, proportion with gene = 10/25 = 0.40 or 40%.
Sample 4: X = 7, proportion with gene = 7/25 = 0.28 or 28%.

Note: The samples gave different answers, which did not always match the population value of 40%. Although we cannot determine whether any one sample will accurately reflect the population, statisticians have determined what to expect for most possible samples.

The Normal Curve Approximation Rule for Sample Proportions

Let \(p\) = population proportion of interest, or binomial probability of success.
Let \(\hat{p}\) = sample proportion, or proportion of successes.

If numerous random samples or repetitions of the same size \(n\) are taken, the distribution of possible values of \(\hat{p}\) is approximately a normal curve distribution with

Mean = \(p\)
Standard deviation = s.d.(\(\hat{p}\)) = \(\sqrt{p(1-p)/n}\)

This approximate distribution is the sampling distribution of \(\hat{p}\).
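The rule can be checked empirically. The following is a minimal Python sketch, not part of the original slides, that reuses the gene example (p = 0.40, n = 25). The function names, the seed, and the choice of 10,000 repetitions are assumptions made for illustration.

```python
import math
import random


def sd_of_sample_proportion(p, n):
    """Theoretical s.d. of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def simulate_sample_proportions(p, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    proportions = []
    for _ in range(repetitions):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


p, n = 0.40, 25  # values from the gene example
p_hats = simulate_sample_proportions(p, n, repetitions=10_000)

# Empirical mean and s.d. of the simulated sample proportions.
mean_p_hat = sum(p_hats) / len(p_hats)
sd_p_hat = math.sqrt(
    sum((x - mean_p_hat) ** 2 for x in p_hats) / (len(p_hats) - 1)
)

print(f"simulated mean of p-hat: {mean_p_hat:.3f} (rule: {p})")
print(f"simulated s.d. of p-hat: {sd_p_hat:.3f} "
      f"(rule: {sd_of_sample_proportion(p, n):.3f})")
```

With these inputs the theoretical standard deviation is about 0.098, and the simulated mean and standard deviation should land close to the rule's values.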

The Normal Curve Approximation Rule for Sample Proportions

The Normal Approximation Rule can be applied in two situations:
Situation 1: A random sample is taken from a population.
Situation 2: A binomial experiment is repeated numerous times.

In each situation, three conditions must be met:
Condition 1 (The Physical Situation): There is an actual population or repeatable situation.
Condition 2 (Data Collection): A random sample is obtained, or the situation is repeated many times.
Condition 3 (The Size of the Sample or Number of Trials): The size of the sample or number of repetitions is relatively large. Both np and n(1 – p) must be at least 5, and preferably at least 10.

Examples for which Rule Applies

Election Polls: to estimate the proportion who favor a candidate; units = all voters.
Television Ratings: to estimate the proportion of households watching a TV program; units = all households with a TV.
Consumer Preferences: to estimate the proportion of consumers who prefer a new recipe compared with the old; units = all consumers.
Testing ESP: to estimate the probability that a person can successfully guess which of 5 symbols is on a hidden card; repeatable situation = a guess.

Example 9.2 Possible Sample Proportions Favoring a Candidate

Suppose 40% of all voters favor Candidate X. Pollsters take a sample of n = 2400 voters. The Rule states that the sample proportion who favor X will have approximately a normal distribution with

mean = \(p\) = 0.4 and s.d.(\(\hat{p}\)) = \(\sqrt{0.4(1-0.4)/2400}\) = 0.01.

The histogram on the slide shows the sample proportions obtained by simulating this situation 400 times.

[Figure: histogram of 400 simulated sample proportions]

Estimating the Population Proportion from a Single Sample Proportion

In practice, we don’t know the true population proportion \(p\), so we cannot compute the standard deviation of \(\hat{p}\),

s.d.(\(\hat{p}\)) = \(\sqrt{p(1-p)/n}\).

In practice, we also take only one random sample, so we have only one sample proportion \(\hat{p}\). Replacing \(p\) with \(\hat{p}\) in the standard deviation expression gives an estimate called the standard error of \(\hat{p}\):

s.e.(\(\hat{p}\)) = \(\sqrt{\hat{p}(1-\hat{p})/n}\).

If \(\hat{p}\) = 0.39 and n = 2400, then the standard error is \(\sqrt{0.39(0.61)/2400} \approx 0.01\). So the true proportion who support the candidate is almost surely between 0.39 – 3(0.01) = 0.36 and 0.39 + 3(0.01) = 0.42.
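The standard error calculation on this slide can be sketched in Python. This is an illustrative sketch only; the function name is an assumption, and the inputs (0.39 and 2400) are taken from the slide.

```python
import math


def standard_error_of_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat(1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


p_hat, n = 0.39, 2400  # values from the slide
se = standard_error_of_proportion(p_hat, n)

# "Almost surely" range: sample proportion plus or minus 3 standard errors.
low, high = p_hat - 3 * se, p_hat + 3 * se

print(f"s.e. = {se:.4f}")
print(f"range = {low:.2f} to {high:.2f}")
```

The unrounded standard error is slightly below 0.01, which is why the slide's rounded interval of 0.36 to 0.42 still holds to two decimal places.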

What to Expect of Sample Means

Suppose we want to estimate the mean weight loss for all who attend a clinic for 10 weeks. Suppose (unknown to us) the distribution of weight loss is approximately N(8 pounds, 5 pounds), that is, normal with mean 8 pounds and standard deviation 5 pounds.

We will take a random sample of 25 people from this population and record for each X = weight loss. We know the value of the sample mean will vary across different samples of n = 25. What do we expect those means to be?

Many Possible Samples

Four possible random samples of 25 people:
Sample 1: Mean = 8.32 pounds, standard deviation = 4.74 pounds.
Sample 2: Mean = 6.76 pounds, standard deviation = 4.73 pounds.
Sample 3: Mean = 8.48 pounds, standard deviation = 5.27 pounds.
Sample 4: Mean = 7.16 pounds, standard deviation = 5.93 pounds.

Note: Each sample gave a different answer, and none exactly matched the population mean of 8 pounds. Although we cannot determine whether any one sample mean will accurately reflect the population mean, statisticians have determined what to expect for most possible sample means.

The Normal Curve Approximation Rule for Sample Means

Let \(\mu\) = mean for the population of interest.
Let \(\sigma\) = standard deviation for the population of interest.
Let \(\bar{x}\) = sample mean.

If numerous random samples of the same size \(n\) are taken, the distribution of possible values of \(\bar{x}\) is approximately a normal curve distribution with

Mean = \(\mu\)
Standard deviation = s.d.(\(\bar{x}\)) = \(\sigma/\sqrt{n}\)

This approximate distribution is the sampling distribution of \(\bar{x}\).

The Normal Curve Approximation Rule for Sample Means

The Normal Approximation Rule can be applied in two situations:
Situation 1: The population of measurements of interest is bell-shaped and a random sample of any size is measured.
Situation 2: The population of measurements of interest is not bell-shaped but a large random sample is measured.

Note: Is it difficult to get a random sample? Researchers are usually willing to use the Rule as long as they have a representative sample with no obvious sources of confounding or bias.

Examples for which Rule Applies

Average Weight Loss: to estimate average weight loss; weight loss assumed bell-shaped; population = all current and potential clients.
Average Age at Death: to estimate the average age at which left-handed adults (over 50) die; ages at death are not bell-shaped, so need n ≥ 30; population = all left-handed people who live to be at least 50.
Average Student Income: to estimate the mean monthly income of students at a university who work; incomes are not bell-shaped and outliers are likely, so need a large random sample of students; population = all students at the university who work.

Example 9.4 Hypothetical Mean Weight Loss

Suppose the distribution of weight loss is approximately N(8 pounds, 5 pounds) and we will take a random sample of n = 25 clients. The Rule states that the sample mean weight loss will have a normal distribution with

mean = \(\mu\) = 8 pounds and s.d.(\(\bar{x}\)) = \(5/\sqrt{25}\) = 1 pound.

Empirical Rule: It is almost certain that the sample mean will be between 5 and 11 pounds, that is, within 3 standard deviations of 8 pounds.

The histogram on the slide shows the sample means obtained by simulating this situation 400 times.

[Figure: histogram of 400 simulated sample means]
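A simulation like the one described on this slide can be sketched in Python. This is not the authors' simulation; the function name and seed are assumptions, while the population values (mean 8, standard deviation 5), the sample size of 25, and the 400 repetitions come from the slide.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, repetitions, seed=1):
    """Return the mean of each of `repetitions` samples of size n from N(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]


means = simulate_sample_means(mu=8, sigma=5, n=25, repetitions=400)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
# Share of simulated sample means inside the Empirical Rule range of 5 to 11 pounds.
share_within = sum(5 <= m <= 11 for m in means) / len(means)

print(f"mean of sample means: {mean_of_means:.2f} (rule: 8)")
print(f"s.d. of sample means: {sd_of_means:.2f} (rule: 1)")
print(f"share between 5 and 11 pounds: {share_within:.3f}")
```

The simulated mean and standard deviation should fall near 8 pounds and 1 pound, and nearly all simulated means should lie between 5 and 11 pounds.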

Standard Error of the Mean

In practice, the population standard deviation \(\sigma\) is rarely known, so we cannot compute the standard deviation of \(\bar{x}\),

s.d.(\(\bar{x}\)) = \(\sigma/\sqrt{n}\).

In practice, we also take only one random sample, so we have only the sample mean \(\bar{x}\) and the sample standard deviation \(s\). Replacing \(\sigma\) with \(s\) in the standard deviation expression gives an estimate called the standard error of \(\bar{x}\):

s.e.(\(\bar{x}\)) = \(s/\sqrt{n}\).

For a sample of n = 25 weight losses with standard deviation s = 4.74 pounds, the standard error of the mean is \(4.74/\sqrt{25} = 0.948\) pounds.

Increasing the Size of the Sample

Suppose we take n = 100 people instead of just 25. The standard deviation of the mean would be

s.d.(\(\bar{x}\)) = \(5/\sqrt{100}\) = 0.5 pounds.

For samples of n = 25, sample means are likely to range between 8 ± 3 pounds, or 5 to 11 pounds.
For samples of n = 100, sample means are likely to range only between 8 ± 1.5 pounds, or 6.5 to 9.5 pounds.

Larger samples tend to result in more accurate estimates of population values than smaller samples.
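The effect of sample size on the spread of the sample mean can be tabulated with a short Python sketch. The function name is an assumption; the population values (mean 8, standard deviation 5) and the two sample sizes are from the slides.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


mu, sigma = 8, 5  # weight-loss population from the example

for n in (25, 100):
    sd = sd_of_sample_mean(sigma, n)
    # Empirical Rule range: mean plus or minus 3 standard deviations.
    print(f"n = {n}: s.d. = {sd} pounds, "
          f"likely range = {mu - 3 * sd} to {mu + 3 * sd} pounds")
```

Because the standard deviation shrinks with the square root of n, quadrupling the sample size from 25 to 100 halves the spread rather than quartering it.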

Sampling for a Long, Long Time: The Law of Large Numbers

LLN: As the sample size grows, the sample mean \(\bar{x}\) will eventually get “close” to the population mean \(\mu\), no matter how small a difference you use to define “close.”

The LLN gives peace of mind to casinos and insurance companies. Eventually, after enough gamblers or customers, the mean net profit will be close to the theoretical mean.

The price to pay: they must have enough money on hand to pay the occasional winner or claimant.
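The Law of Large Numbers can be illustrated with a running mean. The sketch below is an illustration only: the fair six-sided die (population mean 3.5) is a stand-in chosen for this example and does not appear in the slides, and the function name, seed, and number of rolls are assumptions.

```python
import random


def running_means(draws):
    """Return the sample mean after each successive draw."""
    total = 0.0
    means = []
    for count, value in enumerate(draws, start=1):
        total += value
        means.append(total / count)
    return means


rng = random.Random(42)  # fixed seed (an assumption) for repeatability
rolls = [rng.randint(1, 6) for _ in range(200_000)]  # fair die, mean 3.5
means = running_means(rolls)

# The running mean wanders early on and settles near 3.5 as n grows.
for n in (10, 100, 10_000, 200_000):
    print(f"after {n} rolls: mean = {means[n - 1]:.3f}")
```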

What to Expect in Other Situations: CLT

The Central Limit Theorem states that if \(n\) is sufficiently large, the sample means of random samples from a population with mean \(\mu\) and finite standard deviation \(\sigma\) are approximately normally distributed with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).

Technical Note: The mean and standard deviation given in the CLT hold for any sample size; it is only the “approximately normal” shape that requires \(n\) to be sufficiently large.

Example 9.5 California Decco Winnings

California Decco lottery game: the mean amount lost per ticket over millions of tickets sold is \(\mu\) = $0.35, with standard deviation \(\sigma\) = $29.67. This reflects large variability in the possible amounts won or lost, from a net win of $4999 to a net loss of $1.

Suppose a store sells 100,000 tickets in a year. By the CLT, the distribution of the possible sample mean loss per ticket is approximately normal with

mean (loss) = \(\mu\) = $0.35 and s.d.(\(\bar{x}\)) = \(29.67/\sqrt{100000} \approx\) $0.09.

Empirical Rule: The mean loss is almost surely between $0.08 and $0.62, so the total loss for the 100,000 tickets is likely between $8,000 and $62,000! There are better ways to invest $100,000.
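The Decco arithmetic can be reproduced with a short Python sketch. The variable names are assumptions; the inputs ($0.35, $29.67, 100,000 tickets) are from the slide. Rounding the standard deviation to $0.09 before applying the Empirical Rule gives the slide's range of $0.08 to $0.62; the unrounded value gives a slightly wider range.

```python
import math

mean_loss = 0.35    # mean loss per ticket, in dollars
sigma_loss = 29.67  # standard deviation of loss per ticket, in dollars
tickets = 100_000   # tickets sold by the store in a year

# Standard deviation of the mean loss per ticket: sigma / sqrt(n).
sd_mean = sigma_loss / math.sqrt(tickets)

# Empirical Rule range using the s.d. rounded to the nearest cent.
sd_rounded = round(sd_mean, 2)
low = round(mean_loss - 3 * sd_rounded, 2)
high = round(mean_loss + 3 * sd_rounded, 2)

print(f"s.d. of mean loss = ${sd_mean:.4f} (about ${sd_rounded:.2f})")
print(f"mean loss per ticket: ${low:.2f} to ${high:.2f}")
print(f"total loss: ${low * tickets:,.0f} to ${high * tickets:,.0f}")
```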

Sampling Distribution for Any Statistic

Every statistic has a sampling distribution, but the appropriate distribution may not always be normal, or even approximately bell-shaped.

To construct an approximate sampling distribution for a statistic, take repeated samples of the same size from a population and build a relative frequency histogram of the values of the statistic over the many samples.

Example 9.6 Winning the Lottery by Betting on Birthdays

Pennsylvania Cash 5 lottery game: Select 5 numbers from the integers 1 to 39. The grand prize is won by matching all 5 numbers.

One strategy is to bet 5 numbers that correspond to the birth days of the month for 5 family members. With this strategy there is no chance to win if the highest number drawn is 32 to 39. What is the probability of this?

Statistic of interest = H = highest of five integers randomly drawn without replacement from 1 to 39. For example, if the numbers selected are 3, 12, 22, 36, 37, then H = 37.

Example 9.6 Winning the Lottery by Betting on Birthdays (cont)

The slide summarizes the value of H for 1560 games.

[Figure: distribution of H over 1560 games]

The highest number was over 31 in 72% of the games. The most common value of H was 39, which occurred in 13.5% of the games.
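For this particular statistic the sampling distribution can also be computed exactly, which gives a check on the observed frequencies. The Python sketch below is not from the slides; the function names and seed are assumptions. It uses the counting fact that H is at most k exactly when all 5 numbers come from 1 to k, and adds a simulation of 1560 games for comparison.

```python
import math
import random


def prob_highest_at_most(k, pool=39, picks=5):
    """P(H <= k): all picks fall among the integers 1..k."""
    return math.comb(k, picks) / math.comb(pool, picks)


def prob_highest_equals(k, pool=39, picks=5):
    """P(H = k): k is drawn and the other picks fall among 1..k-1."""
    return math.comb(k - 1, picks - 1) / math.comb(pool, picks)


p_over_31 = 1 - prob_highest_at_most(31)
p_is_39 = prob_highest_equals(39)

# Simulate 1560 games (the number summarized on the slide).
rng = random.Random(7)  # fixed seed (an assumption) for repeatability
games = 1560
highs = [max(rng.sample(range(1, 40), 5)) for _ in range(games)]
sim_over_31 = sum(h > 31 for h in highs) / games

print(f"exact P(H > 31)  = {p_over_31:.4f}")
print(f"exact P(H = 39)  = {p_is_39:.4f}")
print(f"simulated share with H > 31 = {sim_over_31:.3f}")
```

The exact probabilities are about 0.705 for H over 31 and about 0.128 for H = 39, which is in line with the 72% and 13.5% observed over 1560 games.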

Standardized Statistics

For a sample proportion: \(z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}\)

For a sample mean: \(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)

If conditions are met, these standardized statistics have, approximately, a standard normal distribution N(0,1).

Example 9.7 Unpopular TV Shows

Networks cancel shows with low ratings. Ratings are based on a random sample of households, using the sample proportion \(\hat{p}\) watching the show as an estimate of the population proportion \(p\). If \(p\) < 0.20, the show will be cancelled.

Suppose that in a random sample of 1600 households, 288 are watching (for a proportion of 288/1600 = 0.18). Is it likely to see \(\hat{p}\) = 0.18 even if \(p\) were 0.20 (or higher)?

\(z = \dfrac{0.18 - 0.20}{\sqrt{0.20(0.80)/1600}} = \dfrac{-0.02}{0.01} = -2\)

The sample proportion of 0.18 is about 2 standard deviations below the mean of 0.20.
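The standardized score in this example, together with the corresponding normal-curve probability, can be sketched in Python. The tail probability goes one step beyond the slide and relies on the normal approximation; the function name is an assumption, and `NormalDist` is from the Python standard library (version 3.8 or later).

```python
import math
from statistics import NormalDist


def z_for_proportion(p_hat, p, n):
    """Standardized score of a sample proportion under population proportion p."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)


p_hat = 288 / 1600  # observed sample proportion, 0.18
z = z_for_proportion(p_hat, p=0.20, n=1600)

# Approximate probability of a sample proportion this low or lower if p = 0.20.
tail = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(p-hat <= 0.18 when p = 0.20) is about {tail:.4f}")
```

Under these assumptions the tail probability is about 0.023, so a sample proportion of 0.18 would be unusual, though not impossible, if the true proportion were 0.20.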

Student’s t-Distribution: Replacing \(\sigma\) with \(s\)

Dilemma: we generally don’t know \(\sigma\). Using \(s\) we have:

\(t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}\)

If the sample size \(n\) is small, this standardized statistic will not have a N(0,1) distribution but rather a t-distribution with n – 1 degrees of freedom (df).

More on t-distributions and tables of probability areas in Chapters

Example 9.8 Standardized Mean Weights

Claim: mean weight loss is \(\mu\) = 8 pounds. A sample of n = 25 people gave a sample mean weight loss of \(\bar{x}\) = 8.32 pounds and a sample standard deviation of s = 4.74 pounds. Is the sample mean of 8.32 pounds reasonable to expect if \(\mu\) = 8 pounds?

\(t = \dfrac{8.32 - 8}{4.74/\sqrt{25}} = \dfrac{0.32}{0.948} \approx 0.34\)

The sample mean of 8.32 is only about one-third of a standard error above 8, which is consistent with a population mean weight loss of 8 pounds.
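The standardized score in this example can be computed with a short Python sketch. The function name is an assumption; the inputs are the values given on the slide.

```python
import math


def t_statistic(sample_mean, claimed_mean, s, n):
    """Standardized score of a sample mean, using s in place of sigma."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - claimed_mean) / standard_error


n = 25
t = t_statistic(sample_mean=8.32, claimed_mean=8, s=4.74, n=n)
df = n - 1  # degrees of freedom for the t-distribution

print(f"t = {t:.3f} with {df} degrees of freedom")
```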

Statistical Inference

Confidence Intervals: use sample data to provide an interval of values that the researcher is confident covers the true value for the population.

Hypothesis Testing or Significance Testing: uses sample data to attempt to reject the hypothesis that nothing interesting is happening, i.e., to reject the notion that chance alone can explain the sample results.

Case Study 9.1 Do Americans Really Vote When They Say They Do?

Election of 1994:
Time Magazine Poll: n = 800 adults (two days after the election); 56% reported that they had voted.
Info from the Committee for the Study of the American Electorate: only 39% of American adults had voted.

If \(p\) = 0.39, then sample proportions for samples of size n = 800 should vary approximately normally with

mean = \(p\) = 0.39 and s.d.(\(\hat{p}\)) = \(\sqrt{0.39(0.61)/800} \approx 0.017\).

Case Study 9.1 Do Americans Really Vote When They Say They Do?

If respondents were telling the truth, the sample percent should be no higher than 39% + 3(1.7%) = 44.1%, nowhere near the reported percentage of 56%.

If 39% of the population voted, the standardized score for the reported value of 56% is

\(z = \dfrac{0.56 - 0.39}{0.017} = 10\)

It is virtually impossible to obtain a standardized score of 10.
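The case study numbers can be reproduced with a short Python sketch. The variable names are assumptions; the inputs (39%, 56%, n = 800) are from the slides. Without rounding the standard deviation to 0.017, the standardized score comes out just under 10.

```python
import math

p = 0.39         # proportion of adults who actually voted
reported = 0.56  # proportion in the poll who said they voted
n = 800          # poll sample size

# Standard deviation of the sample proportion if p = 0.39.
sd = math.sqrt(p * (1 - p) / n)

# Largest sample proportion expected by chance: p plus 3 standard deviations.
upper = p + 3 * sd

# Standardized score of the reported proportion.
z = (reported - p) / sd

print(f"s.d. = {sd:.4f}")
print(f"upper limit = {upper:.3f}")
print(f"z = {z:.2f}")
```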