Understanding Sampling Distributions: Statistics as Random Variables

Chapter 5, Part B: Understanding Sampling Distributions: Statistics as Random Variables. Copyright ©2011 Brooks/Cole, Cengage Learning

EXPECTATIONS: SAMPLING DISTRIBUTION FOR ONE SAMPLE PROPORTION; SAMPLING DISTRIBUTION FOR ONE SAMPLE MEAN.

Parameters, Statistics, and Statistical Inference
A statistic is a numerical value computed from a sample. Its value may differ from sample to sample. Examples: sample mean $\bar{x}$, sample standard deviation $s$, and sample proportion $\hat{p}$.
A parameter is a numerical value associated with a population. It is considered fixed and unchanging. Examples: population mean $\mu$, population standard deviation $\sigma$, and population proportion $p$.

Inferential Statistics
Involves: estimation and hypothesis testing.
Purpose: make decisions about population characteristics.

Statistical Inference
Statistical inference: drawing conclusions about population parameters on the basis of sample statistics.
Two most common procedures:
Confidence intervals: an interval of values that the researcher is fairly sure will cover the true, unknown value of the population parameter.
Hypothesis tests: use sample data to attempt to reject a hypothesis about the population.

Sampling Distribution For One Sample Proportion
PROBLEM FORMULATION: SUPPOSE THAT p IS AN UNKNOWN PROPORTION OF ELEMENTS OF A CERTAIN TYPE S IN A POPULATION.
EXAMPLES: PROPORTION OF LEFT-HANDED PEOPLE; PROPORTION OF HIGH SCHOOL STUDENTS WHO ARE FAILING A READING TEST; PROPORTION OF VOTERS WHO WILL VOTE FOR MR. X.

Notation
Estimating the proportion falling into a category of a categorical variable.
Population parameter: $p$ = proportion in the population falling into that category.
Sample estimate: $\hat{p}$ = proportion in the sample falling into that category.

Estimation of p
TO ESTIMATE p, WE SELECT A SIMPLE RANDOM SAMPLE (SRS) OF SIZE, SAY, n = 1000, AND COMPUTE THE SAMPLE PROPORTION. SUPPOSE THE NUMBER OF ELEMENTS OF THE TYPE WE ARE INTERESTED IN, IN THIS SAMPLE OF n = 1000, IS x = 437. THEN THE SAMPLE PROPORTION IS COMPUTED USING THE FORMULA $\hat{p} = x/n$.

Estimation of p
$\hat{p} = x/n = 437/1000 = 0.437$

WHAT IS THE ERROR OF ESTIMATION? THAT IS, WHAT IS THE DIFFERENCE BETWEEN $\hat{p}$ AND p? IS THERE A MODEL (PROBABILITY DISTRIBUTION MODEL) THAT CAN HELP US FIND THE BEST ESTIMATE OF THE TRUE PROPORTION p? LET’S START THE ANALYSIS BY FIRST ANSWERING THE SECOND QUESTION.

THE APPROACH

Sample Distribution Table
Columns: sample of size n; X = number of type S.
Rows: Sample 1, Sample 2, ..., Sample k.

Sampling Distributions
[Figure: histogram of sample proportions.]

REMARKS ON OBSERVING THE HISTOGRAM
THE HISTOGRAM ABOVE IS AN EXAMPLE OF WHAT WE WOULD GET IF WE COULD SEE ALL THE PROPORTIONS FROM ALL POSSIBLE SAMPLES. THAT DISTRIBUTION HAS A SPECIAL NAME: IT IS CALLED THE SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION. OBSERVE THAT THE HISTOGRAM IS UNIMODAL, ROUGHLY SYMMETRIC, AND CENTERED AT p.
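The idea of looking at the proportions from many samples can be tried out by simulation. The sketch below is only an illustration: the population proportion `TRUE_P = 0.44`, the number of repetitions, and the function name `simulate_sample_proportions` are assumptions made for this example and do not come from the slides.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Count the elements of type S in one simulated sample.
        x = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(x / n)
    return proportions


TRUE_P = 0.44  # hypothetical population proportion, chosen only for illustration
p_hats = simulate_sample_proportions(TRUE_P, n=1000, num_samples=2000)

center = statistics.mean(p_hats)   # should land close to TRUE_P
spread = statistics.stdev(p_hats)  # spread of the simulated sampling distribution

print(f"center of the simulated proportions: {center:.4f}")
print(f"spread (standard deviation):         {spread:.4f}")
```

With these settings the simulated proportions should cluster around the assumed p, which is the behaviour the histogram remarks describe.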

Example – Gene For A Disease
Suppose (unknown to us) 40% of a population carry the gene for a disease (p = 0.40). We will take a random sample of 25 people from this population and count X = number with gene. Although we expect (on average) to find 10 people (40%) with the gene, we know the number will vary for different samples of n = 25. In this case, X is a binomial random variable with n = 25 and p = 0.4.

Many Possible Samples
Four possible random samples of 25 people:
Sample 1: X = 12, proportion with gene = 12/25 = 0.48 or 48%.
Sample 2: X = 9, proportion with gene = 9/25 = 0.36 or 36%.
Sample 3: X = 10, proportion with gene = 10/25 = 0.40 or 40%.
Sample 4: X = 7, proportion with gene = 7/25 = 0.28 or 28%.
Note: Each sample gave a different answer, which did not always match the population value of p. Although we cannot determine whether one sample will accurately reflect the population, statisticians have determined what to expect for most possible samples.
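Because X is binomial with n = 25 and p = 0.4, the chance of each of the four outcomes listed above can be computed exactly. The following is a minimal sketch using only the standard library; the helper name `binomial_pmf` is introduced here for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 25, 0.40
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and standard deviation of the count X, then of the proportion X / n.
mean_x = sum(k * pmf[k] for k in range(n + 1))
var_x = sum((k - mean_x) ** 2 * pmf[k] for k in range(n + 1))
sd_p_hat = math.sqrt(var_x) / n

print(f"expected count: {mean_x:.2f}, s.d. of the sample proportion: {sd_p_hat:.4f}")
for x in (12, 9, 10, 7):  # the four sample counts from the slide
    print(f"X = {x:2d}  proportion = {x / n:.2f}  P(X = {x}) = {pmf[x]:.4f}")
```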

Example – Gene For A Disease

Sampling Distribution For One Sample Proportion
Statistics as Random Variables
Each time a new sample is taken, the value of the sample statistic will change.
The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.
Many statistics of interest have sampling distributions that are approximately normal distributions.

WHAT THEN IS THE APPROPRIATE PROBABILITY MODEL? ANSWER: IT IS AMAZING AND FORTUNATE THAT A NORMAL MODEL IS JUST THE RIGHT ONE FOR THE HISTOGRAMS OF SAMPLE PROPORTIONS. HOW GOOD IS THE NORMAL MODEL? IT IS GOOD IF THE FOLLOWING ASSUMPTIONS AND CONDITIONS HOLD.

ASSUMPTIONS AND CONDITIONS
INDEPENDENCE ASSUMPTION: THE SAMPLED VALUES MUST BE INDEPENDENT OF EACH OTHER.
SAMPLE SIZE ASSUMPTION: THE SAMPLE SIZE, n, MUST BE LARGE ENOUGH.
REMARK: ASSUMPTIONS ARE HARD, OFTEN IMPOSSIBLE, TO CHECK. THAT’S WHY WE ASSUME THEM. FORTUNATELY, SOME CONDITIONS MAY PROVIDE INFORMATION ABOUT THE ASSUMPTIONS.

CONDITIONS
RANDOMIZATION CONDITION: THE DATA VALUES MUST BE SAMPLED RANDOMLY. IF POSSIBLE, USE A SIMPLE RANDOM SAMPLING DESIGN TO SAMPLE THE POPULATION OF INTEREST.
10% CONDITION: THE SAMPLE SIZE, n, MUST BE NO LARGER THAN 10% OF THE POPULATION OF INTEREST.
SUCCESS/FAILURE CONDITION: THE SAMPLE SIZE HAS TO BE BIG ENOUGH SO THAT WE EXPECT AT LEAST 10 SUCCESSES AND AT LEAST 10 FAILURES. THAT IS, $np \ge 10$ AND $n(1 - p) \ge 10$.

Sampling Distribution for a Sample Proportion (The Central Limit Theorem)
Let $p$ = population proportion of interest or binomial probability of success.
Let $\hat{p}$ = sample proportion or proportion of successes.
If numerous random samples or repetitions of the same size $n$ are taken, the distribution of possible values of $\hat{p}$ is approximately a normal curve distribution with
Mean = $p$
Standard deviation = s.d.($\hat{p}$) = $\sqrt{p(1 - p)/n}$
This approximate distribution is the sampling distribution of $\hat{p}$.
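One way to see how well the normal curve approximates the sampling distribution of a sample proportion is to compare it with the exact binomial calculation. The sketch below reuses the gene example (n = 25, p = 0.40); the function names and the choice of the cutoff 12/25 are assumptions made for illustration, and no continuity correction is applied.

```python
import math
from statistics import NormalDist


def normal_model_for_proportion(p, n):
    """Normal model with mean p and standard deviation sqrt(p(1 - p)/n)."""
    return NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))


def exact_binomial_cdf(x, n, p):
    """Exact P(X <= x) for a binomial count X."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


n, p = 25, 0.40
model = normal_model_for_proportion(p, n)

# Probability that the sample proportion is at most 12/25 = 0.48.
approx = model.cdf(12 / n)
exact = exact_binomial_cdf(12, n, p)

print(f"s.d. of the sample proportion: {model.stdev:.4f}")
print(f"normal approximation: {approx:.4f}   exact binomial: {exact:.4f}")
```

With n this small the two answers differ noticeably, which is one reason the sample size conditions matter.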

Estimating the Population Proportion from a Single Sample Proportion
In practice, we don’t know the true population proportion $p$, so we cannot compute the standard deviation of $\hat{p}$, s.d.($\hat{p}$) = $\sqrt{p(1 - p)/n}$.
In practice, we only take one random sample, so we only have one sample proportion $\hat{p}$. Replacing $p$ with $\hat{p}$ in the standard deviation expression gives us an estimate that is called the standard error of $\hat{p}$:
s.e.($\hat{p}$) = $\sqrt{\hat{p}(1 - \hat{p})/n}$.
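As a small worked example, the standard error can be computed for the earlier sample with x = 437 elements of interest out of n = 1000. This is a sketch only; the function name `standard_error_of_proportion` is made up for the illustration.

```python
import math


def standard_error_of_proportion(x, n):
    """Return the sample proportion x/n and its standard error."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


p_hat, se = standard_error_of_proportion(x=437, n=1000)
print(f"sample proportion = {p_hat:.3f}, standard error = {se:.4f}")
```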

More Examples for which Rule Applies
Election Polls: to estimate proportion who favor a candidate; units = all voters.
Television Ratings: to estimate proportion of households watching TV program; units = all households with TV.
Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers.
Testing ESP: to estimate probability a person can successfully guess which of 5 symbols is on a hidden card; repeatable situation = a guess.

EXAMPLE FROM PRACTICE SHEET
ASSUME THAT 30% OF STUDENTS AT A UNIVERSITY WEAR CONTACT LENSES.
(A) WE RANDOMLY PICK 100 STUDENTS. LET $\hat{p}$ REPRESENT THE PROPORTION OF STUDENTS IN THIS SAMPLE WHO WEAR CONTACTS. WHAT’S THE APPROPRIATE MODEL FOR THE DISTRIBUTION OF $\hat{p}$? SPECIFY THE NAME OF THE DISTRIBUTION, THE MEAN, AND THE STANDARD DEVIATION. BE SURE TO VERIFY THAT THE CONDITIONS ARE MET.
(B) WHAT’S THE APPROXIMATE PROBABILITY THAT MORE THAN ONE THIRD OF THIS SAMPLE WEAR CONTACTS?

SOLUTION
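The worked solution itself did not survive in this transcript. The sketch below shows one way the contact lens example could be worked with the normal model; it is not the original slide content. It assumes the 100 students are a random sample and make up less than 10% of the university, which the problem statement does not confirm.

```python
import math
from statistics import NormalDist

p = 0.30  # stated proportion of students who wear contact lenses
n = 100   # sample size

# Success/failure condition: expect at least 10 successes and 10 failures.
expected_successes = n * p
expected_failures = n * (1 - p)
conditions_met = expected_successes >= 10 and expected_failures >= 10

# (A) Normal model for the sample proportion.
sd_p_hat = math.sqrt(p * (1 - p) / n)
model = NormalDist(mu=p, sigma=sd_p_hat)

# (B) Approximate probability that more than one third of the sample wear contacts.
prob_more_than_third = 1 - model.cdf(1 / 3)

print(f"conditions met: {conditions_met}")
print(f"model: Normal(mean = {p}, s.d. = {sd_p_hat:.4f})")
print(f"P(sample proportion > 1/3) is about {prob_more_than_third:.3f}")
```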

EXAMPLE FROM PRACTICE SHEET
INFORMATION ON A PACKET OF SEEDS CLAIMS THAT THE GERMINATION RATE IS 92%. WHAT’S THE PROBABILITY THAT MORE THAN 95% OF THE 160 SEEDS IN THE PACKET WILL GERMINATE? BE SURE TO DISCUSS YOUR ASSUMPTIONS AND CHECK THE CONDITIONS THAT SUPPORT YOUR MODEL.

SOLUTION
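The original solution is missing from this transcript. As a hedged sketch, not the original slide content, the seed example could be worked as follows. It assumes the seeds germinate independently of one another and that the packet can be treated as a random sample of all such seeds.

```python
import math
from statistics import NormalDist

p = 0.92  # claimed germination rate
n = 160   # seeds in the packet

# Success/failure condition: both expected counts should be at least 10.
expected_germinating = n * p
expected_failing = n * (1 - p)
conditions_met = expected_germinating >= 10 and expected_failing >= 10

# Normal model for the proportion of seeds that germinate.
sd_p_hat = math.sqrt(p * (1 - p) / n)
model = NormalDist(mu=p, sigma=sd_p_hat)

z = (0.95 - p) / sd_p_hat
prob_more_than_95 = 1 - model.cdf(0.95)

print(f"expected successes = {expected_germinating:.1f}, failures = {expected_failing:.1f}")
print(f"s.d. = {sd_p_hat:.4f}, z = {z:.2f}")
print(f"P(more than 95% germinate) is about {prob_more_than_95:.3f}")
```

The expected number of failures is only a little above 10, so the normal model is a rough guide here.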

Sampling Distribution for One Sample Mean
Suppose we want to estimate the mean weight loss for all who attend a clinic for 10 weeks. Suppose (unknown to us) the distribution of weight loss is approximately normal with mean 8 pounds and standard deviation 5 pounds, written N(8, 5). We will take a random sample of 25 people from this population and record for each X = weight loss. We know the value of the sample mean will vary for different samples of n = 25. What do we expect those means to be?

Familiar Examples
Estimating the mean of a quantitative variable.
Example research questions: What is the mean time that college students watch TV per day? What is the mean pulse rate of women?
Population parameter: $\mu$ = population mean for the variable.
Sample estimate: $\bar{x}$ = sample mean for the variable.

Many Possible Samples
Four possible random samples of 25 people:
Sample 1: Mean = 8.32 pounds, standard deviation = 4.74 pounds.
Sample 2: Mean = 6.76 pounds, standard deviation = 4.73 pounds.
Sample 3: Mean = 8.48 pounds, standard deviation = 5.27 pounds.
Sample 4: Mean = 7.16 pounds, standard deviation = 5.93 pounds.
Note: Each sample gave a different answer, which did not always match the population mean of 8 pounds. Although we cannot determine whether one sample mean will accurately reflect the population mean, statisticians have determined what to expect for most possible sample means.

The Normal Curve Approximation Rule for Sample Means (The Central Limit Theorem)
Let $\mu$ = mean for population of interest.
Let $\sigma$ = standard deviation for population of interest.
Let $\bar{x}$ = sample mean.
If numerous random samples of the same size $n$ are taken, the distribution of possible values of $\bar{x}$ is approximately a normal curve distribution with
Mean = $\mu$
Standard deviation = s.d.($\bar{x}$) = $\sigma/\sqrt{n}$
This approximate distribution is the sampling distribution of $\bar{x}$.
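The rule can be checked by simulation for the weight loss example, where the population is taken to be normal with mean 8 pounds and standard deviation 5 pounds and each sample has n = 25. The sketch below is illustrative only; the number of repetitions, the seed, and the function name `simulate_sample_means` are assumptions of the example.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n from a normal population; return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means


sample_means = simulate_sample_means(mu=8, sigma=5, n=25, num_samples=4000)

center = statistics.mean(sample_means)   # the rule predicts about 8
spread = statistics.stdev(sample_means)  # the rule predicts about 5 / sqrt(25) = 1

print(f"mean of the sample means: {center:.3f}")
print(f"s.d. of the sample means: {spread:.3f}")
```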

Standard Error of the Mean
In practice, the population standard deviation $\sigma$ is rarely known, so we cannot compute the standard deviation of $\bar{x}$, s.d.($\bar{x}$) = $\sigma/\sqrt{n}$.
In practice, we only take one random sample, so we only have the sample mean $\bar{x}$ and the sample standard deviation $s$. Replacing $\sigma$ with $s$ in the standard deviation expression gives us an estimate that is called the standard error of $\bar{x}$:
s.e.($\bar{x}$) = $s/\sqrt{n}$.
For a sample of n = 25 weight losses, the standard deviation is s = 4.74 pounds. So the standard error of the mean is $4.74/\sqrt{25}$ = 0.948 pounds.
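The arithmetic for the standard error of the mean can be reproduced in a few lines. This is a minimal sketch; the function name `standard_error_of_mean` is chosen for the illustration.

```python
import math


def standard_error_of_mean(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


# Sample of n = 25 weight losses with sample standard deviation s = 4.74 pounds.
se_mean = standard_error_of_mean(s=4.74, n=25)
print(f"standard error of the mean = {se_mean:.3f} pounds")
```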

ASSUMPTIONS AND CONDITIONS
INDEPENDENCE ASSUMPTION: THE SAMPLED VALUES MUST BE INDEPENDENT OF EACH OTHER.
SAMPLE SIZE ASSUMPTION: THE SAMPLE SIZE MUST BE SUFFICIENTLY LARGE.
REMARK: WE CANNOT CHECK THESE DIRECTLY, BUT WE CAN THINK ABOUT WHETHER THE INDEPENDENCE ASSUMPTION IS PLAUSIBLE.

CONDITIONS
RANDOMIZATION CONDITION: THE DATA VALUES MUST BE SAMPLED RANDOMLY, OR THE CONCEPT OF A SAMPLING DISTRIBUTION MAKES NO SENSE. IF POSSIBLE, USE A SIMPLE RANDOM SAMPLING DESIGN TO OBTAIN THE SAMPLE.
10% CONDITION: WHEN THE SAMPLE IS DRAWN WITHOUT REPLACEMENT (AS IS USUALLY THE CASE), THE SAMPLE SIZE, n, SHOULD BE NO MORE THAN 10% OF THE POPULATION.
LARGE ENOUGH SAMPLE CONDITION: IF THE POPULATION IS UNIMODAL AND SYMMETRIC, EVEN A FAIRLY SMALL SAMPLE IS OKAY. IF THE POPULATION IS STRONGLY SKEWED, IT CAN TAKE A PRETTY LARGE SAMPLE TO ALLOW USE OF A NORMAL MODEL TO DESCRIBE THE DISTRIBUTION OF SAMPLE MEANS.

Examples for which Rule Applies
Average Weight Loss: to estimate average weight loss; weight assumed bell-shaped; population = all current and potential clients.
Average Age At Death: to estimate average age at which left-handed adults (over 50) die; ages at death not bell-shaped so need n ≥ 30; population = all left-handed people who live to be at least 50.
Average Student Income: to estimate mean monthly income of students at university who work; incomes not bell-shaped and outliers likely, so need large random sample of students; population = all students at university who work.

EXAMPLES FROM PRACTICE SHEET

Increasing the Size of the Sample
Suppose we take n = 100 people instead of just 25. The standard deviation of the mean would be s.d.($\bar{x}$) = $\sigma/\sqrt{n} = 5/\sqrt{100}$ = 0.5 pounds.
For samples of n = 25, sample means are likely to range between 8 ± 3 pounds, that is, 5 to 11 pounds.
For samples of n = 100, sample means are likely to range only between 8 ± 1.5 pounds, that is, 6.5 to 9.5 pounds.
Larger samples tend to result in more accurate estimates of population values than smaller samples.
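The two ranges quoted on this slide match the population mean plus or minus three standard deviations of the sample mean. The sketch below reproduces them under that reading; the function name `three_sd_range` is an assumption of the example.

```python
import math


def three_sd_range(mu, sigma, n):
    """Return s.d. of the sample mean and the interval mu +/- 3 standard deviations."""
    sd_mean = sigma / math.sqrt(n)
    return sd_mean, mu - 3 * sd_mean, mu + 3 * sd_mean


for n in (25, 100):
    sd_mean, low, high = three_sd_range(mu=8, sigma=5, n=n)
    print(f"n = {n:3d}: s.d. of mean = {sd_mean} pounds, likely range {low} to {high} pounds")
```

Quadrupling the sample size from 25 to 100 halves the standard deviation of the sample mean, so the range is half as wide.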

The Central Limit Theorem (CLT)
The Central Limit Theorem states that if $n$ is sufficiently large, the sample means of random samples from a population with mean $\mu$ and finite standard deviation $\sigma$ are approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
Technical Note: The mean and standard deviation given in the CLT hold for any sample size; it is only the “approximately normal” shape that requires $n$ to be sufficiently large.

Sampling Distribution for Any Statistic
Every statistic has a sampling distribution, but the appropriate distribution may not always be normal, or even approximately bell-shaped. Construct an approximate sampling distribution for a statistic by actually taking repeated samples of the same size from a population and constructing a relative frequency histogram for the values of the statistic over the many samples.
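The repeated-sampling recipe can be sketched for a statistic other than the mean or proportion. The example below uses the sample standard deviation of n = 25 values from the weight loss population, N(8, 5). The choice of statistic, the bin width, the number of repetitions, and all function names are assumptions made for illustration.

```python
import math
import random
import statistics
from collections import Counter


def simulate_statistic(statistic, draw_sample, num_samples):
    """Apply a statistic to many independently drawn samples."""
    return [statistic(draw_sample()) for _ in range(num_samples)]


def relative_frequencies(values, bin_width):
    """Map the left edge of each bin to the fraction of values falling in that bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    total = len(values)
    return {left: counts[left] / total for left in sorted(counts)}


rng = random.Random(1)  # fixed seed so the run is repeatable


def draw_weight_loss_sample():
    """One random sample of 25 weight losses from a N(8, 5) population."""
    return [rng.gauss(8, 5) for _ in range(25)]


sample_sds = simulate_statistic(statistics.stdev, draw_weight_loss_sample, 3000)
histogram = relative_frequencies(sample_sds, bin_width=0.5)

# Text version of the relative frequency histogram (one '#' per percentage point).
for left, freq in histogram.items():
    print(f"{left:4.1f} to {left + 0.5:4.1f}: {'#' * round(freq * 100)}")
```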