Lecture 3 Preview: Interval Estimates and the Central Limit Theorem

Presentation transcript:

Lecture 3 Preview: Interval Estimates and the Central Limit Theorem

Review
  Random Variables
  Relative Frequency Interpretation of Probability
Populations, Samples, Estimation Procedures, and the Estimate’s Probability Distribution
  Clint’s Dilemma and His Opinion Poll
  Mean and Variance of the Estimate’s Probability Distribution for a Sample Size of T
Why Is the Mean of the Estimate’s Probability Distribution Important?
Why Is the Variance of the Estimate’s Probability Distribution Important?
Interval Estimates
Central Limit Theorem
Normal Distribution: A Way to Estimate Probabilities
  Properties of the Normal Distribution
  Using the Normal Distribution Table: An Example
  Justifying the Use of the Normal Distribution
  Normal Distribution’s Rules of Thumb

Review: Populations, Samples, and Estimation Procedures

Question: How can we use sample information to draw inferences about a population?

An example, Clint’s poll: 12 of the 16 individuals polled support Clint. EstFrac = 12/16 = .75
Question: Does this poll definitely prove that Clint is ahead?
Answer: No. It is possible for 12 (or more) individuals to support Clint in one poll even when the election is a toss-up.

Random Variables: Before the experiment is conducted:
  Bad news. What we do not know: We cannot determine the numerical value of the random variable with certainty before the experiment is conducted.
  Good news. What we do know: On the other hand, we can often calculate the random variable’s probability distribution, which tells us how likely it is for the random variable to equal each of its possible numerical values.

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the numerical values from the experiment mirrors the random variable’s probability distribution.
  Distribution of the Numerical Values → (after many, many repetitions) → Probability Distribution

Question: How do we describe a distribution?
Answer: Center (Mean) and Spread (Variance). The mean reflects the center of the distribution. The variance reflects the spread of the distribution.
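The answer about the toss-up can be made concrete with a small calculation. The sketch below assumes each of the 16 respondents is an independent draw with a support probability of .50 (the toss-up case); the function name `prob_at_least` is hypothetical and introduced only for this illustration.

```python
from math import comb


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """Probability that at least k_min of n independent respondents support Clint,
    when each respondent supports him with probability p (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Toss-up election (assumed p = .50): chance of seeing 12 or more supporters out of 16
toss_up = prob_at_least(12, 16, 0.5)
print(f"{toss_up:.4f}")  # 0.0384
```

Under these assumptions the probability is small but not zero, which is why one poll cannot definitely prove that Clint is ahead.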

Opinion Poll: Sample Size Equals T

Write the name of every individual in the population on a card. Perform the following procedure $T$ times:
  Thoroughly shuffle the cards.
  Randomly draw one card.
  Ask that individual if he/she supports Clint; the answer determines the numerical value of $v_t$: $v_t$ equals 1 if the $t$th individual polled supports Clint; 0 otherwise.
  Replace the card.
Calculate the fraction of those polled supporting Clint:
  $\mathit{EstFrac} = \dfrac{v_1 + v_2 + \dots + v_T}{T}$, where $T$ = Sample Size
The estimated fraction, EstFrac, is a random variable.

Question: What do we know about the $v_t$’s?

Means
  From our last class (sample size of 2): $\mathrm{Mean}[v_1] = \mathrm{Mean}[v_2] = p$
  Sample size of $T$: $\mathrm{Mean}[v_t] = p$ for each $t$; that is, $\mathrm{Mean}[v_1] = \mathrm{Mean}[v_2] = \dots = \mathrm{Mean}[v_T] = p$
Variances
  From our last class (sample size of 2): $\mathrm{Var}[v_1] = \mathrm{Var}[v_2] = p(1 - p)$
  Sample size of $T$: $\mathrm{Var}[v_t] = p(1 - p)$ for each $t$; that is, $\mathrm{Var}[v_1] = \mathrm{Var}[v_2] = \dots = \mathrm{Var}[v_T] = p(1 - p)$
Independence
  From our last class (sample size of 2): $v_1$ and $v_2$ are independent; their covariance equals 0.
  Sample size of $T$: The $v_t$’s are independent; hence, their covariances equal 0.

where $p$ = ActFrac = Actual fraction of the population supporting Clint
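The card procedure above can be sketched in a few lines. This is a minimal illustration, not the lecture's lab software: the population list, its size, and the function name `run_poll` are all hypothetical, and `rng.choice` stands in for shuffling, drawing one card, and replacing it.

```python
import random


def run_poll(population, sample_size, rng):
    """One poll: draw sample_size cards with replacement and return EstFrac."""
    v = []
    for _ in range(sample_size):
        supports_clint = rng.choice(population)  # draw one card; the card stays in the deck
        v.append(1 if supports_clint else 0)     # v_t = 1 if the t-th individual supports Clint
    return sum(v) / sample_size                  # EstFrac = (v_1 + ... + v_T) / T


rng = random.Random(0)                           # fixed seed so the run is repeatable
population = [True] * 500 + [False] * 500        # hypothetical population with ActFrac = .50
est_frac = run_poll(population, 16, rng)
print(est_frac)
```

Running the function again with a different seed generally gives a different value, which is the sense in which EstFrac is a random variable.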

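Distribution Center: Mean of the Estimate’s Probability Distribution

What we know about the $v_t$’s:
  $\mathrm{Mean}[v_t] = p$ for each $t$; that is, $\mathrm{Mean}[v_1] = \mathrm{Mean}[v_2] = \dots = \mathrm{Mean}[v_T] = p$
  $\mathrm{Var}[v_t] = p(1 - p)$ for each $t$; that is, $\mathrm{Var}[v_1] = \mathrm{Var}[v_2] = \dots = \mathrm{Var}[v_T] = p(1 - p)$
  The $v_t$’s are independent; that is, all their covariances equal 0.
  where $p$ = ActFrac = Actual fraction of the population supporting Clint

Arithmetic of means:
  $\mathrm{Mean}[cx] = c\,\mathrm{Mean}[x]$
  $\mathrm{Mean}[x + y] = \mathrm{Mean}[x] + \mathrm{Mean}[y]$

Derivation, with $\mathit{EstFrac} = \frac{1}{T}(v_1 + v_2 + \dots + v_T)$:

$$
\begin{aligned}
\mathrm{Mean}[\mathit{EstFrac}] &= \mathrm{Mean}\!\left[\tfrac{1}{T}(v_1 + v_2 + \dots + v_T)\right] \\
&= \tfrac{1}{T}\,\mathrm{Mean}[v_1 + v_2 + \dots + v_T] && \text{since } \mathrm{Mean}[cx] = c\,\mathrm{Mean}[x] \\
&= \tfrac{1}{T}\left(\mathrm{Mean}[v_1] + \mathrm{Mean}[v_2] + \dots + \mathrm{Mean}[v_T]\right) && \text{since } \mathrm{Mean}[x + y] = \mathrm{Mean}[x] + \mathrm{Mean}[y] \\
&= \tfrac{1}{T}\,(p + p + \dots + p) && \text{since } \mathrm{Mean}[v_1] = \dots = \mathrm{Mean}[v_T] = p \\
&= \tfrac{1}{T}\,(T p) && \text{How many } p \text{ terms are there? } T \\
&= p
\end{aligned}
$$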

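Distribution Spread: Variance of the Estimate’s Probability Distribution

What we know about the $v_t$’s:
  $\mathrm{Mean}[v_t] = p$ for each $t$; that is, $\mathrm{Mean}[v_1] = \mathrm{Mean}[v_2] = \dots = \mathrm{Mean}[v_T] = p$
  $\mathrm{Var}[v_t] = p(1 - p)$ for each $t$; that is, $\mathrm{Var}[v_1] = \mathrm{Var}[v_2] = \dots = \mathrm{Var}[v_T] = p(1 - p)$
  The $v_t$’s are independent; hence, all their covariances equal 0.
  where $p$ = ActFrac = Actual fraction of the population supporting Clint

Arithmetic of variances:
  $\mathrm{Var}[cx] = c^2\,\mathrm{Var}[x]$
  $\mathrm{Var}[x + y] = \mathrm{Var}[x] + 2\,\mathrm{Cov}[x, y] + \mathrm{Var}[y]$
  When $x$ and $y$ are independent, $\mathrm{Cov}[x, y] = 0$, so $\mathrm{Var}[x + y] = \mathrm{Var}[x] + \mathrm{Var}[y]$

Derivation, with $\mathit{EstFrac} = \frac{1}{T}(v_1 + v_2 + \dots + v_T)$:

$$
\begin{aligned}
\mathrm{Var}[\mathit{EstFrac}] &= \mathrm{Var}\!\left[\tfrac{1}{T}(v_1 + v_2 + \dots + v_T)\right] \\
&= \tfrac{1}{T^2}\,\mathrm{Var}[v_1 + v_2 + \dots + v_T] && \text{since } \mathrm{Var}[cx] = c^2\,\mathrm{Var}[x] \\
&= \tfrac{1}{T^2}\left(\mathrm{Var}[v_1] + \mathrm{Var}[v_2] + \dots + \mathrm{Var}[v_T]\right) && \text{since } \mathrm{Var}[x + y] = \mathrm{Var}[x] + \mathrm{Var}[y] \\
&= \tfrac{1}{T^2}\left(p(1 - p) + \dots + p(1 - p)\right) && \text{since } \mathrm{Var}[v_1] = \dots = \mathrm{Var}[v_T] = p(1 - p) \\
&= \tfrac{1}{T^2}\,T\,p(1 - p) && \text{How many } p(1 - p) \text{ terms are there? } T \\
&= \frac{p(1 - p)}{T}
\end{aligned}
$$

Summary: $\mathrm{Mean}[\mathit{EstFrac}] = p$ and $\mathrm{Var}[\mathit{EstFrac}] = \dfrac{p(1 - p)}{T}$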
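The two results, a mean of p and a variance of p(1 − p)/T, are easy to tabulate. The sketch below is only an illustration: the function name `est_frac_moments`, the value p = .50, and the list of sample sizes are assumptions chosen for the example.

```python
from math import sqrt


def est_frac_moments(p: float, T: int):
    """Mean, variance, and standard deviation of EstFrac's probability distribution."""
    mean = p                    # Mean[EstFrac] = p
    variance = p * (1 - p) / T  # Var[EstFrac] = p(1 - p)/T
    return mean, variance, sqrt(variance)


# Assumed values for illustration: p = .50 and a handful of sample sizes
for T in (1, 2, 25, 100, 400):
    mean, variance, sd = est_frac_moments(0.5, T)
    print(f"T={T:>3}  mean={mean:.2f}  variance={variance:.6f}  sd={sd:.4f}")
```

The variance shrinks in proportion to 1/T, so quadrupling the sample size halves the standard deviation.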

Simulations: Confirming the Equations

Equations: $\mathrm{Mean}[\mathit{EstFrac}] = \mathit{ActFrac} = p$ and $\mathrm{Var}[\mathit{EstFrac}] = \dfrac{p(1 - p)}{T}$

For each sample size, the table compares the mean and variance of EstFrac’s probability distribution with the mean (average) and variance of EstFrac’s numerical values from the experiments:

  Simulation      Mean (Average) of EstFrac’s        Variance of EstFrac’s
  Repetitions     Numerical Values                   Numerical Values
                  from the Experiments               from the Experiments
  >1,000,000      ≈.50                               ≈.25
  >1,000,000      ≈.50                               ≈.125
  >1,000,000      ≈.50                               ≈.01
  >1,000,000      ≈.50                               ≈.0025
  >1,000,000      ≈.50

Lab 3.1

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the probability distribution of the random variable. Both distributions have the same mean and variance.

Two Questions
  Why is the distribution center (mean) important? More specifically, Mean[EstFrac] = ActFrac. Why is this important?
  Why is the distribution spread (variance) important?
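A small simulation in the spirit of this slide can check the two equations numerically. This is a sketch under stated assumptions, not the lecture's Lab 3.1: the sample sizes, the 5,000 repetitions (far fewer than the slide's >1,000,000), and the function name `simulate_est_fracs` are all chosen for illustration.

```python
import random
from statistics import fmean, pvariance


def simulate_est_fracs(p, T, repetitions, rng):
    """Repeat the poll many times; each poll draws T independent 0/1 answers."""
    return [sum(rng.random() < p for _ in range(T)) / T for _ in range(repetitions)]


rng = random.Random(1)  # fixed seed so the run is repeatable
p = 0.5                 # assumed ActFrac
results = {}
for T in (1, 2, 25, 100):  # assumed sample sizes
    values = simulate_est_fracs(p, T, 5000, rng)
    results[T] = (fmean(values), pvariance(values))
    # Compare the simulated mean and variance with p and p(1 - p)/T
    print(T, round(results[T][0], 3), round(results[T][1], 5), p * (1 - p) / T)
```

With more repetitions the simulated mean and variance settle closer to the values given by the equations, which is the relative frequency interpretation at work.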

Question: Why is the mean of the estimate’s probability distribution important?

A mean describes the center of its probability distribution.

Unbiased Estimation Procedure
  Formally, an estimation procedure is unbiased whenever the mean of the estimated fraction’s probability distribution equals the actual population fraction: Mean[EstFrac] = ActFrac. So, we have already shown that Clint’s estimation procedure is unbiased.
  Conceptually, an estimation procedure is unbiased whenever it does not systematically underestimate or overestimate the actual population fraction.

Relative Frequency Interpretation of Probability:
  Average of the estimate’s numerical values after many, many repetitions = Mean[EstFrac] = ActFrac
  Now we have some intuition.

If the probability distribution is symmetric, we have even more intuition. In one poll, the chances that the estimated fraction is too low equal the chances that the estimated fraction is too high.

[Figure: Probability Distribution of EstFrac, centered at Mean[EstFrac] = ActFrac; horizontal axis: EstFrac]

Lab 3.2

Question: Why is the variance of the estimate’s probability distribution important when the estimation procedure is unbiased?

Claim: When the estimation procedure is unbiased, the reliability of the estimated fraction depends on the variance of the estimated fraction’s probability distribution.

Interval Estimate Question: What is the probability that the estimated fraction from a single poll lies close to the actual value?
  Small probability → Estimate is unreliable
  Large probability → Estimate is reliable

Quantifying Reliability:
  Decide on a “close to” criterion: .05
  Interval Estimate Question: What is the probability that the estimated fraction from a single poll lies close to, within .05 of, the actual value?
  Strategy: Run a simulation and apply the relative frequency interpretation of probability.
  Question: After many, many repetitions, how frequently is the estimated fraction close to, within .05 of, the actual population fraction?

Simulations: Population Fraction = ActFrac = p = .50. One row per sample size, and hence per variance of the random variable EstFrac:

  Simulation      Percent of Repetitions in which the Numerical
  Repetitions     Value of EstFrac Lies between .45 and .55
  >1,000,000      ≈39%
  >1,000,000      ≈69%
  >1,000,000      ≈95%

Lab 3.3

Interval Estimate Question: What is the probability that the numerical value of the estimated fraction from one repetition of the experiment lies close to, within .05 of, the actual population fraction?

ActFrac = .50

Simulations, one row per sample size (and variance of EstFrac’s probability distribution):

  Simulation      Percent of Repetitions in which the Numerical
  Repetitions     Value of EstFrac Lies between .45 and .55
  >1,000,000      ≈39%
  >1,000,000      ≈69%
  >1,000,000      ≈95%

How can we use the simulation results to answer the interval estimate question?

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the numerical values mirrors the probability distribution.

  The portion of estimates that lie within .05 of the actual value, between .45 and .55, after many, many repetitions
  equals
  The probability that the estimate lies within .05 of the actual value, between .45 and .55, in a single poll (one repetition)

Reconsider the interval estimate question. Again one row per sample size (and variance of EstFrac’s probability distribution):

  Probability that the Numerical Value of EstFrac Lies
  between .45 and .55 in a Single Poll (One Repetition)
  ≈.39
  ≈.69
  ≈.95
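The strategy of running a simulation and reading the share of "close" repetitions as a probability can be sketched as follows. The sample sizes 25, 100, and 400 are assumptions borrowed from the sample sizes used later in the lecture, the 4,000 repetitions are far fewer than the slide's >1,000,000, and the function name `prob_close` is hypothetical. Because EstFrac takes only discrete values, the exact percentages depend on how interval endpoints are treated, so the output need not match the slide's figures.

```python
import random


def prob_close(p, T, tolerance, repetitions, rng):
    """Share of repetitions in which EstFrac lies within `tolerance` of p.

    The small epsilon keeps values that sit exactly on an endpoint inside the
    interval despite floating-point rounding.
    """
    close = 0
    for _ in range(repetitions):
        est_frac = sum(rng.random() < p for _ in range(T)) / T
        if abs(est_frac - p) <= tolerance + 1e-9:
            close += 1
    return close / repetitions


rng = random.Random(2)  # fixed seed so the run is repeatable
shares = {T: prob_close(0.5, T, 0.05, 4000, rng) for T in (25, 100, 400)}
for T, share in shares.items():
    print(T, share)
```

Larger samples produce a larger share of estimates inside the interval, which is the pattern the slide's three rows display.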

In a Single Poll (One Repetition), one row per sample size (and variance of EstFrac’s probability distribution):

  $\mathrm{Prob}[.45 \le \text{Numerical Value} \le .55]$
  ≈.39
  ≈.69
  ≈.95

Generalizing, when an estimation procedure is unbiased:
  Variance large → Small probability that the numerical value of the estimated fraction, EstFrac, from one repetition of the experiment will be close to the actual population fraction, ActFrac → Estimate is unreliable
  Variance small → Large probability that the numerical value of the estimated fraction, EstFrac, from one repetition of the experiment will be close to the actual population fraction, ActFrac → Estimate is reliable

[Figure: Probability Distributions of EstFrac for a large variance and a small variance, both centered at Mean[EstFrac] = ActFrac; horizontal axis: EstFrac]

Summary: When the estimation procedure is unbiased, the variance tells us how reliable the estimate is.

Central Limit Theorem Motivation: Role of the Standard Deviation

Central Limit Theorem: As the sample size becomes larger and larger, we can use the normal distribution to calculate better and better approximations of interval estimates.

Strategy for Motivating and Illustrating the Central Limit Theorem: Four Steps
  Step 1: Use the equations to calculate the mean, variance, and standard deviation of EstFrac’s probability distribution for three sample sizes, 25, 100, and 400.
  Step 2: Use simulations to calculate the percent of repetitions that fall within 1, 2, and 3 standard deviations of Mean[EstFrac], the mean of EstFrac’s probability distribution.
  Step 3: Observe an interesting similarity.
  Step 4: Introduce the normal distribution and use it to calculate the percent of repetitions that fall within 1, 2, and 3 standard deviations of Mean[EstFrac].

Step 1: Mean, variance, and SD for three sample sizes
  Sample Size = T = 25: Mean[EstFrac] = p
  Sample Size = T = 100: Mean[EstFrac] = p
  Sample Size = T = 400: Mean[EstFrac] = p
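As a worked illustration of Step 1, the calculation below applies the variance equation from earlier in the lecture, Var[EstFrac] = p(1 − p)/T, to the three sample sizes. The value p = .50 is an assumption taken from the simulations earlier in the lecture; other values of p would change the numbers but not the pattern.

```latex
% Assumption: p = ActFrac = .50
\begin{aligned}
T = 25:  \quad & \mathrm{Var}[\mathit{EstFrac}] = \frac{.50 \times .50}{25}  = .01,     & \mathrm{SD}[\mathit{EstFrac}] &= \sqrt{.01}     = .10  \\
T = 100: \quad & \mathrm{Var}[\mathit{EstFrac}] = \frac{.50 \times .50}{100} = .0025,   & \mathrm{SD}[\mathit{EstFrac}] &= \sqrt{.0025}   = .05  \\
T = 400: \quad & \mathrm{Var}[\mathit{EstFrac}] = \frac{.50 \times .50}{400} = .000625, & \mathrm{SD}[\mathit{EstFrac}] &= \sqrt{.000625} = .025
\end{aligned}
```

Each fourfold increase in the sample size cuts the standard deviation in half while the mean stays at p.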

Central Limit Theorem Motivation: Role of the Standard Deviation

Central Limit Theorem: As the sample size becomes larger and larger, the normal distribution provides better and better approximations of interval estimates.

Step 2: Use simulations to calculate the percent of repetitions that fall within 1, 2, and 3 standard deviations of Mean[EstFrac], the mean of EstFrac’s probability distribution.

Summary of Mean and SD Calculations
[Table: rows Sample Size, Mean[EstFrac], SD[EstFrac], and, for intervals of 1, 2, and 3 SD’s, From-To Values and Percent of Repetitions (1 SD: 69.2%).]

Lab 3.4

Step 3: Observe an interesting similarity.
Question: What do these results suggest?
Answer: The standard deviations, the SD’s, appear to be critical.
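Step 2 can be sketched with a short simulation that counts how often EstFrac lands within 1, 2, and 3 standard deviations of its mean. This is an illustration under assumptions, not Lab 3.4 itself: p = .50, 3,000 repetitions, and the function name `percent_within_sds` are chosen for the example. Because EstFrac is discrete, the percentages depend on how interval endpoints are treated and need not reproduce the slide's figures.

```python
import random
from math import sqrt


def percent_within_sds(p, T, repetitions, rng):
    """Percent of repetitions with EstFrac within 1, 2, and 3 SDs of Mean[EstFrac] = p."""
    sd = sqrt(p * (1 - p) / T)  # SD[EstFrac]
    counts = [0, 0, 0]
    for _ in range(repetitions):
        est_frac = sum(rng.random() < p for _ in range(T)) / T
        for i, k in enumerate((1, 2, 3)):
            # Epsilon keeps endpoint values inside despite floating-point rounding
            if abs(est_frac - p) <= k * sd + 1e-9:
                counts[i] += 1
    return [100 * c / repetitions for c in counts]


rng = random.Random(3)  # fixed seed so the run is repeatable
table = {T: percent_within_sds(0.5, T, 3000, rng) for T in (25, 100, 400)}
for T, row in table.items():
    print(T, [round(x, 1) for x in row])
```

Measuring intervals in standard deviations rather than in raw units is what makes the rows for different sample sizes comparable.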

Normal Distribution: The Famed Bell-Shaped Curve

The variable z: the “normalized” value of the random variable. z equals the number of standard deviations the value lies from the random variable’s mean:
  $z = \dfrac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Distribution Standard Deviation}}$

Normal Distribution: Three Important Properties
  The normal distribution is bell shaped.
  The normal distribution is symmetric around its mean (center).
  The area beneath the normal curve equals 1.

Normal Distribution Table
  The row specifies the z value’s whole number and its tenths.
  The column specifies the z value’s hundredths.
  The number in the body of the table estimates the probability that the random variable lies more than z standard deviations above its mean.

For example, suppose that z = 1.53: What is the probability that the random variable would lie more than 1.53 standard deviations above its mean? The table entry for row 1.5 and column .03 is .0630.

[Figure: normal curve with the area to the right of z SD’s shaded; the shaded area is the probability of being more than z standard deviations above the distribution mean]
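The table lookup can be checked numerically. The sketch below uses the complementary error function from Python's standard library to compute the upper-tail area of the standard normal distribution; the function names `z_score` and `upper_tail` are hypothetical and introduced only for this illustration.

```python
from math import erfc, sqrt


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd


def upper_tail(z: float) -> float:
    """Probability that a normal random variable lies more than z SDs above its mean."""
    return 0.5 * erfc(z / sqrt(2))


print(f"{upper_tail(1.53):.4f}")  # 0.0630
```

By symmetry, the probability of lying more than 1.53 standard deviations below the mean is the same, so a table of upper-tail areas is enough to answer two-sided questions as well.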

Normal Distribution Rules of Thumb

  Standard Deviations within     Probability of
  Random Variable’s Mean         being within
  1                              ≈.68
  2                              ≈.95
  3                              >.99

Simulations compared with the normal distribution:

  Interval: Standard             Simulations: Percent of Repetitions       Normal
  Deviations within Random       within Interval                           Distribution
  Variable’s Mean                (one column per sample size)              Percentages
  1                              ≈69.2%     ≈68.5%     ≈68.3%              68.26%
  2                              ≈96.3%     ≈95.6%     ≈95.5%              95.44%
  3                              ≈99.9%     ≈99.8%     ≈99.7%              99.74%

Normal Distribution Summary
  The area beneath the normal curve equals 1.
  The normal distribution is symmetric around its mean (center).

Central Limit Theorem: As the sample size becomes larger and larger, we can use the normal distribution to calculate better and better approximations of interval estimates.
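The rules of thumb can be reproduced directly from the error function in Python's standard library. This is a sketch for illustration; the function name `prob_within` is hypothetical. The computed values can differ from the table-based percentages above in the last digit, because those come from table entries rounded to four decimal places.

```python
from math import erf, sqrt


def prob_within(k: float) -> float:
    """Probability that a normal random variable lies within k SDs of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, f"{100 * prob_within(k):.2f}%")
```

Subtracting twice the upper-tail table entry from 1 gives the same quantity, which is how the percentages in the slide's last column are obtained from the normal distribution table.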

Revisiting Clint’s Dilemma

On the eve of the election, Clint must decide whether or not to hold a pre-election party:
  If he is comfortably ahead, he will not hold the party; he will save his campaign funds for a future political endeavor (or a trip to Cancun).
  If he is not comfortably ahead, he will hold the party, hoping to capture more votes.
There is not enough time to canvass everyone, however. What should he do?

Econometrician’s Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

Clint’s Estimation Procedure
  Questionnaire: Are you voting for Clint?
  Procedure: Clint selects 16 students at random.
  Results: 12 students report that they will vote for Clint and 4 against Clint.
  Estimated fraction of population supporting Clint = 12/16 = .75

Clint uses the information collected from the sample to draw inferences about the entire population. Seventy-five percent, .75, of the sample support Clint. This poll suggests that Clint leads.

Question: Should Clint be confident that he has the election in hand, or should he fund the party?