
Inference and accuracy We learned how to use box models to estimate the chance that the percentage of some group in the sample falls in a given interval. There we first set up the box, then analyzed the sample (the draws); the reasoning ran from the box to the draws. It is often very useful to turn things around: analyze the draws, then draw conclusions about the box. This is called inference from the sample to the population.

Introduction Suppose a survey organization wants to know the percentage of Democrats in a certain district. They might estimate it by taking a simple random sample. In general, the percentage of Democrats in the sample will be a good estimate of the percentage of Democrats in the district. Since the sample is chosen at random, it is possible to say how accurate the estimate is likely to be, just from the size and composition of the sample. This technique is one of the key ideas in statistical theory.

Example

We all know that the percentage in the sample differs from the percentage in the whole district. It will be off by some amount, the chance error. So the candidate is a little worried about the chance error: if the estimate is off by as much as 3%, he loses. However, the pollster gives the candidate some good news: we can be about 95% confident that the estimate is right to within 2%. That looks good.

Questions The pollster’s statement raises some natural questions: Where does the 95% come from? What does 95% confidence mean? Where does the 2% come from? Why is it within 2%, and not 1% or 3%? Let us go back to the example to see how the sampling procedure works.

Example First of all, we need to set up a box model as before: In the district, there are 100,000 eligible voters. So in the box, there are 100,000 tickets. The box is just the population. The simple random sample size is 2,500. So there are 2,500 draws made at random. Once again, it makes little difference whether the draws are made with or without replacement, because the sample is only a small part of the population. In the box, there are two kinds of tickets: 1’s and 0’s. The 1’s stand for the votes for the candidate, the 0’s stand for the other votes. The sum of the draws will be the number of voters in the sample who favor the candidate. This completes the model.

Example
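
The figures quoted elsewhere in this transcript (1,328 of the 2,500 sampled voters in favor) are enough to sketch the calculation. The following is a minimal Python sketch, not the original slide’s working: the function name `percentage_and_se` is made up for illustration, and the SD of the box is estimated from the sample fractions (the bootstrap).

```python
import math

def percentage_and_se(ones, n):
    """Return (sample percentage, estimated SE for the percentage), both in percent.

    ones: number of 1's among the draws (voters in favor)
    n:    number of draws (sample size)
    """
    frac_ones = ones / n
    # Bootstrap: use the sample fractions in place of the unknown box fractions.
    sd_box = math.sqrt(frac_ones * (1 - frac_ones))
    # SE for the sum of the draws, then converted to a percentage of n.
    se_sum = math.sqrt(n) * sd_box
    se_pct = se_sum / n * 100
    return frac_ones * 100, se_pct

pct, se = percentage_and_se(1328, 2500)
print(f"sample percentage: {pct:.1f}%, give or take {se:.1f}% or so")
```

This reproduces the "around 53%, give or take 1% or so" quoted in the remarks.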

Remarks The calculation shows that the candidate will get around 53% of the votes, give or take 1% or so. The estimate is therefore unlikely to be off by as much as 3%: that would be 3 SEs off. So he is well on the safe side of 50%, and he should enter the primary. The bootstrap: When sampling from a 0-1 box whose composition is unknown, the SD of the box can be estimated by substituting the fractions of 0’s and 1’s in the sample for the unknown fractions in the box. The estimate is good when the sample is reasonably large.

Remarks The bootstrap procedure may seem crude. But in fact, even with moderate-sized samples, the fractions among the draws are likely to be quite close to the fractions in the box. One thing should be pointed out here: The expected value for the sum of the draws is unknown. And so is the expected value for the percentage. Note that in the example, 1,328 voters is an observed value. We cannot compute the exact value of the chance error, but we can compute the likely size of the chance error by computing the SE.

Another Example In fall 2005, a city university had 25,000 registered students. To estimate the percentage who were living at home, a simple random sample of 400 students was drawn. It turned out that 317 of them were living at home. Estimate the percentage of students at the university who were living at home in fall 2005. What is the standard error of the estimate?

Solution
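
A minimal sketch of the calculation this problem asks for, in Python. It assumes the bootstrap estimate of the SD of the box and ignores the correction factor for drawing without replacement (400 is a small part of 25,000); the variable names are chosen for illustration only.

```python
import math

n = 400          # sample size (number of draws)
at_home = 317    # students in the sample living at home (the 1's)

frac_ones = at_home / n
# Bootstrap: estimate the SD of the 0-1 box from the sample fractions.
sd_box = math.sqrt(frac_ones * (1 - frac_ones))
se_sum = math.sqrt(n) * sd_box      # SE for the number of 1's in the sample
pct = frac_ones * 100               # estimated population percentage
se_pct = se_sum / n * 100           # SE for the percentage

print(f"estimate: {pct:.2f}%, SE: {se_pct:.2f}%")
```

The estimate comes out at about 79%, with a standard error of about 2 percentage points.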

Remark In the two examples, we focused on simple random sampling, where the mathematics is easiest. In practice, survey organizations use much more complicated designs. But with probability methods, it is generally possible to compute how big the chance errors are likely to be. This is one of the great advantages of probability methods for drawing samples.

Confidence intervals In the 1st example, we still don’t know where the 95% comes from, or what 95% confidence means. To make inferences about the population percentage from the sample, we introduce the confidence interval, which expresses the accuracy of the estimate.

Introduction In the 2nd example, we know that the likely size of the chance error is about 2%. We have the equation: sample percentage = population percentage + chance error. It is still possible that the sample percentage is 4% or 6% off, that is, 2 SEs or 3 SEs away from the population percentage, but errors that large are less likely. Errors within 1 SE or 2 SEs are more likely. That is, there is about a 68% chance that the interval “sample percentage ± SE” covers the population percentage, and about a 95% chance that the interval “sample percentage ± 2 SEs” covers the population percentage.

Definition

We can always define a confidence interval with a different confidence level: Indeed, any confidence level except 100% is possible, by going the right number of SEs in either direction from the sample percentage. For instance: The interval “sample percentage ± 1 SE” is a 68%-confidence interval for the population percentage; The interval “sample percentage ± 2 SEs” is a 95%-confidence interval for the population percentage; The interval “sample percentage ± 3 SEs” is a 99.7%-confidence interval for the population percentage.
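
The levels 68%, 95% and 99.7% are areas under the normal curve. As a rough check, the sketch below computes the area between −k and +k standard units from the error function in Python’s standard library; the helper name `confidence_level` is an assumption made for this illustration.

```python
import math

def confidence_level(num_ses):
    """Area under the standard normal curve between -num_ses and +num_ses, in percent."""
    return math.erf(num_ses / math.sqrt(2)) * 100

for k in (1, 2, 3):
    print(f"sample percentage ± {k} SE: about {confidence_level(k):.2f}%")
```

The areas are close to 68.3%, 95.4% and 99.7%, which is why the levels are usually quoted as "about" 68%, 95% and 99.7%.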

Definition

Example A simple random sample of 1,600 persons is taken to estimate the percentage of Democrats among the 25,000 eligible voters in a town. It turns out that 917 people in the sample are Democrats. Q: Find a 95%-confidence interval for the percentage of Democrats among all 25,000 eligible voters.

Solution

The tickets corresponding to Democrats are marked 1, and the others are marked 0. The number of Democrats in the sample is like the sum of the draws. This completes the model. Since we don’t know the composition of the box, we have to apply the bootstrap procedure to estimate the SD of the box. The fraction of 1’s can be estimated by the sample fraction, 917/1,600 ≈ 0.57. Then the fraction of 0’s can be estimated by 1 − 0.57 = 0.43.

Solution
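
A minimal sketch of the remaining arithmetic in Python, assuming the bootstrap estimate of the SD and the interval “sample percentage ± 2 SEs”. The variable names are illustrative, and the correction factor for drawing without replacement is ignored here.

```python
import math

n = 1600          # sample size
democrats = 917   # Democrats in the sample (tickets marked 1)

frac_ones = democrats / n
# Bootstrap estimate of the SD of the 0-1 box.
sd_box = math.sqrt(frac_ones * (1 - frac_ones))
se_pct = math.sqrt(n) * sd_box / n * 100   # SE for the sample percentage

pct = frac_ones * 100
low, high = pct - 2 * se_pct, pct + 2 * se_pct   # 95%-confidence interval

print(f"sample percentage: {pct:.1f}%, SE: {se_pct:.2f}%")
print(f"95%-confidence interval: {low:.1f}% to {high:.1f}%")
```

Rounded to one decimal place, the interval runs from 54.8% to 59.8%.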

Remarks Confidence levels are often quoted as being “about” so much. For instance, in the previous problem, we can be about 95% confident that between 54.8% and 59.8% of the eligible voters in this town are Democrats. There are two reasons: (i) The standard errors have been estimated from the data. (ii) The normal approximation has been used. So the statement relies on two assumptions: that the percentage composition of the population is very close to that of the sample, and that the sample is large enough for the sample percentage to follow the normal curve.

Interpreting a confidence interval

In the example of the students living at home, the 95%-confidence interval runs from 75% to 83%. It seems very natural to say “There is a 95% chance that the population percentage is between 75% and 83%.” But this causes a problem. In the frequency theory, a probability represents the percentage of the time that something will happen. The percentage of students who were living at home is fixed. It won’t change no matter how many times we sample the students. So this percentage is either between 75% and 83%, or not. In terms of the frequency theory, the chance must be 100% or 0%.

Interpreting a confidence interval Hence, we don’t say “There is a 95% chance that the population percentage is between 75% and 83%.” We say “We are about 95% confident that the population percentage is between 75% and 83%.” The word “confident” is to remind you that the probability here is in the sampling procedure, not in the parameter. The idea behind the sampling procedure can be explained in the following way:

Sampling procedure The probability (95%) here is about the sample, and the sample percentage approximately follows the normal curve. Note that the confidence interval depends on the sample. With some samples, the interval “sample percentage ± 2 SEs” covers the population percentage. But with some other samples, the interval fails to cover it. So the confidence level of 95% simply states that: for about 95% of all samples, the interval “sample percentage ± 2 SEs” covers the population percentage.

Sampling procedure We usually cannot tell whether the particular interval covers the population percentage or not. This is because we do not know the actual parameter. But we are using a procedure that works about 95% of the time. We may think of the interval as being drawn at random from a box of intervals, where 95% cover the parameter and only 5% fail.

Remark A confidence interval is used when estimating an unknown parameter from sample data. The interval gives a range for the parameter, and a confidence level that the range covers the true value. Probabilities are used when we reason forward, from the box to the draws. Confidence levels are used when we reason backward, from the draws to the box. The idea of a confidence level is a bit difficult, because it involves thinking not only about the actual sample but also about other samples that could have been drawn. The following is an illustration:

Interpreting confidence intervals Suppose we want to estimate the percentage of red marbles in a large box. We use a computer to simulate 100 samples. (To complete the model, we set the percentage to 80%, which in reality would be unknown to us.) Each sample is of size 2,500. For each sample, we compute the 95%-confidence interval using “sample percentage ± 2 SEs”. The percentage is different from sample to sample, and so is the estimated SE. So the intervals have different centers and lengths. About 95% of them should cover the parameter, which is marked by a vertical line. In fact, 96 of the 100 intervals cover the parameter in this simulation.
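
A simulation of this kind can be sketched in a few lines of Python. This is not the program behind the slide: the seed, the helper name `interval_covers`, and the choice to draw with replacement (reasonable for a large box) are all assumptions, so the count of covering intervals will generally differ from the 96 reported above.

```python
import math
import random

def interval_covers(true_frac, n, rng):
    """Draw one sample of size n and report whether
    'sample percentage ± 2 SEs' covers the true fraction."""
    # Each draw is red (1) with probability true_frac, otherwise 0.
    reds = sum(1 for _ in range(n) if rng.random() < true_frac)
    frac = reds / n
    # SE for the sample fraction, with the SD of the box estimated from the sample.
    se = math.sqrt(frac * (1 - frac)) / math.sqrt(n)
    return frac - 2 * se <= true_frac <= frac + 2 * se

rng = random.Random(0)   # fixed seed so the run is repeatable
covered = sum(interval_covers(0.80, 2500, rng) for _ in range(100))
print(f"{covered} of 100 intervals cover the true percentage")
```

Changing the seed changes which intervals miss, but the count should stay in the neighborhood of 95.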

Comments for sampling The conclusion of the last chapter, that it is the sample size, not the population size, that mainly determines the accuracy, holds for most probability methods of drawing samples. We have to point out that the formulas for simple random sampling may not apply to other kinds of samples. This is because the probability for drawing tickets from the box is different from the probability for drawing a sample with a more complicated method. This can be seen by comparing the Gallup Poll samples to simple random samples of the same size.
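
To illustrate the point about sample size for simple random samples, the sketch below compares the SE for a percentage across population sizes and sample sizes. It assumes a 50-50 box and the standard correction factor for draws without replacement, sqrt((N − n)/(N − 1)); neither is stated on this slide, and the function name `se_percentage` is made up for the example.

```python
import math

def se_percentage(n, population, frac_ones=0.5):
    """SE (in percent) for the sample percentage of a simple random
    sample of size n drawn without replacement from a 0-1 box."""
    sd_box = math.sqrt(frac_ones * (1 - frac_ones))
    se_with_replacement = sd_box / math.sqrt(n) * 100
    # Correction factor for drawing without replacement.
    correction = math.sqrt((population - n) / (population - 1))
    return se_with_replacement * correction

# Same sample size, very different population sizes: the SE barely moves.
for population in (100_000, 10_000_000):
    print(f"n=2500, N={population}: SE = {se_percentage(2500, population):.3f}%")

# Same population, a quarter of the sample size: the SE roughly doubles.
print(f"n=625,  N=100000: SE = {se_percentage(625, 100_000):.3f}%")
```

Multiplying the population by 100 leaves the SE near 1%, while cutting the sample to a quarter of its size roughly doubles it.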

The Gallup Poll Here is the table comparing the Gallup Poll with a simple random sample.

The Gallup Poll Most errors were considerably larger than the SE for the simple random samples. One reason is that predictions are based only on part of the sample, namely, those people judged likely to vote. This eliminates about half the sample. Here is the new table for comparison. The simple random sample formula is still not doing well.

The Gallup Poll The reasons are that: (i) The process used to screen out the non-voters may break down at times. (ii) Some voters may still not have decided how to vote when they are interviewed. (iii) Voters may change their minds between the last pre-election poll and election day, especially in close contests. As a result, in reality survey organizations have to use more complicated methods for estimating the SE.

Summary With a simple random sample, the sample percentage is used to estimate the population percentage. The bootstrap procedure: when sampling from a 0-1 box whose composition is unknown, the SD of the box can be estimated by substituting the fractions of 0’s and 1’s in the sample for the unknown fractions in the box. A confidence interval for the population percentage is obtained by going the right number of SEs either way from the sample percentage. The confidence level is read off the normal curve. This method should only be used with large samples.

Summary In the frequency theory of probability, parameters are not subject to chance variation. We use confidence statements instead of probability statements. The formulas for simple random sampling may not apply to other kinds of samples, even if the samples are drawn by probability methods.