Stats 120A Review of CIs, hypothesis tests and more.

Sample/Population
Last time we collected height/armspan data. Is this a sample or a population?

Gallup Poll, 1/9/07
"As you may know, the Bush administration is considering a temporary but significant increase in the number of U.S. troops in Iraq to help stabilize the situation there. Would you favor or oppose this?"

Results
Results based on 1004 randomly selected adults (> 18 years) interviewed Jan 5-7, 2007. 61% are opposed.
"For results based on this sample, one can say with 95% confidence that the maximum error attributable to sampling and other random effects is ±3 percentage points."

Pop Quiz
Is the value 61% a statistic or a parameter?
The margin of error is given as 3%. What does the margin of error measure?
a) the variability in the sample
b) the variability in the population
c) the variability in repeated sampling

Sampling paradigm
In the U.S., the proportion of adults who are opposed to a surge is p (or p*100%). We take a random sample of n = 1004. The proportion of our sample that is opposed ("p hat") is an estimate of the proportion in the population.

A simulation:
Choose a value to serve as p (say p = 0.6).
Our "data" x consist of 1004 numbers: 0's represent those in favor, 1's those opposed.
589 out of 1004 say "opposed", so p-hat = 589/1004 = 0.5866.
mean(x) = 0.5866, sd(x) = 0.4926

xbar=.5866, s =.493
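
A minimal sketch of how one such simulated survey could be generated in R. The seed and variable names are illustrative assumptions, so the numbers will not match the 589 shown on the slide; the point is that for 0/1 data the sample mean is the sample proportion, and the sample SD is nearly sqrt(phat*(1-phat)).

```r
set.seed(1)                      # arbitrary seed (an assumption), only for reproducibility
n <- 1004
p <- 0.6                         # the value chosen to serve as p

# 0 = in favor, 1 = opposed
x <- sample(c(0, 1), n, replace = TRUE, prob = c(1 - p, p))

phat <- mean(x)                  # sample proportion = mean of the 0/1 data
s <- sd(x)                       # sample SD of the 0/1 data

# For 0/1 data, sd(x) is nearly sqrt(phat * (1 - phat))
approx_s <- sqrt(phat * (1 - phat))

print(c(phat = phat, s = s, approx_s = approx_s))
```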

How do we know the sample proportion is a good estimate of the population proportion?
Law of Large Numbers: sample averages (and proportions) converge on population values, implying that for a finite sample, the sample proportion is likely to be close to the population value if the sample size is large.

Coin flips: sample proportion "settles down" to 0.5
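
The "settling down" can be sketched in R as a running proportion of heads. This is an illustrative simulation, not the data behind the slide's plot; the seed, the number of flips, and the variable names are assumptions.

```r
set.seed(2)                                       # arbitrary seed (an assumption)
flips <- rbinom(5000, size = 1, prob = 0.5)       # 1 = heads, 0 = tails

# Proportion of heads after 1, 2, ..., 5000 flips
running_prop <- cumsum(flips) / seq_along(flips)

plot(running_prop, type = "l",
     xlab = "number of flips", ylab = "proportion of heads")
abline(h = 0.5, lty = 2)                          # the true value

print(running_prop[c(10, 100, 1000, 5000)])
```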

So if we stop earlier, say at n = 10, p-hat = 0.60

Which raises the question: If we stop early, how far away will our sample proportion be from the true value? Or, in a survey setting, if we take a finite sample of n=1004, how far off from the population proportion are we likely to be?

A simulation might help:
– Assume p = 0.60 (population proportion).
– Take a sample of n = 1004 and find p-hat.
– Save this value.
– Repeat the above 3 steps 10,000 times.

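The R code (for the record)

    phat <- numeric(10000)       # one sample proportion per simulated survey
    for (i in 1:10000) {
      # 0 = in favor (prob 0.4), 1 = opposed (prob 0.6)
      x <- sample(c(0, 1), 1004, replace = TRUE, prob = c(0.4, 0.6))
      phat[i] <- sum(x) / 1004   # proportion opposed in this survey
    }
    hist(phat)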

each dot represents one survey of 1004 people

10,000 sample proportions, n = 1004

Observe that...
– sample proportions are centered on the true population value: p = 0.60
– variability is not great: smallest is 0.54, biggest is 0.66
– distribution is bell-shaped

We've just witnessed the Central Limit Theorem
If samples are independent, random, and sufficiently large:
– means (and proportions) follow a nearly Normal distribution
– the mean of the Normal is the mean of the population
– the SD of the Normal (aka the standard error) is the population SD divided by sqrt(n)

CLT applied to sample proportions
– phat is approximately Normally distributed
– its mean is p
– its SE is sqrt(p*(1-p)/n)
For our simulation, p = 0.60, so our p-hats will be centered on 0.6 with an SD of sqrt(0.6*0.4/1004) = 0.0155

We saw
A Normal shape, with mean(phat) close to the expected 0.6 and sd(phat) close to the expected sqrt(0.6*0.4/1004) = 0.0155
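
The comparison between simulated and expected values can be reproduced with a short sketch. It swaps the explicit loop for a vectorized `rbinom` call, which draws the number "opposed" in each survey directly; this is a convenience assumption, not the code used for the slides, and the seed is arbitrary.

```r
set.seed(3)                      # arbitrary seed (an assumption)
n <- 1004
p <- 0.6
reps <- 10000

# Each draw is the count of "opposed" in one survey of n people
phat <- rbinom(reps, size = n, prob = p) / n

sim_mean <- mean(phat)                   # should be near p
sim_sd <- sd(phat)                       # should be near the standard error
theory_se <- sqrt(p * (1 - p) / n)       # CLT standard error

print(c(sim_mean = sim_mean, sim_sd = sim_sd, theory_se = theory_se))
```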

In practice, we don't know p but we can get a good approximation to the standard error using sqrt(phat * (1-phat)/n) rather than sqrt(p*(1-p)/n)

So if we take a random sample of n = 1004 and we see p-hat = 0.61, we know that:
– The true value of p can't be far away.
– SE = sqrt(0.61*0.39/1004) = 0.0154
– So 68% of the time we do this, p will be within 0.0154 of phat.
– And 95% of the time it will be within 2*0.0154 ≈ 0.03.

Which leads us to conclude that the true proportion of the population that opposes a surge is somewhere in the interval 0.61 - 0.03 = 0.58 to 0.61 + 0.03 = 0.64

Confidence intervals This is an example of a 95% confidence interval. Because 95% of all samples will produce a p-hat that is within 2 standard errors of the true value, we are 95% confident that ours is a "good" interval.

Formula
A 95% CI for a proportion is
estimate +/- 2 * (Standard Error)
p-hat +/- 2*sqrt(phat*(1-phat)/n)
0.61 +/- 2*sqrt(0.61*0.39/1004)
(0.58, 0.64)
Note: substituting phat for p in the SE means we get an approximate value.
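
The arithmetic for the poll's interval can be checked with a few lines of R. This is a sketch using the slide's rule of thumb of 2 standard errors; the variable names are illustrative.

```r
n <- 1004
phat <- 0.61                             # observed sample proportion

se <- sqrt(phat * (1 - phat) / n)        # estimated standard error
moe <- 2 * se                            # margin of error, using the 2-SE rule
ci <- c(lower = phat - moe, upper = phat + moe)

print(round(se, 4))
print(round(ci, 2))
```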

What does 95% mean?
If we repeat this infinitely many times:
– take a sample of n = 1004 from the population
– calculate the sample proportion
– find an interval using +/- 2 * SE
then 95% of these CIs will contain the truth and 5% will not. We see only one: (0.58, 0.64). It is either good or bad, but we are confident it is good.
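
"Infinitely many times" cannot be simulated, but 10,000 repetitions give a reasonable picture. The sketch below assumes a true p of 0.6 (as in the earlier simulation), builds a +/- 2 SE interval from each simulated survey, and records how often the interval contains p. The seed and names are assumptions.

```r
set.seed(4)                              # arbitrary seed (an assumption)
n <- 1004
p <- 0.6                                 # the "truth", known only because we simulate
reps <- 10000

phat <- rbinom(reps, size = n, prob = p) / n
se <- sqrt(phat * (1 - phat) / n)        # estimated SE for each survey

lower <- phat - 2 * se
upper <- phat + 2 * se

covered <- lower <= p & p <= upper       # TRUE if the interval is "good"
coverage <- mean(covered)                # fraction of good intervals

print(coverage)
```

The printed fraction should land near 0.95, which is what the confidence level describes: a property of the procedure across repeated samples, not of any single interval.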

Where did the 95% come from?
It came from the Normal curve. The CLT told us that p-hat follows an (approximately) Normal distribution. For Normal distributions, 68% of the probability is within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. A Normal table gives other probabilities.

[Figure: Normal curve centered at phat, marking 68% within 1 SE, 90% within 1.6 SEs, 95% within 2 SEs, and 99.7% within 3 SEs]
Change the confidence level by changing the width of the margin of error
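
Instead of a Normal table, the multipliers can be looked up with R's `qnorm`. The sketch below computes the multiplier and the resulting margin of error for several confidence levels, using the poll's phat = 0.61 and n = 1004; the variable names are illustrative.

```r
conf_levels <- c(0.68, 0.90, 0.95, 0.997)

# Two-sided multiplier: leave (1 - level)/2 probability in each tail
z <- qnorm(1 - (1 - conf_levels) / 2)

n <- 1004
phat <- 0.61
se <- sqrt(phat * (1 - phat) / n)

result <- data.frame(level = conf_levels,
                     multiplier = round(z, 3),
                     margin_of_error = round(z * se, 4))
print(result)
```

The exact multipliers come out close to the rounded values 1, 1.6, 2, and 3 used above; for example, the 95% multiplier is about 1.96 rather than exactly 2.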

The CLT applies to any linear combination of the observations, assuming observations are randomly sampled and independent.
– It does NOT matter what the distribution of the population looks like.
– If n is small, the distribution will be only approximately Normal, and this might be a very poor approximation.

The CLT does NOT apply to
– non-linear combinations, such as the sample median or the standard deviation
– non-random samples
– samples that are dependent
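
The claim that the population's shape does not matter, but small n can give a poor approximation, can be illustrated with a strongly skewed population. The sketch below uses an exponential population, which is an assumption chosen purely for illustration, and compares the skewness of sample means for n = 2 and n = 100. The `skewness` helper is defined here and is not a built-in R function.

```r
set.seed(5)                              # arbitrary seed (an assumption)
reps <- 10000

# Simple moment-based skewness; 0 for a symmetric distribution
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Sample means from a right-skewed (exponential, rate 1) population
means_small <- replicate(reps, mean(rexp(2)))     # n = 2
means_large <- replicate(reps, mean(rexp(100)))   # n = 100

skew_small <- skewness(means_small)
skew_large <- skewness(means_large)

print(c(skew_small = skew_small, skew_large = skew_large))
hist(means_large)                        # roughly bell-shaped
```

With n = 2 the sample means remain clearly right-skewed, while with n = 100 the skewness is much closer to 0.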

Simulation: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Summary
– Confidence level is a statement about the sampling process, not the sample.
– Margin of error is determined to achieve the desired confidence level.
– We can calculate the confidence level only if we know the sampling distribution: the probability distribution of the sample statistic.

Pop Quiz
Is the value 61% a statistic or a parameter?
The margin of error is given as 3%. What does the margin of error measure?
a) the variability in the sample
b) the variability in the population
c) the variability in repeated sampling

For next time:
In WWII, the German army produced tanks with sequential serial numbers. The Allies captured a few tanks and wanted to infer the total number of tanks produced. Suppose you had captured 10 tanks. Come up with three estimators for the total number of tanks. Data: