INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s parameters Sampling error: – The difference between a sample statistic and a population parameter – Probability theory Permits us to estimate the accuracy or representativeness of the sample
The “Catch-22” of Inferential Statistics – When we collect a sample, we know nothing about the population’s distribution of scores We can calculate the mean (x-bar) & standard deviation (s) of our sample, but and are unknown The shape of the population distribution (normal or skewed?) is also unknown
μ = ??? (N= Thousands ) Sample N = 150 Probability Theory Allows Us To Answer: What is the likelihood that a given sample statistic accurately represents a population parameter? X=9.6 Number of serious crimes committed in year prior to prison for inmates entering the prison system
Sampling Distribution (a.k.a. “Distribution of Sample Outcomes”) – “OUTCOMES” = proportions, means, etc. – From repeated random sampling, a mathematical description of all possible sampling event outcomes And the probability of each one – Permits us to make the link between sample and population… What is the probability that a sample finding is reflects the population? Is something that is true of a sample statistic likely to be true of a population parameter?
Relationship between Sample, Sampling Distribution & Population POPULATION SAMPLING DISTRIBUTION (Distribution of sample means, proportions, or other outcomes) SAMPLE
Sampling Distribution: Characteristics Central tendency – Sample means will cluster around the population mean – Since samples are random, the sample means should be distributed equally on either side of the population mean » The mean of the sampling distribution is always equal to the population mean Shape: Normal distribution – Central Limit Theorem: » Regardless of the shape of a raw score distribution (sample or population) of an interval-ratio variable, the sampling distribution will be approximately normal, as long as sample size is ≥ 100
Sampling Distribution: Characteristics – Dispersion: Standard Error (SE) Measures the spread of sampling error that occurs when a population is sampled repeatedly – Same thing as standard deviation of the sampling distribution – Tells exactly how much error, on average, should exist between the sample mean & the population mean – Formula: σ / √N » However, because σ usually isn’t known, s (sample standard deviation) is used to estimate population standard deviation
Sampling Distribution Standard Error Law of Large Numbers: The larger the sample size (N), the more probable it is that the sample mean will be close to the population mean – In other words: a big sample works better (should give a more accurate estimate of the pop.) than a small one – Makes sense if you study the formula for standard error Sampling Distribution Applet
1. Estimation ESTIMATION
Introduction to Estimation Estimation procedures – Purpose: To estimate population parameters from sample statistics – Using the sampling distribution to infer from a sample to the population – Most commonly used for polling data – 2 components: Point estimate Confidence intervals
Estimation Point Estimate: Value of a sample statistic used to estimate a population parameter Confidence Interval: A range of values around the point estimate Confidence Interval Point Estimate Confidence Limit (Lower) Confidence Limit (Upper)
Example CNN Poll (CNN.com; Feb 20, 2009): Slight majority thinks stimulus package will improve economy “The White House's economic stimulus plan isn't a surefire winner with the American public, but a majority does think the recovery plan will help. According to a new poll, fifty-three percent said the plan will improve economic conditions, while 44 percent said it won't stimulate the economy.” “On an individual level, there was less hope for improvement. According to the poll, 67 percent said it would not help them personally.” “The Poll was conducted Wednesday and Thursday (Feb 18-19, 2009), with 1,046 people questioned by telephone. The survey's sampling error is plus or minus 3 percentage points.”
Estimation – POINT ESTIMATES (another way of saying sample statistics) – CONFIDENCE INTERVAL a.k.a. “MARGIN OF ERROR” Indicates that over the long run, 95 percent of the time, the true pop. value will fall within a range of +/- 3 – Point estimates & confidence interval should be reported together “… but a majority does think the recovery plan will help, according to a new poll. Fifty- three percent said the plan will improve economic conditions, while 44 percent said it won't stimulate the economy. …. The Poll was conducted Wednesday and Thursday (Feb 18-19, 2009), with 1,046 people questioned by telephone. The survey's sampling error is plus or minus 3 percentage points.
Estimation1 : Pick Confidence Level – Confidence LEVEL Probability that the unknown population parameter falls within the interval – Alpha ( » The probability that the parameter is NOT within the interval » is the odds of making an error » Confidence level = 1 - Conventionally, confidence level values are almost always 95%or 99%
Procedure for Constructing an Interval Estimate 2. Divide the probability of error equally into the upper and lower tails of the distribution (2.5% error in each tail with 95% confidence level) – Find the corresponding Z score Z scores
Procedure for Constructing an Interval Estimate 3. Construct the confidence interval Proportions (like the eavesdropping poll example): – Sample point estimate (convert % to a proportion): » “Fifty-three percent said the plan will improve economic conditions…” » 0.53 – Sample size (N) = 1,046 – Formula 7.3 in Healey » Numerator = (your proportion) (1- proportion) » 95% confidence level (replicating results from article) » 99% confidence level – intervals widen as level of confidence increases
Example 1: Estimate for the economic recovery poll p =.53 (53% think it will help) Z = 1.96 (95% confidence interval) N = 1046 (sample size) What happens when we… – Recalculate for N = 10,000 – N back to original, recalculate for p. =.90 – Back to original, but change confidence level to 99%