Copyright © 2005 by Evan Schofer


1 Sociology 5811: Lecture 8: CLT Applications: Confidence Intervals, Examples
Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

2 Announcements Problem Set 3 handed out On course website

3 Review: Sampling Distributions
Q: What is the sampling distribution of the mean? Answer: Sampling Distribution: The distribution of estimates created by taking all possible unique samples (of a fixed size) from a population Q: What is the Standard Error? Answer: The standard deviation of the sampling distribution Q: What does the Standard Error tell you? Answer: How “dispersed” estimates will be around the true parameter value

4 Review: Central Limit Theorem
Q: What does the CLT mean in plain language? 1. As N grows large, the sampling distribution of the mean approaches normality

5 Central Limit Theorem: Visually

6 Implications of the C.L.T
Visually: Suppose we observe mu-hat = 16. There are many possible locations of μ. But mu-hat always falls within the sampling distribution. [Figure: sampling distribution]

7 Implications of the C.L.T
What is the relation between the Standard Error and the size of our sample (N)? Answer: It is an inverse relationship. The standard deviation of the sampling distribution shrinks as N gets larger. Formula: $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N}$. Conclusion: Estimates of the mean based on larger samples tend to cluster closer around the true population mean.
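The inverse relationship between the standard error and N can be checked with a few lines of arithmetic. This is a minimal sketch: the population standard deviation of 9 and the function name `standard_error` are illustrative choices, not part of the lecture.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population S.D. divided by sqrt(N)."""
    return sigma / math.sqrt(n)

sigma = 9.0  # hypothetical population standard deviation
for n in (10, 50, 100, 1000):
    # The standard error shrinks as N grows, but only with the square root of N
    print(f"N = {n:4d}  standard error = {standard_error(sigma, n):.3f}")
```

Because N sits under a square root, quadrupling the sample size only halves the standard error.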

8 Implications of the CLT
The width of the sampling distribution is an inverse function of N (sample size). The distribution of mean estimates based on N = 10 will be more dispersed. Mean estimates based on N = 50 will cluster closer to μ. [Figure: sampling distributions for a smaller and a larger sample size]
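A small simulation can make the N = 10 versus N = 50 comparison concrete. The sketch below assumes a hypothetical, deliberately skewed population (exponential with mean 10) and approximates the sampling distribution with 2,000 random samples rather than all possible samples. The seed, population, and helper name `sample_means` are all illustrative.

```python
import random
import statistics

random.seed(5811)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 10 (so S.D. is also about 10)
population = [random.expovariate(1 / 10) for _ in range(100_000)]

def sample_means(pop, n, draws=2000):
    """Draw `draws` random samples of size n and return the mean of each."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(draws)]

means_10 = sample_means(population, 10)
means_50 = sample_means(population, 50)

# The S.D. of the simulated means approximates the standard error
spread_10 = statistics.stdev(means_10)
spread_50 = statistics.stdev(means_50)
print(f"N = 10: S.D. of sample means = {spread_10:.2f}")
print(f"N = 50: S.D. of sample means = {spread_50:.2f}")
```

The two spreads should land near 10/√10 and 10/√50, with the N = 50 means clustering more tightly around the population mean.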

9 Confidence Intervals Benefits of knowing the width of the sampling distribution: 1. You can figure out the general range by which a given point estimate might miss, based on the range around the true mean in which the estimates will fall. 2. And this defines the range around an estimate that is likely to hold the population mean: a “confidence interval”. Note: These only work if N is large!

10 Confidence Interval Confidence Interval: “A range of values around a point estimate that makes it possible to state the probability that an interval contains the population parameter between its lower and upper bounds.” (Bohrnstedt & Knoke p. 90) It involves a range and a probability Examples: We are 95% confident that the mean number of CDs owned by grad students is between 20 and 45 We are 50% confident the mean rainfall this year will be between 12 and 22 inches.

11 Confidence Interval
Visually: It is probable that μ falls near mu-hat. Q: Can μ be this far from mu-hat? Answer: Yes, but it is very improbable. [Figure: probable values of μ near mu-hat, with ranges where μ is unlikely to be on either side]

12 Confidence Interval To figure out the range of “error” in our mean estimate, we need to know the width of the sampling distribution: the Standard Error (S.D. of the sampling distribution of the mean). The Central Limit Theorem provides a formula: $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N}$. Problem: We do not know the exact value of sigma-sub-Y, the population standard deviation!

13 Confidence Interval Question: How do we calculate the standard error if we don’t know the population S.D.? Answer: We estimate it using the information we have: $\hat{\sigma}_{\bar{Y}} = s_Y / \sqrt{N}$, where N is the sample size and s-sub-Y is the sample standard deviation.

14 95% Confidence Interval Example
Suppose a sample of 100 students with a mean SAT score of 1020 and a standard deviation of 200. How do we find the 95% Confidence Interval? If N is large, we know that: 1. The sampling distribution is roughly normal. 2. Therefore 95% of samples will yield a mean estimate within 2 standard deviations (of the sampling distribution) of the population mean (μ). Thus, 95% of the time, our estimates of μ (Y-bar) are within two “standard errors” of the actual value of μ.

15 95% Confidence Interval Formula for 95% confidence interval:
$\bar{Y} \pm 2\sigma_{\bar{Y}}$, where Y-bar is the mean estimate and sigma (Y-bar) is the standard error. Result: Two values, an upper and a lower bound. Adding our estimate of the standard error: $\bar{Y} \pm 2(s_Y / \sqrt{N})$

16 95% Confidence Interval Suppose a sample of 100 students with a mean SAT score of 1020 and a standard deviation of 200. Calculate: $1020 \pm 2(200 / \sqrt{100}) = 1020 \pm 40$. Thus, we are 95% confident that the population mean falls between 980 and 1060.
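The SAT calculation can be reproduced in a few lines. This is a sketch under the lecture's rule of thumb of two standard errors for 95% confidence; the function name `confidence_interval` and its `multiplier` parameter are illustrative.

```python
import math

def confidence_interval(mean, sd, n, multiplier=2.0):
    """Return (lower, upper) bounds: mean +/- multiplier * estimated standard error."""
    se = sd / math.sqrt(n)  # estimated standard error, s / sqrt(N)
    return mean - multiplier * se, mean + multiplier * se

# Sample from the lecture: N = 100, mean SAT = 1020, S.D. = 200
lower, upper = confidence_interval(1020, 200, 100)
print(lower, upper)  # 980.0 1060.0
```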

17 Confidence Intervals Question: Suppose we want to know the confidence interval for a value other than 95%? How can we find the C.I. for any number? Answer #1: We know that 68% of cases fall within 1 standard deviation, and over 99% within 3. Q: What is the 99% C.I.? (Y-bar = 1020, S.D. = 200, N = 100) Answer: $1020 \pm 3(200 / \sqrt{100}) = 1020 \pm 60$, i.e., 960 to 1080.

18 Confidence Intervals Question: Which was a larger range: the 95% CI or 99% CI ? Answer: The 99% range was larger The larger the range, the more likely that the true mean will fall in it It is a safe bet if you specify a very wide range If you want to bet that the mean will fall in a very narrow range, you’ll lose more often.
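The trade-off between confidence and width can be tabulated for the SAT sample. The sketch below uses the rule-of-thumb multipliers of 1, 2, and 3 standard errors for roughly 68%, 95%, and 99% confidence; the dictionary name `intervals` is illustrative.

```python
import math

mean, sd, n = 1020, 200, 100
se = sd / math.sqrt(n)  # estimated standard error = 20.0

# Rule-of-thumb multipliers: 1, 2, and 3 standard errors
multipliers = {"68%": 1, "95%": 2, "99%": 3}
intervals = {label: (mean - k * se, mean + k * se) for label, k in multipliers.items()}

for label, (lower, upper) in intervals.items():
    print(f"{label} C.I.: {lower:.0f} to {upper:.0f} (width {upper - lower:.0f})")
```

Each step up in confidence widens the interval by another 40 points here, which is the price of being wrong less often.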

19 Confidence Intervals Question: Suppose we want to know the confidence interval for a value other than 95%? Answer #2: Look at the “Z-table” Z-table = Normal curve probability distribution with mean 0, SD of 1 Found on Knoke, p. 459 It tells you the % of cases falling within a particular number of S.D.’s of the mean Lists all values, not just 1, 2, and 3!

20 Confidence Intervals: Z-table
Question: What Z-value should we use for 20% confidence interval? Answer: 10% fall from 0 to Z=.26. 20% of cases fall from -.26 to +.26

21 Confidence Intervals General formula for Confidence Interval: $\bar{Y} \pm Z_{\alpha/2}\,\sigma_{\bar{Y}}$ Where:
Y-bar is the sample mean. Sigma sub-Y-bar is the standard error of the mean. Z sub α/2 is the Z-value for the desired level of confidence. It can be looked up in a Z-table. If you want 90%, look up p(0 to Z) of .45
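Instead of a printed Z-table, the Z-value for any confidence level can be computed from the inverse of the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `z_for_confidence` and `z_interval` are illustrative, and the SAT figures are reused from the earlier example.

```python
import math
from statistics import NormalDist

def z_for_confidence(level):
    """Z such that the central `level` share of a standard normal lies in (-Z, +Z)."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(mean, sd, n, level):
    """Large-N confidence interval: mean +/- Z * (sd / sqrt(N))."""
    z = z_for_confidence(level)
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

for level in (0.20, 0.90, 0.95, 0.99):
    lower, upper = z_interval(1020, 200, 100, level)
    print(f"{level:.0%}: Z = {z_for_confidence(level):.3f}, C.I. = {lower:.1f} to {upper:.1f}")
```

The computed values are slightly more precise than the rounded ones used in lecture: about 1.96 rather than 2 for 95%, and about 2.58 rather than 3 for 99%.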

22 Small N Confidence Intervals
If N is large, the C.L.T. assures us that the sampling distribution is approximately normal. This allows us to construct confidence intervals. Issue: What if N is not large? The sampling distribution may not be normal. Z-distribution probabilities don’t apply… In short: If N is small, our confidence interval formula based on Z-scores doesn’t work.

23 Small N Confidence Intervals
Solution: Find another curve that accurately characterizes the sampling distribution for small N: the “T-distribution”, an alternative that accurately approximates the shape of the sampling distribution for small N. The T-distribution is actually a set of distributions with known probabilities. Again, we can look up values in a table to determine the probabilities associated with a number of standard deviations from the mean.

24 Confidence Intervals for Small N
Small N C.I. Formula: $\bar{Y} \pm t_{\alpha/2}\,\sigma_{\bar{Y}}$. Yields accurate results, even if N is not large. Again, the standard error can be estimated using the sample standard deviation: $\hat{\sigma}_{\bar{Y}} = s_Y / \sqrt{N}$

25 T-Distributions Issue: Which T-distribution do you use?
The T-distribution is a “family” of distributions In a T-Distribution table, you’ll find many T-distributions to choose from One t-distribution for each “degree of freedom” Also called “df” or “DofF” Which T-distribution should you use? For confidence intervals: Use T-distribution for df = N - 1 Ex: If N = 15, then look at T-distribution for df = 14.

26 Looking Up T-Tables
1. Choose the desired probability for α/2. 2. Choose the correct df (N-1). 3. Find the t-value in the correct row and column. Interpretation is just like a Z-score: the number of standard errors for the C.I.!
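The table lookup and the small-N formula can be combined in a short sketch. The dictionary below is a hand-copied excerpt of standard two-tailed 95% critical values (α/2 = .025), not a full t-table, and the sample (N = 15, mean 6.0, S.D. 2.0) is hypothetical. The names `T_CRIT_95` and `t_interval_95` are illustrative.

```python
import math

# Excerpt of t critical values for alpha/2 = .025 (95% confidence), keyed by df
T_CRIT_95 = {4: 2.776, 9: 2.262, 14: 2.145, 29: 2.045}

def t_interval_95(mean, sd, n):
    """95% confidence interval for small N: mean +/- t * (sd / sqrt(N)), df = N - 1."""
    df = n - 1
    t = T_CRIT_95[df]  # raises KeyError if df is not in this small excerpt
    se = sd / math.sqrt(n)
    return mean - t * se, mean + t * se

# Hypothetical small sample: N = 15, so df = 14
lower, upper = t_interval_95(6.0, 2.0, 15)
print(f"95% C.I.: {lower:.2f} to {upper:.2f}")
```

With df = 14 the multiplier is 2.145 rather than the large-N value of about 2, so the interval is somewhat wider than the Z-based formula would give.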

27 Uses of Confidence Intervals
What are some uses for confidence intervals? 1. Assessing the general quality of an estimate. Ex: Mean level of happiness of graduate students. Happiness is scored on a measure from 1-10 (10 = most). Suppose the 95% C.I. is 6 +/- 4, i.e., range = 2 to 10. Question: Is this a “good” estimate? Answer: No, it is not very useful. Something like 6 +/- 1 is a more useful estimate.

28 Uses of Confidence Intervals
2. Comparing a mean estimate to a specific value Ex: Comparing a school’s test scores to a national standard Suppose national standard on a math test is 47 Suppose a sample of students scores 52. Did the school population meet the national standard? If 99% CI is 50-54, then the answer is probably yes If 99% CI is 42-62, it isn’t certain. Ex: A factory makes bolts that must hold 10 kilos Confidence intervals let you verify that the bolts are strong enough, without testing each one.
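The comparison logic in the test-score example can be written as a small function. This is a sketch only: the function name `compare_to_standard` and its return labels are illustrative, and the interval bounds are the ones given in the example.

```python
def compare_to_standard(ci_lower, ci_upper, standard):
    """Classify a confidence interval relative to a fixed comparison value."""
    if ci_lower > standard:
        return "above"      # whole interval lies above the standard
    if ci_upper < standard:
        return "below"      # whole interval lies below the standard
    return "uncertain"      # the standard falls inside the interval

national_standard = 47
print(compare_to_standard(50, 54, national_standard))  # above
print(compare_to_standard(42, 62, national_standard))  # uncertain
```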

29 Uses of the Sampling Distribution
Extended example: Let’s figure out what the sampling distribution looks like for a specific population. Since the sampling distribution is a probability distribution, we can then calculate the probability of observing any particular value of Y-bar (given a known μ). Note: Later we’ll use the converse logic to draw conclusions about the actual value of μ, given an observed Y-bar.

30 Probability of Y-bar, given μ
Suppose we have a population with the following characteristics: μ = 23, σ = 9. What is the probability of picking a sample (N=35) that has a mean of 27 or more? To determine this, we must first determine the shape of the sampling distribution. Then we can determine the probability of falling a given distance from its mean…

31 Probability of Y-bar, given μ
Q: According to the Central Limit Theorem, what is the mean of the sampling distribution? A: Same as the population: $\mu_{\bar{Y}} = \mu = 23$. Second, we must determine the “width” of the sampling distribution: the standard deviation (referred to as the Standard Error). The C.L.T. says we can calculate it as: $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N} = 9 / \sqrt{35} \approx 1.5$

32 Probability of Y-bar, given μ
If we know μ and the Standard Error, we can draw the sampling distribution of the mean for this population:

33 Probability of Y-bar, given μ
We know that 95% of possible Y-bars fall within two Standard Errors (i.e., +/- 3): between 20 and 26
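The "two standard errors" claim can be checked directly by treating the sampling distribution as a normal curve with mean 23 and standard error 1.5. This is a sketch using the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Sampling distribution of Y-bar: mean 23, standard error 1.5 (rounded, as in the lecture)
sampling_dist = NormalDist(mu=23, sigma=1.5)

# Share of possible sample means between 20 and 26, i.e. within two standard errors
share = sampling_dist.cdf(26) - sampling_dist.cdf(20)
print(f"P(20 < Y-bar < 26) = {share:.4f}")
```

The result is a little over 95%, since exactly 95% corresponds to about 1.96 standard errors rather than 2.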

34 Probability of Y-bar, given μ
To determine the probability associated with a particular value, convert to Z-scores: p(-1<Z<1) is .68, p(-2<Z<2) is .95, etc. We use a slightly different Z-score formula than we learned before: $Z = (\bar{Y} - \mu) / \sigma_{\bar{Y}}$. But it is analogous.

35 Probability of Y-bar, given μ
Why use a different formula for Z-scores? The old formula calculates the number of standard deviations a case falls from the sample mean (from Y-sub-i to Y-bar). The new formula tells the number of standard errors a mean estimate falls from the population mean (from Y-bar to μ).

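36 Probability of Y-bar, given μ
Back to the problem: What is the Z-score associated with getting a sample mean of 27 or greater from this population? Sampling distribution mean = 23. Standard error = 1.5. $Z = (\bar{Y} - \mu) / \sigma_{\bar{Y}} = (27 - 23) / 1.5 = 2.666\ldots$, which the lecture takes as 2.66.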

37 Probability of Y-bar, given μ
Finally, what is the probability of observing a Z-score of 2.66 (or greater) in a standard normal distribution? To convert Z-scores to probabilities, look them up in a table, such as Knoke p. 463. Area beyond Z=2.66 is .0039. How do we interpret that? Let’s look at it visually:

38 Probability of Y-bar, given μ
The Z-distribution is a probability distribution. Total area under the curve = 1.0. Area under half the curve is .5. Red area (“Area beyond Z”) = .0039

39 Probability of Y-bar, given μ
Is the probability of Z > 2.66 very large? No! Red area = probability of Z > 2.66 = .004, which is .4%
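The whole calculation, from population parameters to tail probability, can be sketched with the standard-library `statistics.NormalDist` in place of the printed table. The variable names are illustrative. The code shows both the lecture's rounded figures (standard error 1.5, Z = 2.66) and the unrounded standard error, which gives a slightly different Z.

```python
import math
from statistics import NormalDist

mu, sigma, n = 23, 9, 35   # population mean, population S.D., sample size
y_bar = 27                 # observed sample mean

# Unrounded standard error and Z-score
se_exact = sigma / math.sqrt(n)          # about 1.52
z_exact = (y_bar - mu) / se_exact        # about 2.63

# The lecture rounds the standard error to 1.5 and uses Z = 2.66
z_slides = 2.66

# Area beyond Z in the upper tail of the standard normal curve
p_slides = 1 - NormalDist().cdf(z_slides)
p_exact = 1 - NormalDist().cdf(z_exact)

print(f"Z = {z_slides:.2f}: area beyond Z = {p_slides:.4f}")
print(f"Z = {z_exact:.2f}: area beyond Z = {p_exact:.4f}")
```

Either way the probability is roughly 4 in 1,000, so the rounding does not change the conclusion.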

40 Probability of Y-bar, given μ
Conclusion: A Y-bar of 27 (or larger) should occur only 4 out of 1000 times we sample from this population. Possible interpretations: 1. We just experienced an improbable sample. 2. Our sample was biased, not representative. 3. Maybe we begin to suspect that the population mean (μ) isn’t really 23 after all… Idea: We could “cast doubt on” someone’s claim that μ = 23, given this observed Y-bar and S.D. Hypothesis testing is based on this!

41 Conclusions About Means
The previous example started out with the assumption that μ = 23. Typically, μ will be unknown; only Y-bar is known. But the same logic can be applied to “test” whether μ is likely to equal 23. If the observed Y-bar is highly unlikely, we cast doubt on the idea that μ is really 23. Example: We can “test” whether a school’s math scores are above the national standard of 47. If the school sample is far above the national average, it is improbable that the school population is at or below 47. Next Class: Hypothesis testing!

