Announcements Exam 1 key posted on web HW 7 (posted on web) Due Oct. 24 Bonus E Due Oct. 24 Office Hours –this week: –Wed. 8-11, 3-4 –Fri. 8-11 For Bonus.

1 Announcements Exam 1 key posted on web. HW 7 (posted on web): due Oct. 24. Bonus E: due Oct. 24. Office hours this week: Wed. 8-11 and 3-4; Fri. 8-11. For Bonus E, look for sample size, margin of error, sampling method, etc. Last time in class, we discussed summary statistics and graphs. Today we will cover the sampling distribution of the mean.

2 Bonus E: Election Coverage Give a statistical critique of election coverage of next week’s debate If you can’t watch debate, you may use a magazine or newspaper (include copy) Clarity: 2 points Validity: 2 points Brevity: 2 points Typed on paper: due Oct. 24

3 Sample & Population Symbols Sample mean = x̄ (x-bar); Sample SD = s; Sample size = n; Population mean = μ_pop; Population SD = σ_pop. Standard letters represent sample values (which are known) and Greek letters represent population values (which are unknown and “Greek to me”). The sample values will be used to estimate the population values.

4 Sampling Distribution of the Mean Sampling distribution characterizes all possible sample means and their likelihoods Sampling distribution will be used in hypothesis testing and confidence intervals for the population mean

5 It’s normal! For large enough samples (at least 30 observations) the sampling distribution is approximately normal. If the population is normal, the sampling distribution is normal for any sample size. The mean of the normal curve is μ_pop. The SD of the normal curve is σ_pop/sqrt(n), where n is the number of observations. [Figure: normal curve centered at μ_pop with SD σ_pop/sqrt(n)]
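
The claims on this slide can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the seed, and the sample counts are assumptions chosen for the example, not part of the lecture. It draws many samples of size 30 from a skewed population and compares the mean and SD of the resulting sample means with μ_pop and σ_pop/sqrt(n).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, which has mean 1 and SD 1.
POP_MEAN = 1.0
POP_SD = 1.0
N = 30                 # observations per sample
NUM_SAMPLES = 20_000   # number of samples drawn

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = POP_SD / N ** 0.5  # sigma_pop / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

Even though the assumed population is far from normal, the two printed pairs should agree closely, which is the point of the slide.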

6 Benefits of Normality The horizontal axis corresponds to possible sample values for the mean The height of the curve represents how many samples have a sample mean of that value The most likely sample means are also those closest to the true value. The width of the curve narrows with larger samples - showing that larger samples get closer to the truth!

7 Review so far We’re dealing with numerical data Wish to summarize with mean and SD Want to generalize to population Look at all possible samples and their corresponding means Resulting distribution of sample means is normal Think of the distribution as a histogram of the possible sample means

8 Where to go from here We want to use this fact to generalize our results to the population through hypothesis tests and confidence intervals We know that the normal curve is centered at the right place - the population mean If we can figure out the width of the normal curve, then we know how close the sample mean should be to the population mean We need to relate the resulting normal curve to the standard normal to find areas

9 Converting it to a “Z” Recall that to convert our curve to the standard normal, we use Z = (X - μ)/σ, where here μ = μ_pop and σ = σ_pop/sqrt(n). Then we can find the areas and the probabilities of certain samples. There is a minor obstacle: we don’t know σ_pop. Let’s estimate σ_pop with s, the sample SD. This will modify the right side of the Z formula. Left must be modified to balance this change.

10 How to modify “Z” The modification must account for the fact that a sample value was used to replace a true population value The sample standard deviation has its own sampling distribution With larger samples, the sample standard deviation will be closer to the true population standard deviation By accounting for these considerations, the sampling distribution for “Z” was found

11 The t-distribution The correct sampling distribution for the value t = (x̄ - μ)/(s/sqrt(n)) is the t-distribution. The t-distribution is shaped like the standard normal, but a little shorter with heavier tails. The heavier tails account for the fact that the sample standard deviation has its own sampling variability. The t-distribution has a total area of 1.

12 The t-distribution Symmetric about zero. “Spread” determined by the degrees of freedom, df = n-1. Higher df means that the sample standard deviation is a better approximation of the population standard deviation. Therefore, higher df makes the t more like the z.
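
To see how the degrees of freedom pull the t toward the z, the sketch below compares the cutoff that leaves 2.5% in the upper tail for several df values with the standard normal cutoff. It is a rough illustration using only the Python standard library; the helper names (`t_pdf`, `t_upper_tail`, `t_critical`) and the numerical method (Simpson's rule plus bisection) are choices made for this example, not something from the lecture.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """Value c with P(T > c) = tail_area, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)
for df in (2, 11, 66, 1000):
    print(f"df = {df:>4}: t cutoff = {t_critical(0.025, df):.3f}")
print(f"standard normal: z cutoff = {z_crit:.3f}")
```

The cutoffs shrink toward the normal value as df grows, reflecting the lighter tails at higher df.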

13 History of the t-distribution Actually it’s called the “Student’s t-distribution”. Guinness Brewery in Ireland was trying to use sampling to monitor the quality of its products and hired a mathematician. This mathematician developed the t-distribution to address the challenge. Published under the name “Student” to protect company secrets in the early 1900s.

14 Example: Frozen Dinners Each dinner is slightly different. Stated no. of calories per meal is 240. Wish to test if this is true (H0: μ = 240; HA: μ ≠ 240; α = 0.05). Take a simple random sample of 12 dinners. Calorie counts: 255, 244, 239, 242, 265, 245, 259, 248, 225, 226, 251 x̄ = 244.33, s = 12.38; t = (244.33 - 240)/(12.38/sqrt(12)) = 1.21; df = 11; p-value = 0.2508; 95% CI = (236.5, 252.2)
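
The frozen dinner figures can be reproduced from the summary statistics alone (x̄ = 244.33, s = 12.38, n = 12). The sketch below is one way to do that with the Python standard library; the helper functions and the numerical integration are assumptions made for illustration (the course itself uses tables and StataQuest), so small rounding differences from the slide are to be expected.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """Value c with P(T > c) = tail_area, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Summary statistics from the slide.
mean, sd, n, mu0 = 244.33, 12.38, 12, 240.0
df = n - 1
se = sd / math.sqrt(n)                      # s / sqrt(n)
t_stat = (mean - mu0) / se                  # test statistic
p_two_sided = 2 * t_upper_tail(abs(t_stat), df)
margin = t_critical(0.025, df) * se         # t_(alpha/2, n-1) * s / sqrt(n)
ci = (mean - margin, mean + margin)

print(f"t = {t_stat:.2f} with {df} d.f.")
print(f"two-sided p-value = {p_two_sided:.4f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the p-value is well above α = 0.05 and the interval contains 240, both views lead to the same conclusion: fail to reject H0.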

15 Example: Frozen Dinners The t-curve to the right represents all possible sample values of t if the true pop is normal with mean 240 Our sample is fairly reasonable under this null hypothesis Fail to reject H 0

16 Example: Pennies Wish to determine the average age of pennies in circulation. Test H0: μ_pop ≤ 8 against HA: μ_pop > 8. Set α = 0.05. Sample 67 pennies (is it simple random?): sample mean = 11.40, sample SD = 8.50. Sample t = 3.28, df = 66, p-value = 0.0008. Reject the null. If the true population mean age was 8, we would observe a sample with a mean this high (or higher) only 0.0008 of the time.

17 Example: Pennies If the true population has mean 8, then all possible sample means and SDs (and thus sample t’s) are described by this t-dist The real sample has t=3.28, quite rare if the null is true Reject the null! The real sample doesn’t match the proposed population.

18 Example: Pennies Well, if 8 is not a reasonable value for the population mean, then what is? I start by proposing population values until my observed sample falls in the middle. This results in the confidence interval. The 95% CI in this case is (9.32,13.48) Note: The interval does not include 8! The test and CI agree: 8 is not a reasonable population value.
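
The idea of "proposing population values" can be sketched directly: test a range of candidate means against the penny summary statistics and see which ones the sample does not reject at the 5% level. The code below is an illustrative sketch only; the candidate values and helper names are made up for this example, and the tail areas come from a simple numerical integration rather than a table.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_sided_p(mean, sd, n, mu0):
    """Two-sided p-value for the one-sample t-test of H0: mu = mu0."""
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return 2 * t_upper_tail(abs(t_stat), n - 1)

# Penny summary statistics from the slides.
MEAN, SD, N = 11.40, 8.50, 67

# Hypothetical candidate population means, chosen for illustration.
for mu0 in (8.0, 9.0, 9.5, 11.4, 13.0, 14.0):
    p = two_sided_p(MEAN, SD, N, mu0)
    verdict = "reasonable" if p > 0.05 else "not reasonable"
    print(f"proposed mean {mu0:5.1f}: p = {p:.4f} -> {verdict}")
```

The candidates labelled reasonable are exactly those inside the 95% CI, and 8 is not among them, which is the agreement between test and interval that the slide describes.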

19 The Formulas Hypothesis testing: t = (x̄ - μ_H0)/(s/sqrt(n)), df = n-1; use table to find p-value. Confidence interval: x̄ ± t_(α/2, n-1) · s/sqrt(n). StataQuest can perform the calculations. Not required to memorize, but note: increasing n narrows CIs; decreasing α widens CIs; increasing n reduces the probability of a Type II error; increasing α reduces the probability of a Type II error; Type I error is fully controlled by α = probability of a Type I error.
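
The four "note" items can be illustrated numerically. The sketch below deliberately simplifies: it uses the normal approximation (treating the SD as known, so z cutoffs stand in for t cutoffs) and a right-sided test, and the true shift of 2 years is a hypothetical value invented for the example. The function names are likewise assumptions, not anything StataQuest provides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def ci_half_width(sd, n, alpha):
    """Approximate CI half-width: z_(alpha/2) * sd / sqrt(n)."""
    return Z.inv_cdf(1 - alpha / 2) * sd / n ** 0.5

def type2_prob(true_shift, sd, n, alpha):
    """Approximate P(Type II error) for a right-sided test.

    true_shift is how far the true mean lies above the null value.
    """
    z_crit = Z.inv_cdf(1 - alpha)
    return Z.cdf(z_crit - true_shift * n ** 0.5 / sd)

SD = 8.5      # penny sample SD from the slides, treated as known here
SHIFT = 2.0   # hypothetical: true mean is 2 years above the null value

for n, alpha in ((67, 0.05), (134, 0.05), (67, 0.10), (67, 0.01)):
    print(f"n = {n:>3}, alpha = {alpha:.2f}: "
          f"CI half-width = {ci_half_width(SD, n, alpha):.2f}, "
          f"P(Type II) = {type2_prob(SHIFT, SD, n, alpha):.3f}")
```

Doubling n shrinks both the interval and the Type II probability, while lowering α widens the interval and raises the Type II probability.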

20 Review and Preview We’re dealing with numerical data Want to make statements about pop mean Sampling distribution of mean is normal SD of resulting normal is unknown, depends on pop SD Estimating the pop SD with sample SD results in t-distribution t distribution is used to make statements about a population mean through hypothesis tests and confidence intervals

21 Review and Preview Both examples done today used the one-sample t-test. The one-sample t-test is used to make statements about a single population mean. Next time we will discuss the paired t-test. The paired t-test is used to test the mean change in a population (before-after studies, like the Quaker Oats commercial).

22 Using StataQuest: One Sample t procedures Click Editor. Enter data. Click Close. Go to Statistics: Parametric Tests: 1-sample t test. Select the variable of interest and set the confidence level. For the frozen dinners, this gives:

    Variable |     Obs        Mean   Std. Dev.
    ---------+---------------------------------
    calories |      12    244.3333    12.38278

    Ho: mean = 240
    t = 1.21 with 11 d.f.
    Pr > |t| = 0.2508
    95% CI = (236.46568,252.20098)

Since our test is two-tailed and SQ gives two-tailed p-values, this is the p-value for our test as well.

23 Using StataQuest: One sample t procedures Following the steps on the previous page for the pennies we get:

    Variable |     Obs        Mean   Std. Dev.
    ---------+---------------------------------
         age |      67    11.40299    8.501443

    Ho: mean = 8
    t = 3.28 with 66 d.f.
    Pr > |t| = 0.0017
    95% CI = (9.3293251,13.476645)

We want a right-tail p-value since our alternative is right-sided. Since t > 0, we take the 2-tailed p-value given by SQ and divide it by two. If the alternative had been left-sided, we would take half of SQ’s p-value and subtract it from one.
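
The conversion rule on this slide can be written as a small function. This is a sketch for illustration: `one_sided_p` and its argument names are hypothetical, and the branch for a negative t simply mirrors the rule stated for a positive t.

```python
def one_sided_p(two_sided_p, t, alternative):
    """Convert a two-tailed p-value to a one-tailed one.

    alternative is "greater" for a right-sided alternative
    or "less" for a left-sided alternative.
    """
    half = two_sided_p / 2
    if alternative == "greater":
        # t in the direction of the alternative: take half; otherwise 1 - half.
        return half if t > 0 else 1 - half
    if alternative == "less":
        return half if t < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")

# Pennies: SQ reports Pr > |t| = 0.0017 with t = 3.28.
print(one_sided_p(0.0017, 3.28, "greater"))  # right-sided alternative
print(one_sided_p(0.0017, 3.28, "less"))     # left-sided alternative
```

For the pennies the right-sided value is half of 0.0017, which rounds to the 0.0008 quoted in the example.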

24 Using StataQuest: One Sample t procedures Use this method when you have summary stats: x̄, s, n. Go to Calculator: 1-sample t test. Enter the requested info (hypothesized mean is the mean stated in the null hypothesis). For the pennies: No. obs = 67; Sample mean = 11.40; Sample SD = 8.50; Hyp. mean = 8; Conf. level = 95.

    Variable |     Obs        Mean   Std. Dev.
    ---------+---------------------------------
           x |      67        11.4         8.5

    Ho: mean = 8
    t = 3.27 with 66 d.f.
    Pr > |t| = 0.0017
    95% CI = (9.32669,13.47331)

