Stat 31, Section 1. Last Time: Sampling Distributions; Binomial Distribution; Binomial Probs; Normal Approx. to Binomial; Counts Scale vs. Proportion Scale
Important Announcement: The 2nd Midterm date has changed from Tuesday, April 5, to Tuesday, April 12.
Section 5.2: Distribution of Sample Means: Idea: Study the probability structure of $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, based on $X_1, \dots, X_n$ drawn independently from the same distribution, having expected value $\mu$ and standard deviation $\sigma$.
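Expected Value of Sample Mean: How does $E(\bar{X})$ relate to $\mu$? By linearity of expectation, $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu$. So the sample mean “has the same mean” as the original data.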
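Variance of Sample Mean: Study the “spread” (i.e. quantify the variation) of $\bar{X}$. Using independence, $\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}$. So the variance of the sample mean is “reduced by $n$”.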
S.D. of Sample Mean: Since the standard deviation is the square root of the variance, take square roots to get $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$. So the S.D. of the sample mean is “reduced by $\sqrt{n}$”.
Mean & S.D. of Sample Mean, Summary: Averaging gives the same centerpoint, $E(\bar{X}) = \mu$, and reduces variation by a factor of $\sqrt{n}$, $\mathrm{sd}(\bar{X}) = \sigma/\sqrt{n}$. This is called the “Law of Averages, Part I”.
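The Law of Averages, Part I can be checked by simulation. The sketch below is in Python (the slides themselves use Excel) and assumes a hypothetical population, a single fair die, with $\mu = 3.5$ and $\sigma = \sqrt{35/12} \approx 1.708$. It draws many independent samples of size $n = 4$; the average of the simulated sample means should land near $\mu$, and their S.D. near $\sigma/\sqrt{n}$.

```python
import random
import statistics

def simulate_sample_means(population, n, reps, seed=0):
    """Draw `reps` samples of size n (independent draws, with replacement)
    from `population` and return the list of their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical population: one fair die (mu = 3.5, sigma = sqrt(35/12))
die = [1, 2, 3, 4, 5, 6]
mu = statistics.fmean(die)
sigma = statistics.pstdev(die)

n = 4
means = simulate_sample_means(die, n, reps=20000)
print("mean of sample means:", statistics.fmean(means), "vs mu =", mu)
print("SD of sample means:  ", statistics.pstdev(means), "vs sigma/sqrt(n) =", sigma / n ** 0.5)
```

Changing `n` (say to 16) should cut the simulated S.D. roughly in half, in line with the $\sqrt{n}$ factor.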
Law of Averages, Part I: Some consequences (worth noting): To “double the accuracy” (halve $\sigma/\sqrt{n}$), you need 4 times as much data. For “10 times the accuracy”, you need 100 times as much data.
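These consequences are just arithmetic on $\sigma/\sqrt{n}$: making the S.D. of the mean $k$ times smaller needs $k^2$ times the sample size. A small Python sketch, with a hypothetical $\sigma = 10$ and helper names chosen here for illustration:

```python
import math

def sd_of_mean(sigma, n):
    """Standard deviation of the mean of n independent draws: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def n_for_accuracy_factor(n, factor):
    """Sample size needed to make sd_of_mean `factor` times smaller than with n."""
    return n * factor ** 2

sigma = 10.0  # hypothetical population S.D.
for n in (25, 100, 2500):
    print(n, sd_of_mean(sigma, n))
```

Going from 25 to 100 observations halves the S.D. of the mean; going from 25 to 2500 divides it by 10.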
Law of Averages, Part I: HW: 5.28 (5.77, 4)
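Distribution of Sample Mean: Now we know the center and spread; what about the “shape of the distribution”? Case 1: If $X_1, \dots, X_n$ are independent $N(\mu, \sigma)$, we CAN SHOW that $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ (we knew the mean and S.D. already; the news is the “mound shape”). Thus we can work with NORMDIST & NORMINV.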
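Outside Excel, the same NORMDIST and NORMINV calculations for $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ can be sketched with Python's standard-library `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of NORMDIST(…, TRUE) and NORMINV. The numbers below ($\mu = 100$, $\sigma = 15$, $n = 9$) are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical example: X_i independent N(mu=100, sigma=15), sample size n = 9
mu, sigma, n = 100.0, 15.0, 9
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))  # N(mu, sigma/sqrt(n)) = N(100, 5)

# Analogue of Excel NORMDIST(105, 100, 5, TRUE): P(Xbar <= 105)
p_below = xbar_dist.cdf(105)
# Analogue of Excel NORMINV(0.9, 100, 5): cutoff with lower area 0.9
cutoff = xbar_dist.inv_cdf(0.9)
print(p_below, cutoff)
```

Note that the S.D. passed in is $\sigma/\sqrt{n}$, not $\sigma$: forgetting the $\sqrt{n}$ is the most common slip with these functions.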
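Distribution of Sample Mean: Case 2: If $X_1, \dots, X_n$ are “almost anything” (independent, from the same distribution with mean $\mu$ and S.D. $\sigma$), we STILL HAVE $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ “approximately”.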
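Distribution of Sample Mean: Remarks: Mathematically, the approximation becomes exact as $n \to \infty$, stated in terms of the standardized mean: $P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z\right) \to \Phi(z)$, where $\Phi$ is the standard normal cumulative distribution function. This is called the “Law of Averages, Part II”, also called the “Central Limit Theorem”. It gives a sense in which the Normal distribution is at the center of things, hence the name “Normal” (ostentatious?).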
Law of Averages, Part II: More Remarks: Thus we will work with NORMDIST & NORMINV a lot, for averages. This is why the Normal distribution is a good model for many different populations (any sum of small independent random pieces). It also explains the Normal Approximation to the Binomial.
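Normal Approx. to Binomial: Explained by the Law of Averages, Part II, since for $X \sim \text{Binomial}(n, p)$ we can represent $X$ as $X = \sum_{i=1}^{n} Y_i$, where the $Y_i$ are independent, with $Y_i = 1$ if trial $i$ is a success (probability $p$) and $Y_i = 0$ otherwise (probability $1 - p$). Thus $X = n\bar{Y}$ is an average (rescaled sum), so the Law of Averages gives a Normal distribution.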
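The representation of $X$ as a sum of 0/1 indicators, and the resulting Normal approximation $N(np, \sqrt{np(1-p)})$, can be compared against exact binomial probabilities. This Python sketch uses hypothetical values $n = 100$, $p = 0.5$, $k = 55$ and applies no continuity correction (an assumption; some treatments add one).

```python
import math
import random
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation N(np, sqrt(np(1-p))), without continuity correction."""
    return NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k)

def binomial_as_sum(n, p, rng):
    """Draw X as a sum of n independent 0/1 (success/failure) indicators."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

n, p, k = 100, 0.5, 55  # hypothetical example values
exact = binom_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact {exact:.4f}, normal approx {approx:.4f}")

rng = random.Random(31)
draws = [binomial_as_sum(n, p, rng) for _ in range(5000)]
print("average of simulated X:", sum(draws) / len(draws), "vs np =", n * p)
```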
Law of Averages, Part II: Nice Java Demo: http://www.amstat.org/publications/jse/v6n3/applets/CLT.html 1 die (think n = 1): the distribution of the average is flat; 2 dice (n = 2): the distribution of the average is a triangle; …; 5 dice (n = 5): looks quite “mound shaped”.
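The dice shapes in the demo can also be computed exactly rather than simulated, by repeatedly convolving the distribution of one die. This Python sketch (function names are illustrative, not from the demo) uses exact fractions and prints a crude text histogram for n = 1, 2, 5.

```python
from fractions import Fraction

def dice_sum_distribution(n):
    """Exact distribution of the sum of n fair six-sided dice, as {total: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0) + prob * Fraction(1, 6)
        dist = new
    return dist

def dice_average_distribution(n):
    """Distribution of the average (sum / n) of n fair dice."""
    return {Fraction(total, n): prob for total, prob in dice_sum_distribution(n).items()}

for n in (1, 2, 5):
    print(f"n = {n}")
    for avg, prob in sorted(dice_average_distribution(n).items()):
        print(f"  {float(avg):5.2f} {'#' * round(200 * prob)}")
```

The n = 1 bars are all equal (flat), the n = 2 bars rise and fall linearly (triangle), and the n = 5 bars already trace a mound.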
Law of Averages, Part II: Another cool one: http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html Create a U-shaped distribution with the mouse, then simulate samples: size n = 2: non-Normal; size n = 5: more Normal; size n = 10 or 25: mound shaped.
Law of Averages, Part II: Class Example: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg19.xls Shows: even starting from a non-normal shape, averages become normal, more so with more averaging, and the SD is smaller with more averaging ($\sigma/\sqrt{n}$).
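A simulation in the spirit of the U-shaped demo: as a stand-in for a mouse-drawn U shape, this Python sketch assumes the Beta(0.5, 0.5) distribution on [0, 1] (mean 0.5, S.D. $\sqrt{1/8} \approx 0.354$), averages samples of size 2, 5, 10 and 25, and compares the S.D. of the averages with $\sigma/\sqrt{n}$.

```python
import random
import statistics

rng = random.Random(2005)

def u_shaped_draw():
    # Assumption: Beta(0.5, 0.5) as one U-shaped distribution on [0, 1]
    return rng.betavariate(0.5, 0.5)

sigma = (1 / 8) ** 0.5  # S.D. of Beta(0.5, 0.5)
results = {}
for n in (2, 5, 10, 25):
    means = [statistics.fmean([u_shaped_draw() for _ in range(n)]) for _ in range(5000)]
    results[n] = statistics.stdev(means)
    print(f"n = {n:2d}: SD of averages = {results[n]:.4f}, sigma/sqrt(n) = {sigma / n**0.5:.4f}")
```

Plotting a histogram of `means` for each `n` (not done here, to stay within the standard library) would show the shape moving from non-Normal toward a mound.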
Law of Averages, Part II: HW: 5.31, 5.33, 5.35, 5.39
And now for something completely different… A statistics professor was describing sampling theory to his class, explaining how a sample can be studied and used to generalize to a population. ???
Chapter 6: Statistical Inference: Main Idea: Form conclusions by quantifying uncertainty (we will study several approaches; the first is…)
Section 6.1: Confidence Intervals: Background: The sample mean, $\bar{x}$, is an “estimate” of the population mean, $\mu$. How accurate is it? (There is “variability”; how much?)
Confidence Intervals: Recall the sampling distribution: $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ (maybe an approximation).
Confidence Intervals: Thus understand the error as $\bar{X} - \mu \sim N(0, \sigma/\sqrt{n})$. How do we explain this to untrained consumers (who don’t know about randomness, distributions, normal curves)?
Confidence Intervals: Approach: present an interval with endpoints: Estimate ± margin of error, i.e. $\bar{x} \pm m$, reflecting variability. How to choose $m$?
Confidence Intervals: Choice of the “Confidence Interval radius”, i.e. the margin of error, $m$. Notes: No absolute range (i.e. one including “everything”) is available, because the normal distribution has infinite tails. So we need to specify the desired accuracy.
Confidence Intervals: Choice of the “Confidence Interval radius”, $m$: Approach: Choose a Confidence Level, often 0.95 (e.g. the FDA likes this number for approving new drugs, and it is a common standard for publication in many fields), and take the margin of error to include that part of the sampling distribution.
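Confidence Intervals: E.g. for confidence level 0.95, we want the margin of error $m$ chosen so that the $N(\mu, \sigma/\sqrt{n})$ distribution of $\bar{X}$ has area 0.95 between $\mu - m$ and $\mu + m$, i.e. $P(\mu - m \le \bar{X} \le \mu + m) = 0.95$.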
Confidence Intervals: Computation: Recall that NORMINV takes areas (probabilities) and returns cutoffs. Issue: NORMINV works with lower areas (note: the lower tail is included).
Confidence Intervals: So adapt the needed probabilities to lower areas: when the inner area = 0.95, the right tail = 0.025, so the shaded (lower) area = 0.975. So we need to compute $\mu + m$ = NORMINV(0.975, $\mu$, $\sigma/\sqrt{n}$).
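The conversion from a confidence level to a NORMINV-style lower area (right tail $= (1 - \text{level})/2$, lower area $= 1 - \text{right tail}$) can be sketched in Python with `statistics.NormalDist().inv_cdf`, which, like NORMINV, takes a lower area and returns a cutoff. The function name `upper_cutoff_z` is our own.

```python
from statistics import NormalDist

def upper_cutoff_z(level):
    """Standard normal cutoff z* leaving central area `level`.
    The right tail is (1 - level) / 2, so the lower area passed to
    inv_cdf (like NORMINV) is 1 - (1 - level) / 2."""
    lower_area = 1 - (1 - level) / 2
    return NormalDist().inv_cdf(lower_area)

for level in (0.90, 0.95, 0.99):
    print(level, round(upper_cutoff_z(level), 3))
```

For level 0.95 this is the familiar $z^* \approx 1.96$.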
Confidence Intervals: Need to compute $\mu + m$ = NORMINV(0.975, $\mu$, $\sigma/\sqrt{n}$). Major problem: $\mu$ is unknown. But should the answer depend on $\mu$? “Accuracy” is only about spread, not the centerpoint. We need another view of the problem.
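Confidence Intervals: Approach to unknown $\mu$: Recenter, i.e. look at the distribution of $\bar{X} - \mu \sim N(0, \sigma/\sqrt{n})$. Key concept: it is centered at 0. Now we can calculate $m$ as: $m$ = NORMINV(0.975, 0, $\sigma/\sqrt{n}$).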
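The recentering trick means $\mu$ never enters the computation of $m$. This Python sketch computes $m$ two ways: as the cutoff of $N(0, \sigma/\sqrt{n})$ (the analogue of NORMINV(0.975, 0, $\sigma/\sqrt{n}$)), and as $z^* \sigma/\sqrt{n}$ from the standard normal. The values $\sigma = 10$, $n = 25$ are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Margin of error m for a known-sigma CI for mu, using the recentered
    distribution Xbar - mu ~ N(0, sigma/sqrt(n)); mu is not needed."""
    se = sigma / math.sqrt(n)
    lower_area = 1 - (1 - level) / 2
    return NormalDist(0, se).inv_cdf(lower_area)  # analogue of NORMINV(lower_area, 0, se)

def margin_via_standard_normal(sigma, n, level=0.95):
    """Same m, written as z* * sigma / sqrt(n) with z* from N(0, 1)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * sigma / math.sqrt(n)

print(margin_of_error(10, 25), margin_via_standard_normal(10, 25))
```

Both versions agree (up to floating-point rounding), which is why $m$ is often written simply as $z^* \sigma/\sqrt{n}$.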
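Confidence Intervals: Computation of $m$ = NORMINV(0.975, 0, $\sigma/\sqrt{n}$): Smaller problem: we don’t know $\sigma$. Approach 1: estimate $\sigma$ with the sample standard deviation $s$; this leads to complications, which we will study later. Approach 2: sometimes we know $\sigma$.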
Confidence Intervals: E.g. Crop researchers plant 15 plots with a new variety of corn. The yields, in bushels per acre, are: 138, 139.1, 113, 132.5, 140.7, 109.7, 118.9, 134.8, 109.6, 127.3, 115.6, 130.4, 130.2, 111.7, 105.5. Assume that $\sigma$ = 10 bushels/acre.
Confidence Intervals: E.g. Find: the 90% Confidence Interval for the mean value $\mu$ for this type of corn; the 95% Confidence Interval; the 99% Confidence Interval. How do the CIs change as the confidence level increases? Solution, part 1 of: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg20.xls
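An independent Python sketch of the corn example (it does not reproduce the linked spreadsheet, only the same formula $\bar{x} \pm z^* \sigma/\sqrt{n}$ with the stated $\sigma = 10$). Here $\bar{x} = 1857/15 = 123.8$ bushels per acre.

```python
import math
import statistics
from statistics import NormalDist

yields = [138, 139.1, 113, 132.5, 140.7, 109.7, 118.9, 134.8,
          109.6, 127.3, 115.6, 130.4, 130.2, 111.7, 105.5]  # bushels per acre
sigma = 10  # population S.D. (bushels/acre), assumed known as in the example

xbar = statistics.fmean(yields)
se = sigma / math.sqrt(len(yields))  # sigma / sqrt(n), n = 15

intervals = {}
for level in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # lower area, as with NORMINV
    m = z_star * se
    intervals[level] = (xbar - m, xbar + m)
    print(f"{level:.0%} CI: ({xbar - m:.2f}, {xbar + m:.2f}), width {2 * m:.2f}")
```

The intervals widen as the confidence level increases: more confidence costs a wider interval.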
Confidence Intervals: An EXCEL shortcut: CONFIDENCE(alpha, standard_dev, size), which returns the margin of error $m$. Careful: the parameter $\alpha$ is the 2-tailed outer area, so for level = 0.90, $\alpha$ = 0.10.
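To make the “2-tailed outer area” point concrete, here is a hypothetical Python helper that mimics what Excel's CONFIDENCE computes, $z^* \cdot \text{standard\_dev}/\sqrt{\text{size}}$ with $z^*$ at lower area $1 - \alpha/2$. It is a sketch for checking answers, not Excel's own code; the example reuses the corn data's $\sigma = 10$, $n = 15$.

```python
import math
from statistics import NormalDist

def confidence(alpha, standard_dev, size):
    """Hypothetical mimic of Excel CONFIDENCE(alpha, standard_dev, size):
    margin of error z* * standard_dev / sqrt(size), where alpha is the
    two-tailed outer area, so z* has lower area 1 - alpha / 2."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    return z_star * standard_dev / math.sqrt(size)

# For a 90% confidence level, pass alpha = 1 - 0.90 = 0.10 (not 0.90)
level = 0.90
m = confidence(1 - level, 10, 15)
print(m)
```

Passing 0.90 instead of 0.10 would silently return a much smaller margin, corresponding to a 10% confidence level.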
Confidence Intervals: HW: 6.1, 6.3, 6.5