Published by Matilda Madison Poole. Modified over 9 years ago.
1
Inferring the Mean and Standard Deviation of a Population
2
Central Problem: Two important numbers tell us a lot about a distribution of data. The mean tells us the central tendency of the data; the standard deviation tells us the spread in the data. The problem is that we don’t normally know either of these and must infer them from a simple random sample (SRS) of the population.
3
Baby Paradox Two hospitals in the same city deliver, on average, a 50:50 ratio of baby girls and baby boys. Hospital A delivers 120 babies a day (on average) while hospital B delivers 12 babies a day (on average). One day there were twice as many boys as girls born in one of the hospitals. In which hospital is this more likely to happen?
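One way to check the intuition is to compute the binomial probabilities directly. The sketch below assumes each hospital delivers exactly its average number of babies that day (120 and 12), that births are independent, and that "twice as many boys as girls" means exactly a 2:1 split; the function name is illustrative.

```python
from math import comb

def prob_exact_boys(n_births: int, n_boys: int, p_boy: float = 0.5) -> float:
    """Binomial probability of exactly n_boys boys among n_births births."""
    return comb(n_births, n_boys) * p_boy**n_boys * (1 - p_boy) ** (n_births - n_boys)

# A 2:1 split means two thirds of the day's births are boys.
p_hospital_a = prob_exact_boys(120, 80)  # 80 boys, 40 girls (assumed day total: 120)
p_hospital_b = prob_exact_boys(12, 8)    # 8 boys, 4 girls (assumed day total: 12)

print(f"Hospital A (120 births): {p_hospital_a:.6f}")
print(f"Hospital B (12 births):  {p_hospital_b:.6f}")
```

The smaller hospital comes out far more likely to see the lopsided day, because proportions from small samples vary more.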
4
Measuring the mean… How do we know the mean of a population? Answer: We can either measure every single member of the population or estimate the mean from a suitable SRS. We will assume that the population is normally distributed with mean μ and standard deviation σ, so the sample mean x̄ has the normal distribution N(μ, σ/√n).
5
Standard Error and Standard Deviation: These are two distinct ideas. Standard error measures the uncertainty in the measured mean; it depends on how YOU measure and on the sample size. Standard deviation measures the spread in the data; it is a property of the data set and does not change with sample size. The two are linked by SE = σ/√n, so an estimate of one gives an estimate of the other.
6
Standard error is always less than standard deviation (for n > 1). SE gets smaller as n grows; σ does not change! SE measures the uncertainty in the location of the mean; σ measures the spread in the data.
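A small simulation can make the distinction concrete. This is a sketch under stated assumptions: the population is normal with a hypothetical σ of 2.0, and the seed, sample sizes, and repeat count are arbitrary choices. It compares the observed spread of many sample means with σ/√n.

```python
import random
import statistics
from math import sqrt

def simulated_se(n: int, sigma: float, n_repeats: int, rng: random.Random) -> float:
    """Standard deviation of n_repeats sample means, each from a sample of size n."""
    means = []
    for _ in range(n_repeats):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 2.0             # hypothetical population standard deviation

results = {}
for n in (4, 25, 100):
    observed = simulated_se(n, sigma, 2000, rng)
    theory = sigma / sqrt(n)
    results[n] = (observed, theory)
    print(f"n = {n:3d}: simulated SE = {observed:.3f}, sigma/sqrt(n) = {theory:.3f}")
```

The population σ stays fixed at 2.0 throughout, while the spread of the sample means shrinks as n grows.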
7
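t-Distributions: If we know σ, then setting a confidence interval on how well our sample mean x̄ measures the true mean μ is easy: x̄ ± z*·σ/√n. But if we don’t know σ, then we estimate it with the sample standard deviation s and use the t-distribution with n − 1 degrees of freedom: x̄ ± t*·s/√n.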
8
Closer look at t-distributions: The t-distribution looks very much like the Normal distribution, and as the number of degrees of freedom (df) gets large the two become indistinguishable. t-distribution tables are used much the same way as N(0,1) tables; the major difference is the df value.
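The convergence can be checked numerically. The sketch below uses the standard density formula for the t-distribution and measures the largest gap between it and the N(0,1) density on a grid from −5 to 5; the grid, the df values, and the function names are illustrative choices.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x: float, df: float) -> float:
    """Density of the t-distribution, computed in log space to avoid overflow."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def normal_pdf(x: float) -> float:
    """Density of the standard Normal distribution N(0,1)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def max_gap(df: float) -> float:
    """Largest absolute difference between the two densities on [-5, 5]."""
    grid = [i / 100 for i in range(-500, 501)]
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in grid)

gaps = {df: max_gap(df) for df in (2, 10, 30, 100, 1000)}
for df, gap in gaps.items():
    print(f"df = {df:4d}: max density gap = {gap:.5f}")
```

The gap shrinks steadily as df increases, which is why the last row of a t table matches the Normal table.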
9
Example… You are inspecting a shipment of 10 000 precision machined rods to be used in an engine assembly plant. You select a random sample of 20 and measure the diameters. You find that the average diameter of the sample is 5.465 cm with a standard deviation in the measurements of 0.005 cm. It is critical that the diameters do not exceed 5.471 cm. You are willing to accept a 1% failure rate. Should you accept the shipment?
10
Solution: This is an example of a one-tailed t-test with α = 0.01 and df = 20 − 1 = 19, so t(19, 0.01) = 2.539. A 1% failure rate looks like this: [Figure: upper 1% tail of the t-distribution, df = 19]
11
Test the numbers… t = (5.471 − 5.465)/(0.005/√20) ≈ 5.37, well above the critical value of 2.539. This implies that 99.998% of the sample will not exceed the threshold diameter. Accept!
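The numbers in the rod example can be reproduced with the standard library alone. This sketch computes the t statistic from the values given in the example and then approximates the one-tailed probability beyond it by integrating the t density with Simpson's rule. The integration routine and its step count are illustrative choices, not the method used on the slide.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x: float, df: float) -> float:
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n, mean, s, limit = 20, 5.465, 0.005, 5.471   # values from the example
t_crit = 2.539                                 # t(19, 0.01) as quoted on the slide

t_stat = (limit - mean) / (s / sqrt(n))
p_tail = upper_tail(t_stat, n - 1)

print(f"t = {t_stat:.3f} (critical value {t_crit})")
print(f"one-tailed probability beyond t: {p_tail:.2e}")
```

The tail probability comes out on the order of 10⁻⁵, in line with the 99.998% quoted above.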
12
Two-tailed t-Tests: In the previous example we looked at whether or not the diameter was less than a maximum allowable value. Just as we have done earlier with confidence intervals, we can also specify a maximum allowable range (“plus or minus”) for our mean. Let’s find the 95% confidence interval for the mean diameter that is implied by our measurement. Use the following formula for the margin of error: m = t*·s/√n.
13
We measured the mean diameter as 5.465 cm with s = 0.005 cm and n = 20, so with t* = 2.093 (df = 19) the margin of error is 2.093 × 0.005/√20 ≈ 0.002 cm. We can be 95% confident that the mean diameter of the parts is in the range (5.463, 5.467) cm.
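The interval can be reproduced in a few lines. This sketch assumes the sample values from the rod example (n = 20, mean 5.465 cm, s = 0.005 cm) and takes t* = 2.093 from a standard t table for df = 19 at 95% confidence; the variable names are illustrative.

```python
from math import sqrt

n, mean, s = 20, 5.465, 0.005   # sample size, sample mean (cm), sample std dev (cm)
t_star = 2.093                  # table value for df = 19, 95% two-sided (assumed from a t table)

margin = t_star * s / sqrt(n)   # margin of error m = t* * s / sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"margin = {margin:.4f} cm, interval = ({lower:.3f}, {upper:.3f}) cm")
# prints: margin = 0.0023 cm, interval = (5.463, 5.467) cm
```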
14
Example 7.9: Plot the data. Identify the variables, etc.: df = 50 − 1 = 49, α = 0.05, x̄ = 23.56, s = 12.52, t* = 2.009. Interval = (20.00, 27.12)?
15
Example of a Matched Pairs t-test: Exercise 7.40. Formulate appropriate hypotheses: H0: no difference; Ha: LH > RH. Re-arrange the data: find x̄ and s of the differences (see next page).
16
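H0: μ = 0; df = 25 − 1 = 24. Find t = x̄/(s/√n). Use Excel, =TDIST(t, df, #tails), or use Table D. The P-value is only 0.004: if the null hypothesis were true, a difference this large would occur only about 0.4% of the time. Conclusion: the LH thread takes longer.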
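The matched pairs calculation reduces to a one-sample t statistic on the differences. The sketch below shows the mechanics only: the timings are hypothetical numbers invented for illustration (the exercise data are not reproduced here), and the function name is an assumption.

```python
import statistics
from math import sqrt

def paired_t(first: list[float], second: list[float]) -> tuple[float, int]:
    """Return (t, df) for H0: mean of the pairwise differences (first - second) is 0."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical timings in seconds, NOT the data from Exercise 7.40.
lh_times = [112.0, 98.0, 121.0, 105.0, 130.0]
rh_times = [104.0, 99.0, 110.0, 101.0, 118.0]

t_stat, df = paired_t(lh_times, rh_times)
print(f"t = {t_stat:.3f} with df = {df}")
```

The resulting t and df are then looked up in a t table, or passed to a spreadsheet function, to get the one-tailed P-value.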
17
Robustness… A statistical test is considered robust if it is insensitive to deviations from the assumptions being made. This could include smaller sample size or deviation from normality.
18
Rules of thumb – When to use the t-test: Small sample sizes (n < 15): only if the data are close to normal. Mid-range sample sizes (n ≥ 15): as long as the distribution is not strongly skewed and has no outliers. Large sample sizes (n > 40): even if skewed or with some outliers. Fine print: Rules of thumb do not obviate the need to always inspect your data! Stemplots or histograms give you insight into just how “skewed” or “outlier-riddled” your data are. Always know what the data set looks like before applying tests.
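The rules of thumb can be written down as a small decision helper. This is a simplified sketch, not a substitute for looking at the data: it reads the small-sample rule as n < 15, and the three flags are judgments the analyst supplies after inspecting a stemplot or histogram. The function and parameter names are illustrative.

```python
def t_test_reasonable(n: int, close_to_normal: bool,
                      strongly_skewed: bool, has_outliers: bool) -> bool:
    """Apply the slide's rules of thumb for when a t procedure is reasonable."""
    if n > 40:
        # Large sample: acceptable even if skewed or with some outliers.
        return True
    if n >= 15:
        # Mid-range sample: acceptable unless strongly skewed or with outliers.
        return not strongly_skewed and not has_outliers
    # Small sample (assumed to mean n < 15): data must be close to normal.
    return close_to_normal

print(t_test_reasonable(10, close_to_normal=True, strongly_skewed=False, has_outliers=False))
print(t_test_reasonable(20, close_to_normal=False, strongly_skewed=True, has_outliers=False))
print(t_test_reasonable(50, close_to_normal=False, strongly_skewed=True, has_outliers=True))
```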
19
In conclusion… Read 7.1 carefully; we skipped over some terms and discussions of applicability of the t-test. Be sure you understand when (and why) we need the t-test. Know the difference between standard deviation and standard error. Try: 7.4, 7.12, 7.26, 7.42.