Published by Marjorie Bishop. Modified over 9 years ago.
1
Session V: The Normal Distribution
Continuous Distributions (Zar, Chapter 6)
2
The subject of this chapter: Continuous Distributions
The most important continuous distribution: the Normal distribution.
1) Originally defined by Laplace (~1799)
2) Developed by Gauss (~1809): often called the Gaussian distribution, although some say he had little to do with it.
3) Named the “Normal Distribution” by Karl Pearson (1920)
4) Has nothing to do with normal vs. abnormal, but is the “common” distribution of many processes.
6
Some Review: Histogram and Density
[Figure: a histogram and a density curve, each plotted against x]
8
The “Normal” Distribution
The Density Function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$, written $N(\mu, \sigma^2)$
Recall…
11
From Table B.2 (App 17)
12
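[Figure: areas under the normal curve on each side of the mean]
Between the mean and 1 standard deviation: about 1/3 (34%)
Between 1 and 2 standard deviations: about 13.5%
Beyond 2 standard deviations: about 2.5%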
13
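Within 1 standard deviation of the mean: about 68% (roughly two-thirds)
Within 2 standard deviations of the mean: about 95%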
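The rounded areas quoted on these slides can be checked against the standard normal distribution function. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative only. The exact area within one standard deviation is about 68.3%, so "two-thirds" is a rough approximation.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Areas on one side of the mean
area_mean_to_1sd = z.cdf(1) - z.cdf(0)   # about 0.341
area_1sd_to_2sd = z.cdf(2) - z.cdf(1)    # about 0.136
area_beyond_2sd = 1 - z.cdf(2)           # about 0.023

# Symmetric areas around the mean
within_1sd = z.cdf(1) - z.cdf(-1)        # about 0.683
within_2sd = z.cdf(2) - z.cdf(-2)        # about 0.954

print(f"mean to 1 sd: {area_mean_to_1sd:.4f}")
print(f"1 sd to 2 sd: {area_1sd_to_2sd:.4f}")
print(f"beyond 2 sd:  {area_beyond_2sd:.4f}")
print(f"within 1 sd:  {within_1sd:.4f}")
print(f"within 2 sd:  {within_2sd:.4f}")
```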
14
A test of the normal distribution:
15
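Moments
Definitions: the $k$-th moment is $\mu'_k = E[X^k]$.
Central Moments: $\mu_k = E[(X - \mu)^k]$, where $\mu = E[X]$.
First Central Moment is Zero: $\mu_1 = E[X - \mu] = E[X] - \mu = 0$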
16
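Second Central Moment is the Variance: $\sigma^2 = E[(X - \mu)^2]$
Estimated by $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
“Machine Calculation Formula” for the Variance: $s^2 = \frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n - 1}$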
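A minimal sketch comparing the definitional form of the sample variance with the "machine" form, which needs only the running sums of $x$ and $x^2$. The function names and the data are made up for illustration.

```python
def variance_definition(xs):
    """Sample variance from squared deviations about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_machine(xs):
    """Sample variance from the sums of x and x squared."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
s2_definition = variance_definition(data)
s2_machine = variance_machine(data)
print(s2_definition, s2_machine)
```

Both forms are algebraically identical. In floating-point arithmetic the machine form can lose precision when the mean is large relative to the spread, which is one reason the definitional form is usually preferred in software.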
17
Moments (Cont.)
Third Moment: Skewness
$K_3 < 0$: Mean < Median, skewed “to the left”
$K_3 = 0$: Mean = Median, symmetric
$K_3 > 0$: Mean > Median, skewed “to the right”
18
Normalized as: $\mu_3 / \sigma^3$, which is “unit”-less
Estimated by: and “Machine” formula:
19
Fourth Moment: Kurtosis
Normalized by $\sigma^4$ and “unit”-less: $\mu_4 / \sigma^4 - 3$
Estimated by: and
20
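Why –3? The Normal Distribution has uncorrected normalized 4th moment = 3, so subtracting 3 gives it a kurtosis of zero.
$K_4 < 0$: Platy-kurtotic (flatter than the normal)
$K_4 = 0$: Meso-kurtotic
$K_4 > 0$: Lepto-kurtotic (more peaked, with heavier tails than the normal)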
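The estimators shown on the slides were lost in extraction, so the sketch below is an assumption rather than the slides' formulas: it uses the plain sample central moments $m_k = \frac{1}{n}\sum (x_i - \bar{x})^k$, with skewness $m_3 / m_2^{3/2}$ and kurtosis $m_4 / m_2^2 - 3$. Textbooks such as Zar use bias-corrected versions that differ slightly for small $n$.

```python
def central_moment(xs, k):
    """k-th sample central moment, dividing by n (no bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def skewness(xs):
    """Unit-less skewness: third central moment over variance^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Unit-less kurtosis with 3 subtracted, so a normal gives about 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]         # hypothetical, symmetric about 3
right_skewed = [1, 1, 1, 2, 10]     # hypothetical, long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The symmetric sample has zero skewness, and because it is flat its kurtosis is negative. The sample with the long right tail has positive skewness.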
21
So, for the normal distribution, $K_3 = 0$ and $K_4 = 0$.
Many distributions can be characterized by their first four moments. The Pearson system (after Karl Pearson) is one such system. Not used much any more.
22
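So… Why the “Normal” Distribution?
Distribution of Means:
[Figure: a “Universe” from which the sample $X_1, X_2, X_3, \dots, X_n$ is drawn]
$x_1, \dots, x_n$: these are data points. If we selected them again they would be different.
$X_1, \dots, X_n$: these are the random variables that generate the data points.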
23
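$\bar{x}$ is a data point, but if we did the experiment over and over again, calculating a new $\bar{x}$ each time, the values would be different and would have a distribution based upon the random variables $X_1, \dots, X_n$.
So if $X_1, \dots, X_n$ is a random sample, what’s the distribution of $\bar{X}$?
Now $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$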
24
Start easy! n=2
25
For a general “n”: $E[\bar{X}] = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
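The equations on these two slides did not survive extraction. A sketch of the standard derivation, assuming $X_1, \dots, X_n$ are independent with common mean $\mu$ and variance $\sigma^2$ (the notation is an assumption, not necessarily the slides' own):

```latex
\begin{align*}
% n = 2
E[\bar{X}] &= E\!\left[\tfrac{1}{2}(X_1 + X_2)\right]
            = \tfrac{1}{2}(\mu + \mu) = \mu \\
\mathrm{Var}(\bar{X}) &= \tfrac{1}{4}\left(\mathrm{Var}(X_1) + \mathrm{Var}(X_2)\right)
            = \tfrac{1}{4}(2\sigma^2) = \frac{\sigma^2}{2} \\[1ex]
% general n
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{n\mu}{n} = \mu \\
\mathrm{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\end{align*}
```

The variance lines use independence: the variance of a sum of independent variables is the sum of their variances.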
26
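Central Limit Theorem: $\bar{X} \approx N(\mu, \sigma^2 / n)$, or more precisely $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \to N(0, 1)$ in distribution as $n \to \infty$.
This means that if $n$ is large enough, then $\bar{X}$ is normally distributed, or at least close enough.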
27
How large does $n$ have to be?
Ex: The uniform distribution on [0, 1]
[Figure: uniform density of height 1 between 0 and 1]
28
How many samples are necessary before $\bar{X}$ is normally distributed?

Authority    Number
IBM          12
DAJ          ~30
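One plausible reading of the IBM figure (an assumption; the slide does not say) is the classic recipe of summing 12 uniform random numbers on [0, 1] and subtracting 6, which gives mean 0 and variance 12 × (1/12) = 1. The simulation below is a rough sketch of how close that sum comes to N(0, 1); the seed and the number of draws are arbitrary.

```python
import random
from statistics import fmean, pvariance


def sum_of_12_uniforms(rng):
    """Sum of 12 U(0,1) draws, centered so the result has mean 0, variance 1."""
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(12345)  # fixed seed so the run is repeatable
draws = [sum_of_12_uniforms(rng) for _ in range(20000)]

sample_mean = fmean(draws)
sample_var = pvariance(draws)
# For an exact N(0, 1), about 95% of values fall within +/- 1.96
frac_within_196 = sum(abs(d) <= 1.96 for d in draws) / len(draws)

print(f"mean     = {sample_mean:.3f}")
print(f"variance = {sample_var:.3f}")
print(f"fraction within +/-1.96 = {frac_within_196:.3f}")
```

The agreement is good in the middle of the distribution. The sum can never fall outside ±6, so the extreme tails are necessarily thinner than a true normal.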
29
Biomedical considerations
A typical sample is actually a large sum of independent elements, all with about the same distribution,
or, at worst,
the sample is the sum of distinct clones.
30
Testing the sample for the Normal Distribution!
Chi-Square Goodness of Fit
For Chi-Square we have observed and expected frequencies, which we then combine in the Chi-Square statistic. For a continuous distribution, form the histogram from the sample:
[Figure: histogram with bar heights $h_1, h_2, h_3, h_4, h_5$]
observed: $h_1, h_2, h_3, h_4, h_5$
expected: ? From $H_0$
31
Ex:6.1
32
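How to Calculate the Probability of a “bin”:
For a bin $(a, b]$ under $H_0$: $N(\mu, \sigma^2)$,
$P(a < X \le b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)$,
where $\Phi$ is the standard normal distribution function. The expected frequency of the bin is $n$ times this probability.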
33
* Rule of Thumb: no more than 20-25% of the expected frequencies should be < 5, and no expected frequency should be < 1.
Now back to the example!
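A minimal sketch of the calculation, assuming bins defined by a list of cut points with open-ended first and last bins. The function names, the cut points and the observed counts are hypothetical and are not the data of Ex. 6.1.

```python
from statistics import NormalDist


def bin_probabilities(edges, mu, sigma):
    """P(bin) under N(mu, sigma^2) for (-inf, e0], (e0, e1], ..., (e_last, inf)."""
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def chi_square(observed, expected):
    """Sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def rule_of_thumb_ok(expected, max_fraction_small=0.25):
    """At most 25% of expected frequencies below 5, and none below 1."""
    n_small = sum(1 for e in expected if e < 5)
    return min(expected) >= 1 and n_small / len(expected) <= max_fraction_small


edges = [-1.0, 0.0, 1.0]         # hypothetical cut points
observed = [18, 30, 36, 16]      # hypothetical counts, n = 100
n = sum(observed)

probs = bin_probabilities(edges, mu=0.0, sigma=1.0)
expected = [n * p for p in probs]
stat = chi_square(observed, expected)

# d.f. = bins - 1 - (number of parameters estimated from the sample);
# 2 is subtracted here on the assumption that mean and variance were estimated.
df = len(observed) - 1 - 2

print([round(e, 2) for e in expected], round(stat, 3), df)
```

If the rule of thumb fails, adjacent bins (usually in the tails) are pooled before the statistic is computed, and the degrees of freedom are based on the number of bins after pooling.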
34
Ex: 6.1
[Table residue: expected frequencies 3.05, 6.33, 7.25, 5.5; small classes pooled]
35
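d.f. = 11 – 1 – 2 = 8 (11 classes, minus 1, minus 2 for the estimated mean and variance)
Does this mean that Height is Normally Distributed? Yes, in the sense that the test does not reject N(70.17, 10.96)!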
36
Probits: A graphical way to test Normality
The probit transformation: the value of x that corresponds to a given cumulative probability if x is normal with mean 0 and variance 1.
Sometimes a mean of 3 is used instead (so that the values covering the central 95% are positive).
37
[Figure: probit scale, axis ticks from -3 to 2]
39
Use of probit paper:
40
If you plot normally distributed data, you get a straight line:
41
Use of probit transformation on SPSS
Calculate the ranked data using the Syntax window:
    rank x.                        (Creates the ranks in variable rx)
    compute px=rx/#samples.        (Normalizes to 1.00)
    compute probitx=probit(px).
The new variable can be plotted against x. It should be close to a straight line if x is normally distributed.
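The same three steps can be sketched outside SPSS. The version below is a Python illustration, not a translation of the SPSS functions. It makes one deliberate change, labeled as an assumption: it uses (rank − 0.5)/n instead of rank/n, because rank/n equals 1 for the largest value and the probit of 1 is undefined. The data are hypothetical.

```python
from statistics import NormalDist


def probit_scores(xs):
    """Probit of each value's plotting position, in the original order.

    Ties are ranked in order of appearance (a simplification).
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    std_normal = NormalDist()
    # Assumption: (rank - 0.5) / n keeps every probability strictly inside (0, 1)
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def pearson_r(xs, ys):
    """Correlation coefficient, used here as a rough measure of straightness."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


heights = [61, 64, 66, 67, 68, 69, 70, 71, 72, 74, 77]  # hypothetical data
scores = probit_scores(heights)
r = pearson_r(heights, scores)
print([round(s, 2) for s in scores])
print(round(r, 4))
```

Plotting `scores` against `heights` gives the probit plot. A correlation close to 1 corresponds to points lying near a straight line, though the plot itself is more informative than the single number.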
45
Kolmogorov-Smirnov goodness of fit test to the normal distribution
Compare the cumulative frequency polygon to the hypothesized distribution function:
1. Calculate the cumulative frequency polygon $F_n(x)$ of the sample.
2. $H_0$: N(mean, variance) or other distribution
3. Find or calculate the parameters (mean, variance, etc.)
4. Calculate the distribution function $F_0(x)$ using $H_0$
5. Compare: $D = \max_x |F_n(x) - F_0(x)|$
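A minimal sketch of the comparison step, assuming the usual form of the statistic: the largest vertical gap between the sample's cumulative step function and the hypothesized distribution function, checked just before and just after each data point. The function name and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(xs, dist):
    """Largest gap between the empirical CDF of xs and dist.cdf."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


heights = [61, 64, 66, 67, 68, 69, 70, 71, 72, 74, 77]  # hypothetical data
h0 = NormalDist(mean(heights), stdev(heights))  # parameters estimated from the sample
d_stat = ks_statistic(heights, h0)
print(round(d_stat, 4))
```

When the mean and variance are estimated from the same sample, as here, the standard Kolmogorov-Smirnov critical values are too lenient; the Lilliefors correction is the usual remedy.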
48
But, these are tests of the original distribution! Do we necessarily care what that distribution is? We more often want to compare the parameters of the distribution. Has the distribution moved as a result of treatment? Different populations? Is the new treatment “better” than the old?