Chapter 7: Normal Probability Distributions April 17
In Chapter 7: 7.1 Normal Distributions; 7.2 Determining Normal Probabilities; 7.3 Finding Values That Correspond to Normal Probabilities; 7.4 Assessing Departures from Normality. Basic Biostat, 4/20/2017
§7.1: Normal Distributions. Normal random variables are the most common type of continuous random variable. First described by de Moivre in 1733; Laplace elaborated the mathematics in 1812. They describe some (not all) natural phenomena. More importantly, they describe the behavior of means.
Normal Probability Density Function. Recall that continuous random variables are described with smooth probability density functions (pdfs) (Ch 5). Normal pdfs are recognized by their familiar bell shape. This is the age distribution of a pediatric population. The overlying curve represents its Normal pdf model.
Area Under the Curve. The darker bars of the histogram correspond to ages less than or equal to 9 (~40% of observations). The darker area under the curve also corresponds to ages less than or equal to 9 (~40% of the total area).
Parameters μ and σ. Normal pdfs are a family of distributions. Family members are identified by the parameters μ (mean) and σ (standard deviation): μ controls location; σ controls spread.
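A minimal sketch of how μ and σ act on the curve, using Python's standard-library `statistics.NormalDist`. The parameter values are illustrative assumptions, not values from the slides.

```python
from statistics import NormalDist

# Illustrative (assumed) parameter choices; not from the slides.
narrow = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=3, sigma=1)   # same spread, different location
wide = NormalDist(mu=0, sigma=2)      # same location, larger spread

# The peak of a Normal pdf sits at mu, so changing mu slides the curve.
peak_narrow = narrow.pdf(0)
peak_shifted = shifted.pdf(3)

# Doubling sigma halves the peak height because the total area stays 1.
peak_wide = wide.pdf(0)

print(round(peak_narrow, 4), round(peak_shifted, 4), round(peak_wide, 4))
```

The first two peaks have the same height (only the location moved); the third is half as tall because the curve is twice as wide.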
Mean and Standard Deviation of Normal Density (figure: Normal curve with μ and σ labeled)
Standard Deviation σ. Points of inflection (where the slope of the curve begins to level) occur one σ below and one σ above μ. Practice sketching Normal curves to get a feel for the inflection points. Practice labeling the horizontal axis of curves with standard deviation markers (figure).
Means and Standard Deviations. The mean and standard deviation from data sets are denoted x̄ (“xbar”) and s. The mean and standard deviation parameters of Normal distributions are μ and σ. These means and standard deviations are related, but are not the same thing.
68-95-99.7 Rule for Normal Distributions. 68% of the area under the curve (AUC) falls within ±1σ of μ. 95% of the AUC falls within ±2σ of μ. 99.7% of the AUC falls within ±3σ of μ.
Example: 68-95-99.7 Rule. Wechsler adult intelligence scores are Normally distributed with μ = 100 and σ = 15; X ~ N(100, 15). Using the 68-95-99.7 rule: 68% of scores fall in μ ± σ = 100 ± 15 = 85 to 115. 95% of scores fall in μ ± 2σ = 100 ± (2)(15) = 70 to 130. 99.7% of scores fall in μ ± 3σ = 100 ± (3)(15) = 55 to 145.
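The rule can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` with the Wechsler parameters from the example; the helper name `area_within` is a hypothetical one introduced here for illustration.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # Wechsler scores, X ~ N(100, 15)

def area_within(dist, k):
    """Area under the curve within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    lo, hi = iq.mean - k * iq.stdev, iq.mean + k * iq.stdev
    print(f"{lo:.0f} to {hi:.0f}: {area_within(iq, k):.4f}")
```

The computed areas are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.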
Symmetry in the Tails. Because the Normal curve is symmetrical and the total AUC adds to 1, we can determine the AUC in the tails. For example, because 95% of the curve is in μ ± 2σ, 2.5% is in each tail beyond μ ± 2σ.
Example: Male Height. Male height is approximately Normal with μ = 70.0˝ and σ = 2.8˝. Because of the 68-95-99.7 rule, 68% of the population is in the range 70.0˝ ± 2.8˝ = 67.2˝ to 72.8˝. Because the total AUC adds to 100%, 32% are in the tails below 67.2˝ and above 72.8˝. Because of symmetry, half of this 32% (i.e., 16%) is below 67.2˝ and 16% is above 72.8˝.
Example: Male Height (figure: Normal curve centered at 70˝ with 68% of the area between 67.2˝ and 72.8˝ and 16% in each tail)
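A short sketch of the tail reasoning for the male height example, assuming the N(70.0, 2.8) model stated above and using `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

height = NormalDist(mu=70.0, sigma=2.8)  # male height in inches

middle = height.cdf(72.8) - height.cdf(67.2)  # area within mu ± sigma
both_tails = 1 - middle                        # total AUC is 1
lower_tail = height.cdf(67.2)                  # area below 67.2
upper_tail = 1 - height.cdf(72.8)              # area above 72.8

print(f"middle={middle:.3f} tails={both_tails:.3f}")
print(f"below 67.2={lower_tail:.3f} above 72.8={upper_tail:.3f}")
```

The two tails come out equal, as symmetry requires, and each is close to the 16% given by the rule of thumb.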
Reexpression of Non-Normal Variables. Many biostatistical variables are not Normal. We can reexpress non-Normal variables with a mathematical transformation to make them more Normal. Examples of mathematical transforms include logarithms, exponents, square roots, and so on. Let us review the logarithmic transformation.
Logarithms. Logarithms are exponents of their base. There are two main logarithmic bases: common log10 (base 10) and natural ln (base e). Landmarks: log10(1) = 0 (because 10^0 = 1); log10(10) = 1 (because 10^1 = 10).
Example: Logarithmic Re-expression. Prostate specific antigen (PSA) is not Normal in 60-year-olds, but ln(PSA) is approximately Normal with μ = −0.3 and σ = 0.8. 95% of ln(PSA) values fall in μ ± 2σ = −0.3 ± (2)(0.8) = −1.9 to 1.3. Thus, 2.5% are above ln(PSA) = 1.3; take the anti-log of 1.3: e^1.3 ≈ 3.67. Since only 2.5% of the population has values greater than 3.67, use this as the cut-point for suspiciously high results.
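The PSA cut-point arithmetic can be sketched as follows, assuming the ln(PSA) ~ N(−0.3, 0.8) model from the example. The variable names are illustrative.

```python
import math
from statistics import NormalDist

ln_psa = NormalDist(mu=-0.3, sigma=0.8)  # ln(PSA) in 60-year-olds, from the example

upper_ln = ln_psa.mean + 2 * ln_psa.stdev   # rule-of-thumb upper bound: 1.3
cut_point = math.exp(upper_ln)              # anti-log returns to the PSA scale

# Exact share above the cut-point on the log scale (the rule rounds this to 2.5%).
share_above = 1 - ln_psa.cdf(upper_ln)

print(f"ln cut-point={upper_ln:.1f}, PSA cut-point={cut_point:.2f}, share above={share_above:.4f}")
```

The exact share beyond μ + 2σ is a little under 2.5% (about 2.3%), because the 95% figure in the rule is itself a rounding.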
§7.2: Determining Normal Probabilities To determine a Normal probability when the value does not fall directly on a ±1σ, ±2σ, or ±3σ landmark, follow this procedure: 1. State the problem 2. Standardize the value (z score) 3. Sketch and shade the curve 4. Use Table B to determine the probability
Example: Normal Probability. Step 1. Statement of Problem. We want to determine the percentage of human gestations that are less than 40 weeks in length. We know that uncomplicated human pregnancy from conception to birth is approximately Normally distributed with μ = 39 weeks and σ = 2 weeks. [Note: clinicians measure gestation from last menstrual period to birth, which adds 2 weeks to μ.] Let X represent human gestation: X ~ N(39, 2). Statement of the problem: Pr(X ≤ 40) = ?
Standard Normal (Z) Variable. Standard Normal variable ≡ a Normal random variable with μ = 0 and σ = 1. These are called “Z variables”. Notation: Z ~ N(0,1). Use Table B to look up cumulative probabilities. Part of Table B is shown on the next slide.
Example: A Standard Normal (Z) variable with a value of 1.96 has a cumulative probability of .9750.
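When a printed table is not at hand, the same cumulative probabilities can be computed. This is a sketch using `statistics.NormalDist`; the function name `table_b` is a hypothetical stand-in for a Table B lookup, not part of any library.

```python
from statistics import NormalDist

Z = NormalDist()  # defaults to mu=0, sigma=1

def table_b(z):
    """Cumulative probability Pr(Z <= z), rounded to four places like a printed table."""
    return round(Z.cdf(z), 4)

print(table_b(1.96))   # right edge of the central 95%
print(table_b(-1.96))  # left tail of the same interval
print(table_b(0.0))    # half the area lies below the mean
```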
Normal Probability Step 2. Standardize. To standardize, subtract μ and divide by σ: z = (x − μ) / σ. The z-score tells you how many σ-units the value falls above or below μ. For the gestation example, z = (40 − 39) / 2 = 0.5.
Steps 3 & 4. Sketch and Use Table B. 3. Sketch and label axes. 4. Use Table B to look up Pr(Z ≤ 0.5) = 0.6915.
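The gestation example can be worked end to end in a few lines. This sketch assumes the X ~ N(39, 2) model stated in Step 1 and substitutes a computed cumulative probability for the Table B lookup; the sketching step has no code counterpart here.

```python
from statistics import NormalDist

mu, sigma = 39, 2   # gestation in weeks, X ~ N(39, 2)
x = 40              # Step 1: we want Pr(X <= 40)

z = (x - mu) / sigma           # Step 2: standardize
prob = NormalDist().cdf(z)     # Step 4: cumulative probability of the z score

print(f"z = {z}, Pr(Z <= z) = {prob:.4f}")
```

About 69% of gestations under this model are 40 weeks or shorter.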
Probabilities Between Two Points. Let a represent the lower boundary and b represent the upper boundary of a range: Pr(a ≤ Z ≤ b) = Pr(Z ≤ b) − Pr(Z ≤ a). Use of this concept will be demonstrated in class and on HW exercises.
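A minimal sketch of the subtraction rule for a Standard Normal Z. The function name `prob_between` and the example bounds are assumptions chosen for illustration; they are not taken from the slides.

```python
from statistics import NormalDist

def prob_between(a, b):
    """Pr(a <= Z <= b) for a Standard Normal Z, with a <= b."""
    Z = NormalDist()
    return Z.cdf(b) - Z.cdf(a)

# Illustrative bounds (assumed, not from the slides).
print(round(prob_between(-1.0, 1.0), 4))
print(round(prob_between(0.5, 1.96), 4))
```

For a non-standard Normal variable, standardize both boundaries first and then apply the same subtraction.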
§7.3: Finding Values Corresponding to Normal Probabilities. 1. State the problem. 2. Use Table B to look up the z-percentile value. 3. Sketch. 4. Unstandardize with this formula: x = μ + z_p·σ, where z_p is the z score from step 2.
Looking up the z percentile value. Use Table B to look up the z percentile value, i.e., the z score for the probability in question. Look inside the table for the entry closest to the associated cumulative probability. Then trace the z score to the row and column labels.
Suppose you wanted the 97.5th percentile z score. Look inside the table for .9750. Then trace the z score to the margins. Notation: let z_p represent the z score with cumulative probability p, e.g., z_.975 = 1.96.
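The reverse lookup (probability in, z score out) corresponds to the inverse cumulative distribution function. A sketch using `NormalDist.inv_cdf` from the Python standard library; the wrapper name `z_percentile` is hypothetical.

```python
from statistics import NormalDist

def z_percentile(p):
    """z score with cumulative probability p (0 < p < 1)."""
    return NormalDist().inv_cdf(p)

print(round(z_percentile(0.975), 2))
print(round(z_percentile(0.025), 2))
print(round(z_percentile(0.5), 2))
```

By symmetry, the 2.5th and 97.5th percentile z scores have the same magnitude and opposite signs.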
Finding Normal Values - Example. Suppose we want to know what gestational length is shorter than 97.5% of all gestations. Step 1. State the problem! Let X represent gestation length. The prior problem established X ~ N(39, 2). We want the gestation length that is shorter than .975 of all gestations. This is equivalent to the gestation length that is longer than .025 of gestations.
Example, cont. Step 2. Use Table B to look up the z value. Table B lists only “left tails”. “Less than 97.5%” (right tail) = “greater than 2.5%” (left tail). The z lookup in the table shows z_.025 = −1.96.
z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
−1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
3. Sketch. 4. Unstandardize: x = μ + zσ = 39 + (−1.96)(2) = 35.08 ≈ 35. “The 2.5th percentile gestation is 35 weeks.”
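A sketch of the look-up and unstandardize steps for the gestation example, assuming X ~ N(39, 2) as before. The direct `inv_cdf` call on the N(39, 2) model is included only as a cross-check.

```python
from statistics import NormalDist

mu, sigma = 39, 2                   # gestation in weeks, X ~ N(39, 2)
z = NormalDist().inv_cdf(0.025)     # z score for the 2.5th percentile
x = mu + z * sigma                  # unstandardize

# Cross-check: ask the N(39, 2) model for the same percentile directly.
x_direct = NormalDist(mu, sigma).inv_cdf(0.025)

print(f"z = {z:.2f}, x = {x:.2f} weeks")
```

Both routes give roughly 35.08 weeks, which the slide rounds to 35.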
§7.4: Assessing Departures from Normality. The best way to assess Normality is graphically. (Figures: an approximately Normal histogram and the Normal “Q-Q” plot of the same distribution.) A Normal distribution will adhere to a diagonal line on the Q-Q plot.
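For readers curious how the points on a Normal Q-Q plot are obtained, here is a minimal sketch. It assumes plotting positions of (i − 0.5)/n, which is one common convention among several; statistical packages may use a different one. The function name `qq_points` is hypothetical, and the sample is simulated rather than real data.

```python
from statistics import NormalDist

def qq_points(data):
    """Pair each sorted observation with its theoretical Standard Normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention (an assumption here).
    """
    ordered = sorted(data)
    n = len(ordered)
    Z = NormalDist()
    return [(Z.inv_cdf((i - 0.5) / n), value)
            for i, value in enumerate(ordered, start=1)]

# Simulated sample drawn from N(39, 2); for illustration only, not real data.
sample = NormalDist(mu=39, sigma=2).samples(8, seed=1)
for theoretical, observed in qq_points(sample):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

Plotting the observed values against the theoretical quantiles gives the Q-Q plot; points from a Normal sample should fall near a straight line.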
A negative skew will show an upward curve on the Q-Q plot
A positive skew will show a downward curve on the Q-Q plot
Same data as previous slide but with a logarithmic transform. A mathematical transform can Normalize a skew.
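The effect of a log transform on skew can be illustrated with simulated data. This sketch is an assumed setup, not the data behind the slides: it exponentiates Normal draws to create a right-skewed sample, then measures skewness before and after taking logs. The `skewness` helper is a hypothetical name for the usual mean-cubed-z-score measure.

```python
import math
from statistics import NormalDist, fmean, pstdev

def skewness(values):
    """Population skewness: mean cubed z score (near 0 for a symmetric sample)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

# Simulated right-skewed data: exponentiate Normal draws (an assumed setup).
draws = NormalDist(mu=-0.3, sigma=0.8).samples(2000, seed=7)
raw = [math.exp(v) for v in draws]
logged = [math.log(v) for v in raw]

print(f"skewness of raw values: {skewness(raw):.2f}")
print(f"skewness of ln(values): {skewness(logged):.2f}")
```

The raw values show a clear positive skew, while the logged values have skewness near zero.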
Leptokurtic. A leptokurtic distribution (narrow peak, heavy tails) will show an S-shape on the Q-Q plot.