TOPIC 5 Normal Distributions
Start Thinking
As a web designer, you face a task that involves a continuous measurement: download time, which can take any value and not just a whole number. How can you answer questions such as the following? What proportion of the homepage downloads take more than 10 seconds? How many seconds elapse before 10% of the downloads are complete?
Continuous Probability Distributions
Uniform, Exponential, Normal, Gamma, Weibull, Beta
Normal Distribution
Also called the Gaussian distribution. 'Bell-shaped' and symmetrical. Mean, median, and mode are equal. The random variable has infinite range. The area under the curve is 1 (the total probability equals 1). Can be used to approximate discrete probability distributions, for example the binomial and Poisson. Basis for classical statistical inference.
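Probability Density Function
f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))
σ = standard deviation; π = 3.14159; e = 2.71828; x = value of the random variable (−∞ < x < ∞); μ = mean
The mean and the variance are E(X) = μ and Var(X) = σ².
The notation X ~ N(μ, σ²) denotes that the random variable X has a normal distribution with mean μ and variance σ².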
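As a quick check of the density formula, the sketch below evaluates it directly and compares the result with Python's standard-library `statistics.NormalDist`. The helper name `normal_pdf` and the parameter values (μ = 5, σ = 10, x = 8) are illustrative assumptions, not part of the slides.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written out term by term."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Illustrative values (assumed): mu = 5, sigma = 10, evaluated at x = 8.
by_formula = normal_pdf(8.0, 5.0, 10.0)
by_library = NormalDist(mu=5.0, sigma=10.0).pdf(8.0)
print(by_formula, by_library)
```

Both numbers should agree up to floating-point rounding. Note that a density value is not itself a probability; probabilities come from areas under the curve.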
Effect of Varying Parameters (μ & σ)
[Figure: three normal curves A, B and C, plotted as f(x) against x.]
Normal distributions differ by mean and standard deviation. Each distribution would require its own table. That's an infinite number of tables! So we need a "standardized" normal distribution.
The Standard Normal Distribution
One table!
[Figure: a normal distribution of X with mean μ and standard deviation σ, next to the standard normal distribution of Z with μ = 0 and σ = 1. Z is negative to the left of the mean and positive to the right.]
Normal Distribution Probability
If X ~ N(μ, σ²), then the transformed random variable Z = (X − μ)/σ has a standard normal distribution, Z ~ N(0, 1). The random variable Z is known as the "standardized" version of the random variable X.
The probability values of a general normal distribution can be related to the cumulative distribution function of the standard normal distribution, Φ(z):
P(a ≤ X ≤ b) = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ) = Φ((b − μ)/σ) − Φ((a − μ)/σ), where Φ(z) = P(Z ≤ z).
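Standard Normal Distribution Example
Normal distribution: μ = 3, σ = 0.8. Standard normal distribution: μ = 0, σ = 1.
For x = 4.2: z = (x − μ)/σ = (4.2 − 3)/0.8 = 1.5
Therefore x = 4.2 on the X scale corresponds to z = 1.5 on the Z scale, and P(X ≤ 4.2) = P(Z ≤ 1.5) = Φ(1.5).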
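The standardization in this example (μ = 3, σ = 0.8, x = 4.2) can be sketched in a few lines of Python. The helper name `standardize` is an assumption made for illustration; `statistics.NormalDist` is part of the standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Return z = (x - mu) / sigma."""
    return (x - mu) / sigma


z = standardize(4.2, 3.0, 0.8)           # approximately 1.5
phi_z = NormalDist().cdf(z)              # Phi(z) from the standard normal
direct = NormalDist(3.0, 0.8).cdf(4.2)   # P(X <= 4.2) computed on the X scale
print(z, phi_z, direct)
```

The two probabilities should coincide, which is the point of standardizing: one table of Φ(z) serves every normal distribution.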
Standard Normal Tables
The values of the cumulative distribution function of the standard normal distribution, Φ(z), that is, the probabilities P(Z ≤ z), are already tabulated.
[Figure: standard normal curve with μ = 0 and σ = 1, with a point z marked on the Z axis.]
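Normal Distribution Probability
The table gives only the cumulative distribution function of the standard normal distribution, Φ(z) = P(Z ≤ z), which is what we need to find P(X ≤ a) = Φ((a − μ)/σ). By using the same table, we can find the other probabilities:
P(X ≥ a) = 1 − Φ((a − μ)/σ)
P(a ≤ X ≤ b) = Φ((b − μ)/σ) − Φ((a − μ)/σ)
[Figure: two normal curves of X, one marking μ and a, the other marking μ, a and b.]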
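Standard Normal Distribution
Normal distribution with μ = 5, σ = 10: P(5 < X < 24.6) = ?
Standardizing the end points: z = (5 − 5)/10 = 0 and z = (24.6 − 5)/10 = 1.96, so P(5 < X < 24.6) = P(0 < Z < 1.96).
[Figure: normal curve of X (μ = 5, σ = 10) with 5 and 24.6 marked, next to the standard normal curve of Z (μ = 0, σ = 1) with 1.96 marked.]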
Standard Normal Probability Table
Normal distribution with μ = 5, σ = 10: P(5 < X < 24.6) = ?
Look up the table for z = 1.96:
Z     0.04    0.05    0.06
1.8   0.9671  0.9678  0.9686
1.9   0.9738  0.9744  0.9750
2.0   0.9793  0.9798  0.9803
2.1   0.9838  0.9842  0.9846
Φ(1.96) = 0.9750, so P(5 < X < 24.6) = Φ(1.96) − Φ(0) = 0.9750 − 0.5000 = 0.4750.
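Standard Normal Distribution
Normal distribution with μ = 5, σ = 10: P(X ≥ 8) = ?
Standardizing: z = (8 − 5)/10 = 0.3. Look up the table for z = 0.3: P(X ≥ 8) = P(Z ≥ 0.3) = 1 − Φ(0.3) = 1 − 0.6179 = 0.3821.
[Figure: normal curve of X (μ = 5, σ = 10) with 5 and 8 marked, next to the standard normal curve of Z (μ = 0, σ = 1) with 0.3 marked.]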
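The two table look-ups for μ = 5 and σ = 10 can be cross-checked in software. This is a minimal sketch using the standard library's `statistics.NormalDist` in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=5.0, sigma=10.0)

# P(5 < X < 24.6): difference of two cumulative probabilities.
p_between = X.cdf(24.6) - X.cdf(5.0)

# P(X >= 8): complement of the cumulative probability.
p_at_least_8 = 1.0 - X.cdf(8.0)

print(round(p_between, 4), round(p_at_least_8, 4))
```

Because X is continuous, strict and non-strict inequalities give the same probability, so P(5 < X < 24.6) and P(5 ≤ X ≤ 24.6) are equal.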
Normal Distribution Example
You work in Quality Control for GE. Light bulb life has a normal distribution with μ = 2000 hours and σ = 200 hours. What's the probability that a bulb will last
a) between 1800 and 2200 hours?
b) less than 1470 hours?
c) more than 2500 hours?
Allow students about 10-15 minutes to solve this.
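Standardized Normal Distribution Example Solution
a) z = (1800 − 2000)/200 = −1.0 and z = (2200 − 2000)/200 = 1.0, so P(1800 ≤ X ≤ 2200) = Φ(1.0) − Φ(−1.0) = 0.8413 − 0.1587 = 0.6826.
[Figure: normal curve of X (μ = 2000, σ = 200) with 1800 and 2200 marked, next to the standardized normal curve of Z (μ = 0, σ = 1) with −1.0 and 1.0 marked.]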
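Standardized Normal Distribution Example Solution
b) z = (1470 − 2000)/200 = −2.65, so P(X < 1470) = Φ(−2.65) = 0.0040.
[Figure: normal curve of X (μ = 2000, σ = 200) with 1470 marked, next to the standardized normal curve of Z (μ = 0, σ = 1) with −2.65 marked.]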
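Standardized Normal Distribution Example Solution
c) z = (2500 − 2000)/200 = 2.5, so P(X > 2500) = 1 − Φ(2.5) = 1 − 0.9938 = 0.0062.
[Figure: normal curve of X (μ = 2000, σ = 200) with 2500 marked, next to the standardized normal curve of Z (μ = 0, σ = 1) with 2.5 marked.]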
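The three light bulb probabilities (μ = 2000 hours, σ = 200 hours) can be checked with a short script. This is a sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative, and the last digit may differ slightly from hand calculation with a four-decimal table.

```python
from statistics import NormalDist

life = NormalDist(mu=2000.0, sigma=200.0)

p_a = life.cdf(2200.0) - life.cdf(1800.0)   # a) between 1800 and 2200 hours
p_b = life.cdf(1470.0)                      # b) less than 1470 hours
p_c = 1.0 - life.cdf(2500.0)                # c) more than 2500 hours

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```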
The Empirical Rule
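Assuming this slide refers to the usual 68-95-99.7 rule (roughly 68%, 95% and 99.7% of a normal population lies within 1, 2 and 3 standard deviations of the mean), the percentages can be recovered from Φ(z) with the sketch below. The helper name `within_k_sigma` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def within_k_sigma(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X."""
    return Z.cdf(k) - Z.cdf(-k)


coverage = {k: within_k_sigma(k) for k in (1, 2, 3)}
for k, prob in coverage.items():
    print(f"within {k} standard deviation(s): {prob:.4f}")
```

The result does not depend on μ or σ, because standardizing maps every normal distribution onto the same Z scale.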
Finding Random Variable X for Known Probabilities
Given that P(X ≤ a) = 0.6216, what is a?
Firstly, find the value of z from the standard normal probability table:
Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255
The closest value to 0.6216 in the table is .6217, which corresponds to z = 0.31.
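Finding Random Variable X for Known Probabilities
Secondly, find the value of X = a by reversing the standardization. For the normal distribution with μ = 5 and σ = 10: a = μ + zσ = 5 + 0.31 × 10 = 8.1.
[Figure: normal curve of X (μ = 5, σ = 10) with area .6217 to the left of 8.1, next to the standard normal curve of Z (μ = 0, σ = 1) with area .6217 to the left of 0.31.]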
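Going from a known probability back to a value of X is the inverse of a table look-up. The sketch below uses `NormalDist.inv_cdf` from the standard library instead of searching the table for the closest entry, so the result differs slightly from the rounded table answer.

```python
from statistics import NormalDist

mu, sigma = 5.0, 10.0
target = 0.6216                         # P(X <= a)

# Step 1: find z with Phi(z) = target.
z = NormalDist().inv_cdf(target)

# Step 2: undo the standardization, a = mu + z * sigma.
a = mu + z * sigma

# Same answer in one step, working directly on the X scale.
a_direct = NormalDist(mu, sigma).inv_cdf(target)

print(round(z, 2), round(a, 1), round(a_direct, 1))
```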
Exercise
The thicknesses of metal plates made by a particular machine are normally distributed with a mean of 4.3 mm and a standard deviation of 0.12 mm.
a) What is the proportion of the metal plates that have a thickness outside the range of 4.1 to 4.5 mm?
b) What are the upper and lower quartiles of the metal plate thickness?
c) What is the value of c for which there is an 80% probability that a metal plate has a thickness within the interval [4.3 − c, 4.3 + c]?
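Answer to the Exercise
μ = 4.3 mm and σ = 0.12 mm
a) P(X < 4.1 or X > 4.5) = 1 − P(4.1 ≤ X ≤ 4.5) = 1 − [Φ((4.5 − 4.3)/0.12) − Φ((4.1 − 4.3)/0.12)] = 1 − [Φ(1.67) − Φ(−1.67)] = 1 − (0.9525 − 0.0475) = 0.0950
b) Lower quartile: P(X ≤ a) = 0.25 gives z ≈ −0.67, so a = 4.3 − 0.67 × 0.12 ≈ 4.22 mm. Upper quartile: P(X ≤ a) = 0.75 gives z ≈ 0.67, so a = 4.3 + 0.67 × 0.12 ≈ 4.38 mm.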
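Answer to the Exercise
c) It means P(a ≤ X ≤ b) = 80% with a = 4.3 − c and b = 4.3 + c. By symmetry, P(X ≤ a) = 10% = 0.1 and P(X ≤ b) = 90% = 0.9. Pick either one: Φ(z) = 0.9 gives z ≈ 1.28, so c = zσ = 1.28 × 0.12 ≈ 0.154 mm.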
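The three parts of the metal plate exercise (μ = 4.3 mm, σ = 0.12 mm) can be checked numerically. This is a sketch using the standard library's `statistics.NormalDist`; because it does not round z to two decimals as a printed table does, its answers can differ from hand calculation in the third decimal place.

```python
from statistics import NormalDist

plate = NormalDist(mu=4.3, sigma=0.12)

# a) Proportion outside the range 4.1 to 4.5 mm.
p_outside = 1.0 - (plate.cdf(4.5) - plate.cdf(4.1))

# b) Lower and upper quartiles.
lower_q = plate.inv_cdf(0.25)
upper_q = plate.inv_cdf(0.75)

# c) Half-width c with P(4.3 - c <= X <= 4.3 + c) = 0.80,
#    found from the 90th percentile.
c = plate.inv_cdf(0.90) - plate.mean

print(round(p_outside, 4), round(lower_q, 3), round(upper_q, 3), round(c, 3))
```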
Linear Combination of Normal Random Variables
Linear Functions of a Normal Random Variable: If X ~ N(μ, σ²) and a and b are constants, then Y = aX + b ~ N(aμ + b, a²σ²).
The Sum of Two Independent Normal Random Variables: If X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²) are independent random variables, then Y = X1 + X2 ~ N(μ1 + μ2, σ1² + σ2²).
Averaging Independent Normal Random Variables: If Xi ~ N(μ, σ²), 1 ≤ i ≤ n, are independent random variables, then their average X̄ is distributed X̄ ~ N(μ, σ²/n).
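The three rules can be illustrated with the arithmetic that `statistics.NormalDist` supports: scaling or shifting by a constant, and adding two instances, which the library treats as independent. All parameter values below are assumptions chosen for illustration, and `average_dist` is a hypothetical helper.

```python
import math
from statistics import NormalDist

# Linear function: X ~ N(5, 10^2), Y = 2X + 3 ~ N(13, 20^2).
X = NormalDist(mu=5.0, sigma=10.0)
Y = 2.0 * X + 3.0

# Sum of two independent normals: variances add, so sd = sqrt(3^2 + 4^2) = 5.
X1 = NormalDist(mu=1.0, sigma=3.0)
X2 = NormalDist(mu=2.0, sigma=4.0)
S = X1 + X2


def average_dist(mu, sigma, n):
    """Distribution of the average of n independent N(mu, sigma^2) variables."""
    return NormalDist(mu, sigma / math.sqrt(n))


# Average of 25 independent copies of X: N(5, 10^2 / 25), so sd = 2.
Xbar = average_dist(5.0, 10.0, 25)

print(Y, S, Xbar)
```

Note that standard deviations do not add: the sum has variance 9 + 16 = 25, not standard deviation 3 + 4 = 7.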
Example
The annual return of the stock of company A, XA (in percent), is normally distributed. In addition, suppose that the annual return of the stock of company B, XB, is normally distributed, independently of the stock of company A.
a) What is the probability that company B's stock performs better than company A's stock?
b) What is the probability that company B's stock performs at least 2 percentage points better than company A's stock?
Allow students about 10-15 minutes to solve this.
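Example Solution
a) Let Y = XB − XA. Since XA and XB are independent normal random variables, Y is normally distributed with mean μB − μA and variance σA² + σB², where μA, μB and σA², σB² denote the means and variances of the two returns. "Performs better" means Y ≥ 0, so the required probability is P(Y ≥ 0) = 1 − Φ((0 − (μB − μA)) / √(σA² + σB²)).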
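Example Solution
b) It means Y ≥ 2.0, where Y = XB − XA is normally distributed with mean μB − μA and variance σA² + σB². The required probability is P(Y ≥ 2.0) = 1 − Φ((2.0 − (μB − μA)) / √(σA² + σB²)).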
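The calculation for the difference Y = XB − XA can be sketched as follows. The means and standard deviations used here (8.0 and 1.5 for company A, 9.5 and 4.0 for company B) are illustrative assumptions only and are not taken from the slides; substitute the values given in the example.

```python
from statistics import NormalDist

# Hypothetical parameters (in percent), for illustration only.
XA = NormalDist(mu=8.0, sigma=1.5)
XB = NormalDist(mu=9.5, sigma=4.0)

# Difference of two independent normals: means subtract, variances add.
Y = XB - XA

p_better = 1.0 - Y.cdf(0.0)        # a) P(Y >= 0)
p_two_points = 1.0 - Y.cdf(2.0)    # b) P(Y >= 2.0)

print(Y.mean, Y.variance, round(p_better, 4), round(p_two_points, 4))
```

The key step is that the variances add even though the variables are subtracted, which is why Y is more spread out than either return on its own.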
Normal Approximations to the Binomial Distribution
Not all binomial tables exist. Requires a large sample size. Gives an approximate probability only. Needs a correction for continuity.
[Figure: binomial probabilities f(x) against x for n = 10, p = 0.50.]
The distribution B(n, p) can be approximated by a normal distribution with mean μ = np and variance σ² = np(1 − p).
Normal Approximations to the Binomial Distribution
[Figure: binomial bars f(x) against x with a normal curve overlaid, highlighting the probability added by the normal curve around x = a.]
As the number of vertical bars (n) increases, the errors due to approximating with the normal decrease.
Binomial distribution: the area of all the 'orange' bars. Normal approximation: the area from the 'blue' vertical line to the left. So it needs a correction of one half in order to have the same area as the binomial.
Correction for Continuity
A 1/2 unit adjustment to the discrete variable. Improves accuracy.
Correction for each of four cases:
For P(X ≥ a), use the area above â = (a − 0.5).
For P(X > a), use the area above â = (a + 0.5).
For P(X ≤ a), use the area below â = (a + 0.5).
For P(X < a), use the area below â = (a − 0.5).
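Normal Approximation Procedure
Normal approximations to the binomial distribution work well as long as np and n(1 − p) are both sufficiently large (a common rule of thumb is np ≥ 5 and n(1 − p) ≥ 5).
For each of the four cases above, use Z = (â − μ)/σ, where μ = np and σ = √(np(1 − p)).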
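The four continuity-correction cases can be collected into one small function. This is a sketch rather than a library routine: the function name `binomial_normal_approx` and its `relation` argument are hypothetical, and the example values n = 100, p = 0.5, a = 55 are chosen only for illustration.

```python
import math
from statistics import NormalDist


def binomial_normal_approx(n, p, a, relation):
    """Approximate a B(n, p) probability with a continuity-corrected normal.

    relation is one of ">=", ">", "<=", "<" and describes P(X relation a).
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    Z = NormalDist()
    if relation == ">=":
        return 1.0 - Z.cdf((a - 0.5 - mu) / sigma)   # area above a - 0.5
    if relation == ">":
        return 1.0 - Z.cdf((a + 0.5 - mu) / sigma)   # area above a + 0.5
    if relation == "<=":
        return Z.cdf((a + 0.5 - mu) / sigma)         # area below a + 0.5
    if relation == "<":
        return Z.cdf((a - 0.5 - mu) / sigma)         # area below a - 0.5
    raise ValueError("relation must be one of '>=', '>', '<=', '<'")


# Illustrative call: P(X <= 55) for X ~ B(100, 0.5).
p_le_55 = binomial_normal_approx(100, 0.5, 55, "<=")
print(round(p_le_55, 4))
```

Complementary cases use the same corrected boundary, so the approximations for P(X ≤ a) and P(X > a) still sum to 1.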
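Example
Suppose that a fair coin is tossed n times. The distribution of the number of heads obtained, X, is B(n, 0.5). If n = 100, what is the probability of obtaining between 45 and 55 heads?
The conditions for the normal approximation are satisfied since np = 100 × 0.5 = 50 and n(1 − p) = 100 × 0.5 = 50 are both large.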
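Example Solution
μ = np = 50 and σ = √(np(1 − p)) = √25 = 5. With the correction for continuity, P(45 ≤ X ≤ 55) ≈ Φ((55.5 − 50)/5) − Φ((44.5 − 50)/5) = Φ(1.1) − Φ(−1.1) = 0.8643 − 0.1357 = 0.7286.
Using statistical software or Excel, the exact binomial probability is 0.7287. The difference is only about 0.0001.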
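The comparison between the exact binomial probability and the normal approximation for n = 100 fair coin tosses can be reproduced with the standard library. The sketch assumes "between 45 and 55 heads" includes both end points.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5

# Exact binomial probability P(45 <= X <= 55), summing the individual terms.
exact = sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(45, 56))

# Normal approximation with continuity correction: area from 44.5 to 55.5.
mu = n * p
sigma = math.sqrt(n * p * (1.0 - p))
Z = NormalDist()
approx = Z.cdf((55.5 - mu) / sigma) - Z.cdf((44.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4), round(abs(exact - approx), 4))
```

Dropping the 0.5 adjustments and using 45 and 55 directly gives a noticeably worse approximation, which is the practical case for the continuity correction.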
Central Limit Theorem
If X1, …, Xn is a sequence of independent identically distributed random variables with a mean μ and a variance σ² (not necessarily normally distributed), then the distribution of their average X̄ can be approximated by a N(μ, σ²/n) distribution. Similarly, the distribution of the sum X1 + … + Xn can be approximated by a N(nμ, nσ²) distribution.
The general rule is that the approximation is adequate as long as n ≥ 30.
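A small simulation can make the theorem concrete. The sketch below draws averages of n = 30 exponential random variables (mean 1, variance 1, clearly not normal) and checks that the averages have mean close to μ and variance close to σ²/n. The choice of distribution, the seed and the number of repetitions are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)     # fixed seed so the run is reproducible
n = 30                     # sample size behind each average
reps = 5000                # number of simulated averages

# Exponential with rate 1: mu = 1 and sigma^2 = 1.
averages = []
for _ in range(reps):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    averages.append(sum(sample) / n)

mean_of_averages = statistics.fmean(averages)
var_of_averages = statistics.variance(averages)

print(round(mean_of_averages, 3))   # should be close to mu = 1
print(round(var_of_averages, 4))    # should be close to sigma^2 / n = 1/30
```

A histogram of `averages` would look roughly bell-shaped even though each individual exponential observation is strongly skewed.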
Any Questions ?