Normal Distribution Prepared by: Ameer Sameer Hamood University of Babylon Information technology - information networks
Normal Distributions The most important probability distribution in statistics is the normal distribution. A normal distribution is a continuous probability distribution for a random variable, x. The graph of a normal distribution is called the normal curve.
Normal Distributions Many things closely follow a normal distribution: * heights of people * sizes of things produced by machines * errors in measurements * blood pressure * marks on a test. It also appears in: * social networks * cryptography * telecommunications * the Internet of Things
Properties of Normal Distributions The mean, median, and mode are equal. The normal curve is bell-shaped and symmetric about the mean. The total area under the curve is equal to one. The normal curve approaches, but never touches the X axis as it extends farther and farther away from the mean.
Properties of Normal Distributions * symmetry about the center * 50% of values less than the mean and 50% greater than the mean
The normal (Gaussian) distribution "µ": the Greek letter "mu", which is the mean. "σ": the Greek letter "sigma", which is the standard deviation. Note: in a normal distribution, only 2 parameters are needed, namely μ and σ².
PARAMETER The normal distribution can be completely specified by two parameters: 1. Mean 2. Standard deviation. If the mean and standard deviation are known, the whole curve is determined, so the probability of any range of values can be computed without access to the individual data points.
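For reference, the curve that these two parameters determine can be written out. The slide text does not show the formula, so this is the standard textbook density of the normal distribution, using the same symbols μ and σ:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad -\infty < x < \infty
```

Changing μ shifts the curve left or right; changing σ stretches or narrows it.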
PARAMETER A normal distribution can have any mean and any positive standard deviation. The mean gives the location of the line of symmetry. The standard deviation describes the spread of the data. [Figure: two normal curves with their inflection points marked, one with mean μ = 3.5 and standard deviation σ ≈ 1.3, the other with mean μ = 6 and standard deviation σ ≈ 1.9.]
Example: Which curve has the greater mean? Which curve has the greater standard deviation? [Figure: two normal curves, A and B, on an x-axis marked from 1 to 13.] Answer: 1. The line of symmetry of curve A occurs at x = 5. The line of symmetry of curve B occurs at x = 9. Curve B has the greater mean. 2. Curve B is more spread out than curve A, so curve B has the greater standard deviation.
PARAMETER [Figures: curves with different means and the same standard deviation; curves with different means and different standard deviations.]
The Standard Normal Distribution Working with normal curves is much easier if we standardize them to a mean of zero and a standard deviation of 1 unit. The standard normal curve is the normal curve with μ = 0 and σ = 1.
The Standard Normal Distribution The normal random variable of a standard normal distribution is called a standard score or a z-score. Every normal random variable X can be transformed into a z-score via the following equation: z = (X - μ) / σ, where z is the z-score (standard score), X is a normal random variable (the value to be standardized), μ is the mean of X, and σ is the standard deviation of X.
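The transformation can be sketched in a few lines of Python. The function name `z_score` is only an illustrative choice; the sample values μ = 2 and σ = 1/3 are the ones used in the example that follows.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# With mu = 2 and sigma = 1/3, the mean itself maps to z = 0
# and a value one unit above the mean maps to about z = 3.
print(z_score(2.0, 2.0, 1 / 3))
print(z_score(3.0, 2.0, 1 / 3))
```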
Example Say μ=2 and σ =1/3 in a normal distribution. The graph of the normal distribution is as follows:
The following graph represents the same information, but it has been standardized so that μ = 0 and σ = 1
Standard normal vs Normal Distribution
Normal Distributions empirical rule Because of its unique bell shape, probabilities for the normal distribution follow the empirical rule, or 68-95-99.7 rule: about 68% of the values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. Given a normal distribution, nearly all outcomes will be within 3 standard deviations of the mean. This figure illustrates all three components of the Empirical Rule.

About 68% of the values lie within 1 standard deviation of the mean because, when the data are bell-shaped, the majority of the values are mounded up in the middle, close to the mean (as the figure shows). Adding another standard deviation on either side of the mean increases the percentage from 68 to 95, which is a big jump and gives a good idea of where “most” of the data are located. Most researchers stay with the 95% range (rather than 99.7%) for reporting their results, because increasing the range to 3 standard deviations on either side of the mean (rather than just 2) picks up only another 4.7% of the values.

The Empirical Rule tells you approximately what percentage of values are within a certain range of the mean. These results are approximations only, and they apply only if the data follow a normal distribution. However, the Empirical Rule is an important result in statistics because the concept of “going out about two standard deviations to get about 95% of the values” is one that you see mentioned often with confidence intervals and hypothesis tests.
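The three percentages can be checked numerically. The sketch below builds the standard normal cumulative distribution from `math.erf`; the helper names `phi` and `within` are illustrative and not part of the original slides.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return phi(k) - phi(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")
```

The printed values round to 68%, 95%, and 99.7%, which is where the rule gets its name.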
Mean and Standard Deviation Example At the New Age Information Corporation, the ages of all new employees hired during the last 5 years are normally distributed. Within this curve, 95.4% of the ages, centered about the mean, are between 24.6 and 37.4 years. Find the mean age and the standard deviation of the data.
Mean and Standard Deviation Example Solution: By the empirical rule, 95.4% implies a span of 2 standard deviations on either side of the mean. The mean age is symmetrically located between -2 standard deviations (24.6) and +2 standard deviations (37.4), so the mean is (24.6 + 37.4)/2 = 31 years. From 31 to 37.4 (a distance of 6.4 years) is 2 standard deviations. Therefore, 1 standard deviation is (6.4)/2 = 3.2 years.
Mean and Standard Deviation Example Questions 95% of students at school weigh between 62 kg and 90 kg. Assuming this data is normally distributed, what are the mean and standard deviation? Answer: The mean is halfway between 62 kg and 90 kg: Mean = (62 kg + 90 kg)/2 = 76 kg. 95% is 2 standard deviations either side of the mean (a total of 4 standard deviations), so: 1 standard deviation = (90 kg - 62 kg)/4 = 28 kg/4 = 7 kg
Questions A machine produces electrical components. 99.7% of the components have lengths between 1.176 cm and 1.224 cm. Assuming this data is normally distributed, what are the mean and standard deviation? Answer : The mean is halfway between 1.176 cm and 1.224 cm: Mean = (1.176 cm + 1.224 cm)/2 = 1.200 cm 99.7% is 3 standard deviations either side of the mean (a total of 6 standard deviations) so: 1 standard deviation = (1.224 cm - 1.176 cm)/6 = 0.048 cm/6 = 0.008 cm
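All three worked examples use the same two steps: take the midpoint of the interval as the mean, then divide the width by the number of standard deviations it spans. A minimal sketch, where `mean_and_sd` is a hypothetical helper name and `k` is the number of standard deviations on each side (2 for about 95%, 3 for about 99.7%):

```python
def mean_and_sd(low: float, high: float, k: int) -> tuple[float, float]:
    """Recover (mean, sd) from an interval reaching k sds either side of the mean."""
    if k <= 0:
        raise ValueError("k must be a positive number of standard deviations")
    mean = (low + high) / 2          # midpoint of the interval
    sd = (high - low) / (2 * k)      # full width covers 2*k standard deviations
    return mean, sd


print(mean_and_sd(24.6, 37.4, 2))    # employee ages
print(mean_and_sd(62, 90, 2))        # student weights in kg
print(mean_and_sd(1.176, 1.224, 3))  # component lengths in cm
```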
Examples We can take any normal distribution and convert it to the standard normal distribution.

Example: Travel Time
A survey of daily travel time had these results (in minutes): 26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
The mean is 38.8 minutes, and the standard deviation is 11.4 minutes. Convert the values to z-scores ("standard scores").
To convert 26: first subtract the mean: 26 - 38.8 = -12.8, then divide by the standard deviation: -12.8/11.4 = -1.12. So 26 is -1.12 standard deviations from the mean.
Here are the first three conversions:

Original Value    Calculation           Standard Score (z-score)
26                (26 - 38.8) / 11.4    -1.12
33                (33 - 38.8) / 11.4    -0.51
65                (65 - 38.8) / 11.4    +2.30
...               ...                   ...
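The whole data set can be converted the same way. This sketch uses Python's standard `statistics` module and assumes the quoted 11.4 is the population standard deviation (`pstdev`), which is what reproduces that figure for these 20 values.

```python
import statistics

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mean = statistics.fmean(times)   # arithmetic mean of the survey
sd = statistics.pstdev(times)    # population standard deviation (assumption)

# Standardize every value: subtract the mean, divide by the sd.
z_scores = [(t - mean) / sd for t in times]

print(f"mean = {mean:.1f}, sd = {sd:.1f}")
for t, z in list(zip(times, z_scores))[:3]:
    print(f"{t}: {z:+.2f}")
```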
Examples Answer
Examples
Normal Probabilities We are often interested in the probability that z takes on values between z0 and z1
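In terms of the cumulative area Φ, this probability is P(z0 < z < z1) = Φ(z1) - Φ(z0). A minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `prob_between` and the sample limits are illustrative assumptions.

```python
from statistics import NormalDist


def prob_between(z0: float, z1: float) -> float:
    """Probability that a standard normal z falls between z0 and z1."""
    std = NormalDist(mu=0.0, sigma=1.0)
    return std.cdf(z1) - std.cdf(z0)


# Illustrative limits: one standard deviation either side of the mean.
print(round(prob_between(-1.0, 1.0), 4))
```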
What is the probability of an infant weighing more than 5000g?
Finding z-Scores Example: Find the z-score that corresponds to a cumulative area of 0.9973. Appendix B: Standard Normal Table Find the z-score by locating 0.9973 in the body of the Standard Normal Table. The values at the beginning of the corresponding row and at the top of the column give the z-score. The z-score is 2.78.
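The table lookup can be cross-checked in software. This sketch uses the inverse cumulative distribution `inv_cdf` of `statistics.NormalDist`; it returns a more precise value than the table, so it is rounded to two decimals for comparison.

```python
from statistics import NormalDist

# Find the z whose cumulative area to the left is 0.9973.
z = NormalDist().inv_cdf(0.9973)
print(round(z, 2))
```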
Examples Edge Perspectives in Social Network
Examples Discovery Method in the Internet of Things
Examples Anomaly detection and a simple algorithm with probabilistic approach.
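The slide does not spell out the algorithm, so the following is only one simple probabilistic approach, not necessarily the one intended: assume the data are roughly normal and flag any value more than 3 standard deviations from the mean, which the empirical rule says should happen for only about 0.3% of values. The helper name `flag_anomalies`, the threshold, and the readings are all made up for illustration.

```python
import statistics


def flag_anomalies(values, threshold=3.0):
    """Return the values whose |z-score| exceeds threshold (hypothetical helper)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs((v - mean) / sd) > threshold]


# Made-up sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1,
            9.9, 10.0, 10.1, 9.8, 10.2, 25.0]
print(flag_anomalies(readings))
```

Note that an extreme value inflates the mean and standard deviation it is compared against, so on small samples this check can miss outliers.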