1
The Normal Distribution
Objectives: For variables with approximately normal distributions:
Students should know the approximate percent of observations in a set of data that fall within ± 1 sd, ± 2 sd, and ± 3 sd of the mean.
Students should be able to determine the range of values that contains approximately 68%, 95%, and 99.7% of the observations in a set of data.
2
The Normal Distribution
(also called the bell-shaped or Gaussian distribution) The normal distribution is the most common distribution we work with in statistics. It is also referred to as a bell-shaped distribution, bell curve, or Gaussian distribution, after Johann Carl Friedrich Gauss.
3
The Normal Distribution
The normal distribution is completely defined by the mean and standard deviation of a set of quantitative (numerical) data. The mean determines the location of the curve on the x-axis of a graph. The standard deviation determines the spread of the curve, and with it the height of the peak on the y-axis: a larger standard deviation gives a wider, lower curve. If you change the mean, the distribution shifts along the x-axis; if you change the standard deviation, the spread of the distribution changes. It is important to remember that there are an infinite number of normal distributions, one for every possible combination of a mean and standard deviation.
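The effect of the two parameters can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the means and standard deviations are made-up values chosen only for illustration.

```python
from statistics import NormalDist

# Illustrative (hypothetical) parameters, not taken from the slides.
narrow = NormalDist(mu=74.0, sigma=5.0)
shifted = NormalDist(mu=80.0, sigma=5.0)   # same sd, different mean
wide = NormalDist(mu=74.0, sigma=10.0)     # same mean, larger sd

# Changing only the mean moves the curve; the peak height is unchanged.
print("peak, sd=5, mean=74:", narrow.pdf(74.0))
print("peak, sd=5, mean=80:", shifted.pdf(80.0))

# Doubling the sd widens the curve and halves the peak height.
print("peak, sd=10, mean=74:", wide.pdf(74.0))
```

The peak of a normal curve sits at the mean, so evaluating the density there shows how the standard deviation alone controls the height.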
4
Examples of Normal Distributions
These are examples of three normal distributions. Each distribution has a different mean, but the same standard deviation. Therefore, the graphs are shifted to different places on the x-axis, due to the different means, but the shapes are identical because the standard deviation is the same. Pr(X) on the y-axis refers to either frequency or probability.
5
Examples of Normal Distributions
These three normal distributions all have the same mean, but different standard deviations. Changes in the standard deviation result in changes in the shape of the distribution, without affecting the midpoint. A smaller standard deviation results in a narrower, more peaked curve. A larger standard deviation results in a wider, flatter curve.
6
Normal Distributions
In biostatistics, we often examine continuous data, which are typically, but not always, approximately normally distributed. Generally, when the underlying variable is normally distributed, the shape of a frequency distribution looks more and more like the normal curve as sample size increases.
7
Normal Distributions
When data are normally distributed, the mode, median, and mean are identical and are located at the center of the distribution, as seen in this graph (y-axis: frequency of occurrence).
8
Skewness
Quantitative variables may also have a skewed distribution. When distributions are skewed, they have more extreme values in one direction than the other, resulting in a long tail on one side of the distribution. The direction of the tail determines whether a distribution is positively or negatively skewed. A positively skewed distribution has a long tail on the right, or positive side of the curve. A negatively skewed distribution has the tail on the left, or negative side of the curve. Interval/ratio level data may have a skewed distribution if there are extreme values in one direction or the other.
9
Skewed Distributions
[Figure: three panels labeled Normal distribution, Positively skewed distribution, and Negatively skewed distribution]
The top middle graph on this page demonstrates how a normal distribution appears visually, with the same mode, median, and mean in the middle of the curve. The graph at the bottom left demonstrates a positively skewed distribution. Notice the tail on the right side of the distribution and a mean value larger than the mode. The graph on the right demonstrates a negatively skewed distribution with a left tail and a mode value that is larger than the mean.
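The relationship between the tail and the mean can be seen in a small example. This is only a sketch: both data sets below are made-up values, and the second is simply a mirror image of the first.

```python
from statistics import mean, median, mode

# Hypothetical values with a long right tail (positive skew).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]
# Mirror image of the same values: long left tail (negative skew).
left_skewed = [21 - v for v in right_skewed]

def describe(values):
    """Return (mode, median, mean) for a list of numbers."""
    return mode(values), median(values), mean(values)

for name, values in (("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)):
    m_mode, m_median, m_mean = describe(values)
    print(f"{name}: mode={m_mode}, median={m_median}, mean={m_mean:.2f}")
```

In the right-skewed sample the single extreme value pulls the mean above the mode; in the mirrored sample it pulls the mean below the mode.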
10
Range of Observations
For a normally distributed variable:
~68.3% of the observations lie within 1 standard deviation of the mean
~95.4% lie within 2 standard deviations of the mean
~99.7% lie within 3 standard deviations of the mean
When data are normally distributed with the mode, median, and mean in the center of the curve, 68.3% of the observations lie within 1 standard deviation on either side of the mean, written as mean ± 1 standard deviation or, as shown below the graph, μ ± 1σ. Likewise, 95.4% of the observations fall within ± 2 standard deviations of the mean and 99.7% fall within ± 3 standard deviations.
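These percentages follow from the standard normal curve and can be reproduced with a short sketch. The helper name `share_within` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k sd of the mean."""
    z = NormalDist()  # standard normal: mean 0, sd 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within ± {k} sd: {share_within(k):.1%}")
```

Because every normal distribution has the same shape once measured in standard deviations, the result does not depend on the particular mean or standard deviation.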
11
Heart Rate Example
For the heart rate data for 84 adults:
Mean HR = 74.0 bpm
SD = 7.5 bpm
Mean ± 1 SD = 74.0 ± 7.5 = 66.5 to 81.5 bpm
Mean ± 2 SD = 74.0 ± 15.0 = 59.0 to 89.0 bpm
Mean ± 3 SD = 74.0 ± 22.5 = 51.5 to 96.5 bpm
Using the same heart rate data as we used previously, the mean heart rate for the set of 84 observations is 74.0 beats per minute with a standard deviation of 7.5 beats per minute. So, if the data are approximately normally distributed, about 68% of the data lie within ± 1 standard deviation of the mean, or between 66.5 and 81.5 beats per minute. About 95% of the data lie within ± 2 standard deviations, or between 59 and 89 beats per minute. About 99.7% of the observations lie within ± 3 standard deviations, or between 51.5 and 96.5 beats per minute.
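The arithmetic behind these ranges is sketched below using the mean and standard deviation quoted above; the function name `sd_range` is an illustrative choice.

```python
def sd_range(mean: float, sd: float, k: float) -> tuple[float, float]:
    """Return (lower, upper) bounds of mean ± k standard deviations."""
    return mean - k * sd, mean + k * sd

mean_hr = 74.0  # bpm, from the heart rate example
sd_hr = 7.5     # bpm, from the heart rate example

for k in (1, 2, 3):
    low, high = sd_range(mean_hr, sd_hr, k)
    print(f"mean ± {k} SD: {low} to {high} bpm")
```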
12
Heart Rate Example
HR Data:
57/84 (67.9%) subjects are between mean ± 1 SD
82/84 (97.6%) are between mean ± 2 SD
84/84 (100%) are between mean ± 3 SD
[Figure: number of subjects at each heart rate, with markers at the mean and at ± 1 SD, ± 2 SD, and ± 3 SD]
Here we demonstrate a different type of plot for the same data. In this case, the number of subjects who had each heart rate is plotted, and markings are added for ± 1, 2, and 3 standard deviations. Looking at the data in this way, we can see that 57 of the 84 subjects, or 67.9%, fall within ± 1 standard deviation of the mean; 82 of 84, or 97.6%, fall within ± 2 standard deviations; and all 84, or 100%, fall within ± 3 standard deviations.
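Counting observations inside each band, as done above, might look like the following sketch. The 84 heart rate values are not reproduced in this transcript, so the ten values below are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption.

```python
from statistics import mean, stdev

def share_within_k_sd(values, k):
    """Fraction of observations within k sample SDs of the sample mean."""
    m, s = mean(values), stdev(values)
    inside = [v for v in values if m - k * s <= v <= m + k * s]
    return len(inside) / len(values)

# Hypothetical heart rates in bpm, NOT the 84 observations from the slides.
sample_hr = [60, 68, 70, 72, 74, 74, 76, 78, 80, 88]

for k in (1, 2, 3):
    print(f"within ± {k} SD: {share_within_k_sd(sample_hr, k):.0%}")
```

With a small sample the observed shares only roughly track 68%, 95%, and 99.7%, just as the 84 real observations give 67.9%, 97.6%, and 100%.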
13
Reference (“Normal”) Ranges in Medicine
The “normal” range in medical measurements is the central 95% of the values for a reference population, and is usually determined from large samples representative of the population. The central 95% is approximately the mean ± 2 sd*. The slide gives established reference ranges for three serum measurements as examples: fasting glucose (mg/dL), sodium (mEq/L), and triglycerides (mg/dL). A reference range is determined by selecting a large sample that is representative of the population, taking their measurements, and then determining the range of values that fall within ± 2 standard deviations of the mean.
*Note: The value is actually 1.96 sd, but for convenience this is usually rounded to 2 sd.
14
The Standard Normal Distribution
The standard normal distribution is a particular normal distribution in which the mean is 0 and the standard deviation is 1. It is also called the z distribution. You can convert any normal distribution to the standard normal distribution by using the z transformation, which converts each value in a distribution to the number of standard deviations that value is from the mean. This transformed value is called a z-score.
15
Formula for the z transformation
z = (X − μ) / σ
The formula to transform data to z-scores is X (the value) minus μ (the population mean), divided by σ (the population standard deviation). Once the data are transformed to z-scores, the standard normal distribution can be used to determine areas under the curve for any normal distribution.
16
Example of a z-transformation
If the population mean heart rate is 74 bpm and the standard deviation is 7.5 bpm, the z-score for an individual with HR = 80 bpm is:
z = (80 − 74) / 7.5 = 0.8
Let’s go through an example of a z-transformation. The z-score is the value minus the mean, divided by the standard deviation: 80 minus 74, divided by 7.5. The result is a z-score of 0.8, which means that the individual’s heart rate of 80 beats per minute is 0.8 standard deviations above the mean for the population.
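The same calculation can be written as a small function. This is a sketch; the name `z_score` and the guard against a non-positive standard deviation are illustrative choices.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Values from the heart rate example: mean 74 bpm, sd 7.5 bpm, HR 80 bpm.
z = z_score(80.0, 74.0, 7.5)
print(f"z = {z:.1f}")
```

A positive z-score means the value is above the mean; a negative one means it is below.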
17
Rule of Thumb #1
The z-value can be looked up in a table for the standard normal distribution to determine the lower and upper areas defined by a z-score. Such a table can be found in most statistics textbooks or online. In our example of a z-score of 0.8, the areas are the lower 78.8% and the upper 21.2% of observations.
You will not need to calculate z-scores or find corresponding areas under the curve for z-scores in this class, but because z-scores are often used as rules of thumb, you will be expected to know the following:
The important z-scores to know are ±1.645, ±1.960*, and ±2.575
*Note: when calculating by hand, it is OK to round 1.960 to 2.
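In place of a printed z-table, the lower and upper areas for a z-score can be computed directly. The sketch below uses the standard library's `NormalDist.cdf`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, sd 1

z = 0.8  # z-score from the heart rate example
lower_area = standard_normal.cdf(z)  # share of observations below z
upper_area = 1.0 - lower_area        # share of observations above z

print(f"lower area: {lower_area:.1%}")
print(f"upper area: {upper_area:.1%}")
```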
18
Rule of Thumb #2
The total area under the normal distribution curve is 1:
90% of the area is between ± 1.645 sd
95% of the area is between ± 1.960 sd
99% of the area is between ± 2.575 sd
Another rule of thumb: when looking at the area under a normal distribution curve, remember that 90% of the area is between + and − 1.645 standard deviations from the mean, 95% of the area is between + and − 1.96 standard deviations from the mean, and 99% of the area is between + and − 2.575 standard deviations from the mean.
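The z-values that bound a given central area can be recovered from the inverse of the standard normal cumulative distribution. This is a sketch using `NormalDist.inv_cdf`; the helper name `central_z` is illustrative. Computed to more digits, the 99% value is about 2.576, which textbooks often print as 2.575 or 2.58.

```python
from statistics import NormalDist

def central_z(area: float) -> float:
    """z such that the given central area lies between -z and +z."""
    if not 0.0 < area < 1.0:
        raise ValueError("area must be strictly between 0 and 1")
    # Half of the remaining area sits in each tail.
    return NormalDist().inv_cdf(0.5 + area / 2.0)

for area in (0.90, 0.95, 0.99):
    print(f"{area:.0%} of the area is between ± {central_z(area):.3f} sd")
```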
19
The Normal Distribution & Confidence Intervals
90% of the area is between ± 1.645 sd
95% of the area is between ± 1.960 sd
99% of the area is between ± 2.575 sd
90%, 95%, and 99% are the most commonly used areas for defining confidence intervals, which are used in inferential statistics to estimate population values from sample data. We will examine inferential statistics in the modules for confidence intervals, hypothesis testing, and choosing a statistical test.