Explaining the Normal Distribution: What Is Normality?
Preliminaries: How to describe data
Discussion Question: How do we describe data in Statistics?
In this course, we describe data using the C.U.S.S. method. We look at these characteristics:
C. Center (i.e., mean and median)
U. Unusual features (i.e., are outliers present?)
S. Shape (skewness and/or symmetry)
S. Spread (max to min)
Continued Preliminaries: Boxplots
Recall that a box plot is a standard way of displaying the distribution of data graphically using the five-number summary (minimum, first quartile, median, third quartile, maximum). Let's build a box plot using our calculators!
Ex 1: Make a box plot in your calculator using the following distribution of numbers.
2, 3, 5, 5, 6, 7, 7, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20, 20, 22, 22, 24, 24, 25, 26, 26, 27, 28, 28, 28, 33, 45, 50, 55, 66
After you obtain the graph in your calculator, draw it in your notes, write down the five-number summary, and then use C.U.S.S. to describe the distribution.
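To check the calculator output for Ex 1, here is a minimal Python sketch of the five-number summary. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is the convention TI calculators are generally understood to use. The function names and the 1.5 x IQR outlier fence are illustrative assumptions, not part of the original slides.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data (the middle value is left out when n is odd).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers_by_iqr(values):
    """Flag values more than 1.5 * IQR beyond the quartiles (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# The 50 numbers from Ex 1
ex1 = [2, 3, 5, 5, 6, 7, 7, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11,
       12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20, 20, 22, 22, 24,
       24, 25, 26, 26, 27, 28, 28, 28, 33, 45, 50, 55, 66]

print(five_number_summary(ex1))  # (2, 10, 15.5, 24, 66)
print(outliers_by_iqr(ex1))      # [50, 55, 66]
```

The long right tail and the flagged high values are the kind of thing the "U" and first "S" in C.U.S.S. ask you to notice.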
Steps to draw a box plot in Calculator TI 83-84
1. Press the STAT button and choose Edit.
2. Copy the data into list L1.
3. Press 2nd, then Y=, to get to the STAT PLOT page.
4. Select Plot1 and turn it On.
5. Select Type: Boxplot (bottom left).
6. Set Xlist: L1.
7. Set Freq: 1.
8. Select Mark: the first option.
9. Complete the questions from Ex 1.
Preliminaries: Histograms
Recall that a histogram is a graphical representation of a distribution that groups the data into ranges and shows the frequency of each range. It looks similar to a bar graph. Below is a visual representation of a histogram.
Video: Building a Histogram
Please watch this short video about how to build a histogram, and answer the following questions in your notes.
Video link: Histogram
What is the connection between Histograms and Normality?
For many data sets, as more data is added to a histogram, its shape becomes more and more similar to a bell curve. You can see the resemblance in the examples we covered and in the following picture.
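The idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated heights drawn from a hypothetical N(70, 3) population (in inches), and the helper name `bin_counts`, the bin range 58 to 82, and the seed are all made up for the illustration.

```python
import random

def bin_counts(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width ranges."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        if low <= v < high:
            # min() guards against a rounding edge case at the top of the range
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (20, 200, 2000):
    # hypothetical heights (inches) from an assumed N(70, 3) population
    sample = [rng.gauss(70, 3) for _ in range(n)]
    counts = bin_counts(sample, 58, 82, 12)
    print(f"n = {n}")
    for i, c in enumerate(counts):
        # scale the bars so the tallest bin is about 40 characters wide
        bar = "#" * round(40 * c / max(counts))
        print(f"{58 + 2 * i:>3}-{60 + 2 * i:<3} {bar}")
```

With only 20 values the text histogram tends to look ragged, while the larger samples should look more bell shaped. That only happens here because the simulated population is itself Normal.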
Defining the Normal Distribution
Assuming that a set of data follows a Normal Distribution is one of the most important assumptions in Statistics. When we think about the concept of Normality, consider a list of numbers that has a few low values, a few high values, and many values in the middle. If you have LOTS of these types of numbers, then they MIGHT follow a "Normal Distribution."
Some examples of data that are approximately Normally distributed are: the heights of all males in the United States, scores on the SAT exam, lengths of great white sharks, etc.
Can you think of some examples based on the description given above? Write them in your notes.
CAUTION! The Normality Trap
- Not all data follows a Normal Distribution.
- Data with outliers or skewness may not be Normally distributed.
- When the population is Normal, large samples tend to look closer to a Normal distribution than small samples.
- Real-life data is almost NEVER EXACTLY NORMAL.
Characteristics of a Normal Distribution
- The mean, median, and mode all have the same value.
- The curve is bell shaped and symmetric about the vertical line through the mean.
- The curve approaches, but never touches, the x-axis as you move away from the mean.
- The area under the curve is equal to 1.
- Almost all of the area under the curve lies within three standard deviations of the mean.
When data follows a Normal distribution, it is denoted N(𝜇, 𝜎), where 𝜇 is the mean and 𝜎 is the standard deviation of the distribution.
The Normal Distribution
The middle represents the mean/median/50th percentile.
How to draw a Normal Curve
We first draw our axis, then mark the mean and the points one, two, and three standard deviations above and below the mean, as illustrated in the diagram. Then draw the bell curve.
Ex: In your notes, draw the curve for data that is N(10, 5).
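The axis labels for the sketch can be worked out with a few lines of Python. This is a small illustration only; the function name `normal_curve_ticks` is made up for this example.

```python
def normal_curve_ticks(mean, sd):
    """Axis labels from 3 standard deviations below to 3 above the mean."""
    return [mean + k * sd for k in range(-3, 4)]

# Tick marks for the N(10, 5) example
print(normal_curve_ticks(10, 5))  # [-5, 0, 5, 10, 15, 20, 25]
```

The middle value is the mean, and each step left or right is one standard deviation.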
The Empirical Rule
Recall that one characteristic of the Normal distribution is that almost all of the area under the curve lies within three standard deviations of the mean. How do we know what "almost" means? For this we have the Empirical Rule, a.k.a. the 68-95-99.7 Rule.
The Empirical Rule states that, for Normally distributed data:
- 68% of the data is within 1 StDev of the mean
- 95% of the data is within 2 StDev of the mean
- 99.7% of the data is within 3 StDev of the mean
Discussion Question: Where do you think this rule comes from?
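One way to see where the three percentages come from is to compute the area under the Standard Normal curve directly. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is made up for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the Standard Normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} StDev: {area_within(k):.4f}")
# within 1 StDev: 0.6827
# within 2 StDev: 0.9545
# within 3 StDev: 0.9973
```

Rounded, these are the 68%, 95%, and 99.7% in the rule, so the rule is a convenient approximation of the exact areas.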
Estimating Areas Under the Normal Curve Using the Empirical Rule
Complete this example in your notes with a partner. Suppose the scores from your AP Stats exam are N(75, 5). Answer the following questions.
- Sketch the distribution.
- What percentage of the scores is between 60 and 90?
- What score is the 50th percentile?
- What score would be the 16th percentile?
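After working the example by hand, one way to check the Empirical Rule estimates is with exact Normal calculations. This is a sketch using `NormalDist` from Python's standard `statistics` module; the variable names are made up for this illustration, and the exact values will differ slightly from the rounded Empirical Rule answers.

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=5)  # the N(75, 5) exam scores

# 60 and 90 are each 3 standard deviations from the mean
share_60_to_90 = scores.cdf(90) - scores.cdf(60)

# Percentiles are found with the inverse of the cumulative distribution
score_50th = scores.inv_cdf(0.50)
score_16th = scores.inv_cdf(0.16)

print(f"between 60 and 90: {share_60_to_90:.4f}")
print(f"50th percentile:   {score_50th:.2f}")
print(f"16th percentile:   {score_16th:.2f}")
```

The Empirical Rule reasoning for the last question: 68% of scores lie within 1 StDev of the mean, so 32% lie outside, and by symmetry about 16% lie below 75 - 5 = 70.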
Z-Scores
A z-score is how many standard deviations a data point is away from the mean. Z-scores let us find areas under any Normal curve by standardizing it to the Standard Normal Curve, which is a Normal curve with a mean of 0 and a standard deviation of 1.
The formula for the z-score is as follows: $z = \frac{x - \mu}{\sigma}$, where x is the data point, 𝜇 is the mean, and 𝜎 is the standard deviation.
Continued Z-scores
Consider test scores like those in our previous example, this time assuming they are N(70, 5). Suppose I asked you to find the z-score for a score of 82. Using the formula we get $z = \frac{82 - 70}{5} = 2.4$.
To interpret this result we would say: a score of 82 on the test is 2.4 standard deviations above the mean.
If the z-score is negative, then the data point is |z| standard deviations below the mean.
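The calculation above can be written as a short Python function. This is a minimal sketch; the function name `z_score` and the second example score of 61 are made up for this illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# A score of 82 when the scores are N(70, 5)
print(z_score(82, 70, 5))  # 2.4

# A hypothetical score below the mean gives a negative z-score
print(z_score(61, 70, 5))  # -1.8
```

A positive result means the score is above the mean, and a negative result means it is below.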
Using Z-scores to find the area under the Normal Curve
Suppose we said that the shoe size for males in the U.S. is N(9.5, 1.5). How would you calculate the percentage of shoe sizes that are below 10?
We could use the Empirical Rule to estimate this, but a quicker way is to use z-scores. Let's work through this example with our calculators.
Using the z-score formula we get: $z = \frac{10 - 9.5}{1.5} \approx 0.3333$
Now on the calculator press 2nd, then VARS, choose normalcdf, and enter normalcdf(-1000, .3333) to get the answer. The -1000 stands in for "as far left as the curve goes."
If the question asked for the percentage above 10, you would enter normalcdf(.3333, 1000).
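For anyone without a calculator at hand, the same areas can be sketched in Python. The `normalcdf` function below is a stand-in written for this illustration (it is not the calculator's own code), built on `NormalDist` from the standard `statistics` module. It uses the standard deviation of 1.5 that appears in the z-score calculation.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper.

    Illustrative stand-in for the calculator's normalcdf(lower, upper).
    """
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

z = (10 - 9.5) / 1.5          # z-score for a shoe size of 10

below = normalcdf(-1000, z)   # area to the left of z
above = normalcdf(z, 1000)    # area to the right of z

print(f"below 10: {below:.4f}")
print(f"above 10: {above:.4f}")
```

The area below comes out to about 0.63, so roughly 63% of shoe sizes are below 10, and the two areas add up to 1. Note that the lower bound always comes first, whichever side of the curve you want.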
Now You Try!
Using the previous scenario, answer the following questions in your notes with a partner.
- Find the percentage of shoe sizes above 8.5.
- Find the percentage of shoe sizes between 8.5 and 9.5.
- Find the percentage of shoe sizes below 9.25.
- Draw and shade the Normal curves representing these scenarios.
Checking for Normality using the Empirical Rule
Discussion Question: How could we use the Empirical Rule to check if the data follows a Normal Distribution? Jot down some ideas in your notes.
Continued: Checking for Normality
The steps for checking data for Normality are as follows:
1. Check the Empirical Rule: do about 68%, 95%, and 99.7% of the data lie within 1, 2, and 3 standard deviations of the mean?
2. Check the 1-variable stats: the closer the median is to the mean, the more this suggests that the data might be Normally distributed.
3. Draw a histogram or boxplot in your calculator and check C.U.S.S. The more symmetrical the graph, the more evidence there is that the data is Normally distributed.
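The first two steps can be sketched in Python. This is an illustration under stated assumptions: the function name `empirical_rule_check` is made up, the standard deviation used is the sample standard deviation (the Sx value in 1-variable stats), and the data are the 50 numbers from Ex 1 on the boxplot slide.

```python
from statistics import mean, median, stdev

def empirical_rule_check(values):
    """Return the mean, median, sample SD, and the share of values
    within 1, 2, and 3 standard deviations of the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption: Sx, not sigma-x)
    shares = {}
    for k in (1, 2, 3):
        inside = sum(1 for x in values if abs(x - m) <= k * s)
        shares[k] = inside / len(values)
    return m, median(values), s, shares

# The 50 numbers from Ex 1
ex1 = [2, 3, 5, 5, 6, 7, 7, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11,
       12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20, 20, 22, 22, 24,
       24, 25, 26, 26, 27, 28, 28, 28, 33, 45, 50, 55, 66]

m, med, s, shares = empirical_rule_check(ex1)
print(f"mean = {m:.2f}, median = {med}, StDev = {s:.2f}")
for k, share in shares.items():
    print(f"within {k} StDev: {share:.0%}")
```

Compare the printed shares with 68%, 95%, and 99.7%, and compare the mean with the median. Large gaps in either comparison count as evidence against Normality, and a graph is still needed for the third step.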
Final Activity: Great White Sharks
Great White Sharks
Below are the lengths, in feet, of a sample of 44 great white sharks. In your notes, use the steps previously stated to check whether the lengths in this sample of great white sharks are Normally distributed. Write a paragraph describing your findings, and discuss your findings with your partner.
18.7 12.3 18.6 16.4 15.7 18.3 14.6 15.8 14.9 17.6 12.1 16.7 17.8 16.2 12.6 13.8 12.2 15.2 14.7 12.4 13.2 14.3 16.6 9.4 18.2 13.6 15.3 16.1 13.5 19.1 22.8 16.8 19.7
Lesson Summary
- For describing data we use the C.U.S.S. acronym (refer to slide 2).
- For many data sets, the more data you add to a histogram, the more symmetrical it becomes and the more its shape resembles a bell curve.
- The concept of Normality refers to a distribution of data that has many values near the middle and fewer and fewer low and high values, in roughly equal amounts on each side of the mean.
- The Empirical Rule allows us to estimate the area under the Normal curve.
- Z-scores represent how many standard deviations a data point is away from the mean.
- The Empirical Rule and histograms are used to check whether a sample of data might follow a Normal Distribution.
Great White Shark example and distribution clipart provided by Mr. Pines's website, Konastats.com, Chapter 2 PowerPoint.