Density Curves, Normal Distribution, Area Under the Curve



Learning Objectives
By the end of this lecture, you should be able to:
- Describe what is meant by a density curve
- Identify normal (bell-shaped), skewed, bimodal, and uniform distributions from a density curve
- Describe the most common type of distribution encountered in nature
- Estimate areas under a Normal density curve

Density Curve
Density curve (you should know this term): When applied to a histogram, a density curve is a smooth curve drawn as an approximation to the histogram. A density curve is a mathematical model of a distribution; that is, it is a summary of the histogram. It is not a perfect summary of the original histogram from which it originated. Much like "drawing the best line" in the linear regression you have likely seen in high school, we attempt to draw the best-fitting smooth curve over the histogram. We will typically have our statistical software do this for us. A density curve always lies on or above the horizontal axis, and the total area under it is exactly 1 (that is, 100% of the observations). Here we have a bell-shaped distribution. It gets this name because when you draw a density curve over the histogram, the curve is shaped like a bell, which is why this density curve is often called the "bell curve". The proper name for a distribution that has a bell shape is a normal distribution (you should know this term too).
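A minimal Python sketch of the histogram-to-density-curve idea, using only the standard library. It assumes a hypothetical simulated sample of 500 scores (not data from this lecture). Each histogram bar is rescaled to (fraction of observations in the bin) / (bin width), so the bar areas add up to 1, just like the area under a density curve. A Normal curve built from the sample's mean and standard deviation (via `statistics.NormalDist.from_samples`) is then laid over the bars as the smooth model. The bin width of 1 and the seed are illustrative choices.

```python
import random
from statistics import NormalDist

# Hypothetical sample: 500 simulated test scores (assumed data, not from the lecture).
random.seed(1)
scores = [random.gauss(7.0, 1.9) for _ in range(500)]

# Density-scaled histogram: bar height = (fraction in bin) / (bin width),
# so the total area of the bars is 1, like the area under a density curve.
width = 1.0
lo = min(scores)
n_bins = int((max(scores) - lo) // width) + 1
counts = [0] * n_bins
for s in scores:
    counts[int((s - lo) // width)] += 1
bar_heights = [c / (len(scores) * width) for c in counts]

# Smooth model: a Normal curve with the sample's mean and standard deviation.
model = NormalDist.from_samples(scores)

for i, h in enumerate(bar_heights):
    mid = lo + (i + 0.5) * width
    print(f"bin centered at {mid:5.2f}: histogram {h:.3f}  curve {model.pdf(mid):.3f}")

print("total bar area:", round(sum(h * width for h in bar_heights), 6))
```

Comparing the two columns shows the point made above: the curve sits above some bars and below others, but it summarizes the overall shape.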

Density curves can take any shape; it all depends on the distribution. However, some shapes turn up much more frequently than others. Here are some of the common distributions we've discussed, with a density curve drawn over each histogram: left skewed, right skewed, bimodal, and uniform.

You should be able to identify a distribution by looking only at its density curve (as opposed to being given the entire histogram). Identify the shape of each of the following density curves. Answers: right skewed, bimodal, uniform.

Normal Distribution
If you took a large sample of observations of any of the following and graphed them:
- Heights
- Corn yield per year in Indiana
- SAT (or ACT) scores
- Blood pressure
- Age of graduate students at DePaul
- Weight of M&Ms per large package
- Etc.
you would see that each has an approximately normal distribution. In real-world data, the Normal distribution is the most commonly encountered distribution. Because it is so common, we're going to spend quite a lot of time studying it and learning how to find all kinds of statistics from it.

Example of a dataset that shows a Normal distribution
One study looked at the gestation (pregnancy) times of a group of women who were given prenatal vitamins. After creating the histogram, we drew the following density curve. You can see that this dataset has a normal distribution. It tells us that the most common gestation periods fell in a range of roughly 240-260 days. As you might expect, the further you move from the center (toward longer or shorter gestation times), the fewer women had those gestation times. For example, while some women had gestation periods shorter than 210 days (or longer than, say, 290 days), they are relatively rare. This is the essence of a normal distribution: the majority of observations cluster around some value in the middle (in this case, about 250 days), and the further you go above or below that value, the fewer observations you find.

The “Normal Curve”
A density curve drawn over a Normal distribution is called (not surprisingly) the Normal density curve, or just the “normal curve”. Notice that while the density curve is exactly symmetric, it does not perfectly outline the histogram; a density curve is an idealized description of the data. Still, even though the curve is higher than the histogram at some points and lower at others, the mathematical model used to generate it will turn out to be very accurate for our calculations.
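For reference, the standard formula behind the Normal density curve is shown below. It uses a mean $\mu$ and a standard deviation $\sigma$, symbols this lecture has not introduced yet. The total area under the curve is 1, and the area between two values $a$ and $b$ is the proportion of observations expected to fall between them.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad \int_{-\infty}^{\infty} f(x)\,dx = 1,
\qquad P(a < X < b) = \int_{a}^{b} f(x)\,dx
```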

A Density Curve is a Model That is Based on Existing Data
The histogram is based on existing data. For example, we may look at a few hundred students' vocabulary scores and create the histogram below. However, if we want to make probability predictions about the future, we can do so by drawing a density curve over the histogram. This is analogous to drawing the "best line" in a linear regression model. The density curve is an idealized description of the data: it is higher than the histogram at some points and lower at others. The mathematical model used to generate the density curve will turn out to be very useful for statistical analyses and predictions.

Not all distributions are normal!
While many datasets that we look at do follow a Normal distribution, many others do not. For example, people's incomes are not Normally distributed; they are typically right-skewed. The age at which people are diagnosed with Inflammatory Bowel Disease is typically bimodal.

“Normal” Density Curves
Normal curves have the following properties:
- Symmetric
- Unimodal
- Bell-shaped
Curves that have these properties are called ‘Normal curves’, and the data distributions they describe are called ‘Normal distributions’. The name does not imply that other kinds of curves are somehow abnormal! It's simply the term that we use, and it is a term you must be comfortable with!

How we use density curves
One of the reasons we love density curves is that, by estimating the area under the curve, we can make various analyses and predictions about the population. Important: be sure you understand, however, that the rules we are going to study over the next few lectures apply only to density curves of Normal distributions. These tools will NOT apply (properly!) to density curves for, say, skewed distributions.
Example: Suppose we take our sample of 25 women's heights and plot them on a histogram. We then summarize this histogram by drawing a density curve. If that density curve turns out to show a Normal distribution, we can use it to make all kinds of statistical estimates, such as:
- What percentage of women in our population would be more than 6'0" tall?
- What percentage of women are between 5'0" and 5'5"?
- What is the likelihood of encountering a woman shorter than 4'6"?
- What height marks the 90th percentile of women (that is, is taller than 90% of women)?
- Etc.
However, to do all of this, we must learn how to calculate the area under the density curve.
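If you want a preview of the calculations promised for later lectures, here is a minimal sketch that answers the four height questions with Python's standard-library `statistics.NormalDist`. The lecture gives no mean or standard deviation for women's heights, so the model below assumes a hypothetical mean of 64 inches and SD of 2.5 inches. Each answer is an area under the curve: `cdf(x)` is the area to the left of `x`, and `inv_cdf(p)` goes the other way, from an area to a height.

```python
from statistics import NormalDist

# Hypothetical model of women's heights: mean 64 in, SD 2.5 in (assumed values).
heights = NormalDist(mu=64, sigma=2.5)

over_6ft = 1 - heights.cdf(72)                # taller than 6'0" (72 in): area to the right
between = heights.cdf(65) - heights.cdf(60)   # between 5'0" (60 in) and 5'5" (65 in)
under_4ft6 = heights.cdf(54)                  # shorter than 4'6" (54 in): area to the left
p90 = heights.inv_cdf(0.90)                   # height with 90% of the area to its left

print(f"taller than 6'0\":          {over_6ft:.4%}")
print(f"between 5'0\" and 5'5\":     {between:.1%}")
print(f"shorter than 4'6\":         {under_4ft6:.4%}")
print(f"90th percentile height:    {p90:.1f} in")
```

Changing the assumed mean or SD changes every answer, which is why the model must first be checked against real data.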

Area under the curve
Here is an example of a histogram obtained from a sample of several hundred students who took a vocabulary test. Scores ranged from 0 to 12. After obtaining this histogram, we drew a density curve over it. By analyzing this density curve, we can make predictions about the overall population of ALL students who would take this exam.
Example: To predict the percentage of students expected to score below 6 on this exam, we find the area under the curve to the left of 6.0. It is shaded on this diagram. This percentage is somewhere in the neighborhood of 30%. Determining the exact percentage will be the subject of an upcoming lecture.
Note: Because the total area under a density curve is 100%, if the shaded area tells us that about 30% of students scored below 6, it stands to reason that about 70% of students scored higher than 6.
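"Area to the left of 6" can be made concrete by adding up thin slices under the curve. The sketch below does this with the trapezoidal rule, a standard numerical-integration technique chosen here for illustration. It assumes a hypothetical Normal model of the vocabulary scores with mean 7 and SD 1.9, picked only so the result lands near the rough 30% above. It then checks the sum against the exact value from `statistics.NormalDist.cdf` and uses the fact that the total area is 1 to get the complement.

```python
from statistics import NormalDist

# Hypothetical model of the vocabulary scores: mean 7, SD 1.9 (assumed values).
model = NormalDist(mu=7, sigma=1.9)

def area_left_of(x: float, dist: NormalDist, n: int = 20_000) -> float:
    """Trapezoidal-rule approximation of the area under dist's density curve left of x.

    Integration starts 10 SDs below the mean, where the remaining tail area is negligible.
    """
    a = dist.mean - 10 * dist.stdev
    h = (x - a) / n
    total = 0.5 * (dist.pdf(a) + dist.pdf(x))
    for i in range(1, n):
        total += dist.pdf(a + i * h)
    return total * h

below_6 = area_left_of(6, model)
above_6 = 1 - below_6  # total area under a density curve is 1

print(f"below 6: {below_6:.3f}  above 6: {above_6:.3f}")
print(f"exact CDF check: {model.cdf(6):.3f}")
```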

Mean of a Normal distribution
On a Normal density curve, the peak (the midpoint, or midline) is the mean, represented in this graphic by the black line. Because the curve is symmetric, the region to the left of this line contains 50% of the area under the curve, and the region to the right also contains 50%. In terms of the graph seen here, if we estimate the midpoint to be a score of 7, we can say that about 50% of the population scored below 7 and 50% scored above.
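A short check of the 50/50 split, again assuming the hypothetical vocabulary-score model with mean 7 and SD 1.9. The area to the left of the mean is one half, the score that splits the area in half (`inv_cdf(0.5)`) is the mean itself, and symmetry makes the tail below "mean minus d" equal the tail above "mean plus d".

```python
from statistics import NormalDist

model = NormalDist(mu=7, sigma=1.9)  # hypothetical vocabulary-score model (assumed values)

left_of_mean = model.cdf(model.mean)  # area to the left of the peak
median = model.inv_cdf(0.5)           # score with half the area on each side

# Symmetry: the area more than d below the mean equals the area more than d above it.
d = 2
lower_tail = model.cdf(model.mean - d)
upper_tail = 1 - model.cdf(model.mean + d)

print(f"area left of mean: {left_of_mean:.3f}")
print(f"50/50 split point: {median:.2f}")
print(f"tails: {lower_tail:.4f} vs {upper_tail:.4f}")
```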

Examples: Area under the curve
How would you determine the percentage of students who would be expected to score greater than 10? In this case, we would calculate the area under the curve to the right of 10. It would probably be somewhere around 5%. Again, we will learn how to determine this number accurately in an upcoming lecture.
How would you determine the percentage of students expected to score between 6 and 8? In this case, we would calculate the area under the curve between those two numbers.
About 50% of students would achieve a score higher than _____? Answer: Draw a line down the very center of the curve. The area under the curve to the right of that line represents 50% of students. That line is right about a score of 7, so you could say that about 50% of students scored above 7 (and, of course, about 50% of students scored below 7).
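The two question types above map onto two patterns. "Greater than" is 1 minus the area to the left, and "between" is the difference of two left-areas. Here is a minimal sketch with `statistics.NormalDist`, still assuming the hypothetical model with mean 7 and SD 1.9. The exact numbers depend on those assumed values.

```python
from statistics import NormalDist

model = NormalDist(mu=7, sigma=1.9)  # hypothetical vocabulary-score model (assumed values)

above_10 = 1 - model.cdf(10)               # area to the right of 10
between_6_8 = model.cdf(8) - model.cdf(6)  # area between 6 and 8

print(f"P(score > 10)    ~ {above_10:.3f}")
print(f"P(6 < score < 8) ~ {between_6_8:.3f}")
```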

Estimate the area under the curve
While we will shortly learn how to calculate the area under a curve fairly accurately, you must also be able to make ballpark ‘guesstimates’.
Example: What percentage of students would you predict will score below 6 on this exam? Answer: On the graph, this is the shaded area. A ballpark estimate would be somewhere around 30-40%.
Example: What percentage of students would score above 9? Answer: A reasonable guess would be in the vicinity of 20-30%. Don't worry about accuracy here; just focus on being in the general area.
Example: What percentage of students would score less than 2? Answer: A very low number! For example, 1% would be a good guess.
Example: What percentage of students would score more than 7? Answer: Since 7 is right around the midpoint, the area under the curve to the right of the midline is about 50%.
I will ask you to do at least a couple of these estimations on your quizzes and/or exams. However, you will not have to be super-accurate; you just need to be in the ballpark.
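Another way to get ballpark figures, swapped in here for illustration, is simulation. Draw many scores from the model and count what fraction land in each region; that fraction approximates the area under the curve. This sketch assumes the same hypothetical model (mean 7, SD 1.9) and uses `NormalDist.samples`. Because the mean and SD are assumptions, the results will not match the rough guesses above exactly. For example, this model puts "above 9" nearer 15%.

```python
from statistics import NormalDist

model = NormalDist(mu=7, sigma=1.9)  # hypothetical vocabulary-score model (assumed values)
sample = model.samples(100_000, seed=0)

def fraction(predicate) -> float:
    """Share of simulated scores satisfying predicate (approximates an area under the curve)."""
    return sum(1 for s in sample if predicate(s)) / len(sample)

estimates = {
    "below 6": fraction(lambda s: s < 6),
    "above 9": fraction(lambda s: s > 9),
    "below 2": fraction(lambda s: s < 2),
    "above 7": fraction(lambda s: s > 7),
}
for question, share in estimates.items():
    print(f"{question}: {share:.1%}")
```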

More practice estimating:
Example: Based on this density curve, about what percentage of American women would be predicted to have a gestation longer than 250 days? Answer: About 50%.
Example: About what percentage of women would have a gestation shorter than 210 days? Answer: A reasonable guess would be a low-ish number, such as 15%.
Example: About what percentage of women would have a gestation shorter than 310 days? Answer: A high number! For example, 99% would be a good guess.
Example: About 30% of women would have a gestation longer than _____? Choose among the following: 210 days, 230 days, 250 days, 270 days. Answer: The only reasonable option here is 270 days.
Note the last question: turning a question around like that is a favorite way for statistics instructors to test you on exams!
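The "turned around" question starts from an area and asks for a value. If 30% of the area lies above the cutoff, then 70% lies below it, so the cutoff is the 70th percentile, `inv_cdf(0.70)`. This sketch assumes a hypothetical gestation model with mean 250 days (the "about 250" from the text) and SD 38 days. The SD is an assumption chosen so the model roughly agrees with the estimates above. The sketch also scores each multiple-choice option by how close its upper-tail area is to 30%.

```python
from statistics import NormalDist

# Hypothetical gestation model: mean 250 days, SD 38 days (SD is an assumed value).
gestation = NormalDist(mu=250, sigma=38)

# "About 30% longer than ___": 30% of the area above the cutoff means 70% below it.
cutoff = gestation.inv_cdf(0.70)
print(f"70th percentile: {cutoff:.1f} days")

# Multiple-choice check: which option leaves an upper-tail area closest to 30%?
options = [210, 230, 250, 270]
best = min(options, key=lambda days: abs((1 - gestation.cdf(days)) - 0.30))
print("closest option:", best)
```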

Coming up…
For the moment, we have been estimating the area under the curve. Very soon, we will look at how to determine the area under a Normal density curve accurately. Still, if you can't estimate the answers to the questions we have just gone through, you should absolutely not move on to the ‘number crunching’. Make sure you have the concept down before moving on.