CHAPTER 2 Modeling Distributions of Data


1 CHAPTER 2 Modeling Distributions of Data
2.2 Density Curves and Normal Distributions

2 Density Curves and Normal Distributions
ESTIMATE the relative locations of the median and mean on a density curve. ESTIMATE areas (proportions of values) in a Normal distribution. FIND the proportion of z-values in a specified interval, or a z-score from a percentile in the standard Normal distribution. FIND the proportion of values in a specified interval, or the value that corresponds to a given percentile in any Normal distribution. DETERMINE whether a distribution of data is approximately Normal from graphical and numerical evidence.

3 Exploring Quantitative Data
In Chapter 1, we developed a kit of graphical and numerical tools for describing distributions. Now, we’ll add one more step to the strategy.
Exploring Quantitative Data
1. Always plot your data: make a graph, usually a dotplot, stemplot, or histogram.
2. Look for the overall pattern (shape, center, and spread) and for striking departures such as outliers.
3. Calculate a numerical summary to briefly describe center and spread.
4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

4 Density Curves
Example: The overall pattern of this histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills (ITBS) can be described by a smooth curve drawn through the tops of the bars.
A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.

5 A Density Curve is an idealized description of a distribution of data
The area under the curve and above any range of values is the proportion of all observations that fall in that range. AP Statistics, Section 2.1, Part 1

6 Describing Density Curves
Our measures of center and spread apply to density curves as well as to actual sets of observations. Distinguishing the Median and Mean of a Density Curve The median of a density curve is the equal-areas point, the point that divides the area under the curve in half. The mean of a density curve is the balance point, at which the curve would balance if made of solid material. The median and the mean are the same for a symmetric density curve. They both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.
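The mean-versus-median rule for skewed curves can be checked numerically. The sketch below uses a hypothetical density, f(x) = 2x on [0, 1], which is not from the slides; it is chosen only because its long tail points left. The helper name `integrate` is also an assumption.

```python
# Hypothetical left-skewed density (not from the slides): f(x) = 2x on [0, 1].
def density(x: float) -> float:
    return 2.0 * x

def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# A density curve must have total area 1.
total_area = integrate(density, 0.0, 1.0)

# Mean = balance point = integral of x * f(x).
mean = integrate(lambda x: x * density(x), 0.0, 1.0)

# Median = equal-areas point: bisect until the area to the left is 0.5.
lo, hi = 0.0, 1.0
for _ in range(40):
    mid = (lo + hi) / 2
    if integrate(density, 0.0, mid, n=2_000) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2

print(f"area={total_area:.4f} mean={mean:.4f} median={median:.4f}")
```

For this curve the mean comes out below the median, that is, pulled toward the long left tail.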

7 Describing Density Curves
A density curve is an idealized description of a distribution of data. We distinguish between the mean and standard deviation of the density curve and the mean and standard deviation computed from the actual observations. The usual notation for the mean of a density curve is µ (the Greek letter mu). We write the standard deviation of a density curve as σ (the Greek letter sigma).

8 TYPES OF DENSITY CURVES
Since a Density Curve is an idealized description of the data, we classify the types of density curves by the shapes of the distributions they represent. Finding the area under the density curve is like finding the proportion of data values on a particular interval

9 The time it takes for students to drive to school is evenly distributed with a minimum of 5 minutes and a range of 35 minutes. a) Draw the distribution. What is the height of the rectangle? Where should the rectangle end? (The rectangle runs from 5 to 40 with height 1/35.)

10 b) What is the probability that it takes less than 20 minutes to drive to school?
P(X < 20) = (20 − 5)(1/35) = (15)(1/35) = .4286
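The area calculation above is just base times height. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Uniform commute time: minimum 5 minutes, range 35 minutes (from the slide).
minimum = 5.0
width = 35.0                 # the range of the distribution
maximum = minimum + width    # the rectangle ends at 40
height = 1.0 / width         # total area must equal 1

# P(X < 20) is the area of the rectangle from 5 to 20.
p_less_than_20 = (20.0 - minimum) * height
print(round(p_less_than_20, 4))
```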

11 Uniform Distribution
Is a continuous distribution that is evenly (or uniformly) distributed.
Has a density curve in the shape of a rectangle.
EX: The Citrus Sugar Company packs sugar in bags labeled 5 pounds. However, the packaging isn’t perfect and the actual weights are uniformly distributed with a mean of 4.98 pounds and a range of .12 pounds.

12 Constructing the uniform distribution: we draw a rectangle centered at the mean of 4.98, extending .06 in either direction (from 4.92 to 5.04). What shape does a uniform distribution have? How long is this rectangle? What is the height of this rectangle? 1/.12

13 What is the length of the shaded region?
What is the probability that a randomly selected bag will weigh more than 4.97 pounds? What is the length of the shaded region? It runs from 4.97 to 5.04, a length of .07, so P(X > 4.97) = .07(1/.12) = .5833

14 What is the length of the shaded region?
Find the probability that a randomly selected bag weighs between 4.93 and 5.03 pounds. What is the length of the shaded region? It runs from 4.93 to 5.03, a length of .1, so P(4.93<X<5.03) = .1(1/.12) = .8333
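The two sugar-bag probabilities follow the same pattern: clip the interval to the rectangle, then multiply its length by the height. A small sketch, assuming a helper named `uniform_prob` (a name introduced here, not from the slides):

```python
def uniform_prob(lower: float, upper: float, a: float, b: float) -> float:
    """P(lower < X < upper) for X uniform on [a, b]: shaded length times height."""
    left = max(lower, a)     # clip the interval to the rectangle
    right = min(upper, b)
    length = max(0.0, right - left)
    return length * (1.0 / (b - a))

# Citrus Sugar bags: mean 4.98 pounds, range 0.12 pounds (from the slide).
mean, rng = 4.98, 0.12
a, b = mean - rng / 2, mean + rng / 2   # 4.92 to 5.04

p_more_than_4_97 = uniform_prob(4.97, b, a, b)
p_between = uniform_prob(4.93, 5.03, a, b)
print(round(p_more_than_4_97, 4), round(p_between, 4))
```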

15 Understanding the Normal Curve

16 Assume the first person had a 10 inch foot.
Suppose we measured the right foot length of 30 teachers and graphed the results. Assume the first person had a 10 inch foot. If our second subject had a 9 inch foot, we would add her to the graph. As we continued to plot foot lengths, a pattern would begin to emerge.
[Figure: histogram; horizontal axis: Length of Right Foot; vertical axis: Number of People with that Shoe Size]

17 Notice how there are more people (n=6) with a 10 inch right foot than any other length. Notice also how, as the length becomes larger or smaller, there are fewer and fewer people with that measurement. This is a characteristic of many variables that we measure. There is a tendency to have most measurements in the middle, and fewer as we approach the high and low extremes. If we were to connect the top of each bar, we would create a frequency polygon.
[Figure: histogram; horizontal axis: Length of Right Foot; vertical axis: Number of People with that Shoe Size]

18 You will notice that if we smooth the lines, our data almost creates a bell shaped curve.
[Figure: smoothed frequency polygon; horizontal axis: Length of Right Foot; vertical axis: Number of People with that Shoe Size]

19 You will notice that if we smooth the lines, our data almost creates a bell shaped curve.
This bell shaped curve is known as the “Bell Curve” or the “Normal Curve.”
[Figure: bell curve over the histogram; horizontal axis: Length of Right Foot; vertical axis: Number of People with that Shoe Size]

20 What can you say about the Mean, Median and Mode of a Normal Curve?
Whenever you see a normal curve, you should imagine the HISTOGRAM within it.
[Figure: histogram under a normal curve; horizontal axis: Points on a Quiz; vertical axis: Number of Students]

21 So WHAT ABOUT THE STANDARD DEVIATION????
The inflection points (where the curve starts to flatten out) mark the width of the standard deviation: they lie at μ − σ and μ + σ, one standard deviation on either side of the mean μ.

22 Normal distributions are a family of distributions that have the same general bell shape. They are symmetric (the left side is an exact mirror of the right side) with scores more concentrated in the middle than in the tails. Examples of normal distributions are shown to the right. Notice that they differ in how spread out they are although the area under each curve is always the same and equal to 1!

23 The mean and standard deviation are useful ways to describe a set of scores as they also determine the shape of the bell curve. If the scores are grouped closely together, the curves will have a smaller standard deviation than if they are spread farther apart.
[Figure panels: Small Standard Deviation; Large Standard Deviation; Different Means, Same Standard Deviations; Same Means, Different Standard Deviations]

24 THE NORMAL EQUATION
Notation: N(μ,σ) is a normal distribution with mean μ and standard deviation σ. N(0,1) is the standard normal distribution. “Standardizing” is the process of applying a linear transformation that turns N(μ,σ) into N(0,1).

25 Normal Distributions
Why are the Normal distributions important in statistics?
Normal distributions are good descriptions for some distributions of real data.
Normal distributions are good approximations of the results of many kinds of chance outcomes.
Many statistical inference procedures are based on Normal distributions.
A Normal distribution is described by a Normal density curve. Any particular Normal distribution is completely specified by two numbers: its mean µ and standard deviation σ. The mean of a Normal distribution is the center of the symmetric Normal curve. The standard deviation is the distance from the center to the change-of-curvature (inflection) points on either side. We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ).

26 IN GENERAL, A Normal Distribution is:
A symmetrical, bell-shaped (unimodal) density curve
Above the horizontal axis
Each curve is specified by its Mean & Standard Deviation: N(μ, σ)
The transition (inflection) points occur at μ ± σ
Probability is calculated by finding the area under the curve
As σ increases, the curve flattens & spreads out
As σ decreases, the curve gets taller and thinner
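The effect of the standard deviation on the shape can be seen by comparing peak heights. This sketch uses `NormalDist` from Python's standard `statistics` module; the three standard deviations tried are arbitrary illustrative values, not from the slides.

```python
from statistics import NormalDist

# Same mean, different standard deviations (values chosen for illustration).
peaks = {}
for sigma in (0.5, 1.0, 2.0):
    curve = NormalDist(mu=0.0, sigma=sigma)
    peaks[sigma] = curve.pdf(0.0)   # height of the curve at its center
    print(f"sigma={sigma}: peak height {peaks[sigma]:.4f}")

# Larger sigma -> lower, wider curve; the area stays 1, so the peak must drop.
```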

27 The Standard Normal Distribution
All Normal distributions can be “standardized” and are therefore the same if we measure in units of size σ from the mean µ as center.
Definition: The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1. If a variable x has any Normal distribution N(µ,σ) with mean µ and standard deviation σ, then the standardized variable z = (x − µ)/σ has the standard Normal distribution, N(0,1).
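Standardizing can be sketched in a couple of lines. The function name `standardize` is an assumption, and the N(6.84, 1.55) values are borrowed from the ITBS example that appears later in the deck.

```python
from statistics import NormalDist

def standardize(x: float, mu: float, sigma: float) -> float:
    """Convert x from N(mu, sigma) to a z-score on N(0, 1)."""
    return (x - mu) / sigma

mu, sigma = 6.84, 1.55      # ITBS vocabulary scores, per the later example
x = 9.94
z = standardize(x, mu, sigma)

# The area to the left of x under N(mu, sigma) equals
# the area to the left of z under N(0, 1).
area_original = NormalDist(mu, sigma).cdf(x)
area_standard = NormalDist().cdf(z)
print(round(z, 2), round(area_original, 4), round(area_standard, 4))
```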

28 The EMPIRICAL Rule Normal models give us an idea of how extreme a value is by telling us how likely it is to find one that far from the mean. We can find these numbers precisely, but until then we will use a simple rule that tells us a lot about the Normal model…

29 It turns out that in a Normal model:
about 68% of the values fall within one standard deviation of the mean; about 95% of the values fall within two standard deviations of the mean; and, about 99.7% (almost all!) of the values fall within three standard deviations of the mean.
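These three percentages can be checked against the standard Normal curve directly. A minimal sketch using `NormalDist` from Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()   # standard Normal, N(0, 1)

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```

The computed areas round to 68%, 95%, and 99.7%, which is where the rule gets its name.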

30 The 68-95-99.7 (or Empirical) Rule

31 Example: The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th grade students in Gary, Indiana, is close to Normal. Suppose the distribution is N(6.84, 1.55).
Sketch the Normal density curve for this distribution.
What percent of ITBS vocabulary scores are less than 3.74?
What percent of the scores are between 5.29 and 9.94?
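Because 3.74, 5.29, and 9.94 sit exactly 2 SDs below, 1 SD below, and 2 SDs above the mean, the 68-95-99.7 rule answers both questions. The sketch below works the rule-based estimates and, as a cross-check that is not part of the slide, compares them with areas computed from `NormalDist`.

```python
from statistics import NormalDist

itbs = NormalDist(mu=6.84, sigma=1.55)

# Estimates from the 68-95-99.7 rule.
rule_below_3_74 = (1 - 0.95) / 2       # half of the 5% outside 2 SDs
rule_between = 0.68 / 2 + 0.95 / 2     # from 1 SD below to 2 SDs above

# Cross-check with the Normal curve itself.
exact_below_3_74 = itbs.cdf(3.74)
exact_between = itbs.cdf(9.94) - itbs.cdf(5.29)

print(f"below 3.74: rule {rule_below_3_74:.3f}, curve {exact_below_3_74:.4f}")
print(f"5.29 to 9.94: rule {rule_between:.3f}, curve {exact_between:.4f}")
```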

32 Finding Normal Percentiles by Hand
When a data value doesn’t fall exactly 1, 2, or 3 standard deviations from the mean, we can look it up in a table of Normal percentiles. A Table of Standard Normal Probabilities provides us with normal percentiles, but many calculators and statistics computer packages provide these as well.

33 The Standard Normal Table
Because all Normal distributions are the same when we standardize, we can find areas under any Normal curve from a single table.
Definition: The Standard Normal Table. Table A is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z.
Suppose we want to find the proportion of observations from the standard Normal distribution that are less than 0.81. We can use Table A: P(z < 0.81) = .7910

 Z     .00     .01     .02
0.7   .7580   .7611   .7642
0.8   .7881   .7910   .7939
0.9   .8159   .8186   .8212
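A table entry is just the cumulative area to the left of z, rounded to four places. The sketch below rebuilds the fragment of Table A shown above; the dictionary layout keyed by (row, column) is an illustrative choice, not how the printed table is defined.

```python
from statistics import NormalDist

std_normal = NormalDist()

rows = (0.7, 0.8, 0.9)        # ones and tenths digit of z
cols = (0.00, 0.01, 0.02)     # hundredths digit of z

# Each entry is the area to the left of z = row + col, rounded to 4 places.
table = {(r, c): round(std_normal.cdf(r + c), 4) for r in rows for c in cols}

print("  Z " + "".join(f"{c:>8.2f}" for c in cols))
for r in rows:
    print(f"{r:>4.1f}" + "".join(f"{table[(r, c)]:>8.4f}" for c in cols))

p_less_than_0_81 = table[(0.8, 0.01)]
```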

34 Normal Distributions Finding Areas Under the Standard Normal Curve
Example, p. 117 Normal Distributions Find the proportion of observations from the standard Normal distribution that are between and 0.81. Can you find the same proportion using a different approach? 1 - ( ) = 1 – =

35 Strategies for finding probabilities or proportions in normal distributions
Express the problem in terms of the observed variable x.
Make a graph of the data to check the Nearly Normal Condition, to make sure we can use the Normal distribution to model the distribution.
Draw a picture of the distribution and shade the area of interest under the curve.
Standardize x to restate the problem in terms of a standard Normal variable z.
Use Table A or the calculator and the fact that the total area under the curve is 1 to find the required area under the standard Normal curve.
Conclude: Write your conclusion in the context of the problem.

36 Normal Distribution Calculations
When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger’s drives travel between 305 and 325 yards? Standardizing gives z = (305 − 304)/8 ≈ 0.13 and z = (325 − 304)/8 ≈ 2.63. Using Table A, we can find the area to the left of z=2.63 and the area to the left of z=0.13: 0.9957 − 0.5517 = 0.4440. About 44% of Tiger’s drives travel between 305 and 325 yards.
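The same calculation can be sketched without a table. Note one assumption in how this differs from the slide: the code does not round the z-scores to two decimals, so its answer differs slightly in the third decimal place from a Table A lookup.

```python
from statistics import NormalDist

drives = NormalDist(mu=304, sigma=8)   # driving distance in yards

# Standardize both endpoints.
z_low = (305 - 304) / 8     # 0.125, which Table A rounds to 0.13
z_high = (325 - 304) / 8    # 2.625, which Table A rounds to 2.63

# Area between the endpoints = area left of upper minus area left of lower.
proportion = drives.cdf(325) - drives.cdf(305)
print(f"z from {z_low} to {z_high}: proportion {proportion:.4f}")
```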

37 Cautions
We should only use the z-table when the distribution is Normal and the data have been standardized. The z-table only gives the proportion of data found below the z-score, THAT IS, THE AREA TO THE LEFT OF THE z-score! If you want to find the proportion found above the z-score, subtract the probability found in the table from 1. AP Statistics, Section 2.2, Part 1

38 EXAMPLE: Find the proportion of observations from the standard Normal distribution that are greater than .81
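Since the table gives only areas to the left, an area to the right comes from subtracting from 1. A minimal sketch of that step:

```python
from statistics import NormalDist

area_left = NormalDist().cdf(0.81)    # what the z-table reports
area_right = 1 - area_left            # proportion greater than 0.81
print(round(area_left, 4), round(area_right, 4))
```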

39 From Percentiles to Scores: z in Reverse
Sometimes we start with areas and need to find the corresponding z-score or even the original data value. Example: What z-score represents the first quartile in a Normal model?

40 From Percentiles to Scores: z in Reverse (cont.)
Look in the Table for an area of 0.2500. The exact area is not there, but 0.2514 is pretty close. This figure is associated with z = –0.67, so the first quartile is 0.67 standard deviations below the mean.
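Working from an area back to a z-score is an inverse lookup. In Python's standard library this is `NormalDist.inv_cdf`; the sketch below finds the first quartile of the standard Normal model.

```python
from statistics import NormalDist

std_normal = NormalDist()

# First quartile: the z-score with 25% of the area to its left.
z_q1 = std_normal.inv_cdf(0.25)
print(round(z_q1, 2))

# Going forward again recovers the area we started from.
area_check = std_normal.cdf(z_q1)
```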

41 Will my calculator do any of this normal stuff?
Normalpdf – use for graphing ONLY
Normalcdf – will find probability of area from lower bound to upper bound
Invnorm (inverse normal) – will find z-score for probability
THESE COMMANDS ARE FOUND IN THE DISTRIBUTION MENU

42 Finding Normal Percentiles using the calculator
Go to the Distribution key on your calculator.
Find NORMCDF.
Use the key stroke: NORMCDF(min z,max z)
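For readers without the calculator, a rough Python analogue of the command can be sketched as follows. The function name `normalcdf` and its argument order are assumptions that mimic the key stroke above; this is not the calculator's own implementation.

```python
from statistics import NormalDist

def normalcdf(lower: float, upper: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under N(mu, sigma) between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# With the defaults, the bounds are z-scores on the standard Normal curve.
area_within_one_sd = normalcdf(-1, 1)
print(round(area_within_one_sd, 4))
```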

43 Example Men’s heights are N(69,2.5).
What percent of men are taller than 68 inches?

44 Working with intervals
What proportion of men are between 68 and 70 inches tall?
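Both men's-height questions can be sketched with the N(69, 2.5) model from the example. The variable names are chosen here for illustration.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=2.5)   # men's heights in inches

# Taller than 68 inches: area to the right, so subtract from 1.
p_taller_than_68 = 1 - heights.cdf(68)

# Between 68 and 70 inches: difference of two left-hand areas.
p_between_68_and_70 = heights.cdf(70) - heights.cdf(68)

print(f"taller than 68: {p_taller_than_68:.4f}")
print(f"between 68 and 70: {p_between_68_and_70:.4f}")
```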

45 INVERSE NORM
We can also use the calculator to find the z-score for a particular area: INVNORM(proportion of area to the left). Note: when using the calculator, also entering μ,σ will “un-standardize” the result, returning a value on the original scale instead of a z-score.

46 Working backwards How tall must a woman be in order to be in the top 15% of all women?

47 Working backwards How tall must a man be in order to be in the 90th percentile?

48 Working backwards What range of values make up the middle 50% of men’s heights?
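The men's-height versions of these "working backwards" questions can be sketched with an inverse lookup on N(69, 2.5). The women's question is left out because the transcript does not give a distribution for women's heights.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=2.5)   # men's heights in inches

# 90th percentile: the height with 90% of the area to its left.
height_90th = heights.inv_cdf(0.90)

# Middle 50%: from the 25th percentile to the 75th percentile.
middle_low = heights.inv_cdf(0.25)
middle_high = heights.inv_cdf(0.75)

print(f"90th percentile: {height_90th:.2f} inches")
print(f"middle 50%: {middle_low:.2f} to {middle_high:.2f} inches")
```

Passing the mean and standard deviation to `NormalDist` plays the same role as entering μ,σ on the calculator: the result comes back in inches rather than as a z-score.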

49 REMINDER: CAUTION!!! Whether using the calculator or the Table, we should only use the z-table when the distribution is Normal and the data have been standardized.

50 Therefore, we need a strategy for assessing Normality.
Plot the data. Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped.
Check the 1-Var Stats and compare the mean and median.
Check whether the data follow the 68-95-99.7 rule. Count how many observations fall within one, two, and three standard deviations of the mean and check to see if these percents are close to the 68%, 95%, and 99.7% targets for a Normal distribution.

51 Are You Normal? How Can You Tell?
A more specialized graphical display that can help you decide whether a Normal model is appropriate is the Normal probability plot. If the distribution of the data is roughly Normal, the Normal probability plot approximates a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal.

52 Normal Probability Plots
Most software packages can construct Normal probability plots. These plots are constructed by plotting each observation in a data set against its corresponding percentile’s z-score.
Interpreting Normal Probability Plots: If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
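The construction described above can be sketched as follows. Two assumptions are worth flagging: the percentile for the i-th smallest of n observations is taken as (i − 0.5)/n, which is one common convention (software packages differ), and the sample data are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Pair each sorted observation with the z-score of its percentile."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n is an assumed convention.
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]

# Hypothetical sample, not from the slides.
sample = [4.1, 5.0, 5.6, 6.2, 6.8, 7.1, 7.9, 8.4, 9.0, 10.3]
points = normal_probability_points(sample)
for z, x in points:
    print(f"z = {z:6.3f}   observation = {x}")
```

Plotting these (z, observation) pairs and checking for a roughly straight line is the visual test the slide describes.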

53 Are You Normal? How Can You Tell? (cont.)
Nearly Normal data have a histogram and a Normal probability plot that look somewhat like this example:

54 Are You Normal? How Can You Tell? (cont.)
A skewed distribution might have a histogram and Normal probability plot like this:

55 Are Walter Johnson’s Wins Normal?
Enter 5, 14, 13, 25, 25, 33, 36, 28, 27, 25, 23, 23, 20, 8, 17, 15, 17, 23, 20, 15, 5 into list L1. Run “1-Var Stats”. Is the data set symmetric? Where do you look? AP Statistics, Section 2.2, Part 2
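The same checks can be sketched off the calculator. This uses the sample standard deviation (`statistics.stdev`), which is an assumption about which standard deviation the slide intends; the helper `share_within` is a name introduced here.

```python
from statistics import mean, median, stdev

wins = [5, 14, 13, 25, 25, 33, 36, 28, 27, 25, 23,
        23, 20, 8, 17, 15, 17, 23, 20, 15, 5]

xbar = mean(wins)
med = median(wins)
s = stdev(wins)   # sample standard deviation (assumed)

def share_within(k: int) -> float:
    """Proportion of seasons within k standard deviations of the mean."""
    return sum(abs(x - xbar) <= k * s for x in wins) / len(wins)

print(f"mean = {xbar:.2f}, median = {med}, SD = {s:.2f}")
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

Comparing the mean with the median speaks to symmetry, and the three proportions can be held against the 68%, 95%, and 99.7% targets.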

56 Are Walter Johnson’s Wins Normal?
Look also at the boxplot. Is the data set symmetric?

57 68-95-99.7 Rule? You can compare a histogram with the 68-95-99.7 rule to see if the distribution roughly fits it. You will also want to check the Normal Probability Plot.

58 In Summary: What Can Go Wrong?
Don’t use a Normal model when the distribution is not unimodal and symmetric.

59 What Can Go Wrong? (cont.)
Don’t use the mean and standard deviation when outliers are present—the mean and standard deviation can both be distorted by outliers. Don’t round off too soon. Don’t round your results in the middle of a calculation. Don’t worry about minor differences in results.

60 What have we learned? (cont.)
We’ve learned that the 68-95-99.7 Rule can be a useful rule of thumb for understanding NORMAL distributions: For data that are unimodal and symmetric, about 68% fall within 1 SD of the mean, 95% fall within 2 SDs of the mean, and 99.7% fall within 3 SDs of the mean.

61 What have we learned? (cont.)
We see the importance of Thinking about whether a method will work: Normality Assumption: We sometimes work with Normal tables. These tables are based on the Normal model. Data can’t be exactly Normal, so we check the Nearly Normal Condition by making a histogram (is it unimodal, symmetric and free of outliers?) or a normal probability plot (is it straight enough?).

