Modeling Continuous Variables Lecture 19 Section 6.1 - 6.3.1 Fri, Oct 6, 2006
Models Mathematical model – An abstraction and, therefore, a simplification of a real situation, one that retains the essential features. Real situations are usually much too complicated to deal with in all their details.
Example The “bell curve” is a model (an abstraction) of many populations. Real populations have all sorts of bumps and twists and irregularities. The bell curve is smooth and perfectly symmetric. In statistics, the bell curve is called the normal curve, or normal distribution.
Models Our models will be models of distributions, presented either as histograms or as continuous distributions.
Histograms and Area In a histogram, frequency is represented by area. Consider the following distribution of test scores.
Grade      Frequency
60 – 69    3
70 – 79    8
80 – 89    9
90 – 99    5
Histograms and Area [Figure: histogram of the test scores; vertical axis Frequency (0 to 10), horizontal axis Grade (60 to 100).]
Histograms and Area What is the total area of this histogram? We will rescale the vertical axis so that the total area equals 1, representing 100%.
Histograms and Area To achieve this, we divide each frequency by the original area, 10 × (3 + 8 + 9 + 5) = 250, to get the density.
Grade      Frequency   Density
60 – 69    3           0.012
70 – 79    8           0.032
80 – 89    9           0.036
90 – 99    5           0.020
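The rescaling can be checked with a short Python sketch. It assumes each class is treated as an interval of width 10 (60 to 70, 70 to 80, and so on), as the histogram draws them; the variable names are illustrative only.

```python
# Class intervals and frequencies from the table of test scores.
# Assumption: each class spans 10 grade points, as drawn in the histogram.
classes = [(60, 70), (70, 80), (80, 90), (90, 100)]
frequencies = [3, 8, 9, 5]

# Original area: sum of (width * frequency) over all rectangles.
total_area = sum((hi - lo) * f for (lo, hi), f in zip(classes, frequencies))

# Density: each frequency divided by the original area.
densities = [f / total_area for f in frequencies]

# After rescaling, the rectangle areas should sum to 1.
rescaled_area = sum((hi - lo) * d for (lo, hi), d in zip(classes, densities))

print(total_area)                        # 250
print([round(d, 3) for d in densities])  # [0.012, 0.032, 0.036, 0.02]
```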
Histograms and Area [Figure: the rescaled histogram; vertical axis Density (0 to 0.040), horizontal axis Grade (60 to 100). Total area = 1.]
Histograms and Area This histogram has the special property that a proportion can be found by computing the area of the corresponding rectangles. For example, what proportion of the grades are less than 80? Compute: (10 × 0.012) + (10 × 0.032) = 0.12 + 0.32 = 0.44 = 44%.
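The same calculation can be sketched in Python. The helper name proportion_below is hypothetical. For a cutoff that falls inside a class, it assumes the grades are spread evenly across that class, which the slides do not state.

```python
def proportion_below(cutoff, classes, densities):
    """Sum the areas of the density rectangles lying to the left of cutoff."""
    area = 0.0
    for (lo, hi), d in zip(classes, densities):
        # Width of the part of this class that lies below the cutoff (0 if none).
        width = max(0, min(hi, cutoff) - lo)
        area += width * d
    return area

classes = [(60, 70), (70, 80), (80, 90), (90, 100)]
densities = [0.012, 0.032, 0.036, 0.020]

p = proportion_below(80, classes, densities)
print(round(p, 2))  # 0.44
```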
Density Functions This is the fundamental property that connects the graph of a continuous model to the population that it represents, namely: The area under the graph between two numbers a and b on the x-axis represents the proportion of the population that lies between a and b. AREA = PROPORTION
Density Functions Now consider an arbitrary distribution. The area under the curve between a and b is the proportion of the values of x that lie between a and b. [Figure: density curve over the x-axis with the region between a and b shaded. Area = Proportion.]
Density Functions Again, the total area under the curve must be 1, representing a proportion of 100%. [Figure: the same density curve with the entire region under it shaded and labeled 100%.]
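As a rough numerical illustration of both properties, the sketch below approximates areas under a density with the midpoint rule. The density f(x) = 2x on [0, 1] is a hypothetical example chosen only because its areas are easy to verify by hand; it does not come from the lecture.

```python
def density(x):
    # Hypothetical density for illustration: f(x) = 2x on [0, 1], 0 elsewhere.
    return 2 * x if 0 <= x <= 1 else 0.0

def area_under(f, a, b, n=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

total = area_under(density, 0, 1)        # total area, should be close to 1
between = area_under(density, 0.5, 1.0)  # proportion of values between 0.5 and 1

print(round(total, 6), round(between, 6))
```

For this density the exact proportion between 0.5 and 1 is 1 − 0.25 = 0.75, so three quarters of the population lies in the upper half of the interval.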
The Normal Distribution Normal distribution – The statistician’s name for the bell curve. It is a density function in the shape of a “bell.” Symmetric. Unimodal. Extends over the entire real line (no endpoints). “Main part” lies within 3σ of the mean.
The Normal Distribution The curve has a bell shape, with infinitely long tails in both directions.
The Normal Distribution The mean is located in the center, at the peak.
The Normal Distribution The “main” part of the curve is 6 standard deviations wide (3 standard deviations each way from the mean). [Figure: normal curve with μ − 3σ and μ + 3σ marked on the horizontal axis.]
The Normal Distribution The area under the entire curve is 1. (The area outside of 3 standard deviations is approximately 0.0027.) [Figure: normal curve with μ − 3σ and μ + 3σ marked on the horizontal axis. Area = 1.]
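The 0.0027 figure can be checked with the standard identity that the area of a normal curve within k standard deviations of the mean equals erf(k / √2). The function name below is illustrative.

```python
import math

def normal_area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

inside = normal_area_within(3)
outside = 1 - inside

print(round(inside, 4), round(outside, 4))  # 0.9973 0.0027
```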
The Normal Distribution The normal distribution with mean μ and standard deviation σ is denoted N(μ, σ). For example, if X is a variable whose distribution is normal with mean 30 and standard deviation 5, then we say that “X is N(30, 5).”
The Normal Distribution If X is N(30, 5), then the distribution of X looks like this: [Figure: normal curve centered at 30, with 15 and 45 marked on the horizontal axis.]
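For readers who want to experiment, N(30, 5) can be represented with NormalDist from Python's standard statistics module (Python 3.8 or later). This sketch only confirms that 15 and 45 are the points 3 standard deviations from the mean and that nearly all of the area lies between them.

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=5)  # X is N(30, 5)

# End points of the "main part": 3 standard deviations each way from the mean.
low = X.mean - 3 * X.stdev
high = X.mean + 3 * X.stdev

# Proportion of values between 15 and 45 (area under the curve).
main_part = X.cdf(high) - X.cdf(low)

# Height of the curve at its peak, which is located at the mean.
peak_height = X.pdf(X.mean)

print(low, high, round(main_part, 4))  # 15.0 45.0 0.9973
```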
Some Normal Distributions [Figure: several normal curves drawn over a horizontal axis running from 1 to 8.]
Bag A vs. Bag B Suppose we have two bags, Bag A and Bag B. Each bag contains millions of vouchers. In Bag A, the values of the vouchers have distribution N(50, 10). Normal with μ = $50 and σ = $10. In Bag B, the values of the vouchers have distribution N(80, 15). Normal with μ = $80 and σ = $15.
Bag A vs. Bag B H0: Bag A. H1: Bag B. [Figure: the density curves of the two bags on a common axis running from 30 to 110.]
Bag A vs. Bag B We are presented with one of the bags. We select one voucher at random from that bag. [Figure: the H0 (Bag A) and H1 (Bag B) density curves on a common axis running from 30 to 110.]
Bag A vs. Bag B If its value is less than or equal to $65, then we will decide that it was from Bag A. [Figure: the H0 (Bag A) and H1 (Bag B) density curves with a cutoff marked at 65. Acceptance Region: values up to 65. Rejection Region: values above 65.]
Bag A vs. Bag B What is α? (The area under the H0 curve in the rejection region, to the right of 65.) What is β? (The area under the H1 curve in the acceptance region, to the left of 65.) [Figure: the H0 (Bag A) and H1 (Bag B) density curves with the cutoff at 65 and the two areas shaded.]
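The slides leave these two quantities as questions. One way to compute them is sketched below, assuming α denotes the probability of deciding on Bag B when the bag is really Bag A (a value above $65 drawn from N(50, 10)) and β denotes the probability of deciding on Bag A when the bag is really Bag B (a value of at most $65 drawn from N(80, 15)). It uses NormalDist from Python's standard statistics module.

```python
from statistics import NormalDist

bag_a = NormalDist(mu=50, sigma=10)  # H0: Bag A
bag_b = NormalDist(mu=80, sigma=15)  # H1: Bag B
cutoff = 65                          # decide "Bag A" when value <= 65

# alpha: area under the Bag A curve in the rejection region (right of 65).
alpha = 1 - bag_a.cdf(cutoff)

# beta: area under the Bag B curve in the acceptance region (left of 65).
beta = bag_b.cdf(cutoff)

print(round(alpha, 4), round(beta, 4))  # 0.0668 0.1587
```

The cutoff of $65 is 1.5 standard deviations above the Bag A mean and 1 standard deviation below the Bag B mean, which is why the two areas differ.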
Bag A vs. Bag B If the distributions are very close together (very similar), then α and β will be large. [Figure: the H0 and H1 density curves drawn close together, overlapping heavily.]
Bag A vs. Bag B Similarly, if the distributions are far apart, then α and β will both be very small. [Figure: the H0 and H1 density curves drawn far apart, with little overlap.]
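The effect of separation can be illustrated numerically. The sketch below is not the lecture's example: it assumes two normal distributions with a common standard deviation of 10 and a cutoff placed halfway between their means, and compares a pair of means 5 apart with a pair 60 apart. The function name error_rates is hypothetical.

```python
from statistics import NormalDist

def error_rates(mu_a, mu_b, sigma):
    """Return (alpha, beta) for a cutoff halfway between the two means.

    Assumptions for illustration: both distributions share the same sigma,
    mu_a < mu_b, and we decide "A" when the value is <= the cutoff.
    """
    cutoff = (mu_a + mu_b) / 2
    alpha = 1 - NormalDist(mu_a, sigma).cdf(cutoff)  # decide B when A is true
    beta = NormalDist(mu_b, sigma).cdf(cutoff)       # decide A when B is true
    return alpha, beta

close = error_rates(50, 55, 10)   # means only 5 apart
far = error_rates(50, 110, 10)    # means 60 apart

print([round(p, 4) for p in close])  # [0.4013, 0.4013]
print([round(p, 4) for p in far])    # [0.0013, 0.0013]
```

With the means only half a standard deviation apart, each error occurs about 40% of the time; with the means six standard deviations apart, each occurs about 0.1% of the time.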