1
Continuous Random Variables
Chapter 6 Continuous Random Variables
2
Problem: Diagnosing Spina Bifida
What is the probability that a foetus without spina bifida is correctly diagnosed? To ensure that 99% of foetuses with spina bifida are correctly diagnosed, at what value should T be set? What is the probability that a foetus with spina bifida is correctly diagnosed?
[Figure: two p.d.f.s, X and Y, for alpha fetoprotein concentration (µM/L), centred at 15.7 and 23.1, with a threshold T marked on the axis.]
Background facts about spina bifida: SB is a congenital disease. A screening test for SB in a foetus measures the concentration of alpha fetoprotein in the mother’s urine. Here is a plot showing two descriptions, or models, for alpha fetoprotein concentration. A few points about these plots will give us a good overview of what this chapter is about.
We are measuring alpha fetoprotein concentration, so along the axis we have the scale for alpha fetoprotein concentration in micromoles per litre. We use continuous random variables for these measures, and that is where the chapter gets its name. The values fall on a ‘continuum’ and can take any value within an interval on that continuum.
The X graph is a p.d.f. for alpha fetoprotein concentration in women who are carrying a foetus without SB (a ‘healthy’ foetus). “Probability” because areas under the curve give probabilities of observations occurring within a specified interval. “Density” because the graph is higher at places where observations plotted on a dot plot would be most densely placed. Most observations would be around 15.7 and virtually never as large as 23.1. Later today we will get a better feel for where these p.d.f.s come from and will define their key properties.
3
Female Weights: Sample of 1
[Dot plot: Weight (kg), axis from 40 to 110.] This is the first of 6 slides leading to the concept of a probability density function. These data come from female Introductory Statistics students who answered an online survey. We will plot the weights for an increasing number of female students.
4
Female Weights: Sample of 2
[Dot plot: Weight (kg), axis from 40 to 110.]
5
Female Weights: Sample of 3
[Dot plot: Weight (kg), axis from 40 to 110.]
6
Female Weights: Sample of 30
[Dot plot: Weight (kg), axis from 40 to 110.] In this sample of 30 female students we can see that the observations appear denser around 50kg to 55kg.
7
Female Weights: Sample of 100
[Dot plot: Weight (kg), axis from 40 to 110.] With 100 observations the “shape” of the data is becoming more obvious. The density of observations is still greatest at around 50kg to 55kg.
8
Female Weights: Sample of 1098
[Histogram with fitted curve: Weight (kg) from 30 to 120 against frequency.] The curve is an attempt to model this pattern of weights for the (conceptual) population of weights of female Statistics students. But as the sample size increases the frequencies in each interval will also increase. We will therefore adjust the vertical scale so that the total area under the curve is 1.
9
Female Weights: Sample of 1098
Describe the shape of the distribution: Symmetric, Right-skewed or Left-skewed? [Histogram with fitted curve: Weight (kg) from 30 to 120 against frequency.] With 1098 observations a clear pattern is starting to emerge: unimodal (greatest density around 50 to 65kg) and right (positively) skewed.
10
Probability Density Function (p.d.f.)
[Plot: p.d.f. curve over Weight (kg) from 30 to 120, density scale 0.01 to 0.03, with the area between 70kg and 90kg shaded.] This curve is called a probability density function (often shortened to p.d.f.). “Density” because the graph is higher at places where observations plotted on a dot plot would be most densely placed. “Probability” because areas under the curve give probabilities of observations occurring within a specified interval. For example, the shaded area gives the probability of a randomly chosen female Statistics student having a weight between 70kg and 90kg.
11
Probability Density Function (p.d.f.)
Properties of a Probability Density Function (p.d.f.): The p.d.f. curve is always above or on the x-axis. This is the first of 3 important properties of a probability density function. Emphasise that these results are true for a p.d.f. for any continuous random variable. In particular, they are true for a p.d.f. for a Normal distribution.
12
Female Weights: Sample of 1098
[Histogram with fitted curve: Weight (kg) from 30 to 120 against frequency.] Think about how the p.d.f. was formulated: it came from a histogram, whose bars are always above the x-axis.
14
Probability Density Function (p.d.f.)
Properties of a Probability Density Function (p.d.f.): Probabilities are represented by areas under the p.d.f. curve. pr(a ≤ X ≤ b) = area under the p.d.f. curve between x = a and x = b. This is the second of 3 important properties of a probability density function.
15
Probability Density Function (p.d.f.)
[Plot: p.d.f. curve over Weight (kg) from 30 to 120, density scale 0.01 to 0.03, with the area between 70kg and 90kg shaded.] We talked about this before with the female weights: using this p.d.f. to model the weights of female Statistics students, the probability that a randomly selected student has a weight between 70kg and 90kg (or the proportion of these students who have a weight between 70kg and 90kg) is equal to the area of the shaded region. So if we have a continuous random variable X and want to find, e.g., pr(a < X < b), then we need to find a suitable model for the distribution of X (the hard part) and then find the appropriate area (the easy part).
17
Probability Density Function (p.d.f.)
Properties of a Probability Density Function (p.d.f.): The total area under a p.d.f. curve = 1. This is the third of 3 important properties of a probability density function. Using the p.d.f.s for alpha fetoprotein concentrations you could now justify why the p.d.f. for Y was flatter than the p.d.f. for X.
18
Endpoints of Intervals
pr(a ≤ X ≤ b) [Figure: p.d.f. of X with the area between a and b shaded.] We have already seen that pr(a ≤ X ≤ b) = area under the curve between a and b. Let’s have a closer look at this area. It includes both boundary lines.
19
Endpoints of Intervals
pr(a ≤ X ≤ b) = pr(a < X < b) [Figure: the shaded area between a and b, with and without its boundary lines.] What if we remove the boundary lines? The area remains the same (you don’t change the area of a paddock by fencing it). The area with the boundaries excluded is pr(a < X < b), so the two areas are the same. Hence pr(a ≤ X ≤ b) = pr(a < X < b).
20
Endpoints of Intervals
pr(a ≤ X ≤ b) = pr(a ≤ X < b) The same holds if we include only one boundary and exclude the other.
21
Endpoints of Intervals
pr(a ≤ X ≤ b) = pr(a < X ≤ b)
22
Endpoints of Intervals
pr(a ≤ X ≤ b) = pr(a ≤ X < b) = pr(a < X ≤ b) = pr(a < X < b) So the result is: in calculations involving continuous random variables, we do not have to worry about whether interval endpoints are included (≥ or ≤) or excluded (> or <).
23
Endpoints Which one of the following statements is FALSE?
pr(a ≤ X ≤ b) = pr(a ≤ X < b)
pr(a ≤ X ≤ b) = pr(X ≤ b) – pr(X ≤ a)
pr(a ≤ X ≤ b) ≠ pr(a < X < b)
pr(a ≤ X ≤ b) ≠ pr(X ≤ a) – pr(X ≤ b)
pr(a ≤ X ≤ b) = pr(a < X ≤ b)
24
Heights of Male Students
Height (cm) f 150 1 167 10 178 46 189 4 156 168 14 179 22 190 12 157 169 8 180 65 191 3 158 2 170 34 181 13 192 159 171 9 182 17 193 160 6 172 183 19 195 7 161 173 23 184 16 196 162 174 31 185 26 198 163 175 38 186 199 165 176 25 187 11 200 166 177 188 207 The table contains data from a recent online survey of 544 male introductory Statistics students.
25
Heights of Male Students
[Histogram: Height (cm) from 150 to 210 against frequency.] Here is a histogram of the data. A histogram was chosen because the sample size is large. The data are unimodal, reasonably symmetrical and approximately bell-shaped. The mean was 177.1cm and the standard deviation was 7.7cm.
26
Heights of Male Students
mean – 1 sd = 177.1 – 7.7 = 169.4; mean + 1 sd = 177.1 + 7.7 = 184.8. [Frequency table of heights repeated from the earlier slide.] Let’s find the proportion of the data that lie within 1 standard deviation of the mean.
27
Heights of Male Students
mean – 1 sd = 177.1 – 7.7 = 169.4; mean + 1 sd = 177.1 + 7.7 = 184.8. [Frequency table of heights repeated from the earlier slide.] 375/544 = 69% have a height within 1 standard deviation of the mean.
28
Heights of Male Students
mean – 2 sd = 177.1 – 2 × 7.7 = 161.7; mean + 2 sd = 177.1 + 2 × 7.7 = 192.5. [Frequency table of heights repeated from the earlier slide.] Let’s also find the proportion of the data that lie within 2 standard deviations of the mean.
29
Heights of Male Students
mean – 2 sd = 177.1 – 2 × 7.7 = 161.7; mean + 2 sd = 177.1 + 2 × 7.7 = 192.5. [Frequency table of heights repeated from the earlier slide.] 515/544 = 95% have a height within 2 standard deviations of the mean.
30
Heights of Male Students
mean – 3 sd = 177.1 – 3 × 7.7 = 154.0; mean + 3 sd = 177.1 + 3 × 7.7 = 200.2. [Frequency table of heights repeated from the earlier slide.] And also the proportion of the data that lie within 3 standard deviations of the mean.
31
Heights of Male Students
mean – 3 sd = 177.1 – 3 × 7.7 = 154.0; mean + 3 sd = 177.1 + 3 × 7.7 = 200.2. [Frequency table of heights repeated from the earlier slide.] 542/544 = 99.6% have a height within 3 standard deviations of the mean.
32
Heights of Male Students
[Histogram: Height (cm) from 150 to 210 against frequency.] Now what about a description or model for the behaviour of the random variable, the height of a male?
33
Heights of Male Students
[Histogram with Normal curve: Height (cm) from 150 to 210, density scale 0.01 to 0.06.] The vertical scale has been adjusted so that the total area under the histogram is 1. The probability density function of a Normal(µ = 177.1, σ = 7.7) distribution has been superimposed on the plot. This is another opportunity to talk about the parameters for the Normal distribution, i.e. this member of the Normal distribution family is uniquely determined by the mean and the standard deviation. This Normal distribution appears to be a reasonable model for these data.
34
68 – 95 – 99.7 Rule In a Normal distribution, approximately:
68% of observations are within 1σ of µ; 95% of observations are within 2σ of µ; 99.7% of observations are within 3σ of µ. [Figures: Normal curves with area 0.68 between µ – σ and µ + σ, area 0.95 between µ – 2σ and µ + 2σ, and area 0.997 between µ – 3σ and µ + 3σ.] Now to state the proportions (or percentages) for Normal distributions. Note the close similarity between the “68 – 95 – 99.7 Rule” and the percentages from the male heights data. The “Against All Odds” Boston Beanstalks video clip could be played here.
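The rule can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as the calculator (the slides themselves use software output tables); the helper name `within` is hypothetical.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def within(k: float) -> float:
    """Probability that an observation lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {within(k):.4f}")
```

Because any Normal distribution can be rescaled to the standard Normal, the same three proportions apply whatever the values of µ and σ.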
35
Percentage Body Fat X ~ Normal (µ = 9, σ = 3)
Approximately 95% of competitive cyclists have percentage body fat somewhere between ? and ? [Figure: Normal curve centred at 9 with central area 0.95.] Let X be the percentage body fat of a competitive cyclist. The Normal(µ = 9, σ = 3) distribution has been chosen to model percentage body fat of competitive cyclists. This is the first time they have seen the X ~ Normal(µ = 9, σ = 3) notation. The parameters for the Normal distribution have been mentioned before, but there is no harm in discussing them again. Under this model there would be a small percentage of negative values for percentage body fat. Although obviously not perfect, the model may give good approximate answers to many calculations. According to the rule, how many standard deviations do we need to take on either side of the mean to cover 95% of observations? (Note: “Healthy” percentage body fat for normal people is about %.)
36
Percentage Body Fat X ~ Normal (µ = 9, σ = 3)
Approximately 95% of competitive cyclists have percentage body fat somewhere between: 1σ of µ, 2σ of µ, or 3σ of µ?
37
68 – 95 – 99.7 Rule In a Normal distribution, approximately:
68% of observations are within 1σ of µ; 95% of observations are within 2σ of µ; 99.7% of observations are within 3σ of µ. [Figures: Normal curves with area 0.68 between µ – σ and µ + σ, area 0.95 between µ – 2σ and µ + 2σ, and area 0.997 between µ – 3σ and µ + 3σ.] Under a Normal distribution, 95% of the observations are within 2 standard deviations of the mean.
38
Percentage Body Fat X ~ Normal (µ = 9, σ = 3)
Approximately 95% of competitive cyclists have percentage body fat somewhere between 3% and 15%. µ – 2 × σ = 9 – 2 × 3 = 3 and µ + 2 × σ = 9 + 2 × 3 = 15. [Figure: Normal curve centred at 9 with central area 0.95.] Let X be the percentage body fat of a competitive cyclist. The Normal(µ = 9, σ = 3) distribution has been chosen to model percentage body fat of competitive cyclists. This is the first time they have seen the X ~ Normal(µ = 9, σ = 3) notation. The parameters for the Normal distribution have been mentioned before, but there is no harm in discussing them again. Under this model there would be a small percentage of negative values for percentage body fat. Although obviously not perfect, the model may give good approximate answers to many calculations. According to the rule, 95% of observations will lie within 2 standard deviations of the mean.
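To see how small the "small percentage of negative values" is, here is a hedged sketch using Python's standard-library `statistics.NormalDist`. The variable names are illustrative assumptions; only the Normal(µ = 9, σ = 3) model comes from the slide.

```python
from statistics import NormalDist

# Model from the slide: percentage body fat of a competitive cyclist.
body_fat = NormalDist(mu=9.0, sigma=3.0)

# Interval covering 2 standard deviations either side of the mean.
low = body_fat.mean - 2 * body_fat.stdev
high = body_fat.mean + 2 * body_fat.stdev

# Exact area between the two limits (the rule rounds this to 95%).
coverage = body_fat.cdf(high) - body_fat.cdf(low)

# Proportion of (impossible) negative values implied by the model.
p_negative = body_fat.cdf(0.0)

print(f"interval: {low:.0f}% to {high:.0f}%")
print(f"area inside interval: {coverage:.4f}")
print(f"pr(X < 0): {p_negative:.5f}")
```

Zero lies 3 standard deviations below the mean, so the negative tail is the lower half of what the 99.7% part of the rule leaves out.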
39
Percentage Body Fat X ~ Normal (µ = 9, σ = 3)
For a randomly chosen competitive cyclist, there is a probability of 0.68 that the cyclist has percentage body fat between ? and ? [Figure: Normal curve centred at 9 with central area 0.68.] How many standard deviations do we need to cover 68% of the observations?
40
Percentage Body Fat X ~ Normal (µ = 9, σ = 3)
For a randomly chosen competitive cyclist, there is a probability of 0.68 that the cyclist has percentage body fat between: 1σ of µ, 2σ of µ, or 3σ of µ? How many standard deviations do we need to cover 68% of the observations?
41
68 – 95 – 99.7 Rule In a Normal distribution, approximately:
68% of observations are within 1σ of µ; 95% of observations are within 2σ of µ; 99.7% of observations are within 3σ of µ. [Figures: Normal curves with area 0.68 between µ – σ and µ + σ, area 0.95 between µ – 2σ and µ + 2σ, and area 0.997 between µ – 3σ and µ + 3σ.] Under a Normal distribution, 68% of observations will lie within 1 standard deviation of the mean.
42
Percentage Body Fat X ~ Normal (µ = 9, σ = 3)
For a randomly chosen competitive cyclist, there is a probability of 0.68 that the cyclist has percentage body fat between 6% and 12%. µ – 1 × σ = 9 – 1 × 3 = 6 and µ + 1 × σ = 9 + 1 × 3 = 12. [Figure: Normal curve centred at 9 with central area 0.68.] According to the rule, 68% of observations will lie within 1 standard deviation of the mean.
43
Annual Compound Share Returns X ~ Normal (µ = 11.0, σ = 20.3)
Less than 0% (i.e., a negative return). pr(X < 0) = 0.2940 [Output: table of x against P(X <= x). Figure: Normal curve centred at 11.0 with the lower tail below 0 shaded.] Let X be the annual compound share return for a large US company (as a percentage), X ~ Normal (µ = 11.0, σ = 20.3). Find the probability that a randomly chosen large US company has an annual compound share return of: …. Draw a sketch diagram and shade the probability (area) required. This is a lower-tail probability and so the answer appears in the output given. These 3 questions could be done using SPSS. Set up the 3 calculations as lower-tail probabilities and use SPSS once to obtain these probabilities.
44
Annual Compound Share Returns X ~ Normal (µ = 11.0, σ = 20.3)
More than 50%. pr(X > 50) = 1 – pr(X < 50), with pr(X < 50) read from the lower-tail output. [Output: table of x against P(X <= x). Figure: Normal curve centred at 11.0 with the upper tail above 50 shaded.] Sketch the diagram and shade the probability required. As the total area under a p.d.f. is 1, this upper-tail probability = 1 – lower-tail probability.
45
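Annual Compound Share Returns X ~ Normal (µ = 11.0, σ = 20.3)
Between 10% and 30%. pr(10 < X < 30) = pr(X < 30) – pr(X < 10), with both lower-tail probabilities read from the output. [Output: table of x against P(X <= x). Figure: Normal curve centred at 11.0 with the area between 10 and 30 shaded.] Shade the required probability. This can be represented as the lower-tail probability, X < 30, minus the lower-tail probability, X < 10.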
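The three share-return probabilities can be reproduced from lower-tail probabilities alone. This is a sketch only, assuming Python's standard-library `statistics.NormalDist` stands in for the SPSS output mentioned in the notes; the variable names are illustrative.

```python
from statistics import NormalDist

# Model from the slides: annual compound share return (%) for a large US company.
returns = NormalDist(mu=11.0, sigma=20.3)

# Lower tail: the cdf gives pr(X < x) directly.
p_negative = returns.cdf(0.0)

# Upper tail: total area is 1, so subtract the lower tail from 1.
p_above_50 = 1.0 - returns.cdf(50.0)

# Interval: difference of two lower-tail probabilities.
p_10_to_30 = returns.cdf(30.0) - returns.cdf(10.0)

print(f"pr(X < 0)       = {p_negative:.4f}")
print(f"pr(X > 50)      = {p_above_50:.4f}")
print(f"pr(10 < X < 30) = {p_10_to_30:.4f}")
```

The first value should agree with the 0.2940 quoted on the slide; the other two follow the same subtraction patterns described in the notes.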
46
Problem: Diagnosing Spina Bifida
X ~ Normal (µ_X = 15.7, σ_X = 0.7) Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) [Figure: p.d.f.s of X and Y against Concentration (µM/L), centred at 15.7 and 23.1, with the threshold T = 17.8 marked.] Let’s now revisit the spina bifida example. Use the distributions of X and/or Y to talk about the shape of the Normal distribution (unimodal, bell-shaped and symmetrical about the mean) and the parameters (and the effect each parameter has on the shape of the curve). The p.d.f. property that the total area under the curve is 1 explains why the maximum height changes as the standard deviation changes. Discuss the setting of a threshold for the test, with values below the threshold providing a “healthy” diagnosis and values above providing a “spina bifida” diagnosis, in which case the mother is sent for further tests. Discuss the advantages and disadvantages of different threshold values. The value indicated is 17.8 micromoles per litre. We’ll now answer 2 of our earlier questions about this screening test.
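The link between "total area is 1" and the height of the curve can be illustrated numerically. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper `area_under_pdf` and its midpoint-rule integration over mean ± 8 standard deviations are illustrative choices, not part of the slides.

```python
from statistics import NormalDist

# The two models from the slide.
x_model = NormalDist(mu=15.7, sigma=0.7)   # foetus without spina bifida
y_model = NormalDist(mu=23.1, sigma=4.1)   # foetus with spina bifida


def area_under_pdf(dist: NormalDist, n_steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the area under the p.d.f. over mean +/- 8 sd."""
    lo = dist.mean - 8 * dist.stdev
    width = 16 * dist.stdev / n_steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(n_steps)) * width


# Maximum height of each curve occurs at its mean.
peak_x = x_model.pdf(x_model.mean)
peak_y = y_model.pdf(y_model.mean)

print(f"area under X: {area_under_pdf(x_model):.6f}, peak height {peak_x:.4f}")
print(f"area under Y: {area_under_pdf(y_model):.6f}, peak height {peak_y:.4f}")
```

Both areas come out at essentially 1, so the curve with the larger standard deviation has to be lower and wider, which is why Y looks flatter than X.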
47
Spina Bifida X ~ Normal (µ_X = 15.7, σ_X = 0.7)
Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) For a foetus without spina bifida, what is the probability that it is correctly diagnosed? The random variable of interest here is: X, Y, or I don’t know? Notice the subscripts used on the means and standard deviations to distinguish between the two distributions. For a foetus without spina bifida we need to use the distribution of X.
48
Spina Bifida X ~ Normal (µ_X = 15.7, σ_X = 0.7)
Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) For a foetus without spina bifida, what is the probability that it is correctly diagnosed? [Figure: p.d.f. of X centred at 15.7 with the threshold 17.8 marked.] The foetus will be correctly diagnosed if the concentration is below the threshold of 17.8.
49
Spina Bifida X ~ Normal (µ_X = 15.7, σ_X = 0.7)
Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) For a foetus without spina bifida, what is the probability that it is correctly diagnosed? [Figure: p.d.f. of X centred at 15.7 with the threshold 17.8 marked.] This question can be expressed as: What is pr(X < 17.8)? What is pr(X > 17.8)? I don’t know. The foetus will be correctly diagnosed if the concentration is below the threshold of 17.8.
50
Spina Bifida X ~ Normal (µ_X = 15.7, σ_X = 0.7)
Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) For a foetus without spina bifida, what is the probability that it is correctly diagnosed? The answer is pr(X < 17.8), read from the lower-tail output for Normal(µ = 15.7, σ = 0.7). [Figure: p.d.f. of X centred at 15.7 with the area below 17.8 shaded. Output: tables of x against P(X <= x) for Normal(µ = 15.7, σ = 0.7) and Normal(µ = 23.1, σ = 4.1).] For a foetus without spina bifida we need to use the distribution of X. The foetus will be correctly diagnosed if the concentration is below the threshold of 17.8.
51
Spina Bifida X ~ Normal (µ_X = 15.7, σ_X = 0.7)
Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) What is the probability that a foetus with spina bifida is correctly diagnosed? The random variable of interest here is: X, Y, or I don’t know? For a foetus with spina bifida we need to use the distribution of Y.
52
Spina Bifida X ~ Normal (µ_X = 15.7, σ_X = 0.7)
Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) What is the probability that a foetus with spina bifida is correctly diagnosed? [Figure: p.d.f. of Y centred at 23.1 with the threshold 17.8 marked.] The foetus will be correctly diagnosed if the concentration is above the threshold of 17.8.
53
Spina Bifida X ~ Normal (µ_X = 15.7, σ_X = 0.7)
Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) What is the probability that a foetus with spina bifida is correctly diagnosed? [Figure: p.d.f. of Y centred at 23.1 with the threshold 17.8 marked.] This question can be expressed as: What is pr(Y < 17.8)? What is pr(Y > 17.8)? I don’t know. The foetus will be correctly diagnosed if the concentration is above the threshold of 17.8.
54
Spina Bifida X ~ Normal (µ_X = 15.7, σ_X = 0.7)
Y ~ Normal (µ_Y = 23.1, σ_Y = 4.1) What is the probability that a foetus with spina bifida is correctly diagnosed? pr(Y > 17.8) = 1 – pr(Y < 17.8), with pr(Y < 17.8) read from the lower-tail output for Normal(µ = 23.1, σ = 4.1). [Figure: p.d.f. of Y centred at 23.1 with the area above 17.8 shaded. Output: tables of x against P(X <= x) for Normal(µ = 15.7, σ = 0.7) and Normal(µ = 23.1, σ = 4.1).] For a foetus with spina bifida we need to use the distribution of Y. The foetus will be correctly diagnosed if the concentration is above the threshold of 17.8. This upper-tail probability will be calculated using 1 minus the lower-tail probability.
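The screening questions, including the threshold question posed at the start of the chapter, can be worked through in a few lines. This is a minimal sketch assuming Python's standard-library `statistics.NormalDist` in place of the software output; variable names such as `threshold_for_99` are hypothetical.

```python
from statistics import NormalDist

# Models from the slides for alpha fetoprotein concentration (µM/L).
without_sb = NormalDist(mu=15.7, sigma=0.7)   # X: foetus without spina bifida
with_sb = NormalDist(mu=23.1, sigma=4.1)      # Y: foetus with spina bifida

threshold = 17.8  # value of T indicated on the slide

# Without SB: correct diagnosis means concentration below T (lower tail of X).
p_correct_without = without_sb.cdf(threshold)

# With SB: correct diagnosis means concentration above T (upper tail of Y).
p_correct_with = 1.0 - with_sb.cdf(threshold)

# To diagnose 99% of SB cases correctly we need pr(Y > T) = 0.99,
# i.e. pr(Y < T) = 0.01, which is an inverse Normal problem.
threshold_for_99 = with_sb.inv_cdf(0.01)

print(f"pr(X < 17.8) = {p_correct_without:.4f}")
print(f"pr(Y > 17.8) = {p_correct_with:.4f}")
print(f"T for 99% detection = {threshold_for_99:.2f}")
```

Lowering T to catch 99% of spina bifida cases moves it below the mean of X, so many healthy foetuses would then be sent for further tests. That is the trade-off the notes ask you to discuss.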
55
Annual Compound Share Returns X ~ Normal (µ = 11.0, σ = 20.3)
Less than 0% (i.e., a negative return). pr(X < 0) = 0.2940 [Output: table of x against P(X <= x). Figure: Normal curve centred at 11.0 with the lower tail below 0 shaded.] Here is the problem that we started this section with. In the examples so far, we have been given an x-value (or 2 x-values) and have found a probability (or a proportion/percentage), i.e., found an area. (Note: don’t confuse the units of the variable and the percentage in the graph!)
56
Inverse Normal Problems
IQ Scores: X ~ Normal (µ = 100, σ = 15) Find the IQ score that the bottom 20% of the population fall below (20th percentile). pr(X < x) = 0.2 [Figure: Normal curve centred at 100 with lower-tail area 0.2 to the left of x.] In the next examples we will be given a probability/proportion/percentage (i.e., an area) and have to find an x-value. This is called an inverse Normal problem. Draw a diagram and shade an appropriate area for the given probability. The use of the word “bottom” indicates that the probability is a lower-tail one, and the 20% means that the x-value is below the mean.
57
IQ Scores X ~ Normal (µ = 100, σ = 15)
P(X <= x) x The output gives x-values for lower-tail probabilities so for a lower-tail probability of 0.2 the x-value is
58
IQ Scores X ~ Normal (µ = 100, σ = 15)
Find the IQ score that the bottom 20% of the population fall below (20th percentile). pr(X < x) = 0.2, so x = 87. [Figure: Normal curve centred at 100 with lower-tail area 0.2 to the left of x.] For IQs it is sensible to round to the nearest whole number.
59
IQ Scores X ~ Normal (µ = 100, σ = 15)
What IQ score is exceeded by only the top 10% of the population? pr(X > x) = 0.1, so pr(X < x) = 0.9. [Figure: Normal curve centred at 100 with upper-tail area 0.1 to the right of x and area 0.9 to the left.] Draw a diagram and shade an appropriate area for the given probability. The word “top” indicates an upper-tail probability and the 10% means the x-value is above the mean. The upper-tail area of 0.1 means the lower-tail area is 0.9.
60
IQ Scores X ~ Normal (µ = 100, σ = 15)
P(X <= x) x For a lower-tail probability of 0.9 the x-value is
61
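IQ Scores X ~ Normal (µ = 100, σ = 15)
What IQ score is exceeded by only the top 10% of the population? pr(X > x) = 0.1, so pr(X < x) = 0.9 and x = 119. [Figure: Normal curve centred at 100 with upper-tail area 0.1 to the right of x and area 0.9 to the left.] Again we’ll round to the nearest whole number.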
62
IQ Scores X ~ Normal (µ = 100, σ = 15)
Find the interquartile range for IQ scores. pr(X ≤ a) = 0.25 [Figure: Normal curve centred at 100 with lower-tail area 0.25 to the left of a.] Recall that the interquartile range is the difference between the upper and lower quartiles. Let’s first find the lower quartile. As 25% of observations lie below the lower quartile, the lower-tail area is 0.25.
63
IQ Scores X ~ Normal (µ = 100, σ = 15)
P(X <= x) x For a lower-tail probability of 0.25 the x-value is
64
IQ Scores X ~ Normal (µ = 100, σ = 15)
Find the interquartile range for IQ scores. pr(X ≤ a) = 0.25, so a = 89.88. [Figure: Normal curve centred at 100 with lower-tail area 0.25 to the left of a.] We won’t do any rounding until we have done the subtraction of the two quartiles.
65
IQ Scores X ~ Normal (µ = 100, σ = 15)
Find the interquartile range for IQ scores. pr(X ≤ a) = 0.25, so a = 89.88. pr(X ≤ b) = 0.75 [Figure: Normal curve centred at 100 with area 0.25 below a and area 0.25 above b.] 25% of observations lie above the upper quartile, so 75% lie below it. The lower-tail area is 0.75.
66
IQ Scores X ~ Normal (µ = 100, σ = 15)
P(X <= x) x For a lower-tail probability of 0.75 the x-value is
67
IQ Scores X ~ Normal (µ = 100, σ = 15)
Find the interquartile range for IQ scores. pr(X ≤ a) = 0.25, so a = 89.88. pr(X ≤ b) = 0.75, so b = 110.12. IQR = 110.12 – 89.88 = 20.24, which is 20 to the nearest whole number. [Figure: Normal curve centred at 100 with area 0.25 below a and area 0.25 above b.]
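The inverse Normal lookups for the IQ examples can be sketched in code. The example assumes Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method returns the x-value for a given lower-tail probability, the same direction as the output tables on the slides; the variable names are illustrative.

```python
from statistics import NormalDist

# Model from the slides: IQ scores.
iq = NormalDist(mu=100.0, sigma=15.0)

# 20th percentile: lower-tail area 0.2.
bottom_20 = iq.inv_cdf(0.20)

# Exceeded by the top 10%: upper-tail 0.1 means lower-tail 0.9.
top_10 = iq.inv_cdf(0.90)

# Quartiles: lower-tail areas 0.25 and 0.75; round only after subtracting.
lower_quartile = iq.inv_cdf(0.25)
upper_quartile = iq.inv_cdf(0.75)
iqr = upper_quartile - lower_quartile

print(f"20th percentile: {bottom_20:.2f} (rounds to {round(bottom_20)})")
print(f"90th percentile: {top_10:.2f} (rounds to {round(top_10)})")
print(f"quartiles: {lower_quartile:.2f} and {upper_quartile:.2f}, IQR {iqr:.2f}")
```

Working from unrounded quartiles can differ in the second decimal place from subtracting the two-decimal values shown on the slide; either way the answer is 20 to the nearest whole number.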
68
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the lowest 10% are below. [Figure: Normal curve centred at 62 800 with candidate positions A and B marked either side of the mean.] Another inverse Normal problem, because we are given a probability and have been asked for an x-value. As usual, draw a diagram.
69
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the lowest 10% are below. Which is the correct position of x: A, B, or Unsure? [Figure: Normal curve centred at 62 800 with candidate positions A and B marked either side of the mean.] The word “lowest” indicates a lower-tail probability. With a lower-tail probability of 0.1, the x-value is less than the mean.
70
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the lowest 10% are below. [Figure: Normal curve centred at 62 800 with x marked below the mean.] Another inverse Normal problem, because we are given a probability and have been asked for an x-value. As usual, draw a diagram. The word “lowest” indicates a lower-tail probability. With a lower-tail probability of 0.1, the x-value is less than the mean.
71
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the lowest 10% are below. pr(B < x) = 0.1 [Figure: Normal curve centred at 62 800 with lower-tail area 0.1 to the left of x.] Another inverse Normal problem, because we are given a probability and have been asked for an x-value. As usual, draw a diagram. The word “lowest” indicates a lower-tail probability. With a lower-tail probability of 0.1, the x-value is less than the mean.
72
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
P(X <= x) x For a lower-tail probability of 0.1 the x-value is
73
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the lowest 10% are below. pr(B < x) = 0.1, so x = $35 887. [Figure: Normal curve centred at 62 800 with lower-tail area 0.1 to the left of x.] You may want to discuss appropriate rounding if this answer were to be reported in context.
74
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the range of the central 60% of salaries. pr(B < a) = 0.2 [Figure: Normal curve centred at 62 800 with central area 0.6 between a and b and area 0.2 in each tail.] Shade the central 60% and label the lower limit as a and the upper limit as b. With a central area of 0.6 the lower and upper tail areas are each 0.2. This gives a lower-tail area of 0.2 for a.
75
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
P(X <= x) x For a lower-tail probability of 0.2 the x-value is
76
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the range of the central 60% of salaries. pr(B < a) = 0.2, so a = $45 126. pr(B < b) = 0.8 [Figure: Normal curve centred at 62 800 with central area 0.6 between a and b and area 0.2 in each tail.] For b there is a lower-tail area of 0.8.
77
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
P(X <= x) x For a lower-tail probability of 0.8 the x-value is
78
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the range of the central 60% of salaries. pr(B < a) = 0.2, so a = $45 126. pr(B < b) = 0.8, so b = $80 474. Range is $45 126 to $80 474. [Figure: Normal curve centred at 62 800 with central area 0.6 between a and b and area 0.2 in each tail.] Again, appropriate rounding could be discussed.
79
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the highest 75% are above. [Figure: Normal curve centred at 62 800 with candidate positions A and B marked either side of the mean.]
80
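Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the highest 75% are above. Which is the correct position of x: A, B, or Unsure? [Figure: Normal curve centred at 62 800 with candidate positions A and B marked either side of the mean.]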
81
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the highest 75% are above. [Figure: Normal curve centred at 62 800 with x marked below the mean.] A diagram is again necessary. “Highest” indicates an upper-tail area of 0.75. The corresponding lower-tail area is 0.25.
82
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the highest 75% are above. pr(B < x) = 0.25 [Figure: Normal curve centred at 62 800 with area 0.25 below x and area 0.75 above x.] A diagram is again necessary. “Highest” indicates an upper-tail area of 0.75. The corresponding lower-tail area is 0.25.
83
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
P(X <= x) x For a lower-tail probability of 0.25 the x-value is
84
Male Business Graduate Salary B ~ Normal (µ = 62 800, σ = 21 000)
Find the salary that the highest 75% are above. pr(B < x) = 0.25, so x = $48 636. [Figure: Normal curve centred at 62 800 with area 0.25 below x and area 0.75 above x.] Rounding, in context, could be discussed.
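Each salary question reduces to converting the wording into a lower-tail area and then doing one inverse lookup. Below is a hedged sketch, assuming Python's standard-library `statistics.NormalDist` and illustrative variable names.

```python
from statistics import NormalDist

# Model from the slides: male business graduate salary ($).
salary = NormalDist(mu=62_800.0, sigma=21_000.0)

# "Lowest 10% are below": lower-tail area 0.1.
lowest_10 = salary.inv_cdf(0.10)

# "Central 60%": 0.2 in each tail, so lower-tail areas 0.2 and 0.8.
central_low = salary.inv_cdf(0.20)
central_high = salary.inv_cdf(0.80)

# "Highest 75% are above": upper-tail 0.75 means lower-tail 0.25.
highest_75 = salary.inv_cdf(0.25)

print(f"lowest 10% below:  ${lowest_10:,.0f}")
print(f"central 60% range: ${central_low:,.0f} to ${central_high:,.0f}")
print(f"highest 75% above: ${highest_75:,.0f}")
```

The two limits of the central range are symmetric about the mean, which gives a quick sanity check: their average should be 62 800.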
85
Test & Exam Results
Student A: Test 19/25 = 76%; Exam 40/50 = 80%. Did Student A do better in the test or the exam? In a recent semester, a student got 19/25 = 76% in the mid-semester test and 40/50 = 80% in the final exam. Did she do better in the test or the exam? She got 80% for the exam and 76% for the test.
86
Test & Exam Results
Student A: Test 19/25; Exam 40/50. Class results: Test mean 13.5, std dev 4.6; Exam mean 30.1, std dev 9.4. Did Student A do better in the test or the exam, in terms of place in the class? When compared with other students, in which assessment did the student do better? Let’s see the summary statistics for the test and exam. Assume that the results are Normally distributed.
87
Test & Exam Results
Test: (19 – 13.5)/4.6 = 1.20 [Figure: test distribution (mean 13.5) and exam distribution (mean 30.1) with the marks 19 and 40, above a common scale from -3 to 3.] We want to compare the 19 questions correct in the test with the 40 questions correct in the exam, but they are from two different distributions. This is like trying to compare apples and oranges. We need to put both of these results on the same scale. Let’s see how far each of these results is from the mean of its distribution, in terms of standard deviations. They will then both be on the same scale. Why is this scale effectively drawn from -3 to 3?
88
Test & Exam Results
Exam: (40 – 30.1)/9.4 = 1.05 [Figure: test distribution (mean 13.5) and exam distribution (mean 30.1) with the marks 19 and 40, above a common scale from -3 to 3.] On this scale you can see that the test result is further above the mean than the exam result. The test is the better mark relative to the rest of the marks. We are finding the distance (i.e. difference) between a result X and its mean µ, i.e., X – µ. How many standard deviations is this? (X – µ)/σ
89
Test & Exam Results: Z ~ Normal (0, 1)
[Figure: the test z-score 1.20 and the exam z-score 1.05 marked on the standard scale from -3 to 3.] We are forming a new distribution: we end up with a Normal distribution which has a mean of 0 and a standard deviation of 1. This is called the standard Normal distribution, and its random variable is usually labelled Z.
90
Test and Examination Results
Let T be the test mark and E the exam mark. T ~ Normal (µ_T = 13.5, σ_T = 4.6) E ~ Normal (µ_E = 30.1, σ_E = 9.4) Test: z-score for 19 = (19 – 13.5)/4.6 = 1.20. Exam: z-score for 40 = (40 – 30.1)/9.4 = 1.05. This student did better in the test in terms of ranking with students in the same course. Let’s return to our workbooks and record this. We need to calculate the z-score for each mark. The test mark is 1.2 standard deviations above the mean test mark but the exam mark is 1.05 standard deviations above the mean exam mark. Therefore the student did better in the test when compared with students in the same course.
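The comparison of the two marks can be sketched as a small calculation. The function name `z_score` is a hypothetical helper; the means, standard deviations and marks are the ones given on the slides.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Student A's marks, standardised against the class results.
test_z = z_score(19, mean=13.5, sd=4.6)
exam_z = z_score(40, mean=30.1, sd=9.4)

print(f"test z-score: {test_z:.2f}")
print(f"exam z-score: {exam_z:.2f}")
print("better relative result:", "test" if test_z > exam_z else "exam")
```

Standardising puts both marks on the same scale, so the larger z-score identifies the better result relative to the class even though the raw percentage was higher in the exam.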
91
Percentage Body Fat X ~ Normal (µ = 10, σ = 4)
Find the z–score for these percentage body fat values for competitive swimmers: x = 18: z = 2; 18 is 2 sd above the mean. x = 6: z = -1; 6 is 1 sd below the mean. x = 12.6: z = ?; 12.6 is ? sd above the mean. Now let’s make sure that we’ve got this z-score idea well and truly understood. The distribution of percentage body fat for competitive swimmers is modelled by a Normal(µ = 10, σ = 4) distribution. 18 is 8 above the mean, that is 2 standard deviations above the mean. So the z-score is 2. 6 is 4 below the mean, that is 1 standard deviation below the mean. So the z-score is -1. 12.6 is above the mean, so the z-score will be positive. The exact z-score is a bit difficult so we need a formula to help us.
92
Working in Standard Units
The z-score for an observation x: z = (x – µ)/σ. First, the mean is subtracted and then the result is divided by the standard deviation. Note that this formula is not on the formula sheet, but you should be able to understand this calculation from the z-score definition.
93
Percentage Body Fat X ~ Normal (µ = 10, σ = 4)
Find the z–score for these percentage body fat values for competitive swimmers: x = 15.7: z = (15.7 – 10)/4 = 1.425. x = 8.9: z = (8.9 – 10)/4 = -0.275. These are 2 opportunities to use the formula. So 15.7 is 1.425 standard deviations above the mean of 10 and 8.9 is 0.275 standard deviations below the mean of 10.
94
Percentage Body Fat X ~ Normal (µ = 10, σ = 4)
Calculate the percentage body fat values for competitive swimmers with the following z-scores: z = 2.3: x is 2.3 sd above the mean, so x = 10 + 2.3 × 4 = 19.2. z = -1.8: x is 1.8 sd below the mean, so x = 10 – 1.8 × 4 = 2.8. In (f), x is 2.3 standard deviations above the mean, and so x = 10 + 2.3 × 4. In (g), x is 1.8 standard deviations below the mean, and so x = 10 – 1.8 × 4.
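Converting between raw values and standard units works in both directions. The sketch below uses two hypothetical helpers, `to_z` and `from_z`, with the swimmers' Normal(µ = 10, σ = 4) model from the slides hard-coded as constants.

```python
MU = 10.0     # mean percentage body fat for competitive swimmers
SIGMA = 4.0   # standard deviation


def to_z(x: float) -> float:
    """Standardise: subtract the mean, then divide by the standard deviation."""
    return (x - MU) / SIGMA


def from_z(z: float) -> float:
    """Undo the standardisation: multiply by the standard deviation, then add the mean."""
    return MU + z * SIGMA


for x in (18, 6, 12.6, 15.7, 8.9):
    print(f"x = {x}: z = {to_z(x):.3f}")

for z in (2.3, -1.8):
    print(f"z = {z}: x = {from_z(z):.1f}")
```

`from_z` is simply the z-score formula rearranged for x, so applying `to_z` and then `from_z` returns the original value.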