Problem: Diagnosing Spina Bifida The procedure of amniocentesis involves drawing a sample of the amniotic fluid that surrounds an unborn child in its mother’s womb. A high concentration of alpha fetoprotein can indicate the condition spina bifida. The concentration of alpha fetoprotein tends to increase with the size of the foetus. Amniocentesis results in miscarriage in 1% of cases. Preliminary tests involve measuring the level of alpha fetoprotein in the mother’s urine.
Problem: Diagnosing Spina Bifida For mothers with normal foetuses, the mean level of alpha fetoprotein is 15.73 moles/litre with a standard deviation of 0.72 moles/litre. For mothers carrying foetuses with spina bifida, the mean is 23.05 and the standard deviation is 4.08. In both groups the level of alpha fetoprotein appears to be approximately Normally distributed. [Figure: the two Normal curves, centred at 15.73 and 23.05.]
Problem: Diagnosing Spina Bifida To operate a diagnostic test for spina bifida, set a threshold concentration of alpha fetoprotein, T, say. If the alpha fetoprotein level is below T, then the foetus is diagnosed as not having spina bifida. If the level is above T, then further testing is required.
Problem: Diagnosing Spina Bifida Suppose T is set at 17.80 moles/litre:
What is the probability that a foetus with spina bifida is correctly diagnosed?
What is the probability that a foetus not suffering from spina bifida is correctly diagnosed?
If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level should they set T?
What are the implications of setting T at this level?
Chapter 6 Continuous Random Variables If a random variable, X, can take any value in some interval of the real line, it is called a continuous random variable. E.g. Hg levels, height, weight, alpha fetoprotein concentration, cell radius, etc. (i.e. usually measurements).
The Standardized Histogram (§6.1, pages 231-233) Example: Dietary Carbohydrate in the Workforce. The data are the average daily intakes of carbohydrate in the diets of 5929 people.
The Standardized Histogram [Figure: histogram of carbohydrate intake (g/day).] The histogram of the data shows that carbohydrate intake:
is unimodal (modal class 200–225 g/day);
is skewed to larger values (skewed right);
has huge variability (the highest consumers take in more than 10 times as much as the lowest consumers).
The Standardized Histogram The standardized histogram sets the height of each rectangle (bar) to its relative frequency (proportion) divided by the class width, so that Area = Estimated Probability; i.e. the area of the ith rectangle tells us what proportion of the data lie in the ith class interval. [Figure: standardized histogram of carbohydrate intake with the area between a = 225 and b = 375 g/day shaded.] Shaded area = 0.483 (corresponds to 48.3% of observations).
The Standardized Histogram For a standardized histogram:
1. The vertical scale is relative frequency / interval width (the density scale).
2. The total area under the histogram = 1.
3. The proportion of the data between a and b is the area under the histogram between a and b.
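The three properties above can be checked on a small example. The sketch below uses made-up class boundaries and counts (they are not the carbohydrate data, which are not reproduced in these slides), and the helper name `area_between` is hypothetical.

```python
# Hypothetical class boundaries and counts, invented purely for illustration.
edges = [0, 100, 200, 300, 400, 600]   # class boundaries
counts = [5, 30, 40, 20, 5]            # number of observations in each class

n = sum(counts)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Density-scale height of each bar: relative frequency / interval width.
heights = [c / n / w for c, w in zip(counts, widths)]

# Area of each bar = height * width = proportion of the data in that class.
total_area = sum(h * w for h, w in zip(heights, widths))


def area_between(a, b):
    """Area under the standardized histogram between a and b."""
    total = 0.0
    for lo, hi, h in zip(edges, edges[1:], heights):
        overlap = max(0.0, min(b, hi) - max(a, lo))  # part of this bar inside [a, b]
        total += h * overlap
    return total


print(total_area)              # total area under the histogram
print(area_between(200, 400))  # estimated proportion of data between 200 and 400
```

Note that the last class is twice as wide as the others, which is exactly the situation where dividing by the interval width matters.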
The Standardized Histogram With approximating curve. [Figure: standardized histogram of carbohydrate intake (g/day) with an approximating smooth curve; the area under the curve between a = 225 and b = 375 is shaded.] The shaded area under the curve is calculated to be 0.486 (cf. 0.483 for the histogram), which is very close to the proportion of people who had a carbohydrate intake of between 225 and 375 g/day.
Radius of Malignant Tumor Cells In JMP, select Histogram Options > Density Axis to create a standardized histogram. [Figure: standardized histogram of cell radii from fine needle aspirations of malignant tumors in the breast cancer study from your 2nd assignment.] X = radius of a randomly selected malignant tumor cell. We estimate that P(14 < X < 15) = 0.10, i.e. a 10% chance.
AFP Levels in Spina Bifida Cases In JMP, select Histogram Options > Density Axis to create a standardized histogram. [Figure: standardized histogram of AFP levels found in the urine of mothers carrying a fetus with spina bifida.] X = AFP level of a randomly selected mother carrying a fetus with spina bifida. We estimate that P(22.5 < X < 25) = 2.5 × 0.10 = 0.25 (class width × bar height), i.e. a 25% chance.
Smooth Density Curves Take a standardized histogram, decrease the width of the class intervals and increase the number of observations. Then the top of the histogram tends to a smooth curve.
Histogram → Density Curve as sample size increases (AFP levels)
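A small simulation can illustrate this convergence. The sketch below assumes, as these slides do later, that AFP levels in the spina bifida group are Normal with mean 23.05 and standard deviation 4.08. It draws simulated values (not real data) and compares the sample proportion in the interval 22.5 to 25 with the area under the density curve. The function name `sample_proportion` and the seed are arbitrary choices.

```python
import random
from statistics import NormalDist

afp = NormalDist(mu=23.05, sigma=4.08)  # assumed model for the spina bifida group
exact = afp.cdf(25) - afp.cdf(22.5)     # area under the density curve from 22.5 to 25

rng = random.Random(0)                  # fixed seed so the run is repeatable


def sample_proportion(n):
    """Proportion of n simulated AFP levels that fall between 22.5 and 25."""
    draws = (rng.gauss(afp.mean, afp.stdev) for _ in range(n))
    return sum(22.5 < x < 25 for x in draws) / n


for n in (100, 10_000, 100_000):
    print(n, round(sample_proportion(n), 4), round(exact, 4))
```

As n grows, the sample proportion should settle near the area under the curve, which is the sense in which the histogram "tends to" the density.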
Properties of the Probability Density Function (p.d.f.)
1. f(x) ≥ 0 (i.e. the p.d.f. curve never goes below the x-axis).
2. P(a ≤ X ≤ b) = area from a to b beneath the p.d.f. curve.
3. Total area under the p.d.f. curve = 1.
Endpoints of Intervals For a continuous random variable, X, endpoints of intervals are unimportant. P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b) = area from a to b between the p.d.f. curve and the x-axis. (Inclusion or exclusion of the endpoints will not change the area.)
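These properties can be checked numerically for one particular p.d.f. The sketch below uses a Normal density with the spina bifida parameters from these slides and approximates areas with a simple midpoint rule. The helper `area` and the choice of 20,000 strips are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist

dist = NormalDist(mu=23.05, sigma=4.08)  # any continuous distribution would do


def area(a, b, steps=20_000):
    """Midpoint-rule approximation to the area under the p.d.f. from a to b."""
    h = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * h) for i in range(steps)) * h


# Property 3: the total area is 1 (10 standard deviations each side covers
# essentially all of the curve).
total = area(dist.mean - 10 * dist.stdev, dist.mean + 10 * dist.stdev)

# Property 2: P(19 <= X <= 25) is the area from 19 to 25, which matches the
# difference of cumulative probabilities.
middle = area(19, 25)
print(total, middle, dist.cdf(25) - dist.cdf(19))
```

Because a single point has zero width, it contributes zero area, which is why including or excluding an endpoint leaves the probability unchanged.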
The Normal Distribution The limiting smooth, bell-shaped, symmetric curve is called the Normal p.d.f. curve. It is symmetric about the mean, so Mean = Median (50% of the area lies on each side of the mean μ). If a random variable, X, has a Normal distribution with mean μ and standard deviation σ, we write X ~ Normal(μ, σ). μ and σ are the parameters of the distribution.
The Normal Distribution The Normal distribution is important because: it fits a lot of data reasonably well; it can be used to approximate other distributions; it is important in statistical inference (see later work).
The Normal Distribution A Normal distribution is solely determined by μ and σ.
(a) Changing μ shifts the curve along the axis.
(b) Increasing σ increases the spread and flattens the curve.
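The two effects can be seen by evaluating the p.d.f. for a few parameter choices. The values μ = 0, 3 and σ = 1, 2 below are arbitrary illustrations, not taken from the slides.

```python
from statistics import NormalDist

base = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=3, sigma=1)  # same spread, different mean
wider = NormalDist(mu=0, sigma=2)    # same mean, larger standard deviation

# Height of each curve at its own mean (the peak of the bell).
peak_base = base.pdf(base.mean)
peak_shifted = shifted.pdf(shifted.mean)
peak_wider = wider.pdf(wider.mean)

print(peak_base, peak_shifted, peak_wider)

# The shifted curve is the base curve moved 3 units along the axis.
print(base.pdf(1.0), shifted.pdf(4.0))
```

Changing the mean leaves the peak height unchanged. Doubling σ halves the peak, since the total area must remain 1 while the curve spreads out.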
Spina Bifida Example Let X be the AFP level found in the urine of a mother carrying a foetus with spina bifida. We will assume that the AFP level is Normally distributed with a mean of μ = 23.05 mmoles/L and a standard deviation of σ = 4.08 mmoles/L. [Figure: Normal curve of AFP levels for mothers carrying a spina bifida foetus.]
Spina Bifida Example (Empirical Rule) Approximately 68% of mothers in this population will have AFP levels within 1 standard deviation of the mean, i.e. between 23.05 - 4.08 and 23.05 + 4.08, or between 18.97 and 27.13.
Spina Bifida Example (Empirical Rule) Approximately 95% of mothers in this population will have AFP levels within 2 standard deviations of the mean, i.e. between 23.05 - 2 × 4.08 and 23.05 + 2 × 4.08, or between 14.89 and 31.21.
Spina Bifida Example (Empirical Rule) Approximately 99.73% of mothers in this population will have AFP levels within 3 standard deviations of the mean, i.e. between 23.05 - 3 × 4.08 and 23.05 + 3 × 4.08, or between 10.81 and 35.29.
The Normal Distribution For the Normal distribution, a random observation has approximately:
a 68% chance of falling within 1σ of μ;
a 95% chance of falling within 2σ of μ;
a 99.7% chance of falling within 3σ of μ.
Or: in a Normal distribution, approximately 68% of observations are within 1σ of μ, 95% are within 2σ of μ, and 99.7% are within 3σ of μ.
The Normal Distribution Probabilities and numbers of standard deviations. [Figure: three Normal curves with the central area shaded.]
Shaded area = 0.683: 68% chance of falling between μ - σ and μ + σ.
Shaded area = 0.954: 95% chance of falling between μ - 2σ and μ + 2σ.
Shaded area = 0.997: 99.7% chance of falling between μ - 3σ and μ + 3σ.
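The percentages in the empirical rule can be reproduced from the standard Normal cumulative distribution. The sketch below also prints the corresponding AFP intervals for the spina bifida group, using the mean and standard deviation given in these slides. The dictionary name `within` is an arbitrary choice.

```python
from statistics import NormalDist

z = NormalDist()                        # standard Normal: mean 0, sd 1
afp = NormalDist(mu=23.05, sigma=4.08)  # spina bifida group

within = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean.
    within[k] = z.cdf(k) - z.cdf(-k)
    lo = afp.mean - k * afp.stdev
    hi = afp.mean + k * afp.stdev
    print(f"within {k} sd: {within[k]:.4f}  (AFP between {lo:.2f} and {hi:.2f})")
```

The same probabilities apply to any Normal distribution, because "within k standard deviations" is defined relative to that distribution's own μ and σ.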
Problem: Diagnosing Spina Bifida Recall that the level of alpha fetoprotein is approximately Normally distributed in both groups: for mothers with normal foetuses the mean is 15.73 and the standard deviation is 0.72, and for mothers carrying foetuses with spina bifida the mean is 23.05 and the standard deviation is 4.08. Given this information we want to be able to find probabilities associated with these distributions. For example, we might like to find P(X > 17.8) or P(19 < X < 25) for either group.
Obtaining Probabilities Normal distribution probabilities can be obtained from all statistical packages by giving the mean and standard deviation of the distribution. Most tables give the value of P(X ≤ x), i.e. cumulative or lower tail probabilities. [Figure: Normal curve with the area to the left of x shaded; area = P(X ≤ x).]
Obtaining Probabilities Basic method for obtaining probabilities:
1. Sketch a Normal curve, marking on it the mean and the values of interest.
2. Shade the area under the curve corresponding to the required probability.
3. Convert all values to their z-scores.
4. Obtain the desired probability using the Normal table inside the front cover of your text, or better yet use JMP.
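Steps 3 and 4 can be sketched in code, with Python's `statistics.NormalDist` standing in for the Normal table or JMP (a substitution made here for illustration, not the tool used in the course). The example computes P(19 < X < 25) for the spina bifida group twice: once via z-scores and the standard Normal, and once directly from the Normal(23.05, 4.08) distribution.

```python
from statistics import NormalDist

mu, sigma = 23.05, 4.08  # spina bifida group
a, b = 19, 25            # values of interest

# Step 3: convert the values to z-scores, z = (x - mu) / sigma.
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma

# Step 4: look up lower-tail probabilities for the standard Normal and subtract.
std = NormalDist()       # mean 0, sd 1
via_z = std.cdf(z_b) - std.cdf(z_a)

# A package can also work with mu and sigma directly; the answer is the same.
direct = NormalDist(mu, sigma)
via_direct = direct.cdf(b) - direct.cdf(a)

print(round(z_a, 2), round(z_b, 2), round(via_z, 4), round(via_direct, 4))
```

Table look-ups round z to two decimal places, so a hand calculation may differ slightly in the third or fourth decimal.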
Standard Normal Distribution GO TO NOTES ON STANDARD NORMAL DISTRIBUTION
Original problem: Diagnosing Spina Bifida Recall: for normal foetuses μ = 15.73 and σ = 0.72, and for foetuses with spina bifida μ = 23.05 and σ = 4.08. Assume the threshold for detecting spina bifida is set at T = 17.8. (A foetus would be diagnosed as not having spina bifida if the fetoprotein level is below 17.8.)
Original problem: Diagnosing Spina Bifida a) What is the probability that a foetus not suffering from spina bifida is correctly diagnosed? Let X be the level of fetoprotein for a normal foetus, X ~ Normal(15.73, 0.72). We need P(X < 17.8) = P(Z < z-score for 17.8). z-score = (17.8 - 15.73)/0.72 = 2.07/0.72 = 2.88, so P(X < 17.8) = P(Z < 2.88) = 0.9980.
Original problem: Diagnosing Spina Bifida b) What is the probability that a foetus with spina bifida is correctly diagnosed? Let Y be the level of fetoprotein for a spina bifida foetus, Y ~ Normal(23.05, 4.08). We need P(Y > 17.8) = P(Z > z-score for Y = 17.8). z-score = (17.8 - 23.05)/4.08 = -5.25/4.08 = -1.29, so P(Y > 17.8) = P(Z > -1.29) = 1 - P(Z < -1.29) = 1 - 0.099 = 0.901.
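Parts (a) and (b) can be recomputed directly as a check on the table look-ups. This is a minimal sketch using Python's `statistics.NormalDist` in place of the Normal table or JMP. The variable names are arbitrary.

```python
from statistics import NormalDist

T = 17.8                                   # threshold from the problem
normal = NormalDist(mu=15.73, sigma=0.72)  # mothers with normal foetuses
sb = NormalDist(mu=23.05, sigma=4.08)      # mothers carrying spina bifida foetuses

# z-scores of the threshold within each group.
z_normal = (T - normal.mean) / normal.stdev
z_sb = (T - sb.mean) / sb.stdev

# (a) A normal foetus is correctly diagnosed when its level is below T.
p_normal_correct = normal.cdf(T)

# (b) A spina bifida foetus is correctly diagnosed when its level is above T.
p_sb_correct = 1 - sb.cdf(T)

print(round(z_normal, 2), round(p_normal_correct, 4))
print(round(z_sb, 2), round(p_sb_correct, 4))
```

Small differences from the slide values are expected because the table method rounds the z-score to two decimal places.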
Original problem: Diagnosing Spina Bifida If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level should they set T? Find a value T so that, if Y ~ Normal(23.05, 4.08), we will have P(Y > T) = 0.9900, or equivalently P(Y < T) = 0.0100. First find the z-score associated with T by finding z so that P(Z < z) = 0.0100. From the Normal table we find P(Z < -2.33) = 0.0100, thus T = μ + σ × z = 23.05 - 4.08 × 2.33 = 13.54. Setting T = 13.54 ensures that 99% of foetuses with spina bifida will be identified. This probability is called the sensitivity.
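The threshold can also be found with an inverse cumulative distribution function, and the same model suggests what such a low threshold would mean for normal foetuses. In the sketch below, `inv_cdf` plays the role of reading the Normal table backwards. The name `specificity` (the probability that a normal foetus falls below T) is terminology introduced here for illustration and does not appear in the slides.

```python
from statistics import NormalDist

normal = NormalDist(mu=15.73, sigma=0.72)  # mothers with normal foetuses
sb = NormalDist(mu=23.05, sigma=4.08)      # mothers carrying spina bifida foetuses

# Threshold with P(Y < T) = 0.01, i.e. 99% of spina bifida cases lie above T.
T = sb.inv_cdf(0.01)
sensitivity = 1 - sb.cdf(T)

# Proportion of normal foetuses correctly cleared (level below T) at this T.
specificity = normal.cdf(T)

print(round(T, 2), round(sensitivity, 4), round(specificity, 4))
```

The threshold comes out slightly above 13.54 because the table method rounds z to -2.33. Under these assumed distributions, almost all normal foetuses have levels above this threshold, so nearly every mother would be referred for further testing. That is one implication to weigh against the 1% miscarriage risk of amniocentesis mentioned at the start.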
Standard Normal Probabilities in JMP The file Normal Probability Calculator.JMP is available from the Tutorials section of the course website. Here it is ready to calculate probabilities for the standard Normal distribution (μ = 0, σ = 1).
Arbitrary Normal Probabilities in JMP Change the mean and standard deviation columns to contain the desired values. For mothers carrying a foetus with spina bifida: X ~ N(23.05, 4.08), i.e. μ = 23.05 mmoles/liter and σ = 4.08 mmoles/liter. Here we have found P(X < 17.8).