Data Mining 2018/2019 Fall MIS 331 Chapter 7-A Sampling Distribution, Confidence Interval Estimation and Hypothesis Testing for Variance of a Population Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall
Outline
Sampling Distribution of Sample Variances
Confidence Interval Estimation for the Variance
Tests of the Variance of a Normal Distribution
Sampling Distributions of Sample Variances (Section 6.4)
Sampling distributions covered: sample means, sample proportions, sample variances
Sample Variance
Let $x_1, x_2, \ldots, x_n$ be a random sample from a population. The sample variance is
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
The square root of the sample variance is called the sample standard deviation.
The sample variance is different for different random samples from the same population.
Sampling Distribution of Sample Variances
The sampling distribution of $s^2$ has mean $\sigma^2$: $E[s^2] = \sigma^2$
If the population distribution is normal, then $\mathrm{Var}[s^2] = \frac{2\sigma^4}{n-1}$
Chi-Square Distribution of Sample and Population Variances
If the population distribution is normal, then
$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$
has a chi-square ($\chi^2$) distribution with n – 1 degrees of freedom.
The Chi-square Distribution
The chi-square distribution is a family of distributions, depending on degrees of freedom: d.f. = n – 1
Text Appendix Table 7 contains chi-square probabilities
[Figure: chi-square density curves in three panels, d.f. = 1, d.f. = 5 and d.f. = 15; horizontal axis $\chi^2$ from 0 to 28]
Defined
The chi-square distribution is defined as
$\chi^2_v = \sum_{i=1}^{v} Z_i^2$
the sum of squares of v independent standard normal random variables, $Z_i \sim N(0,1)$
The expected value of a chi-square distribution with v degrees of freedom is v: $E[\chi^2_v] = v$
The variance of a chi-square distribution with v degrees of freedom is 2v: $\mathrm{Var}[\chi^2_v] = 2v$
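Since $(n-1)s^2/\sigma^2$ has a chi-square distribution with n – 1 degrees of freedom:
$E[(n-1)s^2/\sigma^2] = n-1$
$((n-1)/\sigma^2)\,E[s^2] = n-1$
$E[s^2] = \sigma^2$, so $s^2$ is an unbiased estimator of the population variance
Similarly:
$\mathrm{Var}[(n-1)s^2/\sigma^2] = 2(n-1)$
$((n-1)^2/\sigma^4)\,\mathrm{Var}[s^2] = 2(n-1)$
$\mathrm{Var}[s^2] = 2\sigma^4/(n-1)$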
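The two results for the mean and variance of the sample variance can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the sample size, population mean, population standard deviation, seed and number of replications are all hypothetical choices, and the helper name `sample_variance` is introduced here only for illustration.

```python
import random

random.seed(331)  # fixed seed so the run is repeatable (arbitrary choice)

n = 5                  # sample size (hypothetical)
mu = 10.0              # population mean (hypothetical)
sigma = 2.0            # population standard deviation (hypothetical)
replications = 20000   # number of simulated samples (hypothetical)


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


# Draw many normal samples of size n and record s^2 for each one.
s2_values = [
    sample_variance([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(replications)
]

# Empirical mean and variance of the simulated s^2 values.
mean_s2 = sum(s2_values) / replications
var_s2 = sample_variance(s2_values)

print("average of s^2 :", round(mean_s2, 3), " theory:", sigma ** 2)
print("variance of s^2:", round(var_s2, 3), " theory:", 2 * sigma ** 4 / (n - 1))
```

With these settings the theoretical values are 4 for the mean and 8 for the variance; the simulated figures should land close to them, with the gap shrinking as the number of replications grows.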
Examples of Squares of Distributions
a: side of a square plate, normally distributed with a given mean and standard deviation
Area of the plate: $A = a^2$, the square of a normally distributed variable
Discrete Distribution Example
X takes the values -2, -1, 0, 1, 2 with equal probabilities of 1/5 (a discrete uniform distribution).
What is the probability distribution of $X^2$?
$X^2$ can take the values 0, 1, 4 with p(0) = 1/5, p(1) = 2/5, p(4) = 2/5
The distribution of $X^2$ is not symmetric; it is skewed.
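The distribution of the square can be obtained mechanically by adding up the probabilities of all values of X that map to the same square. A short sketch using exact fractions (the variable names are illustrative only):

```python
from collections import defaultdict
from fractions import Fraction

# Discrete uniform distribution of X on -2, -1, 0, 1, 2
x_pmf = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

# Accumulate probability mass on each possible value of X squared.
x_squared_pmf = defaultdict(Fraction)
for x, p in x_pmf.items():
    x_squared_pmf[x * x] += p

for value in sorted(x_squared_pmf):
    print(value, x_squared_pmf[value])
```

The symmetric values -1, 1 and -2, 2 collapse onto 1 and 4, which is why the squared variable is no longer symmetric.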
$E[(x_i-\mu)^2] = \sigma^2$ (definition of variance)
$E[\sum_{i=1}^{n}(x_i-\mu)^2] = n\sigma^2$ (expected value of a sum of n independent, identically distributed (iid) random variables)
or $E[\sum_{i=1}^{n}(x_i-\mu)^2]/n = \sigma^2$: an unbiased estimator of the population variance when the population mean is known
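If $\mu$ is known (and the $x_i$ are normally distributed):
$(x_i-\mu)^2/\sigma^2 = z_i^2 \sim \chi^2_1$ by definition of the chi-square distribution, where $z_i = (x_i-\mu)/\sigma$
$\sum_{i=1}^{n}(x_i-\mu)^2/\sigma^2 = (1/\sigma^2)\sum_{i=1}^{n}(x_i-\mu)^2 \sim \chi^2_n$ by definition of chi-square, as each term in the summation is the square of a standard normal variable
$E[\chi^2_n] = n$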
If $\mu$ is not known, estimate it by $\bar{x}$:
$E[\sum_{i=1}^{n}(x_i-\bar{x})^2] = (n-1)\sigma^2$
This is shown in the Appendix of Chapter 6 of Newbold 8 and holds independently of the distribution of $X_i$.
The sum of n quantities on the left contributes only $(n-1)\sigma^2$ when the mean of the distribution is estimated by $\bar{x}$.
Exercise
Show that, with n = 2, $E[\sum_{i=1}^{2}(x_i-\bar{x})^2] = \sigma^2$, where $\bar{x} = (x_1+x_2)/2$
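One possible route through the exercise is sketched below. It assumes, as elsewhere in these slides, that $x_1$ and $x_2$ are independent with common mean $\mu$ and variance $\sigma^2$; it is a worked outline rather than the textbook's own solution.

```latex
\begin{align*}
x_1 - \bar{x} &= \tfrac{1}{2}(x_1 - x_2), \qquad
x_2 - \bar{x} = -\tfrac{1}{2}(x_1 - x_2) \\
\sum_{i=1}^{2} (x_i - \bar{x})^2 &= \tfrac{1}{4}(x_1 - x_2)^2 + \tfrac{1}{4}(x_1 - x_2)^2
  = \tfrac{1}{2}(x_1 - x_2)^2 \\
E\!\left[(x_1 - x_2)^2\right] &= \mathrm{Var}(x_1 - x_2) + \left(E[x_1 - x_2]\right)^2
  = 2\sigma^2 + 0 \\
E\!\left[\sum_{i=1}^{2} (x_i - \bar{x})^2\right] &= \tfrac{1}{2}\cdot 2\sigma^2 = \sigma^2
\end{align*}
```

No normality assumption is used, which agrees with the claim that the result does not depend on the distribution of the observations.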
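If $\mu$ is not known, estimate it by $\bar{x}$. For normally distributed $X_i$ (stated without proof):
$\sum_{i=1}^{n}(x_i-\bar{x})^2/\sigma^2 \sim \chi^2_{n-1}$
Taking expected values of both sides:
$E[\sum_{i=1}^{n}(x_i-\bar{x})^2/\sigma^2] = E[\chi^2_{n-1}] = n-1$
$E[\sum_{i=1}^{n}(x_i-\bar{x})^2] = (n-1)\sigma^2$
Dividing by n – 1:
$E[\sum_{i=1}^{n}(x_i-\bar{x})^2/(n-1)] = \sigma^2$ (unbiased)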
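$\sum_{i=1}^{n}(x_i-\bar{x})^2/\sigma^2 \sim \chi^2_{n-1}$
Dividing by n – 1 and multiplying by $\sigma^2$:
$\sum_{i=1}^{n}(x_i-\bar{x})^2/(n-1) = \sigma^2\chi^2_{n-1}/(n-1)$
$s^2 = \sigma^2\chi^2_{n-1}/(n-1)$, or $\chi^2_{n-1} = (n-1)s^2/\sigma^2$
That is, n – 1 times the sample variance divided by the population variance is distributed as chi-square with n – 1 degrees of freedom.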
Degrees of Freedom (df)
Idea: Number of observations that are free to vary after the sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary)
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary for a given mean)
Table 7 in Appendix
d.f. versus probabilities for critical values
$P(\chi^2_{10} < K_L) = 0.05$: $K_L = 3.940$, hence $P(\chi^2_{10} < 3.940) = 0.05$
$P(\chi^2_{10} > K_U) = 0.05$: $K_U = 18.31$, hence $P(\chi^2_{10} > 18.31) = 0.05$
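Table values such as 3.940 and 18.31 can also be reproduced numerically. The sketch below uses only the Python standard library: it evaluates the chi-square cumulative probability through the usual power series for the regularized lower incomplete gamma function and then finds the critical value by bisection. The function names `chi2_cdf` and `chi2_upper_critical` are made up for this illustration; in practice a statistics library would normally be used instead.

```python
import math


def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x).

    Uses the series for the regularized lower incomplete gamma
    function P(a, t) with a = df / 2 and t = x / 2.
    """
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a          # first term of the series
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= t / (a + k)  # next term: previous * t / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))


def chi2_upper_critical(upper_tail_prob, df):
    """Value K with P(chi-square_df > K) = upper_tail_prob (bisection)."""
    lo, hi = 0.0, 10.0 * df + 100.0   # generous bracket for moderate df
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > upper_tail_prob:
            lo = mid   # tail area still too large: K lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


# Lower and upper 5% points for 10 degrees of freedom.
k_lower = chi2_upper_critical(0.95, 10)   # P(chi2 < K_L) = 0.05
k_upper = chi2_upper_critical(0.05, 10)   # P(chi2 > K_U) = 0.05
print(round(k_lower, 3), round(k_upper, 2))
```

The lower point is requested with upper-tail probability 0.95 because an area of 0.05 to the left of K_L leaves 0.95 to its right.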
Chi-square Example
A commercial freezer must hold a selected temperature with little variation. Specifications call for a standard deviation of no more than 4 degrees (a variance of 16 degrees²).
A sample of 14 freezers is to be tested.
What is the upper limit (K) for the sample variance such that the probability of exceeding this limit, given that the population standard deviation is 4, is less than 0.05?
Finding the Chi-square Value
$\chi^2_{n-1} = (n-1)s^2/\sigma^2$ is chi-square distributed with (n – 1) = 13 degrees of freedom
Use the chi-square distribution with area 0.05 in the upper tail:
$\chi^2_{13} = 22.36$ (α = .05 and 14 – 1 = 13 d.f.)
[Figure: chi-square density with upper-tail area α = .05 to the right of $\chi^2_{13} = 22.36$]
Chi-square Example (continued)
$\chi^2_{13} = 22.36$ (α = .05 and 14 – 1 = 13 d.f.)
So: $P(s^2 > K) = P\left(\frac{(n-1)s^2}{16} > \frac{(n-1)K}{16}\right) = 0.05$
or $\frac{(n-1)K}{16} = 22.36$ (where n = 14)
so $K = \frac{(22.36)(16)}{14-1} = 27.52$
If $s^2$ from the sample of size n = 14 is greater than 27.52, there is strong evidence to suggest the population variance exceeds 16.
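The arithmetic behind the limit 27.52 is a single rearrangement of the chi-square statistic. A small sketch, taking the table value 22.36 as given in the slides (the function name is hypothetical):

```python
def sample_variance_upper_limit(chi2_critical, sigma_squared, n):
    """Solve (n - 1) * K / sigma^2 = chi2_critical for K."""
    return chi2_critical * sigma_squared / (n - 1)


# Freezer example: 14 freezers, population variance 16, table value 22.36
K = sample_variance_upper_limit(chi2_critical=22.36, sigma_squared=16, n=14)
print(round(K, 2))  # 27.52
```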
Confidence Interval Estimation for the Variance (Section 7.5)
Confidence intervals covered: population mean ($\sigma^2$ known, $\sigma^2$ unknown), population proportion, population variance (from a normally distributed population)
Confidence Intervals for the Population Variance
Goal: Form a confidence interval for the population variance, $\sigma^2$
The confidence interval is based on the sample variance, $s^2$
Assumed: the population is normally distributed
Confidence Intervals for the Population Variance (continued)
The random variable
$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$
follows a chi-square distribution with (n – 1) degrees of freedom,
where the chi-square value $\chi^2_{n-1,\alpha}$ denotes the number for which
$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha$
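$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}) = \alpha/2$
$P(\chi^2_{n-1} > \chi^2_{n-1,1-\alpha/2}) = 1-\alpha/2$, or $P(\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}) = \alpha/2$
Finally, $P(\chi^2_{n-1,1-\alpha/2} < \chi^2_{n-1} < \chi^2_{n-1,\alpha/2}) = 1 - \alpha/2 - \alpha/2 = 1-\alpha$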
Example
Find two numbers such that the probability that a chi-square variable with 6 d.f. lies between them is 0.90
$1-\alpha = 0.90$
$P(\chi^2_{6,0.95} < \chi^2_6 < \chi^2_{6,0.05}) = 0.90$
The two numbers are $\chi^2_{6,0.95} = 1.635$ and $\chi^2_{6,0.05} = 12.592$
hence $P(1.635 < \chi^2_6 < 12.592) = 0.90$
Confidence Intervals for the Population Variance (continued)
The $100(1-\alpha)\%$ confidence interval for the population variance is given by
$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$
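Derivation
$P(\chi^2_{n-1,1-\alpha/2} < \chi^2_{n-1} < \chi^2_{n-1,\alpha/2}) = 1-\alpha$
$\chi^2_{n-1} = (n-1)s^2/\sigma^2$. Substituting for $\chi^2_{n-1}$:
$P(\chi^2_{n-1,1-\alpha/2} < (n-1)s^2/\sigma^2 < \chi^2_{n-1,\alpha/2}) = 1-\alpha$
Rearranging:
$P(\chi^2_{n-1,1-\alpha/2}/((n-1)s^2) < 1/\sigma^2 < \chi^2_{n-1,\alpha/2}/((n-1)s^2)) = 1-\alpha$
$P((n-1)s^2/\chi^2_{n-1,\alpha/2} < \sigma^2 < (n-1)s^2/\chi^2_{n-1,1-\alpha/2}) = 1-\alpha$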
Example
You are testing the speed of a batch of computer processors. You collect the following data (in MHz):
Sample size: 17
Sample mean: 3004
Sample std dev: 74
Assume the population is normal. Determine the 95% confidence interval for $\sigma_x^2$
Finding the Chi-square Values
n = 17, so the chi-square distribution has (n – 1) = 16 degrees of freedom
α = 0.05, so use the chi-square values with area 0.025 in each tail:
$\chi^2_{16,0.975} = 6.91$ and $\chi^2_{16,0.025} = 28.85$
[Figure: chi-square density with 16 d.f., area α/2 = .025 in each tail, cut-offs at 6.91 and 28.85]
Calculating the Confidence Limits
The 95% confidence interval is
$\frac{(17-1)(74)^2}{28.85} < \sigma^2 < \frac{(17-1)(74)^2}{6.91}$
$3037 < \sigma^2 < 12680$ (approximately)
Converting to standard deviation, we are 95% confident that the population standard deviation of CPU speed is between 55.1 and 112.6 MHz
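The confidence limits can be recomputed from the summary figures of the processor example. The sketch below takes the two chi-square table values quoted in the slides (6.91 and 28.85) as inputs instead of computing them; the function name and argument names are illustrative assumptions.

```python
import math


def variance_confidence_interval(n, s, chi2_upper, chi2_lower):
    """Confidence interval for the population variance.

    chi2_upper: table value with area alpha/2 to its right
    chi2_lower: table value with area alpha/2 to its left
    """
    sum_of_squares = (n - 1) * s ** 2
    # The larger chi-square value gives the lower limit and vice versa.
    return sum_of_squares / chi2_upper, sum_of_squares / chi2_lower


# Processor example: n = 17, s = 74 MHz, 16 d.f., 95% confidence
var_low, var_high = variance_confidence_interval(17, 74, 28.85, 6.91)
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print("variance interval:", round(var_low), "to", round(var_high))
print("std dev interval :", round(sd_low, 1), "to", round(sd_high, 1))
```

The interval is not symmetric around $s^2 = 5476$, because the chi-square distribution itself is skewed.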
Tests of the Variance of a Normal Distribution (Section 9.6)
Goal: Test hypotheses about the population variance, $\sigma^2$ (e.g., $H_0: \sigma^2 = \sigma_0^2$)
If the population is normally distributed,
$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$
has a chi-square distribution with (n – 1) degrees of freedom
Tests of the Variance of a Normal Distribution (continued)
The test statistic for hypothesis tests about one population variance is
$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma_0^2}$
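Decision Rules: Variance
Population variance
Lower-tail test: $H_0: \sigma^2 \ge \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$. Reject $H_0$ if $\chi^2_{n-1} < \chi^2_{n-1,1-\alpha}$ (area α in the lower tail)
Upper-tail test: $H_0: \sigma^2 \le \sigma_0^2$, $H_1: \sigma^2 > \sigma_0^2$. Reject $H_0$ if $\chi^2_{n-1} > \chi^2_{n-1,\alpha}$ (area α in the upper tail)
Two-tail test: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$. Reject $H_0$ if $\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}$ or $\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}$ (area α/2 in each tail)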
Newbold 9.47
Test the hypothesis $H_0: \sigma^2 \le 100$ against $H_1: \sigma^2 > 100$
a) $s^2 = 165$, n = 25
b) $s^2 = 165$, n = 29
c) $s^2 = 159$, n = 25
d) $s^2 = 67$, n = 38
Solution
Solution
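The slides leave this solution blank, so the following is only a sketch of how the four upper-tail tests could be carried out. The exercise as quoted gives no significance level; α = 0.05 is assumed here, and the critical values are standard chi-square table entries for that assumed level.

```python
# Cases from the exercise: (label, sample variance s^2, sample size n)
cases = [("a", 165, 25), ("b", 165, 29), ("c", 159, 25), ("d", 67, 38)]
sigma0_squared = 100  # hypothesized variance under H0

# ASSUMPTION: alpha = 0.05 (not stated in the exercise text).
# Upper-tail chi-square table values for area 0.05, keyed by d.f.
critical_values = {24: 36.415, 28: 41.337, 37: 52.192}

results = {}
for label, s_squared, n in cases:
    df = n - 1
    statistic = df * s_squared / sigma0_squared   # (n - 1) s^2 / sigma0^2
    reject = statistic > critical_values[df]      # upper-tail decision rule
    results[label] = (statistic, reject)
    decision = "reject H0" if reject else "do not reject H0"
    print(f"{label}) chi-square = {statistic:.2f}, d.f. = {df}: {decision}")
```

In case d) the sample variance is already below the hypothesized value of 100, so no upper-tail test could reject the null hypothesis there.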
Newbold 7.48
A new safety device: a random sample over 8 days gives
618 660 638 625 571 598 639 582
Management is concerned about variability. Test the null hypothesis that the population variance is less than or equal to 500 at a significance level of 10%.
Solution
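The slides leave this solution blank; one way to work it through is sketched below. The sample variance is computed from the eight observations, and the upper-tail critical value 12.017 is the standard chi-square table entry for 7 degrees of freedom and area 0.10. As in the rest of the chapter, a normally distributed population is assumed.

```python
import statistics

# Observations for the 8 sampled days
observations = [618, 660, 638, 625, 571, 598, 639, 582]
sigma0_squared = 500      # H0: population variance <= 500
critical_value = 12.017   # chi-square table value, 7 d.f., upper-tail area 0.10

n = len(observations)
s_squared = statistics.variance(observations)   # sample variance, n - 1 divisor
test_statistic = (n - 1) * s_squared / sigma0_squared
reject_h0 = test_statistic > critical_value     # upper-tail decision rule

print("sample variance:", round(s_squared, 2))
print("test statistic :", round(test_statistic, 2))
print("reject H0 at the 10% level:", reject_h0)
```

The test statistic comes out at about 13.08, which exceeds 12.017, so under these assumptions the null hypothesis is rejected at the 10% level.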