Data Mining 2018/2019 Fall MIS 331 Chapter 7-A Sampling Distribution,


1 Data Mining 2018/2019 Fall MIS 331 Chapter 7-A Sampling Distribution,
Confidence Interval Estimation and Hypothesis Testing for Variance of a Population Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall

2 Outline
Sampling Distribution of Sample Variances
Confidence Interval Estimation for the Variance
Tests of the Variance of a Normal Distribution

3 Sampling Distributions of Sample Variances
6.4 Sampling Distributions: Sampling Distributions of Sample Means; Sampling Distributions of Sample Proportions; Sampling Distributions of Sample Variances

4 Sample Variance
Let $x_1, x_2, \ldots, x_n$ be a random sample from a population. The sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.
The square root of the sample variance is called the sample standard deviation.
The sample variance is different for different random samples from the same population.
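As a small illustration of the definition on this slide, the sketch below computes the sample variance with the n – 1 divisor, $s^2 = \sum (x_i - \bar{x})^2/(n-1)$, and compares it with Python's `statistics.variance`. The data values are hypothetical and chosen only for the example.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [7.0, 8.0, 9.0, 12.0]   # hypothetical observations
s2 = sample_variance(data)      # sample variance
s = math.sqrt(s2)               # sample standard deviation

print("s^2 =", s2)
print("s   =", s)
print("statistics.variance agrees:", math.isclose(s2, statistics.variance(data)))
```

A different random sample from the same population would generally give a different value of `s2`, which is why the sample variance has a sampling distribution of its own.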

5 Sampling Distribution of Sample Variances
The sampling distribution of $s^2$ has mean $\sigma^2$: $E[s^2] = \sigma^2$.
If the population distribution is normal, then $\mathrm{Var}(s^2) = \frac{2\sigma^4}{n-1}$.

6 Chi-Square Distribution of Sample and Population Variances
If the population distribution is normal, then $\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$ has a chi-square ($\chi^2$) distribution with n – 1 degrees of freedom.

7 The Chi-square Distribution
The chi-square distribution is a family of distributions, depending on degrees of freedom: d.f. = n – 1. Text Appendix Table 7 contains chi-square probabilities.
[Figure: chi-square density curves for d.f. = 1, d.f. = 5, and d.f. = 15]

8 Definition of the Chi-square Distribution
The chi-square distribution is defined as $\chi^2_v = \sum_{i=1}^{v} Z_i^2$:
the sum of squares of $v$ independent standard normal random variables, $Z_i \sim N(0,1)$.

9 Expected value of a chi-square distribution with $v$ degrees of freedom is $v$
$E[\chi^2_v] = v$
Variance of a chi-square distribution with $v$ degrees of freedom is $2v$: $\mathrm{Var}[\chi^2_v] = 2v$

10 Since (n-1)s2/2 has a chi-square distribution with df: n-1
E[(n-1)s2/2] = n-1 ((n-1)/2)E[s2] = n-1 E[s2] = 2, unbiesd estimation of popultion variance Similarly Var[(n-1)s2/2] = 2(n-1) ((n-1)2/4)Var[s2] = 2(n-1) Var[s2] = 24/(n-1)

11 Examples of Squares of Distributions
a: side of a square plate, normally distributed with a given mean and standard deviation. Area of the plate: $A = a^2$, the square of a normally distributed variable.

12 Discrete Distribution Example
X takes the values -2, -1, 0, 1, 2 with equal probabilities of 1/5 (a discrete uniform distribution). What is the probability distribution of $X^2$? $X^2$ can take the values 0, 1, 4, with p(0) = 1/5, p(1) = 2/5, p(4) = 2/5. The distribution of $X^2$ is not symmetric; it is skewed.
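The distribution of $X^2$ in this example can be derived mechanically by adding up the probabilities of all X values that share the same square. A short sketch using exact fractions (the variable names are illustrative only):

```python
from collections import defaultdict
from fractions import Fraction

# Discrete uniform distribution of X on {-2, -1, 0, 1, 2}.
x_pmf = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

# Accumulate probability mass on each value of X squared.
acc = defaultdict(Fraction)
for x, p in x_pmf.items():
    acc[x * x] += p
x2_pmf = dict(sorted(acc.items()))

for value, prob in x2_pmf.items():
    print(f"P(X^2 = {value}) = {prob}")

# Expected value of X^2, which equals Var(X) here because E[X] = 0.
mean_x2 = sum(value * prob for value, prob in x2_pmf.items())
print("E[X^2] =", mean_x2)
```

The values ±1 both map to 1 and ±2 both map to 4, which is why those two outcomes carry probability 2/5 each while 0 keeps 1/5.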

13 $E[(x_i-\mu)^2] = \sigma^2$: definition of variance
$E[\sum_{i=1}^{n}(x_i-\mu)^2] = n\sigma^2$: expected value of the sum for $n$ independent, identically distributed (iid) random variables,
or $E[\sum_{i=1}^{n}(x_i-\mu)^2]/n = \sigma^2$: an unbiased estimator of the population variance when the population mean is known

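14 If $\mu$ is known (and the $x_i$ are normally distributed): $(x_i-\mu)^2/\sigma^2 = z_i^2 \sim \chi^2_1$ by definition of the chi-square distribution, where $z_i = (x_i-\mu)/\sigma$.
$\sum_{i=1}^{n}(x_i-\mu)^2/\sigma^2 = (1/\sigma^2)\sum_{i=1}^{n}(x_i-\mu)^2 \sim \chi^2_n$ by definition of chi-square, as each term in the summation is the square of a standard normal variable.
$E[\chi^2_n] = n$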

15 If $\mu$ is not known, estimate it by $\bar{x}$
$E[\sum_{i=1}^{n}(x_i-\bar{x})^2] = (n-1)\sigma^2$, as shown in the Appendix of Chapter 6 of Newbold 8. This holds independently of the distribution of $X_i$. The sum of $n$ quantities on the left yields only $(n-1)\sigma^2$ when the mean of the distribution is estimated by $\bar{x}$.

16 Exercise
Show that, with n = 2, $E[\sum_{i=1}^{2}(x_i-\bar{x})^2] = \sigma^2$,
where $\bar{x} = (x_1+x_2)/2$
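One possible way to work the exercise is sketched below. It assumes, as elsewhere in the deck, that $x_1$ and $x_2$ are independent with a common mean $\mu$ and common variance $\sigma^2$; no normality is needed.

```latex
\begin{align*}
x_1 - \bar{x} &= \tfrac{1}{2}(x_1 - x_2), \qquad
x_2 - \bar{x} = -\tfrac{1}{2}(x_1 - x_2) \\
\sum_{i=1}^{2}(x_i - \bar{x})^2
  &= \tfrac{1}{4}(x_1 - x_2)^2 + \tfrac{1}{4}(x_1 - x_2)^2
   = \tfrac{1}{2}(x_1 - x_2)^2 \\
E\big[(x_1 - x_2)^2\big]
  &= \mathrm{Var}(x_1 - x_2) + \big(E[x_1 - x_2]\big)^2
   = 2\sigma^2 + 0 \\
E\Big[\sum_{i=1}^{2}(x_i - \bar{x})^2\Big]
  &= \tfrac{1}{2}\cdot 2\sigma^2 = \sigma^2
\end{align*}
```

This matches the general result $(n-1)\sigma^2$ with $n = 2$: two squared deviations from $\bar{x}$ carry only one $\sigma^2$ worth of expected variation.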

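17 If $\mu$ is not known, estimate it by $\bar{x}$
For normally distributed $X_i$, $\sum_{i=1}^{n}(x_i-\bar{x})^2/\sigma^2 \sim \chi^2_{n-1}$ (stated without proof).
Taking expected values of both sides: $E[\sum_{i=1}^{n}(x_i-\bar{x})^2/\sigma^2] = E[\chi^2_{n-1}] = n-1$, so $E[\sum_{i=1}^{n}(x_i-\bar{x})^2] = (n-1)\sigma^2$.
Dividing by $n-1$: $E[\sum_{i=1}^{n}(x_i-\bar{x})^2/(n-1)] = \sigma^2$, so the estimator is unbiased.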

18 $\sum_{i=1}^{n}(x_i-\bar{x})^2/\sigma^2 = \chi^2_{n-1}$
Dividing by $n-1$ and multiplying by $\sigma^2$: $\sum_{i=1}^{n}(x_i-\bar{x})^2/(n-1) = \sigma^2\chi^2_{n-1}/(n-1)$, that is, $s^2 = \sigma^2\chi^2_{n-1}/(n-1)$, or $\chi^2_{n-1} = (n-1)s^2/\sigma^2$.
$(n-1)$ times the sample variance over the population variance is distributed as chi-square with $n-1$ degrees of freedom.

19 Degrees of Freedom (df)
Idea: the number of observations that are free to vary after the sample mean has been calculated.
Example: Suppose the mean of 3 numbers is 8.0. Let X1 = 7 and X2 = 8. What is X3? If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary).
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2 (2 values can be any numbers, but the third is not free to vary for a given mean).

20 Table 7 in Appendix
d.f. versus probabilities for critical values.
$P(\chi^2_{10} < K_L) = 0.05$: $K_L = 3.940$, hence $P(\chi^2_{10} < 3.940) = 0.05$.
$P(\chi^2_{10} > K_U) = 0.05$: $K_U = 18.31$, hence $P(\chi^2_{10} > 18.31) = 0.05$.
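The table lookups on this slide can be reproduced numerically. The sketch below is one possible standard-library approach and not the method used in the text: it evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function and inverts it by bisection. The function names `chi2_cdf` and `chi2_upper_critical` are made up for this example.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x).

    Uses the series for the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_upper_critical(alpha, df):
    """Value K with P(chi-square_df > K) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:   # expand until K is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

k_lower = chi2_upper_critical(0.95, 10)   # 5% of the area lies below this
k_upper = chi2_upper_critical(0.05, 10)   # 5% of the area lies above this
print("K_L =", round(k_lower, 3), " K_U =", round(k_upper, 2))
```

With 10 degrees of freedom the two results should agree with the tabulated 3.940 and 18.31 up to rounding.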

21 Chi-square Example
A commercial freezer must hold a selected temperature with little variation. Specifications call for a standard deviation of no more than 4 degrees (a variance of 16 degrees²). A sample of 14 freezers is to be tested. What is the upper limit (K) for the sample variance such that the probability of exceeding this limit, given that the population standard deviation is 4, is less than 0.05?

22 Finding the Chi-square Value
$\frac{(n-1)s^2}{\sigma^2}$ is chi-square distributed with (n – 1) = 13 degrees of freedom. Use the chi-square distribution with area 0.05 in the upper tail: $\chi^2_{13} = 22.36$ (α = .05 and 14 – 1 = 13 d.f.).
[Figure: chi-square density with upper-tail probability α = .05 to the right of $\chi^2_{13} = 22.36$]

23 Chi-square Example (continued)
$\chi^2_{13} = 22.36$ (α = .05 and 14 – 1 = 13 d.f.)
So: $P(s^2 > K) = P\left(\frac{(n-1)s^2}{16} > \frac{(n-1)K}{16}\right) = 0.05$, or $\frac{(n-1)K}{16} = 22.36$ (where n = 14), so $K = \frac{(22.36)(16)}{14-1} = 27.52$.
If $s^2$ from the sample of size n = 14 is greater than 27.52, there is strong evidence to suggest the population variance exceeds 16.
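The arithmetic of the freezer example can be written out in a few lines. This sketch takes the critical value 22.36 from the slide rather than computing it, and the helper `exceeds_limit` and the sample variance passed to it are hypothetical illustrations.

```python
# Values taken from the example slides.
chi2_13_upper_05 = 22.36   # chi-square value, 13 d.f., upper-tail area 0.05
sigma2 = 16.0              # population variance under the specification
n = 14                     # number of freezers sampled

# Solve (n - 1) * K / sigma2 = chi-square critical value for K.
K = chi2_13_upper_05 * sigma2 / (n - 1)
print("upper limit K for the sample variance:", round(K, 2))

def exceeds_limit(s2):
    """True if an observed sample variance is above the limit K."""
    return s2 > K

print(exceeds_limit(30.0))   # hypothetical observed sample variance
```

A sample variance above K would occur with probability 0.05 if the population variance were exactly 16, which is the sense in which the slide calls it strong evidence.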

24 Confidence Interval Estimation for the Variance
7.5 Confidence Intervals: Population Mean (σ² Known, σ² Unknown); Population Proportion; Population Variance (from a normally distributed population)

25 Confidence Intervals for the Population Variance
Goal: Form a confidence interval for the population variance, σ². The confidence interval is based on the sample variance, s². Assumed: the population is normally distributed.

26 Confidence Intervals for the Population Variance
(continued) The random variable $\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$ follows a chi-square distribution with (n – 1) degrees of freedom, where the chi-square value $\chi^2_{n-1,\alpha}$ denotes the number for which $P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha$.

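27 $P(\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}) = \alpha/2$
$P(\chi^2_{n-1} > \chi^2_{n-1,1-\alpha/2}) = 1 - \alpha/2$, or $P(\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}) = \alpha/2$
Finally, $P(\chi^2_{n-1,1-\alpha/2} < \chi^2_{n-1} < \chi^2_{n-1,\alpha/2}) = 1 - \alpha/2 - \alpha/2 = 1 - \alpha$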

28 Example
Find two numbers such that the probability that a chi-square variable with 6 d.f. lies between them is 0.90.
$1 - \alpha = 0.90$, so $P(\chi^2_{6,0.95} < \chi^2_6 < \chi^2_{6,0.05}) = 0.90$.
The two numbers are $\chi^2_{6,0.95} = 1.635$ and $\chi^2_{6,0.05} = 12.592$, hence $P(1.635 < \chi^2_6 < 12.592) = 0.90$.

29 Confidence Intervals for the Population Variance
(continued) The 100(1 - α)% confidence interval for the population variance is given by $\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$

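30 Derivation
$P(\chi^2_{n-1,1-\alpha/2} < \chi^2_{n-1} < \chi^2_{n-1,\alpha/2}) = 1 - \alpha$
$\chi^2_{n-1} = (n-1)s^2/\sigma^2$. Substituting for $\chi^2_{n-1}$: $P(\chi^2_{n-1,1-\alpha/2} < (n-1)s^2/\sigma^2 < \chi^2_{n-1,\alpha/2}) = 1 - \alpha$
Rearranging: $P\left(\frac{\chi^2_{n-1,1-\alpha/2}}{(n-1)s^2} < \frac{1}{\sigma^2} < \frac{\chi^2_{n-1,\alpha/2}}{(n-1)s^2}\right) = 1 - \alpha$
Taking reciprocals reverses the inequalities: $P\left(\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right) = 1 - \alpha$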

31 Example
You are testing the speed of a batch of computer processors. You collect the following data (in MHz): sample size 17, sample mean 3004, sample std dev 74. Assume the population is normal. Determine the 95% confidence interval for $\sigma_x^2$.

32 Finding the Chi-square Values
n = 17, so the chi-square distribution has (n – 1) = 16 degrees of freedom. α = 0.05, so use the chi-square values with area 0.025 in each tail: $\chi^2_{16,0.975} = 6.91$ and $\chi^2_{16,0.025} = 28.85$.
[Figure: chi-square density with probability α/2 = .025 in each tail, below 6.91 and above 28.85]

33 Calculating the Confidence Limits
The 95% confidence interval is $\frac{(17-1)(74)^2}{28.85} < \sigma^2 < \frac{(17-1)(74)^2}{6.91}$, that is, approximately $3037 < \sigma^2 < 12680$. Converting to standard deviation, we are 95% confident that the population standard deviation of CPU speed is between 55.1 and 112.6 MHz.
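The confidence limits for the processor example can be recomputed directly from the interval formula $(n-1)s^2/\chi^2_{n-1,\alpha/2} < \sigma^2 < (n-1)s^2/\chi^2_{n-1,1-\alpha/2}$. The sketch below uses the sample size (17), the sample standard deviation (74), and the rounded table values 6.91 and 28.85 quoted in the example; the function name `variance_confidence_interval` is an assumption made for illustration.

```python
import math

def variance_confidence_interval(s, n, chi2_lower_tail, chi2_upper_tail):
    """Confidence interval for a normal population variance.

    chi2_lower_tail: chi-square value with area alpha/2 below it
    chi2_upper_tail: chi-square value with area alpha/2 above it
    """
    sum_sq = (n - 1) * s ** 2
    # Dividing by the larger chi-square value gives the lower limit.
    return sum_sq / chi2_upper_tail, sum_sq / chi2_lower_tail

var_low, var_high = variance_confidence_interval(
    s=74.0, n=17, chi2_lower_tail=6.91, chi2_upper_tail=28.85
)
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print("95% CI for the variance:", round(var_low), "to", round(var_high))
print("95% CI for the std dev: ", round(sd_low, 1), "to", round(sd_high, 1), "MHz")
```

The interval is not symmetric around $s^2 = 5476$ because the chi-square distribution is skewed to the right. Using table values with more decimals would shift the upper variance limit slightly.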

34 Tests of the Variance of a Normal Distribution
9.6 Goal: Test hypotheses about the population variance, σ² (e.g., H0: σ² = σ0²). If the population is normally distributed, $\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$ has a chi-square distribution with (n – 1) degrees of freedom.

35 Tests of the Variance of a Normal Distribution
(continued) The test statistic for hypothesis tests about one population variance is $\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma_0^2}$

36 Decision Rules: Variance
Population variance
Lower-tail test: H0: σ² ≥ σ0², H1: σ² < σ0². Reject H0 if $\chi^2_{n-1} < \chi^2_{n-1,1-\alpha}$.
Upper-tail test: H0: σ² ≤ σ0², H1: σ² > σ0². Reject H0 if $\chi^2_{n-1} > \chi^2_{n-1,\alpha}$.
Two-tail test: H0: σ² = σ0², H1: σ² ≠ σ0². Reject H0 if $\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}$ or $\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}$.

37 Newbold 9.47
Test the hypothesis H0: σ² ≤ 100 against H1: σ² > 100
a) s² = 165, n = 25 b) s² = 165, n = 29 c) s² = 159, n = 25 d) s² = 67, n = 38
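As a hedged sketch of how the four cases could be worked, the code below computes the upper-tail test statistic $(n-1)s^2/\sigma_0^2$ for each case together with an approximate p-value. The transcript does not state a significance level for this exercise, so none is assumed; the p-value is simply compared against whatever α is chosen. The chi-square CDF is a standard-library series evaluation written for this example, not a routine from the text.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), via the series for
    the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def upper_tail_variance_test(s2, n, sigma0_sq):
    """Return (test statistic, p-value) for H0: variance <= sigma0_sq."""
    stat = (n - 1) * s2 / sigma0_sq
    p_value = 1.0 - chi2_cdf(stat, n - 1)
    return stat, p_value

cases = {"a": (165, 25), "b": (165, 29), "c": (159, 25), "d": (67, 38)}
results = {}
for label, (s2, n) in cases.items():
    results[label] = upper_tail_variance_test(s2, n, sigma0_sq=100)
    stat, p = results[label]
    print(f"{label}) chi-square = {stat:.2f}, d.f. = {n - 1}, p-value = {p:.4f}")
```

The test statistics are 39.60, 46.20, 38.16, and 24.79. H0 is rejected in a given case when its p-value is below the chosen α, which is equivalent to the statistic exceeding $\chi^2_{n-1,\alpha}$. In case d the sample variance is below 100, so the data cannot support H1.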

38 Solution

39 Solution

40 Newbold 7.48
A new safety device: random sample for 8 days.
Management is concerned about variability. Test the null hypothesis that the variance is less than or equal to 500 at a significance level of 10%.

41 Solution

