Consolidation & Review Inference
Sample Proportions
For a column of zeros and ones, the estimated proportion (or fraction) is $\hat{p} = k/n$, the number of successes divided by the number of observations, where $k \sim \text{Binomial}[\text{mean} = np,\ \text{variance} = np(1-p)]$.
$E[\hat{p}] = E[k/n] = E[(1/n)k] = (1/n)E[k] = (1/n)np = p$
$\mathrm{Var}[\hat{p}] = \mathrm{Var}[k/n] = (1/n)^2\,\mathrm{Var}[k] = (1/n)^2\,np(1-p) = p(1-p)/n$
$\hat{p}$ is a point estimate of $p$. The estimate of the variance of $\hat{p}$ is $\hat{p}(1-\hat{p})/n$.
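The two results above can be checked by simulation. The sketch below is illustrative only: the function name `simulate_p_hat`, the seed, and the number of replications are assumptions, not part of the lab; p = 0.5 and n = 50 are the values used in Lab 3.

```python
import random
import statistics

def simulate_p_hat(p, n, reps, seed=0):
    """Draw `reps` samples of n zero/one observations and return each sample's proportion.

    Hypothetical helper for illustration; the seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.5, 50
p_hats = simulate_p_hat(p, n, reps=20_000)

mean_p_hat = statistics.fmean(p_hats)     # should be close to p
var_p_hat = statistics.pvariance(p_hats)  # should be close to p(1-p)/n
theoretical_var = p * (1 - p) / n

print(f"mean of p-hat = {mean_p_hat:.4f}, theory = {p}")
print(f"var of p-hat  = {var_p_hat:.5f}, theory = {theoretical_var:.5f}")
```

The simulated mean and variance should land near p and p(1-p)/n, with small differences due to sampling noise.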
Sample Proportions: example from Lab 3
Of the ten columns of 50 observations of ones and zeros, the one with the smallest proportion had $\hat{p} = 0.32$.
Estimated variance of $\hat{p}$ = 0.32*0.68/50 = 0.004352, with square root, i.e. standard deviation, of 0.066.
A 95% confidence interval for an estimate of p from this sample is:
$\text{Prob}[-1.96 \le (0.32 - p)/0.066 \le 1.96] = 0.95$
$\text{Prob}[-1.96 \times 0.066 \le (0.32 - p) \le 1.96 \times 0.066] = 0.95$
$\text{Prob}[-0.13 \le (0.32 - p) \le 0.13] = 0.95$
$\text{Prob}[0.13 \ge (p - 0.32) \ge -0.13] = 0.95$
$\text{Prob}[0.45 \ge p \ge 0.19] = 0.95$
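The interval arithmetic in this example can be reproduced in a few lines. This is a minimal sketch; the function name `proportion_ci` is a hypothetical choice for illustration.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    Hypothetical helper: returns (lower, upper) = p_hat -/+ z * sqrt(p_hat(1-p_hat)/n).
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard deviation of p-hat
    return p_hat - z * se, p_hat + z * se

# Values from the Lab 3 example: p-hat = 0.32, n = 50
low, high = proportion_ci(0.32, 50)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Rounded to two decimals, the bounds match the 0.19 and 0.45 found above.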
Note: This 95% confidence interval does not include p = 0.5, the population parameter chosen for the simulation. This illustrates that about 5% of the time a 95% confidence interval will not include the true value!
$\text{Prob}[-1.96 \le (0.32 - p)/0.066 \le 1.96] = 0.95$ is the same as $\text{Prob}[-1.96 \le z \le 1.96] = 0.95$, where $z = (\hat{p} - p)/\hat{\sigma}_{\hat{p}} = (0.32 - p)/0.066$ in this example.
We can use the normal distribution approximation to the binomial in this example since $n\hat{p} = 50 \times 0.32 = 16 \ge 5$ and $n(1-\hat{p}) = 50 \times 0.68 = 34 \ge 5$.
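How often such an interval misses the true value can be estimated by repeating the experiment many times. The sketch below assumes the Lab 3 setup (p = 0.5, n = 50); the function name `coverage`, the seed, and the replication count are illustrative assumptions.

```python
import math
import random

def coverage(p, n, reps, z=1.96, seed=1):
    """Fraction of simulated intervals p_hat -/+ z*se that contain the true p.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        k = sum(rng.random() < p for _ in range(n))  # number of successes
        p_hat = k / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z * se <= p <= p_hat + z * se:
            hits += 1
    return hits / reps

cover = coverage(p=0.5, n=50, reps=20_000)
print(f"share of intervals containing p: {cover:.3f}")
```

The share should be in the neighborhood of 0.95. It will not be exactly 0.95, because the normal curve only approximates the discrete binomial at n = 50.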
A z value of 1.96 leads to an area of 0.475 between 0 and z, leaving 0.025 in the upper tail.
Interval Estimation
The conventional approach is to choose a probability for the interval, such as 95% or 99%.
So z values of -1.96 and 1.96 leave 2.5% in each tail
[Figure: standard normal density with 2.5% in each tail, below z = -1.96 and above z = 1.96]
Application of Sample Proportions
Oct 13 Rasmussen Poll
Field Poll Margin of Error
Variance of $\hat{p}$ = $\hat{p}(1-\hat{p})/n$ = 0.47*0.53/599 = 0.000416
$\hat{\sigma}_{\hat{p}} = \sqrt{0.000416} = 0.0204$
So two standard deviations is about 0.041, or 4.1%, i.e. the margin of error is plus or minus 4.1 percentage points.
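The margin-of-error arithmetic can be checked directly. A minimal sketch, where the function name `margin_of_error` and its `k` parameter (the number of standard deviations) are hypothetical:

```python
import math

def margin_of_error(p_hat, n, k=2.0):
    """k standard deviations of p-hat, using the estimate sqrt(p_hat(1-p_hat)/n)."""
    return k * math.sqrt(p_hat * (1 - p_hat) / n)

# Field Poll values from the slide: p-hat = 0.47, n = 599
moe_two_sd = margin_of_error(0.47, 599)        # the "two standard deviations" rule
moe_196 = margin_of_error(0.47, 599, k=1.96)   # the exact 95% z value
print(f"k = 2:    +/- {100 * moe_two_sd:.1f} percentage points")
print(f"k = 1.96: +/- {100 * moe_196:.1f} percentage points")
```

Using 2 rather than 1.96 standard deviations changes the margin only slightly.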
Inferring the unknown population mean from a sample mean
Example from Lab 3: simulate the population as uniform, with random variable x, $0 \le x \le 1$, and density $f(x) = 1$.
Note the expected value of x: $E[x] = \int_0^1 x f(x)\,dx = \int_0^1 x\,dx = 1/2$
$\mathrm{Var}[x] = E[(x - E[x])^2] = E[x^2 - 2xE[x] + (E[x])^2]$
$\mathrm{Var}[x] = E[x^2] - 2E[x]E[x] + (E[x])^2 = E[x^2] - (E[x])^2$
$E[x^2] = \int_0^1 x^2\,dx = 1/3$
$\mathrm{Var}[x] = E[x^2] - (E[x])^2 = 1/3 - (1/2)^2 = 1/3 - 1/4 = (4 - 3)/12 = 1/12$
x ~ Uniform(mean = 1/2, variance = 1/12)
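The two integrals can be verified numerically. The sketch below uses a simple midpoint rule; the helper name `integrate` and the step count are illustrative assumptions.

```python
def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b] (hypothetical helper)."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Uniform density f(x) = 1 on [0, 1], so E[g(x)] is the integral of g(x) over [0, 1]
mean_x = integrate(lambda x: x, 0.0, 1.0)                # E[x]
second_moment = integrate(lambda x: x * x, 0.0, 1.0)     # E[x^2]
var_x = second_moment - mean_x ** 2                      # E[x^2] - (E[x])^2

print(f"E[x] = {mean_x:.4f}, E[x^2] = {second_moment:.4f}, Var[x] = {var_x:.4f}")
```

The results should agree with 1/2, 1/3, and 1/12 to several decimal places.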
In Lab 3 we drew a random sample of size 50 from this uniform distribution and calculated the sample mean, $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$ with n = 50. From the central limit theorem we know, and we saw in Lab 3, that the sample mean is approximately normally distributed.
Central tendency and dispersion of sample mean
$E[\bar{x}] = \mu$, where μ is the population mean. In the simulation from Lab 3 using the uniform distribution, we knew that μ = 0.5.
$\mathrm{Var}[\bar{x}] = \sigma^2/n$, where $\sigma^2$ is the variance of x. In the simulation from Lab 3 using the uniform distribution, we knew that $\sigma^2 = 1/12$.
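A simulation in the spirit of Lab 3 shows the sample mean centering on μ with variance σ²/n. This is a sketch only: the function name `simulate_sample_means`, the seed, and the number of replications are assumptions.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=2):
    """Return `reps` sample means, each from n draws of Uniform(0, 1) (hypothetical helper)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

n = 50
means = simulate_sample_means(n, reps=20_000)

mean_of_means = statistics.fmean(means)     # should be close to mu = 0.5
var_of_means = statistics.pvariance(means)  # should be close to sigma^2 / n
theory_var = (1 / 12) / n

print(f"mean of sample means = {mean_of_means:.4f} (theory 0.5)")
print(f"var of sample means  = {var_of_means:.6f} (theory {theory_var:.6f})")
```

With n = 50 the theoretical variance is (1/12)/50, about 0.00167, so the sample means cluster tightly around 0.5.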
Hypothesis testing
Example from Lab 3 for sample proportions
Step one: formulate the hypotheses
Null hypothesis, H0: p = 0.5
Alternative hypothesis, HA: p < 0.5
Step two: identify a test statistic
$z = (\hat{p} - p)/\hat{\sigma}_{\hat{p}}$, where the value for p is from the null hypothesis, so z = (0.32 - 0.5)/0.066 = -0.18/0.066 = -2.73
Step three: if the null hypothesis were true, what is the probability of getting a test statistic of this size?
Hypothesis Testing: 4 Steps
1. Formulate all the hypotheses.
2. Identify a test statistic.
3. If the null hypothesis were true, what is the probability of getting a test statistic this large?
4. Compare this probability to a chosen critical level of significance, e.g. 5%.
A z value of 2.73 leads to an area of 0.4968 between 0 and z, leaving 0.0032 in the upper tail, and hence 0.0032 in the lower tail below z = -2.73. If you choose a risk level of 0.05, i.e. α = 0.05 for the probability of a type I error, then reject H0, since 0.0032 < 0.05.
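[Figure: lower tail of the standard normal density. Area 0.0032 lies below z = -2.73; area 0.050 lies below the critical value z = -1.645.]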
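The four steps of the test can be sketched with Python's standard library. The function name `lower_tail_z_test` is hypothetical, and the standard deviation 0.066 is taken from the Lab 3 example as given (it was computed from the sample proportion 0.32).

```python
from statistics import NormalDist

def lower_tail_z_test(p_hat, p0, se, alpha=0.05):
    """One-sided (lower-tail) z test of H0: p = p0 against HA: p < p0.

    Hypothetical helper; `se` is the standard deviation of p-hat supplied by the caller.
    """
    std_normal = NormalDist()            # mean 0, standard deviation 1
    z = (p_hat - p0) / se                # test statistic
    p_value = std_normal.cdf(z)          # area in the lower tail below z
    critical = std_normal.inv_cdf(alpha) # z value leaving alpha in the lower tail
    return z, p_value, critical, p_value < alpha

z, p_value, critical, reject = lower_tail_z_test(p_hat=0.32, p0=0.5, se=0.066)
print(f"z = {z:.2f}, lower-tail probability = {p_value:.4f}")
print(f"critical z at alpha = 0.05: {critical:.3f}, reject H0: {reject}")
```

The lower-tail probability is well below 0.05, and z lies to the left of the critical value near -1.645, so both comparisons lead to rejecting H0.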