$\mu$, $\sigma$ are some parameters of the population. In general, $\mu$ and $\sigma$ are not known. Suppose we want to know $\mu$ (say). We take a sample, and then we know $\bar{x}$ and $s$. So, what can we say about $\mu$? Can we say $\mu$ is $\bar{x}$? Can we say $\mu$ is close to $\bar{x}$? But how close is it?
To estimate $\mu$ by a single number is too “dangerous”! It is much “safer” to estimate $\mu$ by an interval. Based on the data from a random sample, we can find the sample mean and variance. Suppose that by some further calculation we can find an interval (L,U) such that $P(L < \mu < U) = 95\%$ (say). That means there is a 95% chance that (L,U) traps $\mu$. We say (L,U) is a 95% confidence interval for $\mu$.
In general, to estimate a parameter $\theta$, if we can find a random interval (L,U) such that $P(L < \theta < U) = k\%$, then (L,U) is called a k% confidence interval for $\theta$. But how do we find (L,U)? In AL, you are required to construct a confidence interval (C.I.) for (1) a population mean and (2) a population proportion.
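Let’s talk about the C.I. for $\mu$. By CLT, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately, so $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$. Task: find a 95% C.I. for $\mu$. Suppose (L,U) is a 95% C.I. for $\mu$: $P(L < \mu < U) = 95\%$ (1). By table, $P(-1.96 < Z < 1.96) = 95\%$, i.e. $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 95\%$. Rearranging, $P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%$. Comparing with (1), a 95% C.I. for $\mu$ is $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$.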
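$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ is a 95% C.I. for $\mu$. How about a 99% C.I. for $\mu$? Ans: $\left(\bar{x} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\frac{\sigma}{\sqrt{n}}\right)$, since $P(-2.58 < Z < 2.58) = 99\%$. In general, an $\alpha\%$ C.I. for $\mu$ is $\left(\bar{x} - z_c\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_c\frac{\sigma}{\sqrt{n}}\right)$, where $P(-z_c < Z < z_c) = \alpha\%$. $\alpha\%$ is called the confidence level.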
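$\left(\bar{x} - z_c\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_c\frac{\sigma}{\sqrt{n}}\right)$ is an $\alpha\%$ C.I. for $\mu$. Note 1: as $\alpha$ increases, $z_c$ increases, hence the width $2z_c\frac{\sigma}{\sqrt{n}}$ of the C.I. increases. Reasonable! To have more chance to “trap” the true $\mu$, we need a wider C.I. But a C.I. with a huge range is close to meaningless, e.g. claiming that we are 100% confident that the true $\mu$ lies in $(-\infty, \infty)$. Note 2: In practice, we usually don't know $\sigma$ either, so we use the sample s.d. $s$ in place of $\sigma$. More precisely, use $s\sqrt{\frac{n}{n-1}}$ instead of $s$ (where $s$ is computed with divisor $n$).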
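Both notes can be checked numerically. The Python sketch below (standard library only; `statistics.NormalDist` needs Python 3.8+) computes $z_c$ from the confidence level, shows the width $2z_c\sigma/\sqrt{n}$ growing with the level, and confirms that $s\sqrt{n/(n-1)}$, with $s$ computed using divisor $n$, equals the divisor-$(n-1)$ s.d. returned by `statistics.stdev`. The helper names `z_critical` and `ci_width`, and the five data values, are hypothetical illustrations.

```python
from math import sqrt
from statistics import NormalDist, pstdev, stdev

def z_critical(level):
    """Return z_c with P(-z_c < Z < z_c) = level, for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf((1 + level) / 2)

def ci_width(level, sigma, n):
    """Width 2 * z_c * sigma / sqrt(n) of the C.I. for mu when sigma is known."""
    return 2 * z_critical(level) * sigma / sqrt(n)

# Note 1: higher confidence level -> larger z_c -> wider interval.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z_c = {z_critical(level):.3f}, width = {ci_width(level, 10, 16):.2f}")

# Note 2: s (divisor n) times sqrt(n / (n - 1)) equals the divisor-(n - 1) s.d.
data = [3.1, 2.7, 3.4, 2.9, 3.0]   # hypothetical sample values
n = len(data)
s = pstdev(data)                    # sample s.d. with divisor n
adjusted = s * sqrt(n / (n - 1))
print(round(adjusted, 6), round(stdev(data), 6))
```

The critical values come out as about 1.645, 1.960 and 2.576; the table value 2.58 used above is 2.576 rounded.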
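E.g. 26 The masses (in g) of a random sample are 182, 184, 176, 178, 181, 180, 183, 178, 179, 177, 180, 183, 179, 178, 181, 181. If this sample came from a normal population with $\sigma = 10$ g, obtain a 95% C.I. for the mean mass $\mu$ of the population. For the sample, $n = 16$ and $\bar{x} = \frac{2880}{16} = 180$. Hence a 95% C.I. for $\mu$ is $\left(180 - 1.96 \times \frac{10}{\sqrt{16}},\ 180 + 1.96 \times \frac{10}{\sqrt{16}}\right) = (175.1, 184.9)$.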
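A minimal sketch of the E.g. 26 calculation in Python, assuming the known-$\sigma$ formula $\bar{x} \pm z_c\sigma/\sqrt{n}$; `NormalDist().inv_cdf(0.975)` stands in for the table value 1.96.

```python
from math import sqrt
from statistics import NormalDist, mean

masses = [182, 184, 176, 178, 181, 180, 183, 178,
          179, 177, 180, 183, 179, 178, 181, 181]   # grams, data of E.g. 26
sigma = 10                                           # known population s.d. (g)
n = len(masses)
xbar = mean(masses)
z_c = NormalDist().inv_cdf(0.975)                    # about 1.96 for 95%
half = z_c * sigma / sqrt(n)                         # half-width of the C.I.
ci = (round(xbar - half, 1), round(xbar + half, 1))
print(n, xbar, ci)
```

It reports n = 16, $\bar{x}$ = 180 and the interval (175.1, 184.9).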
In the previous question, (175.1, 184.9) is a 95% C.I. for the true mean $\mu$. Am I right in saying that there is a 95% chance that $\mu$ lies in (175.1, 184.9)? Note 1: $\mu$ is NOT a random variable, whereas the interval (L,U) is a random interval. Note 2: We can only say that we are 95% confident that $\mu$ lies in (L,U). How should we understand this?
[Diagram: from the population, Sample 1, Sample 2, …, Sample n are drawn, giving the intervals $(L_1,U_1), (L_2,U_2), \dots, (L_n,U_n)$.]
If $(L_1,U_1), (L_2,U_2), \dots, (L_n,U_n)$ are 95% C.I.s, then about 95% of these intervals should include the true mean $\mu$. For example, out of 20 such 95% C.I.s, about 19 should trap the true mean. So (175.1, 184.9) is just one of these C.I.s, and it may or may not trap $\mu$.
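The repeated-sampling picture can be simulated. This sketch assumes a hypothetical normal population with $\mu = 180$ and $\sigma = 10$ (values chosen only to echo E.g. 26, since the true $\mu$ is never known in practice), draws many samples of size 16, builds a 95% C.I. from each, and reports the fraction that trap $\mu$.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu=180.0, sigma=10.0, n=16, level=0.95, trials=2000, seed=1):
    """Fraction of simulated C.I.s (sigma known) that trap the true mu.

    mu and sigma describe a hypothetical population; in practice mu is unknown.
    """
    rng = random.Random(seed)
    z_c = NormalDist().inv_cdf((1 + level) / 2)
    half = z_c * sigma / sqrt(n)            # same half-width for every sample
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half < mu < xbar + half:  # does this interval trap mu?
            hits += 1
    return hits / trials

rate = coverage()
print(rate)
```

With a fixed seed the result is reproducible; the fraction should come out close to 0.95, although each individual interval either traps $\mu$ or does not.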
An example. Suppose $\{X_1, X_2, \dots, X_7\}$ is the population. We take samples of 2 elements ($n = 2$). The total number of possible samples is $^7C_2 = 21$, hence we can construct 21 different C.I.s. We consider the 90% C.I.s. See the Word document now.
Of the 21 C.I.s, 19 do trap $\mu$. Notice that $21 \times 90\% \approx 19$. Also, the sample size 2 is far too small! Instead of using $s$, we use the adjusted sample s.d. $s\sqrt{\frac{n}{n-1}}$. Refer to p.81, note (ii) in the textbook.
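The 21-sample enumeration can be sketched with `itertools.combinations`. The population values 1, 2, ..., 7 below are invented (the actual values are in the Word document mentioned above and are not reproduced here), and each 90% interval is assumed to be $\bar{x} \pm z_c\hat{s}/\sqrt{2}$ with the adjusted s.d. $\hat{s} = s\sqrt{n/(n-1)}$, which is what `statistics.stdev` returns.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, mean, stdev

population = [1, 2, 3, 4, 5, 6, 7]   # hypothetical values for X1, ..., X7
mu = mean(population)                # true mean, known only because we invented it
z_c = NormalDist().inv_cdf(0.95)     # 90% level: P(-z_c < Z < z_c) = 0.90

intervals = []
for sample in combinations(population, 2):   # all 7C2 = 21 samples of size 2
    xbar = mean(sample)
    s_adj = stdev(sample)                    # adjusted s.d. s * sqrt(n / (n - 1))
    half = z_c * s_adj / sqrt(len(sample))
    intervals.append((xbar - half, xbar + half))

covered = sum(lo < mu < hi for lo, hi in intervals)
print(len(intervals), covered)
```

For this invented population only 15 of the 21 intervals trap $\mu = 4$. With $n = 2$ the count depends heavily on the population values and need not be close to $21 \times 90\%$, which is another sign that such a sample is too small.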
E.g. 27 For a certain population, $\sigma = 6$. How large a sample is needed so that the width of the 95% C.I. for $\mu$ is 0.5? Half width $= 1.96 \times \frac{6}{\sqrt{n}} = 0.25$, so n = 2209.
Do you agree?
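One way to check the answer, sketched in Python: solve $1.96 \times 6/\sqrt{n} = 0.25$ without intermediate rounding, then round n up. The helper `sample_size_for_width` is a hypothetical name. Carrying full precision gives $n \approx 2212.8$, so the smallest sample size is 2213; the value $2209 = 47^2$ arises if $\sqrt{n} = 47.04$ is rounded down to 47 before squaring, and at $n = 2209$ the width is about 0.5004, slightly more than 0.5.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_for_width(sigma, width, level=0.95):
    """Smallest n with 2 * z_c * sigma / sqrt(n) <= width (sigma known)."""
    z_c = NormalDist().inv_cdf((1 + level) / 2)
    return ceil((2 * z_c * sigma / width) ** 2)

z_table = 1.96
n_exact = (z_table * 6 / 0.25) ** 2       # solve 1.96 * 6 / sqrt(n) = 0.25
width_2209 = 2 * z_table * 6 / sqrt(2209)  # width if n = 2209 is used
print(n_exact, ceil(n_exact), width_2209)
print(sample_size_for_width(6, 0.5))      # exact z_c = 1.95996... gives the same n
```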
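If $\sigma$ is known, the C.I. is $\bar{x} \pm z_c\frac{\sigma}{\sqrt{n}}$. If $\sigma$ is unknown, the C.I. is $\bar{x} \pm z_c\frac{s}{\sqrt{n}}$. Precisely, $\bar{x} \pm z_c\frac{s}{\sqrt{n-1}}$, since $\frac{s\sqrt{n/(n-1)}}{\sqrt{n}} = \frac{s}{\sqrt{n-1}}$.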
E.g. 28 A sample of 100 plugs has mean diameter $\bar{x} = 25.1$ cm. If the s.d. of these plugs is 0.12 cm, estimate the population mean diameter at the 95% confidence level. Now we don't know $\sigma$, so we use the sample s.d. $s$: $25.1 \pm 1.96 \times \frac{0.12}{\sqrt{99}} = (25.076, 25.124)$.
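The same calculation as a Python sketch, assuming $\bar{x} = 25.1$ cm (the midpoint of the quoted interval) and that $s = 0.12$ cm was computed with divisor $n$, so the adjusted formula $\bar{x} \pm z_c s/\sqrt{n-1}$ applies.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 25.1, 0.12, 100      # assumed sample mean (cm), sample s.d. (divisor n), size
z_c = NormalDist().inv_cdf(0.975)  # about 1.96
half = z_c * s / sqrt(n - 1)       # same as z_c * s * sqrt(n / (n - 1)) / sqrt(n)
ci = (round(xbar - half, 3), round(xbar + half, 3))
print(ci)
```

Here the unadjusted $s/\sqrt{n}$ happens to round to the same interval, because with $n = 100$ the factor $\sqrt{n/(n-1)}$ is very close to 1.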
E.g. 31 (a) A two-stage rocket is to be fired to put a satellite into orbit. Due to variation of the specific impulse in the second stage, the velocity imparted in this stage will be normally distributed about 4095 m s⁻¹ with s.d. 21 m s⁻¹. Find 95% confidence limits for the velocity imparted in this stage. 95% C.I. $= 4095 \pm 1.96 \times 21 = 4095 \pm 41.16 = (4054, 4136)$.
(b) In the first stage, the velocity imparted will be normally distributed about 3990 m s⁻¹, with s.d. 20 m s⁻¹ due to variation of the specific impulse and (independently) with s.d. 8 m s⁻¹ due to variation in the burning time of the charge. Find 90% confidence limits for the velocity imparted in this stage. Combined s.d. $\sigma_1 = \sqrt{20^2 + 8^2} = \sqrt{464} \approx 21.54$. 90% C.I. $= (3990 - 1.645 \times 21.54,\ 3990 + 1.645 \times 21.54) = (3955, 4025)$.
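(c) Given that a final velocity of 8000 m s⁻¹ is required to go into orbit, and that the second stage fires immediately after the first, find the probability of achieving orbit. Second stage: $v_2 = 4095$, $\sigma_2^2 = 21^2 = 441$. First stage: $v_1 = 3990$, $\sigma_1^2 = 20^2 + 8^2 = 464$. Let V = final velocity. $E(V) = 3990 + 4095 = 8085$ and $\mathrm{Var}(V) = 464 + 441 = 905$, so $V \sim N(8085, 905)$. $P(V > 8000) = P\left(Z > \frac{8000 - 8085}{\sqrt{905}}\right) = P(Z > -2.826) \approx 0.9976$.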
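The three parts of E.g. 31 can be reproduced with `statistics.NormalDist`, assuming (as the question states) that the stage velocities are normal and independent. The exact critical values 1.95996 and 1.64485 replace the table values 1.96 and 1.645.

```python
from math import sqrt
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)   # about 1.96
z90 = NormalDist().inv_cdf(0.95)    # about 1.645

# (a) second stage: velocity ~ N(4095, 21^2), 95% limits
mu2, sigma2 = 4095, 21
limits_a = (round(mu2 - z95 * sigma2), round(mu2 + z95 * sigma2))

# (b) first stage: independent s.d.s 20 and 8 combine as sqrt(20^2 + 8^2)
mu1, sigma1 = 3990, sqrt(20**2 + 8**2)   # sigma1 about 21.54
limits_b = (round(mu1 - z90 * sigma1), round(mu1 + z90 * sigma1))

# (c) V = V1 + V2 with independent stages: N(8085, 464 + 441 = 905)
final = NormalDist(mu1 + mu2, sqrt(sigma1**2 + sigma2**2))
p_orbit = 1 - final.cdf(8000)

print(limits_a, limits_b, round(p_orbit, 4))
```

It gives (4054, 4136), (3955, 4025) and a probability of about 0.9976.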
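Prerequisite for E.g. 32: the uniform distribution. [Figure: the p.d.f. $f(x)$ of a uniform distribution on $(a, b)$, plotted against $x$.] If $X$ is uniformly distributed on $(a, b)$, then $f(x) = \frac{1}{b-a}$ for $a < x < b$, $E(X) = \frac{a+b}{2}$ and $\mathrm{Var}(X) = \frac{(b-a)^2}{12}$.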
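E.g. 32 We add $10^4$ numbers, each of which has been rounded off to an accuracy of $10^{-m}$. Assuming that the rounding errors are mutually independent and uniformly distributed on $(-0.5 \times 10^{-m}, 0.5 \times 10^{-m})$, find the limits within which the total error will lie with probability 0.99. Let $X$ = total error, so $X = X_1 + X_2 + \dots + X_{10^4}$. Since each $X_i$ is uniformly distributed on this interval, $E(X_i) = 0$ and $\mathrm{Var}(X_i) = \frac{(10^{-m})^2}{12} = \frac{10^{-2m}}{12}$.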
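By CLT, $X \sim N\left(0, \frac{10^4 \times 10^{-2m}}{12}\right)$ approximately, i.e. $\frac{X}{10^{2-m}/\sqrt{12}} \sim N(0,1)$. By table, $P(-2.58 < Z < 2.58) = 0.99$. Hence the limits are $\pm 2.58 \times \frac{10^{2-m}}{\sqrt{12}}$. Hence we can construct the 99% C.I. for the total error $X$, and this estimate is far better! Let's use $m = 3$ as an example. The crude bound is $|X| \le 10^4 \times 0.5 \times 10^{-3} = 5$, too large to be useful! But the C.I. is only $(-0.0745, 0.0745)$, much more “precise”.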
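A sketch of the E.g. 32 calculation for general m, plus a quick Monte Carlo check of the CLT approximation for m = 3. The seed, the 200 trials and the helper name `total_error_limit` are arbitrary choices for illustration. With the exact 99% point $z_c \approx 2.576$ the limit for m = 3 is about $\pm 0.0744$; the rounded table value 2.58 gives about $\pm 0.0745$.

```python
import random
from math import sqrt
from statistics import NormalDist

def total_error_limit(m, count=10**4, level=0.99):
    """Half-width of the central `level` interval for the sum of `count`
    rounding errors, each uniform on (-0.5e-m, 0.5e-m), via the CLT."""
    width = 10.0 ** (-m)                      # length of each uniform interval
    sd_total = sqrt(count * width**2 / 12)    # Var(X_i) = width^2 / 12
    z_c = NormalDist().inv_cdf((1 + level) / 2)
    return z_c * sd_total

m = 3
limit = total_error_limit(m)
print(round(limit, 4))                        # compare with the crude bound 5

# Monte Carlo check (fixed seed): how often does |X| fall inside the limit?
rng = random.Random(0)
half = 0.5 * 10.0 ** (-m)
trials = 200
inside = sum(
    abs(sum(rng.uniform(-half, half) for _ in range(10**4))) < limit
    for _ in range(trials)
)
print(inside / trials)
```

Summing $10^4$ uniform errors 200 times takes a second or so in pure Python; the observed fraction should be close to 0.99.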
Now, let’s talk about the C.I. for a proportion. Suppose you want to look into the proportion of smokers in H.K. You have interviewed 100 H.K. people and found 60 smokers. Can we say that the proportion of smokers among H.K. people is 60%? Not exactly. However, we can construct a C.I. to estimate the true proportion!
Let n be the sample size, let m be the number of “successes” (i.e. “smokers” in the example), and let p be the true proportion of “successes”. Suppose the population is very large; then m has a binomial distribution, $m \sim B(n, p)$. Suppose further that n is reasonably large. Then we can use the normal distribution to approximate the binomial: $m \sim N(np, npq)$ approximately, where $q = 1 - p$.
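Let $P_s = \frac{m}{n}$ be the proportion of “successes” in the sample. Hence $P_s \sim N\left(p, \frac{pq}{n}\right)$ approximately. In practice, p is unknown, so we use $\frac{P_sQ_s}{n}$, where $Q_s = 1 - P_s$, to estimate $\frac{pq}{n}$. Thus $P_s \sim N\left(p, \frac{P_sQ_s}{n}\right)$ approximately.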
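Hence $P\left(-1.96 < \frac{P_s - p}{\sqrt{P_sQ_s/n}} < 1.96\right) = 95\%$. Rearranging, $P\left(P_s - 1.96\sqrt{\frac{P_sQ_s}{n}} < p < P_s + 1.96\sqrt{\frac{P_sQ_s}{n}}\right) = 95\%$. Hence a 95% C.I. for the population proportion p is $\left(P_s - 1.96\sqrt{\frac{P_sQ_s}{n}},\ P_s + 1.96\sqrt{\frac{P_sQ_s}{n}}\right)$.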
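In general, an $\alpha\%$ C.I. for the population proportion p is $\left(P_s - z_c\sqrt{\frac{P_sQ_s}{n}},\ P_s + z_c\sqrt{\frac{P_sQ_s}{n}}\right)$, where $P(-z_c < Z < z_c) = \alpha\%$. n > 30 is required.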
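E.g. Out of 4000 items, 240 are defective. Find a 95% C.I. for the probability p that an item is defective. $P_s = \frac{240}{4000} = 0.06$, $Q_s = 1 - 0.06 = 0.94$, $\sqrt{\frac{P_sQ_s}{n}} = \sqrt{\frac{0.06 \times 0.94}{4000}} \approx 0.003755$. The required 95% C.I. is $0.06 \pm 1.96 \times 0.003755 = (0.0526, 0.0674)$.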
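Both proportion examples can be computed with one small function implementing $P_s \pm z_c\sqrt{P_sQ_s/n}$; the name `proportion_ci` and the `n > 30` guard are choices made for this sketch. For the defective-items example, the total of 4000 items is inferred from $P_s = 0.06$ with 240 defectives.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Approximate C.I. for p: P_s +/- z_c * sqrt(P_s * Q_s / n)."""
    if n <= 30:
        raise ValueError("the normal approximation here assumes n > 30")
    p_s = successes / n
    q_s = 1 - p_s
    z_c = NormalDist().inv_cdf((1 + level) / 2)
    half = z_c * sqrt(p_s * q_s / n)
    return p_s - half, p_s + half

smokers = proportion_ci(60, 100)       # survey: 60 smokers out of 100
defective = proportion_ci(240, 4000)   # 240 defective, assuming 4000 items
print([round(x, 3) for x in smokers], [round(x, 4) for x in defective])
```

The smokers survey gives roughly (0.504, 0.696), and the defective items give (0.0526, 0.0674).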
E.g. 35 Suppose that we know p = 0.6 for a Bernoulli population. How large a sample is necessary to be 95% confident that the observed value $P_s$ lies in (0.5, 0.7)? $(0.5, 0.7) = (0.6 - 0.1, 0.6 + 0.1)$. Let n = sample size. Hence, for 95% confidence, $0.1 = 1.96\sqrt{\frac{0.6 \times 0.4}{n}}$. On solving, $n \approx 92$.
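Solving $0.1 = 1.96\sqrt{0.6 \times 0.4/n}$ in Python, as a sketch (the function name is hypothetical). The exact solution is $n \approx 92.2$; rounding up to 93 guarantees that the half-width does not exceed 0.1, while $n \approx 92$ is the nearest whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p, half_width, level=0.95, z_c=None):
    """Smallest n with z_c * sqrt(p * (1 - p) / n) <= half_width."""
    if z_c is None:
        z_c = NormalDist().inv_cdf((1 + level) / 2)
    return ceil(z_c**2 * p * (1 - p) / half_width**2)

n_exact = 1.96**2 * 0.6 * 0.4 / 0.1**2   # about 92.2
print(n_exact, sample_size_for_proportion(0.6, 0.1, z_c=1.96))
```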
E.g. 37 (a) Of 50 houseflies independently subjected to the same insecticide, 38 were killed. Obtain an estimate of p, the probability that a housefly is killed by the insecticide. Find also the standard error of this estimate. $P_s = \frac{38}{50} = 0.76$. Standard error $= \sqrt{\frac{P_sQ_s}{n}} = \sqrt{\frac{0.76 \times 0.24}{50}} \approx 0.0604$.
(b) Now a larger experiment is to be conducted with the same insecticide so that an estimate with a standard error of about 0.03 can be quoted. On the basis of the information from the experiment already conducted, how many houseflies are needed? Standard error $= \sqrt{\frac{0.76 \times 0.24}{n}} = 0.03$, so $n = \frac{0.1824}{0.0009} \approx 202.7$, i.e. n = 203. (c) To be absolutely sure of obtaining the desired accuracy, how many houseflies should be taken? The standard error depends on $P_s$: n = 203 makes the standard error 0.03 only when $P_s = \frac{19}{25}$. So what n ensures s.e. $\le 0.03$ irrespective of $P_s$?
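For fixed n, the s.e. is a function of $P_s$. Requiring s.e. $\le 0.03$ for every $P_s$ means $\max_{P_s} \sqrt{\frac{P_s(1-P_s)}{n}} \le 0.03$. It is easy to show that $P_s(1-P_s)$ attains its maximum $\frac{1}{4}$ when $P_s = 0.5$. Hence s.e. $\le \sqrt{\frac{0.25}{n}} = \frac{0.5}{\sqrt{n}}$. Then set $\frac{0.5}{\sqrt{n}} \le 0.03$, i.e. $n \ge 277.8$, so $n \ge 278$. Though different samples yield different $P_s$, the s.e. is sure to be no greater than 0.03 if we take n = 278 (or more).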
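A Python sketch of all three parts of E.g. 37. It uses $P_s = 0.76$ for part (b) and the worst case $P_s(1-P_s) \le 0.25$ for part (c), then checks the worst case numerically over a grid of $P_s$ values. The smallest n for part (c) works out to 278, since $0.25/0.03^2 \approx 277.8$; any larger n, such as 279, also satisfies the bound.

```python
from math import ceil, sqrt

killed, n0 = 38, 50
p_s = killed / n0                        # 0.76 = 19/25
se = sqrt(p_s * (1 - p_s) / n0)          # standard error of P_s, about 0.0604

target = 0.03
n_b = ceil(p_s * (1 - p_s) / target**2)  # using P_s = 0.76: about 202.7 -> 203
n_c = ceil(0.25 / target**2)             # worst case P_s(1 - P_s) <= 0.25: about 277.8 -> 278

# Check the worst case numerically over a grid of P_s values in [0, 1].
worst = max(sqrt(x * (1 - x) / n_c) for x in (i / 1000 for i in range(1001)))
print(round(se, 4), n_b, n_c, worst <= target)
```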