CHAPTER 7 Statistical Data Treatment and Evaluation
These applications include the following:
1. Defining a numerical interval around the mean of a set of replicate analytical results within which the population mean can be expected to lie with a certain probability (the confidence interval).
2. Determining the number of replicate measurements required to ensure that an experimental mean falls within a certain range with a given level of probability.
3. Determining, at a given probability level, whether the precision of two sets of measurements differs.
4. Deciding, with a certain probability, whether an apparent outlier in a set of replicate measurements is the result of a gross error and can be rejected, or must be retained.
7A Confidence intervals
1. We can establish an interval surrounding an experimentally determined mean within which the population mean μ is expected to lie with a certain degree of probability. This interval is known as the confidence interval (CI), and its boundaries are called the confidence limits (CL).
2. Example: it is 99% probable that the true population mean for a set of potassium measurements lies in the interval 7.25% ± 0.15% (the confidence interval). Thus the confidence limits are 7.10% and 7.40%, and the population mean is expected to lie between them.
3. The size of the confidence interval, which is computed from the sample standard deviation, depends on how well the sample standard deviation s estimates the population standard deviation σ. If s is a good approximation of σ, the confidence interval can be significantly narrower.
7A-1 Finding the confidence interval when σ is known or s is a good estimate of σ
1. Figure 7-1 shows a series of five normal error curves. In each, the relative frequency is plotted as a function of the quantity $z = (x - \mu)/\sigma$, the deviation from the mean expressed in units of the population standard deviation.
2. The shaded area in each plot lies between the values −z and +z indicated to the left and right of the curve. The number within the shaded area is the percentage of the total area under the curve that lies between these values of z.
Chapter 7 p143 Figure 7-1 Areas under a Gaussian curve for various values of ±z:
(a) z = ±0.67, 50% of the area under any Gaussian curve;
(b) z = ±1.28, 80% of the area;
(c) z = ±1.64, 90% of the area;
(d) z = ±1.96, 95% of the area;
(e) z = ±2.58, 99% of the area.
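The percentages in Figure 7-1 follow from the Gaussian integral: the fraction of the total area lying between −z and +z equals erf(z/√2). The following is a minimal Python sketch using only the standard library; the function name `area_within` is our own, chosen for illustration.

```python
import math

def area_within(z: float) -> float:
    """Fraction of the area under a Gaussian curve lying between -z and +z."""
    return math.erf(z / math.sqrt(2))

# z values taken from the five panels of Figure 7-1
for z in (0.67, 1.28, 1.64, 1.96, 2.58):
    print(f"z = ±{z:.2f}: {100 * area_within(z):.1f}% of the area")
```

The printed values round to the 50, 80, 90, 95, and 99% shown in the figure; the small differences arise because the tabulated z values are themselves rounded to two decimals.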
1. Relationships such as these allow us to define a range of values around a measurement result within which the true mean is likely to lie with a certain probability, provided we have a reasonable estimate of σ. Example: 90 times out of 100, the true mean μ will fall in the interval x ± 1.64σ (confidence level = 90%).
2. The confidence level is the probability that the true mean lies within a certain interval.
3. The probability that a result lies outside the confidence interval is often called the significance level.
4. If we make a single measurement x from a distribution of known σ, we can say that the true mean should lie in the interval x ± zσ with a probability that depends on z (Table 7-1):
$\text{CI for } \mu = x \pm z\sigma$
For the mean $\bar{x}$ of N measurements, the standard error of the mean, $\sigma/\sqrt{N}$, is used in place of σ:
$\text{CI for } \mu = \bar{x} \pm \dfrac{z\sigma}{\sqrt{N}}$
Values of z at various confidence levels are listed in Table 7-1.
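Values like those in Table 7-1 can also be computed rather than looked up. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`: for a two-sided interval at confidence level CL, the significance level is α = 1 − CL, and z is the point of the standard normal distribution with α/2 of the area beyond it. The helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided z value for a confidence level given as a fraction (e.g. 0.95)."""
    alpha = 1 - level                         # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.50, 0.80, 0.90, 0.95, 0.99, 0.999):
    print(f"{100 * level:5.1f}%  z = {z_for_confidence(level):.2f}")
```

Rounded to two decimals, the 50, 80, 90, 95, and 99% entries agree with the z values quoted in Figure 7-1.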
Example 7-1: Determine the 80% and 95% confidence intervals for (a) the first entry (1108 mg/L glucose) in Example 6-2 and (b) the mean value (1100.3 mg/L) for month 1 in that example. Assume that in each part s = 19 is a good estimate of σ. (Month 1 data: 1108, 1122, 1075, 1099, 1115, 1083, 1100 mg/L.)
Sol: (a) From Table 7-1, z = 1.28 and 1.96 for the 80% and 95% confidence levels. Substituting into the equation for a single measurement, CI for μ = x ± zσ:
80% CI = 1108 ± 1.28 × 19 = 1108 ± 24.3 mg/L
95% CI = 1108 ± 1.96 × 19 = 1108 ± 37.2 mg/L
It is 80% probable that μ lies in the interval 1083.7 to 1132.3 mg/L, and 95% probable that it lies between 1070.8 and 1145.2 mg/L.
(b) For the mean of N = 7 measurements, CI for μ = x̄ ± zσ/√N:
80% CI = 1100.3 ± 1.28 × 19/√7 = 1100.3 ± 9.2 mg/L
95% CI = 1100.3 ± 1.96 × 19/√7 = 1100.3 ± 14.1 mg/L
Thus μ lies between 1091.1 and 1109.5 mg/L with 80% probability, and between 1086.2 and 1114.4 mg/L with 95% probability.
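The arithmetic of Example 7-1 can be checked with a few lines of Python. This is a sketch for verification only; `ci_half_width` is a hypothetical helper that evaluates zσ/√N, with N = 1 for a single result.

```python
import math

def ci_half_width(z: float, sigma: float, n: int = 1) -> float:
    """Half-width z*sigma/sqrt(n) of the confidence interval when sigma is known."""
    return z * sigma / math.sqrt(n)

glucose = [1108, 1122, 1075, 1099, 1115, 1083, 1100]   # mg/L, month 1
sigma = 19                                              # s taken as a good estimate of sigma
mean = sum(glucose) / len(glucose)

for label, z in (("80%", 1.28), ("95%", 1.96)):
    single = ci_half_width(z, sigma)                    # part (a): one result, 1108 mg/L
    avg = ci_half_width(z, sigma, len(glucose))         # part (b): mean of 7 results
    print(f"{label}: 1108 ± {single:.1f}  |  {mean:.1f} ± {avg:.1f} mg/L")
```

Averaging seven results shrinks each interval by a factor of √7 ≈ 2.65, which is why the intervals in part (b) are so much narrower than those in part (a).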
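Example 7-2: How many replicate measurements in month 1 of Example 6-2 are needed to decrease the 95% confidence interval to ±10.0 mg/L of glucose? (σ = 19; z = 1.96 at the 95% level, from Table 7-1)
Sol: Setting the half-width of the interval equal to 10.0 mg/L,
$\dfrac{z\sigma}{\sqrt{N}} = \dfrac{1.96 \times 19}{\sqrt{N}} = 10.0$, so $\sqrt{N} = 3.72$ and $N = 13.9$.
We conclude that 14 measurements are needed to provide a slightly better than 95% chance that the population mean will lie within ±10.0 mg/L of the experimental mean.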
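Solving zσ/√N = E for N and rounding up gives the number of replicates needed for a target half-width E. A minimal sketch; the function name `replicates_needed` is hypothetical.

```python
import math

def replicates_needed(z: float, sigma: float, half_width: float) -> int:
    """Smallest N for which z*sigma/sqrt(N) does not exceed the target half-width."""
    return math.ceil((z * sigma / half_width) ** 2)

print(replicates_needed(1.96, 19, 10.0))   # Example 7-2: prints 14
```

Rounding up rather than to the nearest integer is what makes the chance "slightly better than" 95%: N = 14 gives a half-width a little under 10.0 mg/L.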
Because the confidence interval is proportional to $1/\sqrt{N}$, the confidence interval for an analysis can be halved by carrying out four measurements instead of one.
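The 1/√N dependence means diminishing returns: halving the interval takes four measurements, and halving it again takes sixteen. A small sketch of the relative interval size (`relative_ci` is a hypothetical name):

```python
import math

def relative_ci(n: int) -> float:
    """Size of the CI for the mean of n results relative to a single result.

    (z*sigma/sqrt(n)) / (z*sigma) = 1/sqrt(n)
    """
    return 1 / math.sqrt(n)

for n in (1, 2, 4, 9, 16):
    print(f"N = {n:2d}: {relative_ci(n):.2f} of the single-measurement interval")
```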
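7A-2 Finding the confidence interval when σ is unknown
Often, limitations in time or in the amount of available sample prevent us from making enough measurements to assume that s is a good estimate of σ. The confidence interval must then be calculated from s and Student's t instead of from σ and z, and it is correspondingly broader. The statistic t is defined as
Single measurement: $t = \dfrac{x - \mu}{s}$
Mean of N measurements: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{N}}$
and the confidence interval for the mean is
$\text{CI for } \mu = \bar{x} \pm \dfrac{ts}{\sqrt{N}}$
where t depends on the desired confidence level and on the number of degrees of freedom, N − 1.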
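When σ is unknown, the calculation uses s computed from the data themselves and Student's t in place of z. Python's standard library has no t distribution, so this sketch takes t as an argument; the value t = 2.447 (two-sided 95%, 6 degrees of freedom) is an assumed table value supplied by hand. For illustration only, it is applied to the seven glucose results from Example 7-1 as if their s were the only available estimate of precision. The helper `ci_t` is hypothetical.

```python
import math
import statistics

def ci_t(data: list[float], t: float) -> tuple[float, float]:
    """Return (mean, half-width) of the confidence interval mean ± t*s/sqrt(N).

    t must be looked up for N - 1 degrees of freedom at the desired confidence level.
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)        # sample standard deviation (N - 1 in the denominator)
    return mean, t * s / math.sqrt(n)

glucose = [1108, 1122, 1075, 1099, 1115, 1083, 1100]   # mg/L, data from Example 7-1
mean, half = ci_t(glucose, t=2.447)                     # assumed: 95%, 6 degrees of freedom
print(f"95% CI = {mean:.1f} ± {half:.1f} mg/L")
```

For N = 7, t = 2.447 exceeds z = 1.96, which is why an interval based on s alone is broader than one based on a known σ of the same size.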
Example 7-3: A chemist obtained the following data for the alcohol content of a sample of blood: % is 0.084, 0.0089, and Calculate the 95% confidence interval for the mean assuming that (a) the three results obtained are the only indication of the precision of the method, and (b) from previous experience on hundreds of samples, we know that the standard deviation of the method, s = %, is a good estimate of σ.
Sol:
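7D Detection of gross errors
An outlier could be the result of an undetected gross error. Hence, we must develop a criterion to decide whether to retain or reject the outlying data point.
7D-1 The Q test
The experimental value Qexp is the gap between the questionable result $x_q$ and its nearest neighbor $x_n$, divided by the spread (range) w of the entire set: $Q_{exp} = |x_q - x_n|/w$. Qexp is then compared with a critical value Qcrit for the chosen confidence level (Figure 7-6):
If Qexp > Qcrit, the questionable result can be rejected with the indicated degree of confidence.
If Qexp < Qcrit, the questionable result must be retained.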
Chapter 7 p162 Figure 7-6 The Q test for outliers.
Example 7-11: The analysis of a calcite sample yielded CaO percentages of 55.95, 56.00, 56.04, 56.08, and 56.23. The last value appears anomalous; should it be retained or rejected at the 95% confidence level?
Sol: $Q_{exp} = \dfrac{|x_q - x_n|}{w} = \dfrac{|56.23 - 56.08|}{56.23 - 55.95} = \dfrac{0.15}{0.28} = 0.54$
Qcrit at the 95% confidence level for five observations is 0.71. Because 0.54 < 0.71, we must retain the outlier at the 95% confidence level.
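The Q test can be sketched in Python as below. This is a minimal illustration, not a complete implementation: it tests only the most extreme value of the set, and the caller must supply Qcrit for the chosen confidence level and number of observations. The function name `q_test` is hypothetical, and the fifth CaO value, 56.23, is the one consistent with Qexp = 0.54 in Example 7-11.

```python
def q_test(data: list[float], q_crit: float) -> tuple[float, float, bool]:
    """Q test on the most extreme value of a data set.

    Returns (questionable value, Qexp, reject?) where
    Qexp = |x_q - x_n| / w, x_n is the nearest neighbor and w the range.
    """
    x = sorted(data)
    w = x[-1] - x[0]                              # spread (range) of the whole set
    gap_low, gap_high = x[1] - x[0], x[-1] - x[-2]
    if gap_high >= gap_low:                       # suspect the value with the larger gap
        suspect, gap = x[-1], gap_high
    else:
        suspect, gap = x[0], gap_low
    q_exp = gap / w
    return suspect, q_exp, q_exp > q_crit

cao = [55.95, 56.00, 56.04, 56.08, 56.23]          # % CaO, Example 7-11
suspect, q_exp, reject = q_test(cao, q_crit=0.71)  # 95% level, five observations
print(f"{suspect}: Qexp = {q_exp:.2f} -> {'reject' if reject else 'retain'}")
```

With these data the suspect value is 56.23 and Qexp rounds to 0.54, so the sketch reaches the same "retain" decision as the worked example.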
CHAPTER 8 Sampling, Standardization, and Calibration
Chapter 8 p176 Figure 8-1 Classification of analyses by sample size.
p177 Figure 8-2 Classification of constituent types by analyte level.
Figure 8-3 Interlaboratory error as a function of analyte concentration. Note that the relative standard deviation dramatically increases as the analyte concentration decreases. In the ultratrace range, the relative standard deviation approaches 100%. (From W. Horowitz, Anal. Chem., 1982, 54, 67A-76A.)
p179 Figure 8-4 Steps in obtaining a laboratory sample. The laboratory sample consists of a few grams to at most a few hundred grams. It may constitute as little as 1 part in $10^7$ or $10^8$ of the bulk material.
p180 Figure 8-5 Generating 10 random numbers from 1 to 1000 by use of a spreadsheet. The random number function in Excel [=RAND()] generates random numbers between 0 and 1. The multiplier shown in the documentation ensures that the numbers generated in column B will be between 1 and 1000. To obtain integer numbers, we use the Format/Cells… command on the menu bar, choose Number, and then select 0 decimal places. So that the numbers do not change with every recalculation, the random numbers in column B are copied and then pasted as values into column C using the Edit/Paste Special… command on the menu bar. In column C, the numbers were sorted in ascending order using Excel's Data/Sort… command on the menu bar.
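The spreadsheet procedure in Figure 8-5 has a direct analogue in Python's standard `random` module. In this sketch (not the textbook's method), a fixed seed plays the role of pasting the numbers as values, so the list does not change when the code is rerun; the function name `random_sample_ids` and the seed value are hypothetical choices.

```python
import random

def random_sample_ids(count: int, low: int, high: int, seed: int) -> list[int]:
    """Draw `count` random integers in [low, high] and return them in ascending order.

    A fixed seed makes the result reproducible, like Paste Special/values in Excel.
    """
    rng = random.Random(seed)
    return sorted(rng.randint(low, high) for _ in range(count))

print(random_sample_ids(10, 1, 1000, seed=42))
```

As with the RAND() approach, repeated numbers are possible; if each sample increment must be distinct, `rng.sample(range(low, high + 1), count)` draws without replacement instead.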
p185 Figure 8-6 Steps in sampling a particulate solid.