ANALYTICAL PROPERTIES PART II ERT 207 ANALYTICAL CHEMISTRY SEMESTER 1, ACADEMIC SESSION 2015/16
Overview: CONFIDENCE INTERVALS; STUDENT’S T / T STATISTICS; STATISTICS AIDS TO HYPOTHESIS TESTING; COMPARISON OF TWO EXPERIMENTAL MEANS; ERRORS IN HYPOTHESIS TESTING; COMPARISON OF VARIANCES; ANALYSIS OF VARIANCE bblee@unimap
CONFIDENCE INTERVALS The confidence interval for the mean is the range of values within which the population mean (μ) is expected to lie with a certain probability. Sometimes the limits of the interval are called confidence limits. The size of the confidence interval, which is computed from the sample standard deviation, depends on how well the sample standard deviation (s) estimates the population standard deviation (σ).
CONFIDENCE INTERVALS Figure 1 shows a series of five normal error curves. In each, the relative frequency is plotted as a function of the quantity z, which is the deviation from the mean divided by the population standard deviation. The numbers within the shaded areas are the percentage of the total area under the curve that is included within these values of z.
CONFIDENCE INTERVALS Figure 1: Areas under a Gaussian curve for various values of ±z, panels (a) to (e).
CONFIDENCE INTERVALS From Figure 1 (a): 50% of the area under any Gaussian curve lies between -0.67σ and +0.67σ, so we may assume that 50 times out of 100 the true mean μ will fall in the interval x ± 0.67σ. Confidence level: the probability that the true mean lies within a certain interval. It is often expressed as a percentage.
CONFIDENCE INTERVALS Figure 1 (a): The confidence level is 50% and the confidence interval is from -0.67σ to +0.67σ. Significance level: the probability that a result is outside the confidence interval. A general expression for the confidence interval (CI) of the true mean based on measuring a single value x: CI for μ = x ± zσ
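As a rough numerical check of the areas quoted for Figure 1, the short Python sketch below computes the area under a standard normal curve between -z and +z with the standard library's `statistics.NormalDist`. The function names `confidence_level` and `significance_level` are illustrative and do not come from the slides.

```python
from statistics import NormalDist


def confidence_level(z: float) -> float:
    """Fraction of the area under the standard normal curve between -z and +z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)


def significance_level(z: float) -> float:
    """Fraction of the area outside -z..+z (both tails combined)."""
    return 1.0 - confidence_level(z)


# z values of the kind tabulated for common confidence levels
for z in (0.67, 1.28, 1.64, 1.96, 2.58):
    print(f"z = {z:.2f}: confidence level = {100 * confidence_level(z):.1f}%")
```

The printed percentages should land close to the 50%, 80%, 90%, 95% and 99% levels used in this section, up to rounding of z.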
CONFIDENCE INTERVALS For the experimental mean of N measurements: CI for μ = $\bar{x} \pm z\sigma/\sqrt{N}$. Table 1 shows the values of z at various confidence levels. The relative size of the confidence interval as a function of N is shown in Table 2.
CONFIDENCE INTERVALS Table 1: Table 2:
CONFIDENCE INTERVALS EXAMPLE 1: Determine the 80% and 95% confidence intervals for: (a) a single data entry of 1108 mg/L glucose; (b) a mean value of 1100.3 mg/L for one week of data (one measurement is recorded per day). Assume that in each part, s = 19 mg/L is a good estimate of σ.
STUDENT’S T / T STATISTICS The t statistic is often called Student’s t. To account for the variability of s, we use the important statistical parameter t, which is defined in exactly the same way as z except that s is substituted for σ. For a single measurement with result x, $t = (x - \mu)/s$. For the mean of N measurements, $t = (\bar{x} - \mu)/(s/\sqrt{N})$.
STUDENT’S T / T STATISTICS The confidence interval for the mean of N replicate measurements can be calculated from t: CI for μ = $\bar{x} \pm ts/\sqrt{N}$
STUDENT’S T / T STATISTICS Table 3:
STUDENT’S T / T STATISTICS Example 2: A clinical chemist obtained the following data for the alcohol content of a sample of blood: % C2H5OH: 0.084, 0.089, and 0.079. Calculate the 95% confidence interval for the mean assuming that (a) the three results obtained are the only indication of the precision of the method; (b) from previous experience on hundreds of samples, we know that the standard deviation of the method, s = 0.005% C2H5OH, is a good estimate of σ.
STATISTICS AIDS TO HYPOTHESIS TESTING Hypothesis testing is the basis for many decisions made in science and engineering. To explain an observation, a hypothetical model is proposed and tested experimentally. The hypothesis tests that we describe are used to determine if the results from these experiments support the model. If agreement is found, the hypothetical model serves as the basis for further experiments. When the hypothesis is supported by sufficient experimental data, it becomes recognized as a useful theory until such time as data are obtained that disprove it.
STATISTICS AIDS TO HYPOTHESIS TESTING A null hypothesis postulates that two or more observed quantities are the same. Specific examples of hypothesis tests that scientists often use include the comparison of: (a) the mean of an experimental data set with what is believed to be the true value; (b) the mean with a predicted or cutoff (threshold) value; (c) the means or the standard deviations from two or more sets of data.
STATISTICS AIDS TO HYPOTHESIS TESTING Comparing an experimental mean with a known value: a statistical hypothesis test is used to draw conclusions about the population mean (μ) and its nearness to the known value (μ0). There are two contradictory outcomes that we consider in any hypothesis test: the null hypothesis H0, which states that μ = μ0, and the alternative hypothesis Ha, which can take several forms.
STATISTICS AIDS TO HYPOTHESIS TESTING We might reject the null hypothesis in favor of Ha if μ is different from μ0 (μ ≠ μ0). Other alternative hypotheses are μ > μ0 or μ < μ0.
STATISTICS AIDS TO HYPOTHESIS TESTING Suppose we are interested in determining whether the concentration of lead in an industrial wastewater discharge exceeds the maximum permissible amount of 0.05 ppm. Our hypothesis test would be summarized: H0: μ = 0.05 ppm Ha: μ > 0.05 ppm
STATISTICS AIDS TO HYPOTHESIS TESTING Large sample z test: If a large number of results are available so that s is a good estimate of σ, the z test is appropriate. (1) State the null hypothesis: H0: μ = μ0. (2) Form the test statistic: $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{N})$. (3) State the alternative hypothesis Ha and determine the rejection region.
STATISTICS AIDS TO HYPOTHESIS TESTING For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ -zcrit (two-tailed test). For Ha: μ > μ0, reject H0 if z ≥ zcrit (one-tailed test). For Ha: μ < μ0, reject H0 if z ≤ -zcrit (one-tailed test). Figure 2 (a): There is only a 5% probability that random error will lead to a value of z ≥ zcrit or z ≤ -zcrit. The significance level overall is α = 0.05. From Table 1, the critical value of z is 1.96.
STATISTICS AIDS TO HYPOTHESIS TESTING Figure 2: Rejection regions for the 95% confidence level (a) Two-tailed test for Ha: μ ≠ μ0.
STATISTICS AIDS TO HYPOTHESIS TESTING Figure 2: Rejection regions for the 95% confidence level (c) One-tailed test for Ha: μ < μ0. Figure 2 (b): For a one-tailed test at the 95% confidence level, the probability that z exceeds zcrit is 5%. Because Table 1 is two-sided, zcrit is read at a total probability in both tails of 10% (overall significance level α = 0.10), which gives a critical value of 1.64.
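The critical values quoted for the two-tailed and one-tailed rejection regions can be recovered from the inverse of the normal cumulative distribution. The sketch below is one way to do that in Python with the standard library; the helper name `z_critical` and its `two_tailed` flag are assumptions made for illustration.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool = True) -> float:
    """Critical z for a rejection region whose total tail area is alpha.

    For a two-tailed test alpha is split equally between both tails;
    for a one-tailed test all of alpha sits in a single tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


print(f"two-tailed, alpha = 0.05: zcrit = {z_critical(0.05):.2f}")
print(f"one-tailed, alpha = 0.05: zcrit = {z_critical(0.05, two_tailed=False):.2f}")
print(f"two-tailed, alpha = 0.01: zcrit = {z_critical(0.01):.2f}")
```

Splitting α between two tails is what makes the two-tailed critical value larger than the one-tailed value at the same confidence level.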
STATISTICS AIDS TO HYPOTHESIS TESTING Example 3: A class of 30 students determined the activation energy of a chemical reaction to be 116 kJ/mol (mean value) with a standard deviation of 22 kJ/mol. Are the data in agreement with the literature value of 129 kJ/mol at (a) the 95% confidence level and (b) the 99% confidence level? (c) Estimate the probability of obtaining a mean equal to the student value.
STATISTICS AIDS TO HYPOTHESIS TESTING Small sample t test: For a small number of results, we use a similar procedure to the z test except that the test statistic is the t statistic. The null hypothesis is H0: μ = μ0, where μ0 is a specific value of μ such as an accepted value, a theoretical value or a threshold value. (1) State the null hypothesis: H0: μ = μ0. (2) Form the test statistic: $t = (\bar{x} - \mu_0)/(s/\sqrt{N})$. (3) State the alternative hypothesis Ha and determine the rejection region.
STATISTICS AIDS TO HYPOTHESIS TESTING For Ha: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ -tcrit (two-tailed test). For Ha: μ > μ0, reject H0 if t ≥ tcrit (one-tailed test). For Ha: μ < μ0, reject H0 if t ≤ -tcrit (one-tailed test). Figure 3: Curve A: If the analytical method had no systematic error, or bias, random errors would give the frequency distribution shown by curve A.
STATISTICS AIDS TO HYPOTHESIS TESTING Figure 3: Curve B: The frequency distribution of results by a method that could have a significant bias due to a systematic error. Figure 3: Illustration of systematic error in an analytical method.
STATISTICS AIDS TO HYPOTHESIS TESTING Example 4: A new procedure for the rapid determination of sulfur in kerosenes was tested on a sample known from its method of preparation to contain 0.123% S (μ0=0.123%S). The results for %S were 0.112, 0.118, 0.115 and 0.119. Do the data indicate that there is a bias in the method at the 95% confidence level?
COMPARISON OF TWO EXPERIMENTAL MEANS Frequently scientists must judge whether a difference in the means of two sets of data is real or the result of random error. The t test for differences in means: The test statistic t can be found from $t = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}\sqrt{(N_1 + N_2)/(N_1 N_2)}}$, where $s_{pooled}$ is the pooled standard deviation of the two data sets.
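A common form of the t test for two means pools the standard deviations of the two data sets, which assumes both sets share the same population standard deviation. The Python sketch below follows that pooled form; the function name `pooled_t` and the two small data sets are made-up placeholders, not values from the lecture.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_1, sample_2):
    """Return (t, degrees of freedom) for the pooled two-sample t test."""
    n1, n2 = len(sample_1), len(sample_2)
    dof = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_variance = ((n1 - 1) * variance(sample_1)
                       + (n2 - 1) * variance(sample_2)) / dof
    s_pooled = sqrt(pooled_variance)
    t = (mean(sample_1) - mean(sample_2)) / (s_pooled * sqrt((n1 + n2) / (n1 * n2)))
    return t, dof


# Hypothetical replicate results, for illustration only
set_a = [1.0, 2.0, 3.0]
set_b = [2.0, 3.0, 4.0]
t, dof = pooled_t(set_a, set_b)
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

The computed t is then compared with the tabulated critical t for N1 + N2 - 2 degrees of freedom at the chosen confidence level.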
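COMPARISON OF TWO EXPERIMENTAL MEANS If there is good reason to believe that the standard deviations of the two data sets differ, a two-sample t test that does not assume equal standard deviations must be used. Paired data: Scientists and engineers often make use of pairs of measurements on the same sample in order to minimize sources of variability that are not of interest. The test statistic value: $t = \frac{\bar{d} - \Delta_0}{s_d/\sqrt{N}}$, where $\bar{d}$ is the average difference, $\Delta_0$ is the specific value of the difference being tested (often 0) and $s_d$ is the standard deviation of the differences.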
COMPARISON OF TWO EXPERIMENTAL MEANS Example 5: A new automated procedure for determining glucose in serum (Method A) is to be compared to the established method (Method B). Both methods are performed on serum from the same six patients in order to eliminate patient-to-patient variability. Do the following results confirm a difference in the two methods at the 95% confidence level?
ERRORS IN HYPOTHESIS TESTING Type I error: A type I error occurs when H0 is rejected although it is actually true. In the usual statistical convention, a type I error is called a false positive. Type II error: A type II error occurs when H0 is accepted although it is actually false. It is usually termed a false negative.
ERRORS IN HYPOTHESIS TESTING The consequences of making errors in hypothesis testing are often compared to the errors made in judicial procedures. Convicting an innocent person is usually considered a more serious error than setting a guilty person free. If we make it less likely that an innocent person gets convicted, we make it more likely that a guilty person goes free. It is important when thinking about errors in hypothesis testing to determine the consequences of making a type I or type II error.
ERRORS IN HYPOTHESIS TESTING As a general rule of thumb, the largest α that is tolerable for the situation should be used. This ensures the smallest type II error while keeping the type I error within acceptable limits. For many cases in analytical chemistry, an α value of 0.05 (95% confidence level) provides an acceptable compromise.
COMPARISON OF VARIANCES At times, there is a need to compare the variances (or standard deviations) of two data sets. The normal t test requires that the standard deviations of the data sets being compared are equal. F test: A simple statistical test that can be used to test this assumption, provided that the populations follow the normal (Gaussian) distribution.
COMPARISON OF VARIANCES The F test is based on the null hypothesis that the two population variances under consideration are equal. The test statistic F, which is defined as the ratio of the two sample variances ($F = s_1^2/s_2^2$), is calculated and compared with the critical value of F at the desired significance level. The null hypothesis is rejected if the test statistic differs too much from unity.
COMPARISON OF VARIANCES The F test is also used in comparing more than two means and in linear regression analysis. Critical values of F at the 0.05 significance level are shown in Table 4. Table 4:
COMPARISON OF VARIANCES Two degrees of freedom are given, one associated with the numerator and the other with denominator. The F-test can be used in either a one-tailed mode or in a two-tailed mode.
COMPARISON OF VARIANCES Example 6 A standard method for the determination of the carbon monoxide (CO) level in gaseous mixtures is known from many hundreds of measurements to have a standard deviation of 0.21 ppm CO. A modification of the method yields a value for s of 0.15 ppm CO for a pooled data set with 12 degrees of freedom. A second modification, also based on 12 degrees of freedom, has a standard deviation of 0.12 ppm CO. Is either modification significantly more precise than the original?
ANALYSIS OF VARIANCE ANOVA: the methods used for multiple comparisons fall under the general category of analysis of variance. After ANOVA indicates a potential difference, multiple comparison procedures can be used to identify which specific population means differ from the others. Experimental design methods take advantage of ANOVA in planning and performing experiments.
ANALYSIS OF VARIANCE ANOVA detects differences in several population means by comparing the variances. The following are typical applications of ANOVA: Is there a difference in the results of five analysts determining calcium by a volumetric method? Will four different solvent compositions have differing influences on the yield of a chemical synthesis? Are the results of manganese determinations by three different analytical methods different? Is there any difference in the fluorescence of a complex ion at six different values of pH?
ANALYSIS OF VARIANCE Figure 4 illustrates a single-factor, or one-way, ANOVA. The basic principle of ANOVA is to compare the variation between the different factor levels (groups) to that within factor levels. When the groups are the different analysts, this is a comparison of the variation between analysts to the within-analyst variation (Figure 5).
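The between-group versus within-group comparison can be sketched in a few lines of Python. The function below is a minimal illustration of a one-way ANOVA F ratio under the usual definitions (mean square between groups divided by mean square within groups); the name `one_way_anova_f` and the three small groups are hypothetical and are not the calcium data from the lecture.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, dof_between, dof_within) for a list of groups of results."""
    k = len(groups)                          # number of factor levels (groups)
    n_total = sum(len(g) for g in groups)    # total number of measurements
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means about the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual results about their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    dof_between = k - 1
    dof_within = n_total - k
    f_ratio = (ss_between / dof_between) / (ss_within / dof_within)
    return f_ratio, dof_between, dof_within


# Hypothetical results from three analysts, for illustration only
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_ratio, dof_b, dof_w = one_way_anova_f(example_groups)
print(f"F = {f_ratio:.2f} with {dof_b} and {dof_w} degrees of freedom")
```

A large F means the group means are spread out more than the scatter within each group would explain, which is the situation in which H0 (all means equal) is rejected.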
ANALYSIS OF VARIANCE Figure 4
ANALYSIS OF VARIANCE Figure 5
ANALYSIS OF VARIANCE ANOVA Table:
ANALYSIS OF VARIANCE Example 7: Five analysts determined calcium by a volumetric method and obtained the amount (in mmol Ca) shown in the table below. Do the means differ significantly at the 95% confidence level?
EXAMPLE 1 (a) From Table 1, z = 1.28 and 1.96 for the 80% and 95% confidence levels. 80% CI = 1108 ± 1.28 × 19 = 1108 ± 24.3 mg/L. 95% CI = 1108 ± 1.96 × 19 = 1108 ± 37.2 mg/L.
EXAMPLE 1 It can be concluded that it is 80% probable that the population mean (μ) lies in the interval 1083.7 to 1132.3 mg/L glucose. The probability is 95% that μ lies in the interval between 1070.8 and 1145.2 mg/L.
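EXAMPLE 1 (b) For the seven measurements, 80% CI = $1100.3 \pm \frac{1.28 \times 19}{\sqrt{7}}$ = 1100.3 ± 9.2 mg/L and 95% CI = $1100.3 \pm \frac{1.96 \times 19}{\sqrt{7}}$ = 1100.3 ± 14.1 mg/L.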
EXAMPLE 1 From the experimental mean ($\bar{x}$ = 1100.3 mg/L), it can be concluded that there is an 80% chance that μ is located in the interval between 1091.1 and 1109.5 mg/L glucose and a 95% chance that it lies between 1086.2 and 1114.4 mg/L glucose. Note: the intervals are considerably smaller when we use the experimental mean instead of a single value.
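The arithmetic of Example 1 can be checked with a small Python sketch that evaluates x ± zσ/√N, assuming s = 19 mg/L stands in for σ as the example states. The function name `z_confidence_interval` is illustrative only; the z values are the tabulated 1.28 and 1.96.

```python
from math import sqrt


def z_confidence_interval(center, sigma, n, z):
    """Return (lower, upper, half_width) for center ± z * sigma / sqrt(n)."""
    half_width = z * sigma / sqrt(n)
    return center - half_width, center + half_width, half_width


SIGMA = 19.0  # mg/L, taken as a good estimate of sigma in Example 1

for label, z in (("80%", 1.28), ("95%", 1.96)):
    # (a) a single measurement of 1108 mg/L, so n = 1
    lo_a, hi_a, hw_a = z_confidence_interval(1108.0, SIGMA, 1, z)
    # (b) the mean of seven daily measurements, 1100.3 mg/L
    lo_b, hi_b, hw_b = z_confidence_interval(1100.3, SIGMA, 7, z)
    print(f"{label} single value: {lo_a:.1f} to {hi_a:.1f} mg/L (±{hw_a:.1f})")
    print(f"{label} mean of 7:    {lo_b:.1f} to {hi_b:.1f} mg/L (±{hw_b:.1f})")
```

The half-width shrinks by a factor of √7 when the mean of seven results replaces a single result, which is why the intervals in part (b) are narrower.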
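EXAMPLE 2 (a) $\sum x_i = 0.084 + 0.089 + 0.079 = 0.252$ and $\sum x_i^2 = 0.007056 + 0.007921 + 0.006241 = 0.021218$, so $s = \sqrt{\frac{0.021218 - (0.252)^2/3}{3 - 1}} = 0.0050\%$ C2H5OH. In this case, $\bar{x} = 0.252/3 = 0.084$.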
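EXAMPLE 2 t = 4.30 for two degrees of freedom and the 95% confidence level, so 95% CI = $0.084 \pm \frac{4.30 \times 0.0050}{\sqrt{3}}$ = 0.084 ± 0.012% C2H5OH. (b) Because s = 0.0050% is a good estimate of σ, we can use z: 95% CI = $0.084 \pm \frac{1.96 \times 0.0050}{\sqrt{3}}$ = 0.084 ± 0.006% C2H5OH.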
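Example 2 can be reproduced with the following Python sketch, which computes the sample mean and standard deviation from the three results and then applies x̄ ± (critical value)·s/√N. The critical values 4.30 (t, two degrees of freedom) and 1.96 (z) are passed in as quoted in the example rather than computed; the function name `mean_confidence_interval` is an assumption for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(data, critical_value, s=None):
    """Return (x_bar, s, half_width) for x_bar ± critical_value * s / sqrt(N).

    If s is not supplied, the sample standard deviation of data is used.
    """
    n = len(data)
    x_bar = mean(data)
    if s is None:
        s = stdev(data)  # sample standard deviation, N - 1 in the denominator
    half_width = critical_value * s / sqrt(n)
    return x_bar, s, half_width


ethanol = [0.084, 0.089, 0.079]  # % C2H5OH, from Example 2

# (a) s estimated from the three results, t = 4.30 for 2 degrees of freedom
x_bar, s, hw_t = mean_confidence_interval(ethanol, 4.30)
print(f"(a) mean = {x_bar:.3f}, s = {s:.4f}, 95% CI half-width = {hw_t:.3f}")

# (b) s = 0.005 taken as a good estimate of sigma, so z = 1.96 is used
_, _, hw_z = mean_confidence_interval(ethanol, 1.96, s=0.005)
print(f"(b) 95% CI half-width = {hw_z:.3f}")
```

The interval in (a) is about twice as wide as in (b) because t must allow for the uncertainty in an s based on only three results.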
EXAMPLE 3 μ0 is the literature value of 129 kJ/mol, so the null hypothesis is H0: μ = 129 kJ/mol. The alternative hypothesis is Ha: μ ≠ 129 kJ/mol. This is a two-tailed test. From Table 1, zcrit = 1.96 for the 95% confidence level and zcrit = 2.58 for the 99% confidence level. The test statistic is calculated as: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{N}} = \frac{116 - 129}{22/\sqrt{30}} = -3.24$
EXAMPLE 3 Since z ≤ -1.96, we reject the null hypothesis at the 95% confidence level. Since z ≤ -2.58, we also reject H0 at the 99% confidence level. To estimate the probability of obtaining a mean value of $\bar{x}$ = 116 kJ/mol, we find the probability of obtaining a z value of magnitude 3.24. From Table 1, the probability of obtaining a z value this large because of random error is only about 0.2%.
EXAMPLE 3 All of these results lead us to conclude that the student mean is actually different from the literature value and not just the result of random error.
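The z test of Example 3 can be evaluated directly from the numbers given in the problem. The Python sketch below forms z = (x̄ - μ0)/(σ/√N) and estimates the two-tailed probability from the standard normal distribution, assuming s = 22 kJ/mol is a good estimate of σ for N = 30; the function name `z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-tailed probability of a |z| at least this large)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


z, p = z_test(sample_mean=116.0, mu0=129.0, sigma=22.0, n=30)
print(f"z = {z:.2f}, two-tailed probability = {100 * p:.2f}%")

# Two-tailed decisions at the tabulated critical values
for level, z_crit in (("95%", 1.96), ("99%", 2.58)):
    decision = "reject H0" if abs(z) >= z_crit else "cannot reject H0"
    print(f"{level} confidence level: {decision}")
```

The probability computed this way is exact for the normal curve, whereas a value read from a short z table is only approximate.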
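EXAMPLE 4 The null hypothesis is H0: μ = 0.123% S. The alternative hypothesis is Ha: μ ≠ 0.123% S. $\sum x_i = 0.112 + 0.118 + 0.115 + 0.119 = 0.464$, so $\bar{x} = 0.464/4 = 0.116\%$ S. $\sum x_i^2 = 0.012544 + 0.013924 + 0.013225 + 0.014161 = 0.053854$, so $s = \sqrt{\frac{0.053854 - (0.464)^2/4}{4 - 1}} = 0.0032\%$ S.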
EXAMPLE 4 The test statistic can be calculated as $t = \frac{\bar{x} - \mu_0}{s/\sqrt{N}} = \frac{0.116 - 0.123}{0.0032/\sqrt{4}} = -4.375$. From Table 3, the critical value of t for 3 degrees of freedom and the 95% confidence level is 3.18. Since t ≤ -3.18, we conclude that there is a significant difference at the 95% confidence level and thus bias in the method.
EXAMPLE 4 If we were to do this test at the 99% confidence level, tcrit = 5.84. Since t = -4.375 is greater than -5.84, we could not reject the null hypothesis at the 99% confidence level, and we would conclude that no significant difference between the experimental and the accepted values has been demonstrated.
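The bias test of Example 4 can be sketched in Python as follows. The code computes t = (x̄ - μ0)/(s/√N) from the four sulfur results and compares |t| with the critical values 3.18 and 5.84 quoted in the example. Because it carries s unrounded, the t it prints differs slightly from a hand calculation that rounds s to 0.0032; the decisions are the same. The names `one_sample_t` and `reject_two_tailed` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """t statistic for comparing the mean of data with a known value mu0."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))


def reject_two_tailed(t, t_crit):
    """True if H0 is rejected in a two-tailed test at the given critical t."""
    return abs(t) >= t_crit


sulfur = [0.112, 0.118, 0.115, 0.119]  # % S, from Example 4
t = one_sample_t(sulfur, mu0=0.123)
print(f"t = {t:.3f}")

# Critical values for 3 degrees of freedom, as quoted in the example
for level, t_crit in (("95%", 3.18), ("99%", 5.84)):
    outcome = "reject H0" if reject_two_tailed(t, t_crit) else "cannot reject H0"
    print(f"{level} confidence level (tcrit = {t_crit}): {outcome}")
```

The same data lead to different decisions at the two confidence levels, which is why the level must be chosen before the test is run.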
EXAMPLE 5 If μd is the true average difference between the methods, we want to test the null hypothesis H0: μd = 0 against Ha: μd ≠ 0. The t-test statistic is $t = \frac{\bar{d} - 0}{s_d/\sqrt{N}}$. N = 6, $\sum d_i$ = 16 + 9 + 25 + 5 + 22 + 11 = 88, $\sum d_i^2$ = 1592, $\bar{d}$ = 88/6 = 14.67.
EXAMPLE 5 The standard deviation of the differences: $s_d = \sqrt{\frac{1592 - (88)^2/6}{6 - 1}} = 7.76$. The t statistic: $t = \frac{14.67}{7.76/\sqrt{6}} = 4.628$. The critical value of t is 2.57 for the 95% confidence level and 5 degrees of freedom.
EXAMPLE 5 Since t>tcrit, we reject the null hypothesis and conclude that the two methods give different results.
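The paired t test of Example 5 needs only the six differences between the two methods, which the worked solution lists as 16, 9, 25, 5, 22 and 11. The Python sketch below computes the mean difference, the standard deviation of the differences and the t statistic; the function name `paired_t` and the `delta0` parameter (the hypothesized difference, zero here) are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(differences, delta0=0.0):
    """Return (d_bar, s_d, t) for a paired t test on per-sample differences."""
    n = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)  # sample standard deviation of the differences
    t = (d_bar - delta0) / (s_d / sqrt(n))
    return d_bar, s_d, t


differences = [16, 9, 25, 5, 22, 11]  # Method A minus Method B, from Example 5
d_bar, s_d, t = paired_t(differences)
print(f"mean difference = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.3f}")

T_CRIT = 2.57  # 95% confidence level, 5 degrees of freedom, as quoted
print("reject H0" if abs(t) >= T_CRIT else "cannot reject H0")
```

Working with differences removes the patient-to-patient variation, so only the method-to-method variation remains in s_d.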
EXAMPLE 6 Null hypothesis H0: $\sigma_{std}^2 = \sigma_1^2$, where $\sigma_{std}^2$ is the variance of the standard method and $\sigma_1^2$ is the variance of the modified method. The alternative hypothesis is one-tailed, Ha: $\sigma_1^2 < \sigma_{std}^2$. Because an improvement is claimed, the variances of the modifications are placed in the denominator.
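EXAMPLE 6 For the 1st modification: $F_1 = \frac{s_{std}^2}{s_1^2} = \frac{(0.21)^2}{(0.15)^2} = 1.96$. For the 2nd modification: $F_2 = \frac{s_{std}^2}{s_2^2} = \frac{(0.21)^2}{(0.12)^2} = 3.06$. For the standard procedure, sstd is a good estimate of σ, and the number of degrees of freedom for the numerator can be taken as infinite.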
EXAMPLE 6 The one-tailed critical value of F at the 95% confidence level for ∞ and 12 degrees of freedom is 2.30. Since F1 < 2.30, we cannot reject the null hypothesis for the first modification, and we conclude that there is no improvement in precision. Since F2 > 2.30, we reject the null hypothesis and conclude that the second modification does appear to give better precision at the 95% confidence level. Is the precision of the 2nd modification significantly better than that of the 1st?
EXAMPLE 6 Comparing the two modifications, $F = \frac{s_1^2}{s_2^2} = \frac{(0.15)^2}{(0.12)^2} = 1.56$. In this case, Fcrit = 2.69. Since F < 2.69, the F test dictates that we must accept H0 and conclude that the two modified methods give equivalent precision.
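The three F ratios in Example 6 can be verified with a few lines of Python. The sketch below simply squares the standard deviations and divides, with the larger (claimed worse) precision in the numerator; the critical values 2.30 and 2.69 are those quoted in the example, and the function name `f_statistic` is illustrative.

```python
def f_statistic(s_numerator, s_denominator):
    """F ratio of two variances, given the two standard deviations."""
    return s_numerator ** 2 / s_denominator ** 2


S_STD, S_MOD1, S_MOD2 = 0.21, 0.15, 0.12  # ppm CO, from Example 6

comparisons = (
    ("standard vs 1st modification", S_STD, S_MOD1, 2.30),
    ("standard vs 2nd modification", S_STD, S_MOD2, 2.30),
    ("1st vs 2nd modification", S_MOD1, S_MOD2, 2.69),
)
for label, s_num, s_den, f_crit in comparisons:
    f = f_statistic(s_num, s_den)
    outcome = "reject H0" if f > f_crit else "cannot reject H0"
    print(f"{label}: F = {f:.2f}, Fcrit = {f_crit}, {outcome}")
```

Note that the F test compares variances, so a modest-looking difference in standard deviations (0.21 against 0.12 ppm) becomes a ratio of about 3.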
EXAMPLE 7 We obtain the mean and standard deviation for each analyst. The mean for analyst 1 is $\bar{x}_1$ = 10.5 mmol Ca. The remaining means are obtained in the same manner:
EXAMPLE 7 The results are summarized as follows. The grand mean is found:
EXAMPLE 7 From the F table, the critical value of F at the 95% confidence level for 4 and 10 degrees of freedom is 3.48. Since F exceeds 3.48, we reject H0 at the 95% confidence level and conclude that there is a significant difference among the analysts.