1 DATA ANALYSIS: ERRORS IN CHEMICAL ANALYSIS
Vague phrases used to describe the results of an analysis, such as "pretty sure", "very sure", "most likely" and "improbable", are replaced by mathematical statistical tests.
Is there such a thing as an "error-free analysis"? It is impossible to eliminate errors; they can only be minimized, and results can only be approximated to an acceptable precision. How reliable are our data?
To overcome errors:
Carry out replicate measurements.
Analyse accurately known standards.
Perform statistical tests on the data.

2 STATISTICS
Mean/Average: x̄ = Σxi / N, where xi = individual values of x and N = number of replicate measurements.
Median: the middle value when the data are arranged in ascending order and N is odd; the average of the two middle values when N is even.
Range: the difference between the highest and the lowest result.
Errors:
Absolute error = measured value − true value
Relative error = absolute error / true value
Percent relative error = relative error × 100%
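These definitions can be sketched with Python's standard statistics module; the sample values below are the replicate calcium results used in a later slide:

```python
import statistics

# Replicate measurements (values from the calcium-in-rock example)
data = [14.35, 14.41, 14.40, 14.32, 14.37]

mean = statistics.mean(data)          # sum of x_i divided by N
median = statistics.median(data)      # middle value of the sorted data
                                      # (average of the two middle values if N is even)
value_range = max(data) - min(data)   # highest result minus lowest result
```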

3 Standard Deviation (SD), s
A measure of the precision of a set of data.
Small sample: s = √[ Σ(xi − x̄)² / (N − 1) ], where N − 1 = degrees of freedom.
Population (N → ∞): σ = √[ Σ(xi − μ)² / N ], where N = number of replicates.
The smaller the SD, the more precise the analysis.

4 Variance, V
The square of the standard deviation. For a sample, V = s²; for a population, V = σ².
Relative Standard Deviation (RSD) and Coefficient of Variation (CV)
RSD = s / x̄; CV = (s / x̄) × 100%.
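A sketch of these spread measures in Python; note that the standard library distinguishes the sample SD (divide by N − 1) from the population SD (divide by N):

```python
import statistics

data = [14.35, 14.41, 14.40, 14.32, 14.37]
mean = statistics.mean(data)

s = statistics.stdev(data)       # sample SD: divides by N - 1 (degrees of freedom)
sigma = statistics.pstdev(data)  # population SD: divides by N
variance = s ** 2                # sample variance, V = s^2
rsd = s / mean                   # relative standard deviation
cv = 100 * rsd                   # coefficient of variation, in percent
```

The sample SD is always slightly larger than the population SD for the same data, since it divides by N − 1 rather than N.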

5 PRECISION
Relates to the reproducibility or repeatability of a result: how similar are values obtained in exactly the same way? Measured as deviation from the mean.
ACCURACY
A measure of agreement between the experimental mean and the true value (which may not be known!).

6 DIFFERENCE BETWEEN ACCURACY AND PRECISION Good precision does not guarantee accuracy.

7 [Figure: four targets illustrating the combinations of precision and accuracy — high precision/high accuracy, high precision/low accuracy, low precision/low accuracy, low precision/high accuracy.]

8 TYPES OF ERROR IN EXPERIMENTAL DATA
Gross Errors: serious, but occur very seldom in analysis. Usually obvious, giving outlier readings. Detectable by carrying out sufficient replicate measurements; the experiments must be repeated. E.g. a faulty instrument or a contaminated reagent.
Random (Indeterminate) Errors: data scattered approximately symmetrically about a mean value. Affect precision; can only be controlled, not eliminated. Dealt with statistically. E.g. uncontrolled physical and chemical variables.
Systematic (Determinate) Errors: determinable, and can in principle be avoided or corrected. Several possible sources. Readings are all too high or all too low, causing bias in the technique; can be either positive or negative. Affect accuracy.

9 SOURCES OF SYSTEMATIC ERROR
Instrument Error: needs frequent calibration, both for apparatus such as volumetric flasks and burettes and for electronic devices such as spectrometers. Examples: fluctuation in the power supply; temperature changes.
Method Error: due to inadequacies in the physical or chemical behaviour of reagents or reactions (e.g. slow or incomplete reactions). Difficult to detect and the most serious systematic error. Example: the small excess of reagent required to make an indicator undergo the colour change that signals the completion of a reaction.
Personal Error: sources include physical handicap, prejudice and incompetence. Examples: insensitivity to colour changes; a tendency to estimate scale readings so as to improve precision; a preconceived idea of the "true" value.

10 SYSTEMATIC ERRORS
Systematic errors can be:
Constant (e.g. an error in a burette reading — less important for larger readings).
Proportional (e.g. the presence of a given proportion of an interfering impurity in the sample; equally significant for all values of the measurement).
Minimising errors:
Minimise instrument errors by careful recalibration and good maintenance of equipment.
Minimise personal errors by care and self-discipline.
Method errors are the most difficult, since the "true" value may not be known. Three approaches to minimise them:
Analysis of certified standards (SRM).
Use of two or more independent methods.
Analysis of blanks.

11 SIGNIFICANT FIGURES
The minimum number of digits needed to write a value in scientific notation without loss of accuracy.
Zero is significant only when:
It occurs in the middle of a number: 401 has 3 significant figures; 6.0015 has 5.
It is the last digit to the right of the decimal point: 3.00 has 3 significant figures; 6.00 × 10² has 3; 0.0500 has 3.
Addition and subtraction: report the same number of decimal places as the number with the fewest decimal places. E.g. 12.2 (1 dp) + 0.365 (3 dp) + 1.04 (2 dp) = 13.605 → 13.6 (1 dp).
Multiplication and division: report the same number of significant figures as the number with the fewest significant figures.
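The addition rule can be sketched in Python; this minimal version just rounds the sum to the fewest decimal places of any term (full significant-figure handling for multiplication would need more logic):

```python
# Addition/subtraction: the answer keeps the fewest decimal places of any term.
values = [12.2, 0.365, 1.04]
decimals = [1, 3, 2]                    # decimal places of each term

total = sum(values)                     # 13.605 before rounding
reported = round(total, min(decimals))  # keep 1 decimal place -> 13.6
```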

12 USE OF STATISTICS IN DATA EVALUATION
Defining the interval of values around an experimental mean (x̄) within which the population mean (μ) can be expected to lie with a given probability. These intervals are called confidence limits.
Determining the number of replicates required to assure (at a desired probability) that an experimental mean (x̄) falls within a predicted interval of values around the population mean (μ).
Estimating the probability that the experimental mean (x̄) and true value (μ) are different, or that two experimental means are different (t test).
Estimating the probability that data from two experiments differ in precision (F test).
Deciding when to accept or reject outliers among replicates (Q test).
Treating calibration data.
Quality control.

13 CONFIDENCE LIMITS AND CONFIDENCE INTERVAL
Confidence Limits: the interval around the mean that probably contains μ; the extreme values a and b in a < μ < b.
Confidence Interval: the magnitude of the confidence limits, x̄ ± CL.
Confidence Level: fixes the probability that μ lies within the confidence limits, e.g. 99.7%, 99%, 95%, 90%, 80%, 68%, 50%.

14 CONFIDENCE LIMITS (CL)
Since the exact value of the population mean, μ, cannot be determined, statistical theory is used to set limits around the measured mean, x̄, that probably contain μ.
CL only have meaning when the measured standard deviation, s, is a good approximation of the population standard deviation, σ, and there is no bias in the measurement.
CL when σ is known (population): CL = x̄ ± zσ/√N, where N = number of measurements.
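A minimal sketch of this formula, with z taken from Table 1 and illustrative mean, σ and N values:

```python
import math

def confidence_limits(mean, sigma, n, z):
    """CL when sigma is known: mean +/- z * sigma / sqrt(N)."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# 95% confidence level (z = 1.96 from Table 1)
low, high = confidence_limits(14.37, 0.037, 5, 1.96)
```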

15 Values for z at various confidence levels (Table 1):
Confidence Level, % | z
50   | 0.67
68   | 1.00
80   | 1.29
90   | 1.64
95   | 1.96
96   | 2.00
99   | 2.58
99.7 | 3.00
99.9 | 3.29

16 Examples: e.g. at 95% confidence, CL = x̄ ± 1.96σ/√N; at 99% confidence, CL = x̄ ± 2.58σ/√N.

17 CL for small data sets (N < 20), σ not known: CL = x̄ ± ts/√N.
Values of t depend on the degrees of freedom, (N − 1), and the confidence level (from the t table). t is also known as "Student's t" and will be used in hypothesis tests.

18 VALUES OF t AT VARIOUS CONFIDENCE LEVELS
Degrees of Freedom | 90% | 95% | 99%
1  | 6.31 | 12.70 | 63.66
2  | 2.92 | 4.30  | 9.92
3  | 2.35 | 3.18  | 5.84
4  | 2.13 | 2.78  | 4.60
5  | 2.02 | 2.57  | 4.03
6  | 1.94 | 2.45  | 3.71
7  | 1.90 | 2.36  | 3.50
8  | 1.86 | 2.31  | 3.36
9  | 1.83 | 2.26  | 3.25
10 | 1.81 | 2.23  | 3.17
11 | 1.80 | 2.20  | 3.11
12 | 1.78 | 2.18  | 3.06
13 | 1.77 | 2.16  | 3.01
14 | 1.76 | 2.14  | 2.98
15 | 1.75 | 2.13  | 2.95
16 | 1.75 | 2.12  | 2.92
17 | 1.74 | 2.11  | 2.90
18 | 1.73 | 2.10  | 2.88
19 | 1.73 | 2.09  | 2.86
20 | 1.72 | 2.09  | 2.85
∞  | 1.64 | 1.96  | 2.58

19 Example: Data for the analysis of calcium in rock: 14.35%, 14.41%, 14.40%, 14.32% and 14.37%. Calculate the confidence interval at the 95% confidence level.
Mean, x̄ = 14.37. Standard deviation, s = 0.037.
From the table, at the 95% confidence level with N − 1 = 4, t = 2.78.
Confidence interval: μ = x̄ ± ts/√N = 14.37 ± (2.78 × 0.037)/√5 = 14.37% ± 0.05.
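This worked example can be checked directly in Python, with t = 2.78 taken from the t table above:

```python
import math
import statistics

data = [14.35, 14.41, 14.40, 14.32, 14.37]  # % Ca in rock
t_95 = 2.78  # t table: 95% confidence, N - 1 = 4 degrees of freedom

mean = statistics.mean(data)
s = statistics.stdev(data)
half_width = t_95 * s / math.sqrt(len(data))  # interval is mean +/- half_width
```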

20 At different confidence levels:
Confidence Level | Confidence Interval
90% | μ = 14.37% ± 0.04
95% | μ = 14.37% ± 0.05
99% | μ = 14.37% ± 0.08
Summary: as the confidence level increases, the confidence interval widens, and the probability that μ lies within the interval increases.

21 Example: Data for the analysis of calcium in rock: 14.35%, 14.41%, 14.40%, 14.32%, 14.45%, 14.50%, 14.25% and 14.37%. Calculate the confidence interval.

22 OTHER USES OF THE CONFIDENCE INTERVAL
To determine the number of replicates needed for the mean to fall within the confidence interval.
To detect systematic error.
TO DETERMINE THE NUMBER OF REPLICATES
If σ is known (s → σ): N = (zσ / CI)². If σ is unknown: N = (ts / CI)² (iterate, since t depends on N − 1).

23 Example: Calculate the number of replicates needed to reduce the confidence interval to 1.5 μg/mL at the 95% confidence level, given s = 2.4 μg/mL.
At the 95% confidence level, t = 1.96, so N = (1.96 × 2.4 / 1.5)² ≈ 9.8, i.e. 10 replicates are needed.

24 TO DETERMINE SYSTEMATIC ERROR
Example: A standard solution gave an absorption reading of 0.470 at a particular wavelength. Ten measurements were made on a sample, with a mean of 0.461 and a standard deviation of 0.003. Show whether a systematic error exists in the measurements at the 95% confidence level.
Answer: At the 95% confidence level, N − 1 = 9, t = 2.26. The confidence limits are 0.461 ± (2.26 × 0.003)/√10, i.e. 0.459 < μ < 0.463.
Does the true mean 0.470 belong to the interval? No, so a systematic error is present.
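A sketch of this check in Python, flagging a systematic error whenever the true value falls outside the confidence limits:

```python
import math

true_value = 0.470             # absorbance of the standard solution
mean, s, n = 0.461, 0.003, 10  # sample measurements
t_95 = 2.26                    # t table: 95% confidence, N - 1 = 9 degrees of freedom

half_width = t_95 * s / math.sqrt(n)
low, high = mean - half_width, mean + half_width
systematic_error = not (low <= true_value <= high)  # true mean outside the interval
```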

25 TESTING A HYPOTHESIS
[Flowchart: observations → hypothesis → model. Is the model valid? If YES, it becomes the basis for further experiments; if NO, it is rejected.]

26 Hypothesis → Data → Theory
Replicate measurements are rarely identical and seldom agree exactly; statistics is used to test whether the disagreement is significant.
NULL HYPOTHESIS
Null hypothesis, Ho: the values of two measured quantities do not differ (significantly) unless it can be proven that the two values are significantly different. "Innocent until proven guilty."
In a hypothesis test, the value of a parameter calculated from the data is compared with the parameter value from the table. If the calculated value is smaller than the table value, the hypothesis is accepted, and vice versa.

27 The null hypothesis can be used to:
Compare μ and x̄.
Compare x̄1 and x̄2 from two sets of data.
Compare s1 and s2, or σ1 and σ2.
Compare s and σ.

28 t TEST
Comparison between the experimental mean and the true mean (x̄ and μ) — i.e. a test for the presence of systematic error.
Steps:
i) If σ is not known: t = |x̄ − μ|√N / s. If σ is known: z = |x̄ − μ|√N / σ.

29 ii) Calculate t or z (t_calc) from the data.
iii) Compare t_calc with t_table.
iv) If t_calc > t_table: reject the null hypothesis (Ho), i.e. the difference (x̄ − μ) is due to systematic error.
v) If t_calc < t_table: accept the null hypothesis (Ho), i.e. the difference (x̄ − μ) is due to random error.

30 Example: The sulphur content of a sample of kerosene was found to be 0.123%. A new method was used on the same sample and the following data were obtained:
%S: 0.112, 0.118, 0.113, 0.119
Show whether a systematic error is present in the new method.
Null hypothesis, Ho: μ = x̄. Here x̄ = 0.116%, μ = 0.123%, s = 0.0032.
t_calc = |x̄ − μ|√N / s = (0.007 × 2) / 0.0032 = 4.38; t_table = 3.18 (95%, N − 1 = 3).
Since t_calc > t_table, Ho is rejected: the two means are significantly different, and thus a systematic error is present.
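A sketch of this t test in Python. Note that computing s directly from the four values gives ≈ 0.0035 rather than the 0.0032 quoted above, so t_calc comes out slightly different; the conclusion (reject Ho) is unchanged:

```python
import math
import statistics

data = [0.112, 0.118, 0.113, 0.119]  # %S by the new method
mu = 0.123                           # accepted value
t_table = 3.18                       # 95% confidence, N - 1 = 3 degrees of freedom

mean = statistics.mean(data)
s = statistics.stdev(data)
t_calc = abs(mean - mu) * math.sqrt(len(data)) / s
reject_h0 = t_calc > t_table         # reject -> systematic error present
```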

31 Other solution: the 95% confidence interval is x̄ ± ts/√N = 0.116 ± (3.18 × 0.0032)/√4 = 0.116 ± 0.0051. Since |x̄ − μ| = 0.007 > 0.0051, Ho is rejected: the difference (x̄ − μ) is significant and there is a systematic error in the measurement.

32 2. Comparing two mean values, x̄1 and x̄2
Normally used to determine whether two samples are identical. The difference between the means of two sets of the same analysis provides information on the similarity of the samples or the existence of random error.
Data: x̄1, x̄2 and s1, s2. Ho: x̄1 = x̄2, i.e. we test whether x̄1 − x̄2 = 0.

33 Assuming s1 ≈ s2, calculate t:
t_calc = |x̄1 − x̄2| / (sp √(1/N1 + 1/N2))
Compare t_calc with t_table; if t_calc < t_table, Ho is accepted.
The pooled standard deviation, sp, is calculated using:
sp = √[ (Σ(xi − x̄1)² + Σ(xj − x̄2)²) / (N1 + N2 − Ns) ]
where N1 and N2 are the numbers of data in sets 1 and 2, and Ns is the number of data sets (here 2).

34 Example: The source of a wine (the vineyard) is identified by determining the alcohol content of different barrels.
Barrel 1: 6 determinations, x̄1 = 12.61%. Barrel 2: 4 determinations, x̄2 = 12.53%. sp = 0.07%.
Are the two wine sources different?
Ho: x̄1 = x̄2. t_calc = (12.61 − 12.53) / (0.07 × √(1/6 + 1/4)) = 1.77.
From the t table at 95% confidence, with 6 + 4 − 2 = 8 degrees of freedom, t_table = 2.31. Since t_calc < t_table, Ho is accepted and, statistically, the sources of the wine are the same.
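The two-mean comparison above can be sketched as:

```python
import math

x1, n1 = 12.61, 6   # barrel 1: mean alcohol content, number of determinations
x2, n2 = 12.53, 4   # barrel 2
sp = 0.07           # pooled standard deviation
t_table = 2.31      # 95% confidence, 6 + 4 - 2 = 8 degrees of freedom

t_calc = abs(x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
same_source = t_calc < t_table  # accept Ho -> statistically the same source
```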

35 COMPARING THE PRECISION OF TWO MEASUREMENTS (THE F-TEST)
Is Method A more precise than Method B? Is there any significant difference between the two methods?
Ho: the precisions are identical, s1 = s2.
F_calc = V1 / V2 = s1² / s2², with V1 > V2 and degrees of freedom N − 1 for each. Since table values of F are always greater than 1, the smaller variance (the more precise method) is always the denominator.
If F_calc < F_table, Ho is accepted.

36 Example: The determination of CO in a mixture of gases using the standard procedure gave s = 0.21 ppm. The method was modified twice, giving s1 = 0.15 (12 degrees of freedom) and s2 = 0.12 (12 degrees of freedom). Are the two modified methods more precise than the standard?
Ho: s1 = s_std and Ho: s2 = s_std.
In the standard method s → σ, so its degrees of freedom become infinity. Referring to the F table with numerator dof = ∞ and denominator dof = 12 gives a critical F value of 2.30.
F1 = (0.21/0.15)² = 1.96 and F2 = (0.21/0.12)² = 3.06.
Since F1 < F_table, Ho is accepted, while F2 > F_table, so Ho is rejected.
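A sketch of the F test for this example; the larger variance (here, the standard method's) goes in the numerator so that F ≥ 1:

```python
s_std = 0.21         # standard method (degrees of freedom -> infinity)
s1, s2 = 0.15, 0.12  # modified methods, 12 degrees of freedom each
f_table = 2.30       # critical F: numerator dof = infinity, denominator dof = 12, 95%

f1 = s_std ** 2 / s1 ** 2      # larger variance in the numerator
f2 = s_std ** 2 / s2 ** 2
more_precise_1 = f1 > f_table  # modification 1: not significantly more precise
more_precise_2 = f2 > f_table  # modification 2: significantly more precise
```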

37 CONCLUSIONS
For Ho: s1 = s_std, Ho is accepted — the first modification is not significantly more precise than the standard method. For Ho: s2 = s_std, Ho is rejected — the second modification is significantly more precise.

38 THE DIXON TEST OR THE Q TEST
A way of detecting an outlier: a data point that, statistically, does not belong to the set.
Example: Data: 10.05, 10.10, 10.15, 10.05, 10.45, 10.10.
By inspection, 10.45 seems to lie outside the normal range of the data. This is easier to see when the numbers are arranged in increasing order: 10.05, 10.05, 10.10, 10.10, 10.15, 10.45.
If this value (10.45) is eliminated, the mean changes from its original value!

39 Q_exp = |x_q − x_n| / w
where x_q is the questionable data point, x_n is its nearest neighbour, and w is the difference between the highest and the lowest value (the range).
Q_exp (or Q_calc) is compared with Q_critical (Q_table), and the null hypothesis is checked.
Q_exp = (10.45 − 10.15) / (10.45 − 10.05) = 0.75; Q_critical (95%, N = 6) = 0.625.
Since Q_exp > Q_critical, the data point (10.45) can be rejected.
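A minimal sketch of the Q test; this helper checks both extreme values of the sorted set and reports the larger gap ratio:

```python
def q_test(data, q_critical):
    """Dixon's Q test: return (Q_exp, reject?) for the most extreme value."""
    data = sorted(data)
    w = data[-1] - data[0]              # range
    q_low = (data[1] - data[0]) / w     # gap ratio of the lowest value
    q_high = (data[-1] - data[-2]) / w  # gap ratio of the highest value
    q_exp = max(q_low, q_high)
    return q_exp, q_exp > q_critical

# Slide's data set; Q_critical = 0.625 for 6 observations at 95%
q_exp, reject = q_test([10.05, 10.10, 10.15, 10.05, 10.45, 10.10], 0.625)
```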

40 VALUES OF Q
Number of Observations | 90% | 95% | 99%
3  | 0.941 | 0.970 | 0.994
4  | 0.765 | 0.829 | 0.926
5  | 0.642 | 0.710 | 0.821
6  | 0.560 | 0.625 | 0.740
7  | 0.507 | 0.568 | 0.680
8  | 0.468 | 0.526 | 0.634
9  | 0.437 | 0.493 | 0.599
10 | 0.412 | 0.466 | 0.568

41 Example: An analysis of calcite gave the following percentages of CaO: 55.45, 56.04, 56.23, 56.00, 55.08.
Arrange the data in order: 55.08, 55.45, 56.00, 56.04, 56.23.
Suspected data: 55.08 or 56.23.
Q_calc(55.08) = (55.45 − 55.08)/(56.23 − 55.08) = 0.32; Q_calc(56.23) = (56.23 − 56.04)/(56.23 − 55.08) = 0.17.
Q_table for 5 determinations at 95% = 0.710.
Since Q_calc < Q_table in both cases, neither data point can be rejected.

42 Example: An analysis of calcite gave the following percentages of CaO: 55.45, 56.04, 56.80, 56.23, 56.00, 55.30, 55.08, 54.80 and 55.80. Should any data be rejected at the 90% confidence level?
