Statistics of repeated measurements

Chapter 2: Statistics of repeated measurements

Mean and standard deviation

The distribution of repeated measurements Although the standard deviation gives a measure of the spread of a set of results about the mean value, it does not indicate the shape of the distribution. To illustrate this we need a large number of measurements.

x̄ = 0.500, s = 0.0165

This shows that the distribution of the measurements is roughly symmetrical about the mean, with the measurements clustered towards the centre. The set of all possible measurements is called the population. If there are no systematic errors, then the mean of this population, denoted by µ, is the true value of the nitrate ion concentration, and its standard deviation is denoted by σ. The sample standard deviation, s, gives an estimate of σ.

Normal or Gaussian distribution The mathematical model that describes this distribution is the continuous curve: y = exp[−(x − µ)²/2σ²]/(σ√(2π)). The curve is symmetrical about µ, and the greater the value of σ the greater the spread of the curve.

More detailed analysis shows that, whatever the values of µ and σ, the normal distribution has the following properties: approximately 68% of the population values lie within ±1σ of the mean, approximately 95% lie within ±2σ of the mean, and approximately 99.7% lie within ±3σ of the mean.
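The 68–95–99.7 rule above can be verified with a short sketch using Python's standard-library `statistics.NormalDist`:

```python
# Check the 68-95-99.7 rule for a standard normal distribution.
# NormalDist models a normal distribution with the given mu and sigma.
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)  # work in units of sigma

for k in (1, 2, 3):
    # Proportion of the population within +/- k standard deviations of the mean
    prop = nd.cdf(k) - nd.cdf(-k)
    print(f"within ±{k}σ: {prop:.4f}")
```

Printed proportions come out as 0.6827, 0.9545 and 0.9973, matching the slide's approximate figures.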

Standardized normal cumulative distribution function, F(z) For a normal distribution with known mean, µ, and standard deviation, σ, the exact proportion of values which lie within any interval can be found from tables, provided that the values are first standardized so as to give z-values. This is done by expressing a value of x in terms of its deviation from the mean in units of the standard deviation, σ. That is: z = (x − µ)/σ

Standardized normal cumulative distribution function, F(z) Table A.1 (Appendix 2) gives the proportion of values, F(z), that lie below a given value of z. F(z) is called the standard normal cumulative distribution function. For example the proportion of values below z = 2 is F(2) = 0.9772 and the proportion of values below z = -2 is F(-2) = 0.0228. Thus the exact value for the proportion of measurements lying within two standard deviations of the mean is 0.9772 - 0.0228 = 0.9544.

Standardized normal cumulative distribution function, F(z) If repeated values of a titration are normally distributed with mean 10.15 ml and standard deviation 0.02 ml, find the proportion of measurements which lie between 10.12 ml and 10.20 ml. Standardizing the first value gives z = (10.12 − 10.15)/0.02 = −1.5. From Table A.1, F(−1.5) = 0.0668. Standardizing the second value gives z = (10.20 − 10.15)/0.02 = 2.5. From Table A.1, F(2.5) = 0.9938. Thus the proportion of values between x = 10.12 and 10.20 (which corresponds to z = −1.5 to 2.5) is 0.9938 − 0.0668 = 0.9270. Values of F(z) can also be found using Excel or Minitab.
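The same table lookup can be reproduced directly, as a sketch, with `statistics.NormalDist`, which handles the standardization internally:

```python
# Proportion of titration results between 10.12 ml and 10.20 ml for a
# normal distribution with mean 10.15 ml and standard deviation 0.02 ml.
from statistics import NormalDist

nd = NormalDist(mu=10.15, sigma=0.02)
proportion = nd.cdf(10.20) - nd.cdf(10.12)   # equals F(2.5) - F(-1.5)
print(f"{proportion:.4f}")
```

The printed value, 0.9270, agrees with the table-based calculation above.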

Log-normal distribution

Another example of a variable which may follow a log-normal distribution is the particle size of the droplets formed by the nebulizers used in flame spectroscopy. Particle size distributions in atmospheric aerosols may also take the log-normal form, and the distribution is used to describe equipment failure rates. Minitab allows this distribution to be simulated and studied. However, by no means all asymmetrical population distributions can be converted to normal ones by the logarithmic transformation. The distribution of the logarithms of the blood serum concentration shown in Figure 2.5(b) has mean 0.15 and standard deviation 0.20. This means that approximately 68% of the logged values lie in the interval 0.15 − 0.20 to 0.15 + 0.20, that is −0.05 to 0.35. Taking antilogarithms, we find that 68% of the original measurements lie in the interval 10^(−0.05) to 10^(0.35), that is 0.89 to 2.24. The antilogarithm of the mean of the logged values, 10^0.15 = 1.41, gives the geometric mean of the original distribution.

Definition of a sample The Commission on Analytical Nomenclature of the Analytical Chemistry Division of the International Union of Pure and Applied Chemistry has pointed out that confusion and ambiguity can arise if the term 'sample' is also used in its colloquial sense of the 'actual material being studied' (Commission on Analytical Nomenclature, 1990). It recommends that the term sample is confined to its statistical concept. Other words should be used to describe the material on which measurements are being made, in each case preceded by 'test', for example test solution or test extract. We can then talk unambiguously of a sample of measurements on a test extract, or a sample of tablets from a batch. A test portion from a population which varies with time, such as a river or circulating blood, should be described as a specimen. Unfortunately this practice is by no means usual, so the term 'sample' remains in use for two related but distinct purposes.

The sampling distribution of the mean In the absence of systematic errors, the mean of a sample of measurements provides us with an estimate of the true value, µ, of the quantity we are trying to measure. Even so, the individual measurements vary due to random errors, and it is most unlikely that the mean of the sample will be exactly equal to the true value. For this reason it is more useful to give a range of values which is likely to include the true value. The width of this range depends on two factors. The first is the precision of the individual measurements, which in turn depends on the standard deviation of the population. The second is the number of measurements in the sample. The more measurements we make, the more reliable our estimate of µ, the true value, will be.

Sampling distribution of the mean Assuming each column is a sample, the means of the samples are: 0.506, 0.504, 0.502, 0.492, 0.506, 0.504, 0.500, 0.486. The distribution of all possible sample means (in this case an infinite number) is called the sampling distribution of the mean. Its mean is the same as the mean of the original population. Its standard deviation is called the standard error of the mean (s.e.m.). There is an exact mathematical relationship between the latter and the standard deviation, σ, of the distribution of the individual measurements:

For a sample of n measurements, standard error of the mean (s.e.m.) = σ/√n. As expected, the larger n is, the smaller the value of the s.e.m. and consequently the smaller the spread of the sample means about µ, the true value. Another property of the sampling distribution of the mean is that, even if the original population is not normal, the sampling distribution of the mean tends to the normal distribution as n increases. This result is known as the central limit theorem. The theorem is of great importance because many statistical tests are performed on the mean and assume that it is normally distributed. Since in practice we can assume that distributions of repeated measurements are at least approximately normally distributed, it is reasonable to assume that the means of quite small samples (say n > 5) are normally distributed.
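The σ/√n relationship can be checked by simulation. The sketch below assumes a hypothetical population with µ = 0.500 and σ = 0.02 and an arbitrary sample size n = 16; the observed spread of the sample means should approach σ/√n:

```python
# Simulation: the standard deviation of many sample means (the observed
# s.e.m.) approaches sigma/sqrt(n). Values of mu, sigma, n are illustrative.
import math
import random
import statistics

random.seed(42)
mu, sigma, n, trials = 0.500, 0.02, 16, 5000

# Draw many samples of size n and record each sample mean
means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(trials)
]

observed_sem = statistics.stdev(means)
predicted_sem = sigma / math.sqrt(n)
print(f"observed s.e.m. {observed_sem:.4f}, predicted {predicted_sem:.4f}")
```

With 5000 trials the observed and predicted values agree to about three decimal places.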

Confidence limits of the mean for large samples The range which may be assumed to include the true value is known as a confidence interval, and the extreme values of the range are called the confidence limits. The term 'confidence' implies that we can assert with a given degree of confidence, i.e. a certain probability, that the confidence interval does include the true value. The size of the confidence interval will obviously depend on how certain we want to be that it includes the true value: the greater the certainty, the greater the interval required.

Sampling distribution of the mean for samples of size n. If we assume that this distribution is normal, then 95% of the sample means will lie in the range given by: µ − 1.96(σ/√n) < x̄ < µ + 1.96(σ/√n)

In practice, we usually have one sample, of known mean, and we require a range for µ, the true value: x̄ − 1.96(σ/√n) < µ < x̄ + 1.96(σ/√n). This equation gives the 95% confidence interval of the mean, and the 95% confidence limits are x̄ ± 1.96(σ/√n). In practice we are unlikely to know σ exactly. However, provided that the sample is large, σ can be replaced by its estimate, s.
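As a sketch, the large-sample limits x̄ ± 1.96 s/√n can be computed directly. The summary values below reuse the earlier nitrate figures (x̄ = 0.500, s = 0.0165) with a hypothetical sample size n = 50:

```python
# 95% confidence limits of the mean for a large sample: xbar ± 1.96*s/sqrt(n).
# The sample summary used here is illustrative.
import math

xbar, s, n = 0.500, 0.0165, 50   # hypothetical sample summary
half_width = 1.96 * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

For these numbers the interval is 0.4954 to 0.5046.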

Confidence limits of the mean for small samples For small samples the confidence limits of the mean are given by: x̄ ± t(n−1) s/√n

The subscript (n − 1) indicates that t depends on this quantity, which is known as the number of degrees of freedom, d.f. (usually given the symbol ν). The term 'degrees of freedom' refers to the number of independent deviations. In this case the number is (n − 1), because when (n − 1) deviations are known the last can be deduced, since Σ(xi − x̄) = 0. The value of t also depends on the degree of confidence required.

Presentation of results The mean is quoted as an estimate of the quantity measured and the standard deviation as an estimate of the precision. Less commonly, the standard error of the mean (s.e.m.) is quoted instead of the standard deviation, or the result is given in the form of the 95% confidence limits of the mean. (Uncertainty estimates, see Chapter 4, are also sometimes used.) Since there is no universal convention, it is essential to state the form used; provided that the value of n is given, the three forms can easily be inter-converted using the previous equations.

Rounding off The important principle is that the number of significant figures given indicates the precision of the experiment. It would clearly be meaningless, for example, to give the result of a titrimetric analysis as 0.107846 M: no analyst could achieve the implied precision of 0.000001 in ca. 0.1, i.e. 0.001%. In practice it is usual to quote as significant figures all the digits which are certain, plus the first uncertain one. For example, the mean of the values 10.09, 10.11, 10.09, 10.10 and 10.12 is 10.102, and their standard deviation is 0.01304.

Clearly there is uncertainty in the second decimal place; the results are all 10.1 to one decimal place, but disagree in the second decimal place. Using the suggested method the result would be quoted as: x̄ = 10.10, s = 0.01 (n = 5). If it was felt that this resulted in an unacceptable rounding-off of the standard deviation, then the result could be given as: x̄ = 10.102, s = 0.013 (n = 5). When confidence limits are calculated there is no point in giving the value of ts/√n to more than two significant figures. The value of x̄ should then be given to the corresponding number of decimal places.

Rounding off the 5 When a number ends in a 5, rounding bias can be avoided by rounding the 5 to the nearest even number. For example, 4.75 is rounded to 4.8.
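This round-half-to-even rule ("banker's rounding") is what Python's `decimal` module implements with `ROUND_HALF_EVEN`; working with `Decimal` strings avoids binary floating-point surprises. A minimal sketch:

```python
# Round-half-to-even avoids the upward bias of always rounding 5 up.
from decimal import Decimal, ROUND_HALF_EVEN

for value in ("4.75", "4.65"):
    rounded = Decimal(value).quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)
    print(value, "->", rounded)   # 4.75 -> 4.8, 4.65 -> 4.6
```

Note that 4.75 rounds up (to the even digit 8) while 4.65 rounds down (to the even digit 6), so ties go up and down equally often on average.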

Example The absorbance scale of a spectrometer is tested at a particular wavelength with a standard solution which has an absorbance given as 0.470. Ten measurements of the absorbance with the spectrometer give x̄ = 0.461 and s = 0.003. Find the 95% confidence interval for the mean absorbance as measured by the spectrometer, and hence decide whether a systematic error is present.
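This example can be worked as a small-sample t interval. The sketch below hard-codes the two-tailed 95% t-value for 9 degrees of freedom (2.262, taken from standard tables):

```python
# 95% confidence interval for the spectrometer example: xbar ± t * s/sqrt(n).
import math

xbar, s, n = 0.461, 0.003, 10
t_9 = 2.262                      # t-table value, 95%, 9 d.f.
half_width = t_9 * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI: {lower:.4f} to {upper:.4f}")
print("bias indicated:", not (lower <= 0.470 <= upper))
```

The interval is roughly 0.4589 to 0.4631; since the certified value 0.470 lies outside it, a systematic error is indicated.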

Confidence limits of the geometric mean for a log-normal distribution The following values (expressed as percentages) give the antibody concentration in human blood serum for a sample of eight healthy adults: 2.15, 1.13, 2.04, 1.45, 1.35, 1.09, 0.99, 2.07. Calculate the 95% confidence interval for the geometric mean, assuming that the antibody concentration is log-normally distributed. The logarithms (to the base 10) of these values are: 0.332, 0.053, 0.310, 0.161, 0.130, 0.037, −0.004, 0.316. The mean of these logged values is 0.1669, giving 10^0.1669 = 1.47 as the geometric mean of the original values. The standard deviation of the logged values is 0.1365. The 95% confidence limits for the logged values are: 0.1669 ± 2.36 × 0.1365/√8 = 0.1669 ± 0.1139. Taking antilogarithms of these limits gives the 95% confidence interval of the geometric mean as 1.13 to 1.91.
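The whole calculation can be reproduced as a sketch from the raw concentrations; the t-value for 7 degrees of freedom (2.365) is again taken from standard tables:

```python
# 95% CI for the geometric mean under a log-normal assumption:
# work on log10 values, form a t interval, then take antilogarithms.
import math
import statistics

conc = [2.15, 1.13, 2.04, 1.45, 1.35, 1.09, 0.99, 2.07]
logs = [math.log10(x) for x in conc]

mean_log = statistics.fmean(logs)        # ~0.1670
s_log = statistics.stdev(logs)           # ~0.1365
n = len(logs)
half_width = 2.365 * s_log / math.sqrt(n)   # t-table value, 95%, 7 d.f.

lower = 10 ** (mean_log - half_width)
upper = 10 ** (mean_log + half_width)
print(f"geometric mean {10**mean_log:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

This reproduces the slide's result: geometric mean 1.47 with a 95% interval of 1.13 to 1.91.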

Propagation of random errors It is most important to note that the procedures used for combining random and systematic errors are completely different. This is because random errors to some extent cancel each other out, whereas every systematic error occurs in a definite and known sense. Suppose, for example, that the final result of an experiment, x, is given by x = a + b. If a and b each have a systematic error of +1, it is clear that the systematic error in x is +2. If, however, a and b each have a random error of ±1, the random error in x is not ±2: this is because there will be occasions when the random error in a is positive while that in b is negative (or vice versa).

Linear combinations In this case the final value, y, is calculated from a linear combination of measured quantities a, b, c, etc., by: y = k + k_a·a + k_b·b + k_c·c + … where k, k_a, k_b, k_c, etc., are constants. The variance of a sum or difference of independent quantities is equal to the sum of their variances. It can be shown that if σ_a, σ_b, σ_c, etc., are the standard deviations of a, b, c, etc., then the standard deviation of y, σ_y, is given by: σ_y = √[(k_a·σ_a)² + (k_b·σ_b)² + (k_c·σ_c)² + …]

Example In a titration the initial reading on the burette is 3.51 ml and the final reading is 15.67 ml, both with a standard deviation of 0.02 ml. What is the volume of titrant used and what is its standard deviation? Volume used = 15.67 − 3.51 = 12.16 ml; standard deviation = √(0.02² + 0.02²) = 0.028 ml, so the result is 12.16 ± 0.028 ml.
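The titration result follows directly from the sum-of-variances rule, since the volume is a difference of two readings:

```python
# Variance of a difference = sum of the variances, so
# sigma(volume) = sqrt(0.02**2 + 0.02**2).
import math

initial, final = 3.51, 15.67        # burette readings, ml
sigma_reading = 0.02                # s.d. of each reading, ml

volume = final - initial
sigma_volume = math.sqrt(sigma_reading**2 + sigma_reading**2)
print(f"volume = {volume:.2f} ± {sigma_volume:.3f} ml")
```

Note that the standard deviation of the volume (0.028 ml) is larger than that of either reading, but much smaller than their simple sum (0.04 ml).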

Multiplicative expressions Consider the error in the result of a multiplication and/or division such as: y = kab/cd, where a, b, c and d are independent measured quantities and k is a constant. In such a case there is a relationship between the squares of the relative standard deviations: (σ_y/y)² = (σ_a/a)² + (σ_b/b)² + (σ_c/c)² + (σ_d/d)²    (2.12)
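A sketch of this relative-s.d. combination for y = kab/(cd), using hypothetical measured values chosen so that each quantity has a relative s.d. of 1%:

```python
# Relative standard deviations combine in quadrature for products/quotients.
# The (value, sigma) pairs below are illustrative.
import math

def rel_sd_combined(*pairs):
    """Each pair is (value, sigma); returns the relative s.d. of y = kab/cd."""
    return math.sqrt(sum((sigma / value) ** 2 for value, sigma in pairs))

# hypothetical measured quantities (value, standard deviation)
a, b, c, d = (10.0, 0.1), (5.0, 0.05), (2.0, 0.02), (4.0, 0.04)
rsd_y = rel_sd_combined(a, b, c, d)
print(f"relative s.d. of y: {rsd_y:.4f}")
```

With four quantities each at 1% relative s.d., the result has a relative s.d. of √(4 × 0.01²) = 2%, not 4%: the quadrature sum grows much more slowly than a simple sum of the errors.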

If the relationship is y = b^n, then the standard deviations of y and b are related by: σ_y/|y| = |n|·σ_b/|b|

If y is a general function of x, y = f(x), then the standard deviations of x and y are related by: σ_y = |dy/dx|·σ_x

Taking the absorbance as A = −log T, where T is the transmittance, the relative standard deviation (r.s.d.) of A is given by: r.s.d. of A = 100σ_A/A = −100 × 0.434σ_T/(T log T). Differentiation of this expression with respect to T shows that the r.s.d. of A is a minimum when T = 1/e = 0.368.

Propagation of systematic errors If y is calculated from measured quantities by use of the equation y = k + k_a·a + k_b·b + k_c·c + …, and the systematic errors in a, b, c, etc., are Δa, Δb, Δc, etc., then the systematic error in y, Δy, is calculated from: Δy = k_a·Δa + k_b·Δb + k_c·Δc + … Remember that the systematic errors are either positive or negative and that these signs must be included in the calculation of Δy. The total systematic error can sometimes be zero. Considering the analytical balance: since the weight of the solute used is found from the difference between two weighings, the systematic errors cancel out. It should be pointed out that this applies only to an electronic balance with a single internal reference weight. Carefully considered procedures, such as this, can often minimize the systematic errors, as described in Chapter 1.

If y is calculated from the measured quantities by use of the equation y = kab/cd, then relative systematic errors are used: Δy/y = Δa/a + Δb/b − Δc/c − Δd/d