Statistics and Quantitative Analysis Chemistry 321, Summer 2014.

Statistics is the field of study that allows you to understand the limitations of your data; in other words, what reasonable conclusions can you draw from your data? Warning: What follows is not meant to be a complete course in statistics. It will be enough, though, to get you through this course, but do not apply it blindly to other situations!

Quantitative measurements must be replicated to establish the credibility of the data. Clearly, if your observations during the experiment suggest that a procedural error occurred, then the data for that trial may be safely omitted. In other words, be careful in lab. But are there methods to detect so-called "outliers"? Yes, this is where statistics is helpful. Remember, just because you have a statistical outlier does not mean that you should necessarily throw out that data point!

But first, some slides about accuracy and precision. Accuracy is the agreement between your measured value and the published (sometimes called "true") value. Precision is the agreement among your repeated measurements. Thus, accuracy ≠ precision.

An analogy with a dartboard: So why distinguish between accuracy and precision? The terms allow us to distinguish different types of errors: those that we can correct easily, and those we can correct only with difficulty or not at all.

Systematic versus random errors. Systematic (determinate) errors affect accuracy. Because they bias the data one way (always too high) or the other (always too low), they can usually be corrected easily. Random (indeterminate) errors affect precision. Because they are the result of variability or instrument uncertainty, they are much more difficult to correct.

The arithmetic mean (aka the average). The mean is simply the sum of the measurements divided by the number of measurements; symbolically, this is $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$, where $\bar{x}$ is the mean, $x_i$ is the i-th measurement, and $N$ is the number of observations. Note that all of the measurements are equally important; in other words, this is an unweighted mean. We will assume for the rest of the course that all measurements are of equal weight.
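As a minimal sketch of this calculation (the data values and variable names below are hypothetical, not from the course), the unweighted mean could be computed as:

```python
# Hypothetical replicate measurements; any list of numbers works the same way.
measurements = [2.31, 2.29, 2.32, 2.30]

N = len(measurements)
mean = sum(measurements) / N   # unweighted arithmetic mean
print(f"mean = {mean:.3f}")
```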

The standard deviation is a measure of the "spread" of the data set. To get a sense of whether the data are closely spaced or widely scattered in the space of all possible measurements, the standard deviation is used: $s = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N-1}}$. The term $(x_i - \bar{x})$ is the residual for the i-th measurement. Note that as $N$ increases, $s$ generally decreases; taking more measurements tends to decrease the standard deviation.

The variable s is used when the standard deviation is calculated for a sample of data from which you wish to generalize to the population from which the sample was drawn. In this case, the denominator of the fraction inside the square root is N–1, as shown. The variable σ (lowercase sigma) is used when the standard deviation is calculated for the entire population (which is not going to happen in this course); in that case, the denominator is simply N. Point of fact: once N > 30, s ≈ σ.
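A brief sketch of the distinction, using Python's standard statistics module (the data values are hypothetical):

```python
import statistics

measurements = [2.31, 2.29, 2.32, 2.30]   # hypothetical replicate data

# Sample standard deviation s: denominator N - 1 (the usual case in this course)
s = statistics.stdev(measurements)

# Population standard deviation sigma: denominator N
sigma = statistics.pstdev(measurements)

print(f"s = {s:.4f}, sigma = {sigma:.4f}")   # s is slightly larger than sigma
```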

The standard deviation is a useful measure of spread only when there are many measurements. As a rule of thumb, you should have at least ten repeated measurements, though you will violate this rule often in this course. For instance, if $N = 2$ and the difference between the two measurements is $d$, then $s = \sqrt{2}\,d/2 = d/\sqrt{2}$, which is not particularly meaningful.
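A quick numerical check of the N = 2 case (the two values below are made up):

```python
import statistics

a, d = 5.00, 0.10                  # hypothetical pair of measurements differing by d
s = statistics.stdev([a, a + d])   # sample standard deviation, N - 1 denominator
print(s, d / 2 ** 0.5)             # both are ~0.0707, i.e. s = d / sqrt(2)
```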

Standard deviations allow you to distinguish two distinct populations. Each graph shows two normal distributions; in each case, are there two distinguishable populations? Compare the means and standard deviations of the distributions.

Compare the masses of two sets of ten pennies, one minted before 1982 and the other after (the slide tabulates the mean and standard deviation σ, in grams, for each set). The question: are the two sets of pennies distinguishable by mass?

Compare the overlap of the two-sigma ranges of each set of pennies: the pre-1982 range is its mean ± 2σ and the post-1982 range is its mean ± 2σ (the numerical values are given on the slide). When using the "±" notation to show one- or two-sigma ranges, report the standard deviation to the same number of decimal places as the mean. The two resulting mass ranges do not overlap, so at the two-sigma level the two sets of pennies are distinguishable! There is a good reason for this distinguishability: in 1982, the US Mint changed the composition of the penny from mostly copper to mostly zinc.
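A sketch of the comparison, assuming made-up penny statistics (the real means and standard deviations are on the slide):

```python
# Hypothetical values in grams; substitute the slide's actual means and sigmas.
pre_mean, pre_sigma = 3.08, 0.03
post_mean, post_sigma = 2.52, 0.03

pre_lo, pre_hi = pre_mean - 2 * pre_sigma, pre_mean + 2 * pre_sigma
post_lo, post_hi = post_mean - 2 * post_sigma, post_mean + 2 * post_sigma

# Two intervals fail to overlap when one ends before the other begins.
overlap = not (pre_hi < post_lo or post_hi < pre_lo)
print(f"pre-1982:  {pre_lo:.2f} to {pre_hi:.2f} g")
print(f"post-1982: {post_lo:.2f} to {post_hi:.2f} g")
print("not distinguishable at two sigma" if overlap else "distinguishable at two sigma")
```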

The relative standard deviation (RSD%) is a measure of precision: $\mathrm{RSD\%} = 100\% \times s/\bar{x}$. Guideline: keep the RSD% below the course cut-off (the specific percentage is given on the slide). Note that other situations may have a larger or smaller cut-off percentage.

The percent deviation is a measure of accuracy, comparing your mean to the published ("true") value. Guideline: keep the percent deviation below the course cut-off (the specific percentage is given on the slide). Note that other situations may have a larger or smaller cut-off percentage.
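A combined sketch of both measures; the data, the "true" value, and the use of an absolute value in the percent deviation are assumptions made for illustration:

```python
measured = [13.1, 13.3, 13.0, 13.2]   # hypothetical replicate measurements
true_value = 13.0                     # hypothetical published ("true") value

N = len(measured)
mean = sum(measured) / N
s = (sum((x - mean) ** 2 for x in measured) / (N - 1)) ** 0.5

rsd_percent = 100 * s / mean                               # precision
pct_deviation = 100 * abs(mean - true_value) / true_value  # accuracy (assumed unsigned)
print(f"RSD% = {rsd_percent:.2f}%   percent deviation = {pct_deviation:.2f}%")
```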

So how do you know when you can omit a measurement in a set of measurements? For this course, we will assume that all measurable quantities are distributed normally (in other words, conform to a Gaussian distribution). Note that the x-axis is marked in units of the standard deviation; yes, they are using σ instead of s, but this is customary. For instance, a measurement will be said to be "two sigma" higher than the mean (rather than "two ess").

The normal distribution formula is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, where $x$ is a given measurement, $\mu$ is the mean, $\sigma$ is the standard deviation, and $f(x)$ is the predicted probability density of that measurement.
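A small sketch evaluating this formula (the function name and test values are illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian probability density f(x) for mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The density peaks at the mean and falls off symmetrically on either side.
print(normal_pdf(0.0, mu=0.0, sigma=1.0))   # ~0.399 at the mean
print(normal_pdf(2.0, mu=0.0, sigma=1.0))   # ~0.054 two sigma away
```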

The behavior of the normal distribution: in a normal distribution graph, the y-axis is the number of measurements with the value given on the x-axis, so to get a smooth curve as shown, you need literally hundreds of measurements! Fortunately, even though you will have few measurements, we can use the behavior of the normal distribution to check the quality of your data. For instance, we know that about 95% of the data points will fall within two standard deviations of the mean.
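A quick sanity check of the two-sigma rule by simulation (the mean, sigma, and sample size are arbitrary choices):

```python
import random

random.seed(0)
mu, sigma, n = 10.0, 0.5, 100_000
draws = [random.gauss(mu, sigma) for _ in range(n)]

within_two_sigma = sum(abs(x - mu) <= 2 * sigma for x in draws) / n
print(f"fraction within two sigma: {within_two_sigma:.3f}")   # close to 0.954
```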

The Q-test for excluding data. How do you know when you can omit a data point, even when you have no observational basis to do so? On a normal distribution plot of the data, a suspect point lies far from the mean and from the other points; at this point, apply the Q-test. R. B. Dean and W. J. Dixon, "Simplified Statistics for Small Numbers of Observations," Anal. Chem. 1951, 23 (4), 636–638.

The Q-test for excluding data. Consider the following data points (plotted on the slide). Note that one point, 0.167, appears far off from the rest; is it an outlier that can be omitted? Calculate the parameter Q_calculated by dividing the gap between the test point and its nearest neighbor by the range between the high and low values of the data set. Note that the test point will always be either the high or the low value of the data set.

Table 3.3 (page 99) in the text has the Q_table values against which your Q_calculated can be compared. In this course, we'll use the 90% confidence level (CL) criterion, which means that if Q_calculated > Q_table, then the outlier can be omitted. Thus, in our example, N = 10, so Q_table = 0.412; since Q_calculated > 0.412, we can omit the "0.167" point. If we used a 95% CL, the point would not be omitted; higher confidence levels demand a higher cut-off criterion. Likewise, if there were two fewer data points (N = 8), the point would not be omitted.
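A sketch of the Q-test at the 90% CL. The critical values in the dictionary are the commonly tabulated Q(90%) values for N = 3 to 10 and should be checked against Table 3.3; the data set itself is made up to mimic the slide's example:

```python
# Commonly tabulated Dixon Q critical values at the 90% confidence level (N = 3..10).
Q_TABLE_90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560,
              7: 0.507, 8: 0.468, 9: 0.437, 10: 0.412}

def q_test(data):
    """Test the more suspicious of the high/low points; return the suspect value,
    Q_calculated, Q_table, and whether the point may be omitted at the 90% CL."""
    data = sorted(data)
    low_gap = data[1] - data[0]       # gap between the low point and its neighbor
    high_gap = data[-1] - data[-2]    # gap between the high point and its neighbor
    data_range = data[-1] - data[0]   # range of the whole data set
    suspect, gap = (data[0], low_gap) if low_gap >= high_gap else (data[-1], high_gap)
    q_calc = gap / data_range
    q_crit = Q_TABLE_90[len(data)]
    return suspect, q_calc, q_crit, q_calc > q_crit

# Hypothetical ten-point data set with a low outlier at 0.167.
values = [0.196, 0.190, 0.187, 0.185, 0.184, 0.183, 0.182, 0.181, 0.180, 0.167]
print(q_test(values))   # Q_calculated ~0.45 > 0.412, so the point may be omitted
```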

Criteria for omitting a data point from your calculations: (1) documented observations in your lab notebook show a procedural error for that particular measurement, or (2) the Q-test on a particular data point determines that the point can be omitted. Note: Do not keep applying the Q-test to the same data set; in other words, after omitting one point, do recalculate the mean and standard deviation, but do not apply the Q-test to another suspected outlier.

Challenge problem: You collect the following data for an analysis (the values are listed on the slide). What is the reportable average and RSD% for this data set? The observations in your lab notebook do not allow you to omit any data values; however, you suspect the 13.8 value can be omitted for statistically valid reasons. Apply the Q-test at the 90% confidence level to follow up on this suspicion, and determine whether the 13.8 value can or cannot be omitted.