Statistics Review ChE 477 Winter 2018 Dr. Harding

Statistical Methods
1. Repeated Data Points
2. Comparing Averages of Measured Variables
3. Linear Regression
4. Propagation of Errors
   - Propagation of max error - brute force
   - Propagation of max error - analytical
   - Propagation of variance - analytical
   - Propagation of variance - brute force (Monte Carlo simulation)

1. Repeated Data Points
Use a t-test based on the measured standard deviation (s) to relate the measured mean (x̄) to the true mean (μ):
μ = x̄ ± t(α, r) · s / √n, with r = n - 1 degrees of freedom.
In Excel, =T.INV(α, r) gives the one-tailed t value (α = 0.025 for a 95% confidence interval) and =T.INV.2T(α, r) gives the two-tailed t value (α = 0.05 for a 95% confidence interval).
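
A minimal Python sketch of the same calculation (the measurement values are hypothetical; scipy's t.ppf plays the role of Excel's T.INV/T.INV.2T):

    # Confidence interval for the true mean from repeated measurements,
    # using the t-distribution.
    import numpy as np
    from scipy import stats

    data = np.array([20.1, 19.8, 20.4, 20.0, 19.9])  # hypothetical repeated measurements
    n = len(data)
    xbar = data.mean()
    s = data.std(ddof=1)                 # measured standard deviation
    r = n - 1                            # degrees of freedom
    alpha = 0.05                         # 95% confidence
    t_2t = stats.t.ppf(1 - alpha / 2, r)  # two-tailed t value
    half_width = t_2t * s / np.sqrt(n)
    print(f"true mean = {xbar:.2f} +/- {half_width:.2f} (95% CI)")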

Gaussian Distribution
68.27% of the distribution lies within one σ of the mean, 95.45% lies within two σ, and 99.73% lies within three σ.
The t-test is used when we do not have enough data points (n < 30).
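
These percentages can be checked directly from the normal CDF, for example with a short Python snippet (assuming scipy is available):

    # Fraction of a normal distribution within 1, 2, and 3 standard deviations.
    from scipy import stats

    for k in (1, 2, 3):
        frac = stats.norm.cdf(k) - stats.norm.cdf(-k)
        print(f"within {k} sigma: {100 * frac:.2f}%")   # 68.27%, 95.45%, 99.73%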

2. Comparing Averages of Measured Variables
Experiments were completed on two separate days. When comparing the means at a given confidence level (e.g. 95%), is there a difference between them?
Day 1: (data shown on slide)
Day 2: (data shown on slide)

2. Comparing Averages of Measured Variables
Step 1 (compute T): use the pooled two-sample t statistic,
T = (x̄1 - x̄2) / (sp · √(1/n1 + 1/n2)), where sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2).
The larger |T| is, the more likely the means are different. For this example, |T| = 2.5.
Step 2 (compute the net degrees of freedom): r = n1 + n2 - 2.
In Excel, =T.INV(α, r) gives the one-tailed t value (α = 0.025 for a 95% confidence interval) and =T.INV.2T(α, r) gives the two-tailed t value (α = 0.05 for a 95% confidence interval).

2. Comparing Averages of Measured Variables
Step 3: Compute the two-tailed t value from the net r.
Step 4: Compare |T| with the two-tailed t. At a given confidence level (e.g. 95%, α = 0.05), there is a difference between the means if |T| > t(α/2, r).
For this example, |T| = 2.5 > t = 2.145, so we are 95% confident there is a difference (but not 98% confident).
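
A Python sketch of Steps 1-4 (the Day 1 / Day 2 values below are hypothetical stand-ins, since the slide's data are not reproduced here; scipy.stats.ttest_ind with its default pooled variance gives the same T):

    # Two-sample comparison of means with pooled variance (Steps 1-4 above).
    import numpy as np
    from scipy import stats

    day1 = np.array([10.2, 10.5, 9.9, 10.4, 10.3, 10.1, 10.6, 10.2])  # hypothetical Day 1 data
    day2 = np.array([9.8, 9.6, 10.0, 9.7, 9.9, 9.5, 9.8])             # hypothetical Day 2 data

    n1, n2 = len(day1), len(day2)
    sp2 = ((n1 - 1) * day1.var(ddof=1) + (n2 - 1) * day2.var(ddof=1)) / (n1 + n2 - 2)
    T = (day1.mean() - day2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))  # Step 1
    r = n1 + n2 - 2                                                     # Step 2
    t_crit = stats.t.ppf(1 - 0.05 / 2, r)                               # Step 3 (95%, two-tailed)
    print(f"|T| = {abs(T):.2f}, t(0.025, {r}) = {t_crit:.3f}")
    print("different at 95%" if abs(T) > t_crit else "no difference at 95%")  # Step 4
    # scipy.stats.ttest_ind(day1, day2) returns the same statistic plus a p-value.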

3. Linear Regression
y = mx + b
- Fit m and b and get r².
- Find confidence intervals for m and b (e.g. m = 3.56 ± 0.02) using the standard error and t-statistic; use the Excel regression add-in (or Igor). See the Python sketch below.
- Find the confidence interval for the line, using the standard error around the mean. This gives narrow-waisted curves around the line and depends on n. Meaning: how many ways can I draw a line through the data?
- Find the prediction band for the line. Meaning: where are the bounds within which the data should lie?
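
A Python sketch of fitting m and b and reporting their 95% confidence intervals (hypothetical x, y data; assumes a recent scipy, which reports intercept_stderr):

    # Confidence intervals for the slope m and intercept b of y = m x + b.
    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # hypothetical data
    y = np.array([3.7, 7.1, 10.9, 14.2, 17.9, 21.4])

    fit = stats.linregress(x, y)                # slope, intercept, r, and their std errors
    dof = len(x) - 2                            # degrees of freedom for regression
    t_crit = stats.t.ppf(1 - 0.05 / 2, dof)     # 95%, two-tailed
    print(f"m = {fit.slope:.3f} +/- {t_crit * fit.stderr:.3f}")
    print(f"b = {fit.intercept:.3f} +/- {t_crit * fit.intercept_stderr:.3f}")
    print(f"r^2 = {fit.rvalue**2:.4f}")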

3. Linear Regression: confidence interval and prediction band (plot shown on slide)

Example (regression plot with confidence interval and prediction band shown on slide)

Confidence Intervals and Prediction Bands
What good is the confidence interval for a line? It shows how many ways the line can fit the points, lets you state the confidence region for any predicted point, and r² still helps determine how good the fit is.
What good is the prediction band? It shows where the data should lie and helps identify outlying data points to consider discarding.
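
A Python sketch of computing both bands around a fitted line, using the usual least-squares formulas (hypothetical data; "core" is the leverage term that makes the confidence interval narrow-waisted):

    # Confidence interval (for the fitted line) and prediction band (for new data)
    # around a least-squares fit y = m x + b.
    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # hypothetical data
    y = np.array([3.7, 7.1, 10.9, 14.2, 17.9, 21.4])
    n = len(x)
    m, b = np.polyfit(x, y, 1)
    y_hat = m * x + b
    s_e = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))  # standard error of the fit
    t_crit = stats.t.ppf(0.975, n - 2)                 # 95%, two-tailed

    x0 = np.linspace(x.min(), x.max(), 50)             # where to evaluate the bands
    core = 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
    conf = t_crit * s_e * np.sqrt(core)                # confidence interval (narrow-waisted)
    pred = t_crit * s_e * np.sqrt(1 + core)            # prediction band (where data should lie)
    line = m * x0 + b
    # line - conf / line + conf bound the fitted line;
    # line - pred / line + pred bound where new observations should fall.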

4. Propagation of Errors
Quantifying variables (i.e. answering a question with a number):
1. Directly measure the variable - referred to as a "measured" variable. Example: temperature measured with a thermocouple.
2. Calculate the variable from "measured" or "tabulated" variables - referred to as a "calculated" variable. Example: mass flow rate m = ρAv, where ρ, A, and v are measured or tabulated.
Each has some error or uncertainty.

4. Propagation of Errors
To obtain the uncertainty of a "calculated" variable:
DO NOT just calculate the variable for each set of data and then take the average and standard deviation.
DO calculate the uncertainty by propagating the errors of the input variables ("uncertainty" refers to the calculated variable, "error" to the input variables).
Plan: obtain the max error (δ) for each input variable, then obtain the uncertainty of the calculated variable.
Method 1: Propagation of max error - brute force
Method 2: Propagation of max error - analytical
Method 3: Propagation of variance - analytical
Method 4: Propagation of variance - brute force (Monte Carlo simulation)

Sources of Uncertainty
Estimation - we guess!
Discrimination - device accuracy (single data point)
Calibration - may not be exact (error of the curve fit)
Technique - e.g. measuring ID rather than OD
Constants and data - not always exact!
Noise - which reading do we take?
Model and equations - e.g. ideal gas law vs. real gas
Humans - transposing digits, ...

Estimates of Error (δ) for Input Variables (the δ's are propagated to find the uncertainty)
Measured variable: measure multiple times, obtain s, and take δ ≈ 2.5s. Reason: 99% of the data lies within ±2.5s. Example: s = 2.3 °C for a thermocouple, so δ = 5.8 °C.
Tabulated variable: δ ≈ 2.5 times the place value of the last reported significant digit. Reason: this assumes the last digit is good to ±2.5 (±0 would assume it is perfect, ±5 would assume the next digit to the left is fuzzy). Example: ρ = 1.3 g/ml at 0 °C, so δ = 0.25 g/ml. Example: population = 127,000, so δ = 2500 people.

Estimates of Error (δ) for Input Variables
Manufacturer spec or calibration accuracy: use the given spec or accuracy data. Example: a pump spec of ±1 ml/min gives δ = 1 ml/min.
Variable from a regression (e.g. a calibration curve): δ ≈ 2.5 × standard error (the standard error is the standard deviation of the residuals). Example: velocity is a slope with standard error = 2 m/s, so δ = 5 m/s.
Judgment for a variable: use your judgment for δ. Example: a pressure gauge read to ±1 psi gives δ = 1 psi.

Estimates of Error (δ) for Input Variables
If none of the above rules apply, give your best guess!

Method 1: Propagation of max error - brute force
Obtain the upper and lower limits of all input variables (from their maximum errors) and plug every combination into the equation to get the uncertainty of the calculated variable y. The uncertainty of y is between ymin and ymax. This method works for both symmetric and asymmetric errors (e.g. 10 psi +3 psi / -2 psi).

Example: Propagation of max error - brute force
m = ρAv. What is δ for each input variable?
ρ = 2.0 g/cm³ (tabulated): δ = 0.25, min = 1.75, max = 2.25
A = 3.4 cm² (measured average, s_A = 0.03 cm²): δ = 2.5 s_A = 0.075, min = 3.325, max = 3.475
v = 2 cm/s (slope of a graph, std error = 0.05 cm/s): δ = 2.5 × std error = 0.125, min = 1.875, max = 2.125
Evaluating m = ρAv for all combinations of the min and max values gives m_min = 10.9 g/s and m_max = 16.6 g/s around m_avg = 13.6 g/s, i.e. m = 13.6 g/s (-2.69 / +3.01 g/s), with m_min < m < m_max.
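
A Python sketch of the brute-force calculation above (same numbers as the example):

    # Method 1: brute-force max error for m = rho * A * v.
    # Evaluate every combination of each input at its min and max value.
    from itertools import product

    rho, d_rho = 2.0, 0.25    # g/cm^3, tabulated
    A,   d_A   = 3.4, 0.075   # cm^2, 2.5 * s_A
    v,   d_v   = 2.0, 0.125   # cm/s, 2.5 * std error of the slope

    ranges = [(rho - d_rho, rho + d_rho), (A - d_A, A + d_A), (v - d_v, v + d_v)]
    values = [r_ * A_ * v_ for r_, A_, v_ in product(*ranges)]
    m_avg = rho * A * v
    print(f"m = {m_avg:.1f} g/s, range {min(values):.1f} to {max(values):.1f} g/s")
    # -> m = 13.6 g/s, range 10.9 to 16.6 g/s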

Method 2: Propagation of max error - analytical
Uses the maximum error of each input variable (δi) to estimate the uncertainty range of the calculated variable y:
δy = Σ |∂y/∂xi| · δi   (remember to take the absolute value of each partial derivative!)
Uncertainty of y: y = y_avg ± δy.
Assumptions: the input errors are symmetric, the input errors are independent of each other, and the equation is linear (works acceptably for non-linear equations if the input errors are relatively small).

Example: Propagation of max error - analytical
m = ρAv, so y = m with inputs x1 = ρ, x2 = A, x3 = v.
Partial derivatives: ∂m/∂ρ = Av = (3.4)(2) = 6.8; ∂m/∂A = ρv = (2)(2) = 4; ∂m/∂v = ρA = (2)(3.4) = 6.8
Inputs and max errors: ρ = 2.0 g/cm³ (tabulated, s = 0.1 g/cm³), δρ = 0.25; A = 3.4 cm² (measured average, s_A = 0.03 cm²), δA = 0.075; v = 2 cm/s (slope of a graph, std error = 0.05 cm/s), δv = 0.125
δm = |Av|δρ + |ρv|δA + |ρA|δv = (6.8)(0.25) + (4)(0.075) + (6.8)(0.125) = 2.85 g/s
m = m_avg ± δm = ρAv ± δm = 13.6 ± 2.85 g/s
Fractional contribution of ρ's error: f_error,ρ = (3.4)(2)(0.25)/2.85 = 0.60, i.e. ρ accounts for about 60% of the max error.
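
A Python sketch of the analytical calculation above, using sympy for the partial derivatives (assuming sympy is available):

    # Method 2: analytical max error for m = rho * A * v using symbolic derivatives.
    import sympy as sp

    rho, A, v = sp.symbols('rho A v', positive=True)
    m = rho * A * v
    vals   = {rho: 2.0, A: 3.4, v: 2.0}
    deltas = {rho: 0.25, A: 0.075, v: 0.125}

    # delta_m = sum over inputs of |dm/dx_i| * delta_i
    dm = sum(abs(sp.diff(m, x).subs(vals)) * deltas[x] for x in (rho, A, v))
    print(f"m = {float(m.subs(vals)):.1f} +/- {float(dm):.2f} g/s")   # 13.6 +/- 2.85 g/s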

Propagation of max error
If the equation is linear, the errors are symmetric, and the input errors are independent, the brute-force and analytical methods give the same result.
If the equation is non-linear (with symmetric, independent errors), brute force and analytical are close when the errors are small; for large errors (e.g. >10%, or spanning more than an order of magnitude), brute force is more accurate.
Brute force must be used if the errors are dependent on each other and/or asymmetric.
The analytical method is easier to assess when there are many inputs, and it also gives the percent contribution from each error.

Method 3: Propagation of variance - analytical
The maximum error can be calculated from the max errors of the input variables as shown previously (brute force or analytical), but the probable error is more realistic:
- The errors are independent (some will be "+" and some "-"); not all will be in the same direction.
- The errors are not always at their largest value.
Thus, propagate the variance rather than the max error. You need the variance (s²) of each input to propagate variance; if s (the standard deviation) is unknown, estimate it as s = δ/2.5.

Method 3: Propagation of variance - analytical
s_y² = Σ (∂y/∂xi)² · s_i²   (the propagated variance of y, i.e. its (stdev)²)
y = y_avg ± 1.96 √(s_y²)   (95% probable error)
y = y_avg ± 2.57 √(s_y²)   (99% probable error)
The error should be <10% for the linear approximation to hold. Use propagation of max error if there is not much data; use propagation of variance if there is lots of data.
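
A Python sketch of analytical variance propagation, applied to the same m = ρAv example with the input standard deviations (s = δ/2.5); the partial derivatives are taken numerically by central differences:

    # Method 3: propagation of variance, s_y^2 = sum((dy/dx_i)^2 * s_i^2).
    import numpy as np

    def m(rho, A, v):
        return rho * A * v

    x0 = np.array([2.0, 3.4, 2.0])       # rho, A, v averages
    s  = np.array([0.1, 0.03, 0.05])     # standard deviations of the inputs

    var_y = 0.0
    for i in range(len(x0)):
        h = 1e-6 * max(abs(x0[i]), 1.0)  # small step for the derivative
        xp, xm = x0.copy(), x0.copy()
        xp[i] += h
        xm[i] -= h
        dydx = (m(*xp) - m(*xm)) / (2 * h)
        var_y += dydx**2 * s[i]**2

    y0 = m(*x0)
    print(f"m = {y0:.1f} +/- {1.96 * np.sqrt(var_y):.2f} g/s (95% probable error)")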

Example: Propagation of variance
Calculate ρ and its 95% probable error. All independent variables were measured multiple times (Rule 1); the averages and standard deviations are:
M = 5.0 kg, s = 0.05 kg
L = 0.75 m, s = 0.01 m
D = 0.14 m, s = 0.005 m
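
A Python sketch of the probable-error calculation, assuming the sample is a cylinder so that ρ = 4M/(πD²L) (the slide does not state the geometry, so this formula is an assumption):

    # 95% probable error for the calculated density rho = M / V,
    # assuming a cylindrical sample: rho = 4 M / (pi * D**2 * L).
    import numpy as np

    def rho(M, L, D):
        return 4 * M / (np.pi * D**2 * L)   # cylinder geometry is an assumption

    x0 = np.array([5.0, 0.75, 0.14])        # M (kg), L (m), D (m) averages
    s  = np.array([0.05, 0.01, 0.005])      # standard deviations from repeated measurements

    var = 0.0
    for i in range(3):
        h = 1e-6 * x0[i]
        xp, xm = x0.copy(), x0.copy()
        xp[i] += h
        xm[i] -= h
        var += ((rho(*xp) - rho(*xm)) / (2 * h))**2 * s[i]**2

    print(f"rho = {rho(*x0):.0f} +/- {1.96 * np.sqrt(var):.0f} kg/m^3 (95% probable error)")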

Propagation of Errors (worked solution shown on slide)

Method 4: Monte Carlo Simulation (propagation of variance - brute force)
1. Choose N very large (e.g. 100,000). For each input variable, draw N random perturbations from a normal distribution with that variable's standard deviation σi and add them to its mean, giving N values of each input with errors. (In Mathcad, rnorm(N, μ, σ) generates N random numbers from a normal distribution with mean μ and standard deviation σ.)
2. Find N values of the calculated variable using the generated input values.
3. Determine the mean and standard deviation of the N calculated values.
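
A Python sketch of the Monte Carlo method, again for the m = ρAv example (numpy's rng.normal plays the role of Mathcad's rnorm):

    # Method 4: Monte Carlo propagation of variance for m = rho * A * v.
    import numpy as np

    N = 100_000
    rng = np.random.default_rng(0)

    rho = rng.normal(2.0, 0.1,  N)   # N samples of each input from its normal distribution
    A   = rng.normal(3.4, 0.03, N)
    v   = rng.normal(2.0, 0.05, N)

    m = rho * A * v                  # N values of the calculated variable
    print(f"mean = {m.mean():.2f} g/s, std dev = {m.std(ddof=1):.3f} g/s")
    print(f"95% probable error: +/- {1.96 * m.std(ddof=1):.2f} g/s")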

Propagation of Errors: Overall Summary
Measured variables: use the average, the standard deviation (data range), and the Student t-test (mean range and mean comparison).
Calculated variables: determine the uncertainty by
- Max error: propagating error with brute force
- Max error: propagating error analytically
- Probable error: propagating variance analytically
- Probable error: propagating variance with brute force (Monte Carlo)