Presentation transcript:

Dr. Richard Young Optronic Laboratories, Inc.

- Uncertainty budgets are a growing requirement of measurements.
- Multiple measurements are generally required for estimates of uncertainty.
- Multiple measurements can also decrease uncertainties in results.
- How many measurement repeats are enough?

Here is an example probability distribution function of some hypothetical measurements. We can use a random number generator with this distribution to investigate the effects of sampling.
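The slides do not say which generator or parameters were used; a minimal sketch with NumPy, assuming a normal distribution with a standard deviation of 10 (matching the later slides) and an arbitrary mean of 100:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
mean, sigma = 100.0, 10.0            # assumed population parameters
data = rng.normal(mean, sigma, 10_000)

# Sample standard deviation as more and more points are included
# (ddof=1 gives the n-1 "sample" form discussed later).
for n in (2, 5, 10, 100, 1_000, 10_000):
    print(n, data[:n].std(ddof=1))
```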

Here is a set of 10,000 data points…

Plotting the sample number on a log scale better shows the behaviour at small sample sizes.

There is a lot of variation, but how is this affected by the data set?

Here we have results for 200 data sets.

The most probable value for the sample standard deviation of 2 samples is zero! Many samples are needed before 10 becomes the most probable value.
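The 200-data-set experiment can be replicated under the same assumed parameters; a sketch showing that with only 2 samples the spread peaks at zero:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
# 200 data sets of just 2 samples each, sigma = 10 as before.
s2 = rng.normal(100.0, 10.0, size=(200, 2)).std(axis=1, ddof=1)

counts, edges = np.histogram(s2, bins=20)
print(edges[np.argmax(counts)])      # the most populated bin starts near 0
```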

Sometimes it is best to look at the CDF. The 50% level is where lower or higher values are equally likely.

What if the distribution was uniform instead of normal? The most probable value for >2 samples is ≈10.

Underestimated values are still more probable because the PDF is asymmetric.

- Throwing a die is an example of a uniform random distribution.
- A uniform distribution is not necessarily random, however.
- It may be cyclic, e.g. temperature variations due to air conditioning.
- With computer-controlled acquisition, data collection is often at regular intervals.
- This can give interactions between the cycle period and the acquisition interval.

For symmetric cycles, any multiple of two data points per cycle will average to the average of the cycle.

Correct averages are obtained when full cycles are sampled, regardless of the phase. Unless synchronized, data collection may begin at any point (phase) within the cycle.

Again, whole cycles are needed to give good values. The value is not 10 because the sample standard deviation has a (n-1)^0.5 term.

The population standard deviation is 10 at each complete cycle. Each cycle contains all the data of the population. The standard deviation for full cycle averages = 0.
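A sketch of this whole-cycle behaviour, assuming a pure sine cycle (the slides do not give the waveform) with its amplitude chosen so the population standard deviation is exactly 10:

```python
import numpy as np

n_per_cycle, n_cycles = 24, 5
t = np.arange(n_per_cycle * n_cycles)
# Amplitude 10*sqrt(2) because a sampled sine has std = amplitude/sqrt(2).
y = 100.0 + 10.0 * np.sqrt(2.0) * np.sin(2 * np.pi * t / n_per_cycle)

one_cycle = y[:n_per_cycle]
print(one_cycle.mean())         # 100: full cycles average correctly
print(one_cycle.std(ddof=0))    # 10: population form over a whole cycle
print(one_cycle.std(ddof=1))    # >10: the (n-1)^0.5 term in the sample form

# Averages over each complete cycle are identical, so their spread is zero.
cycle_means = y.reshape(n_cycles, n_per_cycle).mean(axis=1)
print(cycle_means.std(ddof=0))  # ~0
```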

- Smoothing involves combining adjacent data points to create a smoother curve than the original.
- A basic assumption is that data contains noise, but the calculation does NOT allow for uncertainty.
- Smoothing should be used with caution.

What is the difference?

Here is a spectrum of a white LED. It is recorded at very short integration time to make it deliberately noisy.

A 25 point Savitzky-Golay smooth gives a line through the center of the noise.
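SciPy can reproduce this kind of smooth; a sketch on an invented white-LED-like spectrum, assuming a polynomial order of 2 (the slides do not state the order used):

```python
import numpy as np
from scipy.signal import savgol_filter

wavelengths = np.linspace(400, 750, 1024)   # nm, assumed range
# Narrow blue primary peak plus a broad phosphor peak, roughly LED-shaped.
clean = (np.exp(-0.5 * ((wavelengths - 450) / 10) ** 2)
         + 0.6 * np.exp(-0.5 * ((wavelengths - 560) / 50) ** 2))
noisy = clean + np.random.default_rng(3).normal(0, 0.05, wavelengths.size)

smoothed = savgol_filter(noisy, window_length=25, polyorder=2)
```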

The result of the smooth is very close to the same device measured with optimum integration time.

But how does the number of data points affect results? Here we have 1024 data points.

Now we have 512 data points.

Now we have 256 data points.

Now we have 128 data points.

A 25 point smooth follows the broad peak but not the narrower primary peak.

To follow the primary peak we need to use a 7 point smooth… But it doesn’t work so well on the broad peak.

Comparing to the optimum scan, the intensity of the primary peak is underestimated. This is because some of the higher signal data have been removed.

Beware of under-sampling peaks – you may underestimate or overestimate intensities.

Here is the original data again. What about other types of smoothing?

An exponential smooth shifts the peak. Beware of asymmetric algorithms!
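The shift is easy to demonstrate; a sketch of a basic single-pass exponential smooth (the smoothing constant 0.2 is an arbitrary choice):

```python
import numpy as np

def exponential_smooth(x, alpha=0.2):
    # Each output leans only on past values, which drags a peak
    # towards later samples - the asymmetry warned about above.
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
    return y

peak = np.exp(-0.5 * ((np.arange(200) - 80) / 10.0) ** 2)  # peak at sample 80
print(np.argmax(exponential_smooth(peak)))                 # lands after 80
```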

This is the optimum integration scan but with 128 points like the noisy example. With lower noise, can we describe curves with fewer points?

… 64 points.

… 32 points. Is this enough to describe the peak?

- Interpolation is the process of estimating data between given points.
- National Laboratories often provide data that requires interpolation to be useful.
- Interpolation algorithms generally estimate a smooth curve.

- There are many forms of interpolation: Lagrange, B-spline, Bezier, Hermite, Cardinal spline, cubic, etc.
- They all have one thing in common: they go through each given point and hence ignore uncertainty completely.
- Generally, interpolation algorithms are local in nature and commonly use just 4 points.

The interesting thing about interpolating data containing random noise is you never know what you will get. Let’s zoom this portion…

The Excel curve can even double back. Uneven sampling can cause overshoots.
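Overshoot from uneven sampling is easy to reproduce with an ordinary cubic spline; the points below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 2.1, 3.5, 4.0])   # note the uneven spacing
y = np.array([1.0, 1.2, 0.9, 1.3, 1.0, 1.1])

spline = CubicSpline(x, y)    # passes through every point exactly
xf = np.linspace(0.0, 4.0, 400)
yf = spline(xf)
print(yf.min(), yf.max())     # excursions outside the range of the data
```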

- If a spectrum can be represented by a function, e.g. a polynomial, the closest “fit” to the data can provide smoothing and give the values between points.
- The “fit” is achieved by changing the coefficients of the function until it is closest to the data: a least-squares fit.

- The squares of the differences between values predicted by the function and those given by the data are added to give a “goodness of fit” measure.
- Coefficients are changed until the “goodness of fit” is minimized.
- Excel has a regression facility that performs this calculation.
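Outside Excel the same calculation takes a few lines; a sketch with NumPy on invented data:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + np.random.default_rng(4).normal(0, 0.2, x.size)

coeffs = np.polyfit(x, y, deg=3)     # coefficients, highest power first
fitted = np.polyval(coeffs, x)
print(((y - fitted) ** 2).sum())     # the minimized "goodness of fit"
```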

- Theoretically, any simple smoothly varying curve can be fitted by a polynomial.
- Sometimes it is better to “extract” the data you want to fit by some reversible calculation.
- This means you can use, say, 9th-order polynomials instead of 123rd-order to make the calculations easier.

NIST provide data at uneven intervals. To use the data, we have to interpolate to intervals required by our measurements.

NIST recommend fitting a high-order polynomial to the data values multiplied by λ⁵/exp(a+b/λ) for interpolation. The result looks good, but…

...on a log scale, the match is very poor at lower values.

When converted back to the original scale, lower values bear no relation to the data.

- The “goodness of fit” parameter is a measure of absolute differences, not relative differences.
- NIST use a weighting of 1/E² to give relative differences, and hence closer matching, but that is not easy in Excel.
- Large values tend to dominate smaller ones in the calculation.
- A large dynamic range of values should be avoided.
- We are trying to match data over 4 decades!

- Although NIST’s 1/E² weighting gives closer matches than this, to get the best results they split the data into 2 regions and calculate separate polynomials for each.
- This is a reasonable thing to do, but it can lead to local data effects and arbitrary splits that do not fit all examples.
- Is there an alternative?

A plot of the log of E·λ⁵ values vs. λ⁻¹ is a gentle curve, almost a straight line. We can calculate a polynomial without splitting the data. The fact that we are fitting on a log scale means we are effectively using relative differences in the least-squares calculation.
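A sketch of this alternative, with an invented Wien-like irradiance standing in for the NIST lamp values:

```python
import numpy as np

lam = np.linspace(250.0, 2400.0, 50)       # nm, invented calibration points
E = 1e14 * lam**-5 * np.exp(-6000.0 / lam) * (1.0 + 1e-4 * lam)

# Fit log(E * lambda^5) against inverse wavelength: a gentle, almost
# straight curve. Scaling 1/lambda by 1000 keeps the fit well conditioned.
u = 1000.0 / lam
coeffs = np.polyfit(u, np.log(E * lam**5), deg=6)

# Invert the transformation to interpolate back on the original scale.
lam_new = np.linspace(250.0, 2400.0, 500)
E_new = np.exp(np.polyval(coeffs, 1000.0 / lam_new)) / lam_new**5
```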

Incandescent lamp emission is close to that of a blackbody.

If we calculate a scaled blackbody curve, as we would to get the distribution temperature, and then divide the data by the blackbody...

...we get a smooth curve with very little dynamic range. The “fit” is not good because of the high initial slope and almost linear falling slope.

Plotting vs. λ⁻¹, as in alternative method 1, allows close fitting of the polynomial.
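A sketch of this blackbody method; the temperature of 3000 K and the lamp spectrum are invented for illustration:

```python
import numpy as np

c2 = 1.4388e7                      # nm*K, second radiation constant
T = 3000.0                         # assumed distribution temperature

lam = np.linspace(250.0, 2400.0, 50)          # nm, invented data points
planck = lam**-5 / (np.exp(c2 / (lam * T)) - 1.0)
E = planck * (1.05 + 1e-5 * lam)   # invented lamp: near-blackbody spectrum

ratio = E / planck                 # smooth, with very little dynamic range
coeffs = np.polyfit(1000.0 / lam, ratio, deg=4)   # fit vs inverse wavelength

lam_new = np.linspace(250.0, 2400.0, 500)
planck_new = lam_new**-5 / (np.exp(c2 / (lam_new * T)) - 1.0)
E_new = np.polyval(coeffs, 1000.0 / lam_new) * planck_new
```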

Method 2 shows lower residuals, but there is not much difference.

All methods discussed give essentially the same result when converted back to the original scale.

- None of the algorithms mentioned allow for uncertainty (or they assume it is constant).
- If we replaced the least-squares “goodness of fit” parameter with “most probable”, this would use the uncertainty we know is there to determine the best fit.
- Why is this not done? It is difficult in Excel, but easy with custom programs.

From the data value (mean) and the standard deviation, we can calculate the PDF. The value from the fit has a probability that we can use.

- Multiply the probabilities at each point to give the “goodness of fit” parameter.
- Use this parameter instead of the least-squares one in the fit calculations.
- MAXIMIZE the “goodness of fit” parameter to obtain the best fit.
- The fit will be closest where uncertainties are lowest.
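A sketch of the idea on invented data with known, unequal uncertainties. For Gaussian uncertainties, maximizing the product of the point probabilities is equivalent to minimizing the sigma-weighted sum of squares, so the fit clings to the low-uncertainty points:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 30)
sigma = np.where(x < 5.0, 0.1, 1.0)     # low uncertainty on the left half
y = 1.0 + 0.3 * x + rng.normal(0.0, sigma)

def neg_log_probability(coeffs):
    # Negative log of the product of Gaussian point probabilities
    # (constant terms dropped); minimizing this maximizes the product.
    resid = y - np.polyval(coeffs, x)
    return 0.5 * np.sum((resid / sigma) ** 2)

best = minimize(neg_log_probability, x0=np.zeros(2)).x
print(best)    # slope and intercept, close to the true 0.3 and 1.0
```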

- Standard deviations may be underestimated with small samples.
- Cyclic variations should be integrated over complete cycle periods.
- Smoothing and interpolation should be used with caution: do not assume results are valid – check.

- Polynomial fits can give good results, but:
  - Avoid a large dynamic range.
  - Avoid complex curvatures.
  - Avoid high initial slopes.
- All these manipulations ignore uncertainty (or assume it is constant).
- But least-squares fits can be replaced by maximum probability to take uncertainty into consideration.