Using uncertainty to test model complexity Barry Croke.

Concept: Whether a model has an appropriate complexity can be tested by analysing the scatter in its residuals. Take the standard deviation of each value's uncertainty distribution as the measure of its uncertainty, then divide each residual by its uncertainty. This requires the uncertainty in both the observed and the modelled values.
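One way to write the normalisation down, as a sketch rather than the author's stated formula: the slide does not say how the observed and modelled uncertainties are combined, so the quadrature sum below assumes the two errors are independent. The symbols $o_i$, $m_i$, $\sigma_{o,i}$, $\sigma_{m,i}$ and $z_i$ are introduced here for illustration only.

```latex
% Residual for data point i: observed value minus modelled value
r_i = o_i - m_i

% Normalised residual, assuming independent errors combined in quadrature
z_i = \frac{r_i}{\sqrt{\sigma_{o,i}^{2} + \sigma_{m,i}^{2}}}
```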

Concept: The result is a distribution of normalised residuals. If the model has an appropriate complexity, the standard deviation of this distribution should be near 1. If the standard deviation is much greater than 1 (>> 1), the model is too simple. If it is much less than 1 (<< 1), the model is too complex (it has fitted to noise). The result depends on the accuracy and precision of the uncertainty estimates.
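The test described above could be sketched in Python roughly as follows. This is a minimal illustration, not the author's implementation: the function names, the quadrature combination of the two uncertainties (which assumes independent errors), the `low` and `high` thresholds, and the sample numbers are all assumptions made for the example. The slide only says "near 1", ">> 1" and "<< 1" and gives no numeric cut-offs.

```python
import math
import statistics


def normalised_residuals(observed, modelled, sigma_obs, sigma_mod):
    """Divide each residual by its combined uncertainty.

    Assumption: observed and modelled errors are independent, so the two
    standard deviations are combined in quadrature (math.hypot).
    """
    return [
        (o - m) / math.hypot(so, sm)
        for o, m, so, sm in zip(observed, modelled, sigma_obs, sigma_mod)
    ]


def complexity_verdict(z, low=0.5, high=2.0):
    """Return (spread, verdict) for a list of normalised residuals.

    The thresholds `low` and `high` are illustrative assumptions only.
    """
    spread = statistics.stdev(z)
    if spread > high:
        return spread, "too simple"
    if spread < low:
        return spread, "too complex"
    return spread, "appropriate"


# Made-up numbers, purely for illustration.
observed = [1.0, 2.1, 2.9, 4.2, 5.0]
modelled = [1.1, 2.0, 3.0, 4.0, 5.1]
sigma_obs = [0.1] * 5
sigma_mod = [0.1] * 5

z = normalised_residuals(observed, modelled, sigma_obs, sigma_mod)
spread, verdict = complexity_verdict(z)
print(f"spread = {spread:.3f}, verdict = {verdict}")
```

With these made-up values the spread comes out close to 1, so the sketch reports the complexity as appropriate. With real data the thresholds would need to reflect the sample size and the confidence in the uncertainty estimates.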

Complications: The method assumes that the uncertainties in the observed and modelled values are sufficiently well known. A systematic over-estimation of the uncertainties reduces the standard deviation of the normalised residuals, which biases the result toward simpler models. Conversely, an under-estimation of the uncertainties biases the result toward more complex models.
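The effect of a systematic error in the uncertainties can be seen with a small sketch: if every uncertainty is multiplied by a constant factor, the spread of the normalised residuals is divided by that factor. The function name, the `factor` parameter and the sample numbers below are hypothetical, chosen only to illustrate the point.

```python
import statistics


def spread_with_scaled_uncertainty(residuals, sigmas, factor):
    """Std dev of normalised residuals when each uncertainty is scaled by `factor`.

    factor > 1 mimics systematic over-estimation of the uncertainties,
    factor < 1 mimics systematic under-estimation.
    """
    z = [r / (factor * s) for r, s in zip(residuals, sigmas)]
    return statistics.stdev(z)


# Made-up residuals and uncertainties, purely for illustration.
residuals = [-0.2, 0.1, 0.3, -0.1, -0.1]
sigmas = [0.2] * 5

baseline = spread_with_scaled_uncertainty(residuals, sigmas, 1.0)
over = spread_with_scaled_uncertainty(residuals, sigmas, 2.0)   # uncertainties doubled
under = spread_with_scaled_uncertainty(residuals, sigmas, 0.5)  # uncertainties halved

print(f"baseline = {baseline:.3f}, over-estimated = {over:.3f}, under-estimated = {under:.3f}")
```

Doubling the uncertainties halves the spread, so the model looks more complex than it is and a simpler one would be preferred. Halving them doubles the spread and pushes the choice toward a more complex model.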

Complications: Further, any random noise in the uncertainty estimates will tend to increase the standard deviation of the normalised residuals, biasing the result toward more complex models. Any structured noise in the uncertainty estimates may lead to an over- or under-estimation of the standard deviation of the normalised residuals. The result must therefore be evaluated in light of the confidence in the estimated uncertainties in the observed and modelled values.
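The claim that random noise in the uncertainty estimates inflates the spread can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original presentation: residuals are drawn from a normal distribution with a known true uncertainty, and the estimated uncertainty is the true one multiplied by a lognormal factor whose mean is 1 (so the estimates are unbiased on average). The function name, the noise model and all parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_spread(n=20000, noise_sd=0.5, seed=0):
    """Compare the spread of normalised residuals with exact vs noisy uncertainties.

    Assumptions: Gaussian residuals with true sigma = 1, and multiplicative
    lognormal noise (mean 1) on the uncertainty estimates.
    """
    rng = random.Random(seed)
    true_sigma = 1.0
    clean, noisy = [], []
    for _ in range(n):
        residual = rng.gauss(0.0, true_sigma)
        # Mean-one lognormal factor: exp(g - s^2 / 2) with g ~ N(0, s^2).
        factor = math.exp(rng.gauss(0.0, noise_sd) - 0.5 * noise_sd ** 2)
        clean.append(residual / true_sigma)
        noisy.append(residual / (true_sigma * factor))
    return statistics.stdev(clean), statistics.stdev(noisy)


clean_spread, noisy_spread = simulate_spread()
print(f"exact uncertainties: {clean_spread:.3f}")
print(f"noisy uncertainties: {noisy_spread:.3f}")
```

Under these assumptions the spread with exact uncertainties stays near 1, while the spread with noisy uncertainties is noticeably larger even though the estimates are unbiased on average. Dividing by an uncertainty that is sometimes too small inflates a residual more than dividing by one that is too large shrinks it.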