Research methodology MSC COURSE VALIDATING of MODELS


Research methodology MSC COURSE VALIDATING of MODELS Sanna HÄRKÖNEN

LECTURE CONTENTS: validating models; model BIAS and RMSE; instructions for doing the course group work.

VALIDATING MODELS: Validation gives important information about a model's applicability. Does the model also work outside the modeling data set? For example, a leaf biomass model has been fitted with inventory data from Southern Finland. Can the same model be applied in Lapland?

Concept:
1. Modeling data set: the data set used for building a model, for example TREE_BIOMASS = f(height, diameter). Checking the model results (R², p-values, residuals) shows the goodness of the model in the modeling data set.
2. Evaluation data set: another data set used for checking whether the model also works elsewhere. The existing model is run with this data, and the model results are compared with the measured data.

Example case: You have built a model that estimates tree height as a function of tree diameter, H = f(D). The modeling data was measured in Northern Finland. Does the model work properly in other areas? Check with data from elsewhere. For example, apply the model to calculate H for the trees in the Joensuu data and compare the modeled and measured values. Do the results differ from those in the modeling data set? Are the differences significant?
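The workflow in this example can be sketched in R as follows. This is only a minimal illustration: the data are simulated, the linear form H ~ D and all variable names (`modeling`, `evaluation`, `H_mod`) are assumptions, and the real course data would replace the simulated data frames.

```r
set.seed(1)

# Hypothetical modeling data: diameter D (cm) and height H (m)
modeling <- data.frame(D = runif(50, 5, 40))
modeling$H <- 1.3 + 0.5 * modeling$D + rnorm(50, sd = 1.5)

# Hypothetical evaluation data from another area, simulated with a
# slightly different height-diameter relation
evaluation <- data.frame(D = runif(30, 5, 40))
evaluation$H <- 1.3 + 0.6 * evaluation$D + rnorm(30, sd = 1.5)

# Build the model with the modeling data set only
fit <- lm(H ~ D, data = modeling)
summary(fit)$r.squared  # goodness of fit in the modeling data

# Run the existing model with the evaluation data set
evaluation$H_mod <- predict(fit, newdata = evaluation)

# Compare measured and modeled values in the evaluation data
bias <- mean(evaluation$H - evaluation$H_mod)
rmse <- sqrt(mean((evaluation$H - evaluation$H_mod)^2))
print(c(bias = bias, rmse = rmse))
```

The model is never refitted with the evaluation data; it is only applied to it, so the comparison shows how the model behaves outside its modeling data set.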

VALIDATING MODELS: Commonly used measures. RMSE (root mean squared error) describes how much scatter there is between the measured and modeled values (precision). BIAS describes how much the average level of the modeled values differs from the measured ones (accuracy). [Figure: scatter plots of modeled vs. measured values.]

Absolute RMSE and relative RMSE (%): table columns MeasH, ModH and (measH-modH)^2; the individual rows are not fully legible in the transcript. SUM((measH-modH)^2) = 0+9+9+1+1 = 20
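The formulas on this slide did not survive in the transcript. A reconstruction that is consistent with the sum above and with the value RMSE = 2.0 given for Model 1 is sketched below. It assumes n = 5 observations (the five terms of the sum) and that the relative RMSE is expressed as a percentage of the mean measured value, which is a common convention but is not confirmed by the slide.

```latex
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\mathrm{measH}_i - \mathrm{modH}_i)^2}{n}}
              = \sqrt{\frac{20}{5}} = 2.0

\mathrm{RMSE}\,\% = 100 \cdot \frac{\mathrm{RMSE}}{\overline{\mathrm{measH}}}
```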

ABSOLUTE BIAS and RELATIVE BIAS%: table columns MeasH, ModH and measH-modH; the individual rows are not fully legible in the transcript. SUM(measH-modH) = 0+(-3)+(-3)+(-1)+(-1) = -8
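The arithmetic can be checked with a few lines of R. The sketch below uses only the five differences listed in the sum and assumes that bias is the mean of measH - modH, which reproduces the Model 1 values (BIAS = -1.6, RMSE = 2.0). The helper functions `rel_bias` and `rel_rmse` are hypothetical, and dividing by the mean measured value is an assumed convention rather than something stated on the slide.

```r
# Differences measH - modH for the five trees, taken from the sum on the slide
d <- c(0, -3, -3, -1, -1)

bias <- mean(d)          # sum(d) / n = -8 / 5
rmse <- sqrt(mean(d^2))  # sqrt(20 / 5)

# Hypothetical helpers for the relative versions; dividing by the mean
# measured value is an assumed convention, not taken from the slide
rel_bias <- function(meas, mod) 100 * mean(meas - mod) / mean(meas)
rel_rmse <- function(meas, mod) 100 * sqrt(mean((meas - mod)^2)) / mean(meas)

print(c(bias = bias, rmse = rmse))  # bias = -1.6, rmse = 2
```

With this sign convention a negative bias means that the model overestimates on average, because the modeled values are larger than the measured ones.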

BIAS AND RMSE: what do they mean? MODEL 1: BIAS = -1.6, RMSE = 2.0. MODEL 2: BIAS = -1.4, RMSE = 4.7. MODEL 3: BIAS = -7.6, RMSE = 7.9. RMSE describes how much scatter there is between the modeled and measured values. Even with a high RMSE, a model can still be unbiased (i.e. on average it is at the same level as the measured values). Bias describes the model's tendency to systematically over- or underestimate the values.
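The difference between the two measures can be illustrated with a small constructed example. All values below are hypothetical and chosen only to make the contrast visible, and the functions `bias` and `rmse` are illustrative helpers that use the measured minus modeled convention.

```r
bias <- function(meas, mod) mean(meas - mod)
rmse <- function(meas, mod) sqrt(mean((meas - mod)^2))

meas  <- c(10, 12, 14, 16, 18)        # hypothetical measured values
mod_a <- meas + c(4, -4, 4, -4, 0)    # errors cancel out: unbiased but scattered
mod_b <- meas + 3                     # constant overestimate: biased, no scatter around it

bias(meas, mod_a)  # 0
rmse(meas, mod_a)  # sqrt(64 / 5), about 3.58
bias(meas, mod_b)  # -3
rmse(meas, mod_b)  # 3
```

Model A has the higher RMSE although its bias is zero, while model B is systematically off by the same amount for every tree.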

BIAS AND RMSE: what do they mean? Significance of model bias: are the measured and modeled values statistically different, or is the bias just caused by random variation? This can be tested with a paired t-test comparing the measured and modeled data sets (see Blas' lecture notes / T-test). In R the command is: t.test(y1,y2,paired=TRUE) (y1 and y2 are the measured and modeled values in your data set, and paired=TRUE denotes that the two data sets include data for the same "individuals"). If the p-value of the t-test is < 0.05, the bias is statistically significant.
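A minimal sketch of the test in R is shown below. The heights are hypothetical; they are chosen so that the differences equal those of the bias example (0, -3, -3, -1, -1), and the names `meas` and `mod` stand in for y1 and y2.

```r
meas <- c(14, 13, 16, 18, 20)  # hypothetical measured heights (y1)
mod  <- c(14, 16, 19, 19, 21)  # hypothetical modeled heights (y2)

# Paired test: each measured value belongs to the same tree as the modeled one
res <- t.test(meas, mod, paired = TRUE)

bias <- unname(res$estimate)   # mean of meas - mod, i.e. the bias
p <- res$p.value

if (p < 0.05) {
  message("The bias is significant at the 5% level")
} else {
  message("The bias is not significant at the 5% level")
}
```

With only five trees the bias of -1.6 is not significant at the 5% level here (t is about -2.67 with 4 degrees of freedom, while the critical value is 2.776), which shows that a clearly non-zero bias can still be explained by random variation when the sample is small.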