Toward a Characterization of Measurement Error


Toward a Characterization of Measurement Error
Sean Canavan, David Hann
Oregon State University

Recall:
Measurement error enters forestry in many different ways and forms.
The errors can seriously degrade model parameters, model estimates, and the variances of both.
Correction techniques exist for countering the effects of measurement errors in many situations, but they typically require knowing something about the form of the errors.
The errors have generally been assumed to be Normally distributed.

Study Data:
Dbh: n = 2175
  Measurement errors: < 0 : 529, = 0 : 368, > 0 : 1278
  Range: 0.8” – 72.1”
Species: DF, TF, PP, SP, IC
Ht: n = 1238
  Measurement errors: < 0 : 722, = 0 : 30, > 0 : 486
  Range: 8.4’ – 231.7’
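The counts on this slide translate directly into shares of negative, zero, and positive errors. The short sketch below only redoes that arithmetic; the dictionary layout and function name are illustrative choices, not part of the original presentation.

```python
# Error counts by sign, copied from the Study Data slide.
counts = {
    "Dbh": {"negative": 529, "zero": 368, "positive": 1278},
    "Ht": {"negative": 722, "zero": 30, "positive": 486},
}


def error_type_shares(by_sign):
    """Return the share of negative, zero and positive errors."""
    total = sum(by_sign.values())
    return {sign: n / total for sign, n in by_sign.items()}


shares = {var: error_type_shares(by_sign) for var, by_sign in counts.items()}

for var, s in shares.items():
    total = sum(counts[var].values())
    parts = ", ".join(f"{sign} = {share:.1%}" for sign, share in s.items())
    print(f"{var}: n = {total}, {parts}")
```

Roughly one Dbh measurement in six was exactly correct, against about one Ht measurement in forty, which is why the point mass at zero matters more for Dbh.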

The Normal Assumption:
It is often assumed that measurement errors follow a Normal distribution (Nester 1981, Garcia 1984, Smith 1986, Päivinen & Yli-Kojola 1989, Gertner 1991, McRoberts et al. 1994, Kozak 1998, Kangas 1998, Kangas & Kangas 1999, Phillips et al. 2000, Williams & Schreuder 2000).
Bias assumption: μ = 0
Variance assumption:
  homogeneous (σ² constant)
  heterogeneous (σ² not constant)

[Figure: Normal(0,1) PDF, f(x) plotted against x for x from -5 to 5]

The Normal Assumption (continued):
What happens when there are many correct measurements?
Example: Dbh measured to a tenth of an inch.

[Figure: cumulative probability (0.0 to 1.0) against measurement error value (-1.5 to 1.5); panel labels in extraction order: 25% Correct, nsig = 115, 50%, nsig = 50, 75%, nsig = 12, 100%, nsig = 6]

Error Distribution Modeling:
First Approach: PDF modeling
Second Approach: CDF modeling
  Part 1: Modeling error type probabilities
  Part 2: Modeling the positive and negative error portions of the curve

[Figure: Normal(0,1) CDF, F(x) = P(X < x) plotted against x for x from -5 to 5]

[Figure: Empirical Dbh Error CDF Surface, cumulative probability against Dbh (inches) and error (inches)]

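Fitted CDF Equation:

\[
P(X \le x) =
\begin{cases}
\Pr(\varepsilon < 0)\cdot F_{\mathrm{neg}}(x), & x < 0 \\
\Pr(\varepsilon < 0) + \Pr(\varepsilon = 0), & x = 0 \\
\Pr(\varepsilon < 0) + \Pr(\varepsilon = 0) + \Pr(\varepsilon > 0)\cdot F_{\mathrm{pos}}(x), & x > 0
\end{cases}
\]

where F_neg and F_pos are the negative and positive error CDFs.

[Figure: fitted cumulative probability against error size (-1.5 to 1.5), with the three portions of the curve labeled Pr(ε < 0), Pr(ε = 0) and Pr(ε > 0)]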
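A minimal sketch of how the three-part fitted CDF can be evaluated. The function name and the two conditional CDFs (exponential shapes with a made-up rate of 5.0) are assumptions for illustration only; just the error type probabilities are taken from the Dbh counts on the Study Data slide.

```python
import math


def mixed_error_cdf(x, p_neg, p_zero, neg_cdf, pos_cdf):
    """Cumulative probability of an error <= x under the three-part model.

    p_neg, p_zero : Pr(error < 0) and Pr(error = 0); Pr(error > 0) is the rest.
    neg_cdf(x)    : conditional CDF of negative errors (tends to 1 as x -> 0-).
    pos_cdf(x)    : conditional CDF of positive errors (0 at x = 0).
    """
    p_pos = 1.0 - p_neg - p_zero
    if x < 0:
        return p_neg * neg_cdf(x)
    if x == 0:
        return p_neg + p_zero
    return p_neg + p_zero + p_pos * pos_cdf(x)


def neg_cdf(x):
    """Hypothetical negative-error CDF: exp(5x) for x < 0 (rate is made up)."""
    return math.exp(5.0 * x)


def pos_cdf(x):
    """Hypothetical positive-error CDF: 1 - exp(-5x) for x > 0 (rate is made up)."""
    return 1.0 - math.exp(-5.0 * x)


# Error type probabilities from the Dbh counts (529, 368, 1278 out of 2175).
p_neg, p_zero = 529 / 2175, 368 / 2175

for x in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(x, round(mixed_error_cdf(x, p_neg, p_zero, neg_cdf, pos_cdf), 4))
```

The jump of size Pr(ε = 0) at zero is the feature that a continuous Normal CDF cannot reproduce.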

Part 1: Error Type Probability Modeling
Multinomial regression in S-Plus
  GLM with a Poisson error distribution (log link)
  Overdispersion / quasi-likelihood
Counts by 1-inch Dbh classes / 5-ft and 10-ft Ht classes
Candidate predictors:
  Dbh, Dbh^(1/2), Dbh^2, Dbh^(-1)
  Ht, Ht^(1/2), Ht^2, Ht^(-1)
Probability model forms: expressions in e^{f_i(Dbh)} (the slide's equation layout did not survive extraction)
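One common way to turn fitted predictor functions into three error type probabilities is a multinomial logit with one category as the baseline. The sketch below assumes that form, with "zero error" as the baseline; the function name, the choice of predictors (Dbh and 1/Dbh), and every coefficient value are hypothetical and are not taken from the presentation.

```python
import math


def error_type_probabilities(dbh, coef_neg, coef_pos):
    """Multinomial-logit probabilities with 'zero error' as the baseline.

    coef_neg, coef_pos : (intercept, slope on Dbh, slope on 1/Dbh).
    All coefficient values passed in are hypothetical.
    """
    def linear_predictor(c):
        return c[0] + c[1] * dbh + c[2] / dbh

    e_neg = math.exp(linear_predictor(coef_neg))
    e_pos = math.exp(linear_predictor(coef_pos))
    denom = 1.0 + e_neg + e_pos
    return {
        "negative": e_neg / denom,
        "zero": 1.0 / denom,
        "positive": e_pos / denom,
    }


# Made-up coefficients, for illustration only.
COEF_NEG = (0.2, 0.01, -0.5)
COEF_POS = (0.9, 0.02, -1.0)

for dbh in (2.0, 10.0, 20.0, 40.0):
    probs = error_type_probabilities(dbh, COEF_NEG, COEF_POS)
    print(dbh, {k: round(v, 3) for k, v in probs.items()})
```

Whatever the coefficients, the three probabilities are positive and sum to one at every Dbh, which is the property the fitted CDF equation relies on.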

[Figure: Fits of Error Type Probabilities, P(e > 0), P(e = 0) and P(e < 0) as percentages (0% to 100%) against Dbh (2 to 46 inches)]

Part 2: Modeling Positive and Negative CDFs
Negative Errors: CDFs by Dbh class
Step 1: Exponential fits
  model form: exp(β*Error Size)
  actually fit: 1 – exp(β*Error Size)
Step 2: Parameter modeling
  βi = f(Dbh)
Step 3: Combined equation fit
  1 – exp(f(Dbh)*Error Size)
So we want a function that looks like an exponential but changes as Dbh changes. Equivalently, we fit exponential functions by Dbh class and look at how the parameter changes with Dbh.
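The three steps can be sketched on synthetic data as follows. This is a simplified illustration under stated assumptions: it fits the model form exp(β*Error Size) by least squares on the log scale rather than the "1 – exp" form the slide says was actually fitted, it uses ln β = a0 + a1*Dbh as a stand-in for f(Dbh), and all data, class midpoints, and function names are made up.

```python
import math


def fit_rate_through_origin(errors, cum_probs):
    """Step 1: least-squares beta for ln F(e) = beta * e (no intercept)."""
    num = sum(e * math.log(p) for e, p in zip(errors, cum_probs))
    den = sum(e * e for e in errors)
    return num / den


def fit_log_linear(xs, ys):
    """Step 2: ordinary least squares for ln y = a0 + a1 * x."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_l = sum(xs) / n, sum(logs) / n
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    return mean_l - a1 * mean_x, a1


# Synthetic class CDFs generated from known (made-up) rates.
TRUE_A0, TRUE_A1 = math.log(20.0), -0.03
dbh_classes = [2.5, 10.5, 21.5, 30.5, 45.0]
error_grid = [-1.0, -0.8, -0.6, -0.4, -0.2, -0.1]  # negative errors, inches

betas = []
for dbh in dbh_classes:
    beta_true = math.exp(TRUE_A0 + TRUE_A1 * dbh)
    class_cdf = [math.exp(beta_true * e) for e in error_grid]
    betas.append(fit_rate_through_origin(error_grid, class_cdf))

a0, a1 = fit_log_linear(dbh_classes, betas)


def negative_error_cdf(dbh, error):
    """Step 3: combined form exp(f(Dbh) * error) for error < 0."""
    return math.exp(math.exp(a0 + a1 * dbh) * error)


print("fitted a0, a1:", round(a0, 4), round(a1, 4))
print("F(-0.2 | Dbh = 21.5):", round(negative_error_cdf(21.5, -0.2), 4))
```

In practice the combined equation would be refitted to all classes at once, using the step 2 estimates as starting values.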

[Figure: cumulative probability against error size (-1.0 to -0.1 inches) for the 2.5”, 21.5” and 45.0” Dbh classes]

[Figure: Parameter Modeling, fitted exponential coefficients against Dbh class, with fitted curve 10.04exp(-0.03Dbh + 1.77Dbh^-1 + 0.59Dbh^-2)]

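Combined equation fit:
Variable power on error size:

\[
1 - \exp\!\left[ b_0 \exp\!\left( b_1\,\mathrm{Dbh} + b_2\,\mathrm{Dbh}^{-1} + b_3\,\mathrm{Dbh}^{-2} \right) (\text{error size})^{c_1} \right]
\]

Resulting CDF equation:

\[
\exp\!\left[ 10.04 \exp\!\left( -0.03\,\mathrm{Dbh} + 1.77\,\mathrm{Dbh}^{-1} \right) (\text{error size})^{0.59} \right]
\]

adjusted R² = 0.8664
Positive fit R² = 0.9486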
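The resulting equation can be evaluated numerically once a sign convention is fixed. The sketch below assumes the reading F(e) = exp(-k(Dbh) * |e|^0.59) for negative errors e, so that the value lies between 0 and 1 and approaches 1 as e approaches zero from below. That convention is an assumption, not something the slide states; only the four coefficients come from the slide.

```python
import math

# Coefficients from the slide's resulting CDF equation.
B0, B1, B2, C1 = 10.04, -0.03, 1.77, 0.59


def rate(dbh):
    """Dbh-dependent coefficient: b0 * exp(b1*Dbh + b2/Dbh)."""
    return B0 * math.exp(B1 * dbh + B2 / dbh)


def negative_error_cdf(dbh, error):
    """Assumed reading: F(e) = exp(-rate(Dbh) * |e|**c1) for e < 0."""
    if error >= 0:
        raise ValueError("defined for negative errors only")
    return math.exp(-rate(dbh) * abs(error) ** C1)


for dbh in (2.5, 21.5, 45.0):
    row = [round(negative_error_cdf(dbh, e), 3) for e in (-1.0, -0.5, -0.1)]
    print(f"Dbh = {dbh}: rate = {rate(dbh):.2f}, F at -1.0, -0.5, -0.1 = {row}")
```

Under this reading the rate falls as Dbh grows, so negative errors on large trees are spread over a wider range than on small trees.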

[Figure: Fitted Dbh Error CDF Surface, cumulative probability against Dbh (inches) and error (inches)]

Alternative Surfaces (Dbh):
Normal 1: Unbiased, homogeneous Normal: μ = 0.0, σ = 0.2237
Normal 2: Constant bias, homogeneous Normal: μ = 0.0901, σ = 0.2237
Normal 3: Non-constant bias, homogeneous Normal: μ = 0.003983*Dbh + 0.000121*Dbh², σ = 0.2237
Normal 4: Unbiased, heterogeneous Normal: μ = 0.0, σ = σ_D*exp[0.1145*Dbh]
Normal 5: Non-constant bias, heterogeneous Normal: μ = 0.003983*Dbh + 0.000121*Dbh², σ = σ_D*exp[0.1145*Dbh]
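The five Normal alternatives differ only in how μ and σ depend on Dbh, so one helper can evaluate all of them. This is a sketch: the function names are illustrative, and `sigma_d` stands for the slide's σ_D, whose value is not given in the source and must be supplied by the caller.

```python
import math

SIGMA_HOMOGENEOUS = 0.2237  # from the slide


def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma**2) distribution evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def quadratic_bias(dbh):
    """Non-constant bias from the slide: 0.003983*Dbh + 0.000121*Dbh**2."""
    return 0.003983 * dbh + 0.000121 * dbh ** 2


def alternative_cdf(name, error, dbh, sigma_d=None):
    """Evaluate one of the five Normal surfaces at (Dbh, error).

    sigma_d is the slide's sigma_D (value not given in the source); it is
    required for the heterogeneous models, Normal 4 and Normal 5.
    """
    if name in ("Normal 4", "Normal 5"):
        if sigma_d is None:
            raise ValueError("sigma_d is required for heterogeneous models")
        sigma = sigma_d * math.exp(0.1145 * dbh)
    else:
        sigma = SIGMA_HOMOGENEOUS
    mu = {
        "Normal 1": 0.0,
        "Normal 2": 0.0901,
        "Normal 3": quadratic_bias(dbh),
        "Normal 4": 0.0,
        "Normal 5": quadratic_bias(dbh),
    }[name]
    return normal_cdf(error, mu, sigma)


# Example call; sigma_d = 0.05 is a made-up value for illustration.
print(round(alternative_cdf("Normal 5", 0.2, 20.0, sigma_d=0.05), 4))
```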

Comparison of Surface Fits: Sum of Squared Differences

Distribution   Dbh (n=2175)   Ht (n=1238)
Ours           4.7893         7.9583
Normal 1       41.1679        26.3670
Normal 2       61.5424        19.2941
Normal 3       39.1445        18.9771
Normal 4       20.3516        17.3235
Normal 5       19.4147        6.8926
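A sum of squared differences between an empirical CDF and a candidate CDF can be computed as sketched below. This is a one-dimensional illustration on made-up errors; the presentation compares surfaces over both Dbh and error size, and the grid it used is not given, so the grid, sample, and function names here are assumptions.

```python
import math


def empirical_cdf(sample, x):
    """Share of sample values less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma**2) distribution evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def sum_squared_differences(sample, model_cdf, grid):
    """Sum over the grid of (empirical CDF - model CDF) squared."""
    return sum((empirical_cdf(sample, x) - model_cdf(x)) ** 2 for x in grid)


# Made-up errors with several exact zeros, for illustration only.
errors = [-0.3, -0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.2, 0.4]
grid = [i / 10 for i in range(-5, 6)]

mean = sum(errors) / len(errors)
sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / (len(errors) - 1))

ssd_normal = sum_squared_differences(errors, lambda x: normal_cdf(x, mean, sd), grid)
ssd_self = sum_squared_differences(errors, lambda x: empirical_cdf(errors, x), grid)

print("SSD, fitted Normal:", round(ssd_normal, 4))
print("SSD, empirical against itself:", ssd_self)
```

Smaller values indicate a candidate surface that tracks the empirical one more closely, which is how the table ranks the alternatives.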

Conclusions:
Case of many correct measurements
Case of few correct measurements
Drawing random samples
Species differences
Changing precision levels:
  Dbh: 0.1” → 1.0” ⇒ 368 → 1087 correct out of 2175
  Ht: 0.1’ → 1.0’ ⇒ 30 → 274 correct out of 1238
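The effect of changing precision levels can be illustrated by counting how many measurements agree with the reference value once both are rounded to a given precision. The sketch assumes that this is how a "correct" measurement is defined at coarser precision, which the slides do not spell out; the data and the function name are made up.

```python
def count_correct(true_values, measured_values, precision):
    """Count pairs that agree after rounding both values to `precision`."""
    def to_units(value):
        # Express the value as a whole number of precision units.
        return round(value / precision)

    return sum(
        1
        for t, m in zip(true_values, measured_values)
        if to_units(t) == to_units(m)
    )


# Made-up reference and field measurements (inches), for illustration only.
true_dbh = [10.23, 15.61, 20.48, 7.94]
measured_dbh = [10.2, 15.7, 20.9, 7.6]

for precision in (0.1, 1.0):
    n = count_correct(true_dbh, measured_dbh, precision)
    print(f"precision {precision}: {n} of {len(true_dbh)} correct")
```

Coarser precision absorbs small discrepancies, so the share of exactly correct measurements, and with it the point mass at zero error, grows.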

"Sampling gets you to the final answer, if you do it often enough. Measuring everything correctly gets you to the correct answer. Don't get those mixed up."
Olde Statistical Sayings, Inventory and Cruising Newsletter, Issue No. 32, October 1995