How long do we need to run an experiment? Ignacio Colonna & Don Bullock.

Presentation transcript:

How long do we need to run an experiment? Ignacio Colonna & Don Bullock

Grain yield maps show considerable variability across years. Yield map algorithm: NF = 21.4 kg N / MT grain * YG - Rotation Credit - Incidental N

But is it the variability in yield that matters? It is the variability in response to inputs that matters. Assuming a profit-maximizing farmer… Yield response function

Yield response and profit functions (only due to N)

Profit(N) = Yield(N) * $Corn - kg N * $N
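A minimal sketch of the profit calculation above, assuming a quadratic yield response (the slides do not state the functional form). The coefficients, prices, and function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def profit(n_rate, a, b, c, corn_price, n_price):
    """Profit ($/ha) = yield(N) * corn price - kg N applied * N price."""
    yield_n = a + b * n_rate + c * n_rate ** 2  # assumed quadratic response (ton/ha)
    return yield_n * corn_price - n_rate * n_price

def optimum_n(b, c, corn_price, n_price):
    """N rate at which marginal revenue equals marginal cost (needs c < 0)."""
    return (n_price / corn_price - b) / (2.0 * c)

# Hypothetical response coefficients and prices, not taken from the trial
a, b, c = 6.0, 0.04, -0.0001
corn_price, n_price = 100.0, 0.5  # $/ton grain, $/kg N

n_opt = optimum_n(b, c, corn_price, n_price)
max_profit = profit(n_opt, a, b, c, corn_price, n_price)
```

With these made-up numbers the optimum is 175 kg N/ha; setting the derivative of the profit function to zero gives the closed form used in `optimum_n`.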

Statistics vs. management: Responses may be significantly different statistically, yet yield similar management decisions. Different curves, same optimum. Similar curves, different optima.

But these are 'after the fact' optimum N rates. Farmers' decisions are usually based on the best a priori guess, hence the concept of ex-post vs. ex-ante optima.
Ex-post optimum: computed after collecting the data.
Ex-ante optimum: best guess given the information available before the fact ("long-run" optimum).
E.g., we have 15 years of N response data for a site:

For this study, Question 1: How does the uncertainty about the 'true' N rate at a given site change with years of experimentation? There are no published estimates of uncertainty in ex-ante N rates as a function of experiment length in the US Midwest.

Question 2: What is the cost of not knowing the true N rate at a given site? No published estimates on practical consequences of different lengths of experimentation on fertilizer application decisions.

Data source: N fertilizer trial at Monmouth, IL (conducted by Nafziger, Adee, Hoeft, Mainz)

Data: experimental design. 21 years; split plot in RCBD, 3 reps. 2 rotations: C/C and C/S. 5 fertilizer rates: 0, 67, 134, 201, 269 kg/ha (pre-plant). Individual plots: 6.1 m x 18 m.

21 years x 2 rotations: raw yield means (panels: C/C, C/S)

21 years x 2 rotations: model fits. Yield response (ton/ha); panels: C/C, C/S

21 years x 2 rotations: variability in ex-post N opt. 'True' ex-ante N opt = 173 kg/ha; 'True' ex-ante N opt = 110 kg/ha (one value per rotation)

A look at uncertainty in ex-ante N optima
Pick two years at random (#1) and compute the ex-ante optimum N rate (#1).
Pick two years at random (#2) and compute the ex-ante optimum N rate (#2).
Pick two years at random (#3), and so on: 1000 samples = 1000 estimates of the ex-ante N rate.
Repeat for groups of 3 years, 4 years, etc.
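The resampling scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each year's response is assumed quadratic, the ex-ante optimum is taken as the rate that maximizes average profit over the sampled years, years are drawn without replacement, and the 21 sets of coefficients are simulated placeholders rather than the Monmouth data.

```python
import numpy as np

def ex_ante_optimum(coefs, price_ratio):
    """Optimum N for the average of several quadratic yield responses.

    coefs: array (years, 3) with columns a, b, c of yield = a + b*N + c*N^2.
    price_ratio: N price divided by grain price.
    """
    _, b_mean, c_mean = coefs.mean(axis=0)
    return (price_ratio - b_mean) / (2.0 * c_mean)

def resample_ex_ante(coefs, n_years, price_ratio, n_samples=1000, seed=0):
    """Draw n_samples random groups of n_years and return their optima."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_samples)
    for i in range(n_samples):
        picked = rng.choice(len(coefs), size=n_years, replace=False)
        estimates[i] = ex_ante_optimum(coefs[picked], price_ratio)
    return estimates

# Hypothetical coefficients for 21 years (placeholders, not trial results)
rng = np.random.default_rng(42)
coefs = np.column_stack([
    rng.normal(6.0, 1.0, 21),     # intercept a
    rng.normal(0.04, 0.005, 21),  # linear term b
    rng.normal(-1e-4, 1e-5, 21),  # quadratic term c (negative)
])
price_ratio = 0.005

# Standard deviation of the ex-ante optimum for several experiment lengths
sd_by_years = {k: resample_ex_ante(coefs, k, price_ratio).std()
               for k in (2, 4, 8, 16)}
```

The spread of the estimates shrinks as more years enter each group, which is the quantity the SD and CV slides summarize.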

Results from resampling approach: distributions. Ex-ante optimum N (kg/ha)

Results from resampling approach: SD and CV. Axes: years of experimentation vs. ex-ante N opt standard deviation and ex-ante N opt CV; series: C/C, C/S

Results from resampling approach: practical implications (C/C). Error (+ or -)

Results from resampling approach: practical implications (C/C). Profit at 'true' ex-ante N opt = 249 $/ha. Loss relative to maximum

Conclusions so far: Relatively small effect in monetary terms (very small for >4 years; e.g. at 4 years, 10 $/ha). But how do these errors compare with within-field spatial variability in N opt? Is this of use to conventional systems?

Regression of Crop Yield with Soil and Landscape Attributes: An Assessment of Some Common Methods for Dealing with Spatially Correlated Residuals

Spatial correlation of residuals in regressions is often overlooked in agronomic and engineering research, and especially in analyses related to precision agriculture, with a few exceptions. We argue that this oversight is not trivial, and neither is the choice of a solution.

Field Experiment

Soybean yield monitor data (2 years)

Soil sample data (P and K)

Elevation data and derivatives (Slope, Aspect, etc.)

20 m grid

OLS (ordinary least squares) is appropriate if errors are as assumed, but residuals often show spatial correlation due to variables not included in the model.
Spatial Mixed:
o Errors not assumed independent.
o Σ estimated with geostatistical models.
o Parameters for Σ estimated by ML or REML.
Example of code in SAS® Proc Mixed:
  * Iterative. Initial values obtained from inspection of variogram of OLS residuals;
  parms /*sill*/ (600) /*range*/ (90) /*nugget*/ (650);
  repeated / subject=intercept local type=sp(sph)(x y);
GLS: Generalized Least Squares estimator

Semivariograms of OLS residuals

Spatial Mixed: Errors not assumed independent. Σ estimated with geostatistical models. Parameters for Σ estimated by ML or REML. GLS: Generalized Least Squares estimator
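For reference, the standard GLS estimator is (X'Σ⁻¹X)⁻¹X'Σ⁻¹Y. The sketch below evaluates it for a spherical covariance model whose parameters are treated as known; in the Spatial Mixed approach they would be estimated by ML or REML, which is not shown here. The function names and the simulated data are hypothetical; only the sill, range, and nugget values echo the starting values in the SAS example.

```python
import numpy as np

def spherical_cov(coords, sill, range_, nugget):
    """Covariance matrix from a spherical model (partial sill, range, nugget)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    h = np.minimum(d / range_, 1.0)          # covariance is zero beyond the range
    cov = sill * (1.0 - 1.5 * h + 0.5 * h ** 3)
    cov[np.diag_indices_from(cov)] += nugget
    return cov

def gls(X, y, sigma):
    """GLS estimate (X' S^-1 X)^-1 X' S^-1 y and its covariance matrix."""
    si_X = np.linalg.solve(sigma, X)         # S^-1 X without forming S^-1
    si_y = np.linalg.solve(sigma, y)         # S^-1 y
    xtx = X.T @ si_X
    beta = np.linalg.solve(xtx, X.T @ si_y)
    return beta, np.linalg.inv(xtx)

# Hypothetical data on a 5 x 5 grid with 20 m spacing
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(5) * 20.0, np.arange(5) * 20.0)
coords = np.column_stack([gx.ravel(), gy.ravel()])
sigma = spherical_cov(coords, sill=600.0, range_=90.0, nugget=650.0)

X = np.column_stack([np.ones(25), rng.normal(size=25)])
errors = np.linalg.cholesky(sigma) @ rng.standard_normal(25)  # correlated errors
y = X @ np.array([10.0, 3.0]) + errors

beta_gls, cov_beta = gls(X, y, sigma)
```

When Σ is the identity matrix the same function reproduces the OLS estimates, which is a convenient sanity check.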

Nearest Neighbors (non-iterative version; computations are simple). Uses the average of neighboring OLS residuals.
Computation:
1. Compute the OLS regression Y = Xβ + ε and save the residuals (e).
2. Compute the average of neighboring residuals for each point (We).
3. Compute a new OLS regression using We from step 2 as a covariate: Y = Xβ + γWe + ε

Spatially autoregressive approaches
SAR error: the spatial pattern observed in the OLS residuals is attributed to the omission of spatially structured explanatory variables from the X matrix.
SAR lag: the value of the response variable is in part due to contagion or diffusion from the same variable at nearby locations, or there is a mismatch between the scale at which a variable is measured and the true scale of the process.
Decide upon the model based on substantive interpretation and Lagrange Multiplier specification tests (Anselin).
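The three computation steps can be sketched in a few lines. This is a minimal illustration, assuming a row-standardized neighbor matrix built from a distance cutoff; the helper names, the cutoff, and the simulated grid data are hypothetical.

```python
import numpy as np

def row_standardized_neighbors(coords, max_dist):
    """Weights matrix: equal weight for every point within max_dist."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = ((d > 0) & (d <= max_dist)).astype(float)
    return A / A.sum(axis=1, keepdims=True)  # each row averages its neighbors

def nn_regression(X, y, W):
    """Non-iterative nearest-neighbors correction (X includes the intercept)."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)     # step 1: OLS fit
    resid = y - X @ beta_ols                             # step 1: residuals
    w_resid = W @ resid                                  # step 2: neighbor average
    X_aug = np.column_stack([X, w_resid])                # step 3: add covariate
    beta_nn, *_ = np.linalg.lstsq(X_aug, y, rcond=None)  # step 3: refit by OLS
    return beta_ols, beta_nn

# Hypothetical data on a 6 x 6 grid with 20 m spacing
rng = np.random.default_rng(1)
gx, gy = np.meshgrid(np.arange(6) * 20.0, np.arange(6) * 20.0)
coords = np.column_stack([gx.ravel(), gy.ravel()])
x1 = rng.normal(size=len(coords))
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=len(coords))
X = np.column_stack([np.ones(len(coords)), x1])

# A 30 m cutoff includes diagonal neighbors (20 * sqrt(2) is about 28.3 m)
W = row_standardized_neighbors(coords, max_dist=30.0)
beta_ols, beta_nn = nn_regression(X, y, W)
```

The last element of `beta_nn` is the coefficient on the averaged neighboring residuals; the first elements are the adjusted regression coefficients.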

“Queen Structure” for W Yellow: neighbors = 1 Blue: not neighbors = 0 Red: Point i
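A queen-structure W for a regular grid can be built as in the sketch below, where every cell sharing an edge or a corner with point i gets a 1 and all others get 0. The function name and the 4 x 4 grid are illustrative assumptions; row standardization is a common convention, not something the slide states.

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """Binary queen-contiguity matrix for an n_rows x n_cols grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if (dr, dc) != (0, 0) and inside:
                        W[i, rr * n_cols + cc] = 1  # edge or corner neighbor
    return W

W = queen_weights(4, 4)
neighbor_counts = W.sum(axis=1)        # 3 at corners, 5 on edges, 8 inside
W_std = W / neighbor_counts[:, None]   # row-standardized: rows sum to 1
```

Multiplying `W_std` by a vector of residuals or responses gives the neighbor averages that the SAR-error and SAR-lag slides refer to.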

SAR-Error (Spatial Autoregressive – Error) Average of neighboring OLS residuals

SAR-Lag (Spatial Autoregressive – Lag) Average of neighboring values for Y “Direct effect of neighbors on point i”

Flat line→spatially uncorrelated residuals. All methods seem to achieve similar results in terms of residual spatial structure. Points shifted vertically to aid visualization.

Shaded values are significantly different from OLS estimates

Regression example - Conclusions: Spatial Mixed, SAR-error and SAR-lag parameter estimates differed significantly from the OLS estimates only for the year with the largest spatial structure. Parameter estimates from NN were not significantly different from the OLS ones, despite the apparent difference in magnitude.

Estimates from SAR-lag were in general smaller in magnitude relative to all other methods. This is due to the “filtering” performed by this method on the response variable. Is this reasonable for this type of analysis? We believe it is not.

So, which method should we choose to account for the spatial correlation of residuals in regression? This question motivates the second part of our analysis.

Simulation Experiment

3 independent variables: x1, x2 and e, with short- and long-range error structures:

Random values for each variable were generated at 4 densities in a 400 m x 400 m field. Values were generated using Sim2d in SAS®, based on LU decomposition of the covariance matrix. Spatial structure was based on a spherical model. Realizations were generated for each variable-density-error structure combination (e.g. e-440-short range).
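The idea behind decomposition-based simulation can be sketched as follows. This is not the Sim2d procedure itself: it uses a Cholesky factor (the triangular decomposition of a symmetric positive-definite covariance matrix), random point locations, and hypothetical values for the sill, range, point count, and number of realizations.

```python
import numpy as np

def simulate_field(coords, sill, range_, n_real, seed=0):
    """Gaussian realizations with a spherical covariance structure."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    h = np.minimum(d / range_, 1.0)
    cov = sill * (1.0 - 1.5 * h + 0.5 * h ** 3)
    # Small jitter keeps the factorization numerically stable
    L = np.linalg.cholesky(cov + 1e-8 * sill * np.eye(len(cov)))
    z = rng.standard_normal((len(cov), n_real))  # independent N(0, 1) draws
    return (L @ z).T                             # one realization per row

# Hypothetical setup: 100 random points in a 400 m x 400 m field
rng = np.random.default_rng(7)
coords = rng.uniform(0.0, 400.0, size=(100, 2))
e_short = simulate_field(coords, sill=1.0, range_=50.0, n_real=200)
```

Multiplying independent normal draws by the lower-triangular factor gives values whose covariance matches the spherical model, so each row of `e_short` is one spatially correlated realization.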

Generate dependent variable Y: Y_short = … x1 + … x2 + e_short; Y_long = … x1 + … x2 + e_long. Adjusted theoretical R2 = …

Methodology: analysis of simulated data. Regression model: Y = b0 + b1 x1 + b2 x2. Parameters estimated by: OLS, Spatial Mixed, SAR-error, SAR-lag, Nearest Neighbors.

Higher point densities:
Dispersion: OLS and NN show considerably higher dispersion than the Spatial Mixed and SAR methods.
Bias: SAR-lag shows a marked downward bias at high densities, resulting in an underestimation of the true effect of x1.
Lower point densities:
Neither dispersion nor bias differs among methods for a short correlation range. For a larger correlation range in the residuals, dispersion for OLS and bias for SAR-lag are still important at lower densities.
Results are similar for b2 (not shown).

Spatial structure effect

Conclusions from simulations (partial): The inadequate use of a SAR-lag model can generate a considerable downward bias in parameter estimates. The meaningfulness of such a model for regression analysis of agronomic data, as in the example above, may be questionable (i.e. there is no direct "influence of neighbors' yield on yield at point i").

Spatial Mixed and SAR-error resulted in similar outcomes when the latter was based on a “Queen” neighbors matrix. The use of other matrices proved inefficient (not shown), while the results for Spatial Mixed were consistent even when the covariance model used was incorrect (e.g. exponential instead of spherical).

While all “spatial” methods showed a markedly lower dispersion than OLS, NN was clearly less efficient than Spatial Mixed and SAR-Error. An iterative version of NN was not evaluated and might prove more efficient than the simple version used here.