How long do we need to run an experiment? Ignacio Colonna & Don Bullock.

Grain yield maps show considerable variability across years. (Figure: yield map.)
Algorithm: NF = 21.4 kg N / MT grain * YG - Rotation Credit - Incidental N

But is it the variability in yield that matters?
It is the variability in response to inputs that matters.
Assuming a profit-maximizing farmer…
(Figure: yield response function.)

Yield response and profit functions (only due to N)

Profit_N = Yield_N * $_Corn - kg_N * $_N
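The profit expression above can be sketched in a few lines. This is only an illustration: the quadratic form of the yield response, its coefficients, and both prices are hypothetical and are not taken from the slides.

```python
import numpy as np

def profit(n_rate, yield_fn, corn_price, n_price):
    """Profit attributable to N ($/ha): grain revenue minus fertilizer cost."""
    return yield_fn(n_rate) * corn_price - n_rate * n_price

def quadratic_yield(n_rate):
    """Hypothetical yield response (kg grain/ha); coefficients are illustrative."""
    return 6000.0 + 40.0 * n_rate - 0.10 * n_rate ** 2

# Candidate N rates (kg N/ha) spanning the range of rates used in the trial.
n_grid = np.arange(0.0, 270.0, 1.0)

# Hypothetical prices in $/kg grain and $/kg N.
profits = profit(n_grid, quadratic_yield, corn_price=0.10, n_price=0.50)
n_opt = float(n_grid[np.argmax(profits)])
max_profit = float(profits.max())
print(f"Optimum N: {n_opt:.0f} kg/ha, profit: {max_profit:.2f} $/ha")
```

A grid search is used instead of a closed-form derivative so that the same code works for any response function passed as `yield_fn`.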

Statistics vs. management: responses may be significantly different statistically, yet yield similar management decisions.
(Figure panels: "Different curves, same optimum" and "Similar curves, different optima".)

But these are ‘after the fact’ optimum N rates…
Farmers’ decisions are usually based on the best a priori guess.
Concept of ex-post vs. ex-ante optima:
Ex-post optimum: computed after collecting the data.
Ex-ante optimum: best guess given the information available before the fact (“long-run” optimum), e.g. when we have 15 years of N response data for a site.
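To make the distinction concrete, here is a minimal sketch. It assumes the ex-ante optimum is the single rate that maximizes average profit over the available years, which is one reasonable reading of "long-run" optimum; the slides do not give the formula. The three response curves and the prices are invented for illustration.

```python
import numpy as np

# Hypothetical quadratic yield responses (kg grain/ha), one (a, b, c) per year.
year_coefs = {
    1983: (6000.0, 40.0, -0.100),
    1984: (5000.0, 45.0, -0.125),
    1985: (7000.0, 30.0, -0.050),
}
CORN_PRICE = 0.10  # $/kg grain (hypothetical)
N_PRICE = 0.50     # $/kg N (hypothetical)
n_grid = np.arange(0.0, 270.0, 1.0)  # candidate N rates, kg/ha

def profit_curve(coefs):
    """Profit ($/ha) at every candidate rate for one year's response curve."""
    a, b, c = coefs
    return (a + b * n_grid + c * n_grid ** 2) * CORN_PRICE - n_grid * N_PRICE

profits = np.array([profit_curve(c) for c in year_coefs.values()])  # years x rates

# Ex-post: the best rate for each year, known only after the season.
ex_post = {yr: float(n_grid[np.argmax(row)]) for yr, row in zip(year_coefs, profits)}

# Ex-ante (assumed definition): one rate maximizing mean profit across years.
ex_ante = float(n_grid[np.argmax(profits.mean(axis=0))])

print("Ex-post optima:", ex_post)
print("Ex-ante optimum:", ex_ante)
```

Note that under this definition the ex-ante rate is generally not the simple average of the yearly ex-post rates, because years with flatter profit curves pull the optimum less.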

For this study
Question 1: How does the uncertainty about the ‘true’ N rate at a given site change with years of experimentation?
No published estimates on uncertainty in ex-ante N rates as a function of experiment length in the US Midwest.

Question 2: What is the cost of not knowing the true N rate at a given site? No published estimates on practical consequences of different lengths of experimentation on fertilizer application decisions.

Data: source
N fertilizer trial at Monmouth, IL (conducted by Nafziger, Adee, Hoeft, Mainz)

Data: experimental design
21 years: 1983-2003
Split plot in RCBD, 3 reps
2 rotations: C/C and C/S
5 fertilizer rates: 0, 67, 134, 201, 269 kg/ha (pre-plant)
Individual plots: 6.1 m x 18 m

21 years x 2 rotations: raw yield means (figure panels: C/C, C/S)

21 years x 2 rotations: model fits (figure panels: C/C, C/S; axis: yield response, ton/ha)

21 years x 2 rotations: variability in ex-post N opt (figure panels annotated: ‘True’ ex-ante N opt = 173 kg/ha; ‘True’ ex-ante N opt = 110 kg/ha)

A look at uncertainty in ex-ante N optima
Pick two years at random (#1) and compute the ex-ante optimum N rate (#1).
Pick two years at random (#2) and compute the ex-ante optimum N rate (#2).
Pick two years at random (#3), and so on.
1000 samples = 1000 estimates of the ex-ante N rate.
Repeat for groups of 3 years, 4 years, … etc.
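The resampling steps above can be sketched as follows. Several things here are assumptions rather than the authors' implementation: years are drawn without replacement within each sample, the ex-ante optimum is taken as the rate maximizing mean profit over the drawn years, and the 21 response curves are synthetic stand-ins, not the Monmouth data.

```python
import numpy as np

def resample_ex_ante(profits, n_grid, n_years, n_samples=1000, seed=0):
    """Distribution of ex-ante optimum N estimated from random subsets of years.

    profits: array (years x rates) holding profit at each candidate N rate.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_samples)
    for i in range(n_samples):
        # Assumption: a sample is a set of distinct years (no replacement).
        picked = rng.choice(profits.shape[0], size=n_years, replace=False)
        mean_profit = profits[picked].mean(axis=0)
        estimates[i] = n_grid[np.argmax(mean_profit)]
    return estimates

# Synthetic stand-in for 21 years of fitted quadratic responses.
rng = np.random.default_rng(42)
n_grid = np.arange(0.0, 270.0, 1.0)
a = rng.uniform(5000.0, 8000.0, size=21)
b = rng.uniform(30.0, 50.0, size=21)
c = -rng.uniform(0.08, 0.14, size=21)
yields = a[:, None] + b[:, None] * n_grid + c[:, None] * n_grid ** 2
profits = yields * 0.10 - n_grid * 0.50  # hypothetical prices, $/kg

sd_by_years = {}
for k in range(2, 11):
    est = resample_ex_ante(profits, n_grid, n_years=k)
    sd = est.std(ddof=1)
    sd_by_years[k] = sd
    print(f"{k:2d} years: SD = {sd:5.1f} kg/ha, CV = {100 * sd / est.mean():4.1f} %")
```

The cost of an error in the estimated rate could then be read off the same `profits` array, as the mean profit at the full-data ex-ante rate minus the mean profit at each resampled estimate.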

Results from resampling approach: distributions (figure; axis: ex-ante optimum N, kg/ha)

Results from resampling approach: SD and CV (figure panels: C/C, C/S; axes: years of experimentation vs. ex-ante N opt standard deviation and ex-ante N opt CV)

Results from resampling approach: practical implications, C/C (figure; axis: error, + or -)

Results from resampling approach: practical implications, C/C. Profit at ‘true’ ex-ante N opt = 249 $/ha (figure; axis: loss relative to maximum)

Conclusions so far:
Relatively small effect in monetary terms (very small for >4 years, e.g. 10 $/ha at 4 years).
But how do these errors compare with within-field spatial variability in N opt?
Is this of use to conventional systems?

Regression of Crop Yield with Soil and Landscape Attributes: An Assessment of Some Common Methods for Dealing with Spatially Correlated Residuals

Spatial correlation of residuals in regressions is often overlooked in agronomic and engineering research, especially in analyses related to precision agriculture, with a few exceptions. We argue that this oversight is not trivial, and neither is the choice of a solution.

Field Experiment

Soybean yield monitor data (2 years): 1999 and 2001

Soil sample data (P and K)

Elevation data and derivatives (Slope, Aspect, etc.)

20 m grid

OLS (ordinary least squares): appropriate if errors are as assumed, but residuals often show spatial correlation due to variables not included in the model.
Spatial Mixed
o Errors not assumed independent.
o Σ estimated with geostatistical models.
o Parameters for Σ estimated by ML or REML.
Example of code in SAS® Proc Mixed
*Iterative. Initial values obtained from inspection of the variogram of OLS residuals.
parms /*sill*/ (600) /*range*/ (90) /*nugget*/ (650);
repeated / subject=intercept local type=sp(sph)(x y);
GLS: Generalized Least Squares estimator

Semivariograms of OLS residuals

Spatial Mixed: errors not assumed independent. Σ estimated with geostatistical models. Parameters for Σ estimated by ML or REML. GLS: Generalized Least Squares estimator.
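For a given Σ, the GLS estimator is b = (X'Σ⁻¹X)⁻¹X'Σ⁻¹Y. The sketch below builds Σ from a spherical covariance model and applies that formula. It is a simplification: Σ is treated as known, using the sill, range and nugget starting values quoted in the SAS example, whereas Proc Mixed would estimate those parameters by ML or REML. The sill is interpreted here as a partial sill (an assumption), and the covariates are random stand-ins, not the field data.

```python
import numpy as np

def spherical_cov(coords, sill, corr_range, nugget=0.0):
    """Spherical covariance matrix for points at `coords` (n x 2, metres)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    h = np.minimum(dist / corr_range, 1.0)     # covariance is zero beyond the range
    cov = sill * (1.0 - 1.5 * h + 0.5 * h ** 3)
    cov[np.diag_indices_from(cov)] += nugget   # nugget acts only at zero distance
    return cov

def gls(X, y, sigma):
    """GLS estimate (X' S^-1 X)^-1 X' S^-1 y, using solves instead of inverses."""
    sinv_X = np.linalg.solve(sigma, X)
    sinv_y = np.linalg.solve(sigma, y)
    return np.linalg.solve(X.T @ sinv_X, X.T @ sinv_y)

# 10 x 10 grid at 20 m spacing; covariates and response are simulated stand-ins.
gx, gy = np.meshgrid(np.arange(10) * 20.0, np.arange(10) * 20.0)
coords = np.column_stack([gx.ravel(), gy.ravel()])
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([10.0, 0.6, 1.2]) + rng.normal(scale=5.0, size=100)

sigma = spherical_cov(coords, sill=600.0, corr_range=90.0, nugget=650.0)
b_gls = gls(X, y, sigma)
b_ols = gls(X, y, np.eye(100))  # identity covariance reduces GLS to OLS
print("GLS:", b_gls)
print("OLS:", b_ols)
```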

Nearest Neighbors (non-iterative version; computations are simple): average of neighboring OLS residuals
Computation:
1. Compute the OLS regression Y = Xβ + ε and save the residuals (e).
2. Compute the average of neighboring residuals for each point (We).
3. Compute a new OLS regression using We from step 2 as a covariate: Y = Xβ + γWe + u
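The three steps of the non-iterative nearest-neighbors adjustment can be sketched as below. The function names, the 30 m neighborhood (which on a 20 m grid captures the eight surrounding points), and the simulated data are all illustrative assumptions.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def nn_regression(X, y, W):
    """Non-iterative nearest-neighbors adjustment.

    Returns (coefficients for X, coefficient on the neighbor-residual covariate).
    """
    resid = y - X @ ols(X, y)                # step 1: OLS residuals
    neighbor_resid = W @ resid               # step 2: average of neighboring residuals
    X_aug = np.column_stack([X, neighbor_resid])
    coef = ols(X_aug, y)                     # step 3: refit with that average as a covariate
    return coef[:-1], coef[-1]

# Hypothetical 15 x 15 grid at 20 m spacing.
gx, gy = np.meshgrid(np.arange(15) * 20.0, np.arange(15) * 20.0)
coords = np.column_stack([gx.ravel(), gy.ravel()])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Neighbors: points within 30 m, excluding the point itself.
W = ((dist > 0) & (dist <= 30.0)).astype(float)
W /= W.sum(axis=1, keepdims=True)            # row-standardize so W @ resid is an average

rng = np.random.default_rng(3)
n = coords.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([10.0, 0.6, 1.2]) + rng.normal(size=n)  # simulated stand-in data

beta_nn, gamma = nn_regression(X, y, W)
print("NN-adjusted coefficients:", beta_nn, "neighbor coefficient:", gamma)
```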

Spatially autoregressive approaches
SAR error: the spatial structure observed in the OLS residuals is due to the omission of spatially structured explanatory variables from the X matrix.
SAR lag: the value of the response variable is in part due to contagion or diffusion from the same variable at nearby locations, or there is a mismatch between the scale at which a variable is measured and the true scale of the process.
Decide upon the model based on substantive interpretation and Lagrange Multiplier specification tests (Anselin).

“Queen structure” for W (figure legend: yellow = neighbors, weight 1; blue = not neighbors, weight 0; red = point i)
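A queen-structure W for a regular grid can be built as in the sketch below: each point's neighbors are the up to eight cells that share an edge or a corner with it. The function name and the optional row standardization are assumptions; the slide legend only specifies the binary 1/0 coding.

```python
import numpy as np

def queen_weights(n_rows, n_cols, row_standardize=True):
    """Queen-contiguity weights for a regular n_rows x n_cols grid (row-major order)."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue                      # a point is not its own neighbor
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i, rr * n_cols + cc] = 1.0  # neighbor = 1, otherwise 0
    if row_standardize:
        # Each row sums to 1, so W @ v is the average of v over the neighbors.
        # Assumes every point has at least one neighbor (grid larger than 1 x 1).
        W /= W.sum(axis=1, keepdims=True)
    return W

W_binary = queen_weights(3, 3, row_standardize=False)
print(W_binary[4])  # weights for the centre point of a 3 x 3 grid
```

Row-standardizing is convenient when W is used to compute averages of neighboring residuals or neighboring Y values, as in the SAR models that follow.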

SAR-Error (Spatial Autoregressive – Error) Average of neighboring OLS residuals

SAR-Lag (Spatial Autoregressive – Lag) Average of neighboring values for Y “Direct effect of neighbors on point i”
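For reference, the two models are usually written in the following standard form. The slide equations did not survive in this transcript, so the symbols (λ, ρ, u, ε) follow common convention in the spatial econometrics literature and may differ from the authors' notation.

```latex
% SAR-error: spatial dependence enters through the error term
Y = X\beta + u, \qquad u = \lambda W u + \varepsilon

% SAR-lag: spatial dependence enters through the response itself
Y = \rho W Y + X\beta + \varepsilon
\quad\Longleftrightarrow\quad
Y = (I - \rho W)^{-1}(X\beta + \varepsilon)
```

The reduced form of the SAR-lag model shows why its coefficients are not directly comparable with OLS ones: the effect of X on Y is filtered through (I - ρW)⁻¹.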

Flat line→spatially uncorrelated residuals. All methods seem to achieve similar results in terms of residual spatial structure. Points shifted vertically to aid visualization.

Shaded values are significantly different from OLS estimates

Regression example: conclusions
Spatial Mixed, SAR-error and SAR-lag parameter estimates differed significantly from the OLS estimates only for the year with the strongest spatial structure. Parameter estimates from NN were not significantly different from the OLS ones, despite the apparent difference in magnitude.

Estimates from SAR-lag were in general smaller in magnitude relative to all other methods. This is due to the “filtering” performed by this method on the response variable. Is this reasonable for this type of analysis? We believe it is not.

So, which method should we choose to account for the spatial correlation of residuals in regression? This question motivates the second part of our analysis.

Simulation Experiment

3 independent variables: x_1, x_2 and e, with short- and long-range error structures:

Random values for each variable generated at 4 densities in a 400 m x 400 m field. Values generated using Sim2d in SAS®. Based on LU decomposition of the covariance matrix. Spatial structure based on a spherical model. 1000 realizations for each variable-density-error structure combination (e.g. e-440-short range)

Generate dependent variable Y:
Y_short = 10 + 0.6 x_1 + 1.2 x_2 + e_short
Y_long = 10 + 0.6 x_1 + 1.2 x_2 + e_long
Adjusted theoretical R^2 = 0.37
(Diagram: x_1 + x_2 + e = y, repeated for 1000 realizations.)
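A rough Python analogue of this data-generating step is sketched below. It uses a Cholesky factor of a spherical covariance matrix in place of SAS Sim2d (both are matrix-decomposition approaches, but this is not the authors' code). The grid spacing, sills and correlation ranges are hypothetical; the error sill of 3.06 was chosen only so that the theoretical R² comes out near 0.37 when x_1 and x_2 have unit variance.

```python
import numpy as np

def spherical_cov(coords, sill, corr_range):
    """Spherical covariance matrix for points at `coords` (n x 2, metres)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    h = np.minimum(dist / corr_range, 1.0)
    return sill * (1.0 - 1.5 * h + 0.5 * h ** 3)

def simulate_fields(coords, sill, corr_range, n_real, rng):
    """Draw n_real zero-mean Gaussian fields via a Cholesky factor of the covariance."""
    cov = spherical_cov(coords, sill, corr_range)
    cov[np.diag_indices_from(cov)] += 1e-8 * sill  # tiny jitter for numerical stability
    chol = np.linalg.cholesky(cov)
    return (chol @ rng.standard_normal((coords.shape[0], n_real))).T

# 400 m x 400 m field sampled every 40 m (one hypothetical density: 100 points).
axis = np.arange(0.0, 400.0, 40.0)
gx, gy = np.meshgrid(axis, axis)
coords = np.column_stack([gx.ravel(), gy.ravel()])
rng = np.random.default_rng(7)
N_REAL = 1000

# Hypothetical sills and ranges (metres); the slides do not list the values used.
x1 = simulate_fields(coords, sill=1.0, corr_range=100.0, n_real=N_REAL, rng=rng)
x2 = simulate_fields(coords, sill=1.0, corr_range=100.0, n_real=N_REAL, rng=rng)
e_short = simulate_fields(coords, sill=3.06, corr_range=60.0, n_real=N_REAL, rng=rng)
e_long = simulate_fields(coords, sill=3.06, corr_range=200.0, n_real=N_REAL, rng=rng)

# Dependent variables: same signal, different error structure.
y_short = 10.0 + 0.6 * x1 + 1.2 * x2 + e_short
y_long = 10.0 + 0.6 * x1 + 1.2 * x2 + e_long

signal_var = 0.6 ** 2 * 1.0 + 1.2 ** 2 * 1.0
print("Theoretical R^2:", signal_var / (signal_var + 3.06))
```

Each row of `y_short` and `y_long` is one realization of the field, ready to be passed to any of the estimators being compared.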

Methodology: analysis of simulated data
Regression model: Y = b_0 + b_1 x_1 + b_2 x_2
Parameters estimated by: OLS, Spatial Mixed, SAR-error, SAR-lag, Nearest neighbors

Higher point densities:
Dispersion: OLS and NN show considerably higher dispersion than the Spatial Mixed and SAR methods.
Bias: SAR-lag shows a marked downward bias at high densities, resulting in an underestimation of the true effect of x_1.
Lower point densities:
Neither dispersion nor bias differs among methods for a short correlation range.
For a larger correlation range in the residuals, dispersion for OLS and bias for SAR-lag are still important at lower densities.
Results are similar for the coefficient of x_2 (not shown).
(Figure annotation: SAR-lag bias.)
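The higher dispersion of OLS under spatially correlated errors can also be checked analytically for a fixed design, without simulation. With error covariance Σ, the sampling covariance of the OLS estimator is (X'X)⁻¹X'ΣX(X'X)⁻¹, while that of GLS with the correct Σ is (X'Σ⁻¹X)⁻¹. The sketch below compares the two; the grid, the spherical parameters and the white-noise covariates are hypothetical, and Σ is treated as known, which Spatial Mixed only approximates by estimating it.

```python
import numpy as np

# Hypothetical layout: 400 m x 400 m field sampled every 40 m.
axis = np.arange(0.0, 400.0, 40.0)
gx, gy = np.meshgrid(axis, axis)
coords = np.column_stack([gx.ravel(), gy.ravel()])
n = coords.shape[0]

# Spherical error covariance with a long (200 m) range plus a small nugget.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
h = np.minimum(dist / 200.0, 1.0)
sigma = 3.0 * (1.0 - 1.5 * h + 0.5 * h ** 3) + 0.05 * np.eye(n)

# Fixed design: intercept plus two stand-in covariates.
rng = np.random.default_rng(11)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# Sampling covariance of each estimator when the errors have covariance sigma.
xtx_inv = np.linalg.inv(X.T @ X)
cov_ols = xtx_inv @ X.T @ sigma @ X @ xtx_inv
cov_gls = np.linalg.inv(X.T @ np.linalg.solve(sigma, X))

sd_ols = np.sqrt(np.diag(cov_ols))
sd_gls = np.sqrt(np.diag(cov_gls))
print("SD of OLS coefficients:", sd_ols)
print("SD of GLS coefficients:", sd_gls)
```

By the Gauss-Markov (Aitken) theorem the GLS standard deviations can never exceed the OLS ones when Σ is correct; how large the gap is depends on the point density and the correlation range.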

Spatial structure effect

Conclusions from simulations (partial)
The inadequate use of a SAR-lag model can generate a considerable downward bias in parameter estimates. The meaningfulness of such a model for regression analysis of agronomic data, as in the example above, may be questionable (i.e. there is no direct “influence of neighbors' yield on yield at point i”).

Spatial Mixed and SAR-error resulted in similar outcomes when the latter was based on a “Queen” neighbors matrix. The use of other matrices proved inefficient (not shown), while the results for Spatial Mixed were consistent even when the covariance model used was incorrect (e.g. exponential instead of spherical).

While all “spatial” methods showed a markedly lower dispersion than OLS, NN was clearly less efficient than Spatial Mixed and SAR-Error. An iterative version of NN was not evaluated and might prove more efficient than the simple version used here.
