Published by Derrick Ray. Modified over 8 years ago.
1
Semi-mechanistic modelling in Nonlinear Regression: a case study
by Katarina Domijan (1), Murray Jorgensen (2) and Jeff Reid (3)
(1) AgResearch Ruakura, (2) University of Waikato, (3) Crop and Food Research
2
Introduction
- Reid (2002) developed a crop response model
- the model was fitted with a genetic algorithm
- this gave no measures of confidence for individual parameter estimates
3
General structure
- non-linear model
- semi-mechanistic: parts of the model that would ideally be mechanistic are replaced by empirically estimated functions
- relatively complex (26 parameters to be estimated)
- generality of application
- challenging to fit
4
General structure
[Diagram labels: water stress, plant density, quantity of light, etc.; yield under ideal nutrient/pH conditions (maximum yield); nutrient supply (N, P, K, Mg); observed yield]
- These effects are assumed to act multiplicatively and independently of each other and of the nutrient effects
5
General structure - maize data
[Diagram labels: crop, hybrid]
6
General structure - nutrients
- Model structure is the same for all nutrients
- Each nutrient supply is assumed to have:
  - a minimum value (at which the crop yield is zero), and
  - an optimal value (beyond which further increases cause no additional yield)
- Reid (2002) defines a scaled nutrient supply index which is:
  - = 0 at the minimum nutrient value, and
  - = 1 at the optimal nutrient value
[Equation labels: nutrient supply; in soil; added as fertilizer; efficiency factor 1; efficiency factor 2; proportion of the optimum amount of nutrient supply]
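The scaled index described on this slide can be sketched as follows. This is a minimal illustration, not Reid's published formula: the slide only fixes the two end points (0 at the minimum, 1 at the optimum), so the linear rescaling in between, the clamping outside that range, the pairing of each efficiency factor with a nutrient source, and all function and argument names are assumptions made for the example.

```python
def nutrient_supply(in_soil, added_as_fertilizer, eff_soil, eff_fert):
    """Total supply, each source weighted by an efficiency factor (assumed form)."""
    return eff_soil * in_soil + eff_fert * added_as_fertilizer


def scaled_supply_index(supply, supply_min, supply_opt):
    """Index that is 0 at the minimum supply and 1 at the optimum.

    Linear interpolation between the two end points is an assumption;
    values outside the range are clamped to [0, 1].
    """
    if supply_opt <= supply_min:
        raise ValueError("supply_opt must exceed supply_min")
    x = (supply - supply_min) / (supply_opt - supply_min)
    return min(1.0, max(0.0, x))


# Hypothetical numbers, chosen only to exercise the functions.
supply = nutrient_supply(in_soil=20.0, added_as_fertilizer=100.0,
                         eff_soil=0.5, eff_fert=0.25)
index = scaled_supply_index(supply, supply_min=10.0, supply_opt=60.0)
```

Note that clamping makes the index constant below the minimum and above the optimum, which is exactly the kind of flat, non-smooth region that causes trouble for derivative-based fitting.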
7
General structure - nutrients
For each nutrient, the effect of the scaled nutrient supply index (x) on yield is modelled using the family of curves:
[equation not recovered]
where: γ = shape parameter
[Figure label: N opt]
8
General structure - combining nutrients
- The combined scaled yield is given by: [equation not recovered] (or 0 if this is negative)
- Note: [symbols not recovered] are scaled yields corrected for unavailability of the respective nutrients
- Nutrient stresses are assumed to affect yield independently of each other
Soil pH
- Treated as if it were an extra nutrient
- Only stress due to low pH is modelled, not stress due to excessive pH
- If the effects of a particular stressor are known to be absent for a set of data, then [equation not recovered]
9
General structure - combining nutrients
[Figure: scaled yield for just 2 nutrients (N and K)]
10
Data
- model was tested for maize crops grown in the North Island between 1996 and 1999
- data were collated from 3 different sources of measurements
- experimental and commercial crops
- 12 sites
- 6 hybrids
- 84 observations
11
Genetic Algorithms
- stochastic optimization tools that work on “Darwinian” models of population biology
- no requirement of differentiability!
- relatively robust to local minima/maxima
- don’t need initial values
- give no indication of how well the algorithm has performed
- convergence to a global optimum in a fixed number of generations?
- slow to move from an arbitrary point in the neighbourhood of the global optimum to the optimum point itself
- no measure of confidence for individual parameters
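To make the properties listed on this slide concrete, here is a minimal genetic algorithm sketch. It is not the algorithm used by Reid (2002); the operators (tournament selection, blend crossover, Gaussian mutation, elitism), the function name `genetic_minimize`, and every default setting are assumptions chosen for brevity. The example cost function is deliberately non-differentiable, and no starting values are supplied, only bounds.

```python
import random


def genetic_minimize(cost, bounds, pop_size=40, generations=100,
                     mutation_sd=0.3, seed=0):
    """Minimise cost(params) over box bounds with a very small real-coded GA."""
    rng = random.Random(seed)

    def tournament(scored):
        # Pick two individuals at random and keep the fitter one.
        first, second = rng.sample(scored, 2)
        return first[1] if first[0] <= second[0] else second[1]

    population = [[rng.uniform(low, high) for low, high in bounds]
                  for _ in range(pop_size)]
    history = []  # best cost seen in each generation
    for _ in range(generations):
        scored = sorted(((cost(ind), ind) for ind in population),
                        key=lambda pair: pair[0])
        history.append(scored[0][0])
        next_population = [scored[0][1][:]]  # elitism: keep the best unchanged
        while len(next_population) < pop_size:
            mother, father = tournament(scored), tournament(scored)
            child = []
            for (low, high), m, f in zip(bounds, mother, father):
                weight = rng.random()
                gene = weight * m + (1.0 - weight) * f  # blend crossover
                gene += rng.gauss(0.0, mutation_sd)     # Gaussian mutation
                child.append(min(high, max(low, gene)))
            next_population.append(child)
        population = next_population
    best_cost, best = min(((cost(ind), ind) for ind in population),
                          key=lambda pair: pair[0])
    return best, best_cost, history


def abs_cost(params):
    """Non-differentiable toy objective with its minimum at (3, -1)."""
    return abs(params[0] - 3.0) + abs(params[1] + 1.0)


best, best_cost, history = genetic_minimize(
    abs_cost, [(-10.0, 10.0), (-10.0, 10.0)])
```

The sketch returns a best point and a cost trace but nothing resembling a standard error, which is the gap the slide points out.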
12
Our approaches:
- simplifying the model:
  - 1 nutrient (N)
  - 9 parameters
  - simulated data
- combining GA with derivative-based methods:
  - common methods (Gauss-Newton, Levenberg-Marquardt)
  - AD MODEL BUILDER
- obtain CI’s:
  - gradient information
  - likelihood methods
13
Simulated data - simple model
- investigate the structure of the correlation matrix
- data generated so that they mimic the “real” data as much as possible
- large n (300), small residual variance (0.01)
- N min and γ N are highly correlated!
Correlation of Parameter Estimates:
[Symbolic correlation matrix; cell layout not recoverable. Parameters: g, Nmin, Nopt, delta, beta, eta1, eta2, E.n1, E.n2]
Key: (blank) 0-0.3; "." 0.3-0.6; "," 0.6-0.8; "+" 0.8-0.9; "*" 0.9-0.95; "B" 0.95-1
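A correlation matrix of parameter estimates like the one on this slide is commonly approximated from the Jacobian J of the model as the rescaled inverse of J'J. The sketch below shows that calculation under stated assumptions: the saturating curve `a * (1 - exp(-b * x))` is a hypothetical stand-in (the talk's own equations are not recoverable from the transcript), and the design points are invented so that the curve is almost linear and the two parameters become nearly indistinguishable.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correlation_of_estimates(jacobian):
    """Approximate correlation matrix of the estimates from the inverse of J'J."""
    p = len(jacobian[0])
    jtj = [[sum(row[i] * row[j] for row in jacobian) for j in range(p)]
           for i in range(p)]
    cov = invert(jtj)  # proportional to the covariance; sigma^2 cancels below
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


def jacobian_saturating(xs, a, b):
    """Rows (df/da, df/db) for the hypothetical curve f = a * (1 - exp(-b * x))."""
    return [[1.0 - math.exp(-b * x), a * x * math.exp(-b * x)] for x in xs]


# Over this short x range the curve is nearly a straight line through the
# origin, so only the product a * b is well determined.
xs = [0.1 * k for k in range(1, 11)]
corr = correlation_of_estimates(jacobian_saturating(xs, a=10.0, b=0.1))
```

In this toy design the off-diagonal entry of `corr` is strongly negative, the same kind of near-collinearity the slide reports for N min and γ N.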
14
Complete model
n = 50000, σ² = 0.0001
[Symbolic correlation matrix figure not recovered]
Key: (blank) 0-0.3; "." 0.3-0.6; "," 0.6-0.8; "+" 0.8-0.9; "*" 0.9-0.95; "B" 0.95-1
15
Levenberg-Marquardt algorithm
- Maize data
- use GA estimates as starting values
- simple model:
  - multicollinearity
  - parameter N min tends to negative values
  - reparametrization
[Figure label: N min]
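The hand-off from a global search to a derivative-based fit can be sketched with a small two-parameter Levenberg-Marquardt routine. Everything specific here is an assumption made for illustration: the curve `a * (1 - exp(-b * x))` is a hypothetical smooth model rather than the crop model, the data are noise-free simulated values, and `ga_estimate` is an invented rough starting point standing in for a genetic algorithm result.

```python
import math


def model(x, a, b):
    """Hypothetical smooth response curve, used only for illustration."""
    return a * (1.0 - math.exp(-b * x))


def sum_of_squares(a, b, xs, ys):
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(start, xs, ys, max_iter=200, damping=1e-2, tol=1e-12):
    """Two-parameter Levenberg-Marquardt with Marquardt's diagonal scaling."""
    a, b = start
    current = sum_of_squares(a, b, xs, ys)
    for _ in range(max_iter):
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            decay = math.exp(-b * x)
            da, db = 1.0 - decay, a * x * decay  # partial derivatives
            r = y - a * (1.0 - decay)            # residual
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Solve (J'J + damping * diag(J'J)) step = J'r by Cramer's rule.
        m11, m22 = jaa * (1.0 + damping), jbb * (1.0 + damping)
        det = m11 * m22 - jab * jab
        step_a = (m22 * ga - jab * gb) / det
        step_b = (m11 * gb - jab * ga) / det
        if abs(step_a) + abs(step_b) < tol:
            break
        trial = sum_of_squares(a + step_a, b + step_b, xs, ys)
        if trial < current:  # accept: behave more like Gauss-Newton
            a, b, current = a + step_a, b + step_b, trial
            damping = max(damping / 10.0, 1e-10)
        else:                # reject: behave more like steepest descent
            damping *= 10.0
    return a, b, current


xs = [float(k) for k in range(1, 11)]
ys = [model(x, 10.0, 0.3) for x in xs]  # simulated, noise-free data
ga_estimate = (8.0, 0.5)                # hypothetical rough GA result
a_fit, b_fit, sse = levenberg_marquardt(ga_estimate, xs, ys)
```

The routine needs analytic derivatives and a sensible starting point, which is why it is paired with the GA here; it has no safeguard against a parameter drifting into a biologically meaningless region such as a negative minimum supply.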
16
Levenberg-Marquardt algorithm
- complete model: (again)
  - biological restrictions (N min, K min = 0)
  - problems with equations which are constant for ranges of values (e.g. scaled yields)
  - replace nondifferentiable functions (pH, water stress)
  - some stressors held constant (P, Mg)
  - N opt constant
17
2 nutrients (N and K) + stress due to low pH + water and population stresses
12 parameters
18
Profile likelihood CI’s
- Approach outlined in Bates and Watts (1989)
- Assess validity of the linear approximation to the expectation surface
[Figure: N opt estimate with Wald CI and Likelihood CI]
19
Profile likelihood CI’s
- Estimation surface seems to be nonlinear with respect to most of the parameters in the model
- Especially E N1 and pH c -> one-sided CIs
- Better estimates of uncertainty than linear approx. results
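The contrast between a Wald (linear approximation) interval and a profile likelihood interval can be sketched on a toy problem. The assumptions are substantial and worth stating: the curve `a * (1 - exp(-b * x))`, the simulated data and all names are hypothetical; the interval uses the asymptotic chi-square cutoff on `n * log(SSE ratio)` rather than the F-based profile t approach of Bates and Watts; and the Wald interval uses the normal 1.96 multiplier.

```python
import math
import random

CHI2_95_1DF = 3.841  # 95% point of chi-square with 1 degree of freedom


def basis(x, b):
    return 1.0 - math.exp(-b * x)


def profile_sse(b, xs, ys):
    """Residual sum of squares at b with the linear parameter a profiled out."""
    g = [basis(x, b) for x in xs]
    a_hat = sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)
    return sum((y - a_hat * gi) ** 2 for y, gi in zip(ys, g)), a_hat


def minimise_1d(f, low, high, tol=1e-10):
    """Golden-section search; assumes f is unimodal on [low, high]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while high - low > tol:
        left = high - ratio * (high - low)
        right = low + ratio * (high - low)
        if f(left) < f(right):
            high = right
        else:
            low = left
    return 0.5 * (low + high)


def bisect(f, low, high, tol=1e-10):
    """Root of f on [low, high], assuming a sign change between the ends."""
    low_is_positive = f(low) > 0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if (f(mid) > 0) == low_is_positive:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


rng = random.Random(1)
xs = [float(k) for k in range(1, 21)]
ys = [10.0 * basis(x, 0.3) + rng.gauss(0.0, 0.3) for x in xs]
n = len(xs)

b_hat = minimise_1d(lambda b: profile_sse(b, xs, ys)[0], 0.01, 3.0)
sse_hat, a_hat = profile_sse(b_hat, xs, ys)


def excess(b):
    """Likelihood-ratio statistic minus the cutoff; zero at the CI limits."""
    return n * math.log(profile_sse(b, xs, ys)[0] / sse_hat) - CHI2_95_1DF


lower = bisect(excess, 0.01, b_hat)
upper = bisect(excess, b_hat, 3.0)

# Wald interval for b from the linear approximation at the estimate.
g = [basis(x, b_hat) for x in xs]
d = [a_hat * x * math.exp(-b_hat * x) for x in xs]
jaa = sum(v * v for v in g)
jbb = sum(v * v for v in d)
jab = sum(u * v for u, v in zip(g, d))
se_b = math.sqrt(sse_hat / (n - 2) * jaa / (jaa * jbb - jab * jab))
wald = (b_hat - 1.96 * se_b, b_hat + 1.96 * se_b)
```

The Wald interval is symmetric about the estimate by construction, whereas the profile limits `lower` and `upper` are free to sit at different distances from `b_hat`; comparing the two is one way to judge how well the linear approximation holds.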
20
AD Model Builder
- automatic differentiation
- faster
- observed information matrix (better se’s)
- we run into the same problems as with L-M:
  - requires the model to be differentiable
  - requires good initial values
21
In the end...
- CI’s are too wide to be of ‘practical’ use, e.g. for parameter N opt (optimum amount of N supply per tonne of maximum yield):
  L-M and ADMB estimate: 18.575
  95% linear approx. CI: (3.80, 33.35)
  95% profile likelihood CI: (11.67, 31.45)
  95% CI (ADMB): (11.32, 25.83)
- but in the ‘maize dataset’ N supply per tonne of maximum yield varies between 6 and 54
- Problems of:
  - nonidentifiability
  - correlated estimates
  - poor precision of estimation in certain directions
- These phenomena are not clearly distinguished in the nonlinear setting
22
Recommendations
- do more experimentation: collect more information about parameters, particularly ‘approximately nonidentifiable’ parameters
- replace all nondifferentiable equations in the model with smooth versions
- bootstrapping
- global optimum?