Data, Economics and Computational Agricultural Science
John M. Antle, Professor of Applied Economics, Oregon State University
AAEA Fellows Address, August 7, 2018, Washington, DC
Presentation and paper available at agsci.oregonstate.edu/tradeoff-analysis-project/applications-library
Motivation and Objectives
- The scientific community recognizes the need to transcend the reductionist paradigm in science in order to understand and predict the behavior of complex systems that cannot be subjected to controlled experimentation, but can be modeled and studied using observational data and simulation experiments (NAS, Science Breakthroughs to Advance Food and Agricultural Research by 2030)
- Advances in disciplinary science, as well as trans-disciplinary integration, are needed to understand and predict system behavior
- Data and models are needed that can predict the performance of agricultural systems under current conditions, but more importantly, under novel conditions that cannot be observed in historical data
- Meeting this challenge raises fundamental methodological issues for all sciences
- The challenge is particularly daunting for economics and related disciplines (typically involving human behavior) that have favored statistical models estimated with historical data over mechanistic, process-based models
- My goal is to discuss how advances in computational methods and data infrastructure can accelerate progress in agricultural science, and the role that applied economics can play in this process
Themes
In this presentation I’ll summarize some of my ideas for each section of the paper:
- Towards computational agricultural science
- Economic analysis of agricultural systems
- Building a new data infrastructure
- The case for public investment in data infrastructure and computational science
Agricultural Model Inter-comparison and Improvement Project (AgMIP)
- AgMIP.org: a new global community of science
- AgMIP NextGen Project: bridging the gap between data, models and users
AgMIP NextGen Project
- Computational agricultural science can accelerate innovation & improve decision making
- Data are the most important limitation to model improvement & use
- Knowledge products are needed to connect end-users (in science and in decision making) with data and models
- Private-public partnerships are needed to support pre-competitive and competitive science, data, model, and knowledge-product development
Towards computational agricultural science
- Goal: overcoming the limitations of field experiments
  - slow, expensive
  - low dimensionality
  - limited heterogeneity
  - lack of external validity
- Identifying & estimating technologies (production functions)
  - Empirical vs mechanistic, process-based
- Crop simulation models: bio-engineered production functions
  - Many current limitations, but rapidly being improved with better science, data and methods
Advances in ag systems modeling…some examples
- Improved modeling of temperature response through model inter-comparisons at global experimental sites (Hwang et al. Nature Plants 2016)
  - better data and methods are substantially improving model performance
  - “… variations in the mathematical functions currently used to simulate temperature responses of physiological processes in 29 wheat models account for >50% of uncertainty in simulated grain yields … a set of new temperature response functions … reduced the error in grain yield simulations across seven global sites with different temperature regimes by 19% to 50% (42% average).”
Advances …
- Improved yield prediction from model ensembles without site-specific calibration (Martre et al. 2015 Glob. Change Biol. and others by AgMIP)
  - ensemble modeling methods can improve prediction accuracy for on-farm management as well as landscape-scale analysis
- Gene-based crop growth models have potential to model “virtual crops” that can incorporate G x E x M and predict “out of sample” better than statistical models (Cooper et al. Crop Science 2016; Hwang et al. Ag Systems 2017)
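A minimal sketch of why averaging an ensemble can help, using synthetic data only. The "observed" yields, the three model biases, and the noise level are hypothetical and are not taken from Martre et al. or any AgMIP study. A simple ensemble mean is used for illustration; published work may use other ensemble estimators.

```python
import math
import random

def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def ensemble_mean(model_preds):
    """Average the predictions of several models, site by site."""
    n_models = len(model_preds)
    return [sum(site) / n_models for site in zip(*model_preds)]

rng = random.Random(1)

# Hypothetical observed yields (t/ha) at 50 sites.
observed = [rng.uniform(2.0, 8.0) for _ in range(50)]

# Three hypothetical crop models, each with its own bias plus random error.
biases = [-0.8, 0.3, 0.6]
models = [[o + b + rng.gauss(0, 0.7) for o in observed] for b in biases]

single_rmse = [rmse(m, observed) for m in models]
ensemble_rmse = rmse(ensemble_mean(models), observed)

print("single-model RMSE:", [round(r, 2) for r in single_rmse])
print("ensemble-mean RMSE:", round(ensemble_rmse, 2))
```

The ensemble-mean RMSE can never exceed the average of the single-model RMSEs (a consequence of the triangle inequality), and it is usually well below it when model errors partly offset each other.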
Economic analysis of agricultural systems
- Evaluation paradigm for impact of “interventions”
  - P1: implemented interventions in the environment where they are observed (the problem of internal validity in ex post evaluation)
  - P2: implemented interventions in a different but observable environment (the problem of external validity in ex post evaluation)
  - P3: evaluation of new interventions in environments never historically observed (the ex ante evaluation problem)
- Heckman (2010)
- P2 and P3 require models satisfying “Marschak’s Maxim”: minimally sufficient structure needed to identify the impact of the intervention
- “P3 is the problem that economic policy analysts have to solve daily. Structural econometrics addresses this problem. The program evaluation approach does not.”
  - (“program evaluation approach” = estimation of treatment effects without specification of a structural model based on economic theory)
Implications for economic analysis of ag systems
- Key methodological challenges linked to observability of key phenomena
  - The identification problem(s)
  - Unobserved heterogeneity
  - Prediction of system performance with new technologies in new environments
Identification
- Problem in non-experimental data:
  - Heckman’s argument for structural models to solve P3 requires strong assumptions of parameter invariance that are not valid for new technologies
  - Even if “Marschak’s Maxim” is satisfied, economic behavior often leads to failed “identification in the data”, i.e., failure of the common support condition required for identification of counterfactuals
- Solutions: combine mechanistic models with better data & statistical models to identify structure of counterfactuals
Example: Identification problem due to lack of common support in non-experimental data
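A minimal sketch of a common support check on one covariate. The function name, the rainfall values, and the adopter/non-adopter grouping are hypothetical and chosen only for illustration. It reports the share of adopters whose covariate value falls inside the range observed for non-adopters; a low share signals that counterfactual outcomes for most adopters cannot be identified from the data alone.

```python
def common_support_share(treated, control):
    """Share of treated observations lying within the observed range of the controls."""
    lo, hi = min(control), max(control)
    inside = [t for t in treated if lo <= t <= hi]
    return len(inside) / len(treated)

# Hypothetical growing-season rainfall (mm) on farms using two different systems.
adopters = [300, 320, 350, 400]
non_adopters = [380, 420, 450, 500]

share = common_support_share(adopters, non_adopters)
print(f"Share of adopters on common support: {share:.2f}")
```

A range check on a single covariate is the crudest version of this diagnostic; with several covariates, overlap is usually assessed on an estimated propensity score instead.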
Unobserved heterogeneity
- In ag systems, many elements of “unobserved heterogeneity” are not time invariant
  - fixed-effects estimators do not solve bias problems
  - E.g., planting date, soil moisture at planting time that determine crop variety and fertilizer use
- Comparison of “true” production function to “empirical” production functions shows that bias is also due to inaccurate and incomplete data
  - E.g., lack of accurate data on most management inputs and cost of production, timing of input use
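A minimal simulation sketch of the fixed-effects point, using synthetic panel data. All parameter values and the data-generating process are assumptions made for illustration, not estimates from the paper. Input use x depends on a fixed farm effect and, optionally, on a time-varying unobserved factor (think of soil moisture at planting) that also affects yield directly. The within (fixed-effects) estimator removes the farm effect but not the time-varying factor.

```python
import random

def within_estimator(x, y):
    """Slope from regressing farm-demeaned y on farm-demeaned x.

    x and y are lists with one inner list of yearly values per farm.
    """
    sxy = sxx = 0.0
    for xi, yi in zip(x, y):
        mx = sum(xi) / len(xi)
        my = sum(yi) / len(yi)
        for xit, yit in zip(xi, yi):
            sxy += (xit - mx) * (yit - my)
            sxx += (xit - mx) ** 2
    return sxy / sxx

def simulate(time_varying, n_farms=2000, n_years=4, beta=1.0, seed=0):
    """Return the fixed-effects estimate of beta from one synthetic panel."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n_farms):
        a = rng.gauss(0, 1)  # time-invariant farm effect
        xi, yi = [], []
        for _ in range(n_years):
            # Time-varying unobserved factor, switched off in the benchmark case.
            u = rng.gauss(0, 1) if time_varying else 0.0
            xit = a + u + rng.gauss(0, 1)              # input choice
            yit = beta * xit + a + u + rng.gauss(0, 1)  # outcome, true slope = beta
            xi.append(xit)
            yi.append(yit)
        x.append(xi)
        y.append(yi)
    return within_estimator(x, y)

print("FE estimate, only fixed heterogeneity:", round(simulate(False), 2))
print("FE estimate, time-varying heterogeneity:", round(simulate(True), 2))
```

With the true slope set to 1.0, the first estimate lands close to 1.0, while the second is biased upward toward roughly 1.5 under these assumed variances, because the demeaning step cannot remove a factor that changes from year to year.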
Prediction of system behavior with new technologies in new environments
- Two key elements:
  - Prediction of exogenous variables “out of sample”
    - Use participatory scenario methods
  - Representation of new technologies
    - Use “hybrid structural models” that satisfy Marschak’s Maxim and overcome the counterfactual identification problem
    - Use better observational data that overcome bias problems from unobserved heterogeneity and incomplete data
- In the paper I discuss methods for combining mechanistic crop simulation models with statistical production function models
Hybrid structural model test using CropSyst and TOA-MD models in Pacific Northwest Dryland Winter Wheat System
- System 1: winter-wheat fallow in WWF zone
- System 2: annual cropping system in WWF zone
- Annual system in WWF zone:
  - observed adoption rate 23%
  - predicted adoption rate 20%
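A simplified sketch of how a population-level adoption rate can be predicted from the distribution of returns, in the spirit of this kind of test. It assumes that each farm adopts System 2 when its opportunity cost of switching (returns to System 1 minus returns to System 2) is negative, and that this opportunity cost is normally distributed across farms. The function name and the numbers below are hypothetical and are not the values behind the 20% prediction on the slide.

```python
from statistics import NormalDist

def predicted_adoption_rate(mean_opp_cost, sd_opp_cost):
    """Share of farms whose opportunity cost of switching to System 2 is negative."""
    return NormalDist(mean_opp_cost, sd_opp_cost).cdf(0.0)

# Hypothetical population: System 1 earns 25 $/ha more on average,
# with a standard deviation of 50 $/ha in the difference across farms.
rate = predicted_adoption_rate(mean_opp_cost=25.0, sd_opp_cost=50.0)
print(f"Predicted adoption rate of System 2: {rate:.1%}")
```

Even when System 1 is more profitable on average, heterogeneity across farms means a share of them still gains from switching, which is why the predicted rate is positive but below 50%.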
Building a new data infrastructure
- Better computational models, both mechanistic and statistical, depend on better data
- Prototype data & analytics to support computational ag science for private and public decision making
- Data “market” failure
  - Data ownership
  - Voluntary vs mandatory
  - Soft & hard infrastructure
- Capalbo, Antle and Seavert, Ag Systems 2017
Current Situation
- Much hype, expectations of potential for hard and soft infrastructure, big data, AI, and their use in ag & food systems
- Private, public data not FAIR (findable, accessible, interoperable, reusable)
- Data “market failure”: property rights not defined; public or club goods?
- Profitable? Sustainable? At what scale?
Making the case for high returns to public investment in better data & computational ag science…
[Diagram labels: Private Data; Private Decision Makers; Data and Model Development (pre-competitive space); Knowledge Product Development (competitive space); Public Data; Public Decision Makers]
- Opportunities for PPPs:
  - AgMIP community of science partnering with industry
  - USDA FACT (Food and Ag Cyberinformatics and Tools)
  - CGIAR Big Data Initiative, GODAN, etc.
  - University-industry collaboration
- Antle, Jones & Rosenzweig, 2017 Ag Systems
Themes
- Towards computational agricultural science
  - Computational experiments replacing field experiments
  - Ag systems models: bio-phys-engineered production functions
  - Advances in data and modeling
- Economic analysis of agricultural systems
  - Economic impact evaluation paradigm
  - Implications for agricultural system modeling
    - Identification
    - Unobserved heterogeneity
    - Evaluation of novel systems: hybrid structural models
- Building a new data infrastructure
  - A prototype private-public data system
  - The current state of private and public agricultural data
  - The economics of data and data infrastructure
  - The need for collaboration with data, engineering and computer sciences
  - The need for institutional innovation
  - The rise of the robot econometricians
- The case for public investment in data infrastructure and computational science