Computational Science as an enabler for sustainable FEW Systems
Baskar Ganapathysubramanian, Iowa State University
NSF FEW Workshop: Oct 12-13, 2015, ISU
Computational Science and Engineering Group
What we do:
1) Algorithm design and software implementation
2) Application-driven research: a curiosity-driven group
Overview of research activities related to plant sciences
Feature extraction: Data for crop models
- Spatial coverage (dimensions of the field)
- Temporal coverage (crop cycle)
- Data for validation, input, and calibration
Data deluge due to advances in sensors and data collection
- Heterogeneous data spanning multiple length and time scales
- Noisy, gappy data
- Need to extract traits used for various downstream tasks
- Must be done in an automated, high-throughput, and efficient way
Similar issues are faced by other disciplines: astronomy, particle physics, driverless automobiles, security and defense applications
Machine learning approaches are very promising
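Gappy sensor records are one of the data issues listed above. The following is a minimal sketch of one common remedy, linear interpolation across interior gaps of an evenly spaced series. The function name `fill_gaps` and the readings are hypothetical, and real pipelines may instead need gap-length limits or model-based imputation.

```python
from typing import List, Optional


def fill_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior gaps (None) in an evenly spaced series.

    Leading and trailing gaps are left as None because there is no
    observation on one side to interpolate from.
    """
    filled = list(series)
    # Indices of the observations we actually have.
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap > 1:
            lo, hi = series[left], series[right]
            for k in range(1, gap):
                # Fraction of the way from the left observation to the right one.
                t = k / gap
                filled[left + k] = lo + t * (hi - lo)
    return filled


# Hypothetical hourly canopy-temperature readings with sensor dropouts.
readings = [21.0, None, 23.0, None, None, 26.0, None]
print(fill_gaps(readings))
```

The trailing `None` is deliberately kept: extrapolating past the last observation is a separate modeling decision.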
Machine Learning
- Goal of ML is to generalize beyond the training data
- Pattern recognition, perception, and control tasks
- Very difficult to encode all features manually
- Breakthroughs in learning algorithms; prominent examples include 'deep networks'
- Enablers: more data, better computing infrastructure
[Figures: opsrules.com; MNIST dataset; TIMIT dataset; NVIDIA cuDNN website]
Machine Learning Examples
Learning feature labels in scenes: convolutional networks
[Figures from the Le Cun, Hinton, and Ng groups]
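Convolutional networks build feature maps by sliding small filters over an image. Below is a minimal pure-Python sketch of one such step, a "valid" convolution followed by a ReLU nonlinearity. It assumes a hand-set vertical-edge filter and a toy 4x4 image; in a trained network the filter weights are learned. It illustrates the operation only and is not code from the groups credited above.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation, the operation a convolutional layer applies.

    Deep-learning libraries conventionally skip the kernel flip, so this is
    technically cross-correlation; each output dimension shrinks by
    (kernel size - 1).
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j).
            out[i][j] = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
    return out


def relu(m: Matrix) -> Matrix:
    """Element-wise nonlinearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in m]


# Toy image: dark left half, bright right half.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
# Hypothetical hand-set vertical-edge detector.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]
feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)
```

The feature map responds only in the column where dark meets bright; stacking many such filters, layer on layer, is what lets these networks learn scene features rather than having them encoded by hand.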
Machine Learning Examples
Learning a hierarchy of features: feature extraction using autoencoders, sparse encoders, deep belief networks, and deep neural networks
[Figures from the Le Cun, Hinton, and Ng groups]
ML: Agricultural Examples (P. Schnable)
Basic hypothesis: Use high-throughput phenotyping to extract detailed characteristics of tassels.
Challenges: Identify tassel locations, then extract tassel features, across close to a million images!
ML: Agricultural Examples (A. Singh, S. Sarkar)
Basic hypothesis: Use high-throughput phenotyping to understand features affecting (a)biotic stress tolerance.
Example application: Iron Deficiency Chlorosis (IDC)
IDC: inability of plants to absorb iron from the soil
Current methods are visual:
- Time consuming
- Labor intensive
- Reliability and consistency issues
ML tools for rapid identification; deploy as apps.
[Figure: Standard Area Diagram]
ML for Yield Prediction (D. Hayes, S. Sarkar, D. Nettleton)
Goals:
1) Collect and curate a dataset of economic, agricultural, meteorological, and crop management traits used to make predictions.
2) Develop and deploy a suite of statistical and ML tools on the data.
3) Create a workflow that enables the larger community to use the data and test methods.
Yield forecasting: a combination of knowledge-based computer programs (that simulate plant-weather-soil-management interactions), soil and environment data, and targeted surveys.
Companies such as Climate Corp and other big data firms may now be able to beat the USDA at yield forecasting, leading to detrimental asymmetric markets. A publicly available, high-quality yield prediction tool would enable producers to make informed decisions, thereby ensuring a symmetric market.
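Goal 2 involves testing statistical tools against held-out data, which is how "generalizing beyond the training data" is checked in practice. The sketch below is a deliberately tiny illustration under stated assumptions: a single hypothetical predictor (growing-season rainfall), made-up yield numbers, ordinary least squares, and a mean-yield baseline. It is not the project's forecasting model, which combines crop simulation, soil and environment data, and surveys.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(pred: List[float], obs: List[float]) -> float:
    """Root-mean-square error between predictions and observations."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5


# Synthetic (made-up) records: growing-season rainfall (cm) -> yield (t/ha).
rain = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
yields = [7.1, 7.9, 8.4, 9.2, 9.8, 10.1, 10.9, 11.6]

# Hold out the last two "seasons" so the model is scored on data it never saw.
train_x, test_x = rain[:6], rain[6:]
train_y, test_y = yields[:6], yields[6:]

slope, intercept = fit_line(train_x, train_y)
model_pred = [slope * r + intercept for r in test_x]
# Naive baseline: predict the mean training yield for every held-out season.
baseline_pred = [sum(train_y) / len(train_y)] * len(test_y)

model_rmse = rmse(model_pred, test_y)
baseline_rmse = rmse(baseline_pred, test_y)
print(f"linear model RMSE: {model_rmse:.2f}")
print(f"mean baseline RMSE: {baseline_rmse:.2f}")
```

Holding out the most recent seasons, rather than a random subset, mimics forecasting: the model must predict years it has not been trained on.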
(D. Attinger, M. Gilbert)
Simple physiological model of an adult maize plant, validated in the field by Matthew Gilbert (UC Davis)
- Several field-testable traits: stomatal conductance; root, stem, and leaf conductance
- Input: hourly weather data
- Outputs: water use, photosynthetic yield
Optimization: trait identification for productivity
- Software engineering
- Code optimization
- Integrate with a parallel optimization framework
- Deploy on HPC systems
Optimization: Trait identification for productivity
- Pareto front; more than 3 million configurations tested
- Ran on XSEDE (TACC) and local HPC resources (unpublished, 2015)
- Explored trait performance under well-irrigated vs. drought conditions
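A Pareto front keeps only the configurations that no other configuration beats on every objective at once. A minimal sketch for two objectives, assuming (hypothetically) that water use is to be minimized and carbon gain maximized; the `Config` type and the five candidates are made up for illustration and do not come from the study.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One hypothetical trait configuration and its simulated outcomes."""
    name: str
    water_use: float    # objective to minimize
    carbon_gain: float  # objective to maximize


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.water_use <= b.water_use and a.carbon_gain >= b.carbon_gain
    better = a.water_use < b.water_use or a.carbon_gain > b.carbon_gain
    return no_worse and better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration dominates (O(n^2) scan)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


candidates = [
    Config("A", water_use=2.0, carbon_gain=5.0),
    Config("B", water_use=3.0, carbon_gain=7.0),
    Config("C", water_use=3.5, carbon_gain=6.0),  # dominated by B
    Config("D", water_use=5.0, carbon_gain=9.0),
    Config("E", water_use=2.5, carbon_gain=4.0),  # dominated by A
]
front = pareto_front(candidates)
print([c.name for c in front])
```

The pairwise scan is only suitable for small sets. For millions of configurations and two objectives, sorting by water use (ties broken by higher carbon gain) and keeping each point whose carbon gain exceeds the running maximum gives the same front in O(n log n).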
Concluding Observations
1) Leverage rapid developments in machine learning
2) Learn from progress and best practices in other fields
3) Use fast ML models as surrogate models for exploration and uncertainty quantification
4) Visualization and data management become important
5) Protocols for data exchange, sharing, and interoperability have to be set
6) It is critical to incorporate software engineering practices into the workflow (code reuse, modularity)
7) Need sustained support for software development and maintenance
8) Need to be ready for next-generation cyberinfrastructure
9) A community-based approach?