SIMO: SIMulation and Optimization
"New generation forest planning system"
Antti Mäkinen & Jussi Rasinmäki
Dept. of Forest Resource Management / University of Helsinki
Need for a new planning system?
- Forestry databases maintained in Finland are mostly collected either with the National Forest Inventory or with compartment-wise inventory
- Current forest management planning packages are designed to suit these two data sets
- Stand volume, timber assortments and growth are predicted using single tree models
- In the future the data sources will be much more variable, including data from several grain levels
→ Need for new planning tools
Aims of SIMO
- "FLEXIBLE with respect to data it demands and models that it uses"
- "ADAPTABLE to the planning problem"
- "EXTENDABLE for possible future needs"
- "Provide decision support for different grain levels"
Aims of SIMO, continued
- To produce a set of lightweight program modules that users can interface with their own planning systems
- Modules should be adaptable to different planning tasks and needs
- Simulation parameters can be (easily) controlled by the user
- Simulations can be locally calibrated
- Uncertainty in the data can be taken into account
Aims of SIMO, continued
- New models can be implemented by the user
- Possibility to utilize different data sources
- The planning system is not a "black box": the user should know what happens inside the program
- Inconsistent or erroneous data can be handled, but the user is always informed precisely about it
SIMO software development
- The core elements of the planning system will be produced in this project (no user interfaces)
- All the products will be open-source code (MIT license)
- The users will handle the integration into their own systems
- At the moment the simulator module is well underway and the other modules are under construction
SIMO Simulator
The simulator module consists of:
- Simulator core (the simulator program)
- XML files (data, simulation control, simulation logic etc.)
- Model library (all the models, functions etc. used in the simulations)
"Business logic" and "application logic" are separated
Other modules will be produced and integrated into the system
Simulator core
- Takes in simulation control instructions, model chains, model definitions and data in XML format
- Processes the user-defined model chains for each computing unit in the data
- Calls the Model Library whenever a value needs to be calculated
- Writes the resulting values into a result XML file
- The programmatic component; contains the application logic
XML Files (eXtensible Markup Language)
- "Syntax definition which can be used to express hierarchical data"
- Platform independent, self-describing
In SIMO, XML is used for the "business logic":
- Passing data in and out of the simulator (Data XML)
- Passing simulation instructions (Simulation XML)
- Encoding the simulation logic (Model Chain XML)
- Passing information about the models and variables to the simulator (Model XML and Variable XML)
[Diagram: hierarchical structure of the Data XML, with nested levels data, comp_unit, stratum and tree; each level carries variable entries with attr and value]
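To make the hierarchy concrete, here is a minimal sketch of reading such a file with Python's standard library. The element names (data, comp_unit, stratum, tree, variable, value) come from the slide, but the exact nesting, the `id` and `name` attributes, the variable names and the numbers are all assumptions for illustration; the real SIMO schema is not shown in this deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical Data XML: nesting, attribute names and values are assumptions.
DATA_XML = """
<data>
  <comp_unit id="stand-1">
    <variable name="AREA"><value>2.5</value></variable>
    <stratum id="s1">
      <variable name="AGE"><value>40</value></variable>
      <tree id="t1">
        <variable name="D"><value>18.2</value></variable>
      </tree>
      <tree id="t2">
        <variable name="D"><value>21.0</value></variable>
      </tree>
    </stratum>
  </comp_unit>
</data>
"""

LEVELS = ("comp_unit", "stratum", "tree")


def read_variables(element):
    """Return {name: value} for the variables directly under one data unit."""
    return {
        var.get("name"): float(var.findtext("value"))
        for var in element.findall("variable")
    }


def walk(element, path=()):
    """Yield (path, level, variables) for every data unit below element."""
    for child in element:
        if child.tag in LEVELS:
            child_path = path + (child.get("id"),)
            yield child_path, child.tag, read_variables(child)
            yield from walk(child, child_path)


units = list(walk(ET.fromstring(DATA_XML)))
for path, level, variables in units:
    print("/".join(path), level, variables)
```

Each data unit keeps only the variables attached directly to its own level, which is one way to let stand-level, stratum-level and tree-level data coexist in the same file.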
[Diagram: structure of the Model Chain XML; a model_chain with an evaluate_at setting and a sequence of task elements, each with a condition expression and either a model or a variable with a value]
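The idea of a model chain can be sketched in a few lines of Python: an ordered list of tasks, each guarded by an optional condition and either calling a model or assigning a value to a variable. This is only an illustration. The `Task` class, the tuple form of the condition, the variable names and the toy growth model are hypothetical, and the evaluate_at setting is left out.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}


@dataclass
class Task:
    target: str                                      # variable that receives the result
    condition: Optional[tuple] = None                # (variable, op, constant); None = always
    model: Optional[Callable[[dict], float]] = None  # model to call, or ...
    value: Optional[float] = None                    # ... a constant to assign


def condition_holds(condition, unit):
    """Evaluate a simple comparison against the unit's current variables."""
    if condition is None:
        return True
    name, op, constant = condition
    return OPS[op](unit[name], constant)


def run_chain(tasks, unit):
    """Apply the tasks in order to one computing unit (a dict of variables)."""
    result = dict(unit)
    for task in tasks:
        if not condition_holds(task.condition, result):
            continue
        result[task.target] = task.model(result) if task.model else task.value
    return result


def toy_height_growth(unit):
    """Hypothetical model: height grows 0.3 m per step."""
    return unit["H"] + 0.3


chain = [
    Task(target="AGE", model=lambda u: u["AGE"] + 1),
    Task(target="H", condition=("AGE", "<", 100), model=toy_height_growth),
    Task(target="MATURE", condition=("AGE", ">=", 80), value=1.0),
]

young = run_chain(chain, {"AGE": 40, "H": 15.0})
old = run_chain(chain, {"AGE": 79, "H": 25.0})
print(young)
print(old)
```

Because the chain is plain data, the same simulator loop can run any chain the user defines without changes to the program itself.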
Model Library
- Includes all models used in the simulator
- Users can add new models to the library or create additional model libraries
- Reports warnings and errors to the simulator
- Risk level models not yet implemented
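A minimal sketch of how such a library could let users register models and report warnings and errors back to the caller. The `ModelLibrary` class, its `register` and `call` methods and the input bounds are hypothetical and are not SIMO's actual interface; the basal area formula is the standard area of a circle with the diameter converted from centimetres to metres.

```python
import math


class ModelLibrary:
    def __init__(self):
        self._models = {}

    def register(self, name, bounds):
        """Decorator that adds a model together with the valid range of each input."""
        def decorator(func):
            self._models[name] = (func, bounds)
            return func
        return decorator

    def call(self, name, **inputs):
        """Return (result, warnings, errors); result is None when an error occurs."""
        if name not in self._models:
            return None, [], [f"unknown model: {name}"]
        func, bounds = self._models[name]
        missing = [var for var in bounds if var not in inputs]
        if missing:
            return None, [], [f"{name}: missing input(s) {missing}"]
        warnings = [
            f"{name}: {var}={inputs[var]} outside [{low}, {high}]"
            for var, (low, high) in bounds.items()
            if not low <= inputs[var] <= high
        ]
        return func(**inputs), warnings, []


library = ModelLibrary()


@library.register("basal_area", bounds={"d": (0.0, 100.0)})
def basal_area(d):
    """Basal area (m^2) of one tree from its diameter d (cm)."""
    return math.pi * (d / 200.0) ** 2


ok_result, ok_warnings, ok_errors = library.call("basal_area", d=20.0)
big_result, big_warnings, big_errors = library.call("basal_area", d=150.0)
none_result, none_warnings, none_errors = library.call("no_such_model", d=1.0)
print(ok_result, big_warnings, none_errors)
```

Here an out-of-bounds input still produces a result but adds a warning, which leaves it to the simulator to decide what to do with it.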
Missing modules
Optimizer module
- Finds the best alternative from the alternatives generated by the simulator
- Possibly many alternative optimizing methods?
Validator module
- Validates the XML files with XSD (XML Schema) files and by external rules
- Makes sure that the XML files are well-formed and contain all necessary elements
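As a rough illustration of the validator's second task, the sketch below checks well-formedness and the presence of required child elements using only Python's standard library. The rule table is hypothetical. Full XSD validation is not covered here, since the standard library has no schema validator and a third-party package would be needed for that.

```python
import xml.etree.ElementTree as ET

# Hypothetical "external rules": which child elements each element must contain.
REQUIRED_CHILDREN = {
    "data": ["comp_unit"],
    "comp_unit": ["variable"],
}


def validate(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    for element in root.iter():
        for required in REQUIRED_CHILDREN.get(element.tag, []):
            if element.find(required) is None:
                problems.append(f"<{element.tag}> is missing <{required}>")
    return problems


good = validate("<data><comp_unit><variable/></comp_unit></data>")
incomplete = validate("<data><comp_unit/></data>")
broken = validate("<data><comp_unit></data>")
print(good, incomplete, broken)
```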
Strengths of SIMO XML Simulator
- Virtually any kind of model can be used in the simulations and added to the model library
- The user can define the model chains freely for different kinds of simulations
- The user can define correction/rectification factors for the models (e.g. different factors for different geographical areas)
- Extensive warning and error reporting system (risk control coming later...)
- Data levels are not confined to a strict predefined standard
What can be calculated at the moment?
- Estimating the development of forest variables is possible at both stand level and tree level (300+ models implemented), but
- Forestry operations are not yet implemented in the simulator → "real world" simulations are not yet possible
- Bucking models are not ready yet
- The optimizing module is still missing
How does the simulation process work in SIMO?
[Diagram: the simulation process, connecting the XML files, the Simulator, the Model Library and the Reporter Module]
- Simulator. IN: data, simulation control, model chains, model definitions. OUT: results
- Model Library. IN: model name, input variables. OUT: model result, warnings & errors
- Reporter Module. IN: XML data. OUT: transformed XML, graphs
What is missing?
[Diagram: XML Files, Simulator, Model Library and Reporter Module, with the missing Optimizer Module and Validator Module added]
Reporting Module
- Used for visualizing data and for transforming the results from XML into other formats
- Takes in data and processing instructions in XML format
- At the moment it can plot different kinds of graphs of given variables
- XML transformations to be implemented later...
Model risk management
Two levels
1. Individual parameter values out of bounds
2. All individual parameter values are acceptable, but is the specific combination of them acceptable?
Case 1: already in the simulator
Case 2: suggestion
1. Get the k nearest neighbours from the VMI (Finnish National Forest Inventory) data
2. Evaluate the model for the data point and for the k nearest neighbours
3. If the difference in the model estimate between the data point and the neighbours is too big, generate an event of an "unacceptable" model estimate
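The suggestion for case 2 can be sketched as follows. Several details are assumptions, because the slide leaves them open: neighbours are found with plain unscaled Euclidean distance, "too big" is read as a relative difference from the neighbours' mean estimate, and the toy volume model and reference plots are invented for illustration.

```python
import math
from statistics import mean


def k_nearest(point, reference, k):
    """Return the k reference points closest to point (Euclidean distance)."""
    return sorted(reference, key=lambda ref: math.dist(point, ref))[:k]


def evaluate_risk(model, point, reference, k, max_rel_diff):
    """Compare the model estimate at point with the mean estimate of its k neighbours."""
    neighbours = k_nearest(point, reference, k)
    estimate = model(*point)
    neighbour_mean = mean(model(*n) for n in neighbours)
    rel_diff = abs(estimate - neighbour_mean) / max(abs(neighbour_mean), 1e-12)
    return {
        "acceptable": rel_diff <= max_rel_diff,
        "estimate": estimate,
        "neighbour_mean": neighbour_mean,
    }


def toy_volume(d, h):
    """Hypothetical stem volume model, for illustration only."""
    return 0.00004 * d * d * h


# Hypothetical reference plots as (diameter cm, height m) pairs.
plots = [(20, 18), (22, 19), (18, 17), (25, 21)]

typical = evaluate_risk(toy_volume, (21, 18), plots, k=3, max_rel_diff=0.25)
unusual = evaluate_risk(toy_volume, (60, 5), plots, k=3, max_rel_diff=0.25)
print(typical["acceptable"], unusual["acceptable"])
```

A very thick but very short tree passes the per-variable bounds of case 1 if each value is within range on its own, yet its combination lies far from anything in the reference data, which is what this check is meant to catch.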
Isn't that procedure too heavy computationally?
- Probably; not yet evaluated
- But what if we store the risk evaluation results and use those primarily:
1. Is it safe to call ModelA with parameters (5, 6, 10) when we accept risk level X?
2. Has the risk been evaluated with parameter values (5, 6, 10) and risk level X before? If yes, get the answer from a table of risk evaluations
3. If not, get the k nearest neighbours for data point (5, 6, 10) and evaluate the model with (5, 6, 10) and with the k neighbours
4. Store the risk evaluation result and the mean model result of the k neighbours for data point (5, 6, 10) and risk level X
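The four steps above amount to a lookup table in front of the expensive neighbour search. A minimal sketch, assuming the risk level X is expressed as the largest accepted relative difference (the slide does not define it); the `RiskTable` class, the reference points and the stand-in `model_a` are hypothetical.

```python
import math
from statistics import mean


class RiskTable:
    def __init__(self, reference, k):
        self.reference = reference
        self.k = k
        self.table = {}       # (model name, parameters, risk level) -> (safe, neighbour mean)
        self.evaluations = 0  # counts how often the expensive path was taken

    def is_safe(self, model, params, risk_level):
        key = (model.__name__, tuple(params), risk_level)
        if key not in self.table:                      # step 2: look in the table first
            self.table[key] = self._evaluate(model, params, risk_level)  # steps 3 and 4
        return self.table[key]

    def _evaluate(self, model, params, risk_level):
        self.evaluations += 1
        neighbours = sorted(
            self.reference, key=lambda ref: math.dist(params, ref)
        )[: self.k]
        neighbour_mean = mean(model(*n) for n in neighbours)
        rel_diff = abs(model(*params) - neighbour_mean) / max(abs(neighbour_mean), 1e-12)
        return rel_diff <= risk_level, neighbour_mean


def model_a(a, b, c):
    """Hypothetical stand-in for ModelA."""
    return a + b + c


reference = [(5, 6, 9), (4, 6, 10), (6, 7, 11), (50, 60, 100)]
risk_table = RiskTable(reference, k=3)

first = risk_table.is_safe(model_a, (5, 6, 10), risk_level=0.2)
second = risk_table.is_safe(model_a, (5, 6, 10), risk_level=0.2)  # answered from the table
print(first, risk_table.evaluations)
```

Keying the table on exact parameter values only pays off when the same combinations recur; rounding or binning the parameters before the lookup would be one way to raise the hit rate, at the cost of coarser answers.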
Open questions
When evaluating a model result, shall we compare it to
- values derived directly from the nearest VMI permanent sample plots, OR
- model estimates for the nearest VMI sample plots?
Software license for SIMO
Types of open source licenses
- MIT & Co: "Do whatever you want"
- LGPL: "Everything you do to the original code must be open source, anything on top of that can be closed"
- GPL & Co: "Everything you do is open source, …well almost"
GPL under the hood: "derivative work" or "mere aggregation"?
- Derivative work must be open source, but aggregation can be closed source
The case of MySQL
- Double licensing: open source under the GPL, or commercial development under a commercial license that allows closed source
General software architecture
Individual components that communicate over the network
- Validator
- Simulator (well underway)
- Optimizer
- Reporter: simulation results to figures and to data formats other than XML, or to a different XML format, etc.
Implications for licensing?
- What if one of the components uses a subcomponent that is published under the GPL?
Architecture, continued
- TCP/IP-based communication
- Security issues?
  - secured traffic (SSL, SSH)
  - inside firewall
- Scalable