MIRACLE Cloud-based reproducible data analysis and visualization for outputs of agent-based models
Xiongbing Jin, Kirsten Robinson, Allen Lee, Gary Polhill, Calvin Pritchard and Dawn Parker

Background
Model communication and transparency important for the ABM community
Progress in the areas of documenting, validating, verifying and archiving
  The ODD protocol
  The CoMSES OpenABM library
Effective communication requires better user interface and content
Full transparency requires reproducibility

Reproducibility
Reproducible ABM study means complete access to: Assumptions, Input, Model, Output, Analysis, Results
Setting up and running others' models is generally difficult and can be impractical due to computing hardware and software requirements
Archiving output data and analysis code can provide a shortcut to solving the reproducibility and communication problems
  Reproducing analysis results without re-running models
  Basis for interactive data analysis and visualization => communication tools
  ABM-specific analysis methods for use in the ABM community

MIRACLE Cloud-based reproducible data analysis and visualization
ABM output data storage
Metadata archiving
Data exploration, analysis and visualization inside web browsers
  No special hardware and software requirements
  No need for programming and modelling background
Prototype
  Support R scripts (R is the most popular open source data analysis software in scientific research (Muenchen, 2016))
  Support CSV data files (CSV is the most popular output format for social simulation models (survey at ESSA 2014 by Polhill))

Infrastructure

Data model (Polhill et al. unpublished)
[Figure: data model diagram. Labels: Coarse scale, Fine scale, Project, Provenance, Workflow; numbered elements (1) to (5)]

Made for ABM and Open Science
Automatic categorization of data files from parameter sweeps into data groups
Automatic metadata extraction
Automatic transformation of R scripts into web-based interactive parameter space exploration tools
Preservation (and restoration in the cloud) of exact R dependencies for full reproducibility
The MIRACLE platform itself is also fully and easily reproducible by employing Docker and Docker Compose
MIRACLE is open source software
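As an illustration of categorizing parameter-sweep files into data groups, here is a minimal Python sketch. It assumes that files sharing the same CSV column header belong to the same group; this rule, the function names and the sample file contents are all hypothetical and are not taken from MIRACLE's implementation.

```python
import csv
import io
from collections import defaultdict


def header_of(csv_text):
    """Return the first row of a CSV document as a tuple of column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return tuple(next(reader, ()))


def group_by_header(files):
    """Group file names by their column header (assumed grouping rule)."""
    groups = defaultdict(list)
    for name, text in files.items():
        groups[header_of(text)].append(name)
    return {header: sorted(names) for header, names in groups.items()}


# Hypothetical sweep outputs: two runs with the same layout, one summary file.
files = {
    "run_001.csv": "tick,price,distance\n1,100.0,2.5\n",
    "run_002.csv": "tick,price,distance\n1,98.5,3.0\n",
    "summary.csv": "run,mean_price\n1,99.2\n",
}
groups = group_by_header(files)
for header, names in groups.items():
    print(header, names)
```

The header tuple doubles as very basic extracted metadata (the column names of each group); a real system would likely record more, such as column types and the parameter values behind each run.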

CASE STUDY The LUXE model (Huang et al., 2013, Sun et al., 2014)
(Land Use in eXurban Environment)

The LUXE model
Agent-based land market model that simulates residential location choices under heterogeneity and market levels
Location choices influenced by
  Preference heterogeneity: different preferences for open space amenity versus CBD proximity (5x levels of heterogeneity)
  Income heterogeneity: different budget levels (5x levels of heterogeneity)
  4x market levels

LUXE output data
Parameter space:
  Huang et al., 2013: 5x5x4x40 (40 repeated runs with different random seeds) = 4000 data files
  Sun et al., 2014: 5x3x3x4x50 (50 repeated runs) = 9000 data files
  Only part of the parameter space explored and a very small subset presented in the publications
File format: CSV
Analysis scripts: R
Visualization: R
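The file counts follow directly from multiplying the number of levels per parameter by the number of repeated runs. A short Python check of that arithmetic, where the function and variable names are illustrative only:

```python
from math import prod


def sweep_size(levels, repeats):
    """Number of output files: one per parameter combination per repeated run."""
    return prod(levels) * repeats


# Level counts and repeat counts as listed on the slide.
huang_2013 = sweep_size([5, 5, 4], repeats=40)
sun_2014 = sweep_size([5, 3, 3, 4], repeats=50)

print(huang_2013)  # 4000
print(sun_2014)    # 9000
```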

LUXE + MIRACLE
Storage for all output data and scripts
Archiving of metadata for data files and scripts
Web-based data exploration
Web-based data analysis and visualization
Collaboration within research group
Reproducibility for external researchers
Communication of model results to stakeholders
Stakeholder participation
Publication review

Data/metadata management

Data exploration

Data analysis – the hard way
Get the R script and all data
Try to figure out input variables and what each data column means
Find the correct version of R
Find the correct versions of all dependencies
Manually adjust parameters in code (error-prone)
Record your run in a notebook (on paper or digitally)
Share results by manual copy/paste of output and graphics
Another person repeats the workflow...

Data analysis – MIRACLE

Your project + MIRACLE
Simple steps to make your project MIRACLE-compatible
  Put data and scripts into corresponding folders
  Declare package dependency, input parameters and input file location using “deployrUtils”
  Record package dependency versions using “packrat” (an R package integrated into RStudio)
Step-by-step instructions available online

WHAT’S NEXT

The next steps
Support more data analysis languages such as Python (with Jupyter)
Support complete provenance tracking and workflow management (with Luigi)
Support large numbers of projects and users, fully utilizing the power of the cloud (with Docker)
Support external data archives such as Dataverse, CKAN and figshare
Support utilization of HPC infrastructure
…

The MIRACLE of ABM in the cloud
The ultimate goal: cloud-based solution for the entire workflow of ABM research
  Model creation
  Model execution
  Output data analysis and visualization
  Publication review and dissemination
The ASU NSF Big Data Spoke Grant

MIRACLE + you
Try our software and send feedback
Contribute to our open source development
Attend our upcoming workshop!

Acknowledgements
The Digging into Data competition and funding agencies (SSHRC, NSF, JISC, ESRC)
Waterloo Institute for Complexity and Innovation
The CoMSES network
Workshop participants (IEMSS 2014, ESSA 2015, IEMSS 2016)
Sharcnet and Compute Canada Cloud
