
MIRACLE: Cloud-based reproducible data analysis and visualization for outputs of agent-based models
Xiongbing Jin, Kirsten Robinson, Allen Lee, Gary Polhill, Calvin Pritchard and Dawn Parker

Background
- Model communication and transparency are important for the ABM community
- Progress in the areas of documenting, validating, verifying and archiving
  - The ODD protocol
  - The CoMSES OpenABM library
- Effective communication requires better user interface and content
- Full transparency requires reproducibility

Reproducibility
- A reproducible ABM study means complete access to: assumptions, input, model, output, analysis and results
- Setting up and running others’ models is generally difficult and can be impractical due to computing hardware and software requirements
- Archiving output data and analysis code can provide a shortcut to solving the reproducibility and communication problems
  - Reproducing analysis results without re-running models
  - Basis for interactive data analysis and visualization => communication tools
  - ABM-specific analysis methods for use in the ABM community

MIRACLE
- Cloud-based reproducible data analysis and visualization
  - ABM output data storage
  - Metadata archiving
  - Data exploration, analysis and visualization inside web browsers
  - No special hardware or software requirements
  - No need for programming or modelling background
- Prototype
  - Supports R scripts (R is the most popular open source data analysis software in scientific research (Muenchen, 2016))
  - Supports CSV data files (CSV is the most popular output format for social simulation models (survey at ESSA 2014 by Polhill))

Infrastructure

Data model (Polhill et al. unpublished)
[Figure: data model shown at coarse and fine scale, with components labeled Project, Provenance and Workflow and numbered callouts (1) to (5)]

Made for ABM and Open Science
- Automatic categorization of data files from parameter sweeps into data groups
- Automatic metadata extraction
- Automatic transformation of R scripts into web-based interactive parameter space exploration tools
- Preservation (and restoration in the cloud) of exact R dependencies for full reproducibility
- The MIRACLE platform itself is also fully and easily reproducible by employing Docker and Docker Compose
- MIRACLE is open source software
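The slides do not say how MIRACLE categorizes files or extracts metadata. As a minimal sketch only, one plausible rule is assumed here: CSV files with identical column headers belong to the same data group, and the metadata for each file is its column names, a guessed column type and a row count. The function names (`infer_type`, `extract_metadata`, `group_by_header`) and the example file names are hypothetical and are not taken from the MIRACLE code base.

```python
import csv
import io
from collections import defaultdict


def infer_type(values):
    """Guess a column type from its string values: integer, number, or text."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "number"
    return "text"


def extract_metadata(name, text):
    """Return column names, guessed column types, and row count for one CSV file."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = [
        {"name": col, "type": infer_type([r[i] for r in body])}
        for i, col in enumerate(header)
    ]
    return {"file": name, "columns": columns, "rows": len(body)}


def group_by_header(files):
    """Group files whose CSV headers are identical (assumed grouping rule)."""
    groups = defaultdict(list)
    for name, text in files.items():
        meta = extract_metadata(name, text)
        signature = tuple(c["name"] for c in meta["columns"])
        groups[signature].append(meta)
    return dict(groups)


# Made-up file contents standing in for parameter-sweep output.
files = {
    "run_001.csv": "tick,price\n1,10.5\n2,11.0\n",
    "run_002.csv": "tick,price\n1,9.8\n",
    "agents_001.csv": "agent_id,income\n1,high\n",
}
groups = group_by_header(files)
for signature, members in groups.items():
    print(signature, [m["file"] for m in members])
```

Grouping on the header alone is the simplest choice that works without any naming convention; a real system might also use folder structure or file name patterns.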

CASE STUDY The LUXE model (Huang et al., 2013, Sun et al., 2014) (Land Use in eXurban Environment)

The LUXE model
- Agent-based land market model that simulates residential location choices under different levels of heterogeneity and market representation
- Location choices influenced by
  - Preference heterogeneity: different preferences for open space amenity versus CBD proximity (5 levels of heterogeneity)
  - Income heterogeneity: different budget levels (5 levels of heterogeneity)
  - 4 market levels

LUXE output data
- Parameter space:
  - Huang et al., 2013: 5x5x4x40 (40 repeated runs with different random seeds) = 4000 data files
  - Sun et al., 2014: 5x3x3x4x50 (50 repeated runs) = 9000 data files
- Only part of the parameter space explored and a very small subset presented in the publications
- File format: CSV
- Analysis scripts: R
- Visualization: R
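The file counts above are the product of the number of levels on each parameter axis and the number of repeated runs. The sketch below checks that arithmetic and enumerates the individual runs; the helper names `count_runs` and `enumerate_runs` are illustrative only. For Huang et al. (2013) the three axes are the 5 preference levels, 5 income levels and 4 market levels described for LUXE; the slides do not name the four axes of Sun et al. (2014), so they are left unlabeled.

```python
from itertools import product
from math import prod


def count_runs(levels, repeats):
    """Number of output files: one per parameter combination per repeated run."""
    return prod(levels) * repeats


def enumerate_runs(levels, repeats):
    """List every (level indices..., repeat index) tuple in the sweep."""
    axes = [range(n) for n in levels] + [range(repeats)]
    return list(product(*axes))


huang_2013 = count_runs([5, 5, 4], repeats=40)    # 5 x 5 x 4 x 40
sun_2014 = count_runs([5, 3, 3, 4], repeats=50)   # 5 x 3 x 3 x 4 x 50

print(huang_2013)  # 4000
print(sun_2014)    # 9000
print(len(enumerate_runs([5, 5, 4], repeats=40)))  # 4000
```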

LUXE + MIRACLE
- Storage for all output data and scripts
- Archiving of metadata for data files and scripts
- Web-based data exploration
- Web-based data analysis and visualization
- Collaboration within research group
- Reproducibility for external researchers
- Communication of model results to stakeholders
- Stakeholder participation
- Publication review

Data/metadata management

Data exploration

Data analysis – the hard way
- Get the R script and all data
- Try to figure out the input variables and what each data column means
- Find the correct version of R
- Find the correct versions of all dependencies
- Manually adjust parameters in code (error-prone)
- Record your run in a notebook (on paper or digitally)
- Share results by email, manually copying and pasting output and graphics
- Another person repeats the workflow...
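Several of the manual steps above (adjusting parameters, noting dependency versions, recording the run) amount to keeping a structured record of each analysis. The following is a sketch of what such a record could look like, not a description of how MIRACLE stores runs. The function names, the record fields, the example script text and the package version shown are all made up for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_run_record(script_text, parameters, package_versions):
    """Build a machine-readable record of one analysis run."""
    return {
        # The hash identifies the exact script text that was run.
        "script_sha256": hashlib.sha256(script_text.encode("utf-8")).hexdigest(),
        "parameters": dict(sorted(parameters.items())),
        "packages": dict(sorted(package_versions.items())),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def same_analysis(a, b):
    """True if two records used the same script, parameters and package versions."""
    keys = ("script_sha256", "parameters", "packages")
    return all(a[k] == b[k] for k in keys)


# Hypothetical values: the script text, parameter names and version are placeholders.
script = "plot(price ~ tick, data = d)"
first = make_run_record(script, {"market_level": 2, "seed": 7}, {"ggplot2": "2.1.0"})
second = make_run_record(script, {"seed": 7, "market_level": 2}, {"ggplot2": "2.1.0"})
third = make_run_record(script, {"seed": 8, "market_level": 2}, {"ggplot2": "2.1.0"})

print(json.dumps(first, indent=2))
print(same_analysis(first, second))
print(same_analysis(first, third))
```

Sorting the parameter and package dictionaries makes two records comparable regardless of the order in which the values were entered, which is one way to check that "another person repeats the workflow" with identical settings.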

Data analysis – MIRACLE

Your project + MIRACLE
- Simple steps to make your project MIRACLE-compatible
  - Put data and scripts into the corresponding folders
  - Declare package dependencies, input parameters and input file locations using “deployrUtils”
  - Record package dependency versions using “packrat” (an R package integrated into RStudio)
- Step-by-step instructions available online

WHAT’S NEXT

The next steps
- Support more data analysis languages such as Python (with Jupyter)
- Support complete provenance tracking and workflow management (with Luigi)
- Support large numbers of projects and users, fully utilizing the power of the cloud (with Docker)
- Support external data archives such as Dataverse, CKAN and figshare
- Support utilization of HPC infrastructure
- …

The MIRACLE of ABM in the cloud
- The ultimate goal: cloud-based solution for the entire workflow of ABM research
  - Model creation
  - Model execution
  - Output data analysis and visualization
  - Publication review and dissemination
- The ASU NSF Big Data Spoke Grant

MIRACLE + you
- Try our software and send feedback
- Contribute to our open source development: https://github.com/comses/miracle
- Attend our upcoming workshop! http://www.wici.ca

Acknowledgements
- The Digging into Data competition and funding agencies (SSHRC, NSF, JISC, ESRC)
- Waterloo Institute for Complexity and Innovation
- The CoMSES network
- Workshop participants (IEMSS 2014, ESSA 2015, IEMSS 2016)
- Sharcnet and Compute Canada Cloud