The MIRACLE project: Cyberinfrastructure for visualizing model outputs


The MIRACLE project: Cyberinfrastructure for visualizing model outputs Dawn Parker, Michael Barton, Terence Dawson, Tatiana Filatova, Xiongbing Jin, Allen Lee, Ju-Sung Lee, Lorenzo Milazzo, Calvin Pritchard, J. Gary Polhill, Kirsten Robinson, and Alexey Voinov

Background and motivation
- Growing interest in analyzing highly detailed “big data”
- Concurrent development of a new generation of simulation models, including ABMs, which themselves produce “big data” as outputs
- Need for tools and methods to analyze and compare these two data sources

Motivation for the ABM community
- Sharing model code is great, but there are large barriers to entry in getting someone else’s model running
- Sharing model output data can accomplish many of the goals of code sharing
- It also lets other researchers explore new parameter spaces or use different algorithms
- Sharing analysis algorithms may jump-start development of complex-systems-specific output analysis methods

Project Objectives
1. Collect, extend, and share methods for statistical analysis and visualization of output from computational agent-based models of coupled human and natural systems (ABM-CHANS).
2. Create web-based facilities to interactively visualize and analyze archives of model output data for ABM-CHANS models as part of CoMSES Net, an existing community modeling archive for ABM.

Project Objectives, cont.
3. Conduct meta-analyses of our own projects, and invite the ABM-CHANS community to conduct further meta-analyses using the new tools.
4. Apply the statistical analysis algorithms we develop to empirical datasets to validate their applicability to large-scale data from complex social systems.

Prototype goals
- As simple as possible a demonstration prototype
- Hosted on the Compute Canada/Sharcnet platform
- One example project under development from each participating research group, containing:
  - Model output data and metadata
  - Workflow description for data creation
  - Scripts used to analyze output data, with documentation
  - Output analysis

Planned minimal functionality: All Users
- Permissions: apply for a user login
- Query/plot results: display an existing analysis, or call scripts to run a new analysis using existing scripts and output data
- Show and download existing analysis scripts
- Save a query
- Comment on a query, dataset, or project
- Explore: navigate projects, datasets, queries, and comments to find an interesting project
- Search
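The “run new analysis using existing scripts and output data” step could be sketched as follows. This is a minimal illustration only: the function name `run_analysis` and the calling convention (script receives the data file path and an output directory as command-line arguments) are assumptions, not the actual MIRACLE API.

```python
import subprocess
import sys
from pathlib import Path

def run_analysis(script: Path, data_file: Path, output_dir: Path) -> Path:
    """Run an archived analysis script against a model output file.

    Assumed (hypothetical) convention: the script takes the data file
    path as its first argument and writes its results into the
    directory given as its second argument.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [sys.executable, str(script), str(data_file), str(output_dir)],
        check=True,  # raise if the analysis script fails
    )
    return output_dir
```

Running scripts in a subprocess like this keeps the web platform isolated from user-supplied analysis code and makes it easy to record the invocation for provenance.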

Planned minimal functionality: Project user (research group creating data and script archive)
- Upload data
- Add/edit/rename metadata (preexisting categories)
- Upload and activate scripts (must link scripts to data)
- Permissions: join a project, accept members of a project group
- Publish a project

Planned minimal functionality: Administrators
- Permissions: manage user logins
- Manage problematic users/postings
- Record user workflows for research on how people model and how groups communicate

Simple prototype (beta)
Development team: Xiongbing Jin, Allen Lee, Calvin Pritchard, Kirsten Robinson
Presented by Allen Lee

Companion threads
- Methods for statistical analysis of complex simulation model output data (Twente, lead institution)
- Metadata standards for complex simulation model output data (James Hutton Institute, lead institution)
- Both threads supported by user workshops

Community input: IEMSS 2014 workshop, “Analysing and synthesising results from complex socio-ecosystem models with high-dimensional input, parameter and output spaces”
Focus questions:
1) What existing and developing methodologies are currently being used to analyze, visualize, and synthesize model output data?
2) What are the further unmet requirements of this community for data analysis, visualization, and synthesis?

Review paper in press at JASSS, “The Complexities of Agent-Based Modeling Output Analysis” (J.S. Lee et al.)
- Reviews state-of-the-art approaches to output analysis
- Examines stability/convergence conditions, sensitivity analysis, spatio-temporal analysis, visualization, and communication
- A proposed follow-up IEMSS 2016 paper session will focus on novel methods from other domains that are promising for ABM output analysis

Community input: ESSA 2014 workshop, “Towards metadata standards for social simulation outputs”
Rationale:
- Workflows used to create model output are often unknown
- Simulation outputs need metadata to aid interpretation and ensure replicability; data need metadata, regardless of where they come from!
- If we are to create a tool where users can upload their output data, we need to know its structure
- Users also need to know what they are looking at

ESSA 2014 workshop, continued
Questions:
- What file formats do you use for your simulation outputs?
- What metadata do you record in or about your simulation outputs?
Metadata schema paper in draft

Metadata for ABM output data
Goals:
- Users need to understand the data (what’s inside the files, the relationships between files, the project, and the owners)
- Users need to know how the data were generated (input data, analysis scripts, parameters, computer environment, workflows that chain several scripts)
Two types of metadata:
- Metadata that describe the current state of the data (data structure, file and data table content → fine-grain metadata)
- Metadata that describe the provenance of the data (how the data were generated → coarse-grain metadata)
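The two metadata types above can be sketched as simple records. The field names here are illustrative assumptions, not the project’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class FineGrainMetadata:
    """Current state of a data file: structure and content."""
    file_name: str
    file_format: str  # e.g. "csv", "shapefile"
    columns: list = field(default_factory=list)  # column / attribute names

@dataclass
class CoarseGrainMetadata:
    """Provenance: how the data were generated."""
    script: str  # script that produced the file
    parameters: dict = field(default_factory=dict)
    input_files: list = field(default_factory=list)
    environment: str = ""  # e.g. OS and interpreter version
```

Separating the two records mirrors the distinction on the slide: the first can be regenerated by inspecting the file itself, while the second must be captured at the moment the data are produced.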

Capturing metadata
Goal: automated metadata extraction with minimal user input
Fine-grain metadata:
- Automatically extracted from files (CSV columns, ArcGIS shapefile metadata and attribute table columns, etc.)
Coarse-grain metadata:
- A workflow describes how a script could produce a certain file type, while provenance describes how script A produced file B
- Provenance can be captured automatically when a user runs scripts and workflows in the MIRACLE system (computer environment, user name, application name, process, input files and parameters, output files)
- Workflows can be constructed from captured provenance
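A minimal sketch of fine-grain extraction for the CSV case mentioned above; the function name and the returned keys are illustrative assumptions:

```python
import csv
from pathlib import Path

def extract_csv_metadata(path: Path) -> dict:
    """Read only the header row of a CSV file and return fine-grain
    metadata (file name, format, column names) without loading the data."""
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    return {"file_name": path.name, "file_format": "csv", "columns": header}
```

Because only the header row is read, this scales to arbitrarily large model output files, which matters for the “big data” outputs the project targets.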

Summing up: How might the MIRACLE platform be used?
Within a research group:
- Efficiently share and discuss new model results
- Let group members explore new parameter spaces
- Create accessible archives for publications
Across groups:
- Provide prototypes to new researchers, or to those looking for new analysis methods
- Provide examples for teaching and labs
- Facilitate additional “after-market” research and publication

We hope the MIRACLE project will help to…
- Develop, share, test, and compare new statistical methods appropriate for analysis of complex-systems data;
- Improve communication and assessment within the modeling community;
- Reduce barriers to entry for use of models;
- Improve the ability of policy makers and stakeholders to understand and interact with model output

Acknowledgements
- “Digging into Data” international funding award to Parker, Dawson, Filatova, and Barton (Canadian, UK, Netherlands, and US national science funding agencies)
- Waterloo Institute for Complexity and Innovation
- Workshop participants
- Compute Canada/Sharcnet