Reproducible computational social science
Allen Lee
Center for Behavior, Institutions, and the Environment
https://cbie.asu.edu

Computational Social Science
– Wicked collective action problems
– Innovation -> problems -> innovation
– Mitigate transaction costs for information transfer

Methodologies
– Case study analysis
– Controlled experiments
– Computational modeling
– Integrative data analysis / natural experiments

Case Study Analysis
seshatdatabank.info: “Our goal is to test rival social scientific hypotheses with historical and archaeological data … treating history as a predictive, analytic science.”

SES Library
– Descriptions of social-ecological systems from around the world
– Embeds mathematical models (implemented via XPPAUT) where they are relevant to the social-ecological dynamics of specific cases

Controlled Behavioral Experiments
– Web-based experiments: Mechanical Turk, oTree, nodeGame, vcweb
– Desktop experiments: z-Tree, CoNG, foraging, irrigation
– Diversity in software platforms is valuable but also presents challenges
– General issues are summarized in Experimental platforms for behavioral experiments on social-ecological systems (Janssen, Lee, Waring, 2014)

Computational Modeling
– Extrapolate potential future scenarios for complex systems with many interacting actors
– Computational modeling makes the processes underlying complex phenomena explicit, shareable, and reproducible
– Assumptions are laid bare, and alternative assumptions and parameterizations can be explored via sensitivity analysis (see the sketch after this list)
– George Box: “All models are wrong, but some are useful”
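To make the sensitivity-analysis point concrete, here is a minimal sketch, not from the talk: a toy agent-based cooperation model whose single `temptation` parameter is swept across replicate runs. The model, the parameter grid, and the output measure are all illustrative assumptions.

```python
import random
import statistics

def run_model(temptation: float, n_agents: int = 100, steps: int = 200,
              seed: int = 0) -> float:
    """Toy agent-based model: agents imitate a random other agent's strategy
    when that agent fares better in a prisoner's-dilemma-like interaction.
    Returns the final fraction of cooperators (illustrative only)."""
    rng = random.Random(seed)

    def payoff(me: bool, other: bool) -> float:
        if me and other:
            return 3.0          # mutual cooperation
        if me and not other:
            return 0.0          # exploited cooperator
        if not me and other:
            return temptation   # successful defection
        return 1.0              # mutual defection

    strategies = [rng.random() < 0.5 for _ in range(n_agents)]  # True = cooperate
    for _ in range(steps):
        i, j = rng.randrange(n_agents), rng.randrange(n_agents)
        if payoff(strategies[j], strategies[i]) > payoff(strategies[i], strategies[j]):
            strategies[i] = strategies[j]  # imitate the more successful agent
    return sum(strategies) / n_agents

# Sensitivity analysis: sweep the temptation payoff across replicate seeds so
# the effect of this single assumption on the outcome is laid bare.
for temptation in (3.5, 4.0, 4.5, 5.0):
    outcomes = [run_model(temptation, seed=s) for s in range(20)]
    print(f"temptation={temptation}: mean cooperation "
          f"{statistics.mean(outcomes):.2f} (sd {statistics.stdev(outcomes):.2f})")
```
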
Multiple methods
– Convergent validity
– Multiple methods complement each other, e.g., experiments, case study analysis, and formal modeling (Poteete et al., 2010)

Reproducibility
– Victoria Stodden: how do we know an inference is reliable, and why should we believe “Big Data” findings?
– We need new standards for conducting “data and computational science” and for communicating results: sound workflows, sharing specifications, and guides to good practice
– Distinguish between empirical, statistical, and computational reproducibility

Replicable Research Workflows
– Planning, organizing, and documenting your research protocols
– Developing code for data analysis or experiments
– Running your analyses (generating visualizations) or conducting experiments (generating data)
– Presenting / publishing findings
– Cleaning and documenting your code and data
– Archiving and documenting with contextual metadata that preserves provenance
https://osf.io is a good example of a full-stack system

Archiving data
Vines TH et al. (2013) Current Biology. DOI: 10.1016/j.cub.2013.11.014

CoMSES Net
– Computational Model Library for archiving model code; the next generation is in active development and planning
– Provides a suite of microservices for transparency and reproducibility in computational modeling

The MIRACLE project: Cyberinfrastructure for visualizing model outputs
Dawn Parker, Michael Barton, Terence Dawson, Tatiana Filatova, Xiongbing Jin, Allen Lee, Ju-Sung Lee, Lorenzo Milazzo, Calvin Pritchard, J. Gary Polhill, Kirsten Robinson, and Alexey Voinov

Background and motivation
– Growing interest in analyzing highly detailed “big data”
– Concurrent development of a new generation of simulation models, including ABMs, which themselves produce “big data” as outputs
– Need for tools and methods to analyze and compare these two data sources

Motivation
– Sharing model code is great, but there are large barriers to entry in getting someone else’s model running (Collberg et al., 2015)
– Sharing model output data can accomplish many of the goals of code sharing
– It also lets other researchers explore new parameter spaces or use different algorithms
– Sharing analysis algorithms may jump-start the development of complex-systems-specific output analysis methods

Objectives
– Collect, extend, and share methods for statistical analysis and visualization of output from computational agent-based models of coupled human and natural systems (ABM-CHANS)
– Provide interactive visualization and analysis of archived model output data for ABM-CHANS models

Objectives, cont.
– Conduct meta-analyses of our own projects, and invite the ABM-CHANS community to conduct further meta-analyses using the new tools
– Apply the statistical analysis algorithms we develop to empirical datasets to validate their applicability to large-scale data from complex social systems

Metadata for ABM output data
Goals:
– Users need to understand the data (what’s inside the files, the relationships between files, the project and its owners, …)
– Users need to know how the data were generated (input data, analysis scripts, parameters, computing environment, workflows that chain several scripts, …)
Two types of metadata (see the sketch after this list):
– Metadata that describe the current state of the data (data structure, file and data-table contents): fine-grain metadata
– Metadata that describe the provenance of the data (how the data were generated): coarse-grain metadata
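As a minimal sketch of what records for these two metadata types might look like, the dataclasses below are hypothetical illustrations, not MIRACLE's actual schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FineGrainMetadata:
    """Describes the current state of a data file (hypothetical schema)."""
    filename: str
    file_format: str                       # e.g., "csv", "shapefile"
    columns: List[str] = field(default_factory=list)
    description: str = ""

@dataclass
class CoarseGrainMetadata:
    """Describes how a data file was generated (hypothetical schema)."""
    output_file: str
    script: str                            # the script that produced the file
    parameters: Dict[str, str] = field(default_factory=dict)
    input_files: List[str] = field(default_factory=list)
    environment: str = ""                  # e.g., OS and interpreter version
    user: str = ""
```
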
Capturing metadata
Goal: automated metadata extraction with minimal user input
Fine-grain metadata:
– Automatically extract metadata from files (CSV columns, ArcGIS shapefile metadata and attribute-table columns, etc.)
Coarse-grain metadata:
– A workflow describes how a script could produce a certain file type, while provenance describes how script A actually produced file B
– Provenance can be captured automatically when a user runs scripts and workflows through the MIRACLE system (computing environment, user name, application name, process, input files and parameters, output files)
– Workflows can be constructed from captured provenance
A sketch of automated extraction follows.
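As an illustration of automated capture, here is a minimal sketch (not MIRACLE's code) that pulls column names from a CSV header and records basic provenance for a script run; the `extract_csv_metadata` and `capture_provenance` helpers are hypothetical.

```python
import csv
import getpass
import platform
import sys
from datetime import datetime, timezone

def extract_csv_metadata(path: str) -> dict:
    """Fine-grain metadata: read only the header row to list the columns."""
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    return {"file": path, "format": "csv", "columns": header}

def capture_provenance(script: str, inputs: list, outputs: list,
                       parameters: dict) -> dict:
    """Coarse-grain metadata: record how the outputs were generated."""
    return {
        "script": script,
        "inputs": inputs,
        "outputs": outputs,
        "parameters": parameters,
        "user": getpass.getuser(),
        "environment": f"{platform.platform()} / Python {sys.version.split()[0]}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```
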
MIRACLE platform use cases
Within a research group:
– Efficiently share and discuss new model results
– Let group members explore new parameter spaces
– Create accessible archives for publications
Across groups:
– Provide prototypes to new researchers, or to those looking for new analysis methods
– Provide examples for teaching and labs
– Facilitate additional “after-market” research and publication

MIRACLE project goals
– Develop, share, test, and compare new statistical methods appropriate for the analysis of complex-systems data
– Improve communication and assessment within the modeling community
– Reduce barriers to entry for the use of models
– Improve the ability of policy makers and stakeholders to understand and interact with model output

CoMSES Net: Catalog
– Track the state of model archiving
– Provide collective-action tools to incentivize model sharing

CoMSES Net Future Goals
– Provide a one-stop shop for computational modeling
– Containerized execution with bundled dependencies
– Integration with Jupyter, CyVerse, and modeling platforms such as Repast and NetLogo
– Reparameterizable data analysis and exploration via the MIRACLE project
– Bibliometric tracking
– Collective-action tools to incentivize prosocial behavior among scientists

From http://stanford.edu/~vcs/talks/UIUCDataSummit-Feb5-2016-STODDEN.pdf
Guide to good practice
– Learn to use a source control system (Git, Mercurial, SVN)
– Use it with discipline:
  – commit early, commit often
  – write meaningful log messages
  – create tags and releases at important checkpoints during the research process
– List versioned dependencies (e.g., packrat, Maven/Gradle, pip)
One way to tie results to code versions is sketched after this list.
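A minimal sketch of putting that discipline to work, assuming analyses run from inside a Git working copy: stamp every generated result with the exact commit that produced it. The `git` subcommands are standard; the `record_code_version` helper is hypothetical.

```python
import json
import subprocess

def record_code_version(output_path: str = "code_version.json") -> dict:
    """Record the exact Git commit (and whether uncommitted changes were
    present) so a result can be traced back to the code that produced it."""
    def git(*args: str) -> str:
        return subprocess.check_output(["git", *args], text=True).strip()

    version = {
        "commit": git("rev-parse", "HEAD"),
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        # Non-empty porcelain status means the run used uncommitted changes.
        "dirty": bool(git("status", "--porcelain")),
    }
    with open(output_path, "w") as f:
        json.dump(version, f, indent=2)
    return version

if __name__ == "__main__":
    print(record_code_version())
```
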
Guide to good practice
– Plan for reproducibility
– Use version control efficiently
– Archive everything: data, code, and contextual / provenance metadata
– Prefer open, durable formats (plain text, CSV, open file formats)
– Use cloud backups
– Automate where possible
– Learn the basics of “software carpentry”
A sketch combining several of these habits follows.
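For instance, “archive everything,” “prefer open formats,” and “automate where possible” can be combined in a small helper that writes results as plain CSV alongside a JSON metadata sidecar; the `save_with_metadata` function and the example values are illustrative assumptions.

```python
import csv
import json
from datetime import datetime, timezone

def save_with_metadata(rows: list, columns: list, basename: str,
                       description: str, parameters: dict) -> None:
    """Write results as plain CSV plus a JSON sidecar holding the
    contextual metadata needed to interpret and reproduce them."""
    with open(f"{basename}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
    sidecar = {
        "description": description,
        "columns": columns,
        "parameters": parameters,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    with open(f"{basename}.metadata.json", "w") as f:
        json.dump(sidecar, f, indent=2)

# Example: archive a toy parameter sweep in open, durable formats.
save_with_metadata(
    rows=[(4.0, 0.62), (4.5, 0.41)],
    columns=["temptation", "cooperation_rate"],
    basename="sweep_results",
    description="Toy cooperation model sweep (illustrative)",
    parameters={"n_agents": 100, "steps": 200, "replicates": 20},
)
```
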
Guides to good practice
Computational Social Science
Comments / Questions?