Reproducible computational social science
Allen Lee
Center for Behavior, Institutions, and the Environment
https://cbie.asu.edu

Computational Social Science
– Wicked collective action problems
– Innovation -> problems -> innovation
– Mitigate transaction costs for information transfer

Methodologies
– Case study analysis
– Controlled experiments
– Computational modeling
– Integrative data analysis / natural experiments

Case Study Analysis
seshatdatabank.info: “Our goal is to test rival social scientific hypotheses with historical and archaeological data … treating history as a predictive, analytic science.”

SES Library
– Descriptions of social-ecological systems from around the world
– Embeds mathematical models (implemented via XPPAUT) where they are relevant to the social-ecological dynamics of specific cases

Controlled Behavioral Experiments
– Web-based experiments: Mechanical Turk, oTree, nodeGame, vcweb
– Desktop experiments: z-Tree, CoNG, foraging, irrigation
– Diversity in software platforms is valuable but also presents challenges
– General issues are summarized in Experimental platforms for behavioral experiments on social-ecological systems (Janssen, Lee, Waring, 2014)

Computational Modeling
– Extrapolate potential future scenarios for complex systems with many interacting actors
– Computational modeling makes the processes underlying complex phenomena explicit, shareable, and reproducible
– Assumptions are laid bare, and alternative assumptions and parameterizations can be explored via sensitivity analysis (see the sketch after this list)
– George Box: “All models are wrong, but some are useful”
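To make the sensitivity-analysis point concrete, here is a minimal sketch, not from the talk: a toy agent-based cooperation model whose single `temptation` parameter is swept across replicate runs. The model, the parameter grid, and the output measure are all illustrative assumptions.

```python
import random
import statistics

def run_model(temptation: float, n_agents: int = 100, steps: int = 200,
              seed: int = 0) -> float:
    """Toy agent-based model: agents imitate a random other agent's strategy
    when that agent fares better in a prisoner's-dilemma-like interaction.
    Returns the final fraction of cooperators (illustrative only)."""
    rng = random.Random(seed)

    def payoff(me: bool, other: bool) -> float:
        if me and other:
            return 3.0          # mutual cooperation
        if me and not other:
            return 0.0          # exploited cooperator
        if not me and other:
            return temptation   # successful defection
        return 1.0              # mutual defection

    strategies = [rng.random() < 0.5 for _ in range(n_agents)]  # True = cooperate
    for _ in range(steps):
        i, j = rng.randrange(n_agents), rng.randrange(n_agents)
        if payoff(strategies[j], strategies[i]) > payoff(strategies[i], strategies[j]):
            strategies[i] = strategies[j]  # imitate the more successful agent
    return sum(strategies) / n_agents

# Sensitivity analysis: sweep the temptation payoff across replicate seeds so
# the effect of this single assumption on the outcome is laid bare.
for temptation in (3.5, 4.0, 4.5, 5.0):
    outcomes = [run_model(temptation, seed=s) for s in range(20)]
    print(f"temptation={temptation}: mean cooperation "
          f"{statistics.mean(outcomes):.2f} (sd {statistics.stdev(outcomes):.2f})")
```
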
Multiple methods
– Convergent validity
– Multiple methods complement each other, e.g., experiments, case study analysis, and formal modeling (Poteete et al., 2010)

Reproducibility
– Victoria Stodden: how do we know an inference is reliable, and why should we believe “Big Data” findings?
– We need new standards for conducting “data and computational science” and for communicating results: sound workflows, sharing specifications, and guides to good practice
– Distinguish between empirical, statistical, and computational reproducibility

Replicable Research Workflows
– Planning, organizing, and documenting your research protocols
– Developing code for data analysis or experiments
– Running your analyses (generating visualizations) or conducting experiments (generating data)
– Presenting / publishing findings
– Cleaning and documenting your code and data
– Archiving and documenting with contextual metadata that preserves provenance
https://osf.io is a good example of a full-stack system

Archiving data
Vines TH et al. (2013) Current Biology. DOI: 10.1016/j.cub.2013.11.014

CoMSES Net
– Computational Model Library for archiving model code; the next generation is in active development and planning
– Provides a suite of microservices for transparency and reproducibility in computational modeling

The MIRACLE project: Cyberinfrastructure for visualizing model outputs
Dawn Parker, Michael Barton, Terence Dawson, Tatiana Filatova, Xiongbing Jin, Allen Lee, Ju-Sung Lee, Lorenzo Milazzo, Calvin Pritchard, J. Gary Polhill, Kirsten Robinson, and Alexey Voinov

Background and motivation
– Growing interest in analyzing highly detailed “big data”
– Concurrent development of a new generation of simulation models, including ABMs, which themselves produce “big data” as outputs
– Need for tools and methods to analyze and compare these two data sources

Motivation
– Sharing model code is great, but there are large barriers to entry in getting someone else’s model running (Collberg et al., 2015)
– Sharing model output data can accomplish many of the goals of code sharing
– It also lets other researchers explore new parameter spaces or use different algorithms
– Sharing analysis algorithms may jump-start the development of complex-systems-specific output analysis methods

Objectives
– Collect, extend, and share methods for statistical analysis and visualization of output from computational agent-based models of coupled human and natural systems (ABM-CHANS)
– Provide interactive visualization and analysis of archived model output data for ABM-CHANS models

Objectives, cont.
– Conduct meta-analyses of our own projects, and invite the ABM-CHANS community to conduct further meta-analyses using the new tools
– Apply the statistical analysis algorithms we develop to empirical datasets to validate their applicability to large-scale data from complex social systems

Metadata for ABM output data
Goals:
– Users need to understand the data (what’s inside the files, the relationships between files, the project and its owners, …)
– Users need to know how the data were generated (input data, analysis scripts, parameters, computing environment, workflows that chain several scripts, …)
Two types of metadata (see the sketch after this list):
– Metadata that describe the current state of the data (data structure, file and data-table contents): fine-grain metadata
– Metadata that describe the provenance of the data (how the data were generated): coarse-grain metadata
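As a minimal sketch of what records for these two metadata types might look like, the dataclasses below are hypothetical illustrations, not MIRACLE's actual schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FineGrainMetadata:
    """Describes the current state of a data file (hypothetical schema)."""
    filename: str
    file_format: str                       # e.g., "csv", "shapefile"
    columns: List[str] = field(default_factory=list)
    description: str = ""

@dataclass
class CoarseGrainMetadata:
    """Describes how a data file was generated (hypothetical schema)."""
    output_file: str
    script: str                            # the script that produced the file
    parameters: Dict[str, str] = field(default_factory=dict)
    input_files: List[str] = field(default_factory=list)
    environment: str = ""                  # e.g., OS and interpreter version
    user: str = ""
```
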
Capturing metadata
Goal: automated metadata extraction with minimal user input
Fine-grain metadata:
– Automatically extract metadata from files (CSV columns, ArcGIS shapefile metadata and attribute-table columns, etc.)
Coarse-grain metadata:
– A workflow describes how a script could produce a certain file type, while provenance describes how script A actually produced file B
– Provenance can be captured automatically when a user runs scripts and workflows through the MIRACLE system (computing environment, user name, application name, process, input files and parameters, output files)
– Workflows can be constructed from captured provenance
A sketch of automated extraction follows.
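As an illustration of automated capture, here is a minimal sketch (not MIRACLE's code) that pulls column names from a CSV header and records basic provenance for a script run; the `extract_csv_metadata` and `capture_provenance` helpers are hypothetical.

```python
import csv
import getpass
import platform
import sys
from datetime import datetime, timezone

def extract_csv_metadata(path: str) -> dict:
    """Fine-grain metadata: read only the header row to list the columns."""
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    return {"file": path, "format": "csv", "columns": header}

def capture_provenance(script: str, inputs: list, outputs: list,
                       parameters: dict) -> dict:
    """Coarse-grain metadata: record how the outputs were generated."""
    return {
        "script": script,
        "inputs": inputs,
        "outputs": outputs,
        "parameters": parameters,
        "user": getpass.getuser(),
        "environment": f"{platform.platform()} / Python {sys.version.split()[0]}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```
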
MIRACLE platform use cases
Within a research group:
– Efficiently share and discuss new model results
– Let group members explore new parameter spaces
– Create accessible archives for publications
Across groups:
– Provide prototypes to new researchers, or to those looking for new analysis methods
– Provide examples for teaching and labs
– Facilitate additional “after-market” research and publication

MIRACLE project goals
– Develop, share, test, and compare new statistical methods appropriate for the analysis of complex-systems data
– Improve communication and assessment within the modeling community
– Reduce barriers to entry for the use of models
– Improve the ability of policy makers and stakeholders to understand and interact with model output

CoMSES Net: Catalog
– Track the state of model archiving
– Provide collective-action tools to incentivize model sharing

CoMSES Net Future Goals
– Provide a one-stop shop for computational modeling
– Containerized execution with bundled dependencies
– Integration with Jupyter, CyVerse, and modeling platforms such as Repast and NetLogo
– Reparameterizable data analysis and exploration via the MIRACLE project
– Bibliometric tracking
– Collective-action tools to incentivize prosocial behavior among scientists

From http://stanford.edu/~vcs/talks/UIUCDataSummit-Feb5-2016-STODDEN.pdf
Guide to good practice
– Learn to use a source control system (Git, Mercurial, SVN)
– Use it with discipline:
  – commit early, commit often
  – write meaningful log messages
  – create tags and releases at important checkpoints during the research process
– List versioned dependencies (e.g., packrat, Maven/Gradle, pip)
One way to tie results to code versions is sketched after this list.
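A minimal sketch of putting that discipline to work, assuming analyses run from inside a Git working copy: stamp every generated result with the exact commit that produced it. The `git` subcommands are standard; the `record_code_version` helper is hypothetical.

```python
import json
import subprocess

def record_code_version(output_path: str = "code_version.json") -> dict:
    """Record the exact Git commit (and whether uncommitted changes were
    present) so a result can be traced back to the code that produced it."""
    def git(*args: str) -> str:
        return subprocess.check_output(["git", *args], text=True).strip()

    version = {
        "commit": git("rev-parse", "HEAD"),
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        # Non-empty porcelain status means the run used uncommitted changes.
        "dirty": bool(git("status", "--porcelain")),
    }
    with open(output_path, "w") as f:
        json.dump(version, f, indent=2)
    return version

if __name__ == "__main__":
    print(record_code_version())
```
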
Guide to good practice
– Plan for reproducibility
– Use version control efficiently
– Archive everything: data, code, and contextual / provenance metadata
– Prefer open, durable formats (plain text, CSV, open file formats)
– Use cloud backups
– Automate where possible
– Learn the basics of “software carpentry”
A sketch combining several of these habits follows.
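For instance, “archive everything,” “prefer open formats,” and “automate where possible” can be combined in a small helper that writes results as plain CSV alongside a JSON metadata sidecar; the `save_with_metadata` function and the example values are illustrative assumptions.

```python
import csv
import json
from datetime import datetime, timezone

def save_with_metadata(rows: list, columns: list, basename: str,
                       description: str, parameters: dict) -> None:
    """Write results as plain CSV plus a JSON sidecar holding the
    contextual metadata needed to interpret and reproduce them."""
    with open(f"{basename}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
    sidecar = {
        "description": description,
        "columns": columns,
        "parameters": parameters,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    with open(f"{basename}.metadata.json", "w") as f:
        json.dump(sidecar, f, indent=2)

# Example: archive a toy parameter sweep in open, durable formats.
save_with_metadata(
    rows=[(4.0, 0.62), (4.5, 0.41)],
    columns=["temptation", "cooperation_rate"],
    basename="sweep_results",
    description="Toy cooperation model sweep (illustrative)",
    parameters={"n_agents": 100, "steps": 200, "replicates": 20},
)
```
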
Guides to good practice
Computational Social Science
Comments / Questions?