ReproZip: Computational Reproducibility With Ease

Slides:



Advertisements
Similar presentations
Implementing Tableau Server in an Enterprise Environment
Advertisements

Module 1: Introduction to SQL Server Reporting Services.
Planning Server Deployments
LINUX-WINDOWS INTERACTION. One software allowing interaction between Linux and Windows is WINE. Wine allows Linux users to load Windows programs while.
VisTrails: Overview Juliana Freire University of Utah Joint work with: Erik Andersen, Steven P. Callahan, David Koop, Emanuele.
Cambodia-India Entrepreneurship Development Centre - : :.... :-:-
 Visual Studio has great support for building ASP.NET web applications  Real web application development involves more than just copying the files created.
Reproducible Environment for Scientific Applications (Lab session) Tak-Lon (Stephen) Wu.
Guidance and resources for migrating from Windows Server 2008 Windows Server 2012 R2 Migration and Upgrade Guide.
SOFTWARE.
INTRODUCTION TO WEB DATABASE PROGRAMMING
DB2 (Express C Edition) Installation and Using a Database
Moving forward our shared data agenda: a view from the publishing industry ICSTI, March 2012.
RMG Study Group Session I: Git, Sphinx, webRMG Connie Gao 9/20/
Introduction to HP LoadRunner Getting Familiar with LoadRunner >>>>>>>>>>>>>>>>>>>>>>
Sikuli Ivailo Dinkov QA Engineer PhoneX Team Telerik QA Academy.
OnTimeMeasure Integration with Gush Prasad Calyam, Ph.D. (PI) Tony Zhu (Software Programmer) Alex Berryman (REU Student) GEC10 Selected.
SAS Workshop Lecture 1 Lecturer: Annie N. Simpson, MSc.
WaveMaker Visual AJAX Studio 4.0 Training Installation.
Learningcomputer.com SQL Server 2008 Configuration Manager.
By Rashid Khan Lesson 10-From Here to There: Remote Installation of the Windows XP Professional Client.
DEV325 Deploying Visual Studio.NET Applications Billy Hollis Author / Consultant.
Embedded Multicore processing for Mobile Communications Real-time Contracts and Thread Profiling York Jack Whitham.
A framework to support collaborative Velo: Knowledge Management for Collaborative (Science | Biology) Projects A framework to support collaborative 1.
The PROGRESS Grid Service Provider Maciej Bogdański Portals & Portlets 2003 Edinburgh, July 14th-17th.
1 The Five Parts of an Information System
Styx Grid Services: Lightweight, easy-to-use middleware for e-Science Jon Blower Keith Haines Reading e-Science Centre, ESSC, University of Reading, RG6.
ABSTRACT The JDBC (Java Database Connectivity) API is the industry standard for database- independent connectivity between the Java programming language.
ReproZip Packing Experiments for Sharing and Publication Fernando Chirigati, Juliana Freire | NYU-Poly Dennis Shasha | NYU.
Module 8 : Configuration II Jong S. Bok
Introduction of Geoprocessing Lecture 9. Geoprocessing  Geoprocessing is any GIS operation used to manipulate data. A typical geoprocessing operation.
Data Science Background and Course Software setup Week 1.
1 AHM -2-4 Sept 2003 e-Science Centre Running SRB Ananta Manandhar.
Panasonic UC Pro - UC Pro Server setup with Active Directory -
Selenium server By, Kartikeya Rastogi Mayur Sapre Mosheca. R
ACCESSING DATA IN THE NIS USING THE KEPLER WORKFLOW SYSTEM Corinna Gries.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
Planning Server Deployments Chapter 1. Server Deployment When planning a server deployment for a large enterprise network, the operating system edition.
 Cloud Computing technology basics Platform Evolution Advantages  Microsoft Windows Azure technology basics Windows Azure – A Lap around the platform.
Windchill WorkGroup Manager (WGM) for Inventor installation
Accessing the VI-SEEM infrastructure
SmartCenter for Pointsec - MI
Build and Test system for FairRoot
Mike Hildreth representing the DASPOS project
Incorporating W3C’s DQV and PROV in CISER’s Data Quality Review and
An Approach to Software Preservation
What are they? The Package Repository Client is a set of Tcl scripts that are capable of locating, downloading, and installing packages for both Tcl and.
MIRACLE Cloud-based reproducible data analysis and visualization for outputs of agent-based models Xiongbing Jin, Kirsten Robinson, Allen Lee, Gary Polhill,
Data Virtualization Demoette… ADO.NET Client
Taming the Complexity of Artifact Reproducibility
Figure 2: Make a component
Enhancing Scholarly Communication with ReproZip
R Programming.
ReproZip: Reproducibility with Ease
Skill Based Assessment
Skill Based Assessment - Entity Framework -
DHCP, DNS, Client Connection, Assignment 1 1.3
Chapter 23 – ASP.NET Outline 23.1 Introduction NET Overview
Haiyan Meng and Douglas Thain
Rui Wu, Jose Painumkal, Sergiu M. Dascalu, Frederick C. Harris, Jr
Data management for reproducible research
Section 14.1 Section 14.2 Identify the technical needs of a Web server
Microsoft Virtual Academy
Information Technology Ms. Abeer Helwa
Configuration Of A Pull Network.
Step by step installation of a Domino server on Docker
Dtk-tools Benoit Raybaud, Research Software Manager.
Azure Container Service
Web Application Development Using PHP
Scientific Workflows Lecture 15
Presentation transcript:

ReproZip: Computational Reproducibility With Ease Fernando Chirigati, Rémi Rampin, Dennis Shasha, and Juliana Freire Pack your experiment on your system S … … and unpack on another system S’ … … using as few as 2 commands for each step ! Reproducibility is Necessary, but Hard … ReproZip! - Data files - Software and library dependencies - Environment variables - etc. reprozip trace reprozip pack Open, unpack, and reproduce anywhere, anytime! reprounzip setup reprounzip run … Until now! Come Try ReproZip! Packing Experiments on Operating System S File and Dataflow Management Input files can be replaced using reprounzip upload. Output files can be retrieved using reprounzip download. ReproZip can also derive a specification of the experiment for the VisTrails system, which represents the original workflow in a GUI and enables the dataflow to be modified to explore different techniques, perform analyses, and reuse some of the steps for your own research. Demos System Call Tracing (reprozip trace) ReproZip transparently captures the provenance of the execution of the experiment, i.e., all the required information to correctly reproduce the experiment, including data files, databases, programs, library dependencies, and OS information. The execution trace is stored in SQLite. Provenance Analysis (reprozip trace) Given the files that were read and using the package manager of the OS, ReproZip identifies the software packages on which the experiment depends. ReproZip also uses some heuristics to identify input and output files. All the required information is written to a human-readable configuration file. Package Customization The configuration file can be edited by researchers, e.g., to remove large files that can be obtained elsewhere, or to remove sensitive or proprietary information. Package Generation (reprozip pack) All the required files are packed on the author’s system S in a .rpz file. ReproZip supports a wide range of experiments, including client-server scenarios, databases, and graphical and interactive tools. Our repository includes a variety of examples from different domains (http://bit.ly/reprozip-examples). The following example is available at http://bit.ly/stacked-up: This example ports an entire Web application (http://stackedup.org/) to a different machine/OS. The Web application includes the Django Web framework and a PostgreSQL database. Using reprounzip docker, it is also possible to port the entire app to a cloud server (e.g.: AWS). Unpacking Experiments on Operating System S’ Other examples include: An interactive visualization tool: http://bit.ly/bus-vis A simulation in statistical physics: http://bit.ly/ising-model Plots from a SIGMOD’16 paper: http://bit.ly/data-polygamy Other Use Cases Unpackers If S and S’ are incompatible: vagrant, docker If S and S’ are compatible: directory, chroot, vagrant, docker Experiment Setup (reprounzip setup) The experiment is automatically extracted and set up depending on the chosen unpacker. Experiment Reproduction (reprounzip run) The experiment is reproduced depending on the chosen unpacker. For instance, for vagrant and docker, this is done inside a virtual image and a Docker container, respectively. ReproZip has been used in the reproducibility section of Elsevier’s Information Systems Journal. ReproZip was recommended by the ACM SIGMOD Reproducibility Review. ReproZip has been listed on the Artifact Evaluation Process guidelines. References ReproZip’s Homepage: https://vida-nyu.github.io/reprozip/ ReproZip’s Documentation: https://reprozip.readthedocs.io Acknowledgments: This work was supported in part by NSF awards CNS-1229185 and CI-EN-1405927, and by the Moore-Sloan Data Science Environment at NYU.