Enhancing Scholarly Communication with ReproZip

Fernando Chirigati, Rémi Rampin, Victoria Steeves, Dennis Shasha, and Juliana Freire

Pack your experiment on your system S… and unpack it on another system S’… using as few as two commands for each step!

Reproducibility is Necessary, but Hard… Until Now!

ReproZip packs everything the experiment needs:
- Data files
- Software and library dependencies
- Environment variables
- etc.

Pack with two commands: reprozip trace, then reprozip pack.
Open, unpack, and reproduce anywhere, anytime, with two more: reprounzip setup, then reprounzip run.

Come Try ReproZip!

Packing Experiments on Operating System S

System Call Tracing (reprozip trace)
ReproZip transparently captures the provenance of the experiment's execution, i.e., all the information required to reproduce the experiment correctly, including data files, programs, library dependencies, and OS information. The execution trace is stored in an SQLite database.

Provenance Analysis (reprozip trace)
From the files that were read, and by querying the operating system's package manager, ReproZip identifies the software packages on which the experiment depends. ReproZip also uses heuristics to identify input and output files. All the required information is written to a human-readable configuration file.

Package Customization
Researchers can edit the configuration file, e.g., to remove large files that can be obtained elsewhere, or to remove sensitive or proprietary information.

Package Generation (reprozip pack)
All the required files are packed on the author's system S into a single .rpz file.

File and Dataflow Management
Input files can be replaced using reprounzip upload, and output files can be retrieved using reprounzip download. ReproZip can also derive a specification of the experiment for the VisTrails system, which represents the original workflow in a GUI and lets users modify the dataflow to explore different techniques, perform analyses, and reuse some of the steps in their own research.
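The tracing and provenance-analysis steps above can be illustrated with a small Python sketch. It is not ReproZip's implementation: the simplified `opened_files` table, the `READ`/`WRITE` flags, the `PACKAGE_OWNERS` lookup (a stand-in for querying the OS package manager), and the functions `record_trace` and `analyze` are all hypothetical. The sketch stores file-access events in SQLite, groups files owned by OS packages, and applies one simple heuristic: files that were written are treated as outputs, and files that were only read are treated as inputs.

```python
import json
import sqlite3

# Hypothetical access-mode flags for this sketch (not ReproZip's constants).
READ, WRITE = 1, 2

# Hypothetical stand-in for a package-manager query
# (e.g., `dpkg -S <path>` on Debian): maps a file path to its package.
PACKAGE_OWNERS = {
    "/usr/bin/python3": "python3-minimal",
    "/usr/lib/x86_64-linux-gnu/libz.so.1": "zlib1g",
}


def record_trace(conn, events):
    """Store (path, mode) file-access events in a simplified trace table."""
    conn.execute("CREATE TABLE opened_files (path TEXT, mode INTEGER)")
    conn.executemany("INSERT INTO opened_files VALUES (?, ?)", events)


def analyze(conn):
    """Split traced files into package dependencies, inputs, and outputs."""
    # Combine every access mode seen for a path with a bitwise OR.
    modes = {}
    for path, mode in conn.execute("SELECT path, mode FROM opened_files"):
        modes[path] = modes.get(path, 0) | mode

    packages, other_files, inputs, outputs = {}, [], [], []
    for path in sorted(modes):
        owner = PACKAGE_OWNERS.get(path)
        if owner is not None:
            # The file belongs to an installed package: record the package.
            packages.setdefault(owner, []).append(path)
            continue
        other_files.append(path)
        # Heuristic: written files are outputs; read-only files are inputs.
        if modes[path] & WRITE:
            outputs.append(path)
        else:
            inputs.append(path)
    return {
        "packages": packages,
        "other_files": other_files,
        "inputs": inputs,
        "outputs": outputs,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    record_trace(conn, [
        ("/usr/bin/python3", READ),
        ("/usr/lib/x86_64-linux-gnu/libz.so.1", READ),
        ("/home/user/movies.csv", READ),      # hypothetical input data
        ("/home/user/results.csv", WRITE),    # hypothetical output
    ])
    # Print a human-readable summary of what would go into the package.
    print(json.dumps(analyze(conn), indent=2))
```

The printed summary plays the role of the editable configuration file described under Package Customization: a researcher could delete an entry from `other_files` before packing. ReproZip's actual heuristics and configuration format differ and are more detailed.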
Example
This data journalism example is available at http://bit.ly/BechdelFiveThirtyEight. It attempts to replicate the claims of a FiveThirtyEight article that analyzes gender bias in the movie business using the Bechdel test. Some conclusions of the reproduction matched the original, some differed, and some were new. Because the original article gives no details on the analysis performed, it is difficult to know why some of the conclusions differ.

Unpacking Experiments on Operating System S’

Unpackers
- S and S’ are incompatible: vagrant, docker
- S and S’ are compatible: directory, chroot, vagrant, docker

Experiment Setup (reprounzip setup)
The experiment is automatically extracted and set up according to the chosen unpacker.

Experiment Reproduction (reprounzip run)
The experiment is reproduced according to the chosen unpacker. For instance, with vagrant and docker it runs inside a virtual machine and a Docker container, respectively, automatically and transparently through reprounzip and its command-line interface.

Use Cases
ReproZip supports a wide range of experiments, including client-server scenarios, experiments with databases, and graphical and interactive tools.
- ReproZip has been used by the Information Systems Journal to reproduce the results of published articles.
- ReproZip was recommended by the ACM SIGMOD 2015 Reproducibility Review.
- ReproZip has been listed in the Artifact Evaluation Process guidelines.

Acknowledgments
This work was supported in part by NSF awards CNS-1229185 and CI-EN-1405927, and by the Moore-Sloan Data Science Environment at NYU.

References
[1] ReproZip homepage: https://vida-nyu.github.io/reprozip/
[2] ReproZip examples and demos: https://github.com/ViDA-NYU/reprozip-examples
[3] F. Chirigati, R. Rampin, D. Shasha, and J. Freire, "ReproZip: Computational Reproducibility with Ease," in Proceedings of SIGMOD '16, Demo Session, 2016.