ReproZip: Packing Experiments for Sharing and Publication
Fernando Chirigati, Juliana Freire | NYU-Poly
Dennis Shasha | NYU

Motivation
Published articles are not made reproducible
Computational reproducibility may be difficult to achieve
Some current solutions require the user to adopt a system
  o GenePattern [1], Madagascar [2], Scientific Workflow Systems [3]
Other solutions rely on capturing information about the computational environment
  o Virtual Machines
  o CDE [4]

Author: "How to encapsulate my experiment? Too many dependencies… Too many files to keep track… Sigh."
Reviewers, Collaborators: "How to compile this program? How to execute it? How to explore it? Sigh."

ReproZip
ReproZip is a packaging solution
  o It makes it easier for authors to pack experiments and for reviewers to verify computational results
It creates reproducible packages from existing experiments on computational environment E
  o No need to port experiments to another system
  o Leverages provenance of computational results
It unpacks an experiment on computational environment E’
It generates a workflow specification that encapsulates the execution of the experiment
  o Eases the verification process
  o Allows users to explore the experiment, while keeping track of provenance
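The pack-on-E, unpack-on-E’ idea can be pictured with a small Python sketch. This is not ReproZip's implementation or file format: the functions `pack` and `unpack`, the `manifest.json` name, and the `files/` layout are all hypothetical, chosen only to show files and the command that produced the results travelling together in one archive.

```python
import json
import tarfile
import tempfile
from pathlib import Path


def pack(files, command, package_path):
    """Bundle input files and a small manifest into one .tar.gz (hypothetical layout)."""
    # Simplification: files are stored by base name, so two inputs with the
    # same name in different directories would collide.
    manifest = {"command": list(command), "files": [Path(f).name for f in files]}
    with tempfile.TemporaryDirectory() as tmp:
        manifest_file = Path(tmp) / "manifest.json"
        manifest_file.write_text(json.dumps(manifest, indent=2))
        with tarfile.open(str(package_path), "w:gz") as tar:
            tar.add(str(manifest_file), arcname="manifest.json")
            for f in files:
                tar.add(str(f), arcname="files/" + Path(f).name)
    return manifest


def unpack(package_path, target_dir):
    """Extract a package created by pack() and return its manifest."""
    target = Path(target_dir)
    with tarfile.open(str(package_path), "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Assumption: the archive is trusted (it came from pack()), so
            # member names are not checked for path traversal here.
            out = target / member.name
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(tar.extractfile(member).read())
    return json.loads((target / "manifest.json").read_text())
```

A reviewer on E’ would call `unpack("experiment.tar.gz", "workdir")` and then rerun the recorded `command` from the manifest. A real package would also need the binaries and libraries the slide mentions, which this sketch leaves out.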

Overview
[Figure: packing (on environment E): Experiment, Provenance Tree, Workflow, Reproducible Package (files + binaries + workflow). Unpacking (on environment E’): Reproducible Package, Extraction, Experiment (files + binaries + workflow), verification and exploration.]
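The slides do not say how the provenance tree is structured. As a rough illustration only, the sketch below assumes a tree of processes in which each node records its binary and the files it read and wrote. The class `ProcessNode` and the function `collect_dependencies` are hypothetical names, not part of ReproZip.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessNode:
    """One traced process: its binary, the files it touched, and its child processes."""
    binary: str
    files_read: list = field(default_factory=list)
    files_written: list = field(default_factory=list)
    children: list = field(default_factory=list)


def collect_dependencies(root):
    """Walk the tree and return (binaries, input_files) as sorted lists."""
    binaries, read, written = set(), set(), set()
    stack = [root]
    while stack:
        node = stack.pop()
        binaries.add(node.binary)
        read.update(node.files_read)
        written.update(node.files_written)
        stack.extend(node.children)
    # Simplification: any file the experiment writes is treated as an
    # intermediate result, regardless of the order of reads and writes.
    return sorted(binaries), sorted(read - written)


# Example tree: a shell script that runs an analysis and then plots its output.
tree = ProcessNode("/bin/sh", ["run.sh"], [], [
    ProcessNode("/usr/bin/python", ["analysis.py", "data.csv"], ["tmp.out"]),
    ProcessNode("/usr/bin/gnuplot", ["tmp.out", "plot.gp"], ["figure.png"]),
])
binaries, inputs = collect_dependencies(tree)
```

Under these assumptions, `binaries` and `inputs` together name what a package would need to carry from E to E’, while `tmp.out` and `figure.png` are left out because the experiment regenerates them.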

References
1. GenePattern.
2. Madagascar.
3. S. B. Davidson and J. Freire. Provenance and scientific workflows: challenges and opportunities. In SIGMOD.
4. P. Guo. CDE: A Tool for Creating Portable Experimental Software Packages. Computing in Science and Engineering, 14(4):32-35.
5. SystemTap.
6. MongoDB.

Thank You! Fernando Chirigati