XSEDE14 Reproducibility Workshop: Reproducibility in Large Scale Computing – Where do we stand Mark R. Fahey, NICS Robert McLay, TACC XSEDE14 - Reproducibility.

Presentation transcript:

XSEDE14 Reproducibility Workshop: Reproducibility in Large Scale Computing – Where do we stand
Mark R. Fahey, NICS
Robert McLay, TACC
XSEDE14 - Reproducibility Workshop

Reproducibility – what it means to me
Full documentation of how an experiment (simulation) was conducted
  – Source code (unique versioning)
  – Input data
  – Computing environment
      Hardware
      Software (probably the most lacking component)
        – How often are the OS, compiler, and MPI versions fully documented so that one knows how to reproduce the build environment?
        – Ever seen a list of all the libraries linked into a code and the version of each library?
  – Published results
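As a rough illustration of the "list of all the libraries linked into a code" question, the sketch below turns the text printed by the Linux `ldd` command into a dictionary and adds a few host details from Python's standard `platform` module. The helper names (`parse_ldd`, `describe_host`) and the sample paths are assumptions made for this example; they do not come from any tool named in the slides.

```python
import platform
import re

# Matches resolved entries such as
#   "libm.so.6 => /lib64/libm.so.6 (0x00007f2b1bc00000)"
# Lines without "=>" and "not found" entries are deliberately skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Map each shared-library name to the path it resolved to."""
    libs = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def describe_host():
    """Collect basic OS and interpreter details for the record."""
    return {
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }


# Hypothetical ldd output, invented for illustration only.
SAMPLE = """\
    linux-vdso.so.1 (0x00007ffd3a5f2000)
    libmpi.so.12 => /opt/mpi/lib/libmpi.so.12 (0x00007f2b1c000000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f2b1bc00000)
    libfoo.so => not found
"""

if __name__ == "__main__":
    print(describe_host())
    for name, path in sorted(parse_ldd(SAMPLE).items()):
        print(name, "->", path)
```

On a Linux system the sample text could be replaced with the real output of `ldd` for a given executable, for instance obtained through `subprocess.run(["ldd", path], capture_output=True, text=True).stdout`. Resolved paths alone do not give library versions; those would still have to come from the package manager or the module system.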

Computing Center responsibilities
Yale Report makes no mention of the role of computing centers
I believe computing centers have an obligation to help solve some of the reproducibility issues
  – Namely documentation of the software environment
Expecting a researcher to document all the system software in a complete way is asking too much
  – A researcher may not know what should be documented

What can/should be done
Need an automatic way to collect the information on the software (and versions) used by the researcher
This is what the centers (national and campus level) should be providing
  – A couple of prototypes exist that do this
      For example, NICS and TACC provide two similar but slightly different prototypes (ALTD and Lariat, respectively) that capture the libraries and their versions for each code built and run
      Solves part of the documentation problem; in fact, NERSC uses ALTD so that users can look up provenance data from old builds and rebuild their codes exactly as they did months or years before
      A new effort (called XALT) is under development to combine and extend these prototypes from NICS and TACC to capture even more information (everything mentioned above)
  – Every center should be doing this for a variety of reasons
      Better user support; efficient use of staff resources
      Provenance data collection
      Security-related concerns
      And of course documentation for reproducibility
  – Collecting this information is very doable (as proven by the prototypes) and has proven to be very useful. It would greatly help researchers provide the information the Yale report recommends
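To make "capture the libraries and their versions for each code built and run" more concrete, here is a minimal sketch of what one automatically collected record could look like. It is not the ALTD, Lariat, or XALT schema: the field names, the `build_record` and `append_record` helpers, and the JSON-lines log are all assumptions made for illustration. The sketch also assumes the module system exports a colon-separated `LOADEDMODULES` environment variable.

```python
import hashlib
import json
import os
import time


def sha256_of(path, chunk_size=1 << 20):
    """Hash the executable so a later rebuild can be compared bit for bit."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(executable, libraries, env=None, now=None):
    """Assemble one provenance record (hypothetical layout)."""
    env = os.environ if env is None else env
    loaded = env.get("LOADEDMODULES", "")  # assumed to be colon-separated
    return {
        "executable": os.path.abspath(executable),
        "sha256": sha256_of(executable),
        "libraries": dict(sorted(libraries.items())),
        "modules": [m for m in loaded.split(":") if m],
        "user": env.get("USER", "unknown"),
        "timestamp": time.time() if now is None else now,
    }


def append_record(record, log_path):
    """Append the record as one JSON line; a center would likely use a database."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

A center-side wrapper around the linker or job launcher could call `build_record` without any action from the researcher, which is the point the slide makes about automatic collection.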

What can/should be done (2)
Computing centers (university and national level) can also somewhat address repository and software versioning issues for researchers by providing snapshots of the OS and libraries and providing views into databases
  – Centers could document each and every version of all of their software and the duration it was the default on the machine
  – There are already efforts to capture most of this information at some centers. For example, at NICS, the programming environment software versioning is documented
  – Provide users a list of all the system defaults at any time from the past with “all-in-one” modules
  – Centers could make RPM bundles of the system software and provide a test bed cluster with which one could “revert” to past system software installations to confirm reproducibility
      Only for the life of the technology/award
      Test bed clusters are sometimes not part of HPC deployments
      Test bed clusters would likely be only a few nodes, unable to reproduce large simulations
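The idea of documenting each version and "the duration it was the default on the machine" can be sketched as a small date-range lookup. The table layout and the `defaults_on` helper are hypothetical, and the dates below are invented for illustration; they do not describe any real system.

```python
from datetime import date

# (package, version, first day as default, last day as default or None if current)
# All entries are made-up sample data.
HISTORY = [
    ("gcc", "4.8.2", date(2013, 6, 1), date(2014, 3, 31)),
    ("gcc", "4.9.0", date(2014, 4, 1), None),
    ("openmpi", "1.6.5", date(2013, 6, 1), date(2014, 1, 14)),
    ("openmpi", "1.8.1", date(2014, 1, 15), None),
]


def defaults_on(day, history=HISTORY):
    """Return {package: version} for every default in effect on the given day."""
    result = {}
    for package, version, start, end in history:
        if start <= day and (end is None or day <= end):
            result[package] = version
    return result


if __name__ == "__main__":
    print(defaults_on(date(2014, 2, 1)))
```

A lookup like this is roughly what an "all-in-one" module for a past date would need behind it: given a day, return the full set of defaults so they can be loaded together.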