The CAVES Project Collaborative Analysis Versioning Environment System The CODESH Project COllaborative DEvelopment SHell Dimitri Bourilkov University.

Slides:



Advertisements
Similar presentations
Remote Visualisation System (RVS) By: Anil Chandra.
Advertisements

Overview of local security issues in Campus Grid environments Bruce Beckles University of Cambridge Computing Service.
Data Management Expert Panel - WP2. WP2 Overview.
SOFTWARE PRESENTATION ODMS (OPEN SOURCE DOCUMENT MANAGEMENT SYSTEM)
1 Richard White Design decisions: architecture 1 July 2005 BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005 Design decisions: architecture.
Chapter 9 Chapter 9: Managing Groups, Folders, Files, and Object Security.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL March 25, 2003 CHEP 2003 Data Analysis Environment and Visualization.
Using subversion COMP 2400 Prof. Chris GauthierDickey.
Astrophysics, Biology, Climate, Combustion, Fusion, Nanoscience Working Group on Simulation-Driven Applications 10 CS, 10 Sim, 1 VR.
Enterprise Search With SharePoint Portal Server V2 Steve Tullis, Program Manager, Business Portal Group 3/5/2003.
© , Michael Aivazis DANSE Software Issues Michael Aivazis California Institute of Technology DANSE Software Workshop September 3-8, 2003.
eGovernance Under guidance of Dr. P.V. Kamesam IBM Research Lab New Delhi Ashish Gupta 3 rd Year B.Tech, Computer Science and Engg. IIT Delhi.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
G51FSE Version Control Naisan Benatar. Lecture 5 - Version Control 2 On today’s menu... The problems with lots of code and lots of people Version control.
Version Control with git. Version Control Version control is a system that records changes to a file or set of files over time so that you can recall.
Version Control with Subversion. What is Version Control Good For? Maintaining project/file history - so you don’t have to worry about it Managing collaboration.
January, 23, 2006 Ilkay Altintas
Linux Operations and Administration
Customized cloud platform for computing on your terms !
MCSE Guide to Microsoft Exchange Server 2003 Administration Chapter Four Configuring Outlook and Outlook Web Access.
علیرضا فراهانی استاد درس: جعفری نژاد مهر Version Control ▪Version control is a system that records changes to a file or set of files over time so.
A Metadata Catalog Service for Data Intensive Applications Presented by Chin-Yi Tsai.
ARGONNE  CHICAGO Ian Foster Discussion Points l Maintaining the right balance between research and development l Maintaining focus vs. accepting broader.
Grid Leadership Avery –PI of GriPhyN ($11 M ITR Project) –PI of iVDGL ($13 M ITR Project) –Co-PI of CHEPREO –Co-PI of UltraLight –President of SESAPS Ranka.
Virtual Logbooks and Collaboration in Science and Software Development Dimitri Bourilkov, Vaibhav Khandelwal, Archis Kulkarni, Sanket Totala University.
Yannick Patois – CVS and Autobuild tools at CCIN2P3 – hepix - October, n° 1 CVS setup at CC-IN2P3 and Datagrid edg- build tools CVS management,
Subversion Code Deployment LifeCycle August 2011.
Hands-On Microsoft Windows Server 2008 Chapter 5 Configuring, Managing, and Troubleshooting Resource Access.
Taverna and my Grid Open Workflow for Life Sciences Tom Oinn
Git workflow and basic commands By: Anuj Sharma. Why git? Git is a distributed revision control system with an emphasis on speed, data integrity, and.
I Information Systems Technology Ross Malaga 4 "Part I Understanding Information Systems Technology" Copyright © 2005 Prentice Hall, Inc. 4-1 DATABASE.
Module 7 Active Directory and Account Management.
An Approach To Automate a Process of Detecting Unauthorised Accesses M. Chmielewski, A. Gowdiak, N. Meyer, T. Ostwald, M. Stroiński
DOSAR Workshop, Sao Paulo, Brazil, September 16-17, 2005 LCG Tier 2 and DOSAR Pat Skubic OU.
An Intro to Concurrent Versions System (CVS) ECE 417/617: Elements of Software Engineering Stan Birchfield Clemson University.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
CVS – concurrent versions system Network Management Workshop intERlab at AIT Thailand March 11-15, 2008.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
CSE 219 Computer Science III CVS
Datasets on the GRID David Adams PPDG All Hands Meeting Catalogs and Datasets session June 11, 2003 BNL.
NOVA Networked Object-based EnVironment for Analysis P. Nevski, A. Vaniachine, T. Wenaus NOVA is a project to develop distributed object oriented physics.
Giuseppe Codispoti INFN - Bologna Egee User ForumMarch 2th BOSS: the CMS interface for job summission, monitoring and bookkeeping W. Bacchi, P.
Computer Networking From LANs to WANs: Hardware, Software, and Security Chapter 13 FTP and Telnet.
The Advanced Data Searching System The Advanced Data Searching System with 24 February APCTP 2010 J.H Kim & S. I Ahn & K. Cho on behalf of the Belle-II.
Large Scale Nuclear Physics Calculations in a Workflow Environment and Data Provenance Capturing Fang Liu and Masha Sosonkina Scalable Computing Lab, USDOE.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Presented by Scientific Annotation Middleware Software infrastructure to support rich scientific records and the processes that produce them Jens Schwidder.
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
The Astronomy challenge: How can workflow preservation help? Susana Sánchez, Jose Enrique Ruíz, Lourdes Verdes-Montenegro, Julian Garrido, Juan de Dios.
The Grid Effort at UF Presented by Craig Prescott.
SRG: A Digital Document-Enhanced Service Oriented Research Grid Ahmet E. Topcu Ahmet Fatih Mustacoglu Geoffrey C. Fox Aurel Cami Indiana University Computer.
EGEE User Forum Data Management session Development of gLite Web Service Based Security Components for the ATLAS Metadata Interface Thomas Doherty GridPP.
SWGData and Software Access - 1 UCB, Nov 15/16, 2006 THEMIS SCIENCE WORKING TEAM MEETING Data and Software Access Ken Bromund GST Inc., at NASA/GSFC.
DGC Paris WP2 Summary of Discussions and Plans Peter Z. Kunszt And the WP2 team.
David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.
NOVA A Networked Object-Based EnVironment for Analysis “Framework Components for Distributed Computing” Pavel Nevski, Sasha Vanyashin, Torre Wenaus US.
J.P. Wellisch, CERN/EP/SFT SCRAM Information on SCRAM J.P. Wellisch, C. Williams, S. Ashby.
Internet Documentation and Integration of Metadata (IDIOM) Presented by Ahmet E. Topcu Advisor: Prof. Geoffrey C. Fox 1/14/2009.
Introduction to Git Yonglei Tao GVSU. Version Control Systems  Also known as Source Code Management systems  Increase your productivity by allowing.
22 Copyright © 2008, Oracle. All rights reserved. Multi-User Development.
Institute for the Protection and Security of the Citizen HAZAS – Hazard Assessment ECCAIRS Technical Course Provided by the Joint Research Centre - Ispra.
Introduction to AFS IMSA Intersession 2003 An Overview of AFS Brian Sebby, IMSA ’96 Copyright 2003 by Brian Sebby, Copies of these slides.
Event-Based Model for Reconciling Digital Entities Ahmet Fatih Mustacoglu Ahmet E. Topcu Aurel Cami Geoffrey C. Fox Indiana University Computer Science.
Origami: Scientific Distributed Workflow in McIDAS-V Maciek Smuga-Otto, Bruce Flynn (also Bob Knuteson, Ray Garcia) SSEC.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Introduction to Data Management in EGI
Enterprise Computing Collaboration System Example
Basic Concepts in Data Management
Module 01 ETICS Overview ETICS Online Tutorials
Architecture Competency Group
Presentation transcript:

The CAVES Project Collaborative Analysis Versioning Environment System The CODESH Project COllaborative DEvelopment SHell Dimitri Bourilkov University of Florida GriPhyN DPF 2004, Riverside, CA, USA, August 30, 2004

D.BourilkovCAVES / CODESH projects2 We already do this, but manually! Virtual Data Most scientific data are not simple “measurements”  produced from increasingly complex computations (e.g. reconstructions, calibrations, selections, simulations, fits etc.) HEP (and other sciences) increasingly CPU/Data intensive: Programs and how-to become a vital intellectual resource of the scientific community  need new ways to collaborate Virtual data are data products with a well defined method of (re)production; “virtuality” with respect to existence  can define data products for future production or record the “history” of products that exist now or have existed in the past Log data provenance by tracking how new data is derived from transformations on other data

D.BourilkovCAVES / CODESH projects3 Virtual Data Motivations Data track-ability and result audit-ability: "Virtual Logbook” In the nature of science Reproducibility of results Tools and data sharing and collaboration (data with “recipe”) Individuals discover other scientists’ work and build from it Different Teams can work in a modular, semi-autonomous fashion: reuse previous data/code/results or entire analysis chains Repair and correction of data – c.f. “make” Workflow management, Performance optimization: data staged-in from remote site OR re-created locally on demand? Transparency with respect to location and existence

D.BourilkovCAVES / CODESH projects4 The Metaphor A cave is a secure place to store stuff Usually you need a key to enter Stuff can be retrieved when needed (and if the temperature is kept constant, usually in good shape) Small caves can be private, larger are usually owned cooperatively When a cave is full, a new one is build To get something, one starts at the local cave and, if needed, widens the search …

D.BourilkovCAVES / CODESH projects5 CAVES / CODESH Projects Concentrate on the interactions between scientists collaborating over extended periods of time Seamlessly log, exchange and reproduce results and the corresponding methods, algorithms and programs Automatic and complete logging and reuse of work or analysis sessions (between checkpoints) Extend the power of users working or performing analyses in their habitual way, giving them virtual data capabilities Build functioning collaboration suites (stay close to users!) First prototypes use popular tools: Python, ROOT and CVS; e.g. all ROOT commands and CAVES commands available

D.BourilkovCAVES / CODESH projects6 CAVES / CODESH Architectures – Scalable and Distributed

D.BourilkovCAVES / CODESH projects7 CAVES / CODESH Architectures Three tier architecture: isolate client from back- end details; different implementations possible Lightweight clients (use ROOT; C++; Python; e.g. CVS API) Back-ends: e.g. CVS pservers (remote stores) with read/write access control; ARCH, Clarens etc Optional MySQL servers for metadata (fast search for large data volumes)

D.BourilkovCAVES / CODESH projects8 CAVES / CODESH Architectures – CVS Server Backend Sandbox programming – work on per session basis CVS provides version control by tagging releases CVS tags act as unique IDs for virtual data products (the namespace can be structured by a collaborating group e.g. one big cave or many barrels in a cave, selected on a session basis) Both local and remote modes of working CVS pservers (secure, efficient remote stores): Only CVS user accounts with password authentication, no UNIX accounts on the server (gridmapfile uses same idea) read/write access control lists (per user & directory)

D.BourilkovCAVES / CODESH projects9 Possible scenarios Case1: Simple User 1 : Does some analysis and produces a result with tag analX_user1. User 2: Browses all current tags in the repository and fetches the session stored with tag analX_user1. Case2: Complex User 1 : Does some analysis and produces a result with tag analX_user1. User 2: Browses all current tags in the repository and fetches the session stored with tag analX_user1. User 2: Does a modification in the program obtained from the session of user1 and stores the same along with a new result with tag analX_user2_mod_code. User 1: Browses the repository, finds that his program was modified and decides to extract that session using the tag analX_user2_mod_code. This scenario can be extended to include an arbitrary number of steps and users in a working group or groups in a collaboration.

D.BourilkovCAVES / CODESH projects10 Annotations Log annotations (& possibly summary results or metadata) in the repository /data equivalence!/ Brief annotations – use cvs –m “Annotation” … Optional complete annotations - possibly MySQL servers for metadata (fast search for large data volumes e.g. with indexing) The users can browse (a subset of) the existing tags, inspect the annotations, and, if interested, reproduce the results (or just get the howto), modify them etc.

D.BourilkovCAVES / CODESH projects11 Extensible Command Set Session commands open close During analysis help browse inspect startlog log annotate extract Administrative tasks copy move delete archive retrieve Implemented / To do CODESH commands: run, shell getenv getalias etc

D.BourilkovCAVES / CODESH projects12 Working Releases - CAVES

D.BourilkovCAVES / CODESH projects13 Working Releases - CODESH Virtual log-book for “shell” sessions Parts can be local (private) or shared Tracks environment variables, aliases etc during a session Reproduce complete working sessions Complex CMS ORCA example operational

D.BourilkovCAVES / CODESH projects14 Working Releases The client codes are easy to download and install (stored in CVS) – checkout and build All you need to run the clients is ROOT 3.05 or higher and CVS (from v 1_0_0 client is fully ROOT-compliant); or Python 1.5 or higher Users can browse the virtual data catalogs and reproduce the examples stored there They can play with e.g. new analyses and store them locally or on our server, or install new servers for groups collaborating on a project

D.BourilkovCAVES / CODESH projects15 Outlook Work in progress – first CAVES and CODESH releases out We are looking forward to user feedback Possible future directions: Different back-ends: web/grid service oriented Extend remote data access: e.g. Clarens, xrootd … Add GSI security Automatically convert session log to workflow Tune on smaller samples, schedule on grid for larger tasks A picture is better than 1000 words: Try out the releases ! CAVES white paper arXiv: physics/ More info at