NA61 Collaboration Meeting CERN, December
Enabling long-term data preservation using virtualization technology
Predrag Buncic, Mihajlo Mudrinic
CERN/PH-SFT
Collaboration Lifecycle
Stages: New Ideas / Physics; Data Taking; Quality Control; Data Analysis; Tuning MC models; Publication; Detector Upgrade; Extension of Expected Life; Full Data Preservation Program Established
The Collaboration has to be ready to cope with technological challenges (hardware/software).
Detector Studies
  Calibration: determination of calibration constants.
  Detector performance studies.
  Testing off-line reconstruction and simulation chains.
  Understanding the detector.
Upgrade of reconstruction and simulation software. Determination of new calibration procedures and constants. Reprocessing of data and MC.
NA49 experience
  Software written in 1994 and the data taken are still being used for physics analysis and publications.
  Most of the raw data were lost: obsolete medium (Sony tapes) and tape drives not supported by CERN/IT → new reconstruction is not possible.
  The reconstruction/simulation software is written in C, C++ and Fortran.
  It is very difficult (manpower/time) to maintain the reconstruction/simulation software in view of frequent changes of operating systems and compilers.
  The above significantly limits the possibility of using the NA49 data for future analysis.
Data Preservation Problem
HEP experiments will run for many years.
The hardware and software environment will inevitably (possibly dramatically) change over the lifespan of the experiments.
Physicists like to constantly change their software (algorithms) but do not like it when external changes in infrastructure require changes in their applications.
To ensure that the experiments can (re)analyze their entire data sample at some time in the future, we need to:
1. Conserve and transfer all data to media that can be effectively used at that point in time.
2. Be able to use exactly the same software that was used when the original files were created, and be able to modify algorithms and apply new algorithms to the same old data.
How to preserve experiment software and keep it usable and accountable over many years?
NA61 Problem
NA61 (successor of NA49 in terms of hardware and software) started data taking in 2009 and recorded data (100 TB) for 9 different reactions.
Data taking should continue over the next 5 years, with up to 40 reactions (1000 TB) to be recorded.
Based on the NA49 experience, we expect that physics interest in the NA61 data analysis may continue for many years.
NA61 = 10 x NA49
To avoid the difficulties that NA49 had:
  raw data should be preserved over the whole analysis period
  software should be made independent of the underlying OS and physical hardware
  the reconstruction/simulation results obtained in 2009 should be reproducible 20 years from now
In this case the term “data preservation” means a lot more than conserving 4-vectors and represents a practical problem seen by many running experiments.
Good news: You are not alone…
Study Group for Data Preservation and Long Term Analysis in HEP
  Investigate and confront different data models in HEP.
  Address the hardware and software persistency status.
  Discuss funding programs and other related international initiatives.
MEANING OF “HEP DATA”
  Digital information: experimental data, MC, database.
  Software: simulation, reconstruction, analysis.
  Publications: journals, arXiv, SPIRES.
  Documentation: R&D notes, manuals, slides, and meta information such as HyperNews and forums.
  Expertise (people): manpower usually decreases during the lifetime of the Collaboration.
Virtualization Comeback
Virtualization is a broad term that refers to the abstraction of computer resources.
An old technology making a comeback thanks to the breakdown in frequency scaling and the appearance of multi-core and many-core CPU technology.
The enabling technology of Cloud computing.
Virtualization is here to stay for the foreseeable future.
How can virtualization help?
Decouples application lifecycle from evolution of system infrastructure
  OS platform
  Hardware platform
Allows software to be built on a well defined minimal platform
  Remains constant in time
Diagram labels: Experiment software; VM; OS platform (SL3, SL4, SL5); Hardware platform (x86, x86_64)
PH/SFT R&D (WP9)
Project in the Physics Department (SFT Group)
  the same group that takes care of ROOT & Geant, looks for common projects and seeks synergy between experiments
CernVM Project started on 01/01/2007, funded for 4 years
Mihajlo joined on 01/10/2010 to work on data preservation and virtualization, with a focus on NA61
CernVM Project
Aims to provide a complete, portable and easy to configure user environment in the form of a Virtual Machine for developing and running LHC data analysis locally and on the Grid, independent of physical software and hardware platform (Linux, Windows, MacOS)
  Code check-out, editing, compilation, small local tests, debugging, …
  Grid submission, data access, …
  Event displays, interactive data analysis, …
  Suspend, resume, …
Helps to reduce the effort to install, maintain and keep up to date the experiment software
CernVM architecture
CernVM downloads only what is really needed and puts it in the cache
  Does not require a persistent network connection (offline mode)
  Minimal impact on the network
Defines a common platform that can be used by all experiments/projects
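The download-on-demand behaviour described on this slide can be illustrated with a minimal sketch. It is not the CernVM-FS implementation: the class `OnDemandCache`, the `fetch` callable and the counter `remote_fetches` are hypothetical names introduced only for illustration, and the cache layout (one file per hashed name) is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Callable


class OnDemandCache:
    """Fetch a file on first access and serve it from local disk afterwards.

    Hypothetical illustration only; this is not the CernVM-FS implementation.
    """

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch        # assumed remote getter, e.g. an HTTP GET
        self.remote_fetches = 0   # counts how often the network was used

    def _local_path(self, name: str) -> Path:
        # Hash the logical name so that any path maps to one flat cache file.
        digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
        return self.cache_dir / digest

    def read(self, name: str) -> bytes:
        local = self._local_path(name)
        if local.exists():
            # Cache hit: no network access, so this also works offline.
            return local.read_bytes()
        data = self.fetch(name)   # cache miss: download only this one file
        self.remote_fetches += 1
        local.write_bytes(data)
        return data
```

Once a file has been read, later reads succeed even if `fetch` can no longer reach the server, which is the sense in which a cached environment can keep working in offline mode.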
Conary Package Manager
Every build and every file installed on the system is automatically versioned and accounted for in a database
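To make the idea of per-file accounting concrete, here is a small sketch of a content-hash manifest. It does not use or imitate the Conary API; `build_manifest`, `changed_paths` and the example file names are hypothetical, and SHA-256 is simply an assumed choice of digest.

```python
import hashlib
from typing import Dict, List


def build_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map every installed path to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in sorted(files.items())}


def changed_paths(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return the paths that were added, removed or modified between builds."""
    return sorted(path for path in set(old) | set(new)
                  if old.get(path) != new.get(path))


# Two hypothetical builds of the same software release.
build_1 = build_manifest({"bin/reco": b"a", "lib/libreco.so": b"1"})
build_2 = build_manifest({"bin/reco": b"a", "lib/libreco.so": b"2",
                          "etc/conf": b"x"})
print(changed_paths(build_1, build_2))
```

Storing one such manifest per build is enough to say exactly which files differ between any two versions of the system.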
Appliance Builder
Starting from experiment software… …ending with a custom Linux specialised for a given task
  Installable CD/DVD
  Stub Image
  Raw Filesystem Image
  Netboot Image
  Compressed Tar File
  Demo CD/DVD (Live CD/DVD)
  Raw Hard Disk Image
  VMware ® Virtual Appliance
  VMware ® ESX Server Virtual Appliance
  Microsoft ® VHD Virtual Appliance
  Xen Enterprise Virtual Appliance
  Virtual Iron Virtual Appliance
  Parallels Virtual Appliance
  Amazon Machine Image
  Update CD/DVD
  Appliance Installable ISO
Can we build a production environment [using existing technology] such that the experiments can be sure that they can redo their reconstruction/analysis at any time?
Basic premises
System must be self-contained and scalable
  Should not depend on any site-specific services
Applications should run in a Virtual Machine that is versioned and can be recreated to exact specifications
  It is not enough just to store the Virtual Machine image
Application should always do only local file I/O
  Interaction with any external services cannot be guaranteed over time
The experiment software should be stored and archived externally, to avoid the need to modify the VM whenever software is updated
  Access only via Web protocols like HTTP and via the file system (CernVM-FS)
No incoming network connectivity should be required
  A legacy version of the OS running in a VM may pose a site security risk
End user API must be provided to submit and monitor jobs
  If possible scriptable, supporting one of the common scripting languages
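The requirement that a VM "can be recreated to exact specifications" can be checked mechanically if the specification itself is archived and fingerprinted. The following is a minimal sketch under that assumption; the functions `spec_fingerprint` and `is_exact_rebuild`, the specification layout and the package name `na61-reco` are all hypothetical and do not come from the CernVM tooling.

```python
import hashlib
import json
from typing import Any, Dict


def spec_fingerprint(spec: Dict[str, Any]) -> str:
    """Hash a VM specification in canonical form (keys sorted at every level)."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_exact_rebuild(recorded: str, rebuilt_spec: Dict[str, Any]) -> bool:
    """True if a rebuilt VM specification matches the archived fingerprint."""
    return recorded == spec_fingerprint(rebuilt_spec)


# Hypothetical specification archived together with the processed data.
archived_spec = {"os": "SL5", "arch": "x86_64",
                 "packages": {"na61-reco": "1.0"}}
archived = spec_fingerprint(archived_spec)

# The same content in a different key order is still an exact rebuild.
same = {"packages": {"na61-reco": "1.0"}, "arch": "x86_64", "os": "SL5"}
# A changed package version is not.
drifted = {"os": "SL5", "arch": "x86_64", "packages": {"na61-reco": "1.1"}}
print(is_exact_rebuild(archived, same), is_exact_rebuild(archived, drifted))
```

Archiving the specification and its fingerprint, rather than only the image, is one way to read the premise that storing the Virtual Machine image alone is not enough.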
Computer Center in a Box
Common services hosted by the front-end node
  Batch master, NAT gateway, storage proxy and HTTP proxy
Each physical node
  Contributes to the common storage pool
  Runs hypervisor, batch worker
  Exports local storage to the common pool
Virtual Machines
  Started by the batch scheduler to run jobs
  Only limited outgoing network connectivity via gateway node/HTTP proxy
  Access to data files via POSIX (file system) layer
Software delivered to VMs from a Web server repository
  Built from recipes and components stored in a strongly versioned repository
Access to external mass storage via storage proxy
End user API to submit jobs
Figure: front-end node (HTTP Proxy, Storage Proxy, NAT Gateway, Batch Master); physical nodes 1..n (CernVM, Storage Server, Batch Worker, Hypervisor); MSS; links labeled S/W, TCP/IP, API
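The division of roles on this slide can be written down as data and checked automatically. The sketch below is only an illustration of that idea: the `Host` class, the `missing_services` function and the host names are hypothetical, and the service names are taken from the slide text and figure labels.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Services the slide assigns to the front-end node and to each physical node.
FRONT_END_SERVICES = {"batch master", "NAT gateway", "storage proxy", "HTTP proxy"}
NODE_SERVICES = {"hypervisor", "batch worker", "storage server"}


@dataclass
class Host:
    name: str
    services: Set[str] = field(default_factory=set)


def missing_services(front_end: Host, nodes: List[Host]) -> List[str]:
    """List every service the described layout expects but a host lacks."""
    problems = []
    for service in sorted(FRONT_END_SERVICES - front_end.services):
        problems.append(f"{front_end.name}: missing {service}")
    for node in nodes:
        for service in sorted(NODE_SERVICES - node.services):
            problems.append(f"{node.name}: missing {service}")
    return problems


# Hypothetical two-node cluster in which node2 has no hypervisor yet.
front = Host("frontend", set(FRONT_END_SERVICES))
node1 = Host("node1", set(NODE_SERVICES))
node2 = Host("node2", {"batch worker", "storage server"})
print(missing_services(front, [node1, node2]))
```

A check of this kind could run whenever a node is added, so that every physical node contributes both compute and storage as the design intends.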
Existing components
1) CernVM for NA61
2) KVM - Linux native hypervisor
3) Condor - batch system that works well in a dynamic environment and supports running jobs in VMs
4) Xrootd server - high performance file server, used by LHC experiments and distributed with ROOT
5) Xrootd redirector - each aggregates up to 64 servers, can cluster up to 64k servers
6) Xrootd supports access to MSS systems using SRM extension
7) Standard HTTP proxy (Squid)
8) CernVM-FS repository for software distribution
9) xcfs - POSIX File System for Xrootd with Castor backend
10) GANGA - user interface and front-end to various batch systems and Grid
11) MonALISA - monitoring and accounting
Figure: architecture diagram (HTTP Proxy, Storage Proxy, NAT Gateway, Batch Master, CernVM, Storage Server, Batch Worker, Hypervisor, MSS; links labeled S/W, TCP/IP, API)
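The redirector figures in item 5 suggest how the storage layer scales. As a rough sketch, assume redirectors are stacked into a tree in which every level aggregates up to 64 children; the slide gives only the per-redirector limit, so the tree arrangement and the helper `levels_needed` are assumptions made here for illustration.

```python
FAN_OUT = 64  # servers (or lower-level redirectors) aggregated per redirector


def levels_needed(n_servers: int, fan_out: int = FAN_OUT) -> int:
    """Number of redirector levels needed to reach n_servers data servers."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    levels = 1
    capacity = fan_out            # one redirector covers up to fan_out servers
    while capacity < n_servers:
        capacity *= fan_out       # each extra level multiplies the reach
        levels += 1
    return levels


for n in (64, 65, 4096, 64 * 1024):
    print(n, "servers ->", levels_needed(n), "level(s)")
```

Under this assumption a cluster of the size discussed for a single experiment needs only one or two levels, using integer arithmetic throughout to avoid rounding problems near exact powers of 64.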
Project Timeline
Chart axis (months): O N D J F M A M J J A S
M1  CernVM  /10/2010
T1  NA61 release certification in CernVM  /11/2010
T2  Testing components, designing system  30/11/2010
T3  Installing cluster  15/12/2010
M2  Cluster installed  15/12/2010
T4  Developing and testing job submission, Ganga interface, monitoring  31/01/2011
T5  Developing and testing xrootd, xcfs  28/02/2011
T6  Testing complete system  31/03/2011
M3  First usable version  30/10/
Conclusions
Long term data preservation without an equivalent mechanism for software preservation is of limited usefulness to the experiments
Technology exists today to aid this process and is likely to stay with us
  Virtualization enables Cloud computing, and that seems to be where all industry is going
By combining existing software components we can build a scalable infrastructure
  Can remove application dependency on the OS and physical hardware that evolve with time
  Compatible with the cloud computing paradigm, which is gaining ground in industry
  Compatible with the ATLAS T3 “in a box” model
We plan to build a first usable prototype in 6 months from now
  We need at least one person from NA61 who could help us set up test cases to validate our approach