Progress on NA61/NA49 software virtualisation
Dag Toppe Larsen
Wrocław, 05.03.2012
Outline
- Quick reminder of CERNVM
- Tasks
- Each task in detail
- Roadmap
- Input needed

05.03.2012 NA61/NA49 meeting, Wrocław

CERNVM
- CERNVM is a Linux distribution
  - Designed specifically for virtual machines (VMs)
  - Based on SLC (currently SLC5)
  - Compressed image size ~300 MB
  - Both 32-bit and 64-bit versions
- Additional software
  - “Standard” software via the Conary package manager
  - Experiment software via CVMFS
- Contextualisation: images are adapted to experiment requirements during boot
- Data preservation: all images are permanently preserved

CVMFS
- Distributed read-only file system for CERNVM (i.e. it plays the same role as AFS does for LXPLUS)
- Can also be used by “real” machines (e.g. LXPLUS, grid)
- Files are compressed and distributed via HTTP
  - Global availability
  - Central server, site replication via standard HTTP proxies
- Files are decompressed and cached on the (CERNVM) computer
  - Can run without Internet access if all needed files are cached
- Mainly for experiment software, but also other “static” data (e.g. calibration data)
- Each experiment has a repository to store all versions of its software
- Common software (e.g. ROOT) is available from the SFT repository
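
The fetch, decompress and cache behaviour described on this slide can be illustrated with a small sketch. It is a toy model and does not reflect how CVMFS itself is implemented: the class name, the `fetch` callable standing in for an HTTP download, the use of zlib and the hashed cache file names are all assumptions made for illustration.

```python
import hashlib
import zlib
from pathlib import Path
from typing import Callable


class CompressedFileCache:
    """Toy read-only cache: fetch compressed blobs once, then serve them locally."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical transport, e.g. an HTTP GET that goes through a site proxy.
        self.fetch = fetch

    def _local_path(self, name: str) -> Path:
        # Hash the name so that any repository path maps to one flat cache file.
        return self.cache_dir / hashlib.sha1(name.encode("utf-8")).hexdigest()

    def read(self, name: str) -> bytes:
        local = self._local_path(name)
        if local.exists():
            # Cache hit: no network access is needed.
            return local.read_bytes()
        compressed = self.fetch(name)       # may raise if the network is down
        data = zlib.decompress(compressed)  # files travel in compressed form
        local.write_bytes(data)             # keep the decompressed copy for next time
        return data
```

Once a file has been read, a second `read` of the same name succeeds even if `fetch` can no longer reach the server, which is the property that lets a VM run offline when everything it needs is already cached.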

Data preservation
- As technology evolves, it is no longer possible to run legacy software on modern platforms
- Must be preserved and accessible:
  - Experiment data
  - Experiment software
  - Operating environment (operating system, libraries, compilers, hardware)
- Just preserving data and software is not enough
- Virtualisation may preserve the operating environment

CERNVM data preservation
- “Solution”:
  - Experiment data stored on Castor
  - Experiment software versions stored on CVMFS
    - HTTP is a “lasting” technology
  - Operating environments stored as CERNVM image versions
- Thus, a legacy version of CERNVM can be started as a VM, running a legacy version of the experiment software
- Forward-looking approach (we start preserving now)

Tasks
- Make experiment software available
- Facilitate batch processing
- Validate outputs
- On-demand virtual clusters
- Production reconstruction
- Reference cloud cluster
- Data bookkeeping web interface

Make experiment software available
- NA61/NA49 software must be available on CVMFS for CERNVM to process data
- NA61
  - Legacy software chain installed
    - Changes to be fed back to SVN
  - SHINE
    - Preparing to install
    - Use ROOT from the SFT repository
    - Conary package manager to install other dependencies
    - Have to create a package for XZ, which is currently not available
    - Will there be a 64-bit version of SHINE, or will it always be 32-bit?
    - Installation expected to be easier than for the legacy chain
    - Not “critical” until ready, but good to gain experience and be prepared
- NA49
  - SLC4 development machine and repository set up
  - Need expert support with the actual installation

Facilitate batch processing
- LXPLUS uses the PBS batch system, CERNVM uses Condor
- New scripts prepared
- “Philosophical” differences
  - PBS has a separate script for each job
  - Condor has a common job description file
- Installation of the legacy NA61 reconstruction chain recently completed
  - Issues discovered that require modifications to the scripts
  - But no big issues
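
To make the “common job description file” idea concrete, the sketch below generates one Condor submit description covering several runs, where a per-job approach would need one script per run. It is not the script prepared for NA61: the function name, the wrapper script path, the run numbers and the log layout are hypothetical. Only the standard submit keywords (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are taken from Condor itself.

```python
from typing import List


def condor_submit_description(executable: str,
                              run_numbers: List[int],
                              log_dir: str = "logs") -> str:
    """Build one submit description that queues one job per run number."""
    # Settings shared by all jobs are stated once at the top.
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"log        = {log_dir}/reconstruction.log",
        "",
    ]
    # Per-job settings are overridden before each "queue" statement.
    for run in run_numbers:
        lines += [
            f"arguments  = {run}",
            f"output     = {log_dir}/run_{run}.out",
            f"error      = {log_dir}/run_{run}.err",
            "queue",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical wrapper script and run numbers, for illustration only.
    print(condor_submit_description("reconstruct_run.sh", [101, 102, 103]))
```

The resulting text would be saved to a file and passed to `condor_submit`; the shared settings appear once, which is the main practical difference from maintaining a separate script per job.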

Validate outputs
- Data processed on CERNVM/CVMFS have to produce the same results as on LXPLUS/AFS
- A larger data set should be used for this testing
- As part of processing the data on CERNVM, one can automatically run ds_diff on the newly reconstructed data and the LXPLUS data copied from Castor
  - “Easy” to add to the Condor script
  - Output from ds_diff must be checked by hand
- Make sure the same versions of the reconstruction software are used
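
One possible shape for the automatic comparison step is sketched below: newly reconstructed files are paired with their LXPLUS reference copies by file name, and one comparison command is prepared per pair. The function names and file paths are hypothetical, and the assumption that ds_diff takes the two files as positional arguments has not been checked against the tool, so the command builder should be adapted to its real interface.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def pair_by_name(new_files: Iterable[str],
                 reference_files: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair each new file with the reference file that has the same base name."""
    ref_by_name = {PurePosixPath(p).name: p for p in reference_files}
    pairs, missing = [], []
    for new in new_files:
        ref = ref_by_name.get(PurePosixPath(new).name)
        if ref is None:
            missing.append(new)  # nothing to compare against; report it
        else:
            pairs.append((new, ref))
    return pairs, missing


def comparison_commands(pairs: Iterable[Tuple[str, str]],
                        tool: str = "ds_diff") -> List[List[str]]:
    # ASSUMPTION: the tool accepts the two files as positional arguments.
    return [[tool, new, ref] for new, ref in pairs]
```

Each command could be run from the Condor job after reconstruction, with its output saved next to the data so that it can be checked by hand afterwards.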

On-demand virtual clusters
- On boot, the VMs are set up (contextualised) with the configurations and software needed by the relevant experiment
  - Environment (variables, etc.)
  - Version of experiment software
  - Version of OS image
  - Hardware configuration (e.g. RAM)
- VMs can be discarded after the data is processed
- A script will create a virtual cluster with a head node and a suitable number of worker nodes
  - Cluster discarded when jobs are finished
  - Initially a command-line script
  - Later controlled by the data bookkeeping web interface
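
A cluster-creation script could start from a plain description of what each VM should be contextualised with. The sketch below only plans the nodes (one head node plus N workers sharing the same context) and does not talk to any cloud; all class, field and node names are hypothetical, as are the example values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Context:
    """What a VM is contextualised with at boot (hypothetical field names)."""
    os_image: str
    software_version: str
    ram_mb: int
    environment: Dict[str, str] = field(default_factory=dict)


@dataclass
class Node:
    name: str
    role: str  # "head" or "worker"
    context: Context


def plan_cluster(context: Context, n_workers: int) -> List[Node]:
    """Return one head node followed by n_workers worker nodes."""
    if n_workers < 1:
        raise ValueError("a cluster needs at least one worker node")
    nodes = [Node("head", "head", context)]
    nodes += [Node(f"worker-{i:02d}", "worker", context)
              for i in range(1, n_workers + 1)]
    return nodes
```

For example, `plan_cluster(Context("example-image", "example-release", 2048), 50)` would describe a head node and 50 workers. Instantiating and later discarding the VMs would be left to whatever cloud interface is used.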

Production reconstruction
- After the outputs are validated, production reconstruction is the next step
- A cluster of “decent” size is needed
  - Need to submit ~50 VMs to process a large data set
  - Reference cloud too small
- Need to negotiate with IT to use LXCLOUD (not-yet-public CERN cloud)
  - CERN already has a large number of internal virtual machines

Reference cloud cluster
- The virtual machines require a cluster of physical hosts
- A reference cloud cluster has been created
- Detailed documentation will simplify the process of replicating it at other sites
- Based on OpenNebula (popular cloud framework)
- KVM hypervisor
- Provides an Amazon EC2 interface (de facto standard for cloud management)

Data bookkeeping web interface
- A web interface for data bookkeeping is to be created
- List all existing data with status (e.g. software versions used for processing)
- Easy selection of data for (re)processing with a selected OS and software version
  - A virtual on-demand cluster is created
  - After processing, data is written back to Castor
- Either based on existing frameworks or on new development
- Likely using the EC2 interface for cloud management
  - Can allow for great flexibility in the choice of processing site
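
The selection step behind such an interface might look like the following sketch: each data set carries the OS image and software version it was last processed with, and a query returns the data sets that still need (re)processing for a chosen combination. The record layout and function name are hypothetical, since the interface is still at the planning stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataSetRecord:
    """Bookkeeping entry for one data set (hypothetical layout)."""
    name: str
    os_image: Optional[str] = None          # None: never processed
    software_version: Optional[str] = None  # None: never processed


def needs_processing(records: List[DataSetRecord],
                     os_image: str,
                     software_version: str) -> List[DataSetRecord]:
    """Select data sets not yet processed with the requested combination."""
    wanted = (os_image, software_version)
    return [r for r in records
            if (r.os_image, r.software_version) != wanted]
```

The selected records would then be handed to the on-demand cluster, and their entries updated once the output has been written back to Castor.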

Roadmap

Task                           | Status/done                              | Remaining                                            | Expected
-------------------------------+------------------------------------------+------------------------------------------------------+----------------
NA61 software installation     | Legacy framework                         | SHINE                                                | End of March?
NA49 software installation     | Development machine, software repository | Software installation                                |
Facilitate batch system        | Condor job scripts                       | Modifications/bug fixes                              | March
Validate outputs               | Small data set                           | Large data set (using batch system)                  | End of March
On-demand virtual cluster      | Cluster creation / destroy scripts       |                                                      |
Production reconstruction      | Dependencies mostly ready                | Remaining tasks, prepare for real reconstruction     | April
Reference cloud cluster        | Cluster working                          | Documentation                                        | June/July
Data bookkeeping web interface | Initial planning                         | Evaluate frameworks, “First” version, “Final” version | End of October

Input needed
- NA49 software installation
- Possible SHINE issues
- Possible validation issues
- How to arrange production reconstruction in practice
- Please keep virtualisation (CERNVM/CVMFS) in mind when making plans ...