Virtualisation for NA49/NA61
Dag Toppe Larsen, UiB/CERN
Zagreb, 11.10.2011
Outline
- Recapitulation
  - Why virtualisation?
  - CERNVM
  - CVMFS
- Reference cloud
- NA49/NA61 data processing
  - Status
  - Next steps
- Outlook
Why virtualisation?
- Data preservation
- Very flexible
- Avoids complexity of grid
- Easy software distribution
- Processing not constrained to CERN
- Take advantage of new LXCLOUD
- Take advantage of commercial clouds, e.g. Amazon EC2
- Can develop in the same VM the data will be processed on, which should reduce failing jobs
CERNVM: introduction
- Dedicated Linux distribution for virtual machines
- Currently based on SLC5
- Newer and updated versions will be made available for new software
- Old versions will still be available for legacy analysis software
- Supports all common hypervisors
- Supports Amazon EC2 clouds
CERNVM: layout
CVMFS: introduction
- Distributed file system based on HTTP
- Read-only
- Distribution of binary files: no need for local compile & install
- All libraries & software that cannot be expected to be found on a "standard" Linux system should be distributed this way
- Each experiment has one or more persons responsible for providing updates and resolving dependencies
CVMFS: software repositories
- Several repositories mounted under /cvmfs/
- Each repository typically corresponds to one "experiment" (or other "entity")
- Experiments have "localised" names, e.g. /cvmfs/na61.cern.ch/
- Common software in separate repositories, e.g. ROOT in /cvmfs/sft.cern.ch/
- Several versions of software may be distributed in parallel; the user can choose which version to run (see the sketch below)
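As a concrete illustration, this is how a user might inspect the repositories and pick one of several parallel versions. The two mount points come from this slide; the subdirectory layout and version numbers are assumptions for illustration only:

    # Repositories are mounted on demand under /cvmfs/ (paths from this slide)
    ls /cvmfs/na61.cern.ch/
    ls /cvmfs/sft.cern.ch/

    # Hypothetical layout with parallel versions; the user selects one explicitly
    ls /cvmfs/na61.cern.ch/software/        # e.g. v1.0/ v1.1/ v2.0/
    export PATH=/cvmfs/na61.cern.ch/software/v1.1/bin:$PATH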
Reference cloud: introduction
- Small CERNVM reference private cloud
  - Condor batch system
  - OpenNebula management
  - Amazon EC2 interface
- Reference installation for other clouds
  - Detailed yet simple step-by-step instructions for replication at other sites will be provided
  - Attempt to make installations "uniform"
  - Site customisation possible for monitoring, etc.
Reference cloud: OpenNebula framework
- Popular framework for management of virtual machines (a template sketch follows below)
- Supports most common hypervisors
  - Choice: KVM/QEMU (fast, does not require modifications to the OS)
- Amazon EC2 interface
- Possible to include VMs from other clouds, and to provide hosts to other clouds
- Web management interface
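A minimal sketch of driving OpenNebula from the command line. NAME, CPU, MEMORY, DISK and NIC are standard OpenNebula template attributes, but the image path and network name below are made up for illustration:

    # Minimal OpenNebula VM template for a CERNVM batch node (illustrative values)
    cat > cernvm.one <<'EOF'
    NAME   = cernvm-batch-01
    CPU    = 1
    MEMORY = 2048
    DISK   = [ SOURCE = "/srv/images/cernvm-batch.img", TARGET = "hda" ]
    NIC    = [ NETWORK = "cloud-net" ]
    EOF

    onevm create cernvm.one   # register and boot the VM
    onevm list                # verify it reaches RUNNING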
Reference cloud: Amazon EC2 interface
- EC2 is the commercial cloud offered by Amazon
- EC2 also describes an interface for managing VMs
- It has become the de facto interface for all clouds, including private ones
- Hence, using the EC2 interface allows great flexibility in launching VMs on both private and commercial clouds (example below)
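For instance, with the widely used euca2ools EC2 client the same commands can target a private cloud's endpoint or Amazon itself; the endpoint URL, image id and keypair name below are placeholders:

    # Point the EC2 client at the cloud's endpoint (placeholder URL)
    export EC2_URL=http://cloud-head.example.cern.ch:4567/

    # Launch and inspect a VM exactly as one would on Amazon EC2
    euca-run-instances ami-00000001 -k mykeypair -t m1.small
    euca-describe-instances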
Reference cloud: public vs. private clouds
Reference cloud: Elasticfox web user interface
- VM management through the browser
- Can configure/start/stop VM instances, add/remove VM images
- Works through the Amazon EC2 interface
- A similar interface is needed for data processing
NA49/NA61 processing: status
- CVMFS software installation
  - Software for NA61 installed
    - Issues with some set-up file options?
  - Can also be used for processing on LXPLUS/LXBATCH
    - No need to adapt scripts (except for the environment)
  - NA49 software in progress
- Processing on CERNVM
  - Currently, reconstruction can be run "by hand"
  - Batch system exists, scripts are being adapted
NA49/NA61 processing: NA61 CVMFS installation
- Available under /cvmfs/na61.cern.ch/
  - On CERNVM virtual machines
  - On "ordinary" computers with CVMFS installed, including LXPLUS/LXBATCH
- Script to set up the environment (usage example below):
  . /cvmfs/na61.cern.ch/library/etc/na61_env.sh
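In practice, a session on LXPLUS or a CERNVM instance might look like this. The sourcing line is from the slide, cvmfs_config probe is the standard CVMFS client check, and the binary name at the end is a placeholder:

    # Verify the repository is reachable (standard CVMFS client utility)
    cvmfs_config probe na61.cern.ch

    # Set up the NA61 environment from CVMFS (path from the slide)
    . /cvmfs/na61.cern.ch/library/etc/na61_env.sh

    # Software now resolves from /cvmfs rather than AFS or a local install
    which na61_reconstruction    # hypothetical binary name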
NA49/NA61 processing: next steps
- Two main tasks:
  - Software validation
  - CERNVM processing set-up
- Can largely be done in parallel
- Suggested steps on the following slides:
  - Use LXBATCH for initial software validation
  - Then validate on CERNVM
  - Then set up the production system
NA49/NA61 processing: next steps
- Step 1a: software validation on LXBATCH
  - Select a reference data set already processed on LXBATCH using the software on AFS
  - Reprocess the data on LXBATCH, but using the software on CVMFS (instead of AFS)
  - Compare the output from the CVMFS and AFS software
  - Correct any problems found
  - Decouples issues related to the CVMFS software installation from the CERNVM set-up
  - Ready to start this step now (a sketch follows below)
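A rough sketch of what such a validation job could look like. The environment script comes from an earlier slide; the job script, input file and output names are hypothetical:

    # Use the CVMFS software stack instead of AFS
    . /cvmfs/na61.cern.ch/library/etc/na61_env.sh

    # Reprocess the reference data set (hypothetical script and file names)
    ./run_reconstruction.sh run-012345.raw out_cvmfs.root

    # Bitwise comparison is the strictest check; if outputs embed timestamps,
    # a histogram-level comparison may be needed instead
    cmp out_cvmfs.root out_afs_reference.root && echo "outputs identical"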
NA49/NA61 processing: next steps
- Step 1b: CERNVM set-up, convert all processing scripts
  - LXPLUS/LXBATCH uses PBS, CERNVM uses Condor (see the sketch below)
  - Remove AFS references
  - Castor Kerberos authentication (distribute a kinit keytab file?)
  - In progress, soon ready for reconstruction
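A minimal sketch of the PBS-to-Condor conversion, plus the keytab idea floated above. The submit-description keys are standard Condor; file names and the Kerberos principal are placeholders:

    # Minimal Condor submit description replacing a PBS job script
    cat > reco.sub <<'EOF'
    universe   = vanilla
    executable = run_reconstruction.sh
    arguments  = run-012345.raw
    output     = reco.out
    error      = reco.err
    log        = reco.log
    queue
    EOF
    condor_submit reco.sub

    # Castor access without AFS: obtain a Kerberos ticket from a distributed
    # keytab, as the slide suggests (principal and path are placeholders)
    kinit -kt /etc/na61.keytab na61prod@CERN.CH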
NA49/NA61 processing: next steps
- Step 2a: software validation on LXBATCH
  - Use the CVMFS software for "normal" processing on LXBATCH (instead of AFS)
- Step 2b: CERNVM set-up
  - Select a reference data set already processed on LXBATCH using the software on AFS
  - Reprocess the same data using CERNVM on the test/reference cloud, using the software on CVMFS
  - Compare the output from CERNVM and LXBATCH
  - Correct any problems found
    - CVMFS issues should already have been found in step 1a
NA49/NA61 processing: next steps
- Step 3: production processing on LXCLOUD
  - LXCLOUD is the new cloud service offered by CERN IT
  - Experience from the reference cloud is directly transferable
  - Possible to set up processing facilities at sites other than CERN, based on the reference cloud, if needed
NA49/NA61 processing: what is needed
- To successfully adapt the processing to CERNVM, some input is needed:
  - Overview of (all) processing types performed on LXBATCH
    - Scripts in use
    - How to set up/configure them
    - Also analysis (not only reconstruction)?
  - Reference data sets for comparing output
  - Who is responsible for the various (software) components, and who can/wants to participate?
    - Carrying out the steps on the previous slides
NA49/NA61 processing: web user interface
- A web user interface for managing VM instances/images exists
- Needed: a processing-centred web user interface
  - What data to process and what type of analysis
  - List of available data and status, request processing
  - Specify versions of software and VM
  - Specify requirements for processing nodes
- Both VM and processing management in the same interface, or two separate interfaces?
- Generic experiment interface?
NA49/NA61 processing: web user interface
- "Step 4": CERNVM processing does not depend on it
- But it would greatly improve the user experience
- Considering extending an existing VM-management tool to also manage data/processing
Status/outlook
- Reference cloud up and running
- NA61 software available on CVMFS
- NA49 software soon available on CVMFS
- Software validation ready to begin
- Data processing on CERNVM: currently by hand, batch coming soon
- Needed: better understanding of the different processing tasks, and who is responsible for what
- Needed: processing-centred web interface
Backup
Data preservation: motivation
- Preserve the historic record
- Even after an experiment's end-of-life, data reprocessing might be desirable if future experiments reach incompatible results
- Many past experiments have already lost this possibility
Data preservation: challenges
- Two parts: data & software
- Data: preserve by migration to newer storage technologies
- Software: more complicated
  - Just preserving source code/binaries is not enough
  - Strongly coupled to the OS/library/compiler version (the software environment)
  - The software environment is strongly coupled to the hardware, and the platform will eventually become unavailable
  - Porting to a new platform requires a big effort
Data preservation: solution
- Possible solution: virtualisation
- "Freezes" the hardware in software
- Legacy analysis software can run in VMs on the legacy Linux versions it was originally developed for
- The software environment is preserved; no need to modify code
- Comes for free if processing is already done on VMs
CERNVM: use cases
- Two main use cases:
  - Computing centre
    - Images for head and batch nodes
    - Includes the Condor batch system
  - Personal computers
    - Desktop (GUI) and basic (command-line) images
    - "Personal" use
- Code can be developed (desktop image) in an environment/platform similar to the one it will be processed on (batch node image)
CERNVM: contextualisation
- All CERNVM instances are initially identical
- Experiment-specific software configuration/set-up is introduced via contextualisation
- Two types:
  - CD-ROM image: mainly site-specific configuration
  - EC2 user data: mainly experiment-specific (see the sketch below)
- Executed during start-up of the VM
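As an illustration of the EC2 user-data mechanism, the snippet below follows CERNVM's amiconfig-style contextualisation; treat the section and key names as an assumption, and the organisation/repository/user values are invented:

    # Contextualisation payload passed as EC2 user data (amiconfig-style;
    # the [cernvm] keys are assumed, values are placeholders)
    cat > user-data <<'EOF'
    [cernvm]
    organisations = na61
    repositories  = na61
    users         = na61user:na61:changeme
    EOF

    # Hand the payload to the instance at launch (euca2ools: -f = user-data file)
    euca-run-instances ami-00000001 -k mykeypair -f user-data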
CVMFS: design
- Compressed files on an HTTP server
- Downloaded, decompressed and cached locally on first use
  - Possible to run the software without an Internet connection
- A hierarchy of standard HTTP proxy servers distributes the load (client configuration sketched below)
- Can also be used by non-VMs, e.g. LXPLUS/LXBATCH, other clusters, personal laptops
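On the client side, pointing a node at the local proxy hierarchy is a small configuration change. The file and variable names below are standard CVMFS client settings; the Squid host is a placeholder:

    # Client-side CVMFS configuration (standard file and variable names;
    # the proxy host is a placeholder)
    cat >> /etc/cvmfs/default.local <<'EOF'
    CVMFS_REPOSITORIES=na61.cern.ch,sft.cern.ch
    CVMFS_HTTP_PROXY="http://squid.example.cern.ch:3128"
    EOF

    cvmfs_config probe    # verify the repositories are reachable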
Reference cloud: virtual distributed Condor cluster
- Based on VMs in the cloud
- Can be distributed over several sites (see the sketch below)
  - Even if nodes are at different sites, they appear to be in the same cluster
- A tier 1 can include VMs provided by tier 2s in its virtual Condor cluster
  - Can save a lot of work, as the tier 2s do not need to set up job management themselves
- Other possibility: a local CERNVM batch system running local jobs (like a normal cluster)
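One way a VM at a remote site can join the tier 1's pool is simply to point its Condor daemons at the tier 1 central manager. CONDOR_HOST and DAEMON_LIST are standard Condor settings; the host name is a placeholder, and the cross-site security settings a real set-up would need are omitted here:

    # Worker VM at a tier 2 joining the tier 1 Condor pool (placeholder host;
    # authentication/authorisation settings omitted for brevity)
    cat >> /etc/condor/condor_config.local <<'EOF'
    CONDOR_HOST = condor-head.tier1.example.org
    DAEMON_LIST = MASTER, STARTD
    EOF

    service condor restart    # the node then shows up in the tier 1 pool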