1
CernVM for NA49/NA61
Dag Toppe Larsen, UiB/CERN
CERN, 16.06.2012
2
Why virtualisation?
Data preservation
Very flexible
Avoids the complexity of the grid
Easy software distribution
Processing not constrained to CERN
Take advantage of the new LXCLOUD
Take advantage of commercial clouds, e.g. Amazon EC2
Can develop in the same VM the data will be processed on – should reduce failing jobs
3
Data preservation: motivation
Preserve the historic record
Even after an experiment's end-of-life, data reprocessing might be desirable if future experiments reach incompatible results
Many past experiments have already lost this possibility
4
Data preservation: challenges
Two parts: data & software
Data: preserve by migration to newer storage technologies
Software: more complicated
Just preserving source code/binaries is not enough
Strongly coupled to OS/library/compiler version (the software environment)
Software environment strongly coupled to hardware; the platform will eventually become unavailable
Porting to a new platform requires a big effort
5
Data preservation: solution
Possible solution: virtualisation
"Freeze" the hardware in software
Legacy analysis software can run in VMs under the legacy Linux versions it was originally developed for
Software environment preserved, no need to modify code
Comes for "free" if processing is already done on VMs
6
CERNVM: introduction
Dedicated Linux distribution for virtual machines
Currently based on SLC5
Newer and updated versions will be made available for new software
Old versions will still be available for legacy analysis software
Supports all common hypervisors
Supports Amazon EC2 clouds
7
CERNVM: layout
8
CERNVM: use cases
Two main use cases:
Computing centre: images for head and batch nodes; includes the Condor batch system
Personal computers: desktop (GUI) and basic (CL) images for "personal" use
Code can be developed (desktop image) in a similar environment/platform to the one it will be processed on (batch node image)
9
CERNVM: contextualisation
All CERNVM instances are initially identical
Experiment-specific software configuration/set-up is introduced via contextualisation
Two types:
CD-ROM image – mainly site-specific configuration
EC2 user data – mainly experiment-specific configuration
Executed during start-up of the VM (a sketch of the user-data approach follows below)
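As an illustration of the EC2 user-data route, the following Python sketch builds a small contextualisation blob and writes it to a file. The amiconfig-style [cernvm] section and its key names (repositories, users) are assumptions chosen for illustration, not the exact schema used by NA61; at launch time the blob would be handed to the cloud as EC2 user data.

```python
#!/usr/bin/env python
# Sketch: build an EC2 user-data blob for contextualising a CernVM instance.
# The [cernvm] section and its keys are illustrative assumptions only.

user_data = "\n".join([
    "[cernvm]",
    "repositories = na61,sft",         # CVMFS repositories to mount (assumed key)
    "users = na61user:na61:PASSWORD",  # account created at boot (assumed key)
])

# The blob is passed to the cloud as EC2 user data when the VM is launched
# and is parsed during VM start-up; here it is simply written to a file.
with open("user-data.txt", "w") as handle:
    handle.write(user_data + "\n")
```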
10
CVMFS: introduction
Distributed file system based on HTTP
Read-only
Distribution of binary files – no need to compile & install locally
All libraries & software that cannot be expected to be found on a "standard" Linux should be distributed
Each experiment has one or more persons responsible for providing updates and resolving dependencies
11
CVMFS: software repositories
Several repositories mounted under /cvmfs/
Each repository typically corresponds to one "experiment" (or other "entity")
Experiments have "localised" names, e.g. /cvmfs/na61.cern.ch/
Common software in separate repositories, e.g. ROOT in /cvmfs/sft.cern.ch/
Several versions of software may be distributed in parallel – the user can choose which version to run (see the sketch below)
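For example, a user or job could check which releases the experiment repository offers before picking one. The layout assumed below (one top-level directory per release under /cvmfs/na61.cern.ch/) is an illustrative assumption; the real structure is defined by the experiment's software managers.

```python
#!/usr/bin/env python
# Sketch: list candidate software releases in an experiment repository.
# The one-directory-per-release layout is an assumed convention.
import os

repo = "/cvmfs/na61.cern.ch"

if not os.path.isdir(repo):
    raise SystemExit("CVMFS repository %s is not mounted" % repo)

for entry in sorted(os.listdir(repo)):
    if os.path.isdir(os.path.join(repo, entry)):
        print("available release: %s" % entry)
```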
12
CVMFS: design
Compressed files on an HTTP server
Downloaded, decompressed and cached locally on first use (simplified sketch below)
Possible to run software without an Internet connection once it is cached
A hierarchy of standard HTTP proxy servers distributes the load
Can also be used by non-VMs, e.g. LXPLUS/LXBATCH, other clusters, personal laptops
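The download-decompress-cache flow can be pictured with the toy function below. This is not the real CVMFS client or wire format: the URL scheme, zlib compression and cache location are assumptions made only to illustrate why a second access works offline.

```python
#!/usr/bin/env python
# Toy illustration of CVMFS's fetch-on-first-use caching; not the real client.
import os
import zlib
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

CACHE_DIR = os.path.expanduser("~/.cvmfs-sketch-cache")

def fetch(url, name):
    """Return file contents, downloading and caching them on first use."""
    cached = os.path.join(CACHE_DIR, name)
    if os.path.exists(cached):                   # later reads work offline
        with open(cached, "rb") as f:
            return f.read()
    data = zlib.decompress(urlopen(url).read())  # plain HTTP, zlib assumed
    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    with open(cached, "wb") as f:
        f.write(data)
    return data

# Example use (placeholder URL and object name):
#   blob = fetch("http://cvmfs-server.example/objects/abc123", "abc123")
```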
13
Reference cloud: introduction
Small CERNVM reference private cloud
Condor batch system
OpenNebula management
Amazon EC2 interface
Reference installation for other clouds
Detailed, simple step-by-step instructions for replication at other sites will be provided
Attempt to make installations "uniform"
Site customisation possible for monitoring, etc.
14
Reference cloud: virtual distributed Condor cluster
Based on VMs in the cloud
Can be distributed over several sites
Even if nodes are at different sites, they will appear to be in the same cluster
A tier 1 can include VMs provided by tier 2s in its virtual Condor cluster
This can save much work, as the tier 2s do not need to set up job management themselves
Other possibility: a local CERNVM batch system running local jobs (like a normal cluster); a job-submission sketch follows below
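A minimal sketch of submitting work to such a Condor cluster, assuming the standard condor_submit tool is available on the submit node; the executable name and run identifier are placeholders.

```python
#!/usr/bin/env python
# Sketch: submit one analysis job to the (possibly distributed) Condor pool.
# The executable and arguments are placeholders; only standard Condor
# submit-description keywords and the condor_submit command are used.
import subprocess

submit_description = """\
universe   = vanilla
executable = analysis.sh
arguments  = run-001
output     = run-001.out
error      = run-001.err
log        = run-001.log
queue
"""

with open("job.sub", "w") as f:
    f.write(submit_description)

# Whether the worker nodes are local VMs or VMs contributed by another site
# makes no difference to the submitting user.
subprocess.call(["condor_submit", "job.sub"])
```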
15
Reference cloud: OpenNebula framework
Popular framework for management of virtual machines (see the template sketch below)
Supports the most common hypervisors
Choice: KVM/QEMU – fast, does not require modifications to the OS
Amazon EC2 interface
Possible to include VMs from other clouds, and to provide hosts to other clouds
Web management interface
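As a rough illustration, starting a CernVM batch node through OpenNebula's command-line tools could look like the sketch below. The template shows only a minimal assumed subset of attributes, and the image and network names (cernvm-batch, cloud-private) are placeholders.

```python
#!/usr/bin/env python
# Sketch: register and start a CernVM batch node via OpenNebula's CLI.
# Image/network names are placeholders; a real template would carry more
# attributes (contextualisation, requirements, ...).
import subprocess

template = """\
NAME   = cernvm-batch-01
CPU    = 1
MEMORY = 1024
DISK   = [ IMAGE = "cernvm-batch" ]
NIC    = [ NETWORK = "cloud-private" ]
"""

with open("cernvm-batch.tmpl", "w") as f:
    f.write(template)

# OpenNebula schedules the new VM on one of the KVM hosts it manages.
subprocess.call(["onevm", "create", "cernvm-batch.tmpl"])
```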
16
Reference cloud: Amazon EC2 interface
EC2 is the commercial cloud offered by Amazon
"EC2" also denotes an interface for managing VMs
It has become the de-facto interface for all clouds, including private ones
Hence, using the EC2 interface allows great flexibility in launching VMs on both private and commercial clouds (see the sketch below)
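A small sketch of that flexibility, assuming boto (version 2) as the EC2 client library: the same run_instances call works against Amazon or against a private cloud's EC2 endpoint. The credentials, AMI id and the private endpoint details (host, port, path) are placeholders.

```python
#!/usr/bin/env python
# Sketch: one EC2 client, two clouds. Credentials, AMI id and the private
# endpoint (host/port/path) are placeholders; boto v2 is assumed.
import boto
from boto.ec2.regioninfo import RegionInfo

def connect(private=False):
    if private:
        # EC2 service of the private reference cloud (placeholder endpoint)
        region = RegionInfo(name="reference-cloud",
                            endpoint="cloud.example.cern.ch")
        return boto.connect_ec2("ACCESS_KEY", "SECRET_KEY",
                                region=region, port=8773,
                                path="/", is_secure=False)
    # Default endpoint: Amazon EC2
    return boto.connect_ec2("ACCESS_KEY", "SECRET_KEY")

conn = connect(private=True)
conn.run_instances("ami-00000000", instance_type="m1.small")
```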
17
Reference cloud: public vs. private clouds