DPHEP Workshop CERN, December 7 2009 - 1 Predrag Buncic (CERN/PH-SFT) CernVM R&D Project Portable Analysis Environments using Virtualization.

Slides:



Advertisements
Similar presentations
Single Sign-On with GRID Certificates Ernest Artiaga (CERN – IT) GridPP 7 th Collaboration Meeting July 2003 July 2003.
Advertisements

Ljubomir Ivaniš CPU d.o.o.
Introduction to Virtualization
Chapter 2 Introduction to Systems Architecture. Chapter goals Discuss the development of automated computing Describe the general capabilities of a computer.
Introduction to Systems Analysis and Design
Paper on Best implemented scientific concept for E-Governance Virtual Machine By Nitin V. Choudhari, DIO,NIC,Akola By Nitin V. Choudhari, DIO,NIC,Akola.
Virtualization and Open source Software Mr. Lau Ka Lun – Lai King Catholic Secondary School Date: 9 th, 21 st, 22 nd March, 2011.
Hands-On Microsoft Windows Server 2008 Chapter 1 Introduction to Windows Server 2008.
1 The Virtual Reality Virtualization both inside and outside of the cloud Mike Furgal Director – Managed Database Services BravePoint.
Cloud Computing WG (initiative in AFACT) Institute For Information Industry.
Paper on Best implemented scientific concept for E-Governance projects Virtual Machine By Nitin V. Choudhari, DIO,NIC,Akola.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 7 2/23/2015.
Framework for Automated Builds Natalia Ratnikova CHEP’03.
1 port BOSS on Wenjing Wu (IHEP-CC)
Cloud Computing 1. Outline  Introduction  Evolution  Cloud architecture  Map reduce operation  Platform 2.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 2.
Introduction to Cloud Computing
Installing Windows Vista Lesson 2. Skills Matrix Technology SkillObjective DomainObjective # Performing a Clean Installation Set up Windows Vista as the.
CERN IT Department CH-1211 Genève 23 Switzerland t Internet Services Job Monitoring for the LHC experiments Irina Sidorova (CERN, JINR) on.
Predrag Buncic (CERN/PH-SFT) CernVM - a virtual software appliance for LHC applications C. Aguado-Sanchez 1), P. Buncic 1), L. Franco 1), A. Harutyunyan.
Predrag Buncic (CERN/PH-SFT) CernVM - a virtual software appliance for LHC applications C. Aguado-Sanchez 1), P. Buncic 1), L. Franco 1), S. Klemer 1),
Predrag Buncic (CERN/PH-SFT) WP9 - Workshop Summary
ALICE Offline Week | CERN | November 7, 2013 | Predrag Buncic AliEn, Clouds and Supercomputers Predrag Buncic With minor adjustments by Maarten Litmaath.
Chapter 2 Introduction to Systems Architecture. Chapter goals Discuss the development of automated computing Describe the general capabilities of a computer.
Presentation Name / 1 Visual C++ Builds and External Dependencies NAME.
VMWare Workstation Installation. Starting Vmware Workstation Go to the start menu and start the VMware Workstation program. *Note: The following instructions.
Predrag Buncic (CERN/PH-SFT) Introduction to WP9 Portable Analysis Environment Using Virtualization Technology IBM-VM 360, CERNVM,
Grid User Interface for ATLAS & LHCb A more recent UK mini production used input data stored on RAL’s tape server, the requirements in JDL and the IC Resource.
Microsoft Management Seminar Series SMS 2003 Change Management.
NA61/NA49 virtualisation: status and plans Dag Toppe Larsen CERN
WLCG Overview Board, September 3 rd 2010 P. Mato, P.Buncic Use of multi-core and virtualization technologies.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) Giuseppe Andronico INFN Sez. CT / Consorzio COMETA Beijing,
© Copyright 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP Restricted Module 7.
2012 Objectives for CernVM. PH/SFT Technical Group Meeting CernVM/Subprojects The R&D phase of the project has finished and we continue to work as part.
Development of e-Science Application Portal on GAP WeiLong Ueng Academia Sinica Grid Computing
Predrag Buncic (CERN/PH-SFT) Virtualizing LHC Applications.
CS223: Software Engineering Lecture 2: Introduction to Software Engineering.
SPI NIGHTLIES Alex Hodgkins. SPI nightlies  Build and test various software projects each night  Provide a nightlies summary page that displays all.
The CernVM Infrastructure Insights of a paradigmatic project Carlos Aguado Sanchez Jakob Blomer Predrag Buncic.
Predrag Buncic (CERN/PH-SFT) Software Packaging: Can Virtualization help?
2nd ASPERA Workshop May 2011, Barcelona, Spain P. Mato /CERN.
Enabling Grids for E-sciencE INFSO-RI Enabling Grids for E-sciencE Gavin McCance GDB – 6 June 2007 FTS 2.0 deployment and testing.
Predrag Buncic (CERN/PH-SFT) Virtualization – the road ahead.
The CernVM Project A new approach to software distribution Carlos Aguado Jakob Predrag
36 th LHCb Software Week Pere Mato/CERN.  Provide a complete, portable and easy to configure user environment for developing and running LHC data analysis.
NA61 Collaboration Meeting CERN, December Predrag Buncic, Mihajlo Mudrinic CERN/PH-SFT Enabling long term data preservation.
Predrag Buncic (CERN/PH-SFT) CernVM Status. CERN, 24/10/ Virtualization R&D (WP9)  The aim of WP9 is to provide a complete, portable and easy.
EGI-InSPIRE RI EGI Webinar EGI-InSPIRE RI Porting your application to the EGI Federated Cloud 17 Feb
Meeting with University of Malta| CERN, May 18, 2015 | Predrag Buncic ALICE Computing in Run 2+ P. Buncic 1.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Journées Informatiques de l'IN2P May 2010, Aussois, France P. Mato /CERN.
Claudio Grandi INFN Bologna Virtual Pools for Interactive Analysis and Software Development through an Integrated Cloud Environment Claudio Grandi (INFN.
Building on virtualization capabilities for ExTENCI Carol Song and Preston Smith Rosen Center for Advanced Computing Purdue University ExTENCI Kickoff.
Predrag Buncic (CERN/PH-SFT) Virtualization R&D (WP9) Status Report.
Fermilab Scientific Computing Division Fermi National Accelerator Laboratory, Batavia, Illinois, USA. Off-the-Shelf Hardware and Software DAQ Performance.
Predrag Buncic, CERN/PH-SFT The Future of CernVM.
CHEP 2010 Taipei, 19 October Predrag Buncic Jakob Blomer, Carlos Aguado Sanchez, Pere Mato, Artem Harutyunyan CERN/PH-SFT.
Canadian Bioinformatics Workshops
Unit 3 Virtualization.
Let's talk about Linux and Virtualization in 'vLAMP'
Virtualisation for NA49/NA61
Blueprint of Persistent Infrastructure as a Service
Dag Toppe Larsen UiB/CERN CERN,
Progress on NA61/NA49 software virtualisation Dag Toppe Larsen Wrocław
Dag Toppe Larsen UiB/CERN CERN,
Virtualisation for NA49/NA61
CernVM Status Report Predrag Buncic (CERN/PH-SFT).
Traditional Virtualized Infrastructure
Chapter-1 Computer is an advanced electronic device that takes raw data as an input from the user and processes it under the control of a set of instructions.
Presentation transcript:

DPHEP Workshop CERN, December Predrag Buncic (CERN/PH-SFT) CernVM R&D Project Portable Analysis Environments using Virtualization technology

DPHEP Workshop CERN, December Overview Introduction  Software packaging and distribution problem  Job execution environment problem  Long term software preservation and accountability Can virtualization technology help us solve all this problems at the same time? CernVM Project  Status and perspective Conclusions

DPHEP Workshop CERN, December Problem #1 LHC  Millions of lines of code C++, C, Fortran, python, perl…  Different packaging and software distribution models Complicated software installation/update/configuration procedure  Needs an expert knowledge and lots of patience by the end user (physicist) to keep up And many just give up and use shared resources like lxplus Recent technology trends  Moore’s law will still hold but instead of faster CPU we will get multi and many cores in the same package How can profit from (multi and many core) CPU power of user workstations and use them at least as development platform?

DPHEP Workshop CERN, December Problem #2 Distributed nature of data processing (Grid)  meant to be a solution and not part of the problem Heterogeneous set of resources  In terms of OS, QoS, configuration  Independent administration domains and evolution cycles  Potential source of many runtime errors Very complex and difficult to maintain middleware software stack  Trying to create illusion of Grid as homogeneous resource  Long and slow validation and certification process focused on middleware and not on applications, difficult to do OS upgrades How to decouple application and infrastructure lifecycles and assure a homogeneous job execution environment compatible with one in which application was developed?

DPHEP Workshop CERN, December Problem #3 LHC experiments will run for many years  Hardware and software environment will inevitably (possibly dramatically) change in the lifespan of the experiments  Physicists like to constantly change their software (algorithms) but do not like when external changes in infrastructure require changes in the their applications In order to assure capability of the experiments some time in the future to (re)analyze their entire data sample we need to 1. Conserve and transform all data on the media that can be affectively used in that point in time 2. Be able to use exactly the same software as it was used when original files were created and be able to modify algorithms and apply new algorithms them on the same old data How to preserve experiment software and keep it usable and accountable over many years?

DPHEP Workshop CERN, December An example: NA49/NA61 NA49 experience  Software written in 1994  data taken in are still being used for physics analysis and publications,  most of the raw data lost: obsolete medium (Sony tapes) and tape drives not supported by CERN/ IT → new reconstruction is not possible  reconstruction/simulation software written in C, C++ and Fortran  very difficult (manpower/time) to maintain reconstruction/simulation software in a view of frequent changes of operating systems and compilers The above significantly limits the possibility to use the NA49 data for future analysis

DPHEP Workshop CERN, December An example: NA49/NA61 NA61 (successor of NA49 in terms of hardware and software) started data taking in 2009, and recorded data (100 TB) for 9 different reactions  the data taking should continue over the next 5 years with up 40 reactions (1000 TB) to be recorded,  based on the NA49 experience we expect that physics interest in the NA61 data analysis  may continue for many years To avoid difficulties that NA49 had  raw data should be preserved over the whole analysis period  software should be made independent of underlying OS and physical hardware  the reconstruction/simulation results obtained in 2009 should be reproducible in 20 years from now In such case term “data preservation” means a lot more more than conserving 4-vectors but represents a practical problem seen by the running experiment

DPHEP Workshop CERN, December Can virtualization technology help us to solve all these problems at the same time?

DPHEP Workshop CERN, December R&D Project in PH Department (WP9), started in 2007, 4 years Aims to provide a complete, portable and easy to configure user environment for developing and running LHC data analysis locally and on the Grid independent of physical software and hardware platform (Linux, Windows, MacOS)  Code check-out, edition, compilation, local small test, debugging, …  Grid submission, data access…  Event displays, interactive data analysis, …  Suspend, resume… Decouple application lifecycle from evolution of system infrastructure Reduce effort to install, maintain and keep up to date the experiment software Web site: 9 CernVM Project

DPHEP Workshop CERN, December Application Deployment Today OS Application Libraries Tools Databases Hardware Traditional model  Horizontal layers  Independently developed  Maintained by different groups  Different lifecycle Application is deployed on top of the stack  Breaks if any layer changes  Needs to be certified every time when something changes  Results in deployment nightmare Incompatible with “long term stability and accountability” requirement

DPHEP Workshop CERN, December Virtual Machine CernVM Approach OS Libraries Tools Databases Application Vertical integration  Starting from the application  Finding its requirements and dependencies  Adding required tools and libraries  Building minimal OS  Bundling all this into Virtual Machine image Application is in control  Virtual Machines can be versioned like applications Virtualization technology  Enables decoupling of OS running in VM from the physical hardware

DPHEP Workshop CERN, December Virtualizing Applications  Installable CD/DVD  Stub Image  Raw Filesystem Image  Netboot Image  Compressed Tar File  Demo CD/DVD (Live CD/DVD)  Raw Hard Disk Image  Vmware ® Virtual Appliance  Vmware ® ESX Server Virtual Appliance  Microsoft ® VHD Virtual Apliance  Xen Enterprise Virtual Appliance  Virtual Iron Virtual Appliance  Parallels Virtual Appliance  Amazon Machine Image  Update CD/DVD  Appliance Installable ISO Starting from experiment software… …ending with a custom Linux specialised for a given task

DPHEP Workshop CERN, December Conary Package Manager Every build and every file installed on the system is automatically versioned and accounted for in database

DPHEP Workshop CERN, December

DPHEP Workshop CERN, December CernVM architecture The experiments are packaging gigabytes of code per release  but really use only fraction of it at runtime CernVM downloads only what is really needed and puts it in the cache  Does not require persistent network connection (offline mode)  Minimal impact on the network  Defines common platform that can be used by all experiments/projects

DPHEP Workshop CERN, December ~1500 different IP addresses CernVM users

DPHEP Workshop CERN, December Login to Web interface 2. Create user account 3. Select experiment, appliance flavor and preferences As easy as 1,2,3

DPHEP Workshop CERN, December CernVM as job hosting environment on Cloud/Grid  Ideally, users would like to run their applications on the grid (or cloud) infrastructure in exactly the same conditions in which they were originally developed CernVM already provides development environment and can be deployed on cloud (EC2)  One image supports all four LHC experiments  Easily extensible to other communities In this model, it is essential to allow experiments run their own virtual images (=applications) on Grid/Cloud

DPHEP Workshop CERN, December Conclusions Long term data preservation without equivalent mechanism for software preservation is of limited usefulness to the experiments Technology exists today to aide this process and is likely to stay with us  Virtualization enables Cloud computing and that seems to be where all industry is going Using virtualization to achieve long term software preservation requires to  Make end users (physicist) feel comfortable to use virtual machine on their desktop/laptop to carry out daily work on developing and using experiment software  Convince resource providers to accept this technology and run virtual machines in places of batch jobs that they run today  Give application central role and derive OS from it while keeping track of all system components in an external version control system CernVM Software Appliance  Used by ATLAS, LHCb and to lesser extent ALICE and CMS  Already deployable on the cloud (EC2, Nimbus)  Can be deployed on unmanaged infrastructure like BOINC  Based on innovative second generation package manager  Provides versioning of every build and build products installed on the system  It is sufficient to record only a full version string of top level package group to be able to reconstruct a full VM