CernVM-FS Operations in the CERN IT Storage Group
Dan van der Ster (CERN IT-ST)
CernVM Users Workshop, 6-8 June 2016

Outline
- The new ops team
- Status in numbers
- Our (mostly virtual) architecture
- Evolving requirements
- Potential future architectures
- Client matters

Introductions
- The CERN Stratum 0/1 and Squid services are now part of the IT Storage Group (CASTOR, EOS, AFS, NFS, Ceph, and now CVMFS).
- Team: Dan van der Ster, Hervé Rousseau.
- We inherited a flexible, clean service from Steve Traylen.

Stratum Zero Numbers: 30 repositories across 19 release manager machines.

Stratum Zeroes

REPO                          SIZE
aleph.cern.ch                 594M
alice.cern.ch                 370G
alice-ocdb.cern.ch            1.1T
ams.cern.ch                   2.5T
atlasbuilds.cern.ch           709M
atlas.cern.ch                 1.1T
bbp.epfl.ch                   637M
belle.cern.ch                 76G
boss.cern.ch                  25G
clicdp.cern.ch                384K
cms.cern.ch                   2.4T
cms-opendata-conddb.cern.ch   60G
compass.cern.ch               512K
cvmfs-config.cern.ch          384K
delphi.cern.ch                12G
fcc.cern.ch                   585M

Numbers according to ZFS.

Stratum Zeroes
Plus a few zeroes not operated by CERN IT: cms-ib.cern.ch, atlas-nightlies.cern.ch, cernvm-prod.cern.ch.

REPO                  SIZE
ganga.cern.ch         1.1G
geant4.cern.ch        86G
grid.cern.ch          26G
lhcb.cern.ch          1.1T
lhcbdev.cern.ch       742G
moedal.cern.ch        512K
na49.cern.ch          392M
na61.cern.ch          9.1G
na62.cern.ch          362M
opal.cern.ch          384K
sft.cern.ch           492G
test.cern.ch          31G
wlcg-clouds.cern.ch   384K

Numbers according to ZFS.
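The sizes above are as reported by ZFS; a minimal sketch of how such numbers could be gathered on a ZFS-backed release manager, assuming one dataset per repository under a pool named cvmfs (pool and dataset names are illustrative, not the production layout):

# Per-repository space usage (pool/dataset names are examples)
zfs list -r -o name,used,avail cvmfs

# Total used space across all datasets, in GiB
zfs list -Hp -r -o used cvmfs | awk '{ sum += $1 } END { printf "%.1f GiB\n", sum / 2^30 }'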

Architecture

CERN Stratum Zero Architecture
Fully virtual stratum-0:
- cvmfs-stratum-zero.cern.ch: alias to the best of a few small stateless VMs (zero*.cern.ch), which run Apache reverse proxies (mod_proxy) to hide the release manager machines.
- lxcvmfs*.cern.ch (a.k.a. cvmfs-<repo>.cern.ch): large-ish VMs with attached Ceph block storage; the release managers work here.
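As an illustration of that reverse-proxy layer, here is a hedged sketch of what one of the zero*.cern.ch nodes could carry; the backend hostname lxcvmfs42.cern.ch, the file name, and the paths are placeholders, not the production configuration:

cat > /etc/httpd/conf.d/cvmfs-zero.conf <<'EOF'
# Reverse proxy in front of a release manager machine (illustrative only)
<VirtualHost *:80>
    ServerName cvmfs-stratum-zero.cern.ch
    ProxyRequests Off                 # reverse proxy only, never a forward proxy
    ProxyPreserveHost On
    ProxyPass        /cvmfs/ http://lxcvmfs42.cern.ch/cvmfs/
    ProxyPassReverse /cvmfs/ http://lxcvmfs42.cern.ch/cvmfs/
</VirtualHost>
EOF
systemctl reload httpd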

Virtual Release Manager Machines
- /var/spool/cvmfs/<repo>.cern.ch is Ceph block storage: flexible, reliable, durable; tunable QoS (a.k.a. IOPS/throughput); thinly provisioned and resizable.
- /var/spool/cvmfs/<repo>.cern.ch is ZFS: snapshotting, incremental backups via replication to Wigner; good performance and data integrity.
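Reading the slide as a Ceph RBD volume formatted with ZFS, a rough sketch of how such a release manager volume could be assembled and backed up; the image, pool, dataset, and host names are illustrative assumptions:

# Thinly provisioned Ceph RBD image, mapped as a local block device
rbd create volumes/lxcvmfs-example --size 2097152    # size in MB => 2 TiB
rbd map volumes/lxcvmfs-example                      # appears as e.g. /dev/rbd0

# ZFS pool on top of it, hosting the repository spool area
zpool create cvmfs /dev/rbd0
zfs create -o mountpoint=/var/spool/cvmfs cvmfs/spool

# Snapshot, then ship incremental backups to a remote pool (e.g. in Wigner)
zfs snapshot cvmfs/spool@today
zfs send -i cvmfs/spool@yesterday cvmfs/spool@today | \
    ssh backup.example.ch zfs receive -F backup/cvmfs-spool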

CERN Stratum One
- cvmfs-backend.cern.ch: a single large physical disk server, reliable and fast enough. One large md RAID10 ext4 filesystem, which will need to be expanded rather soon (/dev/md5: 11T size, 9.5T used, 743G available, 93% full, mounted on /srv). It is a single point of failure.
- cvmfs-stratum-one.cern.ch: Squids with around 300GB of attached Ceph volumes (these could be grown).

“OurProxy” Service
- Squids in Meyrin and Wigner with attached Ceph volumes (~200GB each).
- Quite reliable, no problems.
- Working with the Analysis WG to send all squid logs to HDFS.
- In the meantime, Hervé wrote a simple Graphite probe to plot the squid_status metrics.
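The actual Graphite probe isn't reproduced in the slides; as a hedged sketch of the idea, one could scrape Squid's cache-manager counters and push them over Graphite's plaintext protocol (the Graphite host, metric prefix, and choice of counters are assumptions, not the production probe):

#!/bin/bash
# Illustrative squid-to-graphite probe, not the production script
GRAPHITE_HOST=graphite.example.ch        # placeholder
PREFIX="cvmfs.squid.$(hostname -s)"
NOW=$(date +%s)

squidclient -h localhost mgr:counters | \
  awk -v p="$PREFIX" -v t="$NOW" \
      '$1 ~ /^client_http\.(requests|hits|kbytes_out)$/ { printf "%s.%s %s %s\n", p, $1, $3, t }' | \
  nc -w 1 "$GRAPHITE_HOST" 2003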

Future

Moving out of AFS
- Motivated by the general decline in the AFS community/project.
- No hard deadline, but we are pushing things as much as we can; hoping to clean up during LS2 (2019).
- No single replacement product, but CVMFS is part of the solution.
- We expect growing CVMFS usage, in both the size and number of repositories:
  - ATLAS has already requested a ~20TB stratum 0 for nightlies. Total AFS project space for ATLAS is ~60TB (CMS around ~10TB); deduplication in CVMFS will hopefully decrease this space requirement substantially.
  - There are >250 "project spaces" in AFS; it is unclear which of these will end up in /cvmfs or /eos (or something else…).
  - Some repos will need restricted access – secure CVMFS development?

Pain points as we grow the service
- Too many repos:
  - Operating many more lxcvmfs* nodes will become a burden, so we need to scale up the machines to put more repos per node, or do something completely different (S3, NFS, CephFS, etc.).
  - Signing the whitelists is time consuming and error prone: typing the same PIN 30 times, twice a month.
  - The release manager nodes are almost equivalent to lxplus; it would be nice to separate the stratum zero servers from the interactive nodes.
- Size of the repos:
  - Scaling the storage itself won't be a problem at CERN, but it puts new requirements on the stratum-1s.
  - The "nightly build" use-cases will have a high rate of change; maybe we shouldn't replicate these to stratum-1s.
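For the whitelist-signing pain point, a hedged sketch of batching the resigning so all repositories on a release manager are handled in one pass; cvmfs_server list and cvmfs_server resign are standard commands, while the idea that the PIN is then requested only once depends on the local smartcard middleware and is an assumption:

# Resign the whitelist of every repository hosted on this node
for repo in $(cvmfs_server list | awk '{print $1}'); do
    cvmfs_server resign "$repo"
done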

Faster Nightly Stratum-Zeros
Prototyping a new architecture for the nightlies use-cases:
- The biggest VM we can get: 16 CPUs, 32GB RAM, local fast SSD.
- Attach a huge Ceph volume (20TB in the case of ATLAS).
- Use part of the local SSD as a ZFS write-ahead log (ZIL).
- Disable garbage collection at publish time (run it every few days instead).
- Clients mount the stratum-zero directly (perhaps via a squid).
- If this doesn't work, we'll have to evaluate other backend storage technologies (S3, CephFS, …).
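A rough sketch of that prototype layout, assuming the huge Ceph volume appears as an RBD block device and a slice of the local SSD serves as the ZFS intent log; the device names, pool layout, repository name, and cron schedule are only examples:

# Main vdev on the Ceph volume, log device (ZIL) on a local SSD partition
zpool create nightlies /dev/rbd0 log /dev/sda4
zfs create -o mountpoint=/srv/cvmfs nightlies/srv

# With automatic garbage collection disabled (CVMFS_AUTO_GC=false in the
# repository's server.conf), run gc every few days from cron instead:
# 0 3 */3 * * root cvmfs_server gc -f atlas-nightlies.cern.ch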

Clients

CERNOps-CvmFS Puppet Module
- Currently at version 2.0.0; the next version will deprecate all explicit hiera calls.
- Supports clients and stratum 0 well; stratum 1 support should be improved.
- Now used as part of the Puppet-on-desktop project at CERN, so desktops can get CVMFS more easily.
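As a hedged illustration of using the module on a client (not taken from the slides): the Forge name, class parameters, and the cvmfs::mount defined type are assumed from the module's public interface and may differ between versions; the proxy URL and quota are example values only.

puppet module install cernops-cvmfs     # assumed Forge name

# One-off masterless apply; parameter and type names are assumptions
puppet apply -e '
  class { "cvmfs":
    cvmfs_http_proxy  => "http://ca-proxy.cern.ch:3128",
    cvmfs_quota_limit => 20000,
  }
  cvmfs::mount { "atlas.cern.ch": }
'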

Client Monitoring at CERN: syslog to Elasticsearch (ES) & Kibana.
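The forwarding pipeline itself isn't shown in the slide; a hedged sketch of the client side could be an rsyslog drop-in like the following, where the collector host, port, and program-name match are assumptions:

cat > /etc/rsyslog.d/30-cvmfs.conf <<'EOF'
# Forward CVMFS client messages (the cvmfs2 process logs to syslog)
# to a central collector feeding Elasticsearch/Kibana. Host/port are placeholders.
if $programname startswith 'cvmfs2' then @@logcollector.example.ch:5140
EOF
systemctl restart rsyslog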

CVMFS on CERN Desktops
- A replacement of lcm (Local Configuration Manager, the Quattor-based tool) for CERN CentOS 7 is entering the early test stage.
- The new tool, called locmap (LOcal Configuration with MAsterless Puppet), can be installed and used in parallel with an existing lcm installation.
- It configures the same components as lcm using Puppet modules, but we added a new cvmfs component.
- Ongoing: distribute CVMFS and locmap in the standard CERN repositories. ETA: Q3.
- Replace lcm/ncm components by locmap/puppet modules for the default CC7 installation. ETA: when it's ready, but not before CC7.3.

# locmap --list
[Available Modules]
Module name : sudo      [enabled]
Module name : sendmail  [enabled]
Module name : ntp       [enabled]
Module name : kerberos  [enabled]
Module name : cvmfs     [disabled]
Module name : ssh       [enabled]
Module name : nscd      [enabled]
Module name : afs       [enabled]
# locmap --enable cvmfs
# locmap --configure cvmfs
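After enabling and configuring the cvmfs module as above, the client can be checked with the standard CVMFS tools (this verification step is not part of locmap itself; the repository name is just an example):

# Sanity-check the client setup and probe the configured repositories
cvmfs_config chksetup
cvmfs_config probe
ls /cvmfs/atlas.cern.ch        # example repository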

CVMFS Docker Volume Plugin
- Docker volume plugins are supported since Docker 1.8, providing integration with external storage systems.
- The CERN Cloud team provides a CVMFS plugin that manages bind mounts between the host and containers, with a nicer interface for volume creation and deletion.
- Packaging is provided for CentOS 7, with instructions for Debian / Ubuntu.
- Available by default in the CERN Cloud Container Service.

CVMFS Docker Volume Plugin
Creating a volume:
> docker volume create -d cvmfs --name cms.cern.ch
cms.cern.ch
> docker volume ls
DRIVER  VOLUME NAME
cvmfs   cms.cern.ch
> docker volume rm cms.cern.ch
cms.cern.ch

Launching a new container with a volume:
> docker run -it --volume-driver cvmfs -v cms.cern.ch:/cms centos:7 /bin/bash
/]# ls /cms/
bootstrap_slc5_amd64_gcc462.log  cmssw.git

Conclusion
- We are the new caretakers of the CERN Stratum 0/1s and Squids, and are ramping up our knowledge and experience of this service.
- The (virtual) infrastructure runs well, and the decline of AFS means we'll see growth in CVMFS.
- Investigating improvements in repository publishing speed as well as scalability (size and number of repos).
- Interesting work on the client side to integrate with new platforms.