Elastic CNAF Datacenter extension via opportunistic resources INFN-CNAF

INFN
The National Institute for Nuclear Physics (INFN) is a research institute funded by the Italian government. It is composed of several units:
– 20 units located in the main Italian university physics departments
– 4 laboratories
– 3 national centers dedicated to specific tasks
CNAF is the national center dedicated to computing applications.

The Tier-1 at INFN-CNAF
● WLCG Grid site dedicated to HEP computing for the LHC experiments (ATLAS, CMS, LHCb, ALICE); also serves ~30 other scientific groups
● Worker nodes and computing slots totalling 200k HS06 and counting
● LSF as the current batch system, with a migration to HTCondor foreseen
● 22 PB of SAN disk (GPFS) and 27 PB on tape (TSM), integrated as an HSM
● Also supporting long-term data preservation (LTDP) for the CDF experiment
● Dedicated network channel (LHC OPN, 20 Gb/s) to the CERN Tier-0 and the other T1s, plus 20 Gb/s (LHC ONE) towards most of the T2s
● 100 Gb/s connection expected in 2017
● Member of the HNSciCloud European project for testing hybrid clouds for scientific computing

[Network diagram: CNAF Tier-1 connectivity through GARR (BO1, Mi1) to LHC OPN, LHC ONE and General IP. A 40 Gb physical link (4x10 Gb) is shared by LHCOPN and LHCONE; 20 Gb/s are used for general IP connectivity. Peers include the main Tier-2s and the T1s RAL, SARA, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF, IN2P3, RRC-KI, JINR and KR-KISTI.]

Extension use-cases
● Elastic opportunistic computing with transient Aruba resources; CMS selected for test and setup
● ReCaS/Bari: extension and management of remote resources
– These will become pledged resources for CNAF
[Map: Bologna (CNAF), Arezzo (Aruba), Bari (ReCaS)]

Use-case 1: Aruba

Pros of opportunistic computing
● CMS
– Take advantage of (much) more computing resources
– Con: transient availability
● Aruba
– A case study in providing otherwise unused resources to an “always hungry” customer
● INFN-T1
– Test transparent utilization of remote resources for HEP, whether proprietary or opportunistic

Aruba
● One of the main Italian commercial resource providers
– Web hosting, mail, cloud, ...
● Main datacenter in Arezzo (near Florence)

The CMS experiment at INFN-T1
● 48k HS06 of CPU power, 4 PB of online disk storage and 12 PB of tape
● All major computing activities are implemented here:
– Monte Carlo simulation
– Reconstruction
– End-user analysis
● The four LHC experiments are close enough in requirements and workflows that the extension to the other three is under development

The use-case
● Early agreement between CNAF and Aruba
– Aruba provides a pool of virtual resources (CPU cycles, RAM, disk) to deploy a remote testbed, managed through a VMware dashboard
– When Aruba customers require more resources, the CPU frequency of the testbed VMs is lowered to a few MHz (the VMs are not destroyed)
● Goal
– Transparently join these external resources “as if they were” in the local cluster, and have LSF dispatch jobs there when they are available
– Tied to CMS-only specifications for the moment
– Once fully tested and verified, the extension to other experiments is trivial for the other LHC experiments and to be studied for non-LHC VOs

VM management via VMware
● Proved to be rock solid and extremely versatile
● A WN image from our WN-on-demand system (WNoDeS) was imported seamlessly, then adapted and contextualized
[Screenshot: resources allocated to our datacenter]

The CMS workflow at CNAF
● Grid pilot jobs are submitted to the CREAM CEs
– Late binding: we cannot know in advance what kind of activity a pilot is going to perform
● Multicore only: 8-core (8-slot) jobs; CNAF dedicates a dynamic partition of WNs to such jobs
● SQUID proxy for software and conditions DB access
● Input files on the local GPFS disk, with fallback via Xrootd; O(GB) file sizes
● Output files staged through SRM (StoRM) at CNAF

The dynamic multicore partition
CMS jobs run in a dynamic subset of hosts dedicated to multicore-only jobs. The elastic resources must be members of this subset.

Adapting CMS for Aruba
Main idea: a transparent extension; remote WNs join the LSF cluster at boot “as if” they were local to the cluster.
Problems:
● Remote virtual WNs need read-only access to the cluster shared file system (/usr/share/lsf)
● VMs have private IPs and sit behind NAT and a firewall with outbound connectivity only, yet they have to be reachable by LSF
● LSF needs host resolution (IP ↔ hostname), but no DNS is available for such hosts

Adapting CMS for Aruba
Solutions:
● Read-only access to the cluster shared file system: provided through GPFS/AFM
● Host resolution: LSF uses its own version of /etc/hosts; this requires declaring a fixed set of virtual nodes (see the sketch below)
● Networking problems: solved using dynfarm, a service developed at CNAF to integrate LSF with virtualized computing resources
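
Because LSF resolves the virtual nodes through its own hosts file rather than DNS, the fixed pool of VMs has to be enumerated up front. A minimal sketch of how such entries could be generated, assuming a hypothetical hostname convention and private subnet (neither is taken from the talk):

import ipaddress

# Hypothetical values: the real pool size, subnet and naming scheme are not given in the talk.
POOL_SIZE = 10                                   # e.g. 10 x 8-core VMs
SUBNET = ipaddress.ip_network("10.10.0.0/24")    # private network of the VMs (assumed)
NAME = "aruba-wn-{:02d}"                         # hostname convention (assumed)

def hosts_entries(pool_size=POOL_SIZE):
    """Return one 'IP  hostname' line per virtual worker node."""
    hosts = list(SUBNET.hosts())[:pool_size]
    return [f"{ip}  {NAME.format(i)}" for i, ip in enumerate(hosts, start=1)]

if __name__ == "__main__":
    # These lines would be appended to the hosts file that LSF consults.
    print("\n".join(hosts_entries()))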

Remote data access via GPFS AFM
● GPFS AFM is a cache providing a geographic replica of a file system; it manages read/write access to the cache
● Two sides:
– Home: where the information lives
– Cache: data written to the cache is copied back to home as quickly as possible; data is copied into the cache when requested
● Configured as read-only for the site extension

Dynfarm concepts
● At boot, the VM connects to an OpenVPN-based service at CNAF, which authenticates the connection (X.509)
● The service delivers the parameters to set up a tunnel with (only) the required services at CNAF (LSF, CEs, Argus)
● Routes to the private IPs of the VMs are defined on each server (GRE tunnels)
● All other traffic flows through the general network
A conceptual sketch of this boot-time flow follows.
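
The sketch below is not the actual dynfarm code: the file paths, the payload format and the parameter names are invented for illustration only. It shows what the VM-side boot step amounts to: start the OpenVPN client with its X.509 credentials, read the parameters pushed by the server, and install routes only towards the permitted CNAF services.

import json
import subprocess

VPN_CONFIG = "/etc/openvpn/dynfarm.conf"      # assumed path; references the X.509 cert/key
PARAMS_FILE = "/var/run/dynfarm/params.json"  # assumed location of the delivered parameters

def start_vpn():
    # OpenVPN authenticates the client against the CNAF service via X.509.
    subprocess.run(["openvpn", "--config", VPN_CONFIG, "--daemon"], check=True)

def load_parameters(path=PARAMS_FILE):
    # Parameters delivered by the dynfarm server: which CNAF services
    # (LSF master, CEs, Argus) the VM may reach, and through which tunnel device.
    with open(path) as fh:
        return json.load(fh)

def add_routes(params):
    # Only traffic towards the listed services goes through the tunnel;
    # everything else keeps flowing through the general-purpose network.
    for service_ip in params["service_ips"]:
        subprocess.run(["ip", "route", "add", service_ip, "dev", params["tunnel_device"]],
                       check=True)

if __name__ == "__main__":
    start_vpn()
    add_routes(load_parameters())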

Dynfarm deployment
● VPN server side (the VPN server at CNAF), two RPMs: dynfarm-server and dynfarm-client-server
– The first installation creates a dynfarm_cred.rpm package, which must be present on the VMs
● VM side, two RPMs: dynfarm_client and dynfarm_cred (the latter contains the CA certificate used by the VPN server and a key used by dynfarm-server)
● Management: remote_control

Dynfarm workflow
[Diagram: dynfarm workflow]

Results
● First successful attempts date from June 2015
● Different configurations (tuning) have followed since

Results
● 160 GHz total amount of CPU (Intel 2697-v3)
– Assuming 2 GHz/core → 10 x 8-core VMs (with possible overbooking)
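
Spelled out, the sizing arithmetic is: 160 GHz / (2 GHz per core) = 80 equivalent cores, i.e. 10 VMs x 8 cores each.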

Results
● Currently the remote VMs run the very same jobs delivered to CNAF by GlideinWMS
● Job efficiency on the elastic resources can be very good for certain types of jobs (MC)
● A special GlideinWMS configuration can specialize the delivery for these resources
[Table: Queue, Site, Njobs, Avg_eff, Max_eff, Avg_wct, Avg_cpt for CMS_mc jobs on AR (Aruba) and T1 (CNAF).]

Use-case 2: ReCaS/Bari

Remote extension to ReCaS/Bari
● ~17.5k HS06: ~30 WNs with 64 cores and 256 GB RAM each
– 1 core / 1 slot, 4 GB/slot, 8.53 HS06/slot (546 HS06/WN)
● Dedicated network connection with CNAF: level-3 VPN, 20 Gb/s
– Routing through CNAF; the IPs of the remote hosts are in the same network range (plus x.y for IPMI access)
● Similar to the CERN/Wigner extension
● Direct and transparent access from CNAF
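
As a quick consistency check on these figures: 64 slots x 8.53 HS06/slot ≈ 546 HS06 per WN, and about 32 such WNs add up to the quoted ~17.5k HS06, in line with the rounded "~30 WN" count.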

Deployment
● Two infrastructure VMs to offload the network link: CVMFS and Frontier SQUID caches (used by ATLAS and CMS)
– SQUID requests are redirected to the local VMs
● Cache storage via GPFS/AFM: 2 servers, 10 Gbit, 330 TB (ATLAS, CMS, LHCb)
● The LSF shared file system is also replicated

Network traffic (4 weeks)
[Plot: network traffic over four weeks]

Current issues and tuning
● Latencies in the shared file system can cause trouble
– Intense I/O can lead to timeouts, e.g.:
ba-3-x-y: Feb 8 22:56:51 ba kernel: nfs: server nfs-ba.cr.cnaf.infn.it not responding, timed out
● CMS: fallback to Xrootd (excessive load on the AFM cache)

Comparative results
[Table: Queue, Node type, Njobs, Avg_eff, Max_eff, Avg_wct, Avg_cpt, comparing jobs on AR (Aruba), T1 (CNAF) and BA (Bari) nodes for the Cms_mc, Alice, Atlas_sc, Lhcb, Atlas_mc, Atlas and Cms_mcore queues.]

Conclusions

Aruba
● Got the opportunity to test our setup on a pure commercial cloud provider
● Developed dynfarm to extend our network setup
– The core dynfarm concept should be adaptable to other batch systems
● Gained experience with yet another cloud infrastructure: VMware
● Job efficiency is encouraging
– It will improve further once we are able to forward only non-I/O-intensive jobs to Aruba
● The scale of the test was quite small and did not hit any bottleneck
● Tested with CMS; other LHC experiments may join in the future
● Accounting is problematic due to the possible GHz reduction
● A good exercise for HNSciCloud too

ReCaS/Bari
● The T1-Bari farm extension is “similar” to CERN-Wigner
● Job efficiency (compared to the native T1) depends strongly on storage usage
– Better efficiency means the job on the WN is mainly CPU-bound (or the input file is already in the cache before the job starts)
● General scalability is limited by the bandwidth of the dedicated T1→BA link (20 Gb/s)
● Assistance on faulty nodes is somewhat problematic