1
Extending the farm to external sites: the INFN Tier-1 experience
Luca dell’Agnello, Andrea Chierici (INFN-CNAF), HEPiX Fall 2016
2
The Tier-1 at INFN-CNAF
- INFN-T1 started as a computing center for the LHC experiments; nowadays it provides services and resources to ~30 other scientific collaborations
- ~1,000 WNs, ~ computing slots, ~200 kHS06
  - Small HPC cluster available with InfiniBand (IBA) interconnect (~33 TFlops)
- 22 PB of SAN disk (GPFS), 43 PB on tape (TSM) integrated as an HSM
  - Supporting Long Term Data Preservation (LTDP) for the CDF experiment
- Dedicated network channel (60 Gb/s) for LHC OPN + LHC ONE
  - 20 Gb/s reserved for LHC ONE
  - Upgrade to a 100 Gb/s connection in 2017
3
Computing resource usage
- Computing resources at INFN-T1 are always fully used, with a large number of waiting jobs (~50% of the running jobs)
- A huge resource request (and increase) is expected in the coming years, mostly from the LHC experiments

[Figures: INFN Tier-1 farm usage in 2016; INFN Tier-1 resource increase]
4
Toward a (semi-)elastic Data Center?
- Planning to upgrade the data center to host the required resources at least until the end of LHC Run 3 (2023)
- Extending the farm (dynamically) to remote hosts as a complementary solution
  - Cloud bursting on a commercial provider: tests of "opportunistic computing" on cloud providers
  - Static allocation of remote resources: first production use case, part of the 2016 resources pledged to the WLCG experiments at INFN-T1 is hosted in Bari-ReCaS
- Participating in the HNSciCloud PCP project
5
Cloud bursting on commercial provider
6
Opportunistic computing on Aruba
- Aruba is one of the main Italian commercial resource providers (web, hosting, mail, cloud, …)
  - Main data center in Arezzo (near Florence)
- Small-scale test: VMs (160 GHz in total) running on VMware
- Use of idle CPU cycles: when a customer needs a resource we are using, the CPU clock speed of "our" VMs is reduced to a few MHz (the VMs are not destroyed)
- Only CMS multicore (mcore) jobs tested
- No storage on site: remote data access via XRootD (sketched below)
- Use of the GPN (no dedicated NREN infrastructure)
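Since the Aruba VMs have no local storage and the CMS jobs read their data over the WAN via XRootD, the following is a minimal sketch of such a remote read using the Python XRootD bindings (the `xrootd` package). The redirector URL and file path are placeholders, not the actual endpoints used in these tests.

```python
# Minimal sketch: a job on a storage-less remote VM reading data over
# the WAN via XRootD. The URL below is a placeholder.
from XRootD import client
from XRootD.client.flags import OpenFlags

URL = "root://cms-xrd-global.cern.ch//store/example/dataset/file.root"  # hypothetical path

f = client.File()
status, _ = f.open(URL, OpenFlags.READ)
if not status.ok:
    raise RuntimeError(f"remote open failed: {status.message}")

# Read the first 1 MiB; a real CMS job would hand the file to ROOT/CMSSW instead.
status, data = f.read(offset=0, size=1024 * 1024)
print(f"read {len(data)} bytes over the WAN")
f.close()
```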
7
Implementation
- Developed an in-house solution, dynfarm
  - It authenticates connection requests coming from remote hosts and delivers the information needed to create a VPN tunnel (a sketch of this handshake follows below)
  - Communication is enabled only between the remote WNs and the local CEs, LSF and Argus; all other traffic goes through the WN's default route
- Shared file system access through a R/O GPFS cache (AFM, see later for details)
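A hypothetical sketch of the dynfarm handshake as seen from a remote WN: the node authenticates against the dynfarm service at CNAF and receives back the parameters needed to bring up its VPN tunnel. The endpoint, field names and token scheme are illustrative assumptions; the slide only states that dynfarm authenticates the request and returns the tunnel information.

```python
# Hypothetical dynfarm registration call from a remote worker node.
import requests

DYNFARM_URL = "https://dynfarm.cr.cnaf.infn.it/register"  # hypothetical endpoint

payload = {
    "hostname": "wn-remote-001.example.org",  # the remote worker node
    "site": "aruba",
    "auth_token": "…",                        # pre-shared credential (assumed)
}

resp = requests.post(DYNFARM_URL, json=payload, timeout=30)
resp.raise_for_status()
tunnel = resp.json()

# Assumed reply: VPN endpoint plus the only destinations the tunnel
# should carry -- the CEs, the LSF master and the Argus server.
# All other traffic keeps using the WN's default route.
print(tunnel["vpn_endpoint"], tunnel["allowed_destinations"])
```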
8
Some figures
[Figure: one-week aggregated load; CPU capping at 160 GHz]
9
Results
- The remote VMs run the very same CMS jobs that run at INFN-T1
  - An ad hoc configuration at the GlideIn level could specialize job delivery for these resources
- Job efficiency (CPT/WCT) depends on the type of job (a small worked example follows below)
  - Very good for Monte Carlo (MC) jobs
  - Low on average (0.49 vs. 0.80 at INFN-T1)
- Guaranteed network bandwidth and/or a caching system could improve these figures
  - We foresee additional tests after installing a caching system
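The efficiency figure used on this slide is simply CPU time over wall-clock time; the numbers below reproduce the Aruba observation quoted above (0.49 on average vs. 0.80 for the same jobs at INFN-T1) with illustrative job durations.

```python
# Job efficiency as defined on this slide: CPT/WCT.
def job_efficiency(cpu_time_s: float, wall_clock_s: float) -> float:
    return cpu_time_s / wall_clock_s

# Illustrative values only: a 10-hour job that spent ~4.9 h on CPU
# while waiting on remote I/O has efficiency 0.49.
print(job_efficiency(4.9 * 3600, 10 * 3600))   # -> 0.49
```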
10
Static allocation of remote resources
11
Remote extension to Bari ReCaS
- 48 WNs (~26 kHS06) and ~330 TB of disk in the Bari-ReCaS data center allocated to the INFN-T1 farm for the WLCG experiments
  - Bari-ReCaS hosts a Tier-2 for CMS and ALICE
  - 10% of CNAF total resources, 13% of the resources pledged to the WLCG experiments
- Goal: direct and transparent access from INFN-T1
  - Our model is the CERN remote extension to Wigner
- Distance: ~600 km, RTT: ~10 ms
12
BARI – INFN-T1 connectivity
- Commissioning tests: Bari-ReCaS WNs are to be treated as if they were on the CNAF LAN (a minimal connectivity check is sketched below)
- L3 VPN configured: 2x10 Gb/s, MTU = 9000
  - A CNAF /22 subnet allocated to the Bari WNs
  - Service network (e.g. for IPMI) accessible
- Bari WNs route through CNAF, including LHCONE, LHCOPN and the GPN

[Figure: layout of the CNAF-BARI VPN. Distance: ~600 km, RTT: ~10 ms]
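A minimal sketch of the kind of commissioning check described above: verify the ~10 ms RTT and the MTU-9000 path between CNAF and a Bari-ReCaS WN. It uses standard iputils ping via subprocess; the target hostname is a placeholder.

```python
import subprocess

TARGET = "wn-ba-001.example.org"  # hypothetical Bari-ReCaS worker node

# Plain RTT measurement.
subprocess.run(["ping", "-c", "5", TARGET], check=True)

# Jumbo-frame test: 8972-byte payload + 28 bytes of headers = 9000 bytes,
# with fragmentation forbidden ("-M do"), so it fails if any hop on the
# L3 VPN has a smaller MTU.
subprocess.run(["ping", "-c", "5", "-M", "do", "-s", "8972", TARGET], check=True)
```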
13
Farm extension setup
- The INFN-T1 LSF master dispatches jobs also to the Bari-ReCaS WNs
  - Remote WNs are considered (and really appear) as local resources
  - CEs and other service nodes run only at INFN-T1
- Auxiliary services configured in Bari-ReCaS (see the client-side sketch below)
  - CVMFS Squid servers (for software distribution)
  - Frontier Squid servers (used by ATLAS and CMS for the conditions DB)
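The client-side counterpart of the auxiliary services above is to point the CVMFS clients on the Bari-ReCaS WNs at the local Squid servers. Below is a small sketch that writes the standard CVMFS proxy setting; the Squid hostnames are placeholders ("|" separates load-balanced proxies in the CVMFS_HTTP_PROXY syntax).

```python
from pathlib import Path

squids = ["squid-01.ba.example.org", "squid-02.ba.example.org"]  # hypothetical hosts
proxy_line = "|".join(f"http://{h}:3128" for h in squids)

# Standard CVMFS client configuration file.
config = f'CVMFS_HTTP_PROXY="{proxy_line}"\n'
Path("/etc/cvmfs/default.local").write_text(config)
print(config)
```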
14
Data access
- Data at CNAF is organized in GPFS file systems
  - Local access through POSIX, GridFTP, XRootD and HTTP
- Remotely mounting the file systems from CNAF is unfeasible (~100x the local RTT)
- Jobs expect to access data the same way as if they were running at INFN-T1
  - Not all experiments can use a fallback protocol
- A POSIX cache for the required data is therefore needed in Bari-ReCaS
  - ALICE uses XRootD only (no cache needed)
  - The cache is implemented with AFM (a GPFS native feature)
15
Remote data access via GPFS AFM
- AFM provides a geographic replica of a file system and manages R/W access to the cache
- Two sides:
  - Home: where the information lives
  - Remote: implemented as a cache
- Data written to the cache is copied back to home as quickly as possible; data is copied into the cache when requested
- AFM is configured R/O for Bari-ReCaS (a creation sketch follows below)
  - Several rounds of tuning and reconfiguration were required!
- In any case, we decided to avoid submitting high-throughput jobs to the remote hosts
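A hedged sketch of how a read-only AFM cache fileset can be created on the Bari (cache) side with the GPFS/Spectrum Scale mm-commands. The device, fileset, junction path and the NFS export of the CNAF home cluster are placeholder names, and the flags follow the AFM documentation as commonly shown; they should be checked against the installed Spectrum Scale release.

```python
import subprocess

DEVICE = "gpfs_ba"                            # cache file system at Bari (assumed name)
FILESET = "cms_cache"                         # cache fileset (assumed name)
HOME = "nfs://nfs-ba.cnaf.infn.it/gpfs/cms"   # hypothetical export of the CNAF home

# Create the AFM fileset in read-only mode, pointing at the CNAF home.
subprocess.run([
    "mmcrfileset", DEVICE, FILESET,
    "--inode-space", "new",
    "-p", "afmMode=ro",          # read-only cache, as configured for Bari-ReCaS
    "-p", f"afmTarget={HOME}",
], check=True)

# Link the cache under the POSIX path the jobs already expect.
subprocess.run(["mmlinkfileset", DEVICE, FILESET, "-J", f"/{DEVICE}/cms"], check=True)
```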
16
AFM cache layout
[Diagram: BARI-CNAF data access. The CNAF GPFS clusters (ATLAS on ddn9900, LHCb on ddn10k, CMS on ddn12k, LSF on DotHill) are exported through cNFS and accessed over AFM/NFS by the GPFS cache servers at Bari (Ds-ba-01/02, 330 TB, md38), which serve the Bari WNs.]
17
Results: Bari-ReCaS (1)
- Job efficiency at Bari-ReCaS is equivalent (or even better!)
  - In general, jobs at CNAF use WNs shared among several VOs; during the summer one of these (non-WLCG) submitted misconfigured jobs, affecting the efficiency of all the other VOs
  - ATLAS submits only low-I/O jobs to Bari-ReCaS
  - ALICE uses only XRootD, no cache
  - "Intense" WAN usage also from CNAF jobs
- The network appears not to be an issue
  - We can work without a cache if data comes via XRootD
  - A cache is mandatory for some experiments

Jobs at Bari-ReCaS:
  Experiment   NJobs    Efficiency
  ALICE        105109   0.87
  ATLAS        366999   0.94
  CMS           34626   0.80
  LHCb          39310   0.92

Jobs at CNAF:
  Experiment   NJobs    Efficiency
  ALICE        536361   0.86
  ATLAS             -   0.87
  CMS          326891   0.76
  LHCb         263376   0.88

Job efficiency = CPT/WCT (CPU processing time over wall-clock time)
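The numbers from the two tables above can be used to check the claim that per-experiment efficiency at Bari-ReCaS matches or beats the CNAF farm, and that the Bari job count is consistent with the ~550 k quoted on the next slide; this is just the table data restated in code.

```python
# Table data from this slide (NJobs, efficiency); ATLAS NJobs at CNAF is
# missing in the source and left as None.
bari = {"ALICE": (105109, 0.87), "ATLAS": (366999, 0.94),
        "CMS": (34626, 0.80), "LHCb": (39310, 0.92)}
cnaf = {"ALICE": (536361, 0.86), "ATLAS": (None, 0.87),
        "CMS": (326891, 0.76), "LHCb": (263376, 0.88)}

for vo in bari:
    print(f"{vo}: Bari {bari[vo][1]:.2f} vs CNAF {cnaf[vo][1]:.2f}")

# Total jobs run at Bari-ReCaS: ~546 k, consistent with "~550 k jobs".
print(sum(njobs for njobs, _ in bari.values()))
```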
18
Results: Bari-ReCaS (2)
- Production quality since June 2016
- ~550 k jobs (~8% of CNAF)
- ~2000 HS06 added
19
Conclusions
- Extending the INFN Tier-1 center with remote resources is possible
  - Two different implementations: INFN remote resources and a commercial cloud provider
- Even though we plan to upgrade the data center to host enough resources to guarantee local execution at least through LHC Run 3 (2023), testing an (elastic) extension is important
  - The infrastructure will not scale indefinitely
  - External extensions could address temporary peak requests
20
Future work
- Implement a generic caching system
- Add an OpenStack cloud layer to standardize access to remote resources (see the sketch below)
  - The Aruba setup currently differs from the Bari-ReCaS one
- Test other external (cloud) providers
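A hedged sketch of what a standardized cloud layer could look like from the INFN-T1 side: boot a remote WN through the OpenStack API with openstacksdk. The cloud name, image, flavor and network are placeholders; the slide only states the intention to add such a layer.

```python
import openstack

conn = openstack.connect(cloud="remote-site")          # entry in clouds.yaml (assumed)

image = conn.compute.find_image("wn-centos7")          # hypothetical WN image
flavor = conn.compute.find_flavor("m1.xlarge")         # hypothetical flavor
network = conn.network.find_network("wn-vpn-net")      # hypothetical tenant network

# Launch a worker node and wait until it is active.
server = conn.compute.create_server(
    name="wn-remote-001",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
conn.compute.wait_for_server(server)
print(f"{server.name} is up; it can now register with dynfarm and join LSF")
```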