Helge Meinhard, CERN-IT Grid Deployment Board 04-Nov-2015

Tier-0 Update

Outline
- Cloud
- Databases
- Data and storage
- Network
- Platform services
- Infrastructure

Cloud

CERN Cloud in Numbers (1)
- 4'800 hypervisors in production (1 year ago: 3'000)
- Majority qemu/kvm, now on CC7 (previously SLC6); ~150 Hyper-V hosts
- ~2'000 hypervisors at Wigner in Hungary, running batch, compute and services (previously batch only)
- 250 hypervisors on critical power
- 130k cores (1 year ago: 64k)
- 250 TB RAM (128 TB)
- ~15'000 VMs (8'000)
- To be increased in 2016!

CERN Cloud in Numbers (2)
- Every 10 seconds a VM gets created or deleted in our cloud!
- 2'000 images/snapshots (1 year ago: 1'100), served by Glance on Ceph
- 1'500 volumes (600), served by Cinder on Ceph (and NetApp)
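The churn figure above translates into a substantial daily volume; a quick back-of-the-envelope check of what "one operation every 10 seconds" means per day:

```python
# One VM created or deleted every 10 seconds, as stated above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86'400

ops_per_day = SECONDS_PER_DAY // 10
print(ops_per_day)  # 8640 create/delete operations per day
```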

Ongoing Service Improvements
- Adding hardware:
  - Today: ~4'800 compute nodes (130k cores)
  - Underway: 25k cores
  - Spring 2016: 60k cores
  - Retirements: being clarified
- Operating system upgrades on the hypervisors:
  - 3'900 hypervisors already on CC7
  - 700 on SLC6, to be upgraded to CC7 in the next 6 months
  - 34 on RHEL6, to be upgraded to RHEL7

CPU Performance

Virtualisation overhead by VM size, before and after optimisation:

  VM sizes (cores)   Before   After
  4 x 8              7.8%     3.3% (batch WN)
  2 x 16             16%      4.6% (batch WN)
  1 x 24             20%      5.0% (batch WN)
  1 x 32             20.4%    3-6% (SLC6 … WN)

- Performance: the effect of each optimisation depends on the hardware type and on the other optimisations applied (NUMA, pinning, huge pages, EPT)
- Pre-deployment testing is not always sufficient: small issues can have a major impact
- Performance monitoring: continuous benchmarks are needed to detect performance changes
- Requires the OpenStack Kilo version of Nova; planned for mid-November
- Details presented in a previous GDB/HEPiX: https://indico.cern.ch/event/384358/session/12/contribution/15/attachments/1170139/1689493/Optimisations_of_Compute_Resources_in_the_CERN_Cloud_Service_-_HEPiX14OCT2015.pdf
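The optimisations named above (NUMA placement, CPU pinning, huge pages) are typically expressed in the hypervisor's libvirt domain definition. A minimal sketch of what such settings look like; the guest name, core count, memory size and NUMA node are chosen for illustration and are not taken from the actual CERN configuration:

```xml
<domain type='kvm'>
  <name>batch-wn-example</name>
  <memory unit='GiB'>16</memory>
  <!-- Back guest memory with huge pages -->
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <!-- Pin each vCPU to a dedicated host core -->
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
  </cputune>
  <!-- Keep guest memory on a single NUMA node -->
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
</domain>
```

EPT, by contrast, is a host-side hardware feature of Intel CPUs used by KVM and is not configured per guest.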

WIP: Container Integration
- Started to look into the integration of containers with our OpenStack deployment, initially triggered by the prospect of low performance overheads
- LXC, due to the lack of an upstream Docker driver (not suitable for general purpose)
- We have set up a test cell; performance looks good
- OpenStack patches for AFS & CVMFS done
  - AFS in containers: kernel access, multiple containers, tokens, …
- Operational issues still to be understood
- Started to look into OpenStack Magnum: container orchestration via Docker or Kubernetes becomes a first-class OpenStack resource

Databases

Hadoop, Scale-Out Databases
- Working with ATLAS on optimising the design and performance of the Event Index application, based on the Hadoop infrastructure
- Started the Hadoop Users' Forum, with fortnightly meetings aimed at sharing experience with Hadoop components at CERN, both among users and with the IT service providers

Data and Storage Services

Data Services for Tier0 (1)
- CASTOR: located in Meyrin, notably the 120 PB tape archive
- EOS: 70 PB disk space available, evenly distributed across Meyrin and Wigner
- About 80% of the disk resources for Tier0 are in EOS (remaining 20%: CASTOR)
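Taking the figures above at face value, 70 PB in EOS being about 80% of the Tier0 disk implies roughly 87.5 PB of disk in total, of which ~17.5 PB sit behind CASTOR; a quick sanity check of that arithmetic:

```python
# EOS holds 70 PB, stated to be about 80% of the Tier0 disk resources.
eos_pb = 70.0
eos_fraction = 0.80

total_disk_pb = eos_pb / eos_fraction      # ~87.5 PB of Tier0 disk in total
castor_disk_pb = total_disk_pb - eos_pb    # ~17.5 PB of CASTOR disk

print(total_disk_pb, castor_disk_pb)  # 87.5 17.5
```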

Data Services for Tier0 (2)
- Different T0 workflows:
  - ATLAS and CMS: Experiment → EOS; EOS is the distribution centre for T1 export, batch processing and tape (EOS → CASTOR)
  - ALICE and LHCb kept the Run 1 model: Experiment → CASTOR, which is the distribution centre for RAW data, T1 export and batch processing; EOS used only for analysis
- Tier0 rates: EOS reading often exceeds ~30 GB/s (Tier0 + analysis); CASTOR in beam reaches max ~15 GB/s (input dominated by RAW data from the LHC, output mainly tape writing)
- Smooth data taking
- Important reconfiguration to prepare for the ALICE Pb-Pb run (provide better isolation between recording and RAW reconstruction)

Networking

- Will reduce the number of servers exposed to the LHCOPN and LHCONE
  - Requires a major reconfiguration of the datacentre network, foreseen for end of February 2016 (during a scheduled technical stop)
  - Tier1s to update their configuration (see http://cern.ch/go/6Whh)
- Third 100 Gbps link between Geneva and Budapest expected to go into production in December 2015 or January 2016
- PoP established by NORDUnet at CERN Geneva, peered with CERN over two 10 Gbps links: more network redundancy and capacity to the Nordic countries

Platform and Infrastructure Services

Batch (1)
- LSF being upgraded from 7 to 9
  - Master: first attempt on 13 October failed; this morning everything completed successfully within less than 1.5 hours
  - Worker/submission nodes to follow now (usual QA procedure)
- HTCondor in production since 02-Nov-2015, with 2 HTCondor CEs
- The 2 ARC CEs are obsolete; one to be retired next week, the other before end 2015

Batch (2)
- Transition from LSF to HTCondor
  - HTCondor currently runs 4% of the total batch capacity
  - Will move capacity from LSF to HTCondor to cover the basic Grid submission load (20…25%), in weekly steps of 500…1000 cores
  - Kerberos ticket handling: prototype before end 2015; production date to be defined
  - Documentation, training and consultancy to help migrate locally submitted jobs
  - Further capacity to be moved once users can submit jobs locally
- Deadline: no LSF-based services at the end of Run 2
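For users migrating locally submitted jobs, the basic unit of work changes from an LSF `bsub` command to an HTCondor submit description file passed to `condor_submit`. A minimal sketch, with file and executable names chosen purely for illustration:

```
# example.sub -- minimal HTCondor submit description (illustrative names)
universe   = vanilla
executable = analysis.sh
arguments  = $(ProcId)
output     = job.$(ProcId).out
error      = job.$(ProcId).err
log        = job.log
queue 10
```

Submitting with `condor_submit example.sub` creates a cluster of 10 jobs, roughly the equivalent of a `bsub` job array under LSF.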

Other Platform Services
- Middleware: some recent issues with FTS and ARGUS
  - Good collaboration with the developers
  - Some ARGUS outages due to deployment choices or bad input
- Lxplus: frequent crashes understood and fixed; tracked down to an incompatibility between cgroups and a range of kernels
- Volunteer computing: good progress by CMS, continued high level of ATLAS usage

Infrastructure Services
- GitLab-based service set up at CERN
  - Very fast take-up: more than 1'500 projects already
  - Complements GitHub for projects requiring restricted access or tight integration with CERN's environment
- Plan to phase out git(olite) and SVN
  - New repositories require moderator approval
  - Plan to be aggressive on git(olite), but more gradual on SVN (rundown targeted for some time during LS2)
- Continuous integration very popular
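The continuous-integration feature mentioned above is driven by a `.gitlab-ci.yml` file in each repository. A minimal sketch of such a pipeline; the stage, job and build commands are illustrative, not taken from any CERN project:

```yaml
# .gitlab-ci.yml -- minimal pipeline sketch (names illustrative)
stages:
  - build
  - test

build-job:
  stage: build
  script:
    - make

test-job:
  stage: test
  script:
    - make test
```

Each push triggers the pipeline; `test-job` runs only after `build-job` succeeds, because the stages execute in the declared order.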

Acknowledgements
Tony Cass, Eva Dafonte Perez, Massimo Lamanna, Jan van Eldik, Arne Wiebalck

Questions?