Development of a Tier-1 computing cluster at National Research Centre "Kurchatov Institute"
Igor Tkachenko, on behalf of the NRC-KI Tier-1 team
National Research Centre "Kurchatov Institute", Moscow, Russian Federation

Current resources
Total:
– 536 TB of disk-based storage
– Job slots: 1440 HT job slots with good network; 2880 HT job slots with slow network
– 2 PB of tape-based storage
ALICE:
– 720 HT job slots with good network
– 156 TB of disk-based storage
– 0 TB of tape-based storage
2

Networking
– 2.5 Gbit/sec to GEANT, 1 Gbit/sec to Gloriad and 10+ Gbit/sec of local connectivity within the RU Grid.
– LHCOPN channel to CERN is being set up:
  – The MOS-AMS circuit is already established and we are solving last-mile problems.
  – The MOS-BUD circuit is in the process of being established: there are some problems with DWDM equipment colocation at the Wigner DC, but I believe they will be solved.
– The Gloriad connection from the US to Amsterdam is said to be getting bumped to 10 Gbit/sec this summer, which would give us an opportunity for more bandwidth to the US. This is TBD.
– We are talking with DANTE/GEANT about getting direct GEANT connectivity for the RU WLCG community (resurrecting the GEANT PoP in Russia); progress is slow and we believe it will appear only next year, if we can agree on political and monetary conditions with DANTE.
3

Networking (continued)
– Direct peering with all major RU Grid sites: JINR, IHEP (Protvino), ITEP and PNPI. We have also installed our own peering equipment at M9 (the largest traffic exchange point in Moscow), so other sites can peer with us without any problems. We have doubled our networking capacity to M9 to accommodate bandwidth requirements.
– We are currently organizing LHCONE connectivity for all our sites. US peering should come first (the circuit is ready and we are setting up BGP); NORDUnet VRF connectivity is on the radar. GEANT VRF will depend on the outcome of the agreement between us and DANTE.
4

Network connectivity
– LHCOPN: 10 Gbit/sec with a 5 Gbit/sec reservation, in progress. 2.5 Gbit/sec is ready; full capacity is expected by the end of March.
– GEANT: 5 Gbit/sec for all sites of the Kurchatov Institute (RRC_KI + RRC_KI_T1, IHEP, ITEP and PNPI), with rates of 2 : 2 : 0.5 : 0.5 Gbit/sec per site.
– USA connectivity: 1 Gbit/sec via Gloriad for all sites of the Kurchatov Institute.
– Direct peering and channels between RRC_KI_T1 (+RRC_KI) and JINR, ITEP, IHEP and MSU are ready and working. A direct channel between RRC_KI_T1 and PNPI is planned.
– LHCONE has low priority. RRC_KI_T1 is trying to connect to CERN, the Netherlands and the USA via LHCONE in test mode.
5
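
A minimal illustration of the GEANT share quoted above: the 2 : 2 : 0.5 : 0.5 Gbit/sec per-site rates add up to the 5 Gbit/sec link. The numbers are taken from the slide; the code itself is only a worked restatement, not a configuration.

# Illustrative only: per-site split of the shared 5 Gbit/sec GEANT link.
GEANT_TOTAL_GBPS = 5.0

site_share_gbps = {
    "RRC_KI + RRC_KI_T1": 2.0,
    "IHEP": 2.0,
    "ITEP": 0.5,
    "PNPI": 0.5,
}

assert sum(site_share_gbps.values()) == GEANT_TOTAL_GBPS

for site, rate in site_share_gbps.items():
    print(f"{site}: {rate} Gbit/sec ({100 * rate / GEANT_TOTAL_GBPS:.0f}% of the GEANT link)")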

Storage
Disk-based:
– EOS-based installation
– 6 pool nodes
– XFS file system
– hardware RAID6
Tape-based (under development):
– dCache + Enstore installation
– native xrootd support
– up to 800 TB of tapes for the first stage
– 6 cache nodes
– 3 tape drives
6
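
Since both the EOS disk instance and the planned dCache + Enstore tape instance expose data over xrootd, transfers can be driven with the standard xrdcp client. A minimal sketch follows; the endpoint name and destination path are hypothetical placeholders, as the real door names are not given on the slide.

# Minimal sketch of writing a file to an xrootd-fronted storage element.
# The endpoint and destination path below are hypothetical placeholders,
# not the actual NRC-KI door names.
import subprocess

def xrootd_copy(local_path: str, endpoint: str, remote_path: str) -> None:
    """Copy a local file to an xrootd endpoint using the xrdcp client."""
    destination = f"{endpoint}//{remote_path.lstrip('/')}"
    subprocess.run(["xrdcp", local_path, destination], check=True)

if __name__ == "__main__":
    xrootd_copy("raw_data.root",
                "root://eos.example.nrcki.ru",    # hypothetical EOS door
                "/eos/alice/test/raw_data.root")  # hypothetical path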

Jobs
– Torque with Maui as the scheduler
– CVMFS-based VO software area with a cluster of proxy servers
– SL6 on all WNs, mostly EMI-3 as the middleware
– 24h fair-share rates
– 2 GB of RAM per core
– 6 GB of vmem per core
– 24h limit on job execution time
7
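
The per-job limits above (2 GB RAM and 6 GB vmem per core, 24 h wall time) map directly onto Torque resource requests. Below is a minimal submission sketch for a single-core job; the queue name "grid" and the script path are hypothetical, and the values simply mirror the limits quoted on the slide rather than the actual site configuration.

# Minimal sketch: submit a single-core job to Torque with the per-core
# limits quoted on the slide (2 GB RAM, 6 GB vmem, 24 h wall time).
# The queue name "grid" and the script path are hypothetical.
import subprocess

def submit_job(script_path: str, queue: str = "grid") -> str:
    """Submit a job with qsub and return the Torque job identifier."""
    resources = "nodes=1:ppn=1,mem=2gb,vmem=6gb,walltime=24:00:00"
    result = subprocess.run(
        ["qsub", "-q", queue, "-l", resources, script_path],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit_job("run_analysis.sh"))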

Cluster management system
– Puppet-based management:
  – about 5000 lines of Puppet code
  – automatic application of manifests is disabled
  – about 40 of our own classes implemented to support our resources
– Shell-based + kickstart-based system installation
– pdsh for task automation and parallel execution on many machines
8
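
A minimal sketch of the pdsh-based task automation mentioned above: run the same command in parallel on a set of worker nodes. The host range "wn[001-120]" and the CVMFS check are hypothetical examples, not the real node naming or procedure.

# Minimal sketch of parallel command execution over pdsh, as used for
# task automation on many machines. The host range below is hypothetical.
import subprocess

def run_on_nodes(hosts: str, command: str) -> str:
    """Run `command` on the given pdsh host range and return its standard output."""
    result = subprocess.run(["pdsh", "-w", hosts, command],
                            check=True, capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    # e.g. check CVMFS mounts on all worker nodes
    print(run_on_nodes("wn[001-120]", "df -h /cvmfs/alice.cern.ch"))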

Resource allocation policies
– CPU: easy to give, easy to take back. Allocated as generously as possible.
– Disk + tape: easy to give, hard to take back. Allocation on demand: new resources may be allocated when current resource usage exceeds 80%.
9
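
The allocation-on-demand rule for disk and tape can be expressed as a simple threshold check. The sketch below only restates the 80% rule from the slide; the example numbers are made up for illustration.

# Illustrative restatement of the slide's allocation-on-demand rule:
# new disk/tape resources may be allocated once usage exceeds 80%.
ALLOCATION_THRESHOLD = 0.80

def should_allocate_more(used_tb: float, allocated_tb: float) -> bool:
    """Return True when current usage crosses the 80% allocation threshold."""
    return allocated_tb > 0 and used_tb / allocated_tb > ALLOCATION_THRESHOLD

if __name__ == "__main__":
    # Hypothetical example: 130 TB used out of a 156 TB allocation
    print(should_allocate_more(used_tb=130, allocated_tb=156))  # True (~83%)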

Monitoring
Current workflow:
– Alexandra Berezhnaya leads the team that walks through all defined monitoring endpoints and determines and classifies problems.
– For known problems there are documented ways to handle them, so this is done almost in a "don't think, act" mode.
– New problems, and those that cannot be processed by the first line, are passed to the free administrators, who handle them to the end (or at least try to).
– Problems on our own and on foreign resources are treated the same: if we see the latter, we try to understand them and possibly open a GGUS ticket or similar. We believe this provides early alerting and some help to other sites, as well as training for our own people.
Workflow for 24x7 staff: basically it will be the same, but the current "Operation Bible" should be updated and expanded so that less experienced staff can handle the majority of known problems successfully.
10
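
The first-line workflow above (walk through the monitoring endpoints, classify each problem as known or new, apply the documented recipe or escalate) could be mirrored by a small script. The endpoint URLs and "known problem" patterns below are hypothetical placeholders, not the team's actual checks or "Operation Bible" entries.

# Hypothetical sketch of the first-line workflow: poll monitoring endpoints,
# treat recognised problems as "known" (handled by a documented recipe) and
# pass everything else to the second line. Endpoints and patterns are made up.
import urllib.request

ENDPOINTS = ["http://mon.example.nrcki.ru/sam", "http://mon.example.nrcki.ru/dashboard"]
KNOWN_PROBLEMS = {"CVMFS_STALE": "remount CVMFS on the affected WN",
                  "SRM_TIMEOUT": "restart the SRM door"}

def classify(endpoint: str) -> None:
    try:
        status = urllib.request.urlopen(endpoint, timeout=10).read().decode()
    except OSError as exc:
        print(f"{endpoint}: unreachable ({exc}) -> escalate to second line")
        return
    for code, recipe in KNOWN_PROBLEMS.items():
        if code in status:
            print(f"{endpoint}: known problem {code} -> {recipe}")
            return
    print(f"{endpoint}: OK or unknown problem -> second line review")

for url in ENDPOINTS:
    classify(url)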

IPv6 readiness in KI
– April 2014: start of IPv6 implementation
– April-July 2014: IPv6 tests in non-Grid networks
– August 2014: IPv6 will be ready in KI
Will ALICE and all the other VOs be ready for IPv6?
11
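
As a basic readiness probe for the question above, one can test whether a given service host already publishes an AAAA record and accepts connections over IPv6. The hostname and port below are hypothetical examples.

# Minimal IPv6 readiness probe: does the host have an AAAA record and
# accept TCP connections over IPv6? The hostname is a hypothetical example.
import socket

def ipv6_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return False          # no AAAA record published
    for family, socktype, proto, _, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True   # reachable over IPv6
        except OSError:
            continue
    return False

if __name__ == "__main__":
    print(ipv6_reachable("voms.example.cern.ch"))  # hypothetical host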

Nearest future development
– Tapes for ALICE: by the end of March or April
– Good network for all nodes: by the end of June
– Additional storage nodes: by the end of August
12

Resources perspective (for all VOs)
About 10% of the sum of all Tier-1 resources:
– about 6000 job slots
– about 6.3 PB of disk storage
– about 7.5 PB of tape storage
13

Questions, comments? 14