Site report: Tokyo. Tomoaki Nakamura, ICEPP, The University of Tokyo. 2014/12/10.


Update from the last year

No HW upgrade since last year for Grid resources
- CPU cores (18.03 HS06/core)
- RAM: 2 GB/core for 1280 CPU cores, 4 GB/core for 1280 CPU cores
- No memory upgrade until the end of 2015 (as considered last year)
- 2000 TB of pledged disk (2014) and ~600 TB for LocalGroupDisk
All service instances have been migrated to EMI3
- CREAM, DPM, BDII (site/top), Argus, gLExec-WN, APEL
- WMS, LB, MyProxy: can be decommissioned for ATLAS
The other service instances
- perfSONAR (latency 1G, bandwidth 1G, bandwidth 10G)
- Squid (condDB x 2 + CVMFS x 2)
Services for ATLAS have been deployed
- DPM-WebDAV: used for Rucio renaming, will be used for central deletion
- DPM-XrootD and FAX setup: connected to the Asia redirector
- Multi-core queue: 512 cores, 20% of resources, 64 static 8-core slots (see the check below)
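A quick consistency check of the multi-core queue numbers above, as a minimal Python sketch; the total core count of 1280 + 1280 = 2560 is inferred from the RAM breakdown and is not stated explicitly on the slide:

    # Hypothetical check: 64 static 8-core slots against the inferred total core count.
    static_slots = 64
    cores_per_slot = 8
    total_cores = 1280 + 1280          # inferred from the 2 GB/core + 4 GB/core RAM split
    mcore_cores = static_slots * cores_per_slot
    print(mcore_cores, mcore_cores / total_cores)   # 512 cores, 0.2 -> 20% of resources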

FAX remote access: 4 TB/day = ~46 MB/s.
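As a sanity check of the quoted average rate (a minimal sketch; only the 4 TB/day figure comes from the slide):

    # Convert the observed FAX remote-access volume into an average rate.
    BYTES_PER_TB = 1e12        # decimal terabytes, as usual for transfer accounting
    SECONDS_PER_DAY = 86400
    rate_mb_per_s = 4 * BYTES_PER_TB / SECONDS_PER_DAY / 1e6
    print(round(rate_mb_per_s, 1))   # ~46.3 MB/s, matching the ~46 MB/s on the slide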

ASAP (ATLAS Site Availability Performance), all data: 99.77%.

Pledge for the next year and beyond

For FY2015
- Increase the pledge by 400 TB
- 528 TB (8 servers) will be added to DPM by the end of March
- Total DPM capacity: 3168 TB (~750 TB for LocalGroupDisk); see the check after this slide
End of this system
- Procurement work will start from next spring
- If we can get 6 TB HDDs, the total storage capacity can be doubled in the 4th system
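The storage numbers can be cross-checked with a short sketch; the assumption that the 528 TB is added on top of the present DPM capacity is mine, not from the slide:

    # FY2015 storage arithmetic for the Tokyo DPM instance.
    added_tb = 528                           # 8 new servers
    per_server_tb = added_tb / 8             # 66 TB per server
    total_after_tb = 3168
    current_tb = total_after_tb - added_tb   # ~2640 TB before the addition
    print(per_server_tb, current_tb)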

International network for Tokyo (diagram): TOKYO linked to ASGC, BNL, TRIUMF, NDGF, RAL, CC-IN2P3, CERN, CNAF, PIC, SARA and NIKHEF via Pacific (LA) and Atlantic (WIX, Amsterdam, Geneva, Frankfurt) routes; labels include 10 Gbps, 40 Gbps and 10 x 3 Gbps links, a dedicated line, and a new 10 Gbps line via OSAKA in place since May.

Configuration for the LHCONE evaluation (diagram): the ICEPP production network (/21, 10 Gbps) and a separate ICEPP LHCONE-evaluation network (/24, 1 Gbps) connect through campus switches (MLXe32, Dell 8024, Dell 5448, Catalyst 6500, Catalyst 3750) to UTnet and SINET (IPv4/v6) with LHCONE BGP peering; perfSONAR latency/bandwidth hosts and GridFTP UIs sit on both networks, with remote endpoints at NY, DC and LA.

Stability of packet loss (CC-IN2P3): packet loss directly affects the transfer rate.
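Why even a small packet-loss fraction matters so much on this long-RTT path can be illustrated with the Mathis approximation for steady-state TCP throughput; the MSS and RTT values below are assumptions for a Tokyo to CC-IN2P3 style path, not measurements from the slide:

    from math import sqrt

    def mathis_throughput_mbps(mss_bytes, rtt_s, loss_rate):
        # Mathis et al. approximation: rate ~ MSS / (RTT * sqrt(loss)).
        return mss_bytes * 8 / (rtt_s * sqrt(loss_rate)) / 1e6

    for loss in (1e-5, 1e-4, 1e-3):
        print(loss, round(mathis_throughput_mbps(1460, 0.25, loss), 1), "Mbit/s per stream")
    # 1e-05 -> ~14.8, 1e-04 -> ~4.7, 1e-03 -> ~1.5 Mbit/s: tiny loss already caps a single stream.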

Fraction of packet loss (NY vs. DC): the two routes are comparable to each other.

Minimum latency (CC-IN2P3): useful to know the typical latency and its stability.

Minimum latency (CC-IN2P3): the outliers originate from another group in the Univ. of Tokyo.

Distribution of minimum latency (CC-IN2P3).

Distribution of minimum latency (CC-IN2P3): tails originating from the other group and from mis-measurements.

Maximum latency (CC-IN2P3): useful for finding problems.
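A sketch of how the per-day minimum and maximum shown on these latency slides could be reduced from raw one-way-delay samples; the (date, delay) record format is an assumption for illustration, not the actual perfSONAR archive schema:

    from collections import defaultdict

    def daily_min_max(samples):
        # samples: iterable of (date_string, delay_ms) pairs.
        per_day = defaultdict(list)
        for day, delay in samples:
            per_day[day].append(delay)
        return {day: (min(v), max(v)) for day, v in per_day.items()}

    samples = [("2014-11-10", 135.2), ("2014-11-10", 148.9), ("2014-11-11", 135.1)]
    print(daily_min_max(samples))   # {'2014-11-10': (135.2, 148.9), '2014-11-11': (135.1, 135.1)}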

Maximum latency (CC-IN2P3): also shows spikes, plus additional periodic noise.

Distribution of maximum latency (CC-IN2P3).

Distribution of maximum latency (CC-IN2P3): discrepancy due to the periodic noise.

Also seen for the other sites (US, FR): one of the perfSONAR instances in Tokyo seems to fall into a busy state once a day, independently of the source site, but there are no significant errors in the system or service logs.

Maximum latency (masked by time): the periodic noise can be cleaned up.
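One way to implement the time mask described here, as a sketch; the one-hour daily busy window and the sample format are assumptions chosen for illustration:

    from datetime import datetime

    def mask_busy_window(samples, busy_start_hour=3, busy_end_hour=4):
        # Drop samples falling inside the daily window in which the Tokyo perfSONAR
        # host goes busy, so the periodic spikes do not bias the plots.
        return [(t, lat) for (t, lat) in samples
                if not (busy_start_hour <= t.hour < busy_end_hour)]

    samples = [(datetime(2014, 11, 11, h), 270.0 + (30.0 if h == 3 else 0.0))
               for h in range(24)]
    print(len(mask_busy_window(samples)))   # 23 of 24 hourly samples survive the mask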

Maximum latency after masking (CC-IN2P3): some spikes still remain, but the results are comparable.

Bandwidth measurement (CC-IN2P3 and CNAF): one path is asymmetric, ~38 MB/s incoming vs. ~28 MB/s outgoing; the other is symmetric but unstable, ~34 MB/s incoming and ~35 MB/s outgoing.

Minimum latency (CC-IN2P3 in 2014).

Minimum latency (CC-IN2P3 in 2014): the spikes are gone, but the average value is split.

Latency in one day (CC-IN2P3): both incoming and outgoing go over the production line via NY. Load balancing somewhere in NY or GEANT?

Maximum latency (CC-IN2P3, 2014): some improvement on the FR-Geneva segment?

Bandwidth measurement (latest data): one path is still asymmetric, ~35 MB/s incoming vs. ~24 MB/s outgoing; the other is symmetric and very stable, ~32 MB/s incoming and ~30 MB/s outgoing.

Configuration for the LHCONE evaluation: same network diagram as earlier (ICEPP production /21 at 10 Gbps, LHCONE-evaluation /24 at 1 Gbps, LHCONE BGP peering via UTnet and SINET), repeated here for comparison.

LHCONE (EU sites) for all production servers: same network diagram as the LHCONE-evaluation slide, now covering the production servers.

Nov. 11, 2014 (latency for CC-IN2P3).

Nov. 11, 2014 (latency for CNAF).

Nov. 11 (throughput for CC-IN2P3).

Nov. 11 (throughput for CNAF).

Dec. 7, 2014 (incoming bandwidth is saturated): user subscription of AOD via DaTRI, physics Egamma stream, 8 TeV, all periods, ~150 TB; still ongoing today (continuously for several days).
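A rough timing check of why a ~150 TB subscription keeps the 10 Gbps incoming link busy for days; the utilisation values are assumptions, not measurements:

    volume_bits = 150e12 * 8                 # ~150 TB subscription
    for utilisation in (1.0, 0.5, 0.3):
        days = volume_bits / (10e9 * utilisation) / 86400
        print(f"{utilisation:.0%} of 10 Gbps -> {days:.1f} days")
    # ~1.4 days even at full line rate, several days at realistic shares.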

Breakdown from the GridFTP log: part of the traffic is the LHCONE contribution, mainly FTS3 and direct transfers from multiple sites (plots with 10-minute and 1-minute binning).
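A sketch of how such a breakdown could be produced from the GridFTP transfer log; the (finish_time, nbytes) record format is a simplification for illustration, not the actual log format:

    from collections import defaultdict
    from datetime import datetime

    def bin_transfers(records, bin_minutes=10):
        # Sum transferred bytes into fixed-width time bins (e.g. 10-minute or 1-minute).
        bins = defaultdict(int)
        for finish_time, nbytes in records:
            key = finish_time.replace(minute=(finish_time.minute // bin_minutes) * bin_minutes,
                                      second=0, microsecond=0)
            bins[key] += nbytes
        return dict(bins)

    records = [(datetime(2014, 12, 7, 10, 2), 5 * 10**9),
               (datetime(2014, 12, 7, 10, 7), 3 * 10**9),
               (datetime(2014, 12, 7, 10, 14), 4 * 10**9)]
    print(bin_transfers(records))   # two 10-minute bins: 8 GB and 4 GB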

Near future and concerns

LHCONE
- Next for US and Canada
- And then for Asia (ASGC, IHEP)
Network bandwidth
- 2015: more 10G from ICEPP to SINET? UTokyo is offering, but it depends on them.
- JFY2016: SINET will be upgraded (SINET5): 100G for US (LA), 20G for EU (the other way around)
EMI3
- End of full support: April 30, 2014
- End of standard updates: October 31, 2014
- End of security updates: April 30, 2015
Batch job system
- Torque/Maui: no longer supported, and dynamic multi-core allocation is not effective
- HTCondor, SLURM, or a commercial product (Univa GE, LSF)