Site report: Tokyo
Tomoaki Nakamura (ICEPP, The University of Tokyo)
2014/12/10
Update from the last year
No HW upgrade since last year for Grid resources
- CPU: 18.03 HS06/core
- RAM: 2 GB/core for 1280 cores, 4 GB/core for 1280 cores
- No memory upgrade until the end of 2015 (as considered last year)
- 2000 TB pledged disk (2014) and ~600 TB for LocalGroupDisk
All service instances have been migrated to EMI3
- CREAM, DPM, BDII (site/top), Argus, gLExec-WN, APEL
- WMS, LB, MyProxy: can be decommissioned for ATLAS
Other service instances
- perfSONAR (latency 1G, bandwidth 1G, bandwidth 10G)
- Squid (condDB x 2 + CVMFS x 2)
Services for ATLAS have been deployed
- DPM-WebDAV: used for Rucio renaming, will be used for central deletion
- DPM-XRootD and FAX setup: connected to the Asia redirector
- Multi-core queue: 512 cores, 20% of resources, 64 static 8-core slots
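The totals implied by the numbers above can be checked with simple arithmetic; a sketch (the 2 x 1280-core split is from the slide, everything else is plain multiplication):

```python
# Rough capacity arithmetic for the quoted CPU and RAM figures.
HS06_PER_CORE = 18.03

# Two groups of 1280 cores with different RAM per core.
cores_2gb, cores_4gb = 1280, 1280
total_cores = cores_2gb + cores_4gb

total_hs06 = total_cores * HS06_PER_CORE
total_ram_gb = cores_2gb * 2 + cores_4gb * 4

print(f"{total_cores} cores -> {total_hs06:.0f} HS06")  # 2560 cores -> 46157 HS06
print(f"Total RAM: {total_ram_gb} GB")                  # Total RAM: 7680 GB
```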
FAX remote access
4 TB/day = ~46 MB/s
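The quoted rate is a straightforward unit conversion (assuming decimal TB and MB):

```python
# 4 TB per day expressed as an average rate in MB/s (decimal units assumed).
tb_per_day = 4
bytes_per_day = tb_per_day * 1e12
seconds_per_day = 24 * 3600
rate_mb_s = bytes_per_day / 1e6 / seconds_per_day
print(f"{rate_mb_s:.1f} MB/s")  # 46.3 MB/s
```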
ASAP (ATLAS Site Availability Performance), all data: 99.77%
Pledge for the next year and beyond
For FY2015
- Increase the pledge by 400 TB
- 528 TB (8 servers) will be added to DPM by the end of March
- Total DPM capacity: 3168 TB (~750 TB for LocalGroupDisk)
End of this system
- Procurement work will start from next spring
- If we can get 6 TB HDDs, total storage capacity can be doubled in the 4th system
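A sketch of the capacity arithmetic behind these figures; the per-server capacity follows from the slide (528 TB / 8 servers), while the current HDD size is an assumption used only to illustrate the "doubled with 6 TB HDDs" claim:

```python
# Capacity arithmetic for the FY2015 DPM upgrade.
added_tb, servers = 528, 8
per_server_tb = added_tb / servers      # 66 TB per server
total_after = 3168                       # quoted total DPM capacity
total_before = total_after - added_tb
print(f"{per_server_tb:.0f} TB/server, {total_before} TB before upgrade")

# Hypothetical: if the current servers use 3 TB HDDs, moving to
# 6 TB HDDs in the 4th system doubles per-server capacity.
hdd_now_tb, hdd_next_tb = 3, 6
print(f"Capacity scaling factor: {hdd_next_tb / hdd_now_tb:.0f}x")
```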
International network for Tokyo
[Network diagram: TOKYO and OSAKA (40 Gbps) connect via SINET; Pacific 10 Gbps links through LA and WIX toward ASGC, BNL, TRIUMF; Atlantic links toward NDGF, RAL, CCIN2P3, CERN, CNAF, PIC, SARA, NIKHEF via Amsterdam, Geneva, Frankfurt; 10x3 Gbps and 10 Gbps segments; dedicated line; new 10 Gbps line in service since May]
Configuration for the LHCONE evaluation
[Diagram: the ICEPP production network (/21) and the LHCONE evaluation network (/24) connect through Dell 8024 (10G), Dell 5448 (1G), Catalyst 6500 (10G), Catalyst 3750 (10G), and MLXe32 (10G) switches to UTnet and SINET (IPv4/v6) with LHCONE BGP peering; each side hosts a UI (GridFTP) and perfSONAR latency/bandwidth instances; links at 10 Gbps and 1 Gbps; paths via NY, DC, LA]
Stability of packet loss (CC-IN2P3)
Packet loss directly affects the transfer rate.
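Why packet loss maps so directly onto transfer rate: for a single long-lived TCP stream, the Mathis approximation bounds throughput by MSS / (RTT x sqrt(loss)). A sketch with an illustrative Tokyo-Lyon round-trip time (the 250 ms value is an assumption, not a figure from the slide):

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss_rate):
    """Approximate single-stream TCP throughput bound (Mathis formula),
    in bytes per second."""
    return (mss_bytes / rtt_s) * (1.0 / math.sqrt(loss_rate))

mss = 1460   # typical Ethernet MSS in bytes
rtt = 0.25   # assumed ~250 ms Tokyo <-> Lyon round trip
for loss in (1e-6, 1e-4, 1e-2):
    mb_s = mathis_throughput(mss, rtt, loss) / 1e6
    print(f"loss {loss:.0e}: <= {mb_s:.2f} MB/s per stream")
```

Each factor-of-100 increase in loss cuts the per-stream bound by a factor of 10, which is why even small loss fractions on a long-RTT path are visible in the transfer rate.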
Fraction of packet loss (NY vs. DC)
Comparable to each other.
Minimum latency (CC-IN2P3)
Useful to know the typical latency and stability.
Minimum latency (CC-IN2P3)
Outliers originating from another group in the Univ. of Tokyo.
Distribution of Minimum latency (CC-IN2P3)
Distribution of Minimum latency (CC-IN2P3)
Tail originating from the other group; outliers from mis-measurement.
Maximum latency (CC-IN2P3)
Useful to find problems.
Maximum latency (CC-IN2P3)
Also has spikes, plus additional periodic noise.
Distribution of Maximum latency (CC-IN2P3)
Distribution of Maximum latency (CC-IN2P3)
Discrepancy due to the periodic noise.
Also for the other sites (US, FR)
One of the perfSONAR instances in Tokyo seems to fall into a busy state once a day. It is independent of the source site, but there are no significant errors in the system or service logs.
Maximum latency (masked by time)
The periodic noise can be cleaned up.
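The time mask mentioned here can be as simple as dropping measurements taken in the daily window when the busy state recurs; a minimal sketch (the 03:00-04:00 window and the sample values are hypothetical placeholders, not the actual data):

```python
from datetime import datetime

# (timestamp, max_latency_ms) samples; illustrative values only.
samples = [
    (datetime(2014, 11, 1, 2, 50), 145.2),
    (datetime(2014, 11, 1, 3, 20), 512.7),  # inside the noisy window
    (datetime(2014, 11, 1, 5, 10), 146.1),
]

def outside_noisy_window(ts, start_hour=3, end_hour=4):
    """Keep only samples outside the daily window where the periodic
    noise appears (window bounds are a hypothetical example)."""
    return not (start_hour <= ts.hour < end_hour)

cleaned = [(ts, v) for ts, v in samples if outside_noisy_window(ts)]
print(len(cleaned))  # 2
```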
Maximum latency by mask (CC-IN2P3)
Some spikes still remain, but comparable.
Bandwidth measurement (CC-IN2P3 and CNAF)
CC-IN2P3: asymmetric, ~38 MB/s (incoming), ~28 MB/s (outgoing)
CNAF: symmetric but unstable, ~34 MB/s (incoming), ~35 MB/s (outgoing)
Minimum latency (CC-IN2P3 in 2014)
Minimum latency (CC-IN2P3 in 2014)
The spikes are gone, but the average value is split into two bands.
Latency in one day (CC-IN2P3)
Both incoming and outgoing on the production line via NY. Load balancing somewhere in NY or GEANT?
Maximum latency (CC-IN2P3, 2014)
Some improvement in FR-Geneva?
Bandwidth measurement (latest data)
CC-IN2P3: still asymmetric, ~35 MB/s (incoming), ~24 MB/s (outgoing)
CNAF: symmetric and very stable, ~32 MB/s (incoming), ~30 MB/s (outgoing)
Configuration for the LHCONE evaluation
[Same diagram as before: production (/21) and evaluation (/24) networks, UTnet/SINET with LHCONE BGP peering]
LHCONE (EU sites) for all production servers
[Same diagram, now with LHCONE routing toward EU sites applied to all production servers]
Nov. 11, 2014 (latency for CC-IN2P3)
Nov. 11, 2014 (latency for CNAF)
Nov. 11 (throughput for CC-IN2P3)
Nov. 11 (throughput for CNAF)
Dec. 7, 2014 (incoming bandwidth is saturated)
User subscription of AOD via DaTRI: physics_Egamma, 8 TeV, all periods, ~150 TB. Still ongoing today (continuously for several days).
Breakdown from the GridFTP log
Part of the LHCONE contribution; mainly FTS3 and direct transfers from multiple sites. (Plots with 10 min. and 1 min. bins.)
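The binned breakdown can be produced by aggregating per-transfer records into fixed time bins. A minimal sketch assuming a pre-parsed list of (end_time, bytes) tuples; the actual GridFTP log format is not shown on the slide, so parsing is left out:

```python
from collections import defaultdict

def bin_throughput(transfers, bin_seconds):
    """Sum transferred bytes into fixed-width time bins and return the
    average rate in MB/s per bin. `transfers` is a list of
    (epoch_seconds, bytes) tuples."""
    bins = defaultdict(int)
    for t, nbytes in transfers:
        bins[int(t // bin_seconds)] += nbytes
    return {b: v / bin_seconds / 1e6 for b, v in sorted(bins.items())}

# Illustrative data: three transfers within ten minutes.
transfers = [(0, 6e9), (300, 12e9), (540, 6e9)]
print(bin_throughput(transfers, 600))  # one 10-minute bin: {0: 40.0}
print(bin_throughput(transfers, 60))   # separate 1-minute bins
```

Rebinning the same records at 10-minute and 1-minute widths, as on the slide, trades smoothing against the ability to see short bursts.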
Near future and concerns
LHCONE
- Next for US and Canada
- And then for Asia (ASGC, IHEP)
Network bandwidth
- 2015: another 10G from ICEPP to SINET? UTokyo is offering, but it depends on them.
- JFY2016: SINET will be upgraded (SINET5): 100G for US (LA), 20G for EU (routed the other way around)
EMI3
- End of full support April 30; end of standard updates October 31; end of security updates April 30, 2015
Batch job system
- Torque/Maui: no longer supported, not effective for dynamic multi-core allocation
- Candidates: HTCondor, SLURM, or a commercial product (Univa GE, LSF)