Published by Bertina Carroll. Modified over 8 years ago.
1
IHEP Computing Site Report
Shi, Jingyan (shi.jingyan@ihep.ac.cn)
Computing Center, IHEP
2
Outline
- Local Cluster
- LCG Site
- Network
- Infrastructure Upgrade
- Summary
Shi, Jingyan CC--IHEP 2016-3-6
3
Local Cluster
- 1000+ users, 200+ active
- For the BES, YBJ, and DYB experiments
- 6500+ job slots (including 1500 newly added slots)
- Storage:
  - 3 PB+ Lustre
  - 5 PB+ tape library
- Scheduler: Torque + Maui
4
Trouble in Lustre
- Lustre had been running well with high performance
- An MDS (metadata server) problem occurred at the end of September
- Saving the data from the damaged Lustre file system was a big task
5
Trouble in Lustre (cont.)
- New rules need to be established to regulate storage usage
  - Limit users' small files
  - Data files and user files should be kept in separate storage
  - Any suggestions are warmly welcome
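Before setting a rule on small files, it can help to measure how many a user area actually holds. The following is a minimal sketch, not the site's tooling: the function name `count_small_files` and the 1 MiB default threshold are assumptions made for illustration, and the slides do not state what size counts as "small".

```python
import os
import stat


def count_small_files(root, threshold_bytes=1024 * 1024):
    """Count regular files under `root` and how many are below the threshold.

    Returns a (small, total) tuple. The 1 MiB default is an assumed
    placeholder, not a limit taken from the slides.
    """
    small = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, device nodes
            total += 1
            if info.st_size < threshold_bytes:
                small += 1
    return small, total
```

Calling `count_small_files("/some/user/dir")` (a hypothetical path) returns the two counts. A full walk issues one metadata lookup per file, so on a large Lustre file system it would likely need to be run sparingly or per user directory.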
6
LCG Tier II Site
- For the CMS and ATLAS experiments
- 1000+ job slots
- Storage:
  - 320 TB dCache
  - 320 TB DPM
  - 1 TB disks will be replaced by 2 TB disks
  - 50 TB of extra space will be added
7
BEIJING-LCG2 Site Report
8
BEIJING-LCG2 Site Report (cont.)
9
Reliability and Availability
10
IHEP Campus (Office) Network
- Star structure
- 10G backbone
- Wi-Fi coverage
- Over 3000 users
- IPv4/IPv6 available for users
- 10G IPv4 and IPv6 link to CSTNet
11
Network connection
[Diagram: IHEP's external links. Labelled nodes: Daya Bay, Beijing CSTNet, Hong Kong, USA (GLORIAD), ASGC, Beijing Tsinghua, YBJ, Europe (Orient+), EDU.CN, others. Link labels: 10G (IPv4), 10G (IPv6), 2.5G, 155M.]
12
perfSONAR @ IHEP
- Two hosts for perfSONAR
  - Perfsonar.ihep.ac.cn for bandwidth tests
  - Perfsonar2.ihep.ac.cn for latency tests
- Network performance tuning is in progress between IHEP and European sites
  - http://twiki.ihep.ac.cn/twiki/bin/view/InternationalConnectivity/IHEP-CCIN2P3
- Discussing with the relevant people the possibility of connecting IHEP to LHCONE
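The slides do not say which parameters are being tuned. One quantity commonly checked on long-distance links is the bandwidth-delay product (BDP), the amount of data that must be in flight to keep a path full. The sketch below is illustrative only: the 10 Gbit/s figure is the link speed quoted in the slides, while the 200 ms round-trip time is an assumed placeholder, not a measured IHEP-to-Europe value.

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product in bytes for a given link speed and RTT."""
    return bandwidth_bps * rtt_seconds / 8  # bits in flight -> bytes


LINK_BPS = 10e9        # 10 Gbit/s, as quoted for the international links
ASSUMED_RTT_S = 0.200  # assumption: placeholder RTT, replace with a measured value

if __name__ == "__main__":
    bdp = bdp_bytes(LINK_BPS, ASSUMED_RTT_S)
    print(f"BDP: {bdp / 1e6:.0f} MB in flight to fill the path")
```

With a measured latency in place of the assumed one, the result gives a rough lower bound for the TCP window a single stream would need to reach the bandwidth that the throughput tests report.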
13
Upgrade for Data Center Network
- Device expansion
  - Performance
    - The 10G firewall (2 Gbps -> 10 Gbps) is ready (based on Linux and iptables)
  - Lack of 10G ports
    - Some devices are under test
      - Force10 Z9000/4810
      - Arista 7148/7508
- Topology upgrade
  - The Grid area is isolated
    - Arista 7148: for the area core switch
14
Infrastructure Upgrade: Power System Upgrade
- Before the upgrade
  - Power consumption reached 90% of total capacity
  - The per-rack power supply cannot support high-density blade servers
  - Single-phase supply cannot meet the needs of the power system
15
Power System Upgrade
- Add one power transformer
  - Power capacity: 800 kW -> 1800 kW
- Increase the power supply per rack
  - Power supply mode: single-phase supply -> three-phase supply
  - Power supply per rack: 10 kW -> 28 kW
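The headroom gained by these changes can be worked out from the figures on the power slides. This is a small arithmetic sketch: the helper names are made up for illustration, and the post-upgrade utilisation assumes the load stays at the pre-upgrade level (90% of 800 kW), which the slides do not state.

```python
def growth_factor(before, after):
    """Ratio of the new value to the old one."""
    return after / before


def utilisation_after(load_fraction_before, capacity_before_kw, capacity_after_kw):
    """Utilisation of the new capacity, assuming the load itself is unchanged."""
    load_kw = load_fraction_before * capacity_before_kw
    return load_kw / capacity_after_kw


if __name__ == "__main__":
    # Figures from the slides: 800 kW -> 1800 kW total, 10 kW -> 28 kW per rack.
    print(f"Site capacity grows {growth_factor(800, 1800):.2f}x")
    print(f"Per-rack supply grows {growth_factor(10, 28):.1f}x")
    # Assumption: load stays at 90% of the old 800 kW capacity.
    print(f"Utilisation after upgrade: {utilisation_after(0.90, 800, 1800):.0%}")
```

Under that assumption the same load that nearly filled the old supply would use well under half of the new one, leaving room for the higher-density racks.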
16
Infrastructure Upgrade: Cooling System Upgrade
- Before the upgrade
  - Reached 80% of its total capacity
  - Limited space restricts expansion of the cooling system
  - Air-cooled conditioning cannot support high-density blade servers
    - Overheated islands caused by high-density blade servers
17
Cooling System Upgrade
- Water-cooled racks
- Inter-row air conditioning
- Cooling capacity per rack reaches 28 kW
18
Infrastructure Upgrade: Unfinished Work
- Sound barrier screen for the outdoor unit
  - Reduce running noise
- Cooling air partition needs to be built
  - Improve cooling efficiency
- Monitoring system
19
Outdoor Unit Installation
20
Outdoor Pipeline Installation
21
Power Distribution Cabinet Installation
22
Water Cooling Rack Installation
23
System Tuning
24
Summary
- Most of the computing environment is running well
- Trouble in storage
- The infrastructure upgrade met its aim
  - Power supply: 800 kW -> 1800 kW
  - Overheated islands eliminated
  - Outlet air temperature of servers: 40 ℃ -> 27 ℃
25
Thank you! Questions? Shi,Jingyan– Kan, Bowen/CC/IHEP 2016-3-6 - 25