Published by Kerry Payne. Modified over 8 years ago.
1
IHEP Computing Center Site Report
Shi, Jingyan (shi.jingyan@ihep.ac.cn)
Computing Center, IHEP
2
Shi,Jingyan/CC/IHEP 2016-11-19 - 2
IHEP at a Glance
~1000 staff, 2/3 of them scientists and engineers
The largest fundamental research center in China, with research fields:
  Experimental particle physics
  Theoretical particle physics
  Astrophysics and cosmic rays
  Accelerator technology and applications
  Synchrotron radiation and applications
  Nuclear analysis techniques
  Computing and network applications
3
Computing Environment in IHEP
4
Computing Resources
~8000 CPU cores
Will reach 10000 cores in two months
  SL5.5 (64 bit) for WLCG, BES-III, YBJ, Daya Bay
  SL4.5 (32 bit) for BES-III
  Will be upgraded in two months
Blade systems: IBM/HP/Dell
  Blades linked with GigE/IB
  Chassis linked to the central switch with 10GigE
Force10 E1200 central switch
PC farm built with blades
5
Batch System
Torque 2.5.5 + Maui 3.2.6p21
Torque merged with AFS
Client/server architecture
Fake tokens dispatched by the server
ActiveMQ message passing
6
File system - Lustre
3 MDSs, 31 OSSs, 300+ OSTs, 800 client nodes, 100 million files
Lustre version: 1.8.5 (upgraded in July)
Capacity: 1.7PB (slight change since May)
All login clients have been upgraded to 64 bit, resulting in fewer login node crashes
IHEP is considering binding Lustre with CASTOR 1.7 using the HSM function provided by Lustre 2.x
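The component counts on this slide can be turned into a few rough per-server ratios. The following is a minimal sketch, assuming decimal units (1 PB = 1000 TB) and treating "300+" OSTs as exactly 300, so the per-OST capacity is an upper estimate. The variable names are illustrative and do not come from the slides.

```python
# Rough ratios derived from the Lustre figures on the slide.
# Assumptions: decimal units, and "300+" OSTs taken as the lower bound 300.
OSS_COUNT = 31            # object storage servers
OST_COUNT = 300           # object storage targets (lower bound)
CLIENT_COUNT = 800        # client nodes
CAPACITY_TB = 1700.0      # 1.7 PB expressed in TB

osts_per_oss = OST_COUNT / OSS_COUNT          # targets served by each OSS
tb_per_ost = CAPACITY_TB / OST_COUNT          # upper estimate, see assumptions
clients_per_oss = CLIENT_COUNT / OSS_COUNT    # client fan-in per OSS

print(f"OSTs per OSS:    {osts_per_oss:.1f}")
print(f"TB per OST:      {tb_per_ost:.1f}")
print(f"Clients per OSS: {clients_per_oss:.1f}")
```

These are averages only; the slides do not say how OSTs or clients are actually distributed across servers.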
7
HSM Deployment
Hardware
  Two IBM 3584 tape libraries
  ~5800 slots, with 26 LTO-4 tape drives
  10 tape servers and 10 disk servers with a 200TB disk pool
Software
  Customized version based on CASTOR 1.7.1.5
  Supports the new types of hardware
  Optimizes the performance of tape read and write operations
  Stager was rewritten
Network
  10Gbps link between disk servers and tape servers
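As a back-of-the-envelope check on the tape hardware listed above, the slot and drive counts can be combined with the LTO-4 native (uncompressed) figures of 800 GB per cartridge and 120 MB/s per drive. This is a sketch under stated assumptions: every slot is assumed to hold an LTO-4 cartridge, units are decimal, and the function names are illustrative.

```python
# Sizing estimate for the tape back end.
# Slot and drive counts are from the slide; per-cartridge and per-drive
# values are the LTO-4 native (uncompressed) specifications.
SLOTS = 5800                # "~5800 slots" (approximate on the slide)
DRIVES = 26                 # LTO-4 tape drives
LTO4_NATIVE_GB = 800        # native capacity per cartridge
LTO4_NATIVE_MB_S = 120      # native streaming rate per drive


def tape_capacity_pb(slots, gb_per_cartridge=LTO4_NATIVE_GB):
    """Total native capacity in PB, assuming every slot holds one cartridge."""
    return slots * gb_per_cartridge / 1_000_000


def aggregate_drive_rate_gb_s(drives, mb_s_per_drive=LTO4_NATIVE_MB_S):
    """Peak combined streaming rate in GB/s if all drives stream at once."""
    return drives * mb_s_per_drive / 1000


capacity_pb = tape_capacity_pb(SLOTS)
rate_gb_s = aggregate_drive_rate_gb_s(DRIVES)
print(f"Native tape capacity: {capacity_pb:.2f} PB")
print(f"Peak drive throughput: {rate_gb_s:.2f} GB/s")
```

The capacity estimate lands a little under 5 PB. The throughput figure is a theoretical ceiling; real rates depend on file sizes, mount times, and the disk pool.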
8
Network connection
[Figure: network topology diagram. Site and network labels: Daya Bay, Shen Zhen, Beijing, CSTNet, Hong Kong, IHEP, USA, GLORIAD, ASGC, TEIN3, Tsinghua, YBJ, EUR., EDU.CN, Others. Protocol labels: IPv4, IPv6. Link speed labels: 10G, 2.5G, 1G, 155M, 45M.]
9
[Figure. Labels: Document Management, Web Content Management, 8000 CPU/Cores, 5PB Tape Lib, IaaS/PaaS/SaaS, 2PB+ Storage.]
10
BEIJING-LCG2 Site Report
11
BEIJING-LCG2 Site report
12
Reliability and Availability
13
dCache Migration
dCache in IHEP
  Total capacity was 320TB
  3 head nodes and 8 pool nodes
  dCache server version 1.9.5-25
Migrated from pnfs to Chimera
  The migration took about 40 hours
14
DPM Upgrade
DPM in IHEP
  DPM version: 1.7.4, updated to 1.8
  Total capacity was 320TB
  1 head node and 8 pool nodes
Upgraded the DPM server OS from SL4 to SL5
Reinstalled DPM and restored the dpns database
15
CVMFS Deployed in IHEP
Deployed the CVMFS client on all worker nodes
Set up a Squid server as the HTTP proxy for the clients
Client version: 2.0.3-1
Supported VOs: ATLAS, CMS, BES
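A client setup like the one above usually comes down to a short local configuration file naming the repositories and the proxy. The sketch below renders such a file as text. It assumes the parameter names documented for current CernVM-FS clients (`CVMFS_REPOSITORIES`, `CVMFS_HTTP_PROXY`, `CVMFS_QUOTA_LIMIT`), which may differ from what the 2.0.3 client on the slide used. The proxy host, the BES repository name, and the cache quota are hypothetical placeholders, not values from the slides.

```python
def render_cvmfs_default_local(repositories, proxy_url, quota_mb=20000):
    """Render the text of a CVMFS client local configuration file.

    repositories: list of repository names to mount
    proxy_url:    URL of the site Squid proxy
    quota_mb:     local cache soft limit in MB (placeholder default)
    """
    lines = [
        "CVMFS_REPOSITORIES=" + ",".join(repositories),
        'CVMFS_HTTP_PROXY="{}"'.format(proxy_url),
        "CVMFS_QUOTA_LIMIT={}".format(quota_mb),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values: the slides give neither the Squid host nor the
# repository names. 3128 is the default Squid port.
config_text = render_cvmfs_default_local(
    repositories=["atlas.cern.ch", "cms.cern.ch", "bes.example.org"],
    proxy_url="http://squid.example.org:3128",
)
print(config_text, end="")
```

Generating the file from one function keeps the repository list and proxy consistent when the same settings are pushed to many worker nodes.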
16
Cooling System
Air cooling system has reached 75% of capacity
Cool air partition was built in 2009 and 2010
New machines are coming
17
Cooling System Monitoring
Blade racks are very hot due to the heavy job load
18
Cooling System Upgrade (Ongoing)
Water-cooled racks: for the blade server racks in operation
Power capacity: 800kW -> 800kW * 2
Power supply for one row (10 racks): 100kW -> 270kW
6 companies have entered the bid
Will be finished by the end of the year
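The power figures above translate into per-rack budgets with simple arithmetic. This is a minimal sketch, assuming the row supply is shared evenly across the 10 racks in a row; the slides do not state how power is actually distributed, and the names are illustrative.

```python
RACKS_PER_ROW = 10


def per_rack_kw(row_kw, racks_per_row=RACKS_PER_ROW):
    """Average power available per rack, assuming an even split in the row."""
    return row_kw / racks_per_row


old_total_kw = 800
new_total_kw = 800 * 2            # "800kW -> 800kW * 2"
old_rack_kw = per_rack_kw(100)    # row supply before the upgrade
new_rack_kw = per_rack_kw(270)    # row supply after the upgrade

print(f"Total capacity: {old_total_kw} kW -> {new_total_kw} kW")
print(f"Per rack: {old_rack_kw:.0f} kW -> {new_rack_kw:.0f} kW "
      f"({new_rack_kw / old_rack_kw:.1f}x)")
```

Under that assumption the per-rack budget grows from 10 kW to 27 kW, a larger relative increase than the doubling of total capacity.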
19
Conclusion
Farm works fine, but more machines are coming
Cooling system needs to be upgraded as soon as possible
32 bit OS (unstable) will be abandoned
More machines => new problems?
20
Thank you!