IHEP Computing Center Site Report
Gang Chen, Computing Center, Institute of High Energy Physics
2011 Spring Meeting

CC-IHEP at a Glance
The Computing Center was created in the 1980s
  Provided computing services to BES, the experiment on BEPC
Rebuilt in 2005 for the new projects:
  BES-III on BEPC-II
  Tier-2s for ATLAS and CMS
  Cosmic ray experiments
35 FTEs, half of them for the computing facility

Computing Resources
~6,600 CPU cores
  SL5.5 (64-bit) for WLCG
  SL4.5 (32-bit) for BES-III, migrating to SL5.5
  Torque: torque-server-2.4
  Maui: maui-server-3.2.6
Blade systems from IBM/HP/Dell
  Blades linked with GigE/IB
  Chassis linked to the central switch (Force10 E1200) with 10GigE
PC farm built with blades
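For illustration, the sketch below shows how a job could be handed to a Torque/Maui batch system like the one above from Python; the queue name, resource request and analysis script are hypothetical placeholders, not the actual IHEP configuration.

```python
#!/usr/bin/env python
"""Minimal sketch: submit a batch job to a Torque/Maui cluster via qsub.

The queue name ("besiii"), resource request and run_analysis.sh are
hypothetical examples, not the actual IHEP setup.
"""
import subprocess
import tempfile

JOB_SCRIPT = """#!/bin/bash
#PBS -N demo_job
#PBS -q besiii
#PBS -l nodes=1:ppn=1,walltime=02:00:00
cd $PBS_O_WORKDIR
./run_analysis.sh
"""

def submit(script_text):
    """Write the job script to a temporary file and hand it to qsub."""
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script_text)
        path = f.name
    # qsub prints the job identifier (e.g. "12345.torque-server") on stdout
    return subprocess.check_output(["qsub", path]).strip()

if __name__ == "__main__":
    print("Submitted job:", submit(JOB_SCRIPT))
```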

Resources Used per VO
[Chart: CPU hours used by each VO over the reporting period]

Storage Architecture
[Diagram: computing nodes access shared file systems (Lustre, NFS, ...) and the HSM (CASTOR); the Lustre side shows the MDS, OSSes and disk pools, the CASTOR side shows the name server, tape pools and HSM hardware; links are 10G and 1G Ethernet]

Lustre System
Version:
32 I/O servers, each attached to 4 SATA disk arrays
Storage capacity: 1.7 PB
Namespace: 3 mount points (for different experiments)
[Diagram: main MDS with failover to a secondary MDS; OSS 1 to OSS N, each with RAID-6 SATA disk arrays (main and extended); 10Gb Ethernet to the computing farms]
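A quick way to check the capacity and usage of such mount points from a worker node is sketched below; the three mount-point paths are hypothetical placeholders, not the actual paths exported at IHEP.

```python
#!/usr/bin/env python
"""Minimal sketch: report capacity and usage of Lustre mount points.

The mount-point paths are hypothetical placeholders; substitute the
paths actually exported to the computing farm.
"""
import os

MOUNT_POINTS = ["/lustre/bes", "/lustre/ybj", "/lustre/atlas"]  # hypothetical

def report(path):
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize          # total bytes on the file system
    free = st.f_bavail * st.f_frsize           # bytes available to users
    used = total - free
    tb = 1024.0 ** 4
    print("%-16s total %8.1f TB  used %8.1f TB (%.1f%%)"
          % (path, total / tb, used / tb, 100.0 * used / max(total, 1)))

if __name__ == "__main__":
    for mp in MOUNT_POINTS:
        report(mp)
```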

Lustre Performance
Peak throughput of data analysis: 800 MB/s per I/O server
Total throughput ~25 GB/s (32 I/O servers x 800 MB/s = 25.6 GB/s)

Lustre Lessons
Running out of low memory can crash the system
  Move to a 64-bit OS
  Optimize read/write patterns
Security and user-based ACLs
  Recompiling the source code is needed to add certain modules
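On 32-bit kernels the low-memory zone can be watched through /proc/meminfo; a minimal monitoring sketch is shown below. The 10% alert threshold is a hypothetical example, not a value from the slides.

```python
#!/usr/bin/env python
"""Minimal sketch: warn when the 32-bit kernel low-memory zone runs short.

The 10% threshold is a hypothetical example; on 64-bit kernels the
LowTotal/LowFree fields are typically absent.
"""

def meminfo():
    """Parse /proc/meminfo into a dict of values in kB."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.split()[0])
    return values

def check_lowmem(threshold=0.10):
    info = meminfo()
    low_total = info.get("LowTotal")
    low_free = info.get("LowFree", 0)
    if not low_total:
        print("No LowTotal/LowFree fields (likely a 64-bit kernel)")
        return
    frac = float(low_free) / low_total
    status = "WARNING: low memory nearly exhausted" if frac < threshold else "Low memory OK"
    print("%s: %.1f%% free" % (status, 100 * frac))

if __name__ == "__main__":
    check_lowmem()
```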

HSM Deployment
Hardware
  Two IBM 3584 tape libraries
  ~5,800 slots, with 26 LTO-4 tape drives
  10 tape servers and 10 disk servers with a 200 TB disk pool
Software
  Customized version based on CASTOR
    Supports the new types of hardware
    Optimizes the performance of tape read and write operations
    Stager was re-written
Network
  10 Gbps links between disk servers and tape servers
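For a rough sense of scale, the aggregate tape bandwidth implied by 26 drives can be estimated as below. The ~120 MB/s per-drive figure is the nominal LTO-4 native (uncompressed) rate taken from the drive specification, an assumption not stated on the slides.

```python
#!/usr/bin/env python
"""Back-of-the-envelope estimate of aggregate tape bandwidth.

Assumes the nominal LTO-4 native (uncompressed) rate of ~120 MB/s per
drive; real throughput is lower once mounts, seeks and small files are
taken into account.
"""
N_DRIVES = 26             # from the slide
MB_PER_S_PER_DRIVE = 120  # nominal LTO-4 native rate (assumption)

aggregate = N_DRIVES * MB_PER_S_PER_DRIVE
print("Aggregate native tape bandwidth: %d MB/s (~%.1f GB/s)"
      % (aggregate, aggregate / 1024.0))
```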

All data: ~1.3 PB, ~1 million files
BESIII data: ~810 TB, ~540K files
YBJ data: ~301 TB, ~400K files
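The volumes and file counts quoted above imply average file sizes of roughly a gigabyte, which matters for tape-drive efficiency; a small check using only the slide's own numbers:

```python
#!/usr/bin/env python
"""Average file sizes implied by the volumes and file counts quoted above."""
datasets = {
    "All data": (1300.0, 1000000),  # ~1.3 PB expressed in TB, ~1M files
    "BESIII":   (810.0,   540000),
    "YBJ":      (301.0,   400000),
}
for name, (tb, nfiles) in sorted(datasets.items()):
    avg_gb = tb * 1024.0 / nfiles
    print("%-8s ~%.1f GB per file on average" % (name, avg_gb))
```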

Real-time Monitoring of CASTOR

File Reservation for CASTOR
The file reservation component is an add-on for CASTOR 1.7
Developed to prevent reserved files from being migrated to tape when disk usage exceeds a certain level
Provides a command-line interface and a web interface, through which users can:
  Browse the mass-storage namespace as a directory tree
  Make file-based, dataset-based and tape-based reservations
  Browse, modify and delete reservations
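As an illustration only (the actual add-on's interfaces are not shown on the slides), the sketch below models the stated behaviour: once disk usage crosses a threshold, files covered by a reservation are excluded from the migration candidates. All class, function and path names are hypothetical.

```python
#!/usr/bin/env python
"""Illustrative sketch of the file-reservation idea described above.

This is NOT the actual CASTOR add-on; all names and paths are hypothetical.
It only models the stated behaviour: reserved files are excluded from tape
migration once disk usage passes a threshold.
"""
from collections import namedtuple

Reservation = namedtuple("Reservation", ["owner", "paths", "expires"])

class ReservationRegistry:
    def __init__(self):
        self._reserved = set()

    def add(self, reservation):
        """Register a file-, dataset- or tape-based reservation."""
        self._reserved.update(reservation.paths)

    def is_reserved(self, path):
        return path in self._reserved

def migration_candidates(files_on_disk, registry, disk_usage, threshold=0.85):
    """Return files eligible for migration to tape.

    Below the usage threshold nothing is forced out; above it, every file
    except the reserved ones becomes a candidate.
    """
    if disk_usage < threshold:
        return []
    return [f for f in files_on_disk if not registry.is_reserved(f)]

if __name__ == "__main__":
    reg = ReservationRegistry()
    reg.add(Reservation("user1", {"/castor/bes/run001.dst"}, "2011-12-31"))
    pool = ["/castor/bes/run001.dst", "/castor/bes/run002.dst"]
    print(migration_candidates(pool, reg, disk_usage=0.92))
```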

File Reservation System for CASTOR

Global Networking
Via ORIENT/TEIN3 to Europe
Via GLORIAD to the US

ATLAS Data Transfer between Lyon and Beijing
> 130 TB of data transferred from Lyon to Beijing in 2010
> 35 TB of data transferred from Beijing to Lyon in 2010

CMS Data Transfer from/to Beijing
~290 TB transferred from elsewhere to Beijing in 2010
~110 TB transferred from Beijing to elsewhere in 2010

Cooling System
Air cooling system has reached 70% of capacity
Cold-air partitions were built in 2009 and 2010
Water cooling is being discussed

Conclusion
CPU farms work fine, but the 32-bit systems must be migrated to 64-bit as soon as possible
Lustre is the major storage system at IHEP, with acceptable performance but also some minor problems
CPU and storage resources are growing much faster than expected, which causes problems with system stability, batch-system scalability, cooling, etc.

Thank you