Status of BESIII Computing


Status of BESIII Computing
Yan Xiaofei, IHEP-CC

Outline: 1. Local Cluster and Distributed Computing  2. Storage Status  3. Network Status  4. Others

HTCondor Cluster
Upgrade to a new HTCondor version
- Upgraded from 8.4.3 to 8.4.11
- Bugs fixed and better performance provided
- Completed during the summer maintenance
- The "hung" problem, which forced the scheduler process to be restarted, has disappeared
Problems (after the upgrade)
- Heavy pressure on "/ihepbatch/bes" led to jobs being held: some users run a large number of analysis jobs in that directory, causing low performance
- A bug concerning the maximum number of jobs in the queue was found in the new version, making the scheduler unstable; as a mitigation, each user is limited to 10,000 jobs
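
One way to enforce the 10,000-jobs-per-user limit is on the schedd side of HTCondor. The sketch below is a minimal, illustrative configuration, assuming the MAX_JOBS_PER_OWNER knob available in recent HTCondor releases; the actual mechanism used at IHEP may differ.

    # condor_config.local on the schedd host (illustrative values)
    # Cap the total number of jobs any single user may have in the queue.
    MAX_JOBS_PER_OWNER = 10000

    # Apply the new setting without restarting the daemons:
    #   condor_reconfig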

HTCondor Cluster
New feature: bulk job submission
- Bulk jobs are submitted quickly: less submission time and a lower submission failure rate
- Command for BES users: "boss.condor" for batch jobs (the job option files must already exist)
  $ boss.condor -n 300 jobopt%{ProcId}.txt
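
Conceptually, a bulk submission like the boss.condor command above corresponds to a single HTCondor submit description that queues many processes in one call. The sketch below is illustrative only; file and executable names are assumptions, not the real boss.condor internals.

    # bulk_boss.sub -- illustrative submit description
    universe    = vanilla
    executable  = run_boss.sh                # hypothetical wrapper around the BOSS job
    arguments   = jobopt$(ProcId).txt        # each process reads its own job option file
    output      = logs/job_$(ProcId).out
    error       = logs/job_$(ProcId).err
    log         = logs/bulk.log
    queue 300                                # one submission creates 300 jobs, ProcId = 0..299

    # Submit with:
    #   condor_submit bulk_boss.sub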

Jobs Statistics (Jun 2017 - Sep 2017)
BES jobs occupied most of the CPU time.

Experiment   Job count   Walltime (hours)   User count
BES          6,334,436   14,164,509.61      305
JUNO         2,754,221    1,729,473.14       51
DYW            861,264    1,136,054.19       39
HXMT           614,257      793,408.08       11
LHAASO         624,279      699,813.88       41
CEPC         1,208,345      592,382.61       24
ATLAS          406,818      202,126.59       23
CMS          1,096,388      142,736.30       15
YBJ            323,144      140,068.56       36
OTHER          311,202       56,856.52        8

SLURM Cluster
Resources
- 1 master node
- 1 accounting & monitoring node
- 16 login nodes
- 131 work nodes: 2,752 CPU cores, 8 GPU cards
Jobs (2017.6 ~ 2017.8)
- Number of jobs: 1,834
- CPU hours: 2,001,800
Research
- hep_job tools: a unified user interface shared with the HTCondor cluster
- Malva: a resource monitoring system
- Jasmine: a multi-purpose job test suite
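
For reference, a plain SLURM batch script for this cluster might look like the sketch below; the partition name, GPU request and binary are assumptions, and in practice users are expected to go through the hep_job wrapper rather than raw sbatch.

    #!/bin/bash
    #SBATCH --job-name=gpu_test
    #SBATCH --partition=gpu          # partition name is an assumption
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:1             # request one of the GPU cards
    #SBATCH --time=04:00:00
    #SBATCH --output=gpu_test_%j.log

    # Run the (hypothetical) analysis binary on the allocated node
    srun ./my_analysis

    # Submit with:
    #   sbatch gpu_test.sh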

BESIII distributed computing(1)

BESIII distributed computing (2)
- In the last three months, about 67.5K BESIII jobs were completed on the platform; 11 sites joined the production
- The central DIRAC server was successfully upgraded from v6r15 to v6r18 during the summer maintenance
- Accordingly, the distributed computing client in AFS was upgraded to the latest version
- All changes are transparent to end users
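
For illustration, a standalone job on the upgraded DIRAC v6r18 server can still be described with a plain JDL such as the sketch below; names and sandbox contents are assumptions, and BESIII production normally goes through dedicated front-end tools rather than raw JDL.

    # bes_test.jdl -- minimal illustrative JDL, not a production BESIII job
    JobName       = "bes_sim_test";
    Executable    = "run_boss.sh";                  # hypothetical wrapper script
    Arguments     = "jobOptions_sim.txt";
    InputSandbox  = {"run_boss.sh", "jobOptions_sim.txt"};
    OutputSandbox = {"std.out", "std.err"};
    StdOutput     = "std.out";
    StdError      = "std.err";

    # Submit and track it with the standard DIRAC client commands:
    #   dirac-wms-job-submit bes_test.jdl
    #   dirac-wms-job-status <JobID>
    #   dirac-wms-job-get-output <JobID>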

BESIII distributed computing (3)
- The distributed computing client is expected to migrate from AFS to CVMFS soon: more convenient for users outside IHEP and better performance than AFS
- The BOSS software repository used by the sites has been successfully migrated to the newly built CVMFS infrastructure, which runs the latest version on new hardware
- The CNAF CVMFS Stratum-1 server in Italy now replicates the IHEP Stratum-0 server, which facilitates access to BOSS for collaborators in Europe
[Diagram: CVMFS deployment: IHEP and CERN Stratum-0 servers replicated by Stratum-1 servers at IHEP, CNAF and OSG, serving Asia, Europe and America through proxies]
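
On a site worker node, pointing the CVMFS client at the new infrastructure takes only a few lines of client configuration. The sketch below is illustrative: the repository name, proxy and Stratum-1 URLs are assumptions, not the official IHEP values.

    # /etc/cvmfs/default.local (illustrative)
    CVMFS_REPOSITORIES=bes.ihep.ac.cn            # assumed repository name for BOSS
    CVMFS_HTTP_PROXY="http://squid.example.org:3128"
    CVMFS_QUOTA_LIMIT=20000                      # local cache size in MB

    # /etc/cvmfs/domain.d/ihep.ac.cn.local (illustrative)
    # Ordered Stratum-1 list: a European site would list the CNAF replica first
    CVMFS_SERVER_URL="http://cvmfs-s1.example-cnaf.it/cvmfs/@fqrn@;http://cvmfs-s1.example-ihep.cn/cvmfs/@fqrn@"

    # Verify the mount:
    #   cvmfs_config probe bes.ihep.ac.cn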

Outline: 1. Local Cluster and Distributed Computing  2. Storage Status  3. Network Status  4. Others

Storage Usage Guide
Age (years):       1 / 2 / 5 / 5.0
Usage:             public data / user and group data, raw data / user data / temporary user data
BESIII dedicated:  Yes / No
Quota:             50 GB / 5 GB, 50k files / 500 GB
Backup:
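
If the shared file systems are Lustre-based, users can check their own usage against these quotas with the standard Lustre client tools; the mount points and group name below are illustrative.

    # Current user's quota and usage on /besfs
    lfs quota -u $USER /besfs

    # Usage of a group area (group name is an assumption)
    lfs quota -g bes /besfs

    # Fallback that works on any file system: per-directory usage
    du -sh /workfs/$USER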

Disk Usage During the Last 3 Months
- BESIII computing is the major source of disk throughput in the IHEP cluster: more than 90% of the read/write throughput, with a peak read of 15 GB/s and a peak write of 6 GB/s
- /besfs disk usage increased by 200 TB after the summer maintenance
[Plots: /besfs write throughput and disk usage, with the summer maintenance period marked]

Storage Evaluation
- Disk density has increased quickly over the last few years: 8 TB and 10 TB disks are now mainstream products
  - Saves budget for better storage hardware
  - The storage capacity managed by a single server increases
- Evaluation of new hardware and storage architectures:
  - SSD disk arrays (better IOPS)
  - Storage Area Network (flexibility and sharing of high-end SSD disk arrays)
  - Bonding of 10 Gbit Ethernet (upgrade the network of each server from 10 Gb to 20 Gb or more)
  - Multipath fibre channel for disks (upgrade the theoretical channel throughput from 16 Gb to 32 Gb)
(Note) 1. Switch from a single path to multiple paths. 2. Aims: (a) more stable, with no single point of failure; (b) double the bandwidth from 10 Gb to 20 Gb, which increases I/O performance.
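
As an example of the 10 Gbit Ethernet bonding option, two 10 Gb interfaces can be aggregated into one 20 Gb logical link with standard Linux bonding. The sketch below uses RHEL/CentOS-style network scripts; device names, addresses and the LACP mode are assumptions, and the switch side must be configured to match.

    # /etc/sysconfig/network-scripts/ifcfg-bond0 -- illustrative 802.3ad (LACP) bond
    DEVICE=bond0
    TYPE=Bond
    BONDING_MASTER=yes
    BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
    BOOTPROTO=none
    IPADDR=192.168.10.10
    PREFIX=24
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-ens1f0 -- one of the two 10 Gb slaves
    # (repeat for the second interface, e.g. ens1f1)
    DEVICE=ens1f0
    TYPE=Ethernet
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes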

Outline: 1. Local Cluster and Distributed Computing  2. Storage Status  3. Network Status  4. Others

Network - WAN
- IHEP - USA (Internet2): IHEP - CSTNet - CERNET - USA, bandwidth: 10 Gbps
- IHEP - Europe (GEANT): IHEP - CSTNet - CERNET - London - Europe
- IHEP - Asia: IHEP - CSTNet - Hong Kong - Asia, bandwidth: 2.5 Gbps
- IHEP - *.edu.cn: IHEP - CSTNet - CERNET - universities, bandwidth: 10 Gbps
(Note) International and domestic links: 1. (joining lhc1 qfz), progress; 2. the China-Europe link issue.

New Network Architecture
- Separate the wireless network from the wired network
- Add an interconnect switch to bring the separated networks together
- Dual firewalls supported, in both the data center network and the campus network
- Add two DMZs in the campus network: OA services and public cloud services
- IPv6 supported in both the campus network and the data center network
(Note) The Grid data transfer is currently 8770; replacement issue.

Outline: 1. Local Cluster and Distributed Computing  2. Storage Status  3. Network Status  4. Others

New Monitoring at IHEP
Motivation
- The various cluster monitoring tools are independent of each other
- Integrating the multiple sources of monitoring data can provide more information
- Improve the availability of the computing cluster
Goal
- Correlate the monitoring sources and analyze the monitoring data from the various sources
- A unified display system provides health status at multiple levels
- Show trends of errors and anomalies

Remote Computing Site Services
- Maintenance services for remote computing centers: the USTC, BUAA and CLAS sites are running well
- The USTC site added 448 CPU cores and 1.1 PB of Lustre storage
- A central monitoring system has been set up for the remote sites

Site Name   CPU Cores   Storage (TB)
USTC        1088        1843
BUAA         160          81
CLAS         224          37

Public Services
SSO
- Total users: 7,036; BESIII: 642 (IHEP: 227, others: 415)
IHEP APP
- Supports a phone book with direct calling
- News, academic events, media, ...
- Personal agenda support will be added later
IHEPBox
- Updated to the latest version
- Total users: 1,000+; space usage: 3 TB / 154 TB
Vidyo
- Users: 692; conferences: 1,059

Others
Password self-reset: http://202.38.128.100:86/ccapply/userfindpasswd.action
Helpdesk: http://helpdesk.ihep.ac.cn
Email: helpdesk@ihep.ac.cn
Tel: 88236855

Summary
- The computing platform keeps running well, and new features have been added for users
- A new storage architecture is under evaluation
- A unified monitoring system has been developed
- The new network architecture is ready

Thank you! Questions?