BESIII Production with Distributed Computing
Xiaomei Zhang, Tian Yan, Xianghu Zhao
Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, P. R. China

Introduction

BESIII is an experiment studying tau-charm physics with an electron-positron collider. About 3 PB of data have been generated in the last five years, and the collaboration has about 400 members from 52 institutions. The distributed computing system for BESIII was established in 2012, based on DIRAC. It has now gathered about 3,000 CPU cores and 400 TB of storage from 10 sites. Three large-scale MC production tasks have been completed, with more than 150,000 jobs finished successfully.

System I: Jobs & Data

Raw data are taken at IHEP, which is the central site for raw data processing, bulk reconstruction, and analysis. Remote sites are used for MC production and analysis. Random trigger (randomtrg) data are transferred to remote sites by a data transfer system between SEs. Simulation data produced at remote sites are written directly to the IHEP SE by the jobs.
DST data are transferred to remote sites for particular analyses.

Job Management System
- DIRAC as middleware, with BESDIRAC as the VO-specific extension
- jobs are pulled to remote sites by pilot agents
- the user frontend is Ganga with a BOSS extension

Data Management System
- DFC is used as the file and metadata catalog, with dataset functionality built on top
- a dataset-based transfer system was developed; transfers reached 10 TB/day when deploying random trigger data to remote sites' SEs
- StoRM + Lustre is chosen as the central storage solution, integrating grid and local data access

Site resources:

Site            CPU Cores   SE Capacity
IHEP.cn         264         214 TB
UCAS.cn         152         -
USTC.cn         200~1280    24 TB
PKU.cn          100         -
WHU.cn          100~300     39 TB
UMN.us          768         50 TB
JINR.ru         100~200     30 TB
INFN-Torino.it  250         30 TB
SJTU.cn         100         -
SDU.cn          150         -

MC Production with the System

Batch   Time      Type             Events    Jobs
1       2012.11   Psi(3770)        200 M     ~12,000
2       2013.09   Jpsi inclusive   800 M     ~40,000
3       2013.12   Psi(3770)        1350 M    ~100,000

Three large-scale MC production batches have been run: more than 150k jobs were completed, 10 sites joined the production, the number of concurrently running jobs peaked at 1,400, and about 15 TB of output data were generated and transferred back to IHEP.

Problems and Plans

Problems:
- small sites find it difficult to maintain robust SEs, so their jobs have to download random trigger data from the center, which puts a high load on the central SE and makes jobs inefficient
- frequent access to random trigger data by split-by-event jobs places a heavy load on the file systems at remote sites
- manpower at the sites is limited, so monitoring is important for both cloud and conventional sites

Plans:
- study cloud storage to provide high-performance, robust central storage for common data access among sites
- develop a user-friendly monitoring system to ease usage and administration

System II: Cloud Computing Model

Cloud resources are integrated into the system using VMDIRAC. Virtual machines are scheduled by VMDIRAC, and each VM automatically builds a DIRAC job execution environment through contextualization. Virtual machines are created when there are job requests.
VMs are destroyed when no more jobs are running. Everything is transparent to the end user, and user jobs run properly on cloud sites. Six clouds have joined the system: three based on OpenStack and three on OpenNebula.

[Architecture diagram: user tasks enter DIRAC; the Site Director submits pilots to grid clusters, while the VM Scheduler launches pilot VMs on clouds.]
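The combined pull-and-elastic-scheduling behavior can be illustrated with a small simulation: pilots pull jobs from a central task queue, and VMs exist only while work remains. All class and function names here are hypothetical sketches of the idea, not the actual DIRAC/VMDIRAC API.

```python
# Illustrative simulation of pilot-based pull scheduling with VMDIRAC-style
# elastic VMs: VMs are created while jobs are waiting and destroyed once
# their pilot finds no more work. Names are hypothetical, not DIRAC API.

from collections import deque

class TaskQueue:
    """Central queue of waiting jobs; pilots pull from it."""
    def __init__(self, jobs):
        self._jobs = deque(jobs)
    def pull(self):
        return self._jobs.popleft() if self._jobs else None
    def waiting(self):
        return len(self._jobs)

class PilotVM:
    """A cloud VM that contextualizes into a pilot and pulls jobs."""
    def __init__(self, vm_id):
        self.vm_id = vm_id
        self.done = []
    def run(self, queue, max_jobs=2):
        # the pilot keeps pulling until it hits its limit or the queue drains
        while len(self.done) < max_jobs and (job := queue.pull()) is not None:
            self.done.append(job)        # "execute" the pulled job

def schedule(queue, jobs_per_vm=2):
    """Create VMs while jobs wait; each VM is destroyed when its pilot exits."""
    finished = []
    vm_id = 0
    while queue.waiting() > 0:           # pending job requests -> create a VM
        vm = PilotVM(vm_id)
        vm_id += 1
        vm.run(queue, jobs_per_vm)
        finished.extend(vm.done)         # VM torn down after its pilot finishes
    return finished

q = TaskQueue([f"mc-job-{i}" for i in range(5)])
result = schedule(q)
print(len(result))   # prints 5: all jobs executed by elastically created VMs
```

The point of the pull model shown here is that sites (or clouds) never receive a job directly: they only start pilots, and a pilot fetches work matched to its environment, which keeps failed worker nodes from consuming queued jobs.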

