9th Weekly Operation Report on DIRAC Distributed Computing
YAN Tian
From 2015-02-25 to 2015-03-04

Weekly Running Jobs by User
Notes:
1. zhanglei & huangy submitted BES jobs.
2. CEPC's production user weiyq kept MC jobs running.

Item                   Value
active users           3
max running jobs       785
average running jobs   394
total executed jobs    31.4 k

Final Status of Running Jobs
Failed Reason            Percent
Application Error        3.99 %
upload/download failed   3.77 %
stalled                  1.10 %
other                    0.025 %
Notes: CEPC upload err; huangy 74# pkg download error.
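As context for the table above, each percentage is simply the per-reason failure count divided by the total number of executed jobs. The sketch below illustrates that aggregation in Python; the record layout ('Status'/'MinorStatus' keys), the toy records and the failure_breakdown helper are assumptions for illustration, not the actual monitoring code.

    from collections import Counter

    def failure_breakdown(jobs):
        """Group failed jobs by their minor status and express each group
        as a percentage of all executed jobs.

        `jobs` is assumed to be a list of dicts carrying 'Status' and
        'MinorStatus' keys, e.g. as obtained from a job-monitoring query;
        this is an illustrative sketch, not the real reporting code.
        """
        total = len(jobs)
        failed = Counter(j['MinorStatus'] for j in jobs if j['Status'] == 'Failed')
        return {reason: 100.0 * n / total for reason, n in failed.items()}

    # toy example with made-up records
    jobs = [
        {'Status': 'Done',   'MinorStatus': 'Execution Complete'},
        {'Status': 'Failed', 'MinorStatus': 'Application Error'},
        {'Status': 'Failed', 'MinorStatus': 'Input Data Download Failed'},
        {'Status': 'Done',   'MinorStatus': 'Execution Complete'},
    ]
    print(failure_breakdown(jobs))
    # {'Application Error': 25.0, 'Input Data Download Failed': 25.0}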

Output Data Generated and Transferred
Total: 7.48 TB (~1.07 TB/day), good quality.

Running Jobs by Site
Notes:
– UMN's max running jobs is set to 500.
– WHU finished 47.7% of the jobs.

Job Final Status at Each Site
WHU          92.0 %   upload failed
GRID.JINR    100 %
UMN          91.3 %   74# BES
OpenNebula   96.5 %
OpenStack    85.8 %   11# CEPC
UCAS         96.7 %   66# BES

Failed Types at Site: Description
– GRID.JINR is good.
– UCAS is good too, but still has 66# randomtrg download errors.
– OpenNebula is good.
– OpenStack occasionally has a problem in the short period after a VM starts; 900 jobs failed within 1 hour.
– WHU still has upload errors. It will get better when CEPC runs long jobs.
– UMN is good, but jobs submitted to UMN failed because input data download errors arise when a large number of jobs fetch data from the DIRAC server simultaneously (see the retry sketch after the log excerpt below):

:07:39 UTC DataManagement/StorageElement NOTICE: Returning response ( :43120)[bes_user:huangy] (30.23 secs)
ERROR: Failed to get file dips://besdirac02.ihep.ac.cn:9148/DataManagement/StorageElement/bes/user/h/huangy/Upload/5240cc601bde6085a9baf9185a2cc22979c2624bc76705cb10e5301e0fd3fcfb
:07:39 UTC DataManagement/StorageElement ERROR: Error processing proposal Error while sending: [('SSL routines', 'SSL3_WRITE_PENDING', 'bad write retry')]
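Since the UMN failures come from many jobs requesting their input data at the same moment, one simple client-side mitigation is to retry the download with a randomised, growing back-off so that concurrent jobs spread their requests out. The sketch below is only an illustration under assumptions: it presumes DIRAC's DataManager.getFile() client call and the usual S_OK-style result structure, and it is not the fix actually deployed.

    import random
    import time

    # DIRAC client initialisation is normally required before the
    # DataManager can be used; parseCommandLine() loads the local
    # configuration and proxy.
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)
    from DIRAC.DataManagementSystem.Client.DataManager import DataManager

    def get_file_with_backoff(lfn, dest_dir='.', max_attempts=5):
        """Try to download one LFN, sleeping a random, growing interval
        between attempts so that jobs starting together do not all hit
        the StorageElement at the same moment. Assumes getFile() returns
        a dict with 'OK' and 'Value'['Successful'] keyed by LFN."""
        dm = DataManager()
        for attempt in range(max_attempts):
            result = dm.getFile(lfn, destinationDir=dest_dir)
            if result['OK'] and lfn in result['Value']['Successful']:
                return True
            # spread retries out: roughly 30-60 s, then 60-120 s, 120-240 s, ...
            time.sleep(random.uniform(30, 60) * 2 ** attempt)
        return False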

Cumulative User Jobs
Total user jobs: 31.4 k.

Running Jobs and Walltime Usage of VOs
Walltime usage: BES 17.6%, CEPC 82.4%.

New Features in Frontend
Added this week to fulfill users' needs:
– gangaBOSS: added BOSS support (done); only a large-scale test is still needed.
– dsub: supports setting the start event number and the number of batches to run (done, and now in production use to handle 200k events of input data); it can handle an input stdhep with events in [0, evtmax] in each job (see the sketch below).
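The start-event/batch option can be pictured as slicing the input event range into consecutive per-job batches. The snippet below is a minimal, hypothetical sketch of that splitting logic; split_event_ranges and its parameters are illustrative names, not the actual dsub implementation.

    def split_event_ranges(total_events, events_per_job, start_event=0):
        """Yield (first_event, n_events) pairs covering
        [start_event, total_events) in consecutive batches, so that each
        job processes its own slice of the input stdhep events."""
        first = start_event
        while first < total_events:
            n = min(events_per_job, total_events - first)
            yield first, n
            first += n

    # 200k input events split into 5000-event jobs -> 40 (start, length) slices
    ranges = list(split_event_ranges(200000, 5000))
    print(len(ranges), ranges[0], ranges[-1])   # 40 (0, 5000) (195000, 5000)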