Discussions on group meeting


Discussions on group meeting, May 2013

Site Monitoring

Two kinds of monitoring are proposed:
- "SAM test" monitoring
  - Works just like SAM tests: send regular tests to the sites, collect and filter the results, and publish them
  - Easy to know the status of critical services, e.g. CVMFS, PBS, SE, ...
- Ganglia-based monitoring
  - Similar to the ATLAS Tier-3 monitoring
  - Set up local and global Ganglia monitoring, collect the info, and publish it
  - Easy to know server status, total job numbers, etc.

Site info from the two monitoring systems will be collected into one database and summarized in one web page, like a dashboard.

We need to decide what kind of information is necessary:
- Service status: CE, SE, CVMFS
- Transfer status: channel, FTS
- Job numbers: production, analysis, tests
- CPU consumption, CPU efficiency

Similar to the LCG one.

Further thoughts about "SAM test" monitoring

DIRAC Resource Status System
- Has similar functions, but not completely what we want
- Does not send tests; it only collects info from existing jobs
- Still in development and in the plans

Proposal: establish our own system
- It seems not too difficult, if someone can spend time on it

Preliminary design of site monitoring

Developed on the DIRAC framework. The components are a Monitor Agent, the Configuration Service, the site Resources, a MonitorDB, a Command Line tool, and a Web Page. The Monitor Agent gets site info from the Configuration Service, sends tests to the sites, and records the test results to the DB; the command line and web page read from the DB.

Preliminary design of site monitoring: tests and agents

Test design
- Tests for CE, SE, CVMFS, ...
- CE and CVMFS are tested with jobs
- The SE is tested by issuing gLite commands

Agents
- The Monitor Agent is responsible for getting site info from the DIRAC Configuration Service, sending tests, retrieving and filtering the results, and updating the DB
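The agent cycle above can be sketched in plain Python. This is only an illustration: a real implementation would subclass DIRAC's AgentModule and query the actual Configuration Service, and every helper name (and the stubbed results) below is an assumption.

```python
# Illustrative outline of the proposed Monitor Agent cycle.
# All helpers are stubs standing in for DIRAC machinery.

def get_sites_from_cs():
    """Stub for querying the DIRAC Configuration Service for site info."""
    return ["CLUSTER.WHU.cn", "CLUSTER.UMN.us"]

def send_tests(site):
    """Stub: submit CE/CVMFS test jobs and run SE tests for one site."""
    return {"CE": "OK", "CVMFS": "OK", "SE": "OK"}

def filter_results(raw):
    """Stub: drop transient failures, keep the per-service verdicts."""
    return raw

def update_db(site, results, db):
    """Record the filtered test results, keyed by site."""
    db[site] = results

def execute_cycle(db):
    """One monitoring cycle: CS -> send tests -> filter -> update DB."""
    for site in get_sites_from_cs():
        update_db(site, filter_results(send_tests(site)), db)

db = {}
execute_cycle(db)
print(sorted(db))  # one entry per monitored site
```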

Preliminary design of site monitoring: DB, commands, web

DB
- MonitorDB, with a table SiteStatus to record the site status

Commands
- bes-dirac-site-monitor --sitename --timerange
- By default it prints out the latest site status
- It interacts with the DB interface to get the status

Web
- DiracWeb is in a migration period to Tornado; better to consider the web page later
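A minimal sketch of the proposed MonitorDB, using an in-memory SQLite database. The slides only name the SiteStatus table and its purpose, so the column names and example values here are assumptions.

```python
import sqlite3

# Illustrative schema for the proposed SiteStatus table
# (column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SiteStatus (
        SiteName  TEXT NOT NULL,
        Service   TEXT NOT NULL,   -- e.g. CE, SE, CVMFS
        Status    TEXT NOT NULL,   -- e.g. OK, Bad, Unknown
        CheckTime TEXT NOT NULL    -- ISO timestamp of the test
    )
""")

def latest_status(site):
    """Return the most recent status per service for one site,
    mimicking the default output of bes-dirac-site-monitor."""
    rows = conn.execute("""
        SELECT Service, Status, MAX(CheckTime)
        FROM SiteStatus WHERE SiteName = ?
        GROUP BY Service
    """, (site,)).fetchall()
    return {service: status for service, status, _ in rows}

# Example records (hypothetical site name and timestamps)
conn.executemany(
    "INSERT INTO SiteStatus VALUES (?, ?, ?, ?)",
    [
        ("CLUSTER.WHU.cn", "CVMFS", "Bad", "2013-05-01T09:00:00"),
        ("CLUSTER.WHU.cn", "CVMFS", "OK",  "2013-05-02T09:00:00"),
        ("CLUSTER.WHU.cn", "SE",    "OK",  "2013-05-02T09:00:00"),
    ],
)
print(latest_status("CLUSTER.WHU.cn"))
```

The query relies on SQLite's bare-column behaviour with MIN/MAX aggregates, which returns the Status value from the row holding the latest CheckTime.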

BESIII data transfer

Two transfer protocols are added:
- DIRACFTS (dirac-dms-fts-submit)
  - dirac-dms-fts-submit is not well coded and not easy to debug; it needs to be fixed if we have time
- DIRACDMS (dirac-dms-replicate-lfn)

Testing
- Preliminary tests are successful with both modes: a dataset is created -> a transfer request is created -> the transfer status can be followed -> transfer errors are shown
- The error logs still need to be improved

BESIII data transfer: channels and accounting

Channels
- Currently no good channels are available: the Dubna SE is in downtime, and the USTC and IHEP SEs need to be tuned
- Going to use IHEP and IHEPD for testing; a certain volume of transfer tests needs to be done

Accounting
- Update transfer info to the central DIRAC accounting system
- DIRACFTS accounting is available but not correct; it needs to be fixed
- DIRACDMS accounting is not available and needs to be added: do we do it ourselves, or ask DIRAC to fix it?
- More and more small fixes need to be done inside DIRAC; we need to find a regular procedure for that

BESIII data transfer: missing functions

More functions are needed:
- An option to switch between the two transfer types
- The transfer type used needs to be recorded in the DB for each request
- Functions such as cancelling requests need to be introduced
- Need to consider using the datasets defined by badger
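The proposed transfer-type switch could look like the sketch below. Only the two DIRAC command names come from the slides; the function name, option values, example LFN, destination SE, and the exact command-line argument order are all illustrative assumptions.

```python
# Hedged sketch of an option switching between the two transfer types.
def build_transfer_command(lfn, dest_se, transfer_type="DIRACDMS"):
    """Return the DIRAC command line for one replication request.

    transfer_type: "DIRACFTS" uses dirac-dms-fts-submit,
                   "DIRACDMS" uses dirac-dms-replicate-lfn.
    The argument order shown is illustrative, not the verified CLI syntax.
    """
    if transfer_type == "DIRACFTS":
        return ["dirac-dms-fts-submit", lfn, dest_se]
    if transfer_type == "DIRACDMS":
        return ["dirac-dms-replicate-lfn", lfn, dest_se]
    raise ValueError("unknown transfer type: %s" % transfer_type)

# Hypothetical LFN and SE name, for illustration only.
cmd = build_transfer_command("/bes/file/example.dst", "IHEPD-USER",
                             transfer_type="DIRACFTS")
print(" ".join(cmd))
```

Recording the chosen transfer type alongside each request in the DB (as the slide asks for) would then be a matter of storing the `transfer_type` argument with the request row.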

BESDIRAC

An extension to DIRAC
- More and more BESIII-specific extensions are coming; we definitely need an extension

How to manage and maintain the extension
- A new release is needed for server and client; someone needs to look into it
- If we simply add extra packages locally, there would be problems with pilot jobs during software download
- We have set up a development environment on bager01; Dubna needs one too
- Use Git for code management? That would be consistent with the DIRAC development environment

UMN site

- The UMN site is going to have an SE for BES. Good news!
- Currently they are working on joining the SE to the BES VO
- Their SE type is BeStMan
- We do not seem to have tried adding a BeStMan SE to the BES VO before, and no documentation for that is available; someone needs to look into it to help them

Virtual sites

- PBS clusters are set up over virtual resources: WHU is using VirtualBox, and NSCCSZ is using KVM
- It is easy to add new nodes and extend the cluster, using images generated from an existing node
- Light configuration and checks can be done on a VM after booting, to get all the necessary services up and running
- Virtual sites are working well, just like a normal DIRAC cluster site

Virtual sites (2)

Virtual sites (3)

Advantages:
- Sites don't need to change their basic OS
- Clusters are easy to set up and extend with virtual images

Expected to be improved:
- Virtual sites are expected to provide a cloud resource management platform (e.g. OpenStack) with an API for creating and deleting VMs; DIRAC has good support for some well-known platforms such as OpenStack, CloudStack, and OpenNebula
- The size of the virtual resources should then vary with the number of jobs in real time, making resource usage more flexible and efficient; currently the resources are relatively static and VM set-up is done by hand
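The desired elastic behaviour, sizing the virtual resources by the number of waiting jobs, can be sketched as a simple scaling policy. All names, thresholds, and the policy itself are illustrative assumptions, not VMDIRAC's actual logic.

```python
import math

# Hedged sketch of an elastic VM scaling policy driven by queue length.
def desired_vm_count(waiting_jobs, jobs_per_vm=4, max_vms=50, min_vms=0):
    """How many VMs the site should run for the current queue length."""
    needed = math.ceil(waiting_jobs / jobs_per_vm)
    return max(min_vms, min(max_vms, needed))

def scaling_actions(current_vms, waiting_jobs):
    """Return ('start'/'stop'/'none', count) to move toward the target."""
    target = desired_vm_count(waiting_jobs)
    if target > current_vms:
        return ("start", target - current_vms)
    if target < current_vms:
        return ("stop", current_vms - target)
    return ("none", 0)

print(scaling_actions(current_vms=3, waiting_jobs=30))
```

Replacing the by-hand VM set-up described above amounts to running such a loop periodically against the cloud platform's create/delete VM API.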