INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org ATLAS DDM Operations - II Monitoring and Daily Tasks Jiří Chudoba ATLAS meeting, 25.9.2007,


ATLAS DDM Operations - II: Monitoring and Daily Tasks
Jiří Chudoba
ATLAS meeting, 25.9.2007, CNAF

Cloud Status
Scheduled and unscheduled downtimes
– direct e-mails from sites
– EGEE broadcasts
– GOCDB
ARDA Dashboard pages
– T0 to T1 transfers
– all other transfers

VOBoxes at CERN
https://twiki.cern.ch/twiki/bin/view/Atlas/DistributedDataManagementARDAMachines
Separate machines for db services and site services
CNAF:
– dq2db-cnaf – db services
– dq2cnaf – site services for CNAF and T2s
Access via the account ddmusr02
– limited possibilities, check /tmp/dq2.log
Account ddmusr01 restricted to developers
– why?
Installation done by developers

Panda Monitoring
Panda pages
– DS on sites: erview=dslist
– AOD: ode=listAODReplications
– aborted DS: ode=listAbortedDatasets
– M4: ode=listM4

More Monitoring
Stephane’s overview of disk occupancy: les_sites/all_sites/list_sites.html
Per data type version, DE cloud: muenchen.de/ddm/DE/summary.html
Site status monitored by GOC – gstat: http://goc.grid.sinica.edu.tw/gstat/RU-Protvino-IHEP/GIISQuery_Usage_store_.html

FTS monitoring
FTS 1.5
– DE cloud:
– SARA:
– glite-transfer commands:
    glite-transfer-channel-list -s https://fts.grid.sara.nl:8443/glite-data-transfer-fts/services/ChannelManagement

Typical tasks
Errors spotted via monitoring
– check reasons
– contact site
– possibly close the FTS channel
– verify when corrected
– open FTS channel

Deletion of Aborted DS
Mail sent to the people responsible for each T1 cloud (usually 1 per week)
Different procedures in different clouds
– FZK
   • Cedric’s script delete_dataset_aborted.py
   • run regularly from a crontab
   • uses: dq2.deleteDatasetReplicas, dq2.deleteDatasetSubscription, dq2.listFilesInDataset, lcg-del, lcg-uf
   • list of DS from a file
   • part of MyFrameWork: /afs/cern.ch/user/s/serfon/public/ddm/Myframework
   • will be published on Thursday

Deletion of Aborted DS II
SARA cloud: wrappers around dq2_cleanup

dq2_delete_aborted.sh

    #!/bin/sh
    # delete aborted DS using dq2_cleanup
    # start 1 dq2_cleanup instance per site
    # input via parameter.
    # Parameter 1: list of aborted dataset and sites
    # example:
    # ideal0_mc singlepart_gamma_Et60.simul.HITS.v _tid ITEP
    # tested from lxplus, when grid and dq2 environment was set and
    # production proxy obtained like this:
    #
    # source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
    # voms-proxy-init -voms atlas:/atlas/Role=production -valid 96:0
    # source /afs/cern.ch/atlas/offline/external/GRID/ddm/pro03/dq2.sh

    SITES="SARADISK SARATAPE NIKHEF ITEP IHEP JINR SINP"
    DSLIST=$1
    for SITE in $SITES ; do
        dq2_delete_aborted_site.sh $DSLIST $SITE &
    done

Deletion of Aborted DS II

dq2_delete_aborted_site.sh

    #!/bin/sh
    # delete aborted DS from a site using dq2_cleanup
    #
    # Input
    # parameter 1: list of aborted DS
    # parameter 2: SITENAME

    DSLIST=$1
    SITE=$2
    DQ2_CLEANUP=/afs/cern.ch/atlas/offline/external/GRID/ddm/pro03/dq2_cleanup
    LOG="${SITE}_${DSLIST}_`date +%Y%m%d_%H%M`.log"
    touch $LOG
    grep $SITE $DSLIST | while read DS ; do
        $DQ2_CLEANUP $DS >>$LOG 2>&1
    done
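The per-site wrapper selects its lines with grep, which matches the site name anywhere in a line, so a dataset whose name happens to contain a site string would also be picked up. A minimal Python sketch of a stricter selection follows. It assumes each line of the list ends with the site name, as in the example line quoted in dq2_delete_aborted.sh; the function name group_by_site is hypothetical and not part of DQ2.

```python
from collections import defaultdict


def group_by_site(lines, sites):
    """Group dataset entries by the site named in the last column.

    Assumption: every useful line looks like "<dataset> <SITE>".
    Lines naming an unknown site, and blank lines, are skipped.
    """
    known = set(sites)
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        site = fields[-1]
        if site in known:
            # everything before the site column is kept as the dataset name
            grouped[site].append(" ".join(fields[:-1]))
    return dict(grouped)


if __name__ == "__main__":
    sample = ["ds.A ITEP", "ds.ITEP.B SARADISK"]
    for site, datasets in group_by_site(sample, ["ITEP", "SARADISK"]).items():
        print(site, datasets)
```

With this grouping, the second sample entry is assigned only to SARADISK, although its dataset name contains the string ITEP.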

Integrity checks
Cedric’s script
– http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/offline/Production/swing/scripts/ddm/integrity_check.py?view=log
– some assumptions (/pnfs access)
Simple compare of dumps:

    #!/bin/bash
    # read files from a DPM dump and match them with an LFC dump
    # DPM dump obtained by select name from Cns_file_metadata where gid=1307 and filesize > 0;

    DPM_DUMP=$1
    LFC_DUMP=$2
    FOUND=$1.found
    MISS=$1.miss
    cat $DPM_DUMP | while read FN FILEID; do
        grep -q $FN $LFC_DUMP
        if [ $? == 0 ] ; then
            echo "$FN $FILEID" >> $FOUND
        else
            echo "$FN $FILEID" >> $MISS
        fi
    done
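The dump comparison runs one grep per DPM entry, which rescans the whole LFC dump each time and matches substrings, so b.root would count as found if the LFC dump only lists ab.root. The following Python sketch does the same comparison with a set lookup and exact matching. It assumes that the DPM dump has the file name in its first column and that the LFC dump lists the same form of name as the first field of each line; if the LFC dump holds full paths, the names would need to be reduced to a common form first. The function name compare_dumps is hypothetical.

```python
def compare_dumps(dpm_lines, lfc_lines):
    """Split DPM entries into those found in the LFC dump and those missing.

    Assumptions: dpm_lines are "<name> <fileid>" lines, lfc_lines carry the
    file name as their first field. Matching is exact, not by substring.
    """
    lfc_names = {line.split()[0] for line in lfc_lines if line.strip()}
    found, missing = [], []
    for line in dpm_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        fileid = fields[1] if len(fields) > 1 else ""
        if name in lfc_names:
            found.append((name, fileid))
        else:
            missing.append((name, fileid))
    return found, missing


if __name__ == "__main__":
    found, missing = compare_dumps(["a.root 1", "b.root 2"], ["a.root", "ab.root"])
    print("found:", found)
    print("missing:", missing)
```

Building the set once keeps the run time roughly linear in the size of the two dumps, which matters when both lists are long.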

Data loss
Only production files are treated
Get list of lost files (provided by a sysadmin)
Remove information about lost files from the SE db (must be done by a sysadmin) – see later talk
Delete lost entries from the LFC catalogue
Locate replicas of lost files
– If they exist, consider replication to the affected SE.
– If they do not exist, remove lost files from datasets (DQ2 db) and pass the list of really lost files to the prodsys group.
DB of lost files – will be part of DQ2
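The replica step of this procedure can be sketched in Python as a simple triage. This is only an illustration: the replica map is assumed to have been extracted from the catalogue beforehand, and the names triage_lost_files, replicas and affected_se are hypothetical rather than DQ2 or LFC calls.

```python
def triage_lost_files(lost_files, replicas, affected_se):
    """Separate lost files that can be re-replicated from those really lost.

    replicas: mapping of file name -> list of SEs holding a copy
              (hypothetical input, prepared from a catalogue query).
    Returns (recoverable, really_lost), where recoverable maps each file
    to the other SEs that still hold a replica.
    """
    recoverable = {}
    really_lost = []
    for name in lost_files:
        # copies on the affected SE itself do not count
        others = [se for se in replicas.get(name, []) if se != affected_se]
        if others:
            recoverable[name] = others
        else:
            really_lost.append(name)
    return recoverable, really_lost


if __name__ == "__main__":
    rec, lost = triage_lost_files(
        ["f1", "f2"], {"f1": ["SE_A", "SE_B"], "f2": ["SE_A"]}, "SE_A"
    )
    print("re-replicate:", rec)
    print("report to prodsys:", lost)
```

Files in the first group are candidates for replication back to the affected SE. Files in the second group are the ones to remove from their datasets and report.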

T2 cleaning
remove_t2_in_t1.py by Stephane
– A file is deleted if it fulfils all of the following conditions:
   • The file in the T2 is replicated in the T1DISK of the same cloud
   • The file belongs to a dataset which is not complete at the site
   • The file belongs to a dataset (with _tid) which is not subscribed to the T2 site (Be careful: during the DDM migration to 0.3, all subscriptions are removed. You might delete too many files until the subscriptions are put back.)
– Since v1.4, you can provide a list of restricted datasets to be deleted (even if subscribed)
– It first scans the LFC catalog at the Tier1 (it is possible to use a local dump of the LFC catalog), then scans the T2 entries in the LFC and deletes duplicated files on the T2 (using lcg-del).
To run:
    python remove_t2_in_t1.py LAPP LPC
or
    python remove_t2_in_t1.py LAPP LPC dataset1 dataset2
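The three deletion conditions can be written as a single predicate. The sketch below is not taken from remove_t2_in_t1.py: the FileInfo record and its field names are hypothetical, and the three flags are assumed to have been filled in beforehand from the LFC and DQ2 catalogues. The dataset list option added in v1.4 is left out.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical per-file record, filled from catalogue lookups."""
    name: str
    dataset: str
    replicated_in_t1disk: bool       # a copy exists on the T1DISK of the cloud
    dataset_complete_at_site: bool   # the dataset is complete at the T2
    dataset_subscribed_to_t2: bool   # the dataset is subscribed to the T2


def is_deletable(f):
    """Return True only if all three conditions from the slide hold."""
    return (
        f.replicated_in_t1disk
        and not f.dataset_complete_at_site
        and not f.dataset_subscribed_to_t2
    )


if __name__ == "__main__":
    candidate = FileInfo("file1", "some_dataset_tid", True, False, False)
    print(candidate.name, is_deletable(candidate))
```

If subscription information is temporarily missing, as during the migration to 0.3, the last flag is False for every dataset and the predicate selects far more files than intended. This is the situation the caution above warns about.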

More scripts
Framework in preparation