Availability of ALICE Grid resources in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week.

Slides:

Advertisements

Similar presentations

During the last three years, ALICE has used AliEn continuously. All the activities needed by the experiment (Monte Carlo productions, raw data registration,

Advertisements

Status GridKa & ALICE T2 in Germany Kilian Schwarz GSI Darmstadt.

1 User Analysis Workgroup Update  All four experiments gave input by mid December  ALICE by document and links  Very independent.

GSIAF "CAF" experience at GSI Kilian Schwarz. GSIAF Present status Present status installation and configuration installation and configuration usage.

ALICE Operations short summary LHCC Referees meeting June 12, 2012.

ALICE Operations short summary and directions in 2012 WLCG workshop May 19-20, 2012.

October 24, 2000Milestones, Funding of USCMS S&C Matthias Kasemann1 US CMS Software and Computing Milestones and Funding Profiles Matthias Kasemann Fermilab.

Computing for ILC experiment Computing Research Center, KEK Hiroyuki Matsunaga.

3rd June 2004 CDF Grid SAM:Metadata and Middleware Components Mòrag Burgon-Lyon University of Glasgow.

CY2003 Computer Systems Lecture 09 Memory Management.

Status of the production and news about Nagios ALICE TF Meeting 22/07/2010.

Grid Lab About the need of 3 Tier storage 5/22/121CHEP 2012, The need of 3 Tier storage Dmitri Ozerov Patrick Fuhrmann CHEP 2012, NYC, May 22, 2012 Grid.

Panda Grid Status Kilian Schwarz, GSI on behalf of PANDA GRID Group (slides to a large extend from Radoslaw Karabowicz)

WLCG GDB, CERN, 10th December 2008 Latchezar Betev (ALICE-Offline) and Patricia Méndez Lorenzo (WLCG-IT/GS) 1.

Status Report of WLCG Tier-1 candidate for KISTI-GSDC Sang-Un Ahn, for the GSDC Tier-1 Team GSDC Tier-1 Team 12 th CERN-Korea.

EGEE-III INFSO-RI Enabling Grids for E-sciencE Overview of STEP09 monitoring issues Julia Andreeva, IT/GS STEP09 Postmortem.

DDM Monitoring David Cameron Pedro Salgado Ricardo Rocha.

Status of PDC’07 and user analysis issues (from admin point of view) L. Betev August 28, 2007.

V.Ilyin, V.Gavrilov, O.Kodolova, V.Korenkov, E.Tikhonenko Meeting of Russia-CERN JWG on LHC computing CERN, March 14, 2007 RDMS CMS Computing.

INFSO-RI Enabling Grids for E-sciencE Enabling Grids for E-sciencE Pre-GDB Storage Classes summary of discussions Flavia Donno Pre-GDB.

1 Andrea Sciabà CERN Critical Services and Monitoring - CMS Andrea Sciabà WLCG Service Reliability Workshop 26 – 30 November, 2007.

ALICE Operations short summary ALICE Offline week June 15, 2012.

Jan 2010 OSG Update Grid Deployment Board, Feb 10 th 2010 Now having daily attendance at the WLCG daily operations meeting. Helping in ensuring tickets.

The CMS Top 5 Issues/Concerns wrt. WLCG services WLCG-MB April 3, 2007 Matthias Kasemann CERN/DESY.

AliEn central services Costin Grigoras. Hardware overview  27 machines  Mix of SLC4, SLC5, Ubuntu 8.04, 8.10, 9.04  100 cores  20 KVA UPSs  2 * 1Gbps.

LCG WLCG Accounting: Update, Issues, and Plans John Gordon RAL Management Board, 19 December 2006.

Participation of JINR in CERN- INTAS project ( ) Korenkov V., Mitcin V., Nikonov E., Oleynik D., Pose V., Tikhonenko E. 19 march 2004.

Predrag Buncic CERN ALICE Status Report LHCC Referee Meeting 01/12/2015.

03/09/2007http://pcalimonitor.cern.ch/1 Monitoring in ALICE Costin Grigoras 03/09/2007 WLCG Meeting, CHEP.

Distributed Analysis Tutorial Dietrich Liko. Overview  Three grid flavors in ATLAS EGEE OSG Nordugrid  Distributed Analysis Activities GANGA/LCG PANDA/OSG.

Data transfers and storage Kilian Schwarz GSI. GSI – current storage capacities vobox LCG RB/CE GSI batchfarm: ALICE cluster (67 nodes/480 cores for batch.

CMS: T1 Disk/Tape separation Nicolò Magini, CERN IT/SDC Oliver Gutsche, FNAL November 11 th 2013.

ALICE Grid operations +some specific for T2s US-ALICE Grid operations review 7 March 2014 Latchezar Betev 1.

The Grid Storage System Deployment Working Group 6 th February 2007 Flavia Donno IT/GD, CERN.

WLCG Operations Coordination report Maria Alandes, Andrea Sciabà IT-SDC On behalf of the WLCG Operations Coordination team GDB 9 th April 2014.

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE Operations: Evolution of the Role of.

Meeting with University of Malta| CERN, May 18, 2015 | Predrag Buncic ALICE Computing in Run 2+ P. Buncic 1.

Acronyms GAS - Grid Acronym Soup, LCG - LHC Computing Project EGEE - Enabling Grids for E-sciencE.

Pledged and delivered resources to ALICE Grid computing in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week.

INFN-Grid WS, Bari, 2004/10/15 Andrea Caltroni, INFN-Padova Marco Verlato, INFN-Padova Andrea Ferraro, INFN-CNAF Bologna EGEE User Support Report.

Grid Operations in Germany T1-T2 workshop 2015 Torino, Italy Kilian Schwarz WooJin Park Christopher Jung.

Grid Operations in Germany T1-T2 workshop 2016 Bergen, Norway Kilian Schwarz Sören Fleischer Raffaele Grosso Christopher Jung.

The ALICE Christmas Production L. Betev, S. Lemaitre, M. Litmaath, P. Mendez, E. Roche WLCG LCG Meeting 14th January 2009.

Alice Operations In France

Kilian Schwarz ALICE Computing Meeting GSI, October 7, 2009

WLCG IPv6 deployment strategy

WLCG Tier-2 Asia Workshop TIFR, Mumbai 1-3 December 2006

Computing Operations Roadmap

Summary on PPS-pilot activity on CREAM CE

Database Services at CERN Status Update

LHC Computing Grid Status of Resources Financial Plan and Sue Foffano

Report PROOF session ALICE Offline FAIR Grid Workshop #1

ALICE FAIR Meeting KVI, 2010 Kilian Schwarz GSI.

Brief overview on GridICE and Ticketing System

GSIAF & Anar Manafov, Victor Penso, Carsten Preuss, and Kilian Schwarz, GSI Darmstadt, ALICE Offline week, v. 0.8.

Update on Plan for KISTI-GSDC

The CREAM CE: When can the LCG-CE be replaced?

Luca dell’Agnello INFN-CNAF

Readiness of ATLAS Computing - A personal view

Update on SHA-2 and RFC proxy support

RDIG for ALICE today and in future

GSIAF "CAF" experience at GSI

Disk capacities in 2017 and 2018 ALICE Offline week 12/11/2017.

Simulation use cases for T2 in ALICE

AliEn central services (structure and operation)

Bernd Panzer-Steindel CERN/IT

Universita’ di Torino and INFN – Torino

Monitoring of the infrastructure from the VO perspective

Installation/Configuration

The LHCb Computing Data Challenge DC06

Presentation transcript:

Availability of ALICE Grid resources in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week

Overview installed resources for ALICE Grid computing in Germany (especially GridKa) are shown and how they compare to the pledged resources as well as the resources being actually used by Grid jobs. CPU: the pledged resources are not fully used. An analysis within this talk discusses the reasons. possible methods to improve the situation summary and conclusions

ALICE resources at GridKa GridKa pledged % of the requested ALICE T1 resources. The plan was to do the same for But since the globally requested T1 CPU resources reduced significantly compared to the original request GridKa will go in 2010 for 25% disk and tape, CPU will be ca. 30% of the requested ALICE T1 resources this way the increase in absolute numbers will stay the same as CPUs have already been ordered GridKa resources for all 8 experiments & ALICE share

resources at GridKa (CPU)  all requested resources are installed and ready to be used !!! nominal ALICE share: 38% of CPUs

GridKa CPU Share  If CPUs are not requested by ALICE they can be used easily by other experiments statistics September 2009

storage at GridKa new SE to follow: ALICE::FZK::TAPE full xrootd based storage solution in the sense of WLCG (SRM interface, gridftp interface, tape backend, many more features) The 1.5 PB disk space installed at GridKa are to a large extend empty !!! Tapes are basically not used at all !!! If the storage devices are not used the disks stay empty. They can not be used by other experiments easily.

resources at GSI CPUs: installed Disk: installed

analysis of consumed CPU resources by ALICE Grid jobs at German sites the following 3 slides show 3 run periods of 1 month each within the time span July to October - issues at GridKa are shown in the plots

German sites GridKa WMS GridKa CREAM GSI Münster no production due to AliRoot issue ALICE IS firewall issue Switch of hostcert from FZK to KIT ALICE total central Task Queue July 23 – August 23

WMS CREAM connectionWMS Software dir Job profile Production pause CREAM DB Software dir readonly # job dirs too high (CREAM) ALICE total central TQ August 22 – September 21

WMS BDII no production planned production restart no production due to blocking AliRoot problems German sites ALICE total central TQ September 22 – October 22

done job statistics GridKa: 13% GridKa WMS GridKa CREAM

pledged resources

analysis GridKa various issues (MW, GridKa, ALICE) – during first run period issues were more GridKa related, in second run period issues were equally distributed, in last run period the issues are more ALICE related in the last run period almost no region did well compared to pledged resources due to missing Grid jobs big oscillations in job pattern also due to complicated interplay between AliEn and LCG services GSI runs more stable before July 27 and end of September: no official statement but number of Jobs waiting in TQ on lower end GridKa belongs with up to 15% to the largest producers of DONE jobs in terms of delivered CPUs usually Germany belongs to the largest groups ALICE is not able to fully use the German pledged resources, though, with the largest deficit being at GridKa

Suggestions how to improve things GridKa related issues More manpower Better communication Participation in related meetings Better monitoring contineous testing via small productions and corresponding analysis jobs In terms of resources: GSI could pledge more resources to the Grid, resource distribution among German Grid sites can be done in a flexible way to ensure that ALICE Grid gets what it needs Middleware related issues Better monitoring (e.g. at CERN exists a technique to monitor overladed WMS) Also here participation in related meetings ALICE related issues Production free time should be reduced and better communicated if no central production jobs then user jobs should fill the gap resource monitoring should be folded with availability of Grid jobs (e.g. jobs matching with site)

current situation actually, a lot of things suggested (see slide before) have already been implemented as soon as production restarts we will see if things improved

AOB missing tape space GridKa pledged a significant part of the requested T1 tape capacity we are investigating if more tape space can be provided on short notice

summary and conclusion ALICE is currently not able to use the pledged resources of German Grid sites to a full extend reasons for this are many folded (site problems, Middleware problems, missing production jobs) site problems could be solved with higher response time MW problems could be monitored better (e.g. overloaded WMS following a procedure existing at CERN) but if production jobs are missing the sites can not fulfill the request for pledged resources  the delivered CPUs have to be folded with the number of existing jobs (e.g. "jobs match the site resources") from the hardware point of view all resources are installed and available