Operational Experience with CMS Tier-2 Sites I. González Caballero (Universidad de Oviedo) for the CMS Collaboration

Some relevant aspects of the CMS Computing Model
- Data driven:
  - Move big blocks of data in a more or less controlled way
  - Jobs are sent to the data, not vice versa
  - Tools to handle the data and to find where it is become very important
- Distributed:
  - Extensive use of GRID technology
  - Profits from the two most widely used GRID infrastructures: OSG and EGEE
- Hierarchical:
  - The Tier-0 serves data to Tier-1s, which serve data to Tier-2s, which serve data to Tier-3s
  - Different workflows occur in different tiers
  - Different degrees of service and commitment are expected from different tiers
- Some figures for CMS:
  - Event size (MB): RAW: 1 - RECO: - AOD: 0.1
  - CPU required (SI2k/event): Sim.: 90 - Rec.: 25 - Analysis: 0.25
  - T0 + T1 + T2 totals: CPU (MSI2k): 60 / 90 - Disk (PB): 15 / 25 - Tape (PB): 25 / 40
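As a quick sanity check on these figures (a back-of-the-envelope calculation, not from the slide): storing the AOD of the roughly 2 billion simulated events quoted later in the talk, at 0.1 MB/event, amounts to about 200 TB.

```python
# Back-of-the-envelope check using the figures on this slide: disk volume of
# an AOD sample of 2e9 events (the yearly MC production quoted later in the
# talk) at 0.1 MB per event. Decimal units (1 TB = 1e6 MB) are assumed.
n_events = 2e9
aod_size_mb = 0.1
print(f"{n_events * aod_size_mb / 1e6:.0f} TB")  # ~200 TB
```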

CMS Computing Model
[Diagram: data flow from the detector through the event filter (selection & reconstruction) to raw data, event reconstruction and reprocessing, event simulation, event summary data and analysis objects (extracted by physics topic), down to batch and interactive physics analysis; the Tier-2 role is highlighted.]

CMS Computing Model: Tier-2 tasks
- Tier-2s account for 1/3 of the total CMS resources: more than 40 sites in 22 countries
- They are expected to provide resources for:
  - Production of all the simulation the collaboration needs
  - User data analysis
- MC production is a centrally controlled activity and requires:
  - A GRID environment
  - A working Storage Element that understands SRM
  - The ability to transfer data to Tier-1s
  - CMS software (CMSSW) installed at the site
- Data analysis is a user-driven activity, hence bursty, and requires:
  - A GRID environment
  - CMS software (CMSSW)
  - A working Storage Element that understands SRM, with enough space to host the needed datasets
  - The ability to transfer data from Tier-1s

CMS Tier-2 requirements
A CMS Tier-2 needs the following GRID infrastructure:
- A GRID computing cluster: OSG or EGEE
- A storage cluster: CASTOR, dCache, DPM, GPFS… with an SRMv2 frontend such as StoRM
- GRID interfaces to both clusters
- Local monitoring tools: batch, storage, accounting, …
Plus the following CMS services:
- PhEDEx, to manage data transfers:
  - Connects sites through SRMv2
  - The FTS service at Tier-1s is used to schedule transfers
  - Runs on a dedicated mid-size machine
- FroNTier: a Squid proxy to cache alignment and calibration constants locally
  - One small machine for every 800 slots
Besides, a site may operate some other services:
- A login facility for local users: User Interfaces, interactive access to locally stored data, …
- Optional local GRID and CMS services to improve the local user experience: local WMS, CRAB Server, local Data Bookkeeping Service (DBS), …
Related talk by R. Egeland: PhEDEx Data Service (Thur, 16:30)

CMS Data Handling: Transfers at CMS Tier-2s
- The CMS model is very dependent on an efficient data transfer system
- CMS has a very flexible transfer topology:
  - Any Tier-2 downloads data from, and uploads data to, any Tier-1
  - Tier-2 to Tier-2 transfers are also allowed (though not encouraged); they are interesting for Tier-2s associated with the same physics groups
- This adds complexity to the operation of the Tier-2 network:
  - Multiple SRM connections must be managed by the sites
  - The different latencies make optimization difficult
  - Operators are geographically spread over different time zones, which makes communication harder
- A full metric to commission links (both up and down) has been developed:
  - Based on expected data bandwidths and data transfer quality
  - Prevents sites with problems from overloading well-performing sites
  - Only commissioned links may be used to transfer CMS data
[Diagram: transfer topology linking the T0 (CERN), the Tier-1s (ASGC, FZK, CNAF, FNAL, PIC, …) and the Tier-2s, e.g. T2 (ES).]
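The commissioning metric itself is not spelled out on the slide; purely as an illustration of the kind of check it implies, the sketch below gates a link on achieved rate and transfer quality. The threshold values (rate, success fraction, number of days) are assumptions, not the official CMS criteria.

```python
# Illustrative sketch only: a link-commissioning style check.
# All threshold values are invented; the real CMS metric was based on
# expected data bandwidths and transfer quality over a test period.
from dataclasses import dataclass

@dataclass
class LinkTest:
    avg_rate_MB_s: float     # average transfer rate achieved on the link
    success_fraction: float  # fraction of transfer attempts that succeeded
    days_passing: int        # consecutive days meeting the targets

def link_commissioned(test: LinkTest,
                      min_rate_MB_s: float = 20.0,   # hypothetical target
                      min_success: float = 0.8,      # hypothetical target
                      required_days: int = 3) -> bool:
    """Return True if the link may be used for CMS data transfers."""
    return (test.avg_rate_MB_s >= min_rate_MB_s
            and test.success_fraction >= min_success
            and test.days_passing >= required_days)

# A downlink sustaining 35 MB/s with 92% successful transfers for 4 days passes
print(link_commissioned(LinkTest(35.0, 0.92, 4)))  # True
```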

CMS Data Handling: Commissioning links at Tier-2
- CMS Facility Operations has put a big effort into increasing the number of active links
- The downlink mesh is almost full
- Around 50% of the uplinks have been commissioned; at least two uplinks are mandatory for every Tier-2
- The Debugging Data Transfers effort, still ongoing, is helping Tier-2s fill the mesh
- Work is also ongoing to reduce dataset transfer latencies, so that data can be used sooner at the sites
- For more details see the poster by J. Letts: Debugging Data Transfers in CMS (Thur - 024)

CMS Data Handling: Transfers to and from Tier-2
- PhEDEx takes care of the transfers using a subscription mechanism:
  - Transfers use SRMv2, scheduled with FTS
  - A set of agents take care of the different activities needed: download, upload, data consistency checks, etc.
  - PhEDEx also provides data validation and monitoring tools
- The Tier-2s need to set up a UI machine:
  - PhEDEx software is centrally distributed through apt-get
  - Local operators need to configure the agents; tuning them is not always trivial
  - Lots of documentation and examples are available and public
- An XML file (the Trivial File Catalog) takes care of converting LFN ↔ PFN
- Transfers to CMS Tier-2s last year: 14,035 TB
- Transfers from CMS Tier-2s last year: 4,787 TB
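The Trivial File Catalog mentioned above is a small XML file of regular-expression rules kept in the site configuration. The sketch below shows, in Python, how one such rule turns an LFN under /store/ into a site-local PFN; the element and attribute names follow the catalog format, but the SRM endpoint, storage path and file name are invented for illustration.

```python
# Minimal sketch of how a Trivial File Catalog rule maps an LFN to a PFN.
# The XML follows the storage.xml format, but the host, path and file name
# are hypothetical.
import re
import xml.etree.ElementTree as ET

TFC_XML = """
<storage-mapping>
  <lfn-to-pfn protocol="srmv2"
              path-match="/+store/(.*)"
              result="srm://srm.example-t2.org:8443/srm/managerv2?SFN=/dcache/cms/store/$1"/>
</storage-mapping>
"""

def lfn_to_pfn(lfn, protocol="srmv2"):
    """Apply the first matching lfn-to-pfn rule for the requested protocol."""
    for rule in ET.fromstring(TFC_XML).findall("lfn-to-pfn"):
        if rule.get("protocol") != protocol:
            continue
        match = re.match(rule.get("path-match"), lfn)
        if match:
            # TFC results refer to captured groups as $1, $2, ...
            return re.sub(r"\$(\d+)",
                          lambda m: match.group(int(m.group(1))),
                          rule.get("result"))
    raise ValueError("no TFC rule matches %s" % lfn)

print(lfn_to_pfn("/store/mc/Summer09/MinBias/GEN-SIM-RECO/EXAMPLE.root"))
```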

CMS Data Handling: Tier-2 storage distribution
- MC Space (20 TB): for MC samples produced at the site, before they are transferred to the Tier-1s
- Central Space (30 TB): intended for RECO samples of Primary Datasets
- Physics Group Space: assigned to 1-3 physics groups; the space is allocated by the physics data manager
- Local Storage Space (30-60 TB): intended to benefit the geographically associated community
- User Space (0.5-1 TB per user): each CMS user is associated to a CMS Tier-2 site; big outputs from user jobs can be staged out to this area
- Temporary Space (< 1 TB)
- For more details see the poster by T. Kress: CMS Tier-2 Resource Management (Mon)

CMS Data Handling: Selecting the data at the Tier-2
- Central Space: datasets are managed by the Data Operations team, which subscribes the assigned samples
- Physics Group Space: PAGs and DPGs usually appoint one or two persons responsible for subscribing data at their "associated" Tier-2s
- PhEDEx keeps track of the "ownership" of each dataset for these two disk areas, making it easy to follow the correct use of data at the sites
- MC Space is filled by the production jobs; data is requested for deletion as soon as it is transferred to Tier-1s
- Any CMS user can request a dataset to be placed on the Local Storage Space at any Tier-2
- Sites are free to manage the use of the User Space the way they prefer: quotas, mail, etc.; users are usually close to the Tier-2
- CMS created the role of Data Manager at each site, with special rights:
  - Reviews every single transfer or deletion request and approves or denies it
  - Makes sure the data is in accordance with the site commitments to the Physics and Detector Groups
  - Makes sure there is enough space at the local Storage Element to store the data
  - At big sites this can be quite a time-consuming activity

Computing at CMS Tier-2s
- Software installation is centrally managed by CMS:
  - The VO sgm role is used and is expected to have the highest priority on the queues
  - Due to some limitations of rpm under SLC4, CMSSW installation needs a 64-bit node
  - The installation of old CMSSW releases needs large amounts of memory on the installation node; improvements in newer releases reduce this requirement to O(100 MB)
  - The CMSSW installation procedure needs write access for all software managers: map all sgm grid logins to a single account
  - The installation area has to be shared among all Worker Nodes
- Data access from the WNs:
  - CMSSW understands the Trivial File Catalog, so it is used to convert LFNs to PFNs
  - POSIX/dCache/RFIO protocols are supported
- Production workflow:
  - A nominal Tier-2 is expected to reserve half of its CPUs for MC production
  - Managed through the VO production role
- GRID access for local users:
  - A User Interface needs to be set up, with CRAB manually installed on it
  - CRAB is really easy to install using a tar file and an automatic configuration script
- Related talk by D. Spiga: Automatization of User Analysis Workflow in CMS (Thur, 17:10)
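To illustrate the point about CMSSW and the Trivial File Catalog: an analysis job configuration refers to data only by logical file name, and the site catalog resolves it to a local physical path at run time. The sketch below is a minimal assumed example (it needs a CMSSW environment to run, and the dataset path is hypothetical), not a configuration taken from the talk.

```python
# Minimal CMSSW configuration sketch (requires a CMSSW environment).
# The job lists only LFNs under /store/; the site's Trivial File Catalog
# turns them into local PFNs at run time. The file path is hypothetical.
import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        "/store/mc/Summer09/MinBias/GEN-SIM-RECO/0000/EXAMPLE.root"
    )
)
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))
```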

Operating CMS Tier-2s: Central Aspects
- Operating the more than 40 CMS Tier-2 sites is a complex task:
  - Geographically spread around the globe, in different time zones
  - With a wide variety of sizes, technologies, bandwidths, …
- Good means of communicating important news, configuration changes, requirements and problems are important:
  - A dedicated HyperNews forum for Tier-2s, which at least one local operator at every site needs to follow
  - A Savannah squad per site has been created; each problem found at a site is assigned to that squad
- A new metric, Site Readiness, has been developed to establish a site's capability to contribute efficiently to CMS:
  - Based on the number of commissioned links, fake analysis jobs (JobRobot) and Site Availability Monitoring (SAM) tests
  - Sites are classified as READY, NOT-READY or WARNING (in danger of becoming NOT-READY)
- See the poster by J. Flix (Thur 040): The commissioning of CMS sites: improving the site reliability
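The slide names the Site Readiness inputs but not the exact thresholds; the sketch below is therefore only an illustration of such a classification, with invented cut-off values (the 0.90/0.80 cuts and the link counts are assumptions, not the official CMS definition).

```python
# Illustrative sketch of a Site Readiness style classification.
# Inputs follow the slide (commissioned links, JobRobot results, SAM tests);
# every threshold value here is invented for illustration.
def site_readiness(commissioned_uplinks: int,
                   commissioned_downlinks: int,
                   jobrobot_efficiency: float,   # fraction of fake analysis jobs succeeding
                   sam_availability: float) -> str:  # fraction of SAM tests passed
    good = (commissioned_uplinks >= 2          # at least two uplinks are mandatory
            and commissioned_downlinks >= 1
            and jobrobot_efficiency >= 0.90    # hypothetical cut
            and sam_availability >= 0.90)      # hypothetical cut
    borderline = (jobrobot_efficiency >= 0.80
                  and sam_availability >= 0.80)
    if good:
        return "READY"
    if borderline:
        return "WARNING"      # in danger of becoming NOT-READY
    return "NOT-READY"

print(site_readiness(commissioned_uplinks=3, commissioned_downlinks=7,
                     jobrobot_efficiency=0.95, sam_availability=0.92))  # READY
```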

Monitoring CMS Tier-2s
- Workflows can be monitored through the CMS Dashboard; almost any aspect of analysis and production jobs can be checked:
  - Successful/cancelled/aborted jobs
  - By user, by site, by application, by dataset, by CE, …
  - By GRID or application error code
- All aspects of data handling can be monitored through the wide variety of PhEDEx web server plots and tables:
  - Transfer rates and volumes
  - Quality of the transfers
  - Errors detected and the reasons for those errors
  - Latencies, routing details, …
- SAM tests and Site Readiness offer their own set of tools, integrated in the Dashboard
- Many tools have been developed to monitor the different aspects of a Tier-2 from the CMS point of view, for both local and central operators

CMS Tier-2 Workflows: Production
- Production uses a special tool developed by CMS: ProdAgent
  - Completely centralized: no local operator intervention is needed
  - Data is produced at Tier-2s and automatically uploaded to Tier-1s
- More than 2 billion events were produced during the last 12 months
- See the poster by F. Van Lingen (Tue 014): CMS production and processing system - Design and experiences
[Plot: cumulative number of produced events (x 10^9) over the last 12 months.]

CMS Tier-2 Workflows: User Analysis
- More than 7.5 million user analysis jobs were executed at Tier-2s:
  - On produced MC samples
  - And on real data: cosmics recorded with full and with no magnetic field
[Pie chart: 39.6% / 60.4% breakdown of the analysis jobs.]

Future plans…
- The main goal in the near future is to completely integrate all the CMS Tier-2s into CMS computing operations:
  - Use dedicated task forces to help sites meet the Site Readiness metrics
  - Improve the availability and reliability of the sites to further increase the efficiency of both analysis and production activities
- Complete the data transfer mesh by commissioning the missing links, especially Tier-2 → Tier-1 links, and by continuously checking the already commissioned ones
- Improve the deployment of CMS software, loosening the requisites at the sites
- Install CRAB Servers at more sites:
  - The CRAB Server takes care of some routine user interactions with the GRID, improving the user experience
  - It improves the accounting and helps spot problems and bugs in CMS software
  - A new powerful machine and special software need to be installed by local operators
- CMS is building the tools to allow users to share their data with other users or groups; this will impact the way data is handled at the sites

Conclusions
- Tier-2 sites play a very important role in the CMS Computing Model: they are expected to provide one third of the CMS computing resources
- CMS Tier-2 sites handle a mix of centrally controlled activity (MC production) and chaotic workflows (user analysis); CPU shares need to be set appropriately to ensure enough resources are given to each workflow
- CMS has built the tools to facilitate the day-to-day handling of data at the sites:
  - The PhEDEx servers located at every site help transfer data in an unattended way
  - A Data Manager appointed at every site links CMS central data operations with the local management
- CMS has established metrics to validate the availability and readiness of the Tier-2s to contribute efficiently to the collaboration's computing needs, by verifying their ability to transfer and analyze data
- A large number of monitoring tools have been developed by CMS to watch every aspect of a Tier-2, in order to better identify and correct the problems that may appear
- CMS Tier-2s have proved to be well prepared for massive MC production, dynamic data transfer and efficient data serving to local GRID clusters
- CMS Tier-2s have proved able to provide our physicists with the infrastructure and the computing power to perform their analyses efficiently
- CMS Tier-2s have a crucial role to play in the experiment in the coming years, and are already well prepared for the LHC collisions and CMS data taking

The End ¡Thank you very much!

DRAFT Abstract: In the CMS computing model, about one third of the computing resources are located at Tier-2 sites, which are distributed across the countries in the collaboration. These sites are the primary platform for user analyses; they host datasets that are created at Tier-1 sites, and users from all CMS institutes submit analysis jobs that run on those data through grid interfaces. They are also the primary resource for the production of large simulation samples for general use in the experiment. As a result, Tier-2 sites have an interesting mix of organized experiment-controlled activities and chaotic user-controlled activities. CMS currently operates about 40 Tier-2 sites in 22 countries, making the sites a far-flung computational and social network. We describe our operational experience with the sites, touching on our achievements, the lessons learned, and the challenges for the future.

CMS Tier-2 Workflows: User Analysis
- Part of the analysis jobs ran on real data: cosmics at full and at no magnetic field
[Bar chart: jobs run at each Tier-2 during the last 12 months, with the OK/ERR breakdown per site.]