Slide 1: Simulation (Monte Carlo) Production for CMS
Stefano Belforte, INFN Trieste
WLCG-Tier2 workshop, June 12, 2006

Slide 2: Outline
What Monte Carlo simulation is
How it is done in CMS
What the Tier2 role is
What services a Tier2 needs to provide
Tutorials later this week will show exactly how to set up and operate those services

Slide 3: Simulation Production
Simulation (Monte Carlo) Production means:
  Generate, simulate, and reconstruct data that look like data coming from the detector, i.e. as realistic as possible, but entirely simulated
Ideally:
  Input is a configuration (a few kB), output a dataset (TB)
  Heavily CPU-bound activity
In practice:
  Simulation, reconstruction, and the addition of multiple interactions (pileup) look more like analysis: data in, executable, data out
  Some steps may be I/O bound
  CMS will try to combine the flow and run CPU-bound jobs (see the sketch below)
Expect a lot of CPU-bound jobs, but do not build a Tier2 thinking only of CPU:
  A Tier2 also needs to run analysis anyhow
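The point about combining the flow can be made concrete with a toy sketch: when the whole generation, simulation, pileup, and reconstruction chain runs inside one job, intermediate products never touch mass storage and the job stays CPU bound. The step names and data structures below are illustrative assumptions, not the CMS framework.

    # Toy sketch: one chained job per chunk of events keeps intermediates local.
    def step(name, event):
        return {**event, name: "done"}          # stand-in for hours of CPU work

    def run_chained(config, n_events):
        """One job runs the full chain; only the final output is stored."""
        output = []
        for i in range(n_events):
            event = {"id": i, "config": config}
            for name in ("generate", "simulate", "add_pileup", "reconstruct"):
                event = step(name, event)       # intermediates stay in memory
            output.append(event)
        return output                           # written out once, at the end

    # a few-kB configuration fans out into a large dataset:
    mc = run_chained({"process": "ttbar", "energy_TeV": 14}, n_events=3)

Splitting each step into its own grid job would instead push every intermediate product through the storage system, which is exactly the I/O-bound pattern the slide warns about.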

Slide 4: Simulation at Tier2
In the CMS Computing Model and Computing TDR, Tier2s are the place where MC production is run:
  Tier2s comply with this requirement by offering a computing service that the CMS MC Operation team exploits to produce simulation datasets according to CMS policy (defined by the CMS physics groups)
  Tier2s do not define autonomously what data need to be produced
There will be "user level MC": small productions that satisfy the needs of a single user and do not go through physics group scheduling
  Could still be managed centrally or run directly by that user
  To a Tier2 it will look the same: a bunch of grid jobs
Tier2s are not handed a list of jobs to run; they are used as service providers
Tier2s are "good grid sites that offer CMS-needed services"

Slide 5: Simulation Workflow
Simulation is a grid-based activity:
  Jobs are submitted centrally
  Expect 2-5 groups of operators to be able to keep the system busy and manage all production
  Local submission is not forbidden, but uneconomic
Automated tools will be used (sketched below):
  Central request queue (ProductionManager)
  Chunks of (thousands of) jobs are handed over to a few ProductionAgents that submit, track, bookkeep, store the output to its final location and register it in the CMS databases
Each ProductionAgent will need a human being (or two):
  Ideally one ProductionAgent could fill the grid
  In practice the humans behind it will need to deal with operational failures on the grid, hence the need for more people and split competences
  E.g. one group for OSG, one for EGEE
Your customers will be remote users
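A minimal sketch of the division of labour between the central request queue and the agents, assuming the interfaces this slide describes; class names, methods, and the chunk size are illustrative, not the actual CMS production tools.

    # Minimal sketch: ProductionManager hands chunks of jobs to ProductionAgents.
    from collections import deque

    CHUNK = 1000  # jobs handed to an agent at a time (illustrative)

    class ProductionManager:
        """Central request queue: physics requests in, chunks of jobs out."""
        def __init__(self):
            self.queue = deque()

        def enqueue(self, dataset, n_events, events_per_job=250):
            n_jobs = (n_events + events_per_job - 1) // events_per_job
            self.queue.extend((dataset, i) for i in range(n_jobs))

        def next_chunk(self):
            n = min(CHUNK, len(self.queue))
            return [self.queue.popleft() for _ in range(n)]

    class ProductionAgent:
        """Submits and tracks one chunk at a time on its grid (OSG or EGEE)."""
        def __init__(self, grid):
            self.grid = grid

        def run(self, manager):
            for dataset, job_id in manager.next_chunk():
                self.submit(dataset, job_id)
            # ...then track jobs, resubmit failures, merge, store, register

        def submit(self, dataset, job_id):
            print(f"[{self.grid}] submitting job {job_id} of {dataset}")

    pm = ProductionManager()
    pm.enqueue("ttbar-summer06", n_events=1_000_000)
    ProductionAgent("EGEE").run(pm)

The site never sees any of this machinery: from the Tier2's point of view, what arrives is simply the chunk of grid jobs the agent submitted.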

Slide 6: Simulation Dataflow
Simulation is an expensive task (many CPU hours): the output is precious
Data will be stored at the CMS Tier1s:
  Backup on tape
  May need to be re-processed with new reconstruction
  May need to be distributed to more Tier2s
  Frees disk while keeping the data available, even years later, from tape
But job output does not go directly from the WN to the Tier1:
  Each WN's output is stored locally
  Small files are merged locally (another set of central jobs); heavily I/O bound! (sketched below)
  A validation step is run locally
  Large files are sent to the Tier1
  Sometimes the output is short-lived: it may stay a week at the Tier2, then get deleted
Full integration in the CMS data management system is needed
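The local merge step is worth a sketch, since it is the heavily I/O-bound part: many small worker-node outputs are grouped into a few large files before transfer. The ~2 GB target and the plain byte concatenation are illustrative assumptions; the real merge runs through the CMS framework so event provenance is preserved.

    # Sketch: group small WN output files into ~TARGET_SIZE batches, merge each.
    import os

    TARGET_SIZE = 2 * 1024**3  # aim for ~2 GB merged files (assumed target)

    def merge_small_files(small_files, merged_prefix):
        batches, batch, size = [], [], 0
        for f in small_files:
            batch.append(f)
            size += os.path.getsize(f)
            if size >= TARGET_SIZE:
                batches.append(batch)
                batch, size = [], 0
        if batch:
            batches.append(batch)
        merged = []
        for i, batch in enumerate(batches):
            out = f"{merged_prefix}_{i:04d}.root"
            # plain concatenation stands in for the framework-level merge
            with open(out, "wb") as dst:
                for f in batch:
                    with open(f, "rb") as src:
                        dst.write(src.read())
            merged.append(out)
        return merged  # these large files are validated, then shipped to the Tier1

Every byte is read and written at least twice on local storage before it ever leaves the site, which is why this step, not the simulation itself, stresses the Tier2's I/O.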

Slide 7: Resource Management
In the CMS Computing Model, Tier2s also support the majority of user analysis:
  Each Tier2 is more or less split: 50% analysis, 50% simulation
Both activities can easily fill whatever resources are there:
  Simulation is planned and can fill queues with weeks' worth of jobs
  Analysis by users is dynamic: "I need this by tomorrow"
Need a way to partition resources (a toy sketch follows)
Grid tools are not so developed here, especially on the EGEE side:
  You will need to help, at least initially
  EGEE PreProduction sites will be asked to set up an initial test configuration this summer; additional volunteers will be welcome
Simulation and analysis coexist with difficulty
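To make "partitioning" concrete, here is a toy fair-share pick: the next job comes from whichever activity is furthest below its target share of recent CPU use. A real site would express this in its batch scheduler's fair-share configuration rather than in a script; the targets and numbers below are assumptions.

    # Toy fair-share: serve the activity furthest below its target share.
    TARGET = {"simulation": 0.5, "analysis": 0.5}

    def pick_next(queues, used_cpu_hours):
        """queues: activity -> pending jobs; used_cpu_hours: recent usage."""
        total = sum(used_cpu_hours.values()) or 1.0
        candidates = [a for a in queues if queues[a]]
        if not candidates:
            return None
        # deficit = how far below its target share the activity currently is
        deficit = lambda a: TARGET[a] - used_cpu_hours[a] / total
        return queues[max(candidates, key=deficit)].pop(0)

    queues = {"simulation": ["sim-1", "sim-2"], "analysis": ["ana-1"]}
    usage = {"simulation": 900.0, "analysis": 100.0}  # simulation hogged the farm
    print(pick_next(queues, usage))  # -> "ana-1": analysis is below its 50%

The key property is that a long simulation backlog cannot starve the dynamic analysis users: as soon as analysis falls below its share, its jobs jump the queue.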

Slide 8: Simulation Operation — Division of Responsibility
The CMS Operation team focuses on global issues:
  The CMS application: configuration, version, results
  Validation of results
  Bookkeeping in the CMS data catalogs
  Ranks sites as yes/no, based on works/does-not-work
  No need to know the details of dCache or DPM
The Tier2 operation team focuses on fabric issues:
  Racks, network switches, file servers
  Cooling, heating, power, repairs
  Operating system, middleware installation, SFT
  Tracking/fixing local failures
  No need to know the details of CMS (e.g. dataset provenance tracking)

Slide 9: Tier2 Required Grid Services
Pass all SiteFunctionalTests
Working Computing Element:
  A good, solid, powerful batch farm
Also provide good storage:
  Size
  Performance
  Services: SRM access, FTS channels (these run on a server at some Tier1, but a transfer is always a two-end responsibility)
Good I/O needed (for those I/O-bound cases)
Good network: 1 Gbit/s capability to move data fast on the WAN while data is being read/written fast locally (back-of-the-envelope numbers below)
A CMS nominal Tier2 (200 CPUs, 200 TB) is not trivial to operate
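A quick back-of-the-envelope check of what the 1 Gbit/s figure buys a nominal Tier2. The event size and CPU time per event below are illustrative assumptions, not numbers from this talk.

    # What a 1 Gbit/s WAN link means for a nominal Tier2 (200 CPUs, 200 TB).
    link_gbps = 1.0
    link_MB_s = link_gbps * 1000 / 8        # ~125 MB/s at line rate
    day_TB = link_MB_s * 86400 / 1e6        # ~10.8 TB/day if fully used
    print(f"{link_MB_s:.0f} MB/s ~ {day_TB:.1f} TB/day")

    cpus = 200
    event_MB = 1.5       # assumed simulated-event size
    event_cpu_s = 100.0  # assumed CPU time per simulated event
    produced_MB_s = cpus * event_MB / event_cpu_s  # ~3 MB/s of fresh MC
    print(f"MC produced: {produced_MB_s:.1f} MB/s")

Under these assumptions the farm produces MC at only a few MB/s, so the link is sized not for MC export alone but for analysis dataset imports running on top of it, while the local storage sustains reads and writes at the same time.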

Slide 10: CMS Services Needed
CMS software distribution available:
  Simple, but critical
  Central installation, but problems must be fixed locally
Integration in CMS Data Management:
  Local (Trivial) File Catalog
  PhEDEx (CMS's own high-level dataset transfer tool; it will work on top of FTS)
  DLS client (tracks in the CMS catalog which data are at the site)
Even simulation jobs will need to access the CMS calibration and conditions database (how to make a realistic simulation otherwise?):
  Need to deploy a Squid cache (example below)
  CMS database calls are mapped in the application onto HTTP calls
  Squid is an open-source, widely used, robust, high-performance, easy-to-install HTTP cache server
Details on CMS services during the tutorials
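The Squid pattern in the last bullet fits in a few lines of client code: the application issues a plain HTTP request for conditions data and routes it through the site's local Squid, so repeated requests from the farm are served from cache. The host names and URL layout here are hypothetical stand-ins, not the real CMS conditions schema.

    # Sketch: conditions access as an HTTP request routed via the local Squid.
    import os
    import urllib.request

    # point jobs at the site's Squid (hypothetical host and port)
    os.environ["http_proxy"] = "http://squid.mytier2.example:3128"

    def get_conditions(tag, run):
        # hypothetical endpoint standing in for the central conditions server
        url = f"http://conditions.cern.example/frontier?tag={tag}&run={run}"
        with urllib.request.urlopen(url) as resp:  # honours http_proxy -> Squid
            return resp.read()

    # the first job pays the WAN round-trip; Squid serves every later job locally
    payload = get_conditions("ECAL_calib_v1", run=2006)

This is why a single cache box is enough to shield the central database from thousands of simultaneous worker-node jobs asking for the same calibration payloads.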

Slide 11: Bringing It All Together
What are the really important things a Tier2 has to do in order to support CMS simulation operation?
The answer is not:
  Have lots of hardware
  Even if that helps
Next slide

Slide 12: What CMS Tier2s Really Need to Do
Sites need to be up and running and to care for themselves
They provide MC capacity as a service to remote operators and must be aware that those operators cannot reach in and fix problems. Users cannot log in on your worker nodes to run ps/dbg/kill!
You have to help (a toy health check follows):
  Fixing the SW area
  Spotting hung jobs, forwarding error logs and traces
  Spotting/removing bad nodes
  Monitoring hardware and removing it before it breaks
A good Tier2 is active, responsive, attentive, proactive:
  Spot problems before the users do
  Fix them before the users have problems
  Study usage patterns, talk to users, understand needs and plan for the future, report problems to CMS Computing
A Tier2 provides services to CMS globally, but also to the local community, and reflects the obligations of the local community towards CMS
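As a flavour of the "spot hung jobs, remove bad nodes" chores, a toy worker-node health check; the thresholds, paths, and drain policy are illustrative assumptions, not a CMS prescription.

    # Toy health check a site admin might run on each worker node.
    import os
    import shutil
    import time

    MAX_SILENT_HOURS = 12  # job output silent this long -> probably hung
    MIN_FREE_GB = 10       # scratch space a job needs to merge its output

    def stale_outputs(scratch_dir):
        """Job output files that have not grown for MAX_SILENT_HOURS."""
        cutoff = time.time() - MAX_SILENT_HOURS * 3600
        return [f for f in os.listdir(scratch_dir)
                if os.path.getmtime(os.path.join(scratch_dir, f)) < cutoff]

    def node_healthy(scratch_dir="/scratch"):
        free_gb = shutil.disk_usage(scratch_dir).free / 1e9
        if free_gb < MIN_FREE_GB:
            return False       # a full scratch disk quietly kills merge jobs
        return not stale_outputs(scratch_dir)

    # an unhealthy node is drained from the batch system before users notice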

Slide 13: Optional Service
Some sites may host a "production operation unit" and so need to run a ProductionAgent
Others may decide to run a ProductionAgent for the convenience of local users
Support will be available for running a ProductionAgent locally:
  See the tutorials

Slide 14: Conclusion
Tier2s are vital to the CMS physics program
Supporting CMS Simulation Production is easy:
  Just be a good grid site with a few easy additional services
But it is also important, and difficult:
  Computers may stay up in unattended mode
  But the quality of services usually does not
  Continuous, proactive, intelligent care is needed
You are the ones who will make your site productive and efficient:
  No matter how fancy the tools CMS develops centrally are
A Tier2 will succeed or fail based on the dedication of the people running it