Nicola De Filippis, CMS Italia, Napoli, 13-14 Feb. 2007. MC production at the CMS Tiers in 2007: CMS-wide prospects and the Italian contribution (original title: Produzioni MC ai Tiers CMS nel 2007: prospettive CMS-wide e contributo italiano)

Presentation transcript:

MC production at the CMS Tiers in 2007: CMS-wide prospects and the Italian contribution
N. De Filippis
M. Abbrescia, G. Cuscela, G. Donvito, G. Maggi, S. My, A. Pierro, A. Pompili (Università, Politecnico and INFN Bari), with contributions from the developers (Kavka, Fanfani, Codispoti, Bacchi)

Outline
- Status of CMS Monte Carlo production: organization and current requests
- Monte Carlo production in Italy:
  - Post-CSA06 activity
  - Problems with sites
  - Efficiency of Italian sites
  - Reliability of sites
- CMS plans and milestones for 2007

MC production cycle
- Goal of MC production: to produce events for CMSSW validation (simulation/reconstruction) and physics studies
- Small RelVal samples are produced for each new CMSSW release
- PhysVal / HLT groups make requests in the form of cfg files
- Experts provide ProdAgent workflows
- Assignments to production teams are posted on the twiki
- Currently 6 teams: LCG(1,2,3,5,6) and OSG
- Each team has O(10) dedicated T1/T2 sites
- When done, files are merged and injected into PhEDEx
- Too many manual steps and too many extra-production duties (e.g. monitoring and dealing with site availability and stability)
- A lot of pressure from the SDV group (P. Janot) to produce events ASAP

Current official requests (P. Kreuzer)
- After CSA06: CMSSW_1_1_1 and 1_1_2 used until Xmas
- CMSSW_1_2_0 released mid-Dec06
- Production with CMSSW_1_2_0 running continuously since Dec06
- PhysVal requests (10M w/o PU + … M w PU)
- HLT requests (100M w/o PU + 20M w PU x 2)
- HLT + PU in 2 steps: GEN-SIM / DIGI-RECO
About 20M done, many running, but very tight schedule!
Some samples:
  - QCD di-jets (0 < pt-bin < 3.5 TeV), w & w/o PU
  - Excl. W & Z decays, W+jets (0 < pt < 1 TeV), w & w/o PU
  - Inclusive ttbar, …

PhysVal samples with CMSSW_1_2_0: LCG(3)

HLT samples with CMSSW_1_2_0: LCG(3)
Once the 120 bulk production is over, a few «special» requests will be addressed:
  - Muon-enriched sample with 121: a few hundred thousand events
  - Cosmics for Tracker with 122: … M events

Ongoing effort of OSG and LCG(1,2,5,6)
Conclusions of P. Kreuzer:
- With 2 new and efficient production teams on board, the remaining 120 assignments should be delivered (at least partially) within 10 days.

MC production in Italy

Post-CSA06 activity (1)
- Official CSA06 note complete
- Internal CMS note on CSA06 in the Italian tiers complete
- CSA06 analyses completed

Post-CSA06 activity (2)
Since October 2006 the LCG(3) team has:
- restarted Monte Carlo production and run it without stops, including during the Xmas break
- increased the number of experts able to run ProdAgent
- exported the monitoring tool developed at Bari to the other LCG teams
- produced about 15M events for physics validation and HLT studies, with and without PU: 1/3 of the entire CMS production
- used the European LCG resources continuously, giving extensive feedback that helped resolve problems at remote sites

Sites used by the LCG(3) team
- CERN (used intensively before and after Xmas)
- Italian sites
- English sites
- Hungary
- Taiwan
- IN2P3

Ongoing effort of LCG(3)
Ongoing GEN-SIM and DIGI-RECO with low-luminosity pileup

Issues about ProdAgent
- Production setup at Bari:
  - 3 instances of PA running at Bari:
    - two for FEVT and GEN-SIM production
    - one for DIGI-RECO production with PU
  - one machine for the on-line dump of the DBs
- Monitoring tool exported to other LCG teams with positive feedback.
- Job submission is slow (at most 2-3 jobs/min) due to:
  - the performance of the PA machines, which are two years old
  - the overhead of the RBs
  - no bulk submission
- Controlling jobs that failed or aborted because of middleware problems is difficult. Killing the jobs of a given production, or those submitted to a given site, was problematic; the PA developers provided a script to do this.
- LCG(3) will smoothly hand over the English CEs to LCG(6) (the English team) and IN2P3 to LCG(5) (the Belgian team) as far as debugging and intensive use are concerned.
- In the long run: bulk submission and Resource Monitor
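To give a feeling for what a submission rate of 2-3 jobs/min means in practice, the following minimal sketch estimates how long a batch of jobs takes to submit without bulk submission. The batch size of 1000 jobs and the function name are assumptions made for illustration; only the rate range comes from the slide.

```python
def submission_hours(n_jobs: int, jobs_per_minute: float) -> float:
    """Hours needed to submit n_jobs one by one at a constant rate."""
    if jobs_per_minute <= 0:
        raise ValueError("jobs_per_minute must be positive")
    return n_jobs / jobs_per_minute / 60.0


# Hypothetical batch size, not taken from the slides.
N_JOBS = 1000

# Rate range quoted on the slide: 2-3 jobs/min.
for rate in (2.0, 3.0):
    hours = submission_hours(N_JOBS, rate)
    print(f"{N_JOBS} jobs at {rate:.0f} jobs/min: {hours:.1f} h")
```

Under these assumptions a single batch keeps a ProdAgent instance busy submitting for several hours, which is why bulk submission is listed as the long-term fix.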

Problems with sites
- Most LCG(3) sites had various problems before and during the Xmas break
  - November: Bari, Pisa, Roma when restarting production; CNAF: problems with Castor
  - English sites and IN2P3 had alternating periods of activity, also during the last month. Italian sites were really efficient during the last month.
- Debugging sites is typically painful and requires continuous interaction with the site administrators.
- Problems:
  - stage-out was the main cause of job failures
  - site validation: storage, software tag, software mount points, local copy of PU
  - grid problems: instabilities of the CE because of high load, and overload of the RBs, which caused:
    - RB didn't change the status of jobs («Waiting» status forever)
    - no chance to monitor: FWJobreport and log files lost
    - difficult/tedious for prod. teams to kill jobs via BOSS commands
- Debugging sites is not a task that production teams should have to cover.
- CMS is reacting and preparing centralized tests to ensure the reliability of sites.
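The «Waiting forever» symptom can be spotted from bookkeeping data alone. The sketch below is a hypothetical illustration, not part of ProdAgent or BOSS: the `JobRecord` fields, the site names in the usage example and the 12-hour threshold are all assumptions. It flags jobs whose status has stayed at «Waiting» for longer than the threshold and counts them per site.

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple


class JobRecord(NamedTuple):
    """Hypothetical bookkeeping record for one grid job."""
    job_id: str
    site: str
    status: str
    last_update: datetime  # time of the last status change


def stuck_jobs(jobs: List[JobRecord], now: datetime,
               max_wait: timedelta = timedelta(hours=12)) -> List[JobRecord]:
    """Return jobs still 'Waiting' after more than max_wait (assumed threshold)."""
    return [j for j in jobs
            if j.status == "Waiting" and now - j.last_update > max_wait]


def stuck_by_site(jobs: List[JobRecord], now: datetime,
                  max_wait: timedelta = timedelta(hours=12)) -> Dict[str, int]:
    """Count stuck jobs per site, to show where to look first."""
    counts: Dict[str, int] = {}
    for job in stuck_jobs(jobs, now, max_wait):
        counts[job.site] = counts.get(job.site, 0) + 1
    return counts
```

A list of stuck jobs per site is also the natural input for a kill script of the kind the PA developers provided.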

Efficiency of the Italian sites (last month): CNAF
Plot annotations: No PU; CE replaced
Except for a few days, CNAF worked very well, ensuring high efficiency of the CMS production during the last month.

Statistics of use of CNAF (last month)
CPU hours and percentage of Tier-1 resources used by CMS:

Month-week        | CPU hr | %
… jan - 21 jan    | …      | 33.4%
22 jan - 28 jan   | …      | 19.0%
29 jan - 4 feb    | …      | 24.8%
5 feb - 11 feb    | …      | 22.4%

The percentage of use depends on the fairshare setup at CNAF.
Plot: successful jobs
Queues always full of jobs; CMS at its maximum of use at CNAF.
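As a rough summary of the four weekly figures, the sketch below takes their unweighted mean. This is only an approximation: the CPU-hour values are not preserved in this transcript, so the weeks cannot be weighted by the resources actually available, and the variable names are illustrative.

```python
# Weekly share of CNAF Tier-1 resources used by CMS, in percent,
# as quoted on the slide (four consecutive weeks ending 11 Feb).
weekly_share_percent = [33.4, 19.0, 24.8, 22.4]


def mean_share(shares):
    """Unweighted mean of the weekly shares (an approximation, see text)."""
    if not shares:
        raise ValueError("no weekly values given")
    return sum(shares) / len(shares)


average = mean_share(weekly_share_percent)
print(f"Mean weekly CMS share at CNAF: {average:.1f}%")
print(f"Range: {min(weekly_share_percent):.1f}% to {max(weekly_share_percent):.1f}%")
```

On these numbers CMS used roughly a quarter of the Tier-1 on average, with the weekly share varying by almost a factor of two.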

Efficiency of the Italian sites (last month): INFN
Except for limited storage problems at Bari, Pisa and Rome, all the Italian Tier-2-like sites worked very well during the last month.

Statistics from dashboard

Reliability of sites: tests
Following feedback on the problems found by production operators, CMS is defining centralized tests, to be run at regular intervals, to certify sites for production and analysis. The ideas are:
1) Submit a small processing job for each advertised CMSSW release at a site. This job checks that:
  - the job can be submitted to the site
  - local stage-out can be done
  - a report can be sent back via the grid middleware
  - 10-event Minimum Bias?
  - test Frontier access as well?
2) Following completion of the test job, submit a read-back job, which:
  - verifies job submission
  - checks data access
  - cleans up the file to test the cleanup procedure
3) Check global DBS datasets at the site:
  - check read access to all fileblocks at the site
  - report back bad files and invalidate them in DBS
  - perhaps randomly select a dataset to test every day/week etc.
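The three steps can be read as a simple pass/fail decision per site. The sketch below is a hypothetical illustration of that logic only. It is not the CMS or SAM implementation, and the `SiteReport` fields, the `certify` function and the site name used in the example are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SiteReport:
    """Hypothetical summary of the three test steps for one site."""
    site: str
    # Step 1: result of the small processing job, per advertised CMSSW release.
    processing_ok: Dict[str, bool] = field(default_factory=dict)
    # Step 2: read-back job (submission, data access, cleanup).
    readback_ok: bool = False
    # Step 3: files from global DBS datasets that could not be read.
    bad_files: List[str] = field(default_factory=list)


def certify(report: SiteReport) -> Tuple[bool, List[str]]:
    """Return (certified, reasons); a site passes only if every step passes."""
    reasons: List[str] = []
    if not report.processing_ok:
        reasons.append("no processing test was run")
    for release, ok in sorted(report.processing_ok.items()):
        if not ok:
            reasons.append(f"processing job failed for {release}")
    if not report.readback_ok:
        reasons.append("read-back job failed")
    if report.bad_files:
        reasons.append(f"{len(report.bad_files)} bad file(s) to invalidate in DBS")
    return (not reasons, reasons)


# Example with a made-up site name and the release quoted in this talk.
example = SiteReport("T2_Example", {"CMSSW_1_2_0": True}, readback_ok=True)
print(certify(example))
```

Returning the list of reasons along with the verdict keeps the debugging information with the site administrators rather than with the production teams.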

Reliability of sites: SAM tests
SAM (Service Availability Monitoring)
Hopefully the human resources needed for MC production will decrease, with fewer production teams submitting jobs to any site.

Plans for MC production in 2007

Milestones
- Finalize 120 production (aim for mid-Feb!)
- Expecting small 12x requests (RelVal, muon-enriched HLT, …)
- 130 release (all HLT components) end Feb07
- 130 HLT production in Mar07
- In parallel, Alpgen integration in production
  - Timescale: integrate until Mar07 + test samples, PH prod. Apr-May07
- 140 release (new geo) end Mar07
- 140 physics production Apr-May07 (30M / month)
- 150 release mid-May07 with improved reco algorithms (re-RECO)
- Launch CSA07 with 16x end-July07
The contribution of Italy to these activities, and the manpower, are still to be defined. In addition, CSA07 during the summer could be a real problem.
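The size of the 140 physics production follows directly from the quoted rate. The short sketch below works it out and, for scale, compares it with the roughly 15M events the LCG(3) team reports elsewhere in this talk as produced since October 2006. The constant and function names are illustrative.

```python
# From the milestones: 140 physics production, Apr-May07, 30M events / month.
MONTHLY_TARGET_M = 30.0
PRODUCTION_MONTHS = ("Apr07", "May07")

# From this talk: about 15M events produced by LCG(3) since October 2006.
LCG3_PRODUCED_M = 15.0


def planned_events_m(monthly_target_m: float, months) -> float:
    """Total planned events (in millions) for a flat monthly target."""
    return monthly_target_m * len(months)


total_m = planned_events_m(MONTHLY_TARGET_M, PRODUCTION_MONTHS)
print(f"Planned 140 physics production: {total_m:.0f}M events")
print(f"Ratio to LCG(3) output since October 2006: {total_m / LCG3_PRODUCED_M:.0f}x")
```

At a flat 30M events per month the two-month campaign amounts to 60M events, four times what one team produced over the preceding months.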

Conclusions
- Monte Carlo production by the LCG(3) team has run continuously since the end of CSA06
- About 15M events produced (1/3 of the overall CMS production)
- Italian sites have been working very well during the last month, ensuring high-efficiency production. Warning: keep close attention on the Italian Tiers, mainly CNAF
- Effective interaction between operators and PA developers
- The load on production operators should decrease as soon as the centralized SAM tests run to certify sites for production
- The Italian contribution to the activities in preparation and to CSA07 has to be discussed