Resource Broker, gLite etc.: CMS vs. middleware
Stefano Belforte, INFN Trieste
February 14, 2007

Middleware for CMS

Event Data:
- Catalogs are CMS-made: they need to be tailored to the experiment
Non-Event Data:
- Access outside CERN via HTTP + standard web caches (Squid)
Data transfer:
- Middleware provides the storage: SRM v2.2
- Middleware provides the File Transfer Service (FTS)
- CMS moves datasets on top of that: PhEDEx
Running jobs:
- Middleware provides remote job submission: LCG RB, gLite WMS, Condor-G
- CMS embeds that into CMS user workflows: CRAB, CRAB Analysis Server, ProductionAgent
Resource sharing (job priorities and all that):
- In the near future: managed at the sites
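To make the layering concrete, here is a minimal, hypothetical sketch of how a CMS-side tool (CRAB-like) could wrap middleware job submission. It assumes a configured gLite UI; `glite-wms-job-submit` is the real CLI command, while the wrapper function, file names and the `cmsRun_wrapper.sh` script are illustrative only.

```python
# Hypothetical sketch: how a CMS tool (CRAB-like) could wrap gLite submission.
# glite-wms-job-submit is the real UI command; file names and the wrapper
# script are invented for illustration.
import subprocess
import tempfile

JDL_TEMPLATE = """[
  Executable     = "cmsRun_wrapper.sh";
  Arguments      = "{job_index}";
  StdOutput      = "job.out";
  StdError       = "job.err";
  InputSandbox   = {{"cmsRun_wrapper.sh", "crab_config.tar.gz"}};
  OutputSandbox  = {{"job.out", "job.err", "report.xml"}};
  VirtualOrganisation = "cms";
  Requirements   = other.GlueCEStateStatus == "Production";
]"""

def submit_job(job_index: int) -> str:
    """Write a JDL for one analysis job and submit it through the WMS.

    Returns the grid job identifier printed by the UI command."""
    with tempfile.NamedTemporaryFile("w", suffix=".jdl", delete=False) as jdl:
        jdl.write(JDL_TEMPLATE.format(job_index=job_index))
        jdl_path = jdl.name
    # -a: delegate the user proxy automatically for this single submission
    result = subprocess.run(
        ["glite-wms-job-submit", "-a", jdl_path],
        capture_output=True, text=True, check=True,
    )
    # The job identifier (https://<wms>:9000/<id>) appears on its own line
    for line in result.stdout.splitlines():
        if line.startswith("https://"):
            return line.strip()
    raise RuntimeError("no job identifier found in submission output")
```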

Issues for 2007

Data Management: SRM v2
- Site interoperability
- Better control at Tier-1s of disk/tape, pin/unpin
- New FTS, and some changes needed in PhEDEx
Job Priorities: only a configuration/deployment issue
- We have asked for 3 "service classes" at all sites:
  - software manager: express queue
  - production: up to 50% of resources
  - normal users: all the rest, fair-share based; static mapping will help
Job Submission: still a big issue
- The LCG RB is slow (~one job/minute)
- The LCG RB chokes at ~5K jobs/day vs. the 200K/day target for 2008
- gLite WMS: much promised, still not in production after 2 years
- Condor-G: fast and basic (too basic?)
- Will the CE be the next bottleneck? (the submission gap is quantified in the sketch below)
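The gap behind those numbers is easy to quantify. A back-of-envelope calculation using only the figures from this slide:

```python
# Back-of-envelope: how far the LCG RB is from the 2008 CMS target.
# All input rates are taken from the slide; the rest is arithmetic.
SECONDS_PER_DAY = 24 * 60 * 60

rb_rate_per_min = 1          # LCG RB: ~one job per minute
rb_ceiling_per_day = 5_000   # observed choke point of a single RB
target_per_day = 200_000     # CMS target for 2008

rb_sustained_per_day = rb_rate_per_min * 60 * 24        # 1,440 jobs/day
required_per_second = target_per_day / SECONDS_PER_DAY  # ~2.3 jobs/s

# Even at its choke point, covering the target with plain LCG RBs
# would take 200K / 5K = 40 instances -- hence the push for a faster
# WMS and bulk submission.
rbs_needed = -(-target_per_day // rb_ceiling_per_day)   # ceiling division

print(f"single RB, sustained: {rb_sustained_per_day:,} jobs/day")
print(f"target rate: {required_per_second:.1f} jobs/s")
print(f"RB instances needed at choke point: {rbs_needed}")
```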

CMS Plans

For 2007, middleware integration and testing for CMS is tackled within the Computing Commissioning sub-project (i.e. S.B.).
Work on the current issues (especially scaling up the job submission tools) will be tackled jointly with OSG collaborators.
This means everybody checks their own tools, but we compare, possibly using the same test suite, and will jointly pick the best solution for each use case.
A work plan for the next 6 months has been outlined:
- CMS-Italy and INFN have responsibility for testing the gLite tools:
  - gLite WMS
  - gLite CE
  - CREAM CE (the next all-Italian computing element)

From the Computing Commissioning Plan

SRM v2.2
- Make sure CMS can use the new SRMs
gLite 3.x
- New WMS, new gLite CE
- gLite 3.1: single-job submission (CMS), bulk submission (ATLAS)
- Better error reporting in the UI (important for the dashboard)
OSG
- Stress test of the various job submission tools (a generic harness is sketched after this slide)
- Stress test of current and future OSG CEs
- Stress test of dCache
Job priorities
- Verify that the scheme is consistently deployed and works
Interoperability
- Keep OSG and EGEE interoperating
- Integrate NDGF aka NorduGrid
- Get Condor-G submission to work for EGEE sites
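All of these stress tests share the same shape: submit a batch of jobs, poll until they finish, tally the outcomes. A minimal, hypothetical harness in Python; `glite-wms-job-status` is the real gLite UI command and `submit_job` is the wrapper sketched earlier, but the output parsing, status strings and polling interval are simplifying assumptions.

```python
# Hypothetical stress-test loop: submit N jobs, poll, and tally outcomes.
# glite-wms-job-status is the real gLite UI command; the status parsing
# and the set of terminal states are simplifying assumptions.
import subprocess
import time
from collections import Counter

TERMINAL_STATES = {"Done (Success)", "Done (Failed)", "Aborted", "Cancelled"}

def poll_status(job_id: str) -> str:
    """Return the current WMS status string for one job."""
    out = subprocess.run(
        ["glite-wms-job-status", job_id],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if line.strip().startswith("Current Status:"):
            return line.split(":", 1)[1].strip()
    return "Unknown"

def stress_test(n_jobs: int, poll_every: int = 300) -> Counter:
    """Submit n_jobs (via the submit_job sketch above) and poll to the end."""
    jobs = {submit_job(i): None for i in range(n_jobs)}
    pending = set(jobs)
    while pending:
        time.sleep(poll_every)
        for job_id in list(pending):
            status = poll_status(job_id)
            jobs[job_id] = status
            if status in TERMINAL_STATES:
                pending.remove(job_id)
    return Counter(jobs.values())  # e.g. {"Done (Success)": 4850, "Aborted": 150}
```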

Work Program on gLite WMS

gLite WMS to replace the LCG RB for single-job submission:
- Better scalability, faster submission, additional features
- Already tested to 1-2K jobs/day continuously, 5K for short periods
- Work by the EIS team (Andrea Sciabà and Enzo Miccio)
- Time to use it with Production
gLite WMS for bulk submission: higher performance (a collection-JDL sketch follows this slide)
- Stress test until April by the EIS team
- Already available in CRAB (but not advised for general users)
- Work in progress to integrate it into ProductionAgent (Carlos, Ale, Giuseppe, William)
gLite CE
- EIS team to add them to the test suite; easy
- Expect better reliability and error reporting
- Work for March, April, May
CREAM CE
- Use the same test suite; easy to add, but we have to see how it works
- From April onward
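Bulk submission means handing the WMS one JDL describing many nodes instead of many single-job JDLs. A hedged sketch of a collection JDL as introduced with gLite 3.1, generated from Python; `Type = "collection"` with a `nodes` list is the documented mechanism, but the per-node contents here are illustrative and the exact attribute set should be checked against the WMS documentation.

```python
# Hypothetical generator for a gLite 3.1 collection JDL (bulk submission).
# Type = "collection" with a "nodes" list is the documented mechanism;
# the per-node contents below are illustrative.
NODE_TEMPLATE = """  [
    Executable    = "cmsRun_wrapper.sh";
    Arguments     = "{i}";
    InputSandbox  = {{"cmsRun_wrapper.sh"}};
    OutputSandbox = {{"job_{i}.out"}};
  ]"""

def make_collection_jdl(n_jobs: int) -> str:
    """Build one collection JDL describing n_jobs nodes.

    Submitting it with a single glite-wms-job-submit call replaces
    n_jobs separate single-job submissions."""
    nodes = ",\n".join(NODE_TEMPLATE.format(i=i) for i in range(n_jobs))
    return f"[\n  Type = \"collection\";\n  nodes = {{\n{nodes}\n  }};\n]"

with open("bulk.jdl", "w") as f:
    f.write(make_collection_jdl(100))
```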

Status of gLite WMS

Bulk submission from the UI to the WMS is fast.
The problem so far is that the WMS dies under its own load:
- It could do 20K jobs/day, but not day after day
- Recovery is not as simple as "reboot it": specific actions (kill processes, restart processes, clean hung jobs, clean logs) are needed every day or so (a hypothetical watchdog of this kind is sketched below). Not viable for production.
- With the current "production" version, gLite 3.0: no way
A crash effort has been ongoing since last fall on gLite 3.1:
- One machine at CERN under stress by ATLAS (same pattern as CMS, using Andrea Sciabà's test suite)
- Enormous work and progress by the developers in the last months; many components improved, including new Condor versions and processes that terminate themselves after some time. Tons of new patches.
- As of last week it submits ~15K jobs/day using bulk submission, continuously (5 days in a row by now)
- More robustness is expected after rewriting one critical piece to avoid Condor DAGMan (work by F. Giacomini, almost finished)
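The daily babysitting described above is essentially a cron-driven watchdog. A purely hypothetical sketch of what one could look like; the daemon names, init-script layout and log paths are invented for illustration, and the real WMS services and cleanup steps would differ.

```python
# Purely hypothetical watchdog for a struggling WMS node, mirroring the
# manual routine on the slide: restart dead processes, clean runaway logs.
# Daemon names, paths and the init-script layout are invented.
import os
import subprocess
import time

WMS_DAEMONS = ["workload_manager", "job_controller", "log_monitor"]  # invented
LOG_DIR = "/var/log/wms"                                             # invented
MAX_LOG_BYTES = 500 * 1024 * 1024

def daemon_running(name: str) -> bool:
    """Check for a live process with this exact name via pgrep."""
    return subprocess.run(["pgrep", "-x", name],
                          capture_output=True).returncode == 0

def restart(name: str) -> None:
    """Kill and restart one daemon through its init script (assumed layout)."""
    subprocess.run(["pkill", "-x", name])
    time.sleep(5)
    subprocess.run([f"/etc/init.d/{name}", "start"], check=True)

def babysit() -> None:
    # 1. restart anything that died under load
    for daemon in WMS_DAEMONS:
        if not daemon_running(daemon):
            restart(daemon)
    # 2. truncate runaway logs before they fill the disk
    for fname in os.listdir(LOG_DIR):
        path = os.path.join(LOG_DIR, fname)
        if os.path.isfile(path) and os.path.getsize(path) > MAX_LOG_BYTES:
            os.truncate(path, 0)

if __name__ == "__main__":
    babysit()  # intended to run from cron, e.g. once a day
```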

Conclusion

The Resource Broker is not yet what one would like. Still, it may be almost there:
- The future is in the hands of CMS-Italy (and ATLAS-Italy)
Keeping the Grid filled from a few submission points (lxplus, a few ProductionAgents) will be a daunting task anyhow.
One hammer does not fit all screws:
- Do not be surprised if, in the end, different submission tools turn out to better serve different use cases
- The CRAB and Production Tools developers will make that transparent to users
Do not panic at cryptic Grid error messages; we will analyse the data.