LHCb status and plans
Ph. Charpentier, CERN
WLCG Workshop, 1-2 September 2007, Victoria, BC

Slide 2: Status of DC06
- Reminder: the goal was two-fold, namely to produce and reconstruct useful data, and to exercise the LHCb Computing Model, DIRAC and ganga
- To be tested:
  - Software distribution
  - Job submission and data upload (simulation: no input data)
  - Data export from CERN (FTS) using MC raw data (DC06-SC4)
  - Job submission with input data (reconstruction and re-reconstruction), for both staged and non-staged files
  - Data distribution (DSTs to Tier1 T0D1 storage)
  - Batch analysis on the Grid (data analysis and standalone software)
  - Dataset deletion
- LHCb Grid community solution:
  - DIRAC (WMS, DMS, production system)
  - ganga (for analysis jobs)

Slide 3: DC06 phases
- Summer 2006
  - Data production at all sites
  - Background events (~100 Mevts b-inclusive and 300 Mevts minimum bias); all MC raw files uploaded to CERN
- Autumn 2006
  - Transfers of MC raw files to the Tier1s, with registration in the DIRAC processing database
    - Done as part of SC4, using FTS
    - Ran smoothly (when SEs were up and running, which was never the case for all 7 at once)
  - Fake reconstruction for some files (software not yet finally tuned)
- December 2006 onwards
  - Simulation, digitisation and reconstruction of signal events (200 Mevts)
  - DSTs uploaded to Tier1 SEs: originally to all 7 Tier1s, then to CERN+2

Slide 4: DC06 phases (cont'd)
- February 2007 onwards
  - Reconstruction of background events at the Tier1s
    - Uses 20 MC raw files as input; these were no longer in the disk cache, hence had to be recalled from tape
    - The output rDST is uploaded locally at the Tier1
- June 2007 onwards
  - Stripping of background events at the Tier1s
    - Uses 2 rDSTs as input and accesses the 40 corresponding MC raw files for full reconstruction of the selected events
  - DSTs distributed to the Tier1s
    - Originally to all 7 Tier1s, then to CERN+2
    - Need to clean up datasets from sites to free space

Slide 5: Software distribution
- Performed by LHCb SAM jobs
  - See Joël Closier's poster at CHEP
- Problems encountered:
  - Reliability of the shared area: scalability of NFS?
  - Access permissions (lhcbsgm), and the move to pool accounts
    - Important: beware of access permissions when changing account mappings at sites! Moving to pool accounts was a nightmare (a minimal pre-install check is sketched below)
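
A minimal sketch of the kind of pre-installation check a software-installation job can run, assuming the shared area is published through the standard VO_LHCB_SW_DIR variable; the default path and the check itself are illustrative, not the actual LHCb SAM test.

    import os
    import tempfile

    def shared_area_writable(path):
        """Return True if the current (installation) account can write there."""
        if not os.path.isdir(path):
            return False
        try:
            # create and remove a scratch file, as a real installation would
            fd, name = tempfile.mkstemp(dir=path)
            os.close(fd)
            os.remove(name)
            return True
        except OSError:
            return False

    if __name__ == "__main__":
        area = os.environ.get("VO_LHCB_SW_DIR", "/opt/exp_soft/lhcb")  # placeholder default
        print("shared area %s writable: %s" % (area, shared_area_writable(area)))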

Slide 6: Simulation jobs
- Up to 10,000 jobs running simultaneously
- Continuous requests from the physics teams
- Problems encountered:
  - SE unavailability for output data upload
    - Implemented a fail-over mechanism in the DIRAC DMS: the final data transfer is filed as a request in one of the VOBOXes (see the sketch below)
    - Had to develop a multithreaded transfer agent, as the backlog of transfers was too large
    - Had to develop an lcg-cp able to transfer to a SURL; the request to support SURLs in lcg-cp took 10 months to reach production (2 weeks to implement)
  - Handling of full disk SEs: handled by the VOBOXes
  - Cleaning SEs is painful, as there is no SRM tool for it (mail to the SE admin)
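
The fail-over idea can be sketched as follows: if the upload to the intended SE fails, the data are parked on another SE and a transfer request is filed for a VOBOX agent to replay later. Every name here (copy_to_se, the failover SE list, the request directory) is illustrative and not the DIRAC code.

    import json
    import os
    import time

    FAILOVER_SES = ["CERN-FAILOVER", "CNAF-FAILOVER"]   # placeholder SE names

    def copy_to_se(local_file, lfn, se):
        """Stand-in for the actual grid copy (lcg-cp to a SURL in DC06)."""
        return False                                     # stub: always "fails" here

    def upload_with_failover(local_file, lfn, destination_se,
                             request_dir="/tmp/dirac-requests"):
        # 1) try the intended destination first
        if copy_to_se(local_file, lfn, destination_se):
            return "uploaded"
        # 2) otherwise park the data on any failover SE and file a transfer
        #    request that the VOBOX transfer agent will replay later
        for se in FAILOVER_SES:
            if copy_to_se(local_file, lfn, se):
                request = {"lfn": lfn, "source": se,
                           "target": destination_se, "time": time.time()}
                name = os.path.join(request_dir, lfn.replace("/", "_") + ".json")
                with open(name, "w") as fh:
                    json.dump(request, fh)
                return "failover:" + se
        return "failed"                                  # the job reports an error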

Slide 7: Reconstruction jobs
- Input files need to be staged
  - Easy for the first prompt processing, painful for reprocessing
  - Developed a DIRAC stager agent: jobs are put in the central queue only when their input files are staged (see the sketch below)
- File access problems
  - Inconsistencies between SRM tURLs and ROOT access: unreliability of rfio, and problems with rootd protocol authentication on the Grid (now fixed by ROOT)
  - Impossible to copy input data locally (not enough disk space guaranteed)
  - lcg-gt returning a tURL on dCache without staging the files; worked around with dccp, then fixed by dCache
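
The stager-agent logic amounts to releasing a job to the central queue only once all of its inputs are on disk. A minimal sketch of one agent cycle is shown below; is_staged and submit_to_queue are placeholders, and in reality the check maps onto SRM file-metadata and staging requests.

    def is_staged(surl):
        """Placeholder: ask the SRM whether this replica is on disk (ONLINE)."""
        return False                                      # stub

    def stager_agent_cycle(waiting_jobs, submit_to_queue):
        """One cycle of a hypothetical stager agent.

        waiting_jobs: list of (job_id, [surls]) tuples held back from the queue.
        Jobs whose inputs are now all staged are released; the rest are kept
        for the next cycle."""
        still_waiting = []
        for job_id, surls in waiting_jobs:
            if all(is_staged(s) for s in surls):
                submit_to_queue(job_id)
            else:
                still_waiting.append((job_id, surls))
        return still_waiting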

Slide 8: What is still missing?
- gLite WMS
  - Many attempts at using it, not very successful
  - Still not used in production (not released as such...)
- Full VOMS support
  - Many mapping problems when using VOMS
  - It was working, but we had to move back to plain proxies due to dCache problems
  - No proper Castor authentication (i.e. no security for files)
- SRM v2.2
  - Tests ongoing; see the plans later
- Agreement on, and support for, generic pilot jobs
  - Essential for good optimisation at the Tier1s and for prioritisation of activities (simulation, reconstruction, analysis)

Slide 9: Plans and outlook
- Reprocessing of background
  - Just restarted (a software fault was found): 6,000 jobs, with 20 input files per job
  - Stripping will follow: 3,000 jobs, with 42 input files per job
- SRM v2.2 tests
  - Ongoing; many issues found and fixed
  - Very collaborative work with GD
  - Difficult to get space tokens and the corresponding pools properly configured
- Analysis
  - Rapidly growing (batch data analysis, ROOT scripts for fits, toy MC)

Slide 10: Plans (cont'd)
- Conditions DB test
  - Deployed, with 3D streaming working at all Tier1s
  - Stress tests starting (Bologna)
  - Usage in production during the autumn
- LFC replication
  - Requested at all Tier1s (Oracle backend, 3D streaming)
  - In production for over 6 months at CNAF
- Dress rehearsals
  - Assuming this means producing data at the Tier0, shipping it to the Tier1s and processing it there...
  - Pit to Tier0: ongoing
  - Autumn: include Tier1 distribution and reconstruction
  - LHCb welcomes a concurrent dress rehearsal in Spring 2008

Slide 11: Storage Resources
- The main problem encountered is with Disk1TapeX storage
  - 3 out of 7 sites did not provide what had been requested, forcing continuous changes to the distribution plans
  - Need to clean up datasets to recover space (painful with SRM v1)
  - Not efficient to add servers one by one: when all existing servers are full, a very large load is put on the new server
  - Not easy to monitor the storage usage
- Too many instabilities in SEs
  - Checking availability is a full-time job
  - SEs are enabled/disabled in the DMS (see the sketch below)
  - The VOBOX helps, but needs guidance to avoid a denial of service
- Several plans for SE migration
  - RAL, PIC, CNAF, SARA (to NIKHEF): to be clarified
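
Disabling a full or unstable SE as a write destination, while possibly keeping it readable, can be sketched as below; the class and the SE names are illustrative, not the DIRAC data-management interface.

    class StorageElementStatus(object):
        """Track per-SE read/write flags, as a VOBOX or DMS operator might."""

        def __init__(self):
            self._flags = {}   # SE name -> {"read": bool, "write": bool}

        def set_status(self, se, read=True, write=True):
            self._flags[se] = {"read": read, "write": write}

        def writable(self, se):
            return self._flags.get(se, {"write": True})["write"]

        def choose_destinations(self, wanted_ses):
            """Keep only the SEs currently enabled for writing."""
            return [se for se in wanted_ses if self.writable(se)]

    status = StorageElementStatus()
    status.set_status("SITE-A_MC-DST", write=False)   # e.g. disk full; placeholder name
    print(status.choose_destinations(["CERN_MC-DST", "SITE-A_MC-DST", "SITE-B_MC-DST"]))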

Slide 12: Generic pilots
- LHCb is happy with the agreement proposed by the JSPG (EDMS )
  - Eager to see it endorsed by all Tier1s
  - Essential, as LHCb runs concurrent activities at the Tier1s
- DIRAC is prepared to run its payload through a glexec-compatible mechanism (see the sketch below)
  - Waiting for the sites to deploy the implementation they prefer
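
A sketch of how a pilot could hand its matched payload to glexec so that it runs under the user's identity; the glexec path and the environment variable names are assumptions taken from memory of the glexec documentation, so check the site's deployment before relying on them.

    import os
    import subprocess

    def run_payload_via_glexec(user_proxy, payload_cmd, glexec="/usr/sbin/glexec"):
        """Run payload_cmd (a list, e.g. ["python", "runJob.py"]) under glexec."""
        env = dict(os.environ)
        # tell glexec which credential the payload should run under
        env["GLEXEC_CLIENT_CERT"] = user_proxy    # assumed variable name
        env["GLEXEC_SOURCE_PROXY"] = user_proxy   # assumed variable name
        return subprocess.call([glexec] + payload_cmd, env=env)

    # e.g. run_payload_via_glexec("/tmp/x509up_u1234.user", ["python", "runJob.py"])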

Slide 13: Middleware deployment cycle
- Problem of knowing "what runs where"
  - Problems get reported that were fixed long ago, but whose fix was either not released or not deployed
  - There is no way to know which middleware version runs by default on a WN
- Attempt at getting the client middleware from LCG-AA: a very promising solution
  - Very collaborative attitude from GD
  - Versions for all available platforms are installed as soon as they are ready
  - Allows testing on LXPLUS and on production WNs
  - The tarball is shipped with DIRAC and the environment is set using CMT (see the sketch below)
  - Not yet in full production mode, but very promising
  - Allows full control of versions, so problems can be reported precisely to the developers
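
A minimal sketch of pointing a job at client middleware shipped as a tarball with the pilot, rather than at whatever happens to be installed on the WN; the directory layout (bin/, lib/, python/) is an assumed convention, and the real setup is driven by CMT.

    import os

    def use_shipped_middleware(tarball_dir):
        """Prepend the shipped client-middleware directories to the environment."""
        def prepend(var, path):
            os.environ[var] = path + os.pathsep + os.environ.get(var, "")
        prepend("PATH", os.path.join(tarball_dir, "bin"))
        prepend("LD_LIBRARY_PATH", os.path.join(tarball_dir, "lib"))
        prepend("PYTHONPATH", os.path.join(tarball_dir, "python"))

    # e.g. use_shipped_middleware("/scratch/dirac/lcg-aa-clients")   # placeholder path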

Slide 14: LHCb and PPS
- Very impractical to test client middleware on the PPS
  - Completely different setup for DIRAC
  - Hard to verify all use cases (e.g. file access)
- The PPS was used for testing some services, e.g. the gLite WMS
  - But it is easier to get an LHCb instance of the service
    - Known to the production BDII
    - Can be used or not, depending on its reliability
    - Sees all production resources
    - Caveat: it should not break e.g. production CEs, but it is expected to be beyond that level of testing...
- The PPS uses a lot of resources in GD
  - Worth discussing with the experiments how to test middleware

Slide 15: Monitoring & availability
- Essential to test sites permanently
  - See J. Closier's poster at CHEP
- The SAM framework is used to:
  - Check the availability of the CEs open to LHCb
  - Install the LHCb and LCG-AA software (platform dependent)
  - Report to the SAM database
- LHCb would like to report the availability as LHCb sees it
  - No point claiming a site is available just for the ops VO
- Faulty sites are "banned" from DIRAC submission; faulty or full disk SEs can also be "banned" from the DMS, as source and/or destination (see the sketch below)
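
How such availability results could feed the ban list is sketched below: run a few checks per site and ban any site where a critical check fails. The check names and the calling code are illustrative, not the actual SAM or DIRAC interfaces.

    def run_checks(site, checks):
        """checks: list of (name, callable, critical) tuples; returns (name, ok, critical)."""
        results = []
        for name, func, critical in checks:
            try:
                ok = bool(func(site))
            except Exception:
                ok = False                 # a crashing test counts as a failure
            results.append((name, ok, critical))
        return results

    def site_is_usable(results):
        # a site stays in the submission list only if all critical checks passed
        return all(ok for _, ok, critical in results if critical)

    # e.g. (hypothetical test functions):
    # checks = [("ce-submission", test_ce, True),
    #           ("software-area", test_sw_area, True),
    #           ("se-write", test_se_write, False)]
    # banned = [s for s in sites if not site_is_usable(run_checks(s, checks))]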

Slide 16: Conclusions
- LHCb is using the WLCG/EGEE infrastructure successfully
- Eagerly waiting for a general scheme for generic pilots
- Still many issues to iron out, mainly in data management:
  - SE reliability, scalability and availability
  - Data access
  - SRM v2.2
  - SE migration at many sites
- Trying to improve the certification and usage of middleware
  - LCG-AA deployment, production preview instances
- The plans are mainly to continue the regular activities, moving from "challenge mode" to "steady mode"