Download presentation
Presentation is loading. Please wait.
Published byMeagan Wells Modified over 8 years ago
1
LHCb 2009-Q4 report
2
2009 Q4 report LHCb 2009-Q4 report, PhC2 Activities in 2009-Q4 m Core Software o Stable versions of Gaudi and LCG-AA m Applications o Stable as of September for real data o Fast minor releases to cope with reality of life… m Monte-Carlo o Some MC09 channel simulation o Few events in foreseen 2009 configuration o Minimum bias MC09 stripping m Real data reconstruction o As of November 20 th
3
2009 Q4 report Jobs in 2009-Q4 LHCb 2009-Q4 report, PhC3
4
2009 Q4 report First experience with real data m Very low crossing rate o Maximum 8 bunches colliding (88 kHz crossing) o Very low luminosity o Minimum bias trigger rate: from 0.1 to 10 Hz o Data taken with single beam and with collisions LHCb 2009-Q4 report, PhC4
5
2009 Q4 report Real data processing m Iterative process o Small changes in reconstruction application o Improved alignment o In total 5 sets of processing conditions P Only last files were all processed twice m Processing submission o Automatic job creation and submission after: P File is successfully migrated in Castor P File is successfully replicated at Tier1 o If job fails for a reason other than application crash P The file is reset as “to be processed” P New job is created / submitted o Processing more efficient at CERN (see later) P Eventually after few trials at Tier1, the file is processed at CERN o No stripping ;-) P DST files distributed to all Tier1s for analysis LHCb 2009-Q4 report, PhC5
6
2009 Q4 report Reconstruction jobs LHCb 2009-Q4 report, PhC6
7
2009 Q4 report Issues with real data m Castor migration o Very low rate: had to change the migration algorithm for more frequent migration m Issue with large files (above 2 GB) o Real data files are not ROOT files but open by ROOT o There was an issue with a compatibility library for slc4-32 bit on slc5 nodes P Fixed within a day m Wrong magnetic field sign o Due to different coordinate systems for LHCb and LHC ;-) o Fixed within hours m Data access problem (by protocol, directly from server) o Still dCache issue at IN2P3 and NIKHEF P dCache experts working on it o Moved to copy mode paradigm for reconstruction o Still a problem for user jobs P Sites have been banned for analysis LHCb 2009-Q4 report, PhC7
8
2009 Q4 report Transfers and job latency m No problem observed during file transfers o Files randomly distributed to Tier1 o Will move to distribution by runs (few 100’s files) o For 2009, runs were not longer than 4-5 files! m Very good Grid latency o Time between submission and jobs starting running LHCb 2009-Q4 report, PhC8
9
2009 Q4 report Brief digression on LHCb position w.r.t. MUPJs m So-called Multi-User Pilot Jobs (MUPJ) are used by DIRAC on all sites that accept role=Pilot o They are just regular jobs! m MUPJs match any Dirac job in the central queue o Production or User analysis (single queue) o Each PJ can execute sequentially up to 5 jobs P If remaining capabilities allow (e.g. CPU time left) P MUPJ has 5 tokens for matching jobs o role=Pilot proxy can only retrieve jobs, limited to 5 m A limited user proxy is used for DM operations of the payload o Cannot be used for job submission m Proxies can be hidden when not needed m DIRAC is instrumented for using gLexec (in any mode) m First experience o Problems are not with gLexec but with SCAS configuration LHCb 2009-Q4 report, PhC9
10
2009 Q4 report MUPJs (cont’d) m LHCb is not willing to loose efficiency due to the introduction of badly configured gLexec o Yet another point of failure! o Cannot afford testing individually all sites at once m This topic has been lasting over 3 years now o Where is the emergency? Why did it take so long if so important? m Propose to reconsider pragmatically the policy o They were defined when the frameworks had not been evaluated, and VOs had to swallow the bullet o Re-assessing the risks was not really done in the TF (yet) o Questionnaire leaves decisions to sites P We got no message that sites are unhappy with current situation o MUPJs are just jobs for which their owner is responsible P Move responsibility to MUPJ owner (“the VO”) P Two tier trust relation ( Sites / VO / User ) P Apply a posteriori control and not a priori mistrust o VOs should assess the risk on their side LHCb 2009-Q4 report, PhC10
11
2009 Q4 report Conclusions m Concentrating on real data o Very few data (200 GB)! o Very important learning exercise o A few improvements identified for the 2010 running P Run distribution (rather than files) P Conditions DB synchronization check d Make sure Online Conditions are up-to-date m Still some MC productions o With feedback from first real data P E.g. final position of the VeLo (15 mm from beam) m Analysis o First analysis of 2009 data made on the Grid o Foresee a stripping phase for V 0 physics publications m LHCb definitely wants to continue using MUPJs! LHCb 2009-Q4 report, PhC11
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.