DØ Computing Model & Monte Carlo & Data Reprocessing
Gavin Davies, Imperial College London
DOSAR Workshop, Sao Paulo, September 2005


Outline
Operational status
- Globally – we continue to do well
- A view shared by the recent Run II Computing Review
DØ computing model
- Ongoing, 'long' established plan
Production computing
- Monte Carlo
- Reprocessing of Run II data
- 10^9 events reprocessed on the grid – the largest HEP grid effort
Looking forward
Conclusions

Snapshot of Current Status
- Reconstruction is keeping up with data taking
- Data handling is performing well
- Production computing is off-site and grid based; it continues to grow and work well
- Over 75 million Monte Carlo events produced in the last year
- The Run IIa data set is being reprocessed on the grid – 10^9 events
- Analysis CPU power has been expanded
Globally we are doing well – a view shared by the recent Run II Computing Review

Computing Model
Started with distributed computing, evolving to automated use of common tools/solutions on the grid (SAM-Grid) for all tasks
- Scalable
- Not alone – a joint effort with others at FNAL and elsewhere, LHC …
1997 – original plan
- All Monte Carlo to be produced off-site
- SAM to be used for all data handling, providing a 'data-grid'
Now: Monte Carlo production and data reprocessing with SAM-Grid
Next: other production tasks, e.g. fixing, and then user analysis
Use the concept of Regional Centres
- DOSAR is one of the pioneers
- Builds local expertise

Reconstruction Release
Periodically update the version of the reconstruction code
- As new / more refined algorithms are developed
- As understanding of the detector improves
Frequency of releases decreases with time
- One major release in the last year – p17
- Basis for the current Monte Carlo (MC) & data reprocessing
Benefits of p17
- Reco speed-up
- Full calorimeter calibration
- Fuller description of the detector material
- Use of zero-bias overlay for MC
(More details:

Data Handling – SAM
SAM continues to perform well, providing a data-grid
- 50 SAM sites worldwide
- Over 2.5 PB (50B events) consumed in the last year
- Up to 300 TB moved per month
- A larger SAM cache solved tape access issues
Continued success of SAM shifters
- Often remote collaborators
- Form the 1st line of defense
SAMTV monitors SAM & the SAM stations
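To make the 'data-grid' idea above concrete, here is a minimal, purely illustrative Python sketch of metadata-driven dataset definition and cache-aware file delivery. The class, field and file names are invented for this sketch and are not the real SAM API.

```python
# Illustrative sketch only: a toy model of metadata-driven file delivery in the
# spirit of SAM. Class and field names are hypothetical, not the real SAM API.
from dataclasses import dataclass

@dataclass
class FileRecord:
    name: str
    metadata: dict   # e.g. {"data_tier": "raw", "run": 194567}
    locations: list  # sites holding a replica

class ToyCatalogue:
    def __init__(self):
        self.files = []

    def declare(self, rec):
        self.files.append(rec)

    def define_dataset(self, **criteria):
        """Select files whose metadata match all of the given criteria."""
        return [f for f in self.files
                if all(f.metadata.get(k) == v for k, v in criteria.items())]

def deliver(dataset, local_cache):
    """Serve cached files immediately; flag the rest for staging from a remote replica."""
    for f in dataset:
        source = "cache" if f.name in local_cache else "stage from " + f.locations[0]
        yield f.name, source

# Example: define a dataset by metadata and 'consume' it through a site cache.
cat = ToyCatalogue()
cat.declare(FileRecord("run194567_001.raw", {"data_tier": "raw", "run": 194567}, ["FNAL"]))
cat.declare(FileRecord("run194567_002.raw", {"data_tier": "raw", "run": 194567}, ["FNAL"]))
for name, src in deliver(cat.define_dataset(data_tier="raw", run=194567),
                         local_cache={"run194567_001.raw"}):
    print(name, "->", src)
```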

SAMGrid
More than 10 DØ execution sites
- SAM – data handling
- JIM – job submission & monitoring
- SAM + JIM → SAM-Grid
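As a rough illustration of the split described above (JIM handling submission and monitoring, SAM handling the data), a toy job description might look like the following. All field and site names are hypothetical, not SAM-Grid syntax.

```python
# Illustrative only: a toy job description showing the division of labour on
# this slide -- JIM-like submission/monitoring on top of SAM-like data handling.
job = {
    "job_type": "monte_carlo",               # which workflow to run
    "input_dataset": "p17_minbias_overlay",  # resolved and delivered by the data-handling layer (SAM's role)
    "software_release": "p17",
    "execution_site": "any",                 # broker picks one of the >10 execution sites (JIM's role)
}

def submit(job, sites):
    """Pretend broker: choose a site and return an id a monitoring page could track."""
    site = sites[0] if job["execution_site"] == "any" else job["execution_site"]
    job_id = f"{site}-0001"
    print(f"submitted {job['job_type']} job {job_id}, input dataset '{job['input_dataset']}'")
    return job_id

submit(job, ["site_A", "site_B"])
```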

Remote Production Activities – Monte Carlo I
Over 75M events produced in the last year, at more than 10 sites
- More than double the previous year's production
Vast majority on shared sites
- DOSAR a major part of this
SAM-Grid introduced in spring '04, becoming the default
- Based on the request system and jobmanager-mc_runjob
- MC software package retrieved via SAM, including at the central farm
- Average production efficiency ~90%
- Average inefficiency due to grid infrastructure ~1-5%
Continued move to common tools
- DOSAR sites continue the move from McFarm to SAMGrid (from '04)

Remote Production Activities – Monte Carlo II
Beyond just 'shared' resources
- More than 17M events produced 'directly' on LCG via submission from Nikhef
- A good example of a remote site driving the 'development'
Similar momentum building on/for OSG
- Two good site examples within the p17 reprocessing

Remote Production Activities – Reprocessing I
After significant improvements to the reconstruction, reprocess old data
P14, winter 2003/04
- 500M events, 100M remotely, from DST
- Based around mc_runjob
- Distributed computing rather than Grid
P17, end of March → ~October
- ×10 larger, i.e. 1000M events, 250 TB
- Basically all remote
- From raw data, i.e. use of DB proxy servers
- SAM-Grid as the default (using mc_runjob)
- GHz PIIIs for 6 months
- Massive activity – the largest grid activity in HEP
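The p17 figures above imply a sustained rate of a few million events per day and an average raw event size of roughly 250 kB; the short sketch below just does that arithmetic, taking the slide's ~6-month duration at face value.

```python
# Arithmetic only: what the p17 reprocessing figures above imply, taking the
# slide's ~6-month duration at face value.
events = 1_000_000_000          # 10^9 events
volume_tb = 250                 # quoted input volume
days = 6 * 30                   # ~6 months

print(f"required average rate : {events / days / 1e6:.1f} M events/day")
print(f"average raw event size: {volume_tb * 1e9 / events:.0f} kB/event")  # 1 TB ~ 1e9 kB
# -> roughly 5.6 M events/day and ~250 kB/event, which matches the
#    "M events/day" scale used on the monitoring slide later.
```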

Reprocessing II
Two stages: "Production" and "Merging" – each grid job spawns many batch jobs
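The production/merging pattern sketched on this slide can be illustrated with a few lines of Python: a grid-level job fans a dataset out into many batch jobs, and a separate merging step combines their small outputs. This is a conceptual sketch, not SAM-Grid code; file names and chunk sizes are invented.

```python
# Conceptual sketch of the pattern on this slide: one grid-level "production"
# job fans out into many local batch jobs, and a "merging" step combines the
# resulting small output files.
def split_into_batch_jobs(files, files_per_job=10):
    """Fan a dataset out into batch-sized chunks."""
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]

def run_batch_job(chunk):
    """Stand-in for running the reconstruction executable on a few input files."""
    return [f.replace(".raw", ".tmb") for f in chunk]   # one small output per input

def merge(outputs, merged_name):
    """Stand-in for the merging job that combines small outputs into one file."""
    print(f"{merged_name}: merged {len(outputs)} partial outputs")

raw_files = [f"run194567_{i:03d}.raw" for i in range(25)]
partial = []
for chunk in split_into_batch_jobs(raw_files):
    partial.extend(run_batch_job(chunk))   # in reality these run in parallel on a farm
merge(partial, "run194567_merged.tmb")
```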

Reprocessing III
SAMGrid provides
- A common environment & operation scripts at each site
- Effective book-keeping
- SAM avoids data duplication and defines recovery jobs
- JIM's XML-DB used to ease bug tracing
Deploying an evolving product to new sites with limited manpower is tough (we are a running experiment)
- Very significant improvements in JIM (scalability) during this period
Certification of sites – need to check
- SAMGrid vs usual production
- Remote sites vs the central site
- Merged vs unmerged files (e.g. FNAL vs SPRACE)
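The 'recovery job' bookkeeping mentioned above amounts to a set difference between the files requested and the files successfully processed, which also guarantees nothing is processed twice. A toy illustration follows (not the actual SAM bookkeeping; file names are made up).

```python
# Sketch of the "recovery job" idea: compare what was requested with what was
# actually processed and resubmit only the difference, so nothing runs twice.
requested = {f"file_{i:03d}.raw" for i in range(100)}
processed_ok = {f"file_{i:03d}.raw" for i in range(100) if i % 7 != 0}  # pretend some failed

recovery_set = sorted(requested - processed_ok)
print(f"{len(recovery_set)} files need a recovery job, e.g. {recovery_set[:3]}")
# Duplication is avoided for free: a file already in processed_ok is never resubmitted.
```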

Reprocessing IV
Monitoring (illustration)
- Overall efficiency and speed, overall or by site
- Plots show production speed in M events/day and the number of batch jobs completing successfully
Status – into the "end-game"
- ~855 M events done
- Data sets all allocated, moving to 'cleaning-up'
- Must now push on the Monte Carlo
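For illustration, the monitoring quantities referred to above (batch-job success rate and production speed in M events/day, overall or per site) reduce to simple ratios over per-site job records. The numbers in the sketch below are made up; only the arithmetic is real.

```python
# Made-up numbers, real arithmetic: per-site and overall success rate and
# production speed in M events/day, as in the monitoring plots described above.
site_stats = {
    # site: (batch jobs submitted, batch jobs OK, events done, days elapsed)
    "site_A": (1200, 1130, 180e6, 120),
    "site_B": (800, 790, 95e6, 120),
}

for site, (njobs, nok, nev, days) in site_stats.items():
    print(f"{site}: success {nok / njobs:6.1%}, speed {nev / days / 1e6:4.1f} M events/day")

total_ev = sum(s[2] for s in site_stats.values())
days = 120
print(f"overall: {total_ev / 1e6:.0f} M events, {total_ev / days / 1e6:.1f} M events/day")
```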

SAM-Grid Interoperability
Need access to greater resources as data sets grow – ongoing programme on LCG and OSG interoperability
Step 1 (co-existence) – use shared resources with a SAM-Grid head-node
- Widely done for both reprocessing and MC
- OSG co-existence shown for data reprocessing
Step 2 – SAMGrid–LCG interface
- SAM does the data handling & JIM the job submission
- Basically a forwarding mechanism
- Prototype established at IN2P3/Wuppertal
- Extending to production level
OSG activity increasing – build on the LCG experience
Team work between core developers / sites
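Conceptually, the Step 2 forwarding mechanism takes a SAM-Grid job request, keeps the data handling on the SAM side, and re-expresses the execution request for the LCG (or OSG) gateway. The sketch below is a toy illustration with invented names, not the actual interface.

```python
# Toy illustration of the "forwarding mechanism" described above: the SAM-Grid
# side keeps data handling, and the execution request is forwarded to another
# grid's gateway. All names are invented.
def forward_job(job, backend):
    """Translate a SAM-Grid-style job request into a submission on another grid."""
    translated = {
        "executable": job["executable"],
        "input": job["input_dataset"],   # still resolved and delivered via SAM
        "destination_grid": backend,     # e.g. "LCG" or "OSG"
    }
    print(f"forwarding {job['executable']} on {job['input_dataset']} -> {backend}")
    return translated

forward_job({"executable": "d0reco_p17", "input_dataset": "p17_raw_chunk_042"}, "LCG")
```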

Looking Forward
Increased data sets require increased resources for MC, reprocessing etc.
The route to these is increased use of the grid and common tools
Have an ongoing joint programme, but work to do…
- Continue development of SAM-Grid
  - Automated production job submission by shifters
- Deployment team
  - Bring in new sites in a manpower-efficient manner
  - The 'benefit' of a new site goes well beyond a 'cpu' count – we appreciate / value this
- Full interoperability
  - Ability to access all shared resources efficiently
Additional resources for the above recommended by the Taskforce

Conclusions
The computing model continues to be successful
- Based around grid-like computing, using common tools
- A key part of this is the production computing – MC and reprocessing
Significant advances this year:
- Continued migration to common tools
- Progress on interoperability, both LCG and OSG
  - Two reprocessing sites operating under OSG
- P17 reprocessing – a tremendous success
  - Strongly praised by the Review Committee
DOSAR a major part of this
- More 'general' contributions also strongly acknowledged – thank you
Let's all keep up the good work

Back-up

Terms
Tevatron
- Approximately equivalent challenge to the LHC in "today's" money
- Running experiments
SAM (Sequential Access to Metadata)
- Well developed metadata and distributed data replication system
- Originally developed by DØ & FNAL-CD
JIM (Job Information and Monitoring)
- Handles job submission and monitoring (all but data handling)
- SAM + JIM → SAM-Grid – a computational grid
Tools
- Runjob – handles job workflow management
- dØtools – user interface for job submission
- dØrte – specification of runtime needs

Reminder of Data Flow
Data acquisition (raw data in evpack format)
- Currently limited to a 50 Hz Level-3 accept rate
- Request an increase to 100 Hz, as planned for Run IIb – see later
Reconstruction (tmb/DST in evpack format)
- Additional information in tmb → tmb++ (DST format stopped)
- Sufficient for 'complex' corrections, incl. track fitting
Fixing (tmb in evpack format)
- Improvements / corrections coming after the cut of a production release
- Centrally performed
Skimming (tmb in evpack format)
- Centralised event streaming based on reconstructed physics objects
- Selection procedures regularly improved
Analysis (output: ROOT histograms)
- Common ROOT-based Analysis Format (CAF) introduced in the last year
- tmb format remains
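For a sense of scale, the accept rates quoted above translate into data volumes as follows. The ~250 kB raw event size is inferred from the reprocessing slide (250 TB for 10^9 events) and the duty factor is an assumed value for illustration only.

```python
# Rough data-volume estimate for the Level-3 accept rates quoted above. Event
# size is inferred from the reprocessing slide; duty factor is an assumption.
event_size_kb = 250
seconds_per_year = 3600 * 24 * 365
duty_factor = 0.5               # assumed fraction of the year spent taking data

for rate_hz in (50, 100):
    events_per_year = rate_hz * seconds_per_year * duty_factor
    volume_tb = events_per_year * event_size_kb / 1e9   # kB -> TB
    print(f"{rate_hz:3d} Hz -> ~{events_per_year / 1e9:.1f}e9 events/year, "
          f"~{volume_tb:.0f} TB/year of raw data")
```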

Remote Production Activities – Monte Carlo

The Good and Bad of the Grid
The only viable way to go …
- Increase in resources (CPU and potentially manpower)
- Work with, not against, the LHC
- Still limited
BUT
- Need to conform to standards – dependence on others…
- Long-term solutions must be favoured over short-term idiosyncratic convenience
  - Or we won't be able to maintain adequate resources
- Must maintain a production-level service (papers), while increasing functionality
  - As transparent as possible to the non-expert