DØ Computing Model and Operational Status
Gavin Davies, Imperial College London
Run II Computing Review, September 2005
Outline
- Operational status
  - Globally, we continue to do well
- DØ computing model & data flow
  - Ongoing, long-established plan
- Highlights from the last year
  - Algorithms
  - SAM-Grid & reprocessing of Run II data
    - 10^9 events reprocessed on the grid – the largest HEP grid effort
- Looking forward
  - Budget request
  - Manpower
- Conclusions
Snapshot of Current Status
- Reconstruction is keeping up with data taking
- Data handling is performing well
- Production computing is off-site and grid-based; it continues to grow and work well
  - Over 75 million Monte Carlo events produced in the last year
  - Run IIa data set being reprocessed on the grid – 10^9 events
- Analysis CPU power has been expanded
- Globally doing well
Computing Model
- Started with distributed computing, evolving to automated use of common tools/solutions on the grid (SAM-Grid) for all tasks
  - Scalable
  - Not alone – a joint effort with CD and others, LHC ...
- 1997 – original plan:
  - All Monte Carlo to be produced off-site
  - SAM to be used for all data handling, providing a 'data-grid'
- Now: Monte Carlo production and data reprocessing with SAM-Grid
- Next: other production tasks, e.g. fixing, and then user analysis (roughly in order of increasing complexity)
Reminder of Data Flow
- Data acquisition (raw data in evpack format)
  - Currently limited to a 50 Hz Level-3 accept rate
  - Request increase to 100 Hz, as planned for Run IIb – see later
- Reconstruction (tmb/DST in evpack format)
  - Additional information in tmb → tmb++ (DST format stopped)
  - Sufficient for 'complex' corrections, including track fitting
- Fixing (tmb in evpack format)
  - Improvements/corrections that come after the cut of a production release
  - Centrally performed
- Skimming (tmb in evpack format)
  - Centralised event streaming based on reconstructed physics objects
  - Selection procedures regularly improved
- Analysis (output: root histograms)
  - Common root-based Analysis Format (CAF) introduced in the last year
  - tmb format remains
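To make the ordering of the chain above concrete, here is a minimal, purely illustrative sketch of the stages and their output formats. The stage names follow the slide, but the function and record names are invented placeholders, not DØ code.

```python
# Illustrative only: a toy model of the DØ data-flow stages listed above.
# None of these names correspond to real DØ interfaces.

PIPELINE = [
    ("daq",            "raw (evpack)"),     # Level-3 accept, currently 50 Hz
    ("reconstruction", "tmb++ (evpack)"),   # DST dropped; tmb carries extra info
    ("fixing",         "tmb (evpack)"),     # central post-release corrections
    ("skimming",       "tmb (evpack)"),     # centralised physics-object streams
    ("analysis",       "root histograms"),  # CAF-based user analysis
]

def run_pipeline(event):
    """Pass a toy 'event' record through each stage in order."""
    for stage, output_format in PIPELINE:
        event = dict(event, stage=stage, format=output_format)
        print(f"{stage:15s} -> {output_format}")
    return event

if __name__ == "__main__":
    run_pipeline({"run": 1, "event": 1})
```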
Reconstruction
- Central farm
  - Processing, plus reprocessing (SAM-Grid) with spare cycles
  - Evolving to shared FNAL farms
- Reco timing
  - Significant improvement, especially at higher instantaneous luminosity
  - See Qizhong's talk
Highlights – "Algorithms"
- Algorithms reaching maturity
- P17 improvements include:
  - Reco speed-up
  - Full calorimeter calibration
  - Fuller description of detector material
- Common Analysis Format (CAF)
  - Limits development of different root-based formats
  - Common object-selection, trigger-selection and normalization tools
  - Simplify and accelerate analysis development
- First 1 fb⁻¹ analyses by Moriond
- See Qizhong's talk
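As an illustration of what a common ROOT-based analysis format buys, here is a minimal sketch of an analysis loop in PyROOT. The file, tree and branch names ("caf_skim.root", "caf", "em_pt") are invented for the example and are not the actual CAF schema.

```python
# Illustrative only: a minimal ROOT-based analysis loop in the spirit of CAF.
import ROOT

f = ROOT.TFile.Open("caf_skim.root")      # a skimmed CAF-style file (invented name)
tree = f.Get("caf")                       # common analysis tree (invented name)
hist = ROOT.TH1F("em_pt", "EM object p_{T}; p_{T} [GeV]; events", 50, 0.0, 100.0)

for event in tree:                        # PyROOT iteration over tree entries
    # common object selection would normally live in shared tools,
    # not be re-implemented in every analysis
    for pt in event.em_pt:                # hypothetical vector branch
        if pt > 20.0:
            hist.Fill(pt)

hist.SaveAs("em_pt.root")                 # output: a root histogram
```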
Data Handling – SAM
- SAM continues to perform well, providing a data-grid
  - 50 SAM sites worldwide
  - Over 2.5 PB (50 billion events) consumed in the last year
  - Up to 300 TB moved per month
  - Larger SAM cache solved tape-access issues
- Continued success of SAM shifters
  - Often remote collaborators
  - Form the first line of defense
- SAMTV monitors SAM & SAM stations
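The data-grid role of SAM is essentially metadata-driven file delivery: a job asks for a dataset by name, and SAM resolves it to files, stages them from cache or tape, and keeps track of what each consumer has processed. The sketch below shows that consumption pattern against a toy stand-in class written for illustration only; it is not the real SAM interface, and the dataset and file names are invented.

```python
# Illustrative only: the consumption pattern SAM provides to jobs,
# written against a toy stand-in class, NOT the real SAM API.

class ToySAMStation:
    """Resolves a dataset name to files and serves them one at a time,
    remembering what has already been delivered (simple book-keeping)."""

    def __init__(self, catalogue):
        self.catalogue = catalogue        # dataset name -> list of file names
        self.delivered = set()

    def get_next_file(self, dataset):
        for filename in self.catalogue.get(dataset, []):
            if filename not in self.delivered:
                self.delivered.add(filename)
                return filename           # in reality: staged from cache or tape
        return None                       # dataset exhausted

station = ToySAMStation({"p17-raw-example": ["raw_0001.evpack", "raw_0002.evpack"]})
while True:
    f = station.get_next_file("p17-raw-example")
    if f is None:
        break
    print("processing", f)               # a reconstruction job would run here
```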
Remote Production Activities / SAM-Grid
- Monte Carlo
  - Over 75M events produced in the last year, at more than 10 sites
    - More than double the previous year's production
  - Vast majority on shared sites (often national Tier-1 sites, primarily "LCG")
  - SAM-Grid introduced in spring 04, becoming the default
    - Consolidation of SAM-Grid / LCG co-existence
  - Over 17M events produced 'directly' on LCG via submission from Nikhef
- Data reprocessing
  - After significant improvements to reconstruction, reprocess old data
  - P14 – winter 03/04 – from DST – 500M events, 100M off-site
  - P17 – now – from raw – 1B events – SAM-Grid default – basically all off-site
  - Massive task – the largest HEP activity on the grid – GHz PIIIs for 6 months
  - Led to significant improvements to SAM-Grid
  - Collaborative effort
Reprocessing / SAM-Grid - I
- More than 10 DØ execution sites
- SAM – data handling
- JIM – job submission & monitoring
- SAM + JIM → SAM-Grid
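A minimal sketch of the SAM + JIM split, with invented class, method, dataset and site names purely to show the division of responsibility (data handling versus job submission and monitoring); it is not the real SAM-Grid code.

```python
# Illustrative only: the SAM-Grid division of labour, with invented names.

class DataHandling:
    """Stands in for SAM: knows where datasets live and stages their files."""
    def stage(self, dataset):
        print(f"[SAM] staging files for dataset '{dataset}'")
        return [f"{dataset}_{i:04d}.evpack" for i in range(3)]

class JobManagement:
    """Stands in for JIM: submits jobs to execution sites and tracks status."""
    def submit(self, site, executable, input_files):
        print(f"[JIM] submitting {executable} with {len(input_files)} files to {site}")
        return {"site": site, "status": "queued"}

class SAMGrid:
    """SAM + JIM together: a computational grid for DØ production tasks."""
    def __init__(self):
        self.sam, self.jim = DataHandling(), JobManagement()

    def run(self, dataset, site, executable="d0reco"):
        files = self.sam.stage(dataset)
        return self.jim.submit(site, executable, files)

print(SAMGrid().run("p17-raw-example", "example-execution-site"))
```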
Run II 2005 Computing Review11 ~825 Mevents done Reprocessing / SAM-Grid - II Speed ( = production speed in M events / day) SAM-Grid ‘enables’ a common environment & operation scripts as well as effective book-keeping JIM’s XML-DB used for monitoring / bug tracing SAM avoids data duplication + defines recovery jobs Monitor speed and efficiency – by site or overall ( Started end march Comment: Tough deploying a product, under evolution to new sites (we are a running experiment)
SAM-Grid Interoperability
- Need access to greater resources as data sets grow
- Ongoing programme on LCG and OSG interoperability
- Step 1 (co-existence) – use shared resources with a SAM-Grid head-node
  - Widely done for both reprocessing and MC
  - OSG co-existence shown for data reprocessing
- Step 2 – SAMGrid-LCG interface
  - SAM does the data handling and JIM the job submission; basically a forwarding mechanism
  - Prototype established at IN2P3/Wuppertal; extending to production level
  - OSG activity increasing – build on the LCG experience
  - Limited manpower
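A toy sketch of what "forwarding" means in Step 2: a SAM-Grid-style job description is translated into LCG-flavoured JDL text and handed to the other grid's submission layer, while SAM keeps doing the data handling. The dictionary keys, file names and the stubbed submission are examples, not the production SAMGrid-LCG interface.

```python
# Illustrative only: translating a simple job dictionary into LCG-flavoured
# JDL text. Not the actual SAMGrid-LCG forwarding code.

def to_jdl(job):
    """Render a job dictionary as a minimal JDL description."""
    return "\n".join([
        "[",
        f'  Executable    = "{job["executable"]}";',
        f'  Arguments     = "{job["arguments"]}";',
        '  StdOutput     = "job.out";',
        '  StdError      = "job.err";',
        '  OutputSandbox = {"job.out", "job.err"};',
        "]",
    ])

def forward(job):
    """Hand the translated description to the LCG side (stubbed as a print);
    data handling stays with SAM, so no input files appear in the sandbox."""
    print("would submit this JDL to an LCG resource broker:\n" + to_jdl(job))

forward({"executable": "d0_reprocess.sh", "arguments": "p17-example-dataset"})
```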
Looking Forward – Budget Request
- Long-planned increase to 100 Hz for Run IIb
- Experiment performing well
  - Run II average data-taking efficiency ~84%, now pushing 90%
- Making efficient use of data and resources
  - Many analyses published (have a complete analysis with 0.6 fb⁻¹ of data)
- "Core" physics program saturates the 50 Hz rate at 1 × 10^32 cm⁻²s⁻¹
  - Maintaining 50 Hz at 2 × 10^32 → an effective loss of 1-2 fb⁻¹
- Increase requires ~$1.5M in FY06/07, and <$1M after
- Details to come in Amber's talk
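A purely illustrative calculation of how a capped trigger rate turns into an effective luminosity loss. The delivered-luminosity figure below is an assumption for the example, not a number from this talk; it is chosen only to show how the quoted 1-2 fb⁻¹ range can arise.

```python
# Illustrative only: why a fixed 50 Hz cap costs data at higher luminosity.
# If the core physics program fills 50 Hz at luminosity 1x, then at 2x the
# same cap records only half of the core-physics events.
cap_hz = 50.0
saturation_lumi = 1.0     # luminosity (in units of 1x) where 50 Hz saturates
running_lumi = 2.0        # Run IIb-era luminosity (2x)

kept = min(1.0, (cap_hz * saturation_lumi / running_lumi) / cap_hz)
print(f"fraction of core-physics events kept: {kept:.0%}")           # 50%

delivered_fb = 3.0        # ASSUMED fb^-1 taken at high luminosity, for illustration
print(f"effective loss ~ {(1.0 - kept) * delivered_fb:.1f} fb^-1")   # ~1.5 fb^-1
```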
Looking Forward – Manpower - I
- From the directorate Taskforce report on manpower issues...
- Some vulnerability through the limited number of suitably qualified experts in either the collaboration or CD:
  - Databases – serve most of our needs, but concern w.r.t. the trigger and luminosity databases
  - Online system – key dependence on a single individual
  - Offline code management, build and distribution systems
- Additional areas where central consolidation of hardware support → reduced overall manpower needs
  - e.g. Level-3 trigger hardware support
  - Under discussion with CD
Looking Forward – Manpower - II
- Increased data sets require increased resources
  - The route to these is increased use of the grid and common tools
  - Have an ongoing joint program, but work to do...
- Need effort to:
  - Continue development of SAM-Grid
    - Automated production job submission by shifters
    - User analyses
  - Deployment team
    - Bring in new sites in a manpower-efficient manner
  - Full interoperability
    - Ability to access all shared resources efficiently
- Additional resources for the above recommended by the Taskforce
  - Support the recommendation that some of the additional manpower come via more guest scientists, postdocs and associate scientists
Conclusions
- Computing model continues to be successful
- Significant advances this year:
  - Reco speed-up, Common Analysis Format
  - Extension of grid capabilities
    - P17 reprocessing with SAM-Grid
    - Interoperability
- Want to maintain / build on this progress
- Potential issues / challenges being addressed
  - Short term – ongoing action on immediate vulnerabilities
  - Longer term – larger data sets
    - Continued development of common tools, increased use of the grid
    - Continued development of the above in collaboration with others
  - Manpower injection required to achieve reduced effort in steady state, with increased functionality – see Taskforce talk
- Globally doing well
Back-up
Terms
- Tevatron
  - Approximately equivalent challenge to the LHC in "today's" money
  - Running experiments
- SAM (Sequential Access to Metadata)
  - Well-developed metadata and distributed data-replication system
  - Originally developed by DØ & FNAL-CD
- JIM (Job Information and Monitoring)
  - Handles job submission and monitoring (all but data handling)
  - SAM + JIM → SAM-Grid – a computational grid
- Tools
  - Runjob – handles job workflow management
  - dØtools – user interface for job submission
  - dØrte – specification of runtime needs
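To show how the pieces in this glossary fit together conceptually, a toy workflow description is sketched below: a chained workflow (Runjob's role) declares its runtime needs (dØrte's role) and is submitted through a simple front end (dØtools' role). The structure, field names and executables are invented for illustration and do not reproduce the real tools.

```python
# Illustrative only: how the glossary's pieces relate conceptually.
# All names and fields here are invented.

workflow = {
    "name": "mc_production_example",
    "runtime": {                      # dØrte-like runtime specification
        "release": "p17",
        "os": "linux",
        "min_disk_gb": 10,
    },
    "stages": [                       # Runjob-like chained stages
        {"exe": "toy_generator", "events": 250_000},
        {"exe": "toy_simulation", "input": "previous"},
        {"exe": "toy_reco",       "input": "previous"},
    ],
}

def submit(wf):
    """dØtools-like front end: validate and hand the workflow to the grid layer."""
    assert wf["runtime"]["release"], "a release must be specified"
    for stage in wf["stages"]:
        print(f"queueing stage {stage['exe']} for workflow {wf['name']}")

submit(workflow)
```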
Reprocessing
The Good and Bad of the Grid
- The only viable way to go ...
  - Increase in resources (CPU and, potentially, manpower)
  - Work with, not against, the LHC
  - Still limited
- BUT
  - Need to conform to standards – dependence on others...
  - Long-term solutions must be favoured over short-term idiosyncratic convenience
    - Or we won't be able to maintain adequate resources
  - Must maintain a production-level service (papers) while increasing functionality
  - As transparent as possible to the non-expert