DØ Computing Model & Monte Carlo & Data Reprocessing
Gavin Davies, Imperial College London
DOSAR Workshop, Sao Paulo, September 2005


1 DØ Computing Model & Monte Carlo & Data Reprocessing
Gavin Davies, Imperial College London
DOSAR Workshop, Sao Paulo, September 2005

2 Outline
- Operational status
  - Globally we continue to do well
  - A view shared by the recent Run II Computing Review
- DØ computing model
  - An ongoing, long-established plan
- Production computing
  - Monte Carlo
  - Reprocessing of Run II data
  - 10^9 events reprocessed on the grid – the largest HEP grid effort
- Looking forward
- Conclusions

3 Snapshot of Current Status
- Reconstruction is keeping up with data taking
- Data handling is performing well
- Production computing is off-site and grid-based; it continues to grow and work well
- Over 75 million Monte Carlo events produced in the last year
- The Run IIa data set is being reprocessed on the grid – 10^9 events
- Analysis CPU power has been expanded
Globally we are doing well – a view shared by the recent Run II Computing Review

4 Computing Model
- Started with distributed computing, evolving to automated use of common tools/solutions on the grid (SAM-Grid) for all tasks
  - Scalable
  - Not alone – a joint effort with others at FNAL and elsewhere, and with the LHC
- 1997 – original plan:
  - All Monte Carlo to be produced off-site
  - SAM to be used for all data handling, providing a 'data-grid'
- Now: Monte Carlo and data reprocessing with SAM-Grid
- Next: other production tasks, e.g. fixing, and then user analysis
- Use the concept of Regional Centres
  - DOSAR one of the pioneers
  - Builds local expertise

5 Reconstruction Release
- Periodically update the version of the reconstruction code
  - As new / more refined algorithms are developed
  - As understanding of the detector improves
- Frequency of releases decreases with time
  - One major release in the last year – p17
  - The basis for the current Monte Carlo (MC) & data reprocessing
- Benefits of p17:
  - Reconstruction speed-up
  - Full calorimeter calibration
  - Fuller description of the detector material
  - Use of zero-bias overlay for MC
- (More details: http://cdinternal.fnal.gov/RUNIIRev/runIIMP05.asp)

6 Data Handling – SAM
SAM continues to perform well, providing a data-grid:
- 50 SAM sites worldwide
- Over 2.5 PB (50 billion events) consumed in the last year
- Up to 300 TB moved per month
- A larger SAM cache solved tape access issues
- Continued success of SAM shifters
  - Often remote collaborators
  - Form the first line of defense
- SAMTV monitors SAM & SAM stations
http://d0db-prd.fnal.gov/sm_local/SamAtAGlance/
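The cache point above can be illustrated with a toy model. This is not DØ's actual SAM station code – the class, sizes, and eviction policy are invented for illustration – but it shows why a larger disk cache in front of tape storage cuts down slow tape stage-ins: popular files stay on disk and are served immediately.

```python
from collections import OrderedDict

class StationCache:
    """Toy model of a station disk cache in front of tape storage.
    Hypothetical sketch: files already on disk are served at once;
    misses trigger a (slow) tape stage-in, evicting least-recently-used
    files when disk space runs out."""
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_gb = 0.0
        self.files = OrderedDict()  # filename -> size in GB, kept in LRU order
        self.tape_reads = 0

    def deliver(self, filename, size_gb):
        if filename in self.files:
            self.files.move_to_end(filename)   # cache hit: refresh LRU position
            return "disk"
        self.tape_reads += 1                   # cache miss: stage in from tape
        while self.used_gb + size_gb > self.capacity_gb and self.files:
            _, evicted_size = self.files.popitem(last=False)  # drop LRU file
            self.used_gb -= evicted_size
        self.files[filename] = size_gb
        self.used_gb += size_gb
        return "tape"

cache = StationCache(capacity_gb=2.0)
accesses = ["a", "b", "a", "c", "a"]  # 1 GB each; file "a" is popular
sources = [cache.deliver(f, 1.0) for f in accesses]
# repeated requests for "a" come from disk; only 3 of 5 accesses hit tape
```

With a larger `capacity_gb` the eviction loop fires less often, which is the effect the slide reports: fewer tape reads for the same access pattern.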

7 SAM-Grid
- More than 10 DØ execution sites
- SAM – data handling
- JIM – job submission & monitoring
- SAM + JIM → SAM-Grid
http://samgrid.fnal.gov:8080/
http://samgrid.fnal.gov:8080/list_of_schedulers.php
http://samgrid.fnal.gov:8080/list_of_resources.php

8 Remote Production Activities – Monte Carlo I
- Over 75M events produced in the last year, at more than 10 sites
  - More than double last year's production
- The vast majority on shared sites
  - DOSAR a major part of this
- SAM-Grid introduced in spring 2004, becoming the default
  - Based on the request system and jobmanager-mc_runjob
  - MC software package retrieved via SAM, including at the central farm
  - Average production efficiency ~90%
  - Average inefficiency due to grid infrastructure ~1-5%
  - http://www-d0.fnal.gov/computing/grid/deployment-issues.html
- Continued move to common tools
  - DOSAR sites continue the move from McFarm to SAM-Grid

9 Remote Production Activities – Monte Carlo II
Beyond just 'shared' resources:
- More than 17M events produced 'directly' on LCG via submission from Nikhef
  - A good example of a remote site driving the 'development'
- Similar momentum building on/for OSG
  - Two good site examples within the p17 reprocessing

10 Remote Production Activities – Reprocessing I
After significant improvements to the reconstruction, reprocess the old data.
P14, winter 2003/04:
- 500M events, 100M remotely, from DST
- Based around mc_runjob
- Distributed computing rather than grid
P17, end of March to ~October:
- x10 larger, i.e. 1000M events, 250 TB
- Basically all remote
- From raw data, i.e. use of DB proxy servers
- SAM-Grid as the default (using mc_runjob)
- 3200 1 GHz PIIIs for 6 months
- A massive activity – the largest grid activity in HEP
http://www-d0.fnal.gov/computing/reprocessing/p17/
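The quoted scale can be sanity-checked with back-of-the-envelope arithmetic, assuming a 100% duty cycle (so the real per-event cost is somewhat lower than this upper bound):

```python
# Rough check of the p17 reprocessing figures above:
# 10^9 events on ~3200 1 GHz PIII CPUs in ~6 months.
events = 1.0e9
cpus = 3200
days = 6 * 30                                     # ~6 months

events_per_day = events / days                    # overall farm throughput
cpu_seconds = cpus * days * 86400                 # total CPU time available
seconds_per_event = cpu_seconds / events          # implied per-event cost

# -> roughly 5.6M events/day across the farm, and on the order of
#    50 CPU-seconds per event on a 1 GHz PIII.
```

That per-event cost is an average over the whole campaign; actual reconstruction times varied with instantaneous luminosity and site efficiency.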

11 Reprocessing II
'Production' then 'Merging': each grid job spawns many batch jobs.
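The production/merging fan-out above can be sketched as follows. This is a hypothetical illustration, not the SAM-Grid job manager: the function name, file naming, and slice size are invented, but the shape is the one the slide describes – one grid-level job splits a dataset into many local production batch jobs, whose outputs a merge step then collects.

```python
def split_into_batch_jobs(dataset_files, files_per_job):
    """Sketch of the fan-out: one grid job covers a whole dataset;
    the site head-node splits it into many 'production' batch jobs,
    each processing a slice of files, followed by a merge step that
    consumes every production output. Names are illustrative only."""
    production = [dataset_files[i:i + files_per_job]
                  for i in range(0, len(dataset_files), files_per_job)]
    merge_inputs = [f for job in production for f in job]  # merge sees all outputs
    return production, merge_inputs

files = [f"raw_{n:04d}.evpack" for n in range(10)]
production_jobs, merge_inputs = split_into_batch_jobs(files, files_per_job=3)
# 10 files in slices of 3 -> 4 production batch jobs, then one merge job
```

Bookkeeping at this layer is what lets SAM define recovery jobs (next slide): any slice whose batch job failed can be resubmitted without redoing the rest.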

12 Reprocessing III
SAM-Grid provides:
- A common environment & operation scripts at each site
- Effective book-keeping
  - SAM avoids data duplication and defines recovery jobs
  - JIM's XML-DB used to ease bug tracing
- It is tough deploying a product still under evolution, with limited manpower, to new sites (we are a running experiment)
- Very significant improvements in JIM (scalability) during this period
Certification of sites – need to check:
- SAM-Grid vs the usual production
- Remote sites vs the central site
- Merged vs unmerged files
(Plot: FNAL vs SPRACE)

13 Reprocessing IV
Monitoring (illustration):
- Overall efficiency and speed, globally or by site
  - Production speed measured in M events/day; efficiency as the number of batch jobs completing successfully
- http://samgrid.fnal.gov:8080/cgi-bin/plot_efficiency.cgi
Status – into the 'end-game':
- ~855M events done
- Data sets all allocated; moving to 'cleaning-up'
- Must now push on the Monte Carlo
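The per-site efficiency that the monitoring page plots reduces to a simple ratio: successfully completed batch jobs over submitted batch jobs, grouped by site. A minimal sketch, with an invented job-record format (the real monitoring reads JIM's database, not tuples like these):

```python
def site_efficiency(job_records):
    """Fraction of batch jobs completing successfully, per site.
    job_records is a list of (site_name, succeeded) pairs; this
    record format is invented purely for illustration."""
    totals, successes = {}, {}
    for site, ok in job_records:
        totals[site] = totals.get(site, 0) + 1
        successes[site] = successes.get(site, 0) + (1 if ok else 0)
    return {site: successes[site] / totals[site] for site in totals}

records = [("SPRACE", True), ("SPRACE", True), ("SPRACE", False),
           ("IN2P3", True), ("IN2P3", True)]
eff = site_efficiency(records)  # SPRACE: 2/3 succeeded, IN2P3: 2/2
```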

14 SAM-Grid Interoperability
- Need access to greater resources as data sets grow
- Ongoing programme on LCG and OSG interoperability
- Step 1 (co-existence) – use shared resources with a SAM-Grid head-node
  - Widely done for both reprocessing and MC
  - OSG co-existence shown for data reprocessing
- Step 2 – SAM-Grid–LCG interface
  - SAM does the data handling & JIM the job submission
  - Basically a forwarding mechanism
  - Prototype established at IN2P3/Wuppertal
  - Extending to production level
- OSG activity increasing – building on the LCG experience
- Team work between core developers and sites

15 Looking Forward
- Increased data sets require increased resources for MC, reprocessing, etc.
- The route to these is increased use of the grid and of common tools
- We have an ongoing joint programme, but work remains:
  - Continue development of SAM-Grid
    - Automated production job submission by shifters
  - Deployment team
    - Bring in new sites in a manpower-efficient manner
    - The 'benefit' of a new site goes well beyond a 'CPU count' – we appreciate and value this
  - Full interoperability
    - The ability to access all shared resources efficiently
- Additional resources for the above recommended by the Taskforce

16 Conclusions
- The computing model continues to be successful
  - Based around grid-like computing, using common tools
- A key part of this is the production computing – MC and reprocessing
- Significant advances this year:
  - Continued migration to common tools
  - Progress on interoperability, both LCG and OSG
    - Two reprocessing sites operating under OSG
  - P17 reprocessing – a tremendous success
    - Strongly praised by the Review Committee
- DOSAR a major part of this
  - Its more 'general' contribution is also strongly acknowledged – thank you
Let's all keep up the good work

17 Back-up

18 Terms
- Tevatron
  - Approximately equivalent in challenge to the LHC in 'today's' money
  - Running experiments
- SAM (Sequential Access to Metadata)
  - A well-developed metadata and distributed data replication system
  - Originally developed by DØ & FNAL-CD
- JIM (Job Information and Monitoring)
  - Handles job submission and monitoring (everything but data handling)
  - SAM + JIM → SAM-Grid, a computational grid
- Tools
  - Runjob – handles job workflow management
  - dØtools – user interface for job submission
  - dØrte – specification of runtime needs

19 Reminder of Data Flow
- Data acquisition (raw data in evpack format)
  - Currently limited to a 50 Hz Level-3 accept rate
  - Requested increase to 100 Hz, as planned for Run IIb – see later
- Reconstruction (tmb/DST in evpack format)
  - Additional information in tmb → tmb++ (DST format stopped)
  - Sufficient for 'complex' corrections, including track fitting
- Fixing (tmb in evpack format)
  - Improvements / corrections coming after the cut of a production release
  - Centrally performed
- Skimming (tmb in evpack format)
  - Centralised event streaming based on reconstructed physics objects
  - Selection procedures regularly improved
- Analysis (output: ROOT histograms)
  - Common ROOT-based Analysis Format (CAF) introduced in the last year
  - The tmb format remains
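The stages above form a linear chain from raw data to analysis streams. The sketch below mirrors that chain with placeholder functions – the stage names follow the slide, but every function body and field name here is invented for illustration, not DØ code:

```python
# Hypothetical sketch of the data flow: raw -> reconstruction -> fixing -> skimming.
def reconstruction(raw_event):
    # raw evpack in, tmb out: attach reconstructed physics objects
    return {"format": "tmb", "objects": ["tracks", "jets"], "raw": raw_event}

def fixing(tmb_event):
    # centrally applied corrections arriving after the production release was cut
    return {**tmb_event, "corrections_applied": True}

def skimming(tmb_event):
    # centralised streaming: route each event by its reconstructed objects
    return "jet_skim" if "jets" in tmb_event["objects"] else "other_skim"

event = {"format": "raw evpack", "level3_accept_hz": 50}
stream = skimming(fixing(reconstruction(event)))  # ends up in the jet stream
```

The key design point the slide makes is that each stage reads and writes a well-defined format (evpack/tmb), so stages can be rerun or improved independently, as the fixing pass demonstrates.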

20 DOSAR Workshop, Sept 200520 Remote Production Activities – Monte Carlo

21 The Good and Bad of the Grid
- The only viable way to go
- An increase in resources (CPU and potentially manpower)
  - Work with, not against, the LHC
  - Still limited
BUT:
- Need to conform to standards – a dependence on others
- Long-term solutions must be favoured over short-term idiosyncratic convenience
  - Or we won't be able to maintain adequate resources
- Must maintain a production-level service (papers) while increasing functionality
  - As transparent as possible to the non-expert

