1
McFarm Improvements and Re-processing Integration
D. Meyer for The UTA Team
DØ SAR Workshop, Oklahoma University, 9/26 - 9/27/2003
http://www-hep.uta.edu/~d0race/McFarm/McFarm.html
2
Sept. 26, 2003 McFarm Improvements; D. Meyer, UTA DØ SAR Workshop, OU 2
Reasons for Using McFarm
- McFarm is DØ MC control software developed at UTA and used in six farms
- Simplifies Monte Carlo production
- Manages the cluster with minimum labor
- Manages the cluster efficiently
- Minimizes impact of changes to SAM, mc_runjob, other DØ software
- User-oriented
3
McFarm Software Integration
- DØ Binaries - minitars or full release
- SAM - declaration, storage, retrieval
- mc_runjob - job and metadata construction
- NFS - access to binaries, minbias database
- NIS - account management
- ssh - intra-cluster monitoring and control
- Batch Queues - PBS and Condor
4
Improvements: Procedural Changes
- Request monitoring is now provided by the McFarm monitor, which closes requests
- “check_sam” is now obsolete, replaced by the archive daemon and store verification
- Mechanism to handle too-large reco tasks: run only the pythia/d0g/sim steps (PDS jobs) and let the requestor do reco on CAB
5
Bug Fixes
- Event count is now correct when not all events are done
- Available space is now correct on NFS-mounted disks (df command)
- No longer attempting to patch metadata for bad keywords
- Others
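The available-space fix concerns how free space is read on NFS-mounted disks. The slides do not show McFarm's code, so the following is only a rough illustration: in Python, free space on any mounted path (local or NFS) can be read with `shutil.disk_usage`, which queries the filesystem instead of parsing `df` output. The function name `has_free_space` and the gigabyte threshold are hypothetical.

```python
import shutil


def has_free_space(path, min_free_gb):
    """Return True if the filesystem holding `path` has at least min_free_gb free.

    Hypothetical helper, not McFarm code.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= min_free_gb * 1024**3


if __name__ == "__main__":
    # Check the current directory against a 1 GB threshold.
    print(has_free_space(".", 1.0))
```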
6
Enhancements
- Two-day grace period for final tmb merge now configurable: FARM_MERGE_GRACE_HOURS
- Also FARM_MERGE_MAX_EVENTS and FARM_MERGE_MAX_FILES
- Monitor reassurance can be turned off: FARM_MONITOR_REASSURE='NO'
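A minimal sketch of how the settings named on this slide might be read, assuming they are exposed as environment variables (the slide does not say how they are set). The 48-hour default mirrors the two-day grace period. The function names, the use of `None` to mean "no limit", and the 'YES' default for reassurance are hypothetical.

```python
import os


def _optional_int(value):
    """Convert a setting to int, keeping None for 'not set'."""
    return None if value is None else int(value)


def load_merge_config(env=None):
    """Collect merge and monitor settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    return {
        # Two-day grace period unless overridden.
        "grace_hours": float(env.get("FARM_MERGE_GRACE_HOURS", 48)),
        # Assumption: an unset limit means no limit.
        "max_events": _optional_int(env.get("FARM_MERGE_MAX_EVENTS")),
        "max_files": _optional_int(env.get("FARM_MERGE_MAX_FILES")),
        # Reassurance stays on unless explicitly set to 'NO'.
        "reassure": env.get("FARM_MONITOR_REASSURE", "YES").strip().upper() != "NO",
    }


if __name__ == "__main__":
    print(load_merge_config({"FARM_MERGE_GRACE_HOURS": "12",
                             "FARM_MONITOR_REASSURE": "NO"}))
```

Passing a plain dictionary instead of the real environment keeps the sketch easy to try without touching a farm's configuration.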
7
Enhancements - 2
- FARM_SAM_VERIFY_STORE_HOURS makes storing and purging separate events in McFarm. “Archive” daemon.
- SAM store retry improved - will undeclare, cancel-store as necessary
- bin/onetime/re-store full-job-dir-name
- SAM gather will get to merger files periodically even if busy with regular stores
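One way to read the separation of storing and purging is that a stored job is only purged after a waiting period and a successful store check. The sketch below illustrates that reading only; the function name, the three action labels, and the exact role of FARM_SAM_VERIFY_STORE_HOURS as a delay are assumptions, not McFarm's documented behavior.

```python
from datetime import datetime, timedelta


def next_action(stored_at, now, verify_hours, store_verified):
    """Decide what an archive-style daemon might do with a stored job.

    Hypothetical logic: wait until `verify_hours` have passed since the
    store, then purge if the store was verified, otherwise store again.
    """
    if now - stored_at < timedelta(hours=verify_hours):
        return "wait"
    return "purge" if store_verified else "re-store"


if __name__ == "__main__":
    t0 = datetime(2003, 9, 26, 12, 0)
    print(next_action(t0, t0 + timedelta(hours=25), 24, True))
```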
8
Enhancements - 3
- Request life-cycle monitoring handled by McFarm monitor to improve turn-around
- Number of events now in gather.log
- Execute daemon will detect reco stall due to over-swapping and will kill the job
- Execute daemon retains job history even when stopped/restarted (job.hist file)
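The slides do not say how the execute daemon recognizes a reco stall. As an illustration under stated assumptions, a stall could be flagged when the event count has not advanced over the last few samples while swap usage stays high. The function name, the sample format, and both thresholds are hypothetical.

```python
def is_stalled(samples, min_swap_fraction=0.9, window=3):
    """Flag a stall from periodic samples, oldest first.

    Each sample is (events_done, swap_used_fraction). A stall is declared
    when the last `window` samples show no new events and heavy swap use.
    """
    if len(samples) < window:
        return False  # not enough history to judge
    recent = samples[-window:]
    no_progress = recent[-1][0] == recent[0][0]
    heavy_swap = all(swap >= min_swap_fraction for _, swap in recent)
    return no_progress and heavy_swap


if __name__ == "__main__":
    print(is_stalled([(100, 0.95), (100, 0.97), (100, 0.99)]))
```

Requiring both conditions avoids killing a job that is merely slow or one that is swapping briefly while still making progress.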
9
Enhancements - 4
- launch_request accepts PDS (and S) jobs to handle unwieldy reco requests - it stores sim files
- purge_job accepts “--d0phase=mcpNN” argument to purge archives by D0 phase, including merger archives
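To illustrate the `--d0phase=mcpNN` form, here is a small sketch of parsing and validating such an argument with Python's `argparse`. This is not the real purge_job implementation; the helper names are hypothetical, and treating NN as exactly two digits is an assumption.

```python
import argparse
import re


def d0phase(value):
    """Accept values of the form mcpNN (assumed here: exactly two digits)."""
    if not re.fullmatch(r"mcp\d{2}", value):
        raise argparse.ArgumentTypeError(f"expected mcpNN, got {value!r}")
    return value


def build_parser():
    """Build a stand-in parser that only knows the --d0phase option."""
    parser = argparse.ArgumentParser(prog="purge_job")
    parser.add_argument("--d0phase", type=d0phase,
                        help="purge archives belonging to this D0 phase")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--d0phase=mcp14"])
    print(args.d0phase)
```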
10
Re-Processing Real Data
- Basic approach is to run the d0reco binary only, using the reprocessing framework rcp, with a raw or reco file as input
- Joel Snow traced the rcp usage in the UMICH job
- Mark Sosebee has done a sample by hand and analyzed histograms - so far so good
- Dave Evans has included re-processing support in version 06-00-02 of mc_runjob
11
Re-Processing Real Data - 2
- I have used mc_runjob v06 to manually re-process both MC reco files and raw files
- Testing is continuing - presently some problems with metadata declaration
- The bad news: mc_runjob v06 contains substantial changes to job structure and execution that will require days of work to integrate into McFarm and test all code
12
Re-Processing Real Data - 3
- Our approach is to have the Request_NNNN.py file include a SAM dataset definition of the files to be re-processed, and feed it into McFarm just like a Monte Carlo request
- Some of McFarm is ready (SAM acquire), some is not (launch_request RT, v06 adaptation, switch from events to files)
- Dave Evans is leaving
13
Re-Processing Real Data - 4
- Key contents of Request_REPROC02.py:
  'Reconstructed':{'datasetdefname':'reco_14.02.00_raw_2_files',
  'frameworkrcpname':'runD0recoSAM_data_reprocess_p13dst.rcp',},
- launch_request REPROC02 /home/mctest 0 RT
- job UTATEST-RT-ReqREPROC02-03265220712
- It runs under mc_runjob v05 / McFarm v10.04, but no proper metadata yet.
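The request fragment on this slide is a Python dictionary, so a quick sanity check before launching is easy to sketch. The example below assumes the two keys shown on the slide are the ones a re-processing request needs. The names `REQUIRED_KEYS` and `missing_keys` are hypothetical, and the rcp file name is written without the space that appears in the slide text, on the assumption that the space is a line-wrap artifact.

```python
# Keys shown on the slide; treating them as required is an assumption.
REQUIRED_KEYS = ("datasetdefname", "frameworkrcpname")

# Fragment modeled on the slide's Request_REPROC02.py contents.
request = {
    "Reconstructed": {
        "datasetdefname": "reco_14.02.00_raw_2_files",
        "frameworkrcpname": "runD0recoSAM_data_reprocess_p13dst.rcp",
    },
}


def missing_keys(request, section="Reconstructed"):
    """Return the required keys that are absent or empty in `section`."""
    block = request.get(section, {})
    return [key for key in REQUIRED_KEYS if not block.get(key)]


if __name__ == "__main__":
    print(missing_keys(request))
```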
14
Re-Processing Real Data: To Be Done
- mc_runjob v06 must be debugged and released
- McFarm must be adapted to v06 and debugged
- Metadata must be stabilized and accepted by SAM
- Re-processing authority should use MC-like Request_NNNN.py to invoke re-reco
15
Conclusions
- McFarm has morphed significantly since its creation to accommodate
  - Enhanced error handling
  - Enhanced monitoring
  - Other improvements
- Re-processing capability in the works, despite some worries on schedule and support
- IAC’s use and comments prompted McFarm improvements (Thank you everyone!!)
- Comments always appreciated