McFarm Improvements and Re-processing Integration D. Meyer for The UTA Team DØ SAR Workshop Oklahoma University 9/26 - 9/27/2003

Sept. 26, 2003 McFarm Improvements; D. Meyer, UTA DØ SAR Workshop, OU

Reasons for Using McFarm
McFarm is DØ Monte Carlo (MC) control software developed at UTA and used in six farms. It:
- simplifies Monte Carlo production
- manages the cluster with minimum labor
- manages the cluster efficiently
- minimizes the impact of changes to SAM, mc_runjob and other DØ software
- is user-oriented

McFarm Software Integration
- DØ binaries: minitars or full release
- SAM: declaration, storage, retrieval
- mc_runjob: job and metadata construction
- NFS: access to binaries and the minbias database
- NIS: account management
- ssh: intra-cluster monitoring and control
- Batch queues: PBS and Condor

Improvements: Procedural Changes
- Request monitoring is now provided by the McFarm monitor, which closes requests.
- "check_sam" is now obsolete, replaced by the archive daemon and store verification.
- Mechanism to handle too-large reco tasks: run only the pythia/d0g/sim stages (PDS jobs) and let the requestor do reco on CAB.

Bug Fixes
- The event count is now correct when not all events are done.
- Available space is now correct on NFS-mounted disks (df command).
- McFarm no longer attempts to patch metadata for bad keywords.
- Others
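The NFS fix concerns how free space is measured. As a minimal sketch, and not McFarm's own code (the slide says McFarm uses the df command), essentially the same "available" figure that df prints can be read directly with os.statvfs, which queries the filesystem that actually holds the path and therefore also works on NFS mounts. The has_room_for helper is hypothetical.

```python
import os


def available_bytes(path: str) -> int:
    """Bytes available to an unprivileged user on the filesystem holding `path`.

    f_bavail (blocks free for non-root users) times f_frsize (fragment size)
    is essentially what `df` reports as available space. statvfs is answered
    by the filesystem that actually holds `path`, so an NFS-mounted directory
    reports the server's space rather than that of the local root disk.
    """
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def has_room_for(path: str, needed_bytes: int) -> bool:
    """Hypothetical helper: would output of `needed_bytes` fit under `path`?"""
    return available_bytes(path) >= needed_bytes


if __name__ == "__main__":
    here = os.getcwd()
    print(f"{here}: {available_bytes(here) / 2**30:.1f} GiB available")
```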

Enhancements
- The two-day grace period for the final tmb merge is now configurable: FARM_MERGE_GRACE_HOURS
- Also configurable: FARM_MERGE_MAX_EVENTS and FARM_MERGE_MAX_FILES
- Monitor reassurance can be turned off: FARM_MONITOR_REASSURE='NO'
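A minimal sketch of how these settings could be read, assuming they are supplied as environment variables (the slides give the names but not where McFarm reads them from). The 48-hour default mirrors the former two-day grace period; the FARM_MERGE_MAX_EVENTS and FARM_MERGE_MAX_FILES defaults and the final_merge_due rule are hypothetical placeholders.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MergeConfig:
    grace_hours: int
    max_events: int
    max_files: int
    monitor_reassure: bool


def _int_setting(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer setting, falling back to `default` if unset or blank."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else default


def load_merge_config(env: Mapping[str, str] = os.environ) -> MergeConfig:
    return MergeConfig(
        # 48 h reproduces the former fixed two-day grace period.
        grace_hours=_int_setting(env, "FARM_MERGE_GRACE_HOURS", 48),
        # Hypothetical defaults: the slides do not give values for these.
        max_events=_int_setting(env, "FARM_MERGE_MAX_EVENTS", 50_000),
        max_files=_int_setting(env, "FARM_MERGE_MAX_FILES", 100),
        # Reassurance stays on unless explicitly set to NO.
        monitor_reassure=env.get("FARM_MONITOR_REASSURE", "YES").strip().upper() != "NO",
    )


def final_merge_due(hours_since_last_output: float, cfg: MergeConfig) -> bool:
    """Hypothetical rule: stop waiting for stragglers once the grace period ends."""
    return hours_since_last_output >= cfg.grace_hours
```

Passing a plain dict instead of os.environ makes the settings easy to test without touching the real environment.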

Enhancements - 2
- FARM_SAM_VERIFY_STORE_HOURS makes storing and purging separate events in McFarm (the "Archive" daemon).
- SAM store retry is improved: it will undeclare and cancel-store as necessary.
- bin/onetime/re-store full-job-dir-name
- SAM gather will get to merger files periodically, even when busy with regular stores.
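The improved store retry can be sketched as a loop that cleans up partial state before trying again. The store, undeclare and cancel_store callables stand in for the real SAM operations, which are not shown here; the cleanup order, the attempt limit, and running both cleanup steps every time (rather than only "as necessary") are assumptions of this sketch.

```python
import time
from typing import Callable


class StoreError(Exception):
    """Raised by the (hypothetical) store callable when a SAM store fails."""


def store_with_retry(
    filename: str,
    store: Callable[[str], None],
    undeclare: Callable[[str], None],
    cancel_store: Callable[[str], None],
    max_attempts: int = 3,
    delay_s: float = 0.0,
) -> int:
    """Try to store `filename`, cleaning up between failed attempts.

    Returns the attempt number that succeeded; re-raises the last
    StoreError if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            store(filename)
            return attempt
        except StoreError:
            if attempt == max_attempts:
                raise
            # Cancel the half-finished store and drop the stale declaration
            # so the next attempt starts from a clean state.
            cancel_store(filename)
            undeclare(filename)
            time.sleep(delay_s)
```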

Enhancements - 3
- Request life-cycle monitoring is handled by the McFarm monitor to improve turnaround.
- The number of events is now recorded in gather.log.
- The execute daemon detects a reco stall due to over-swapping and kills the job.
- The execute daemon retains job history even when stopped and restarted (job.hist file).
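Keeping job history across daemon restarts amounts to persisting state outside the process and replaying it at startup. The sketch below assumes a hypothetical job.hist layout of one JSON record per line; the slides do not describe the real file format. A swap-stall kill would simply be recorded as one more status.

```python
import json
import os
from typing import Dict


class JobHistory:
    """Hypothetical restart-safe job history: one JSON record per line."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.state: Dict[str, str] = {}
        self._reload()

    def _reload(self) -> None:
        # Replay earlier records so a restarted daemon remembers job states;
        # later records for the same job overwrite earlier ones.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    rec = json.loads(line)
                    self.state[rec["job"]] = rec["status"]

    def record(self, job: str, status: str) -> None:
        # Append and fsync so the record survives the daemon being stopped.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"job": job, "status": status}) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        self.state[job] = status
```

Appending rather than rewriting the file means a crash mid-write can lose at most the last record, not the whole history.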

Enhancements - 4
- launch_request accepts PDS (and S) jobs to handle unwieldy reco requests; it stores the sim files.
- purge_job accepts a "--d0phase=mcpNN" argument to purge archives by D0 phase, including merger archives.
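A sketch of how a --d0phase=mcpNN option might be validated and applied, using argparse. The two-digit pattern follows the mcpNN placeholder on the slide; the archive-naming rule in archives_for_phase (the phase tag appearing as a token separated by '-', '_' or '.') is a hypothetical assumption, chosen so that merger archives carrying the same tag are matched too.

```python
import argparse
import re
from typing import Iterable, List, Optional

PHASE_RE = re.compile(r"mcp\d{2}")


def d0phase(value: str) -> str:
    """argparse type: accept only phase tags of the form mcpNN."""
    if not PHASE_RE.fullmatch(value):
        raise argparse.ArgumentTypeError(f"expected mcpNN, got {value!r}")
    return value


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="purge_job")
    parser.add_argument("--d0phase", type=d0phase, help="purge archives for this D0 phase")
    return parser.parse_args(argv)


def archives_for_phase(archive_names: Iterable[str], phase: str) -> List[str]:
    """Hypothetical naming rule: match the phase as a whole '-', '_' or '.' token."""
    return [name for name in archive_names if phase in re.split(r"[-_.]", name)]
```

Matching whole tokens rather than substrings keeps a tag such as mcp14 from also selecting an unrelated name that merely contains those characters.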

Re-Processing Real Data
- The basic approach is to run only the d0reco binary, using the reprocessing framework rcp, with a raw or reco file as input.
- Joel Snow traced the rcp usage in the UMICH job.
- Mark Sosebee has processed a sample by hand and analyzed the histograms: so far, so good.
- Dave Evans has included re-processing support in a version of mc_runjob.

Re-Processing Real Data - 2
- I have used mc_runjob v06 to manually re-process both MC reco files and raw files.
- Testing is continuing; at present there are some problems with metadata declaration.
- The bad news: mc_runjob v06 contains substantial changes to job structure and execution that will require days of work to integrate into McFarm and to test all code.

Re-Processing Real Data - 3
- Our approach is to have the Request_NNNN.py file include a SAM dataset definition of the files to be re-processed, and to feed it into McFarm just like a Monte Carlo request.
- Some of McFarm is ready (SAM acquire); some is not (launch_request RT, v06 adaptation, switch from events to files).
- Dave Evans is leaving.
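The "switch from events to files" changes how a request is cut into jobs: an MC request is divided by event count, while a re-processing request is divided by the input files of its SAM dataset definition. A minimal sketch of both splits follows; resolving the dataset definition into a file list is not shown, and the per-job sizes are illustrative parameters, not McFarm settings.

```python
from typing import List, Sequence


def split_events(total_events: int, events_per_job: int) -> List[int]:
    """MC-style split: events per job, with any remainder in a last, smaller job."""
    if events_per_job < 1:
        raise ValueError("events_per_job must be >= 1")
    full_jobs, remainder = divmod(total_events, events_per_job)
    return [events_per_job] * full_jobs + ([remainder] if remainder else [])


def split_files(files: Sequence[str], files_per_job: int) -> List[List[str]]:
    """Re-processing split: consecutive groups of input files, one group per job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be >= 1")
    return [list(files[i:i + files_per_job]) for i in range(0, len(files), files_per_job)]
```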

Re-Processing Real Data - 4
Key contents of Request_REPROC02.py:
    'Reconstructed': {
        'datasetdefname': 'reco_ _raw_2_files',
        'frameworkrcpname': 'runD0recoSAM_data_reprocess_p13dst.rcp',
    },
Launch: launch_request REPROC02 /home/mctest 0 RT
Job: UTATEST-RT-ReqREPROC
It runs under mc_runjob v05 / McFarm v10.04, but does not yet produce proper metadata.
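Before launching, a request like this could be checked for the two keys it relies on. This sketch assumes the request is available as a plain Python dict; how McFarm actually loads Request_REPROC02.py is not shown on the slide, and the dataset definition name in the usage lines is a made-up placeholder.

```python
from typing import Any, Dict, Tuple

REQUIRED_RECO_KEYS = ("datasetdefname", "frameworkrcpname")


def reprocessing_inputs(request: Dict[str, Any]) -> Tuple[str, str]:
    """Return (SAM dataset definition, framework rcp) from a request dict."""
    reco = request.get("Reconstructed")
    if not isinstance(reco, dict):
        raise KeyError("request has no 'Reconstructed' section")
    missing = [k for k in REQUIRED_RECO_KEYS if not reco.get(k)]
    if missing:
        raise KeyError(f"'Reconstructed' section is missing: {', '.join(missing)}")
    rcp = reco["frameworkrcpname"]
    if not rcp.endswith(".rcp"):
        raise ValueError(f"framework rcp name should end in .rcp: {rcp!r}")
    return reco["datasetdefname"], rcp


if __name__ == "__main__":
    example = {
        "Reconstructed": {
            "datasetdefname": "example_reproc_def",  # hypothetical placeholder
            "frameworkrcpname": "runD0recoSAM_data_reprocess_p13dst.rcp",
        }
    }
    print(reprocessing_inputs(example))
```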

Re-Processing Real Data: To Be Done
- mc_runjob v06 must be debugged and released.
- McFarm must be adapted to v06 and debugged.
- Metadata must be stabilized and accepted by SAM.
- The re-processing authority should use an MC-like Request_NNNN.py to invoke re-reco.

Conclusions
- McFarm has changed significantly since its creation to accommodate:
  - enhanced error handling
  - enhanced monitoring
  - other improvements
- Re-processing capability is in the works, despite some worries about schedule and support.
- The IACs' use and comments prompted the McFarm improvements (thank you, everyone!).
- Comments are always appreciated.