Slide 1: DOE Supplemental Proposal from the VU-RHIC Group to Expand the Use of ACCRE in PHENIX and CMS
Charles Maguire, for the VU-RHIC group
August 8, 2007, Meeting at ACCRE
Slide 2: Overview
- The VU-RHIC group should submit a supplemental proposal to DOE in August
  - A "supplemental proposal" is a request for funds in addition to regular grant-year funds
  - It is based on an expansion of the original grant's mission goals, or on extraordinary expenses
    - Extraordinary expenses related to Run7 TOF-West commissioning (Julia Velkovska)
    - Expansion of ACCRE's role in reconstructing PHENIX and future CMS real data
- It is urgent that we do this in August, before the semester begins
  - There is good evidence of new DOE funds being available for computing in FY'07
    - US-CMS-HI received an extra $230K on one day's notice in May!
    - RCF anticipates receiving extra funds for CPUs and disks next month
  - The original 3-year grant proposal ( ) had a similar computing request
    - It was received favorably by DOE, but we were asked to submit a supplemental in 2007
- Strategies
  - Document the successes, and shortcomings, of the work for PHENIX in Run7
    - Shortcomings to be overcome with new infrastructure
  - Leverage the work being done by Paul et al. in CMS-HEP
  - Emphasize that ACCRE will be a "turnkey" facility for CMS as of 2008
  - Stress the unique role of ACCRE as a facility for both PHENIX and CMS
Slide 3: Competition
- RHIC Computing Facility (RCF)
  - RCF staff and PHENIX staff at BNL view RCF as the main data reconstruction center
    - Only "new" money should go to off-site centers; no reduction in RCF budgets
  - RCF has very large, but not infinite, hardware resources
    - ~1000 CPUs for PHENIX data reconstruction (cannibalization of analysis CPUs)
    - ~400 TBytes of globally visible disk space
    - HPSS is semi-infinite, but access is restricted to ~30 MBytes/second
  - RCF staff admittedly too small to explore new technology options
  - Not yet sure how to quantify marginal costs at RCF vis-à-vis ACCRE
- CCJ (PHENIX Computing Center in Japan)
  - About half the size of RCF, and also has an HPSS
  - Very limited staff, but the facility is dedicated and aligned to PHENIX use (no CMS)
  - Intends to do Run8 work as it did in Run6 for the p+p data set
- CCF (PHENIX Computing Center in France)
  - Group of 2 persons; may be extinct in PHENIX as of 2009?
Slide 4: Short-Term Issues
- The supplemental proposal should have PHENIX approval
  - If the PHENIX computing group at BNL criticizes the proposal to DOE, the proposal will have no chance of succeeding
  - The PHENIX computing group does recognize that RCF is not enough
    - Well-documented history over the past 3 years of off-site data reconstruction
    - It will look good for DOE to be supporting an off-site resource in the U.S.
- US-CMS-HI is not ready to pass judgment on the supplemental proposal
  - It is only just beginning to research the needed resources (meeting next week)
  - Four sites are interested: MIT, UIC, Iowa, and ACCRE
  - We should pass the proposal in front of US-CMS-HI to be honest and up-front
- PHENIX is moving to SL4.4 in September; Run8 data arrive in January
  - Can we rebuild and certify the PHENIX libraries on ACCRE after September?
  - How do we address the ACCRE conversion to RHEL 5 in the Spring?
    - This is also an issue for CMS, which is at SL4 at this time
Slide 5: Three-to-Five-Year Horizon, Facts
- PHENIX
  - Run8 (2008) planned as p+p and d+Au (less data intensive)
  - Run9 (2009) planned as Au+Au and p+p (more data intensive, like Run7)
  - Run10 (2010) Au+Au energy scan, 500 GeV p+p (tentative?)
- CMS
  - p+p in 2008, Pb+Pb in 2009 (likely a slow ramp to PHENIX-size data volumes)
  - Annual increment of CPUs ( ): 105, 83, 83, 83, 83
  - Total SpecInt2K (in 10^12 units): 20, 43, 75, 103, 143
  - Requesting a $325K/year computing budget
  - No knowledge yet of data sizes (input and output)
  - Need a 10 Gbit link to the FermiLab Tier 1 center
- Operating systems possibly different between CMS and PHENIX
  - How can we adjust to this at ACCRE?
- Public-relations problems with ACCRE as perceived at PHENIX?
Slide 6: Run-7 a major success! (+ RXNP, TOF-W, MPC, HBD)
Slide 7: Near-Term Plan
- Amazingly, this may be exactly what we want…
- Run-8: d+Au and 200 GeV p+p
- Run-9: 200 GeV Au+Au, and p+p at 200 and/or 500 GeV
- Run-10: Au+Au energy scan and 500 GeV p+p
- Run-11 and after:
  - VTX fully installed
  - Ongoing major installation of muon trigger hardware
  - FVTX and NCC installation
Slide 8: PRDF File Transport, BNL -> Vanderbilt
- 11-week history of GridFTP transfers, BNL -> Vanderbilt
  - 5810 PRDF file segments transferred, containing 30.3 TBytes (~275M events)
  - Measured speeds of 35 MBytes/second for 6 simultaneous PRDF segments (a hedged transfer sketch follows this slide)
    - No effort was made to go faster
- The GridFTP receiving server at Vanderbilt has been completely stable since April 22
  - Have seen sustained 100 MBytes/second I/O on this server (Martin + me together)
  - No problems after April 22, when JFS replaced the defective Eonstore disk manager
- Future status of the Run7 PRDFs at Vanderbilt
  - Files can remain on disk until the end of July
  - Keeping files past July would cost PHENIX $900/year/TByte
  - A significant number of PRDFs are still to be reconstructed for the Central Arm
    - Some are still missing calibrations?
    - Others were left undone because there was no space for output at RCF (as of June 26)
  - A possible use of the PRDFs is to do Muon Arm reconstruction (see later slide)
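A minimal sketch of how such parallel GridFTP pulls could be scripted, assuming globus-url-copy is installed and a grid proxy already exists; the host name, paths, and file name below are placeholders, not the production values, and this is not the group's actual transfer script.

```python
#!/usr/bin/env python
"""Hedged sketch: pull several PRDF segments in parallel with globus-url-copy.

Assumptions (not from the slide): a valid grid proxy, globus-url-copy on the
path, and placeholder host/paths. SIMULTANEOUS=6 simply mirrors the six
concurrent segments reported on the slide.
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor

SOURCE_HOST = "gsiftp://gridftp.example.bnl.gov"   # placeholder, not the real RCF door
LOCAL_DIR = "file:///accre/scratch/run7/prdf/"     # placeholder destination directory
SIMULTANEOUS = 6                                   # matches the 6 simultaneous segments on the slide

def pull_segment(remote_path):
    """Copy one PRDF segment; return (path, return code)."""
    cmd = ["globus-url-copy", "-vb",               # -vb prints transfer performance
           SOURCE_HOST + remote_path, LOCAL_DIR]
    return remote_path, subprocess.call(cmd)

def pull_all(remote_paths):
    """Run up to SIMULTANEOUS transfers at once and report failures."""
    with ThreadPoolExecutor(max_workers=SIMULTANEOUS) as pool:
        for path, rc in pool.map(pull_segment, remote_paths):
            if rc != 0:
                print("transfer failed, will need a retry:", path)

if __name__ == "__main__":
    # Hypothetical file list; in production this would come from the run database.
    segments = ["/phenix/run7/EVENTDATA_P00-0000230000-0000.PRDFF"]
    pull_all(segments)
```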
Slide 9: [Workflow diagram of the Run7 data flow between RCF and the Vanderbilt farm]
Recoverable labels only: Vanderbilt farm (1600 CPUs, 80 TBytes disk; 45 TB and 200 CPUs available for Run7 reconstruction); RCF, the RHIC computing facility; reconstruction at 200 jobs/cycle, 12 hours/job, ~770 GBytes per cycle; PRDFs (raw data files) sent by GridFTP to VU at 30 MBytes/sec; internal FDT links at 45 MB/s; nanoDSTs (reconstruction output) returned to RCF by GridFTP at 23 MB/sec; dedicated GridFTP server "Firebird" ("once upon a time") with 4.4 TB of buffer disk space.
Slide 10: Reconstruction of the PRDFs
- Library build testing, April 19 to June 6 (July's full production at RCF will get off to a running start)
  - The pre-pro.75 libraries were found to be missing output, based on a test suite developed by us
    - Also memory leaks, with job sizes growing to 850 MBytes per job
  - The pro.75 library (May 17) fixed the major memory leaks but also had its own bugs
    - Memory more stable at ~650 MBytes for this Central Arm reconstruction
    - Strange 25% slowdown on the quad-CPU nodes at Vanderbilt (not understood by the CMS group)
      - A 12-CPU-hour job took 17 real hours on a quad-CPU node running 4 PHENIX jobs
      - By contrast, 2 PHENIX jobs running on a dual-CPU node were 92% efficient
  - The pro.76 library was in operation from June 6 until June 26
    - Memory size and CPU times were comparable to pro.75
    - The 25% slowdown effect on the quad-CPU nodes disappeared!
      - Quad-CPU nodes running 4 PHENIX jobs were as efficient as dual-CPU nodes running 2 PHENIX jobs
    - Plan to investigate the pro.75 quad-CPU effect after returning to Vanderbilt on July 24
- Production experience, June 6 to June 26
  - Completely automated system: ~100 scripts (in CVS) managing the complex shuttling of the files
  - Only one system-wide glitch, encountered on the weekend of June 9-10 (excellent ACCRE service!!)
    - A "brood" (collection) of compute nodes lost their IP connectivity to the outside world
    - Their loss of database access made them act as "black holes", killing almost all jobs in the submit queue (a hedged pre-flight check sketch follows this slide)
  - We were able to process files more than 2.5 times as fast as we received them (400 jobs in 31 hours)
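The "black hole" failure mode above (nodes that keep accepting jobs they can no longer run) is commonly mitigated with a pre-flight connectivity check at job start. The following is a minimal sketch of that idea only; the database host, port, and exit-code convention are assumptions, and this is not the group's actual production script.

```python
#!/usr/bin/env python
"""Hedged sketch of a pre-flight check run at the start of each batch job.

If the worker node cannot reach the (hypothetical) calibration-database host,
the job exits immediately with a distinctive code so the submit scripts can
requeue it elsewhere, instead of the node silently killing the whole queue.
"""
import socket
import sys

DB_HOST = "calibdb.example.edu"   # placeholder; the real database endpoint is not on the slide
DB_PORT = 5432                    # placeholder port
TIMEOUT_SECONDS = 10
REQUEUE_EXIT_CODE = 75            # illustrative "temporary failure" convention

def node_can_reach_database():
    """Return True if a TCP connection to the database endpoint succeeds."""
    try:
        sock = socket.create_connection((DB_HOST, DB_PORT), timeout=TIMEOUT_SECONDS)
        sock.close()
        return True
    except OSError:
        return False

if __name__ == "__main__":
    if not node_can_reach_database():
        print("pre-flight check failed: no route to %s:%d, requeue this job" % (DB_HOST, DB_PORT))
        sys.exit(REQUEUE_EXIT_CODE)
    # ...normal reconstruction job would continue here...
```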
Slide 11: The Odyssey of the Output Production
- Output production and transport basics
  - The pro.76 Central Arm macros' output size is 70% of the input size
  - Production runs are limited to batches of 200 jobs (~1.08 TBytes of input), taking ~16 total hours, allowing for late-starting jobs on a sometimes busy system
  - One output set comprises ~770 GBytes
  - Each output set is transferred to RCF NFS disks in two stages (worked timing check after this slide)
    - FDT internal transport at 47 MBytes/second to the GridFTP server, ~5 hours
    - GridFTP transport to RCF at 23 MBytes/second, ~9 hours
    - Total transport time is ~14 hours, which is (was) well matched to the CPU cycle
  - For 30 TBytes of input PRDFs we would be producing 21 TBytes of output
- Output production and transport experience (strangled by success)
  - The 21 TBytes of NFS output space at RCF was never available on "clean" disks
  - Production went to unused portions of already-occupied disks, provided those unused portions were at least 1 TByte in size (had to leave a 200 GByte buffer for other users)
  - After ~13 TBytes had been transferred, almost all the suitable unused portions were exhausted, even at the 600 GByte unused-space level corresponding to 100 jobs per CPU cycle
  - GridFTP service to suitable PHENIX production disks at RCF also collapsed in the past few days
    - Likely a multitude of users accessing the already-transported files (the whole point of the exercise!)
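The transport arithmetic above can be checked directly from the numbers quoted on the slide (770 GB per output set, 47 MB/s for the FDT hop, 23 MB/s for the GridFTP hop); a small worked check:

```python
#!/usr/bin/env python
"""Worked check of the two-stage transport times quoted on the slide."""

MB_PER_GB = 1000.0       # decimal units, as on the slide
OUTPUT_SET_GB = 770      # one output set
FDT_MB_PER_S = 47        # internal hop to the GridFTP server
GRIDFTP_MB_PER_S = 23    # GridFTP hop to RCF

def hours(size_gb, rate_mb_per_s):
    """Transfer time in hours for size_gb at rate_mb_per_s."""
    return size_gb * MB_PER_GB / rate_mb_per_s / 3600.0

fdt_hours = hours(OUTPUT_SET_GB, FDT_MB_PER_S)          # ~4.6 h, quoted as ~5 h
gridftp_hours = hours(OUTPUT_SET_GB, GRIDFTP_MB_PER_S)  # ~9.3 h, quoted as ~9 h
total = fdt_hours + gridftp_hours                       # ~13.9 h, quoted as ~14 h

print("FDT stage:     %.1f h" % fdt_hours)
print("GridFTP stage: %.1f h" % gridftp_hours)
print("Total:         %.1f h (the ~16 h CPU cycle keeps up)" % total)
```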
Slide 12: Coping with the RCF NFS Disks
- Three coping problems
  - Disk space for the entire project was never reserved in advance: on any given day or night we would not know for sure whether a specific disk had 1 TByte free
  - The automated GridFTP transport scripts would fail due to write errors at RCF
  - Even if a disk had 1 TByte free, it might be in such heavy use that the GridFTP transfer speeds were too low, at single-digit MBytes/second in some cases
- Three coping solutions (a hedged sketch of the third follows this slide)
  - Imported the output of Carla's self-updating web page for NFS disk use
    - Determined which disks had 1 TByte free just before the GridFTP to RCF was started
    - Assumed that the disk would not otherwise fill up in the following ~9 hours
  - Developed a recursive, fault-tolerant GridFTP transfer script
    - Assumed that there could be a write error to the RCF disk at any time
    - Examined the verbose output of the GridFTP log to find where the failure occurred
    - Restarted the transport script with a new, shorter list of the still-uncopied input files
  - Invented a "horse race" GridFTP script using one 1.6 GByte CNT_MB test file
    - The script determined which available RCF disk had the fastest GridFTP rate
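A minimal sketch of the "horse race" idea: time a single fixed test-file transfer to each candidate RCF disk and keep the fastest. The globus-url-copy invocation is the generic one, and the host name, test-file path, and candidate disk paths are placeholders; this is not the group's actual script.

```python
#!/usr/bin/env python
"""Hedged sketch of a 'horse race' over candidate RCF NFS disks.

Times one fixed test-file transfer to each candidate destination and returns
the destination with the best effective rate. All host names and paths are
assumptions, not the real configuration.
"""
import subprocess
import time

TEST_FILE = "file:///accre/scratch/run7/horserace/CNT_MB_testfile"  # ~1.6 GB test file (placeholder path)
TEST_FILE_MB = 1600.0
RCF_DOOR = "gsiftp://gridftp.example.bnl.gov"                       # placeholder GridFTP endpoint

def transfer_rate_mb_per_s(dest_dir):
    """Copy the test file to dest_dir at RCF and return MB/s (0.0 on failure)."""
    cmd = ["globus-url-copy", TEST_FILE, RCF_DOOR + dest_dir]
    start = time.time()
    rc = subprocess.call(cmd)
    elapsed = time.time() - start
    if rc != 0 or elapsed <= 0:
        return 0.0
    return TEST_FILE_MB / elapsed

def fastest_disk(candidate_dirs):
    """Run the horse race; return (best_dir, rate) or None if all transfers fail."""
    results = [(d, transfer_rate_mb_per_s(d)) for d in candidate_dirs]
    best = max(results, key=lambda item: item[1])
    return best if best[1] > 0.0 else None

if __name__ == "__main__":
    # Hypothetical candidate disks with >1 TB free, e.g. taken from the disk-usage web page.
    candidates = ["/phenix/data01/run7qa/", "/phenix/data07/run7qa/"]
    print(fastest_disk(candidates))
```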
Slide 13: Near-Future Vanderbilt Proposal to DOE
- We intend to write a supplemental proposal to DOE in July-August
  - The proposal will aim to have the ACCRE farm eventually process as much as 1/3 of the PHENIX data
    - If we could do (or had done) this year's share, it would save 2 months off the Run7 reconstruction time at RCF
    - The extra two months would allow for a more deliberate analysis process before QM'08
  - The proposal would also include ideas about the Vanderbilt group's future role in CMS-HI
    - The US CMS-HI groups do not have an established computer center comparable to ACCRE
    - The production infrastructure developed for PHENIX Run7 could be re-used in CMS
    - Seasonal synergy advantages may be possible: RHIC and the LHC could run out of phase
      - Power costs are highest in the summer months on Long Island, but highest in the winter months in Geneva
- The PHENIX Run7 project has identified particular weak points in the computing infrastructure
  - How can we efficiently deliver large quantities of output in a way that is readily user-accessible?
  - The same problem confronts the CMS experiment for its Tier-2 outputs
    - The Vanderbilt CMS-HEP group is taking a lead role in exploring new solutions for this problem
  - ACCRE people are also working with the VU Medical Center on a related storage problem
    - Doctors want to rapidly share sets of large output files (MRI scans, …) with a nationwide hospital audience for interactive diagnosis
- A preliminary proposal was contained in our 3-year proposal last Fall
  - That proposal was received with very positive comments; it needs to be re-worked with real numbers now
  - We were told then (continuing-resolution time) to write a supplemental proposal in mid-2007