Expanding the PHENIX Reconstruction Universe


1 Expanding the PHENIX Reconstruction Universe
C.F. Maguire, P. Sheldon, A. Tackett
Vanderbilt University
October 11, 2005, PHENIX Computing Meeting

2 Outline

- Why we must augment PHENIX reconstruction sites
- Description of the ACCRE facility
- What ACCRE can propose to PHENIX
- Missing information or infrastructure?
- How should we proceed?

3 Need to Expand PHENIX Reconstruction Universe
Run4 experience (data from Table 1 of the Run6 BUP)
- 270 TBytes of Au+Au 200 GeV data taken, corresponding to 241 μb-1, with data taking completed by June 2004 (10 TBytes of 62.4 GeV Au+Au and 35 TBytes of 200 GeV p+p data also taken)
- The last of the Run4 data reconstruction and analysis was completed only slightly before (May-June) QM'05: a long wait for all
Run6 planning
- Hope to obtain a factor of 4 increase in the Au+Au 200 GeV data size (the last Au+Au run with minimum radiation length in the central arms)
- How do we plan to reconstruct this 1 PByte data set?
- Can we have significant amounts of data in time for QM'06 (Nov. 2006?) and QM'07?
- Deliberately provocative statement to us all from the spokesperson: "Reconstruction time is unsolved and unmanageable at this point."
- One solution: expand the universe of PHENIX reconstruction facilities, building on what we learn from similar efforts in Run5
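The 1 PByte figure is simple scaling of the Run4 volume; a back-of-envelope sketch (decimal TBytes assumed):

```python
# Run6 planning scale-up from the Run4 Au+Au data volume.
run4_au_au = 270   # TBytes of Au+Au 200 GeV recorded in Run4
scale = 4          # hoped-for Run6 increase in data size

run6_estimate = run4_au_au * scale   # 1080 TBytes, i.e. roughly 1 PByte
```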

4 Off-Site Reconstruction in Run5 (as quoted in Run6 BUP)
Level 2 triggered data reconstructed at ORNL
- Impressive showing of J/Psi Cu+Cu results at QM'05
- Excellent near-real-time feedback on the quality of the J/Psi data during the run itself
- ORNL wants to expand this capability for future Runs
Run5 polarized p+p data to CC-J
- Well-publicized 60-day continuous transfer of data from the counting house buffer boxes to the RIKEN computer center in Japan
- Highlighted at last month's JPS/DNP meeting in Maui; also a main article in the CERN Courier newsletter this summer
- 270 TBytes of data were transferred, corresponding to a sustained rate of 60 MBytes/second (special network topology)
- Data stored in HPSS at CC-J, to be reconstructed later for analysis presentations during the October 2005 PANIC meeting
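As a consistency check of the CC-J numbers (a sketch, assuming decimal TBytes and MBytes): 270 TBytes at a sustained 60 MBytes/second occupies about 52 days of actual transfer, which fits comfortably inside the 60-day continuous campaign.

```python
# Back-of-envelope check of the Run5 p+p transfer to CC-J.
TB = 1e12   # bytes (decimal units assumed)
MB = 1e6    # bytes

data_moved = 270 * TB          # total data transferred to CC-J
rate = 60 * MB                 # quoted sustained rate, bytes/second

active_days = data_moved / rate / 86400   # days of transfer at full rate
duty = active_days / 60                   # fraction of the 60-day window used

print(f"transfer days at 60 MB/s: {active_days:.1f}")  # ~52 days
print(f"implied duty factor:      {duty:.0%}")         # ~87%
```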

5 What is ACCRE at Vanderbilt?
Advanced Computing Center for Research and Education
- Collaborative $8.5M computing resource funded by Vanderbilt
- Presently consists of over 1500 processors and 50 TB of disk (the VU group has its own dedicated 4.5 TB for PHENIX simulations)
- Much work by Medical Center and Engineering School researchers, as well as by Physics Department groups
- ACCRE is eager to get into physics experiment reconstruction: first PHENIX, then CMS
Previous PHENIX use of ACCRE
- First used extensively to support the QM'02 simulations
- An order of magnitude more work during the QM'05 simulations
- Even so, the QM'05 simulation effort came nowhere close to tapping ACCRE's full potential for PHENIX
- Discovered that the major roadblock to expanded use was the need for an order of magnitude increase in the sustained, reliable I/O rate back to BNL

6 What ACCRE Can Propose (subject to actual benchmarking on ACCRE CPUs)
Assume the PHENIX Run6 BUP scenario
- Begin with 13 weeks of Au+Au at 200 GeV, with a goal of 1 nb-1
- Data will be a mix of triggered and minimum bias events
- Assume that 1 PByte will eventually be generated; this corresponds to 127 MBytes/second (!) averaged over the 13-week period (can the DAQ really do this?)
ACCRE proposes to process 15% of these data (150 TBytes)
- Corresponds to 19 MBytes/second sustained transfer to ACCRE, 1/3 the rate achieved from the BNL counting house to CC-J
- Data would be reconstructed in near-real time at ACCRE, since no large archival system is available at Vanderbilt
- 10K minimum bias Run4 events were reconstructed in 7 CPU-hours (Carla Vale); Run4's 270 TBytes = 1.3 billion events -> 720 million events net to ACCRE
- Steady state requires 230 CPUs running continuously for the 13 weeks to reconstruct these 720 million events (= 500K CPU-hours total for 150 TBytes)
- A realistic duty (safety) factor of 0.7 means 330 CPUs should be available
Reconstructed output must be returned immediately to BNL
- Assume reconstructed output data size = 25% of input data size (?)
- This would require 5 MBytes/second sustained on the return trip to BNL
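The chain of estimates above can be sketched end to end (decimal units; every input is one of the slide's stated assumptions, not a measurement):

```python
# Capacity arithmetic for the ACCRE proposal, following the slide's assumptions.
TB, MB = 1e12, 1e6
WEEK = 7 * 86400   # seconds

run6_data = 1000 * TB                       # assumed 1 PByte Run6 data set
daq_rate = run6_data / (13 * WEEK) / MB     # average DAQ output -> ~127 MB/s

accre_share = 150 * TB                      # 15% processed at ACCRE
in_rate = accre_share / (13 * WEEK) / MB    # sustained inbound -> ~19 MB/s

# Event size scaled from Run4: 270 TBytes held 1.3 billion events.
bytes_per_event = 270 * TB / 1.3e9
events = accre_share / bytes_per_event      # -> ~720 million events

# Run4 benchmark: 10K min-bias events reconstructed in 7 CPU-hours.
cpu_hours = events * 7 / 10_000             # -> ~500K CPU-hours
cpus = cpu_hours / (13 * 7 * 24)            # sustained over 13 weeks -> ~230
cpus_needed = cpus / 0.7                    # duty factor 0.7 -> ~330 CPUs

out_rate = 0.25 * in_rate                   # output = 25% of input -> ~5 MB/s
```

Each intermediate value lands on the figure quoted in the slide, so the proposal's numbers are internally consistent given the Run4 benchmark.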

7 Missing Information and Infrastructure
Missing information
- What will RHIC be running in Run6, and when?
- Does it make more sense to reconstruct the Level2-triggered events instead of the minimum bias events? This is what ORNL did for Run5, with many fewer CPUs
- What are the event reconstruction times on ACCRE CPUs?
Missing infrastructure?
- We must transfer the data while it is still on the buffer boxes
- Can the special network topology created for the Run5 p+p data transfer to CC-J be expanded to accommodate transfers to ACCRE? Can the buffer boxes handle the additional I/O load?
- The QM'05 simulations used the BBFTP tool to reach RCF, but this was too slow; we want to start with gridFTP on ACCRE (still to be demonstrated)
- How much additional disk space do we need at ACCRE? At 25 MBytes/second, 30 TBytes corresponds to a two-week buffer
- What about newer alternatives to gridFTP, e.g. IBP depots?
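The buffer sizing is a one-line estimate (decimal units; the 25 MBytes/second rate is the slide's assumption for this sizing, not the 19 MBytes/second steady-state figure):

```python
# Disk buffer sizing at ACCRE: how long does 30 TBytes last at 25 MB/s?
TB, MB = 1e12, 1e6

buffer_size = 30 * TB
drain_rate = 25 * MB   # bytes/second, the slide's assumed throughput

days = buffer_size / drain_rate / 86400   # ~14 days, i.e. a two-week cushion
```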

8 How Should We Proceed?

Coordination needed within PHENIX and with RCF
- The Run5 remote sites will want to continue their efforts in Run6
- Coordination is needed between the sites to share the available bandwidth; there was obvious BBFTP competition between CC-J and VU over the summer
- What new infrastructure is needed at BNL to support this effort? Will transfer of the reconstructed output into HPSS become an issue?
A proposal will be made to DOE to support this effort
- This work should not become a net cost to ACCRE
- DOE gets the benefit of 15% faster turnaround in the analysis
- The ~330 CPUs are available for sure in Run6, but how do we ensure that another VU group doesn't budget for them in the future?

