Expanding the PHENIX Reconstruction Universe

Expanding the PHENIX Reconstruction Universe
C.F. Maguire, P. Sheldon, A. Tackett
Vanderbilt University
PHENIX Computing Meeting, October 11, 2005

Outline
- Why we must augment PHENIX reconstruction sites
- Description of the ACCRE facility
- What ACCRE can propose to PHENIX
- Missing information or infrastructure?
- How should we proceed?

Need to Expand the PHENIX Reconstruction Universe
- Run4 experience (data from Table 1 of the Run6 BUP)
  - 270 TBytes of Au+Au 200 GeV data taken, corresponding to 241 μb^-1, with data taking completed by June 2004 (10 TBytes of 62.4 GeV Au+Au and 35 TBytes of p+p 200 GeV data also taken)
  - The last of the Run4 data reconstruction and analysis was completed only slightly before QM2005 (May-June), a long wait for all
- Run6 planning
  - Hope to obtain a factor of 4 increase in the Au+Au 200 GeV data size (the last Au+Au run with minimum radiation length in the central arms)
  - How do we plan to reconstruct this 1 PByte data set?
  - Can we have significant amounts of data in time for QM'06 (Nov. 2006?) and QM'07?
- A deliberately provocative statement to us all from the spokesperson: "Reconstruction time is unsolved and unmanageable at this point."
- One solution: expand the universe of PHENIX reconstruction facilities, building on what we learn from similar efforts in Run5

Off-Site Reconstruction in Run5 (as quoted in the Run6 BUP)
- Level 2 triggered data reconstructed at ORNL
  - Impressive showing of J/Psi Cu+Cu results at QM'05
  - Excellent near-real-time feedback on the quality of the J/Psi data during the run itself
  - ORNL wants to expand this capability for future Runs
- Run5 polarized p+p data to CC-J
  - Well publicized 60-day continuous transfer of data from the counting house buffer boxes to the RIKEN computing center in Japan
  - Highlighted at last month's JPS/DNP meeting in Maui; also a main article in the CERN Courier newsletter this summer
  - 270 TBytes of data were transferred, corresponding to a sustained rate of 60 MBytes/second (special network topology)
  - Data stored in HPSS at CC-J, to be reconstructed later for analysis presentations at the October 2005 PANIC meeting

What is ACCRE at Vanderbilt?
- Advanced Computing Center for Research and Education
  - Collaborative $8.5M computing resource funded by Vanderbilt
  - Presently consists of over 1500 processors and 50 TB of disk (the VU group has its own dedicated 4.5 TB for PHENIX simulations)
  - Much work by Medical Center and Engineering School researchers as well as by Physics Department groups
  - ACCRE is eager to get into physics experiment reconstruction, first PHENIX and then CMS
- Previous PHENIX use of ACCRE
  - First used extensively for supporting the QM'02 simulations
  - Order-of-magnitude increase in work during the QM'05 simulations
  - The QM'05 simulation effort hardly came close to tapping ACCRE's full potential for PHENIX
  - Discovered that the major roadblock to expanding use was the need for an order-of-magnitude increase in the sustained, reliable I/O rate back to BNL

What ACCRE Can Propose (subject to actual benchmarking on ACCRE CPUs)
- Assume the PHENIX Run6 BUP scenario
  - Begin with 13 weeks of Au+Au at 200 GeV, goal of 1 nb^-1
  - Data will be a mix of triggered and min bias events
  - Assume that 1 PByte will eventually be generated, corresponding to 127 MBytes/second (!) over the 13-week period (can the DAQ really do this?)
- ACCRE proposes to process 15% of these data (150 TBytes)
  - Corresponds to 19 MBytes/second sustained transfer to ACCRE
  - This is 1/3 of the rate achieved from the BNL counting house to CC-J
  - Data would be reconstructed in near-real time at ACCRE, since no large archival system is available at Vanderbilt
- CPU requirements (a back-of-the-envelope check follows below)
  - 10K min bias Run4 events were reconstructed in 7 CPU-hours (Carla Vale e-mail)
  - Run4 270 TBytes = 1.3 billion events -> 720 million events net to ACCRE
  - Steady state requires 230 CPUs running continuously for the 13 weeks to reconstruct these 720 million events (= 500K CPU-hours total for 150 TBytes)
  - A realistic duty (safety) factor of 0.7 means 330 CPUs should be available
- Reconstructed output must be returned immediately to BNL
  - Assume the reconstructed output data size = 25% of the input data size (?)
  - This would require 5 MBytes/second sustained on the return trip to BNL
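The following is a back-of-the-envelope sketch of the rate and CPU arithmetic quoted on this slide, assuming the Run6 BUP inputs as stated (1 PByte total over 13 weeks, a 15% ACCRE share, the Run4 event size implied by 270 TBytes for 1.3 billion events, and 7 CPU-hours per 10K events). The script and its variable names are illustrative only and are not part of any PHENIX software.

```python
# Back-of-the-envelope check of the ACCRE proposal numbers (illustrative only).
SECONDS_PER_WEEK = 7 * 24 * 3600
run_weeks   = 13
total_bytes = 1.0e15        # assumed Run6 Au+Au data volume (1 PByte)
accre_share = 0.15          # fraction of the data ACCRE would process

daq_rate   = total_bytes / (run_weeks * SECONDS_PER_WEEK)      # ~127 MB/s
accre_rate = accre_share * daq_rate                            # ~19 MB/s

# Event count scaled from Run4: 270 TBytes held ~1.3 billion min bias events.
bytes_per_event = 270.0e12 / 1.3e9                             # ~208 kB/event
accre_events    = accre_share * total_bytes / bytes_per_event  # ~720 million

# Reconstruction cost from the Run4 benchmark: 10K events in 7 CPU-hours.
cpu_hours     = accre_events * 7.0 / 1.0e4                     # ~500K CPU-hours
steady_cpus   = cpu_hours / (run_weeks * 7 * 24)               # ~230 CPUs
required_cpus = steady_cpus / 0.7                              # ~330 with duty factor

return_rate = 0.25 * accre_rate                                # ~5 MB/s back to BNL

print(f"DAQ rate          : {daq_rate / 1e6:6.0f} MB/s")
print(f"ACCRE input rate  : {accre_rate / 1e6:6.0f} MB/s")
print(f"Events at ACCRE   : {accre_events / 1e6:6.0f} million")
print(f"CPU-hours needed  : {cpu_hours / 1e3:6.0f} K")
print(f"Steady-state CPUs : {steady_cpus:6.0f}")
print(f"CPUs w/ 0.7 duty  : {required_cpus:6.0f}")
print(f"Return rate to BNL: {return_rate / 1e6:6.1f} MB/s")
```

Running this reproduces the figures on the slide (127 MB/s at the DAQ, 19 MB/s into ACCRE, roughly 720 million events, about 500K CPU-hours, 230 CPUs in steady state, 330 CPUs with the 0.7 duty factor, and about 5 MB/s on the return path).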

Missing Information and Infrastructure
- Missing information
  - What will RHIC be running in Run6, and when?
  - Does it make more sense to reconstruct the Level2 triggered events instead of the min bias events? This is what ORNL did for Run5, with many fewer CPUs
  - What are the event reconstruction times on ACCRE CPUs?
- Missing infrastructure?
  - We must transfer the data while it is still on the buffer boxes
  - Can the special network topology created for the Run5 p+p data transfer to CC-J be expanded to accommodate transfers to ACCRE?
  - Can the buffer boxes handle the additional I/O load?
  - The QM'05 simulations used the BBFTP tool to RCF, but this was too slow
  - We want to start with gridFTP on ACCRE (must still be demonstrated)
  - How much additional disk space do we need at ACCRE? At 25 MBytes/second, 30 TBytes corresponds to a two-week buffer (see the sketch below)
  - What about newer alternatives to gridFTP, e.g. IBP depots?
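A minimal sketch of the disk-buffer sizing behind the 30 TByte figure, assuming the staging area must absorb roughly two weeks of the combined I/O estimated on the previous slide (about 19 MB/s of raw data in plus about 5 MB/s of output awaiting return, rounded to 25 MB/s); the numbers are illustrative, not a measured requirement.

```python
# Disk buffer sizing at ACCRE (illustrative): hold ~2 weeks of traffic at ~25 MB/s
# (~19 MB/s raw data in from BNL plus ~5 MB/s reconstructed output awaiting return).
io_rate_bytes_per_s = 25e6
buffer_weeks        = 2
buffer_bytes        = io_rate_bytes_per_s * buffer_weeks * 7 * 24 * 3600
print(f"Required staging disk: {buffer_bytes / 1e12:.0f} TB")   # ~30 TB
```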

How Should We Proceed?
- Coordination needed within PHENIX and with RCF
  - The Run5 remote sites will want to continue their efforts in Run6
  - Coordination is needed between the sites to share the available bandwidth
  - There was obvious BBFTP competition between CC-J and VU over the summer
  - What new infrastructure is needed at BNL to support this effort?
  - Will transfer of the reconstructed output into HPSS become an issue?
- A proposal will be made to DOE to support this effort
  - This work should not become a net cost to ACCRE
  - DOE gets the benefit of a 15% faster turnaround in the analysis
  - The ~330 CPUs are available for sure in Run6, but how do we ensure that another VU group doesn't budget for them in the future?