Nuclear Physics Data Management Needs Bruce G. Gibbard


Nuclear Physics Data Management Needs Bruce G. Gibbard SLAC DMW2004 Workshop 16-18 March 2004

Overview
- Addressing a class of Nuclear Physics (NP) experiments that use large particle detector systems to study accelerator-produced reactions
  - Examples at BNL (RHIC), JLab, and CERN (LHC)
  - Technologies and data management needs of this branch of NP are quite similar to those of HEP
- Integrating across its four experiments, the Relativistic Heavy Ion Collider (RHIC) at BNL is currently the most prolific producer of data
  - Study of very high energy collisions of heavy ions (up to Au on Au)
  - High nucleon count and high energy => high multiplicity
  - High multiplicity, high luminosity, and fine detector granularity => very high data rates
  - Raw data recording at up to ~250 MBytes/sec
[Image: A Toroidal LHC ApparatuS (ATLAS)]

[Image: Digitized event in STAR at RHIC]

IT Activities of Such NP Experiments
- Support the basic computing infrastructure of the experimental collaboration
  - Typically large (hundreds of physicists) and internationally distributed
  - Manage & distribute code, design, cost, & schedule databases
  - Facilitate communication, documentation, and decision making
- Store, process, support analysis of, and serve data (the data-intensive activities; see the sketch below):
  - Online recording of Raw data
  - Generation and recording of Simulated data
  - Construction of Summary data from Raw and Simulated data
  - Iterative generation of Distilled Data Subsets from Summary data
  - Serving of Distilled Data Subsets and analysis capability to widely distributed individual physicists
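To make the tier structure above concrete, here is a minimal sketch of the flow from Raw and Simulated data through Summary production to distilled subsets. The class names, size figures, and reduction factors are invented for illustration only and are not taken from the talk.

```python
# A minimal sketch (invented naming and numbers) of the data tiers listed above.
from dataclasses import dataclass

@dataclass
class Dataset:
    tier: str        # "raw", "simulated", "summary", or "distilled"
    run: str
    size_tb: float

def build_summary(inputs):
    """Stand-in for reconstruction: Summary data built from Raw and Simulated data."""
    assert all(d.tier in ("raw", "simulated") for d in inputs)
    # The 0.5 size-reduction factor is purely illustrative.
    return Dataset("summary", inputs[0].run, 0.5 * sum(d.size_tb for d in inputs))

def distill(summary, selection_fraction):
    """Stand-in for an analysis-specific skim of the Summary data."""
    return Dataset("distilled", summary.run, summary.size_tb * selection_fraction)

raw = Dataset("raw", "run04", size_tb=100.0)        # hypothetical size
sim = Dataset("simulated", "run04", size_tb=20.0)   # hypothetical size
summary = build_summary([raw, sim])
skim = distill(summary, selection_fraction=0.05)    # hypothetical fraction
print(summary)
print(skim)
```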

Data Handling Limited

Data Volumes in the Current RHIC Run
- Raw Data (PHENIX)
  - Peak rates up to 120 MBytes/sec
  - First 2 months of '04 (Jan & Feb): 10^9 events, 160 TBytes
  - Projected ~225 TBytes of Raw data for the current run
- Derived Data (PHENIX)
  - Construction of Summary Data from Raw Data, then production of distilled subsets from that Summary Data
  - Projected ~270 TBytes of Derived data
- Total (all of RHIC) = 1.2 PBytes for the current run
  - STAR ≈ PHENIX; BRAHMS + PHOBOS ≈ 40% of PHENIX
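As a quick cross-check of the ~1.2 PByte total quoted above, the per-experiment relations on the slide can simply be added up; this is only my arithmetic restating the slide's numbers.

```python
# Cross-check of the ~1.2 PB total from the per-experiment figures on the slide.
phenix_raw_tb = 225.0
phenix_derived_tb = 270.0
phenix_tb = phenix_raw_tb + phenix_derived_tb        # ~495 TB

star_tb = phenix_tb                                  # STAR ≈ PHENIX
brahms_phobos_tb = 0.40 * phenix_tb                  # BRAHMS + PHOBOS ≈ 40% of PHENIX

total_tb = phenix_tb + star_tb + brahms_phobos_tb
print(f"RHIC total for the run: {total_tb:.0f} TB ≈ {total_tb / 1000:.1f} PB")
# -> about 1190 TB, i.e. ~1.2 PB, consistent with the slide.
```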

RHIC Raw Data Recording Rate
[Plot: recording rates, annotated at 120 MBytes/sec for PHENIX and 120 MBytes/sec for STAR]

Current RHIC Technology
- Tertiary Storage: StorageTek / HPSS
  - 4 silos, 4.5 PBytes capacity (1.5 PBytes currently filled)
  - 1000 MBytes/sec theoretical native I/O bandwidth
- Online Storage
  - Central NFS-served disk: ~170 TBytes of FibreChannel-connected RAID 5, ~1200 MBytes/sec, served by 32 Sun SMPs
  - Distributed disk: ~300 TBytes of SCSI/IDE, locally mounted on Intel/Linux farm nodes
- Compute
  - ~1300 dual-processor Red Hat Linux / Intel nodes
  - ~2600 CPUs => ~1,400 kSPECint2K (3-4 TFLOPS)
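A few per-unit figures implied by the inventory above; this is only arithmetic on the numbers quoted, and the derived values themselves are not stated on the slide.

```python
# Per-unit figures derived from the quoted aggregates (my arithmetic only).
cpus = 2600
total_kspecint2k = 1400.0
print(f"~{total_kspecint2k * 1000 / cpus:.0f} SPECint2K per CPU")      # ~540

nfs_servers = 32
nfs_bandwidth_mb_s = 1200.0
print(f"~{nfs_bandwidth_mb_s / nfs_servers:.1f} MB/s served per SMP")  # ~37.5

silos = 4
tape_capacity_pb = 4.5
print(f"~{tape_capacity_pb / silos:.2f} PB per silo")                  # ~1.1
```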

Projected Growth in Capacity Scale
- Moore's Law effect of component replacement in experiment DAQs and in computing facilities => ~x6 increase in 5 years
- Not yet fully specified requirements of the RHIC II and eRHIC upgrades are likely to accelerate growth
[Chart: Disk Volume at RHIC]
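As a rough cross-check, not taken from the slides: a factor-of-six increase over five years corresponds to a capacity doubling time of about two years, consistent with a Moore's-Law-like replacement cycle.

```python
# Relating the ~x6-in-5-years figure to an implied doubling time (my arithmetic).
import math

years = 5.0
growth_factor = 6.0
doubling_time = years * math.log(2) / math.log(growth_factor)
print(f"implied doubling time: {doubling_time:.1f} years")   # ~1.9 years

# Equivalently, a ~2-year doubling time over 5 years gives roughly x6:
print(f"growth over 5 years  : x{2 ** (years / 2.0):.1f}")   # ~x5.7
```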

NP Analysis Limitations (1)
- Underlying the data management issue:
  - Events (interactions) of interest are rare relative to minimum-bias events
  - Threshold / phase-space effect for each new energy domain
  - Combinatorics of large-multiplicity events of all kinds confound selection of interesting events
  - Combinatorics also create backgrounds to signals of interest
- Two analysis approaches (a toy statistical-subtraction sketch follows this list):
  - Topological, typically with
    - Many qualitative and/or quantitative constraints on the data sample
    - Relatively low background to signal
    - A modest number of events in the final analysis data sample
  - Statistical, frequently with
    - A more poorly constrained sample
    - Large background (the signal is a small difference between large numbers)
    - A large number of events in the final analysis data sample
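The point that the statistical approach extracts a signal as a small difference between large numbers can be illustrated with a toy event-mixing subtraction, a standard heavy-ion technique. This sketch and all of its numbers are invented for illustration and are not part of the talk.

```python
# Toy illustration of a "statistical" analysis: a small genuine signal sits on a
# large combinatorial background, which is estimated by pairing tracks from
# different events ("event mixing") and then subtracted.
import random

random.seed(42)

N_EVENTS = 2000
TRACKS_PER_EVENT = 50
SIGNAL_PAIRS_PER_EVENT = 1        # rare genuinely correlated pairs per event
MIX_DEPTH = 10                    # how many other events each event is mixed with

def make_event():
    """Toy event: uncorrelated 'tracks' plus a few correlated pair observables near 3.1."""
    tracks = [random.uniform(0.0, 10.0) for _ in range(TRACKS_PER_EVENT)]
    signal = [3.1 + random.gauss(0.0, 0.05) for _ in range(SIGNAL_PAIRS_PER_EVENT)]
    return tracks, signal

def pair_observable(a, b):
    """Stand-in for e.g. an invariant mass built from two tracks."""
    return 0.5 * (a + b)

events = [make_event() for _ in range(N_EVENTS)]

# Same-event sample: all combinatorial pairs plus the genuine correlated pairs.
same = []
for tracks, signal in events:
    same.extend(pair_observable(a, b)
                for i, a in enumerate(tracks) for b in tracks[i + 1:])
    same.extend(signal)

# Mixed-event sample: pairs built across different events reproduce the
# combinatorial shape but contain no genuine correlations.
mixed = []
for i, (t1, _) in enumerate(events):
    for j in range(1, MIX_DEPTH + 1):
        t2 = events[(i + j) % N_EVENTS][0]
        mixed.extend(pair_observable(a, b) for a, b in zip(t1, t2))

def count_in_window(values, lo=3.0, hi=3.2):
    return sum(lo <= v < hi for v in values)

n_same = count_in_window(same)
n_bkg = count_in_window(mixed) * len(same) / len(mixed)   # normalize mixed to same-event pair count

print(f"same-event pairs in signal window : {n_same}")
print(f"scaled mixed-event background     : {n_bkg:.0f}")
print(f"extracted signal (true ~{N_EVENTS * SIGNAL_PAIRS_PER_EVENT}): {n_same - n_bkg:.0f}")
```

Both counts in the window are tens of thousands while the extracted signal is of order two thousand, which is exactly the "small difference between large numbers" situation and why such analyses need very large final data samples.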

NP Analysis Limitations (2)
- It appears to be less frequently possible to do topological analyses in NP than in HEP, so statistical analyses are more often required
  - Evidence for this is rather anecdotal; not everyone would agree
- To the extent that it is true, final analysis data sets tend to be large
  - These are the data sets accessed very frequently by large numbers of users, thus exacerbating the data management problem
- In any case, the extraction and delivery of distilled data subsets to physicists for analysis is currently the principal limit on NP analyses

Grid / Data Management Issues
- Major RHIC experiments are moving (or have moved) complete copies of Summary Data to regional analysis centers
  - STAR: to LBNL via Grid tools
  - PHENIX: to RIKEN via tape / air freight
  - Evolution is toward more sites and full dependence on the Grid (a transfer sketch follows below)
- RHIC, JLab, and NP at the LHC are all very interested and active in Grid development
  - Including high-performance, reliable wide-area data movement / replication / access services
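As an illustration of the kind of wide-area replication involved, here is a minimal sketch that pushes a hypothetical list of summary files to a regional center with the GridFTP client globus-url-copy, invoked only in its basic source/destination form. The host names, paths, and surrounding logic are placeholders, not the experiments' actual machinery.

```python
# Minimal replication sketch (hypothetical hosts, paths, and file list).
import subprocess

SOURCE_HOST = "gridftp.example-tier0.bnl.gov"       # hypothetical
DEST_HOST = "gridftp.example-regional-center.org"   # hypothetical

files_to_replicate = [
    "/summary/run04/file0001.root",   # hypothetical catalog entries
    "/summary/run04/file0002.root",
]

for path in files_to_replicate:
    src = f"gsiftp://{SOURCE_HOST}{path}"
    dst = f"gsiftp://{DEST_HOST}{path}"
    # Retry logic, checksumming, and catalog registration that a production
    # system would need are omitted from this sketch.
    result = subprocess.run(["globus-url-copy", src, dst])
    if result.returncode != 0:
        print(f"transfer failed for {path}")
```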

Conclusions
- NP and HEP accelerator/detector experiments have very similar data management requirements
- NP analyses of this type currently tend to be more data-limited than CPU-limited
- "Mining" of Summary Data, and affording end users adequate access (both local and wide-area) to the resulting distillate, currently most limits NP analysis
- This is expected to remain the case for the next 4-6 years, through
  - Upgrades of RHIC and JLab
  - Start-up of the LHC
  with wide-area access growing in importance relative to local access