100 Million events, what does this mean ?? STAR Grid Program overview.

Current and projected
Always nice to plan for lots of events: it opens new physics topics and detailed studies of rare particles (flow of multi-strange particles, etc.). Lots of numbers to look at in the next slides …

Au+Au 200 GeV projections 1: Au+Au 200 Production Central

                                     Min              Max              Expected Average   DAQ100 (25%)
1 event                              85 sec           115 sec          100 sec            75 sec
1 M events                           24 k CPU hours   32 k CPU hours   17 k CPU hours     13 k CPU hours
Full RCF farm (150 nodes, 2 slots)   80 CPU hours     100 CPU hours    90 CPU hours       67 CPU hours
  (in days)                          3.3 days         4.4 days         3.75 days          2.8 days
Extrapolation to 100 Million         327 days         444 days         375 days           281 days
80% efficiency                                        555 days         470 days           352 days
1.2 passes                                                             564 days           422 days
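The arithmetic behind this table (and the minimum-bias one two slides down) is simple enough to check by hand. The sketch below is not from the original slides: it assumes the 300 concurrent slots quoted in the table (150 nodes, 2 slots each) and round-the-clock running, so its rounding will not match every cell exactly.

```python
# Back-of-the-envelope reproduction of the projection arithmetic.
# Assumes 300 slots (150 nodes x 2) running 24 h/day; illustrative only.

SLOTS = 150 * 2           # full RCF farm
TARGET_EVENTS = 100e6     # 100 million events

def projection(sec_per_event, efficiency=0.80, passes=1.2):
    cpu_hours_per_m = 1e6 * sec_per_event / 3600.0       # CPU hours per 1 M events
    farm_hours_per_m = cpu_hours_per_m / SLOTS            # wall-clock hours, whole farm
    days = farm_hours_per_m * TARGET_EVENTS / 1e6 / 24    # calendar days for 100 M events
    days_eff = days / efficiency                          # 80% farm efficiency
    days_total = days_eff * passes                        # 1.2 reconstruction passes
    return cpu_hours_per_m, farm_hours_per_m, days, days_eff, days_total

for label, sec in [("central expected", 100), ("central DAQ100", 75),
                   ("minbias expected", 45), ("minbias DAQ100", 34)]:
    cpu_h, farm_h, d, d_eff, d_tot = projection(sec)
    print(f"{label:16s}: {cpu_h/1e3:4.0f} k CPU h / M events, "
          f"{farm_h:5.1f} farm h / M events, {d:3.0f} d, "
          f"{d_eff:3.0f} d @ 80%, {d_tot:3.0f} d with 1.2 passes")
```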

Pause
That's right, a year IS 365 days!!! We are now speaking of moving to a year-based production regime … Gotta be better for minimum bias, right??

Au+Au 200 GeV projections 2: Au+Au 200 Minimum Bias

                                     Min              Max              Expected Average   DAQ100 (25%)
1 event                              32 sec           50 sec           45 sec             34 sec
1 M events                           9 k CPU hours    14 k CPU hours   12 k CPU hours     9.5 k CPU hours
Full RCF farm (150 nodes, 2 slots)   30 CPU hours     46 CPU hours     42 CPU hours       32 CPU hours
  (in days)                          1.3 days         2 days           1.8 days           1.4 days
Extrapolation to 100 Million         124 days         193 days         174 days           131 days
80% efficiency                       155 days         242 days         217 days           164 days
1.2 passes                                                             261 days           200 days

Useful exercise

                                     50 M central              175 M minbias                    Total if requested
No DAQ100 processing time            282 days (~9 ½ months)    456 days (1 year and 3 months)   738 days (~2 years)
DAQ100 processing time               211 days (~7 months)      350 days (~1 year)               561 days (~1 ½ years)
Total storage, size of event.root    105 TB                    129 TB                           234 TB
Total storage, size of MuDst.root    ~18 TB                    ~18 TB (factor 7 used)           ~36 TB
Number of files estimated (MuDst)    ½ a million files         1 Million                        1 ½ Million
Total estimation (data management)   2 ½ Million files         5 Million files                  7 ½ Million

Number of files estimated using the current number of events per file. DAQ100 implies a reduction by ~5.
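The storage and file-count rows follow from per-event sizes and events-per-file values that can be back-derived from the table itself. The sketch below is illustrative only: the per-event sizes, MuDst reduction factors, and events-per-file figures are reverse-engineered from the totals above, not official STAR numbers.

```python
# Back-of-the-envelope reproduction of the storage / file-count exercise.
# All per-event and per-file constants are back-derived from the table above.

TB = 1e6  # MB per TB (decimal, good enough for a rough estimate)

datasets = {
    # name: (events, event.root MB/event, MuDst reduction factor, events per MuDst file)
    "50 M central":  (50e6,  2.100, 6, 100),
    "175 M minbias": (175e6, 0.737, 7, 175),
}

total_files = 0
for name, (n_ev, mb_per_ev, mudst_factor, ev_per_file) in datasets.items():
    event_root_tb = n_ev * mb_per_ev / TB
    mudst_tb = event_root_tb / mudst_factor
    mudst_files = n_ev / ev_per_file
    all_files = mudst_files * 5      # the slide's data-management total is ~5x the MuDst count
    total_files += all_files
    print(f"{name}: event.root ~{event_root_tb:.0f} TB, MuDst ~{mudst_tb:.0f} TB, "
          f"~{mudst_files/1e6:.1f} M MuDst files, ~{all_files/1e6:.1f} M files total")

print(f"Grand total: ~{total_files/1e6:.1f} M files to manage")
```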

Immediate remarks
7 Million files!!?? A real Data Management problem:
- Resilient ROOT IO
- Cannot proliferate more "kinds" of files
- Good luck with private formats …
- The catalog had better be scalable (and efficient): find a needle in a haystack …
Processing time and data sample are very large:
- Need to offload user analysis (running where we can). Data production is not ready for multi-site …
- Code consolidation is necessary (yet another reason for cleaning)
- MuDst transfer alone from BNL to PDSF (at 3 MB/sec) would take 145 days …
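That last number is easy to verify: taking the ~36 TB MuDst total from the previous slide (binary terabytes assumed) over a sustained 3 MB/sec link gives essentially the same answer.

```python
# 36 TB of MuDst over a sustained 3 MB/s BNL -> PDSF link (1 TB = 1024^2 MB)
mudst_mb = 36 * 1024**2
rate_mb_s = 3
print(f"{mudst_mb / rate_mb_s / 86400:.0f} days")   # ~146 days, i.e. the slide's ~145
```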

What can we do??
Several ways to reduce CPU cycles, the usual suspects:
- Code optimization (has its limits / hot spots)
- Try ICC??
- Better use of resources
- Offload user analysis (expands the farm available for production) [smells like grid already]
- Bring in more resources / facilities
- Any other ideas??
Data taking & analysis side:
- Reduce the number of events: trigger
- Smart selection (selected stream, Thorsten)

Better use of existing facilities??
PDSF resources seem saturated; CRS/CAS load balancing is not …

More external facilities??
Investigation of resources at PSC:
- Processors there are 20% faster than a Pentium IV 2.8 GHz
- Except that there are 700 x 4 of them ALREADY there and eager to have root4star running on them
- AND if we build a good case, we can get up to 15% of that total (NSF grant): that is 50% more CPU power compared to 100% of CRS+CAS+PDSF
Network? 30 MB/sec (TBC), and part of the TeraGrid.
From "worth a closer look" in February, I say "GOTTA TRY".
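A rough sizing of that claim, for orientation only: the PSC processor count, share, and speed factor come from the bullets above, while the local CRS+CAS+PDSF capacity is back-derived from the "50% more" statement rather than quoted anywhere in the slides.

```python
# Rough sizing of the PSC opportunity. The local slot count is *back-derived*
# from the "50% more" claim, not taken from the slides.

psc_cpus = 700 * 4        # processors available at PSC
share = 0.15              # fraction obtainable with a good NSF case
speed_factor = 1.2        # ~20% faster than a Pentium IV 2.8 GHz

psc_equiv = psc_cpus * share * speed_factor       # P4-2.8GHz-equivalent slots
print(f"PSC share: ~{psc_equiv:.0f} equivalent slots")

# If that is ~50% of existing capacity, the combined CRS+CAS+PDSF pool is roughly:
local_equiv = psc_equiv / 0.5
print(f"Implied local capacity: ~{local_equiv:.0f} equivalent slots")
```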

Distributed Computing
For large amounts of data, intense data mining, etc., distributed computing may be the key.
In the U.S., three big Grid collaborations:
- iVDGL (International Virtual Data Grid Laboratory)
- GriPhyN (Grid Physics Network)
- PPDG (Particle Physics Data Grid)
PP what?? STAR has been part of PPDG since Year 1 (2 years ago): CS & experiments working together. We collaborate with SDM (SRM), U-Wisconsin (Condor), J-Lab, and even possibly PHENIX …

What do we Grid about??
Data management:
- HRM-based file transfer (Eric Hjort & the SDM group): in production since 2002, now in full production with 20% of our data transferred between BNL and NERSC (HRM BNL to/from PDSF)
Catalogue:
- FileCatalog (MetaData / Replica Catalog) development (myself)
- Site-to-site file transfer & catalog registration work (myself & Alex Sim): a Replica Registration Service, and defining the scheme needed to register files or datasets across sites
Analysis / job management:
- Resource broker, batch (Scheduler) (Gabriele Carcassi)
- Interactive analysis framework solution (Kesheng (John) Wu)
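To make the data-management thread concrete, here is a purely illustrative sketch of the transfer-plus-registration loop these tools automate. None of the function names below are the real HRM or STAR FileCatalog interfaces; they are hypothetical stand-ins. The point is simply that every file movement ends with a catalog registration, and failures are retried rather than babysat.

```python
# Purely hypothetical sketch of the transfer-and-register cycle described above.
# hrm_copy() and register_replica() are illustrative stand-ins, NOT real APIs.

def hrm_copy(src_url, dst_url):
    """Stand-in for an HRM/SRM-managed transfer; a real one stages from tape and retries."""
    print(f"copy {src_url} -> {dst_url}")
    return True

def register_replica(catalog, lfn, site, pfn):
    """Stand-in for recording the new physical copy in the replica catalog."""
    catalog.setdefault(lfn, []).append((site, pfn))

def replicate_dataset(files, catalog, max_retries=3):
    """Move files site-to-site and register each copy, with simple error recovery,
    so nobody has to 'pet' the transfer by hand."""
    failed = []
    for lfn, src, dst in files:
        for _ in range(max_retries):
            if hrm_copy(src, dst):
                register_replica(catalog, lfn, site="PDSF", pfn=dst)
                break
        else:
            failed.append(lfn)        # left for a later pass
    return failed

# Tiny usage example with made-up file names
catalog = {}
todo = [("st_physics_0001.MuDst.root",
         "hrm://hpss.rcf.bnl.gov/star/st_physics_0001.MuDst.root",
         "hrm://dm.pdsf.lbl.gov/star/st_physics_0001.MuDst.root")]
print("failed:", replicate_dataset(todo, catalog))
```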

What do we (still) Grid about??
Monitoring:
- Ganglia & MDS publishing (Efstratios Efstathiadis)
Database:
- MySQL Grid-ification (Richard Casella & Michael DePhillips)
Projects:
- Condor / Condor-G (Miron Livny)
- JDL, Web Service project with J-Lab, the next generation of grid architecture (Chip Watson)
Much more to do … see /STAR/comp/ofl/reqmts2003/
If you are interested, we will take you …

How does it change my life??
Remote facilities (big or small):
- The file transfer and registration work allows moving data sets with error recovery (no need to "pet" the transfer)
- GridCollector does not require you to know where the files are, nor does the Scheduler (eliminates the data placement task)
- Grid-enabled clusters bring ALL resources within reach
Every day work:
- You may not like it, but … a mind-set change: collections of data (some will fit a given analysis, some not)
- Transparent interfaces and interchangeable components (long term)
- Hopefully more robust systems (error recovery is already there)
Any other reasons??
- The Grid is coming; better get ready and understand it …

Conclusion
Hard to get back to slide one, but …
- Be ready for YEAR-long production; we are at the "one order of magnitude off" level …
- With such programs, we MUST integrate other resources and help others to expand mini-farms
Grid:
- Tools already exist for data management; we must take advantage of them
- More work to do for a production Grid … but it is coming (first attempt planned for the coming year)