STAR COMPUTING
STAR MDC1 Experience and Revised Computing Requirements
Torre Wenaus, BNL
RHIC Computing Advisory Committee Meeting, BNL, December 3-4, 1998

Outline
- MDC1 in STAR
  - Goals
  - MDC1 program and results
- MDC1 outcomes and coming activity
  - Computing decisions
- Revisited computing requirements
  - Status of the 3/98 requirements report
  - Year one requirements report
- Conclusions
  - Comments on facilities
  - MDC2 objectives

MDC1 Goals
- Official throughput goal: processing of 100k events from simulation through the production chain; mix of year 1 and year 2 detector configurations. Beyond the official 100k, a physics-driven data set of 300k events
  - 100k event goal met; 260k simulated and 190k reconstructed events total to date
  - MDCs a very effective motivator and driver for subsystem software development
- Calibration database in place and in use in offline codes
  - Was in place, but little used; full deployment for MDC2
- Production principally in the STAF framework, but ROOT-based production exercised as well
  - STAF and ROOT both used; ~4% via ROOT to date (manpower limited)
- Partial, prototype C++/OO data model in place
  - Not met! No manpower. A principal MDC2 goal.
- Prototype Objectivity-based DST event store in place
  - Met; 50 GB of DSTs in Objectivity
- Grand Challenge software tested with Objectivity DSTs
  - Met; Doug will address
- DST-based physics analysis
  - Low-level activity during MDC1; has ramped up post-MDC1

MDC1 Program and Results
- Organization
- RCF support
- Simulations and data sinking
- Reconstruction
- DST generation, Objectivity-based event store
- Data mining and physics analysis
- Validation and quality assurance

Organization
- MDC1 production leader: Yuri Fisyak (BNL)
- Simulation and raw data sinking: Peter Jacobs (LBNL), Pavel Nevski (BNL), Dan Russ (CMU), T3E team
- Production operations: Lidia Didenko (BNL)
- Production chain kumacs, quality control: Kathy Turner (BNL)
- Framework (STAF) development and support: Victor Perevoztchikov (BNL), Herb Ward (UT Austin)
- Grand Challenge data mining: Doug Olson, David Zimmerman, Ari Shoshani (all LBNL)
- Objectivity event store: Torre Wenaus (BNL)
- DST analysis: A. Saulys, M. Tokarev, C. Ogilvie
- And many others!

RCF Support
- We found planning and cooperation with RCF and the other experiments before and during MDC1 to be very effective
- Thanks to the RCF staff for excellent MDC1 support around the clock, seven days a week! Particular thanks to...
  - HPSS: John Riordan
  - AFS on the CRS cluster: Tim Sailer, Alex Lenderman, Edward Nicolescu
  - CRS scripts: Tom Throwe
  - LSF support: Razvan Popescu
  - All other problems: Shigeki Misawa

Simulations and Data Sinking
- The Cray T3E at the Pittsburgh Supercomputing Center (PSC) became available to STAR last March, in addition to NERSC, for MDC-directed simulation
- Major effort since then to port simulation software to the T3E
  - CERNLIB, Geant3, GSTAR, and parts of STAF ported in a joint BNL/LBNL/PSC project
  - Production management and data transport (via network) infrastructure developed; production capability reached in August
- 90k CPU hours used at NERSC (exhausted our allocation)
  - 50k CPU hours (only!) allocated in FY99
- Production continuing at PSC, working through a ~450k hour allocation through February
  - Prospects for continued use of PSC beyond February completely unknown; dependent on DOE
- T3E data transferred to RCF via network and migrated to HPSS
  - 50 GB/day average transfer rate; 100 GB peak

Reconstruction
- CRS-based reconstruction production chain: TPC, SVT, global tracking
- Two MDC1 production reconstruction phases
  - Phase 1 (early September)
    - Production ramp-up
    - Debugging infrastructure and application codes
    - Low throughput; bounded by STAR code
  - Phase 2 (from September 25)
    - New production release fixing phase 1 problems
    - STAR code robust; bounded by HPSS and resources
    - HPSS limits: size of the HPSS disk cache (46 GB); number of concurrent ftp processes
    - CPU: from 20 CPUs in early September to 56 CPUs in late October

Production Summary
- Total on HPSS: 3.8 TB
- Geant simulation: 2.3 TB total
  - NERSC: 0.6 TB
  - PSC: 1.6 TB
  - RCF: ~0.1 TB
- DSTs
  - XDF: 913 GB in 2569 files
  - Objectivity: 50 GB
  - ROOT: 8 GB

Reconstruction Production Rate
- 3/98 requirements report: 2.5 kSi95 sec/event, equivalent to 70 kB/sec
- Average rate in MDC1: kB/sec
- Good agreement!

DST Generation
- Production DST is the STAF-standard (XDR-based) XDF format
  - Objectivity only in a secondary role, and presently incompatible with Linux-based production
- Direct mapping from IDL-defined DST data structures to Objectivity event store objects
  - Objectivity DB built using the BaBar event store software as a basis
  - Includes tag database info used to build the Grand Challenge query index
  - XDF-to-Objectivity loading performed in a post-production Solaris-based step
- 50 GB (disk space limited) Objectivity DST event store built and used by the Grand Challenge to exercise data mining

Data Mining and Physics Analysis
- Data mining in MDC1 done by the Grand Challenge team; great progress, yielding an important tool for STAR and RHIC (Doug's talk)
- We are preparing to load the full 1 TB DST data set into a second-generation Objectivity event store
  - To be used for physics analysis via the Grand Challenge software
  - Supporting hybrid non-Objectivity (ROOT) event components
  - 'Next week' for the last month! Soon...
- Little physics analysis activity in MDC1
  - Some QA evaluation of the DSTs
  - Large MDC1 effort in preparing CAS software by the event-by-event physics group
    - But they missed MDC1 when it was delayed a month; they could only be here and participate in August
  - DST evaluation and physics analysis have ramped up since MDC1 in most physics working groups and will play a major role in MDC2

Validation and Quality Assurance
- Simulation
  - Generation of evaluation plots and standard histograms for all data is an integral part of simulation production operations
- Reconstruction
  - Leak checking and table integrity checking built into the framework; effective in debugging and making the framework leak-tight (no memory growth)
  - Code acceptance and integration into the production chain performed centrally at BNL; log file checks, table row counts
  - TPC and other subsystem responsibles involved in assessing and debugging problems
  - DST-based evaluation of upstream codes and physics analysis play a central role in QA, but were limited in MDC1 (lack of manpower)
  - Dedicated effort underway for MDC2 to develop a framework for validating reconstruction codes and DSTs against reference histograms

MDC1 Outcomes and Coming Activity
- Computing decisions directing development for MDC2 and year 1
  - MDC1 followed by an intensive series of computing workshops addressing (mainly infrastructure) software design and implementation choices in light of MDC1 and MDC1-directed development work
  - Objective: stabilize infrastructure for year 1 by MDC2; direct post-MDC2 attention to physics analysis and reconstruction development
  - Many decisions ROOT-related; summarized in a ROOT decision diagram
    - Productive visit from Rene Brun
  - C++/OO decisions: DST and post-DST physics analysis will be C++ based
  - Event store: a hybrid Objectivity/ROOT event store will be employed; ROOT used for (at least) post-DST micro and nano DSTs and the tag database (see the sketch below)
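As an illustration of the ROOT side of such a hybrid store, here is a minimal sketch of a tag database written as a ROOT TTree, using only standard ROOT classes (TFile, TTree). The tag fields (run, event, ntrk, meanPt) are hypothetical and are not the actual STAR tag schema.

```cpp
// tags_sketch.C -- minimal, illustrative ROOT macro; the tag fields below are
// invented for illustration, not the actual STAR tag database schema.
#include "TFile.h"
#include "TTree.h"

struct EventTagSketch {
  Int_t   run;       // run number
  Int_t   event;     // event number
  Int_t   ntrk;      // number of primary tracks
  Float_t meanPt;    // mean transverse momentum
};

void tags_sketch()
{
  EventTagSketch tag;

  TFile f("tags.root", "RECREATE");
  TTree t("Tags", "Event tag database (sketch)");

  // One branch per tag word, using ROOT's standard leaflist syntax
  t.Branch("run",    &tag.run,    "run/I");
  t.Branch("event",  &tag.event,  "event/I");
  t.Branch("ntrk",   &tag.ntrk,   "ntrk/I");
  t.Branch("meanPt", &tag.meanPt, "meanPt/F");

  // In production these values would come from the DST; here a single dummy entry
  tag.run = 1; tag.event = 1; tag.ntrk = 0; tag.meanPt = 0.f;
  t.Fill();

  t.Write();
  f.Close();
}
```

A compact tree like this can be scanned quickly to select events before any bulky DST component is touched, which is the same role the Grand Challenge query index plays on the Objectivity side.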

Revisited Computing Requirements
- STAR Offline Computing Requirements Task Force Report
  - P. Jacobs et al., March 1998, Star Note 327, 62 pp
  - Physics motivation, data sets, reconstruction, analysis, simulation
  - Addresses the fully instrumented detector in a nominal year
- STAR Offline Computing Requirements for RHIC Year One
  - T. Ullrich et al., November 1998, Star Note xxx, 5 pp
  - MDC1 experience shows no significant deviation from SN327
    - Conclude that the numbers remain valid and the estimates remain current
  - This new report is an addendum addressing year 1
  - Detector setup: TPC, central trigger barrel, 10% of the EMC, and (probably) FTPC and RICH
  - 20 MB/sec rate to tape; central event size MB
    - No data compression during the first year

Year 1 Requirements (2)
- RHIC running
  - STAR's requirements do not depend on the luminosity profile
    - Can saturate on central events at 2% of nominal RHIC luminosity
  - Rather, they depend on the actual duty factor profile
  - 4M central Au-Au events at 100 GeV/A per beam expected from the official estimated machine performance folded with the STAR duty factor
    - 25% of a nominal year
    - Total raw data volume estimate: 60 TB
- DST production
  - MDC1 experience confirms the estimate of 2.5 kSi95 sec/event for reconstruction
  - Assuming a reprocessing factor of 1.5 (a very lean assumption for year 1!) yields 15M kSi95 sec (see the sizing sketch below)
    - 630 Si95 units average for the year, assuming a 75% STAR/RCF duty factor
    - Factoring in the machine duty factor, we can expect the bulk of the data to arrive late in the year
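A back-of-the-envelope check of the reconstruction sizing quoted above, as standalone C++; the 75% duty factor is the slide's own assumption, and a year is taken as 3.15e7 seconds.

```cpp
// Year 1 reconstruction CPU sizing: a back-of-the-envelope check of the
// figures quoted above. Inputs are the assumptions stated on the slide.
#include <cstdio>

int main()
{
  const double events       = 4.0e6;   // central Au-Au events expected in year 1
  const double kSi95PerEvt  = 2.5;     // reconstruction cost (kSi95 sec/event)
  const double reprocessing = 1.5;     // reprocessing factor
  const double dutyFactor   = 0.75;    // assumed STAR/RCF duty factor
  const double secPerYear   = 3.15e7;  // seconds in a year (rounded)

  const double totalKSi95Sec = events * kSi95PerEvt * reprocessing;        // ~15M kSi95 sec
  const double avgSi95 = totalKSi95Sec * 1.0e3 / (dutyFactor * secPerYear);

  std::printf("total CPU    : %.0fM kSi95 sec\n", totalKSi95Sec / 1.0e6);  // 15
  std::printf("year average : %.0f Si95 units\n", avgSi95);                // ~635; slide quotes 630
  // Peaked data arrival late in the year pushes the installed need to roughly
  // twice this average, i.e. the ~1200 Si95 quoted on the next slide.
  return 0;
}
```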

Year 1 Requirements (3)
- DST production (continued)
  - To arrive at a reasonable number, we simulated a range of RHIC performance profiles
    - Obtained roughly twice the average value to achieve year 1 throughput with at most a 10% backlog at year end: ~1200 Si95
  - DST data volume estimated at 10 TB
    - Data reduction factor of 6 rather than 10 (SN327); additional information retained on the DST for reconstruction evaluation and debugging, and detector performance studies
- Data mining and analysis
  - Projects able to participate with year 1 data, scaled by event count relative to the SN327 nominal year
    - 12M kSi95 sec total CPU need; 2.3 TB microDST volume
    - 480 Si95 units assuming a 75% STAR/RCF duty factor
    - Analysis usage is similarly peaked late in the year, following QA, reconstruction validation, and calibration
      - Need estimated at twice the average: ~1000 Si95 units (see the sketch below)
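The corresponding check for the data volumes and analysis CPU, in the same style as the sketch above. The 12M figure is taken as kSi95 sec, which is what reproduces the quoted ~480 Si95 average; the small residual difference is rounding in the quoted inputs.

```cpp
// Year 1 DST volume and analysis CPU: same back-of-the-envelope style as above.
#include <cstdio>

int main()
{
  const double rawTB       = 60.0;    // year 1 raw data volume (TB)
  const double reduction   = 6.0;     // raw -> DST data reduction factor
  const double anaKSi95Sec = 12.0e6;  // analysis CPU need (kSi95 sec), per the slide
  const double dutyFactor  = 0.75;    // assumed STAR/RCF duty factor
  const double secPerYear  = 3.15e7;  // seconds in a year (rounded)

  std::printf("DST volume       : %.0f TB\n", rawTB / reduction);    // 10 TB
  std::printf("analysis average : %.0f Si95 units\n",
              anaKSi95Sec * 1.0e3 / (dutyFactor * secPerYear));      // ~508; slide quotes 480
  // As for reconstruction, late-year peaking roughly doubles the installed
  // need, giving the ~1000 Si95 units quoted above.
  return 0;
}
```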

Year 1 Requirements (4)
- Simulations
  - Used to derive acceptance and efficiency corrections, and signal backgrounds due to physics or instrumental response
    - Largely independent of data sample size; no change relative to nominal year requirements foreseen
    - Data mining and analysis of simulations will peak late in year 1, with the real data
  - Simulation CPU assumed to be available offsite
  - At RCF, early-year CPU not yet needed for reconstruction/analysis can be applied to simulation reconstruction; no additional request for reconstruction of simulated data
  - 24 TB simulated data volume
- Year 1 requirements summarized in the report table, p. 5

Comments on Facilities
- All in all, the RCF facilities performed well
  - HPSS gave the most trouble, as expected, but it too worked
    - Tape drive bottleneck will be addressed by MDC2
- A robust, flexible batch system like LSF is well motivated on CAS
  - Many users and usage modes
- Need for such a system on CRS not resolved
  - Limitations encountered in the script-based system: control in prioritizing jobs, handling of different reconstruction job types and output characteristics, control over data placement and movement, job monitoring
    - Addressable through script extensions? Complementary roles for LSF and in-house scripts?
  - Attempt to evaluate LSF in reconstruction production was incomplete; limited availability in MDC1
  - Would like to properly evaluate LSF (as well as next-generation scripts) on CRS in MDC2 as a basis for an informed assessment of its value (weighing cost also, of course)
- Flexible, efficient, robust application of computing resources is important
  - Delivered CPU cycles (not merely available cycles) are the bottom line

MDC2 Objectives
- Full year 1 geometry (add RICH, measured B field, ...)
- Geometry, parameters, calibration constants from database
- Use of DAQ form of raw data
- New C++/OO TPC detector response simulation
- Reconstruction production in ROOT
- TPC clustering
- All year 1 detectors on DST (add EMC, FTPC, RICH, trigger)
- ROOT-based job monitor (and manager?)
- Much more active CAS physics analysis program
- C++/OO transient data model in place and in use (see the illustrative sketch below)
- Hybrid Objectivity/ROOT event store in use
  - Independent storage of multiple event components
- Quality assurance package for validation against reference results
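For the "C++/OO transient data model" item, a minimal sketch of the flavour intended: analysis code navigates objects rather than IDL tables. This is not the actual STAR class design; all class and method names here are invented for illustration.

```cpp
// Illustrative only: a toy transient event model in plain C++. The real STAR
// transient data model is not reproduced here; all names are invented.
#include <vector>
#include <cstddef>

class TrackSketch {
public:
  TrackSketch(float pt, float eta, int nHits) : mPt(pt), mEta(eta), mNHits(nHits) {}
  float pt()    const { return mPt; }
  float eta()   const { return mEta; }
  int   nHits() const { return mNHits; }
private:
  float mPt;     // transverse momentum
  float mEta;    // pseudorapidity
  int   mNHits;  // hits on track
};

class EventSketch {
public:
  void addTrack(const TrackSketch& t) { mTracks.push_back(t); }
  std::size_t nTracks() const { return mTracks.size(); }
  const std::vector<TrackSketch>& tracks() const { return mTracks; }
private:
  std::vector<TrackSketch> mTracks;  // owned by the event, filled from the DST
};
```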

Conclusions
- MDC1 was a success from STAR's point of view
  - Our important goals were met
  - The outcome would have been an optimist's scenario last spring
  - MDCs are a very effective tool for guiding and motivating effort
- Good collaboration with RCF and the other experiments
  - Demonstrated a functional computing facility in a complex production environment
  - All data processing stages (data transfer, sinking, production reconstruction, analysis) successfully exercised concurrently
- Good agreement with the expectations of our computing requirements document
- Clear (but ambitious) path to MDC2
- Post-MDC2 objective: a steady ramp to the commissioning run and the real data challenge in a stable software environment