STAR COMPUTING
STAR Computing Status, Out-source Plans, Residual Needs
Torre Wenaus, STAR Computing Leader, BNL
RHIC Computing Advisory Committee Meeting, BNL, October 11, 1999

Outline
- Status of STAR and STAR Computing
- Out-source plans
- Residual needs
- Conclusions

- MDC2 participation slide
- dismal RCF/BNL networking performance
- increasing reliance on LBL for functional security configs
- ftp to sol not possible
- cannot open X windows from my home internet provider

The STAR Computing Task
- STAR trigger system reduces the 10 MHz collision rate to ~1 Hz recorded to tape
- Data recording rate of 20 MB/sec; ~12 MB raw data size per event
- ~4000+ tracks/event recorded in the tracking detectors (factor of 2 uncertainty in physics generators)
- High statistics per event permit event-by-event measurement and correlation of QGP signals such as strangeness enhancement, J/psi attenuation, high-pT parton energy loss modifications in jets, and global thermodynamic variables
- 17M Au-Au events (equivalent) recorded in a nominal year
- Relatively few but highly complex events requiring large processing power
- Wide range of physics studies: ~100 concurrent analyses foreseen in ~7 physics working groups
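As a rough consistency check on these numbers (an illustrative calculation added here, not part of the original talk): 17M events at ~12 MB each is close to the 200 TB/year raw volume quoted later, and at 20 MB/sec it corresponds to roughly 120 days of live data taking per nominal year.

    // Illustrative cross-check of the raw-data numbers on this slide
    // (not part of the original talk).
    #include <cstdio>

    int main() {
      const double eventsPerYear = 17e6;    // ~17M Au-Au equivalent events
      const double rawEventSizeMB = 12.0;   // ~12 MB raw data per event
      const double daqRateMBs = 20.0;       // 20 MB/sec recording rate

      double rawVolumeTB  = eventsPerYear * rawEventSizeMB / 1e6;            // MB -> TB
      double liveTimeDays = eventsPerYear * rawEventSizeMB / daqRateMBs / 86400.0;

      std::printf("Raw data volume: ~%.0f TB/year\n", rawVolumeTB);    // ~204 TB
      std::printf("Live DAQ time:   ~%.0f days/year\n", liveTimeDays); // ~118 days
      return 0;
    }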

Manpower
- Planned dedicated database person never hired (funding); databases consequently late, but we are now transitioning from an interim to our final database
- Very important development in the last 6 months: big new influx of postdocs and students into computing and related activities
  - Increased participation and pace of activity in:
    - QA
    - online computing
    - production tools and operations
    - databases
    - reconstruction software
- Still missing an online/general computing systems support person
  - Open position cancelled due to lack of funding
  - Shortfall continues to be made up by the local computing group

Some of our Youthful Manpower
Dave Alvarez, Wayne, SVT; Lee Barnby, Kent, QA and production; Jerome Baudot, Strasbourg, SSD; Selemon Bekele, OSU, SVT; Marguerite Belt Tonjes, Michigan, EMC; Helen Caines, Ohio State, SVT; Manuel Calderon, Yale, StMcEvent; Gary Cheung, UT, QA; Laurent Conin, Nantes, database; Wensheng Deng, Kent, production; Jamie Dunlop, Yale, RICH; Patricia Fachini, Sao Paolo/Wayne, SVT; Dominik Flierl, Frankfurt, L3 DST; Marcelo Gameiro, Sao Paolo, SVT; Jon Gangs, Yale, online; Dave Hardtke, LBNL, calibrations, DB; Mike Heffner, Davis, FTPC; Eric Hjort, Purdue, TPC; Amy Hummel, Creighton, TPC, production; Holm Hummler, MPG, FTPC; Matt Horsley, Yale, RICH; Jennifer Klay, Davis, PID; Matt Lamont, Birmingham, QA; Curtis Lansdell, UT, QA; Brian Lasiuk, Yale, TPC, RICH; Frank Laue, OSU, online; Lilian Martin, Subatech, SSD; Marcelo Munhoz, Sao Paolo/Wayne, online; Aya Ishihara, UT, QA; Adam Kisiel, Warsaw, online, Linux; Frank Laue, OSU, calibration; Hui Long, UCLA, TPC; Vladimir Morozov, LBNL, simulation; Alex Nevski, RICH; Sergei Panitkin, Kent, online; Caroline Peter, Geneva, RICH; Li Qun, LBNL, TPC; Jeff Reid, UW, QA; Fabrice Retiere, calibrations; Christelle Roy, Subatech, SSD; Dan Russ, CMU, trigger, production; Raimond Snellings, LBNL, TPC, QA; Jun Takahashi, Sao Paolo, SVT; Aihong Tang, Kent; Greg Thompson, Wayne, SVT; Fuquian Wang, LBNL, calibrations; Robert Willson, OSU, SVT; Richard Witt, Kent; Gene Van Buren, UCLA, documentation, tools, QA; Eugene Yamamoto, UCLA, calibrations, cosmics; David Zimmerman, LBNL, Grand Challenge
A partial list of young students and postdocs now active in aspects of the software...

STAR Computing Requirements
Nominal year processing and data volume requirements:
- Raw data volume: 200 TB
- Reconstruction: 2800 Si95 total CPU, 30 TB DST data
  - 10x event size reduction from raw to reco
  - 1.5 reconstruction passes/event (only!) assumed
- Analysis: 4000 Si95 total analysis CPU, 15 TB micro-DST data
  - Si95-sec/event per MB of DST, depending on analysis
    - Wide range, from CPU-limited to I/O-limited
  - ~100 active analyses, 5 passes over the data per analysis
  - micro-DST volumes from 0.1 to several TB
- Simulation: 3300 Si95 total including reconstruction, 24 TB
- Total nominal year data volume: 270 TB
- Total nominal year CPU: 10,000 Si95
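A quick tally of the line items above (illustrative only, using the numbers as quoted) reproduces the stated totals:

    // Illustrative tally of the nominal-year requirements listed above
    // (figures taken from the slide; not part of the original talk).
    #include <cstdio>

    int main() {
      // CPU in Si95 units
      double cpuReco = 2800, cpuAna = 4000, cpuSim = 3300;
      // Data volumes in TB
      double raw = 200, dst = 30, udst = 15, sim = 24;

      std::printf("Total CPU:  %.0f Si95 (quoted as ~10,000)\n",
                  cpuReco + cpuAna + cpuSim);        // 10,100
      std::printf("Total data: %.0f TB (quoted as ~270)\n",
                  raw + dst + udst + sim);           // 269
      return 0;
    }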

Current Status of Requirements
- Simulation requirements review

STAR Software Environment
- Current software base is a mix of C++ (55%) and Fortran (45%)
  - Rapid evolution from ~20%/80% in 9/98
  - New development, and all physics analysis, in C++
- Framework built over the CERN-developed ROOT toolkit, adopted 11/98
  - Supports legacy Fortran codes and data structures without change
  - Deployed in offline production and analysis in Mock Data Challenge 2, 2-3/99
  - Post-reconstruction: C++/OO data model 'StEvent' implemented
    - Next step: migrate the OO data model upstream to reconstruction
- Core software development team: ~7 people
  - Almost entirely BNL based
    - BNL leads STAR's software and computing effort
  - A small team given the scale of the experiment and the data management/analysis task
  - Strong reliance on leveraging community, commercial, and open software
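For illustration, a minimal sketch of the "Maker"-style module pattern such a ROOT-based framework is built around (the class and method names below are schematic assumptions for this sketch, not the actual STAR interfaces): modules are driven through Init/Make/Finish calls by a chain, with StEvent as the transient C++/OO data model.

    // Schematic sketch of a framework "Maker" module (illustrative only;
    // the real STAR base classes and signatures may differ).
    #include <cstdio>

    class StEvent;   // transient C++/OO event model (forward declaration only)

    class ExampleMaker /* : public a framework Maker base class in reality */ {
    public:
      virtual int Init() {                    // one-time setup: book histograms, load parameters
        std::printf("ExampleMaker::Init\n");
        return 0;
      }
      virtual int Make(StEvent* /*event*/) {  // called once per event by the chain
        // ... reconstruct or analyze the current event here ...
        return 0;
      }
      virtual int Finish() {                  // end of job: write output, print summaries
        std::printf("ExampleMaker::Finish\n");
        return 0;
      }
      virtual ~ExampleMaker() {}
    };

    int main() {                              // trivial stand-in for the framework chain
      ExampleMaker m;
      m.Init();
      m.Make(0);                              // the chain would pass the current StEvent
      m.Finish();
      return 0;
    }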

Status Relative to November Workshop Decisions
- ROOT-based production software by MDC2
- Legacy codes will continue to be supported
- C++/OO data model for the DST, design unconstrained by ROOT
- Analysis codes in C++; unconstrained by ROOT (templates etc.)
- ROOT-based analysis framework and tools
- Choice between Objectivity and ROOT for the data store from MDC2
- Objectivity for event store metadata, "all being well"
  - All was not well! Concerns over manpower, viability, outside experience (BaBar)... and most importantly the success of MySQL and ROOT I/O provided a good alternative which does not compromise the data model
- Will use ROOT for micro-DSTs in year 1
  - Presumed extension of the data model to persistent form was overambitious; it was, but our hyperindustrious core team (largely Yuri, Victor) did it anyway. Persistent StEvent as the basis for micro-DSTs is now an option.

From MDC2 to the May Workshop
- MDC2:
  - ROOT-based production chain and I/O, StEvent (OO/C++ data model) in place and in use
    - Statistics suffered from software and hardware problems and the short MDC2 duration
    - Software objectives were met, and the objective of an active physics analysis and QA program was met; 5pm QA meetings instituted
- Post-MDC2: round of infrastructure modifications in light of MDC2 experience
  - 'Maker' scheme incorporating packages in the framework and interfacing to I/O revised
    - Robustness of multi-branch I/O (multiple file streams) improved
  - XDF-based I/O maintained as a stably functional alternative
  - Rapid change brought a typical user response of "Aaaargh!"
- Development completed by the late May computing workshop

Status Relative to Engineering Run Goals Established in May
- Stabilize the offline environment as seen by users
  - End of rapid infrastructure development in May and subsequent debugging have improved stability, but the issue hasn't gone away
  - Shift of focus to consolidation: debugging, usability improvements, documentation, user-driven enhancements, developing and responding to QA
- Support for DAQ data in offline from raw files through analysis. Done.
- Provide transparent I/O supporting all file types. Done.
  - 'Universal interface' to I/O, StIOMaker, developed to provide transparent, backwards-compatible I/O for all formats
    - Future (from September) extension to the Grand Challenge
- Stably functional data storage
  - ROOT I/O debugging proceeded into June/July; now appears stably functional. XDF output continues to provide backup.
- Deploy and migrate to persistent StEvent
  - Deployed; migration to persistent StEvent in progress since the meeting. Period of coexistence to allow people to migrate on their own timescale.

Status Relative to May Goals (2)
- Populated calibration/parameter database in use via a stable interface
  - Not there! Still a work in progress!
- Resume (in persistent form!) the productive QA/analysis operation seen in MDC2
  - Big success; thanks to Trainor, Ullrich, Turner and the many participants
- Extend the production DB to real data logging, multi-component events and the file catalog needs of the Grand Challenge
  - Done, but much work still to come.

Real Data Processing
- StDaqLib from the DAQ group provides real data decoding
- StIOMaker provides real data access in the offline framework
- St_tpcdaq_Maker interfaces to the TPC codes
- Current status from Iwona:
  - Can read and analyze zero-suppressed data all the way to the DST
  - DST from real data (360 events) read into StEvent analysis
  - Bad channel suppression implemented and tested
  - First-order alignment was worked out (~1 mm); the rest to come from residuals analysis
  - For monitoring purposes we also read and analyze unsuppressed data
  - 75% of the TPC read out
  - Cosmics with no field, and several runs with field on
  - Working towards regular production and QA reconstruction of real data

Release Management and Stability
- Separate infrastructure development release .dev introduced to insulate users
- .dev migrated to dev weekly (Fri); standard test suite run over the weekend
- QA evaluation by core group, librarian, QA team on Monday
- dev to new migration decision taken at the Tuesday computing meeting
  - Migration happens weekly unless QA results delay it
- Better-defined criteria for dev -> new migration to be instituted soon
- Objective:
  - A stable, current new version for broad use
  - Currently too heavy a reliance on the unstable dev version
    - Version usage to date (pro artificially high; it is the default environment):
      Level .dev: 2729
      Level dev:
      Level new: 7641
      Level old: 662
      Level pro: 57697

Event Store and Data Management
- Success of ROOT-based event data storage from MDC2 onward relegated Objectivity to a metadata management role, if any
  - ROOT provides storage for the data itself
- We can use a simpler, safer tool in the metadata role without compromising our data model, and avoid the complexities and risks of Objectivity
  - MySQL adopted to try this (relational DB, open software, widely used, very fast, but not a full-featured heavyweight like ORACLE)
  - Wonderful experience so far: excellent tools, very robust, extremely fast
  - Scalability must be tested, but multiple servers can be used as needed to address scalability needs
  - Not taxing the tool, because metadata, not large-volume data, is stored
- Metadata management (multi-component file catalog) implemented for simulation data and extended to real data
  - Basis of tools managing data locality and retrieval in development
  - Event-level tags implemented for real data to test scalability and the event tag function (but probably not the full TagDB, which is better suited to ROOT files)
- Will integrate with the Grand Challenge in September
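To give a flavor of the kind of metadata lookup described above, here is a purely illustrative sketch: the connection string, table and column names are invented for this example, and ROOT's generic TSQLServer interface is used as a stand-in for whatever access layer STAR actually deployed against its MySQL catalog.

    // Illustrative file-catalog query via ROOT's generic SQL interface.
    // Host, account, table and column names are placeholders, not the STAR schema.
    #include "TSQLServer.h"
    #include "TSQLResult.h"
    #include "TSQLRow.h"
    #include <cstdio>

    void listDstFiles()
    {
      TSQLServer* db = TSQLServer::Connect("mysql://dbhost.example.org/FileCatalog",
                                           "reader", "");
      if (!db) { std::printf("cannot connect to catalog\n"); return; }

      // Find DST files for a given (hypothetical) production and list where they live
      TSQLResult* res = db->Query(
          "SELECT path, events, onHPSS FROM FileCatalog "
          "WHERE datatype='dst' AND production='P99' LIMIT 10");
      if (!res) { delete db; return; }

      while (TSQLRow* row = res->Next()) {
        std::printf("%s  %s events  onHPSS=%s\n",
                    row->GetField(0), row->GetField(1), row->GetField(2));
        delete row;
      }
      delete res;
      delete db;
    }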

ROOT Status
- ROOT is with us to stay!
  - No major deficiencies or obstacles found; no post-ROOT visions contemplated
  - ROOT community growing: Fermilab Run II, ALICE, MINOS
  - We are leveraging community developments
  - First US ROOT workshop at FNAL in March
    - Broad participation, >50 from all major US labs and experiments
    - ROOT team present; heeded STAR priority requests
      - I/O improvements: robust multi-stream I/O and schema evolution
      - Standard Template Library support
      - Both emerging in subsequent ROOT releases
    - FNAL participation in development and documentation
      - ROOT guide and training materials just released
- The framework is based on ROOT; application codes need not depend on ROOT (neither is it forbidden to use ROOT in application codes).

QA
- Peter Jacobs appointed QA Czar by John Harris
  - With Kathy Turner, Tom Trainor and many others, enormous strides have been made in QA, with much more to come
  - QA process in the semi-weekly 3pm 902 meetings very effective
  - Suite of histograms and other QA measures in continuous development
  - Automated tools managing production and extraction of QA measures from test and production running being developed
  - QA process as it evolves will be integrated into release procedures, very soon for dev -> new and with new -> pro to come
  - Acts as a very effective driver for debugging and development of the software, engaging a lot of people

Documentation and Tutorials
- Greater attention to documentation needed and getting underway
  - Documentation and code navigation tools developed
  - Infrastructure documentation and ease of use a priority as part of the 'consolidation' effort
  - Many holes to be filled in application code documentation as well
  - Entry-level material, user guides and detailed reference manuals needed
  - Standards in place; encouragement (enforcement!) needed to get developers to document their code and users to participate in entry-level material
- Gene Van Buren appointed Software Documentation Coordinator -- thank you Gene!
  - Oversees documentation activities
  - Provides a prioritized list of needs
  - Makes sure the documentation gets written and checked
    - Let Gene know what exists
  - Implements easy web-based access; contributes to entry-level documentation
- We must all support this effort!
- Monthly tutorial program ongoing; thanks to Kathy Turner for organizing and to the other presenters for participating

Computing Facilities
- Broad local and remote usage of STAR software
  - Usage by platform/site: RCF 35532; BNL Sun 10944; BNL Linux 6422; Desktop 3255; HP 422; LBL 9500; Rice 3116; Indiana
- Enabled by the AFS-based environment; AFS an indispensable tool
  - But stressed when heavily used for data access
- Agreement reached with RCF to allow read-only access to RCF NFS data disks from STAR BNL computers
  - Urgently needed to avoid total dependence on AFS for distributed access to (limited volumes of) data
  - Trial period during August, followed by evaluation of any problems
- Linux
  - New RCF farm deployments soon; cf. Bruce's talk
  - New in STAR: 6 dual 500MHz/18GB (2 arrived), 120 GB disk (arrived)
    - For development, testing, online monitoring, services (web, DB, ...)
    - STAR Linux manageable to date thanks largely to Pavel Nevski's self-education as a Linux expert
    - STAR environment provided for Linux desktop boxes

Post Engineering Run Priorities
- Extending data management tools (MySQL DB + disk file management + HPSS file management + multi-component ROOT files)
- Complete schema evolution, in collaboration with the ROOT team
- Management of CAS processing and data distribution, both for mining and for individual-physicist-level analysis
  - Integration and deployment of the Grand Challenge
- Completion of the DB: integration of slow control as a data source, completion of online integration, extending beyond the TPC
- Extend and apply the OO data model (StEvent) to reconstruction
- Continued QA development
- Improving and better integrating visualization tools
- Reconstruction and analysis code development directed at addressing QA results and year 1 code completeness and quality
  - Responding to the real-data outcomes of the engineering run

STAR Computing Facilities: Offsite
- PDSF at LBNL/NERSC
  - Virtually identical configuration to RCF
  - Intel/Linux farm, limited Sun/Solaris, HPSS-based data archiving
  - Current (10/99) scale relative to STAR RCF:
    - CPU ~50% (1200 Si95), disk ~85% (2.5 TB)
  - Long-term goal: resources ~equal to STAR's share of RCF
    - Consistent with the long-standing plan that RCF hosts 50% of the experiments' computing facilities: simulation and some analysis offsite
    - Ramp-up currently being planned
- Cray T3Es at NERSC and the Pittsburgh Supercomputing Center
  - STAR Geant3-based simulation ported to the T3Es and used to generate ~7 TB of simulated data in support of the Mock Data Challenges and software development
- Physics analysis computing at home institutions
  - Processing of micro-DSTs and DST subsets

PDSF in STAR
Probably the only new tidbit is that we now know what our FY2000 allocation is for other NERSC resources: 210,000 MPP hours and 200,000 SRUs. A discussion of what SRUs are, though, would just be confusing.

STAR at RCF
- Overall: RCF is a well-designed and effective facility for STAR data processing and management
  - Sound choices in overall architecture, hardware, software
  - Well aligned with the HENP community
    - Community tools and expertise easily exploited
    - Valuable synergies with non-RHIC programs, notably ATLAS
      - Growing ATLAS involvement within the STAR BNL core computing group
  - Production stress tests have been successful and instructive
  - On schedule; facilities have been there when needed
- RCF interacts effectively and consults appropriately, and is generally responsive to input
- Mock Data Challenges have been highly effective in exercising, debugging and optimizing the RCF production facilities

Grand Challenge
What does the Grand Challenge do for you? It optimizes access to the HPSS tape store. That is:
- The GC Architecture allows users to stage files out of HPSS onto local disk (the GC disk cache) without worrying about
  - username and password for HPSS
  - tape numbers
  - file names
- Improves data access for individual users
  - Allows event access by query: present a query string to the GCA (e.g. NumberLambdas>1) and receive an iterator over the events which satisfy the query as files are extracted from HPSS
  - Pre-fetches files so that "the next" file is requested from HPSS while you are analysing the data in your first file
- Coordinates data access among multiple users
  - Coordinates ftp requests so that a tape is staged only once per set of queries which request files on that tape
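To make the query-and-iterate idea concrete, here is a schematic sketch of how a client might drive such an interface. The class and method names (GCQuery, nextFile, eventsInFile) are invented for this illustration and are not the actual Grand Challenge API; only the query string follows the example on the slide.

    // Schematic client-side use of a Grand-Challenge-style query interface
    // (names invented for this sketch; not the real GCA API).
    #include <string>
    #include <vector>
    #include <cstdio>

    struct GCEvent { int run, event; };

    class GCQuery {                          // stand-in for the GC event iterator
    public:
      explicit GCQuery(const std::string& selection) {
        std::printf("submitting query: %s\n", selection.c_str());
        // In the real system the tag index is consulted, matching files are
        // located on tape, and staging from HPSS to the disk cache begins.
      }
      bool nextFile(std::string& path) {     // blocks until the next staged file
        // The architecture pre-fetches: while the user analyses file N,
        // file N+1 is already being requested from HPSS.
        (void)path;
        return false;                        // no real data in this sketch
      }
      std::vector<GCEvent> eventsInFile(const std::string& /*path*/) {
        return {};                           // events in this file satisfying the query
      }
    };

    int main() {
      GCQuery q("NumberLambdas>1");          // tag-based selection from the slide
      std::string file;
      while (q.nextFile(file)) {
        for (const GCEvent& e : q.eventsInFile(file)) {
          std::printf("run %d event %d\n", e.run, e.event);
        }
      }
      return 0;
    }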

Grand Challenge in STAR

GC Implementation Plan
- Interface the GC client code to the STAR framework
  - Already runs on Solaris, Linux
  - Needs integration into framework I/O management
  - Needs connections to the STAR MySQL DB
- Apply the GC index builder to STAR event tags
  - Interface is defined
  - Has been used with non-STAR ROOT files
  - Needs connection to STAR ROOT and the MySQL DB
- (New) manpower for implementation now available; needs to come up to speed

Mock Data Challenges
- MDC1: Sep/Oct '98
  - >200k (2 TB) events simulated offsite; 170k reconstructed at RCF (goal was 100k)
  - Storage technologies exercised (Objectivity, ROOT)
  - Data management architecture of the Grand Challenge project demonstrated
  - Concerns identified: HPSS, AFS, farm management software
- MDC2: Feb/Mar '99
  - New ROOT-based infrastructure in production
  - AFS improved; HPSS improved but still a concern
  - Storage technology finalized (ROOT)
  - New problem area, STAR program size, addressed in new procurements and OS updates (more memory, swap)
- Both data challenges:
  - Effective demonstration of productive, cooperative, concurrent (in MDC1) production operations among the four experiments
  - Bottom line verdict: the facility works, and should perform in physics datataking and analysis

Principal Concerns
- RCF manpower
  - Understaffing directly impacts
    - Depth of support/knowledge base in crucial technologies, e.g. AFS, HPSS
    - Level and quality of user and experiment-specific support
    - Scope of RCF participation in software; much less central support/development effort in common software than at other labs (FNAL, SLAC)
      - e.g. ROOT used by all four experiments, but no RCF involvement
      - Exacerbated by very tight manpower within the experiment software efforts
      - Some generic software development supported by LDRD (NOVA project of the STAR/ATLAS group)
  - The existing overextended staff is getting the essentials done, but the data flood is still to come
- Computer/network security
  - Careful balance required between ensuring security and providing a productive and capable development and production environment
  - Closely coupled to overall lab computer/network security; a coherent site-wide plan, as non-intrusive as possible, is needed
  - In considerable flux in recent months, with the jury still out on whether we are in balance

HPSS Transfer Failures
- During MDC2, in certain periods up to 20% of file transfers to HPSS failed dangerously
  - Transfers seem to succeed: no errors, and the file is seemingly visible in HPSS with the right size
  - But on reading we find the file not readable
    - John Riordan has the list of errors seen during reading
- In reconstruction we can guard against this by reading back the file, but it would be a much more serious problem for DAQ data, which we cannot afford to read back from HPSS to check its integrity.
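A minimal sketch of the read-back guard mentioned above, for ROOT output files (illustrative only: the file and tree names are placeholders, and the real production scripts may verify integrity differently):

    // Illustrative post-transfer check: re-open the file that was written
    // and verify it is readable and holds the expected number of events.
    // File name and tree name below are placeholders for this sketch.
    #include "TFile.h"
    #include "TTree.h"
    #include <cstdio>

    bool dstFileIsReadable(const char* path, const char* treeName, Long64_t expectedEntries)
    {
      TFile* f = TFile::Open(path, "READ");
      if (!f || f->IsZombie()) {             // open failed or header unreadable
        std::printf("FAILED: cannot open %s\n", path);
        delete f;
        return false;
      }
      TTree* t = (TTree*)f->Get(treeName);   // look up the event tree
      bool ok = t && t->GetEntries() == expectedEntries;
      std::printf("%s: %s\n", path, ok ? "readback ok" : "FAILED readback check");
      f->Close();
      delete f;
      return ok;
    }

    int main() {
      // Placeholder usage: verify a just-transferred DST file holds 500 events.
      dstFileIsReadable("dst_run1234.root", "Event", 500);
      return 0;
    }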

Public Information and Documentation Needed
- Clear list of the services RCF provides, the level of support of these services, the resources allocated to each experiment, and the personnel responsible for support
  - rcas: LSF; rcrs: CRS software; AFS (stability, home directories, ...)
  - Disks (inside/outside access)
  - HPSS
  - Experiment DAQ/online interface
- Web-based information is very incomplete
  - e.g. information on planned facilities for year one and after
  - Largely a restatement of the first point

Current STAR Status at RCF
- Computing operations during the engineering run fairly smooth, apart from very severe security disruptions
  - Data volumes small, and the direct DAQ -> RCF data path not yet commissioned
- Effectively using the newly expanded Linux farm
  - Steady reconstruction production on CRS; transition to year 1 operation should be smooth
  - Analysis software development and production underway on CAS
  - Tools managing analysis operations under development
- In urgent need of disk; expected imminently
- Integration of the Grand Challenge data management tools into production and physics analysis operations to take place this fall
- Based on status to date, we expect STAR and RCF to be ready for whatever RHIC Year 1 throws at us.

RCF Support
- We found planning and cooperation with RCF and the other experiments before and during MDC1 to be very effective
- Thanks to the RCF staff for excellent MDC1 support around the clock, seven days a week! Particular thanks to...
  - HPSS: John Riordan
  - AFS on the CRS cluster: Tim Sailer, Alex Lenderman, Edward Nicolescu
  - CRS scripts: Tom Throwe
  - LSF support: Razvan Popescu
  - All other problems: Shigeki Misawa

Simulations and Data Sinking
- Cray T3E at the Pittsburgh Supercomputing Center (PSC) became available to STAR last March, in addition to NERSC, for MDC-directed simulation
- Major effort since then to port the simulation software to the T3E
  - CERNLIB, Geant3, GSTAR, and parts of STAF ported in a BNL/LBNL/PSC joint project
  - Production management and data transport (via network) infrastructure developed; production capability reached in August
- 90k CPU hrs used at NERSC (exhausted our allocation)
  - 50k CPU hrs (only!) allocated in FY99
- Production continuing at PSC, working through a ~450k hr allocation through February
  - Prospects for continued use of PSC beyond February completely unknown; dependent on DOE
- T3E data transferred to RCF via the network and migrated to HPSS
  - 50 GB/day average transfer rate; 100 GB peak
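For scale, an illustrative conversion of the quoted transfer rates into sustained network bandwidth (taking the 100 GB peak as a per-day figure, which is an assumption):

    // Illustrative conversion of the quoted wide-area transfer rates.
    #include <cstdio>

    int main() {
      const double secondsPerDay = 86400.0;
      double avgMBps  = 50.0 * 1000.0 / secondsPerDay;    // 50 GB/day  -> ~0.58 MB/s
      double peakMBps = 100.0 * 1000.0 / secondsPerDay;   // 100 GB/day -> ~1.2 MB/s
      std::printf("average: %.2f MB/s (~%.1f Mbit/s)\n", avgMBps, avgMBps * 8);
      std::printf("peak:    %.2f MB/s (~%.1f Mbit/s)\n", peakMBps, peakMBps * 8);
      return 0;
    }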

Reconstruction
- CRS-based reconstruction production chain:
  - TPC, SVT, global tracking
- Two MDC1 production reconstruction phases
  - Phase 1 (early September)
    - Production ramp-up
    - Debugging infrastructure and application codes
    - Low throughput; bounded by STAR code
  - Phase 2 (from September 25)
    - New production release fixing phase 1 problems
    - STAR code robust; bounded by HPSS and resources
    - HPSS:
      - size of the HPSS disk cache (46 GB)
      - number of concurrent ftp processes
    - CPU:
      - from 20 CPUs in early September to 56 CPUs in late October

Production Summary
- Total on HPSS: 3.8 TB
- Geant simulation: total 2.3 TB
  - NERSC: 0.6 TB
  - PSC: 1.6 TB
  - RCF: ~0.1 TB
- DSTs:
  - XDF: 913 GB in 2569 files
  - Objectivity: 50 GB
  - ROOT: 8 GB

Reconstruction Production Rate
- 3/98 requirements report: 2.5 kSi95 sec/event = 70 kB/sec
- Average rate in MDC1: kB/sec
- Good agreement!

DST Generation
- The production DST is in the STAF-standard (XDR-based) XDF format
  - Objectivity only in a secondary role, and presently incompatible with Linux-based production
- Direct mapping from IDL-defined DST data structures to Objectivity event store objects
  - Objectivity DB built using the BaBar event store software as a basis
  - Includes the tag database info used to build the Grand Challenge query index
  - XDF-to-Objectivity loading performed in a post-production Solaris-based step
- 50 GB (disk space limited) Objectivity DST event store built and used by the Grand Challenge to exercise data mining

Data Mining and Physics Analysis
- Data mining in MDC1 by the Grand Challenge team; great progress, yielding an important tool for STAR and RHIC (Doug's talk)
- We are preparing for the loading of the full 1 TB DST data set into a second-generation Objectivity event store
  - to be used for physics analysis via the Grand Challenge software
  - supporting hybrid non-Objectivity (ROOT) event components
  - 'next week' for the last month! Soon...
- Little physics analysis activity in MDC1
  - Some QA evaluation of the DSTs
  - Large MDC1 effort in preparing CAS software by the event-by-event physics group
    - but they missed MDC1 when it was delayed a month; they could only be here and participate in August
  - DST evaluation and physics analysis have ramped up since MDC1 in most physics working groups and will play a major role in MDC2

Validation and Quality Assurance
- Simulation
  - Generation of evaluation plots and standard histograms for all data is an integral part of simulation production operations
- Reconstruction
  - Leak checking and table integrity checking built into the framework; effective in debugging and making the framework leak-tight; no memory growth
  - Code acceptance and integration into the production chain performed centrally at BNL; log file checks, table row counts
  - TPC and other subsystem responsibles involved in assessing and debugging problems
  - DST-based evaluation of upstream codes and physics analysis play a central role in QA, but limited in MDC1 (lack of manpower)
  - Dedicated effort underway for MDC2 to develop a framework for validation of reconstruction codes and DSTs against reference histograms

MDC1 Outcomes and Coming Activity
- Computing decisions directing development for MDC2 and year 1
  - MDC1 followed by an intensive series of computing workshops addressing (mainly infrastructure) software design and implementation choices in light of MDC1 and MDC1-directed development work
  - Objective: stabilize the infrastructure for year 1 by MDC2; direct post-MDC2 attention to physics analysis and reconstruction development
  - Many decisions ROOT-related; summarized in the ROOT decision diagram
    - Productive visit from Rene Brun
  - C++/OO decisions: DST and post-DST physics analysis will be C++ based
  - Event store: a hybrid Objectivity/ROOT event store will be employed; ROOT used for (at least) post-DST micro and nano DSTs, and the tag database

Revisited Computing Requirements
- STAR Offline Computing Requirements Task Force Report
  - P. Jacobs et al., March 1998, Star Note 327, 62 pp
  - Physics motivation, data sets, reconstruction, analysis, simulation
  - Addresses the fully instrumented detector in a nominal year
- STAR Offline Computing Requirements for RHIC Year One
  - T. Ullrich et al., November 1998, Star Note xxx, 5 pp
  - MDC1 experience shows no significant deviation from SN327
    - Conclude that the numbers remain valid and the estimates remain current
  - This new report is an addendum addressing year 1
  - Detector setup: TPC, central trigger barrel, 10% of the EMC, and (probably) FTPC and RICH
  - 20 MB/sec rate to tape; central event size MB
    - No data compression during the first year

Year 1 Requirements (2)
- RHIC running
  - STAR's requirements do not depend on the luminosity profile
    - Can saturate on central events at 2% of nominal RHIC luminosity
  - Rather, they depend on the actual duty factor profile
  - 4M central Au-Au events at 100 GeV/A per beam expected from the official estimated machine performance folded with the STAR duty factor
    - 25% of a nominal year
    - Total raw data volume estimate: 60 TB
- DST production
  - MDC1 experience confirms the estimate of 2.5 kSi95 sec/event for reconstruction
  - Assuming a reprocessing factor of 1.5 (a very lean assumption for year 1!) yields 15M kSi95 sec
    - 630 Si95 units average for the year, assuming a 75% STAR/RCF duty factor
    - Factoring in the machine duty factor, can expect the bulk of the data to arrive late in the year

Year 1 Requirements (3)
  - To arrive at a reasonable number: simulated a range of RHIC performance profiles
    - Obtained roughly twice the average value to achieve year 1 throughput with at most a 10% backlog at year end: ~1200 Si95
  - DST data volume estimated at 10 TB
    - Data reduction factor of 6 rather than 10 (SN327); additional information retained on the DST for reconstruction evaluation and debugging, and detector performance studies
- Data mining and analysis
  - Projects able to participate with year 1 data, scaled by event count relative to the SN327 nominal year
    - 12M Si95 sec total CPU needs; 2.3 TB micro-DST volume
    - 480 Si95 units assuming a 75% STAR/RCF duty factor
    - Analysis usage is similarly peaked late in the year, following QA, reconstruction validation, and calibration
      - Need estimated at twice the average, ~1000 Si95 units
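A worked check of the reconstruction CPU average quoted on the previous two slides (illustrative only; the duty-factor treatment is a simplification of the profile simulations described above):

    // Illustrative check of the year-1 reconstruction CPU average (~630 Si95).
    // Numbers taken from the slides; not part of the original talk.
    #include <cstdio>

    int main() {
      const double events     = 4e6;     // 4M central Au-Au events
      const double si95PerEvt = 2.5e3;   // 2.5 kSi95*sec per event to reconstruct
      const double passes     = 1.5;     // reprocessing factor
      const double secPerYear = 3.15e7;  // ~one year in seconds
      const double dutyFactor = 0.75;    // assumed STAR/RCF duty factor

      double totalSi95Sec = events * si95PerEvt * passes;             // ~1.5e10 = 15M kSi95*sec
      double averageSi95  = totalSi95Sec / (secPerYear * dutyFactor); // ~630 Si95 units

      std::printf("Total reconstruction CPU: %.1e Si95*sec\n", totalSi95Sec);
      std::printf("Year-average CPU power:   %.0f Si95 units\n", averageSi95);
      // The slide's profile simulations roughly double this, to ~1200 Si95,
      // to keep the backlog under 10% when data arrive late in the year.
      return 0;
    }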

Year 1 Requirements (4)
- Simulations
  - Used to derive acceptance and efficiency corrections, and signal backgrounds due to physics or instrumental response
    - Largely independent of the data sample size; no change relative to nominal year requirements foreseen
    - Data mining and analysis of simulations will peak late in year 1, with the real data
  - Simulation CPU assumed to be available offsite
  - At RCF, early-year CPU not yet needed for reco/analysis can be applied to simulation reconstruction; no additional request for reconstruction of simulated data
  - 24 TB simulated data volume
- Year 1 requirements summarized in the report table, p. 5

Comments on Facilities
- All in all, the RCF facilities performed well
  - HPSS gave the most trouble, as expected, but it too worked
    - Tape drive bottleneck will be addressed by MDC2
- A robust, flexible batch system like LSF is well motivated on CAS
  - Many users and usage modes
- Need for such a system on CRS not resolved
  - Limitations encountered in the script-based system: control in prioritizing jobs, handling of different reco job types and output characteristics, control over data placement and movement, job monitoring
    - Addressable through script extensions? Complementary roles for LSF and in-house scripts?
  - Attempt to evaluate LSF in reconstruction production incomplete; limited availability in MDC1
  - Would like to properly evaluate LSF (as well as next-generation scripts) on CRS in MDC2 as a basis for an informed assessment of its value (weighing cost also, of course)
  - Flexible, efficient, robust application of computing resources important
    - Delivered CPU cycles (not merely available cycles) is the bottom line

MDC2 Objectives
- Full year 1 geometry (add RICH, measured B field, ...)
- Geometry, parameters, calibration constants from the database
- Use of the DAQ form of raw data
- New C++/OO TPC detector response simulation
- Reconstruction production in ROOT
- TPC clustering
- All year 1 detectors on the DST (add EMC, FTPC, RICH, trigger)
- ROOT-based job monitor (and manager?)
- Much more active CAS physics analysis program
- C++/OO transient data model in place, in use
- Hybrid Objectivity/ROOT event store in use
  - Independent storage of multiple event components
- Quality assurance package for validation against reference results

Conclusions
- MDC1 a success from STAR's point of view
  - Our important goals were met
  - The outcome would have been an optimist's scenario last spring
  - MDCs are a very effective tool for guiding and motivating effort
- Good collaboration with RCF and the other experiments
  - Demonstrated a functional computing facility in a complex production environment
  - All data processing stages (data transfer, sinking, production reconstruction, analysis) successfully exercised concurrently
- Good agreement with the expectations of our computing requirements document
- Clear (but ambitious) path to MDC2
- Post-MDC2 objective: a steady ramp to the commissioning run and the real data challenge in a stable software environment