ALICE experiences with CASTOR2
Latchezar Betev, ALICE

2 Outline
- General ALICE use cases
- DAQ ALICE Data Challenges
- Offline Physics Data Challenges
- Plans for performance/stability tests
- CASTOR2 and xrootd
- Support
- Conclusions

3 General ALICE use cases
- CASTOR is used as custodial storage for:
  - RAW and condition data from the experiment – transferred from the disk buffer in ALICE P2
  - Offline production – ESDs, AODs, user analysis results – stored through the Grid middleware and CAF
  - Direct storage of user files – from applications running on lxbatch

4 Brief history
- DAQ ALICE Data Challenges:
  - 2001 – ADC III – CASTOR1, 85 MB/sec sustained transfer for one week
  - 2002 – ADC IV – CASTOR1, 300 MB/sec sustained transfer for one week
  - 2004 – ADC VI – CASTOR1, failed to reach the challenge goal of 300 MB/sec
  - 2005 – ADC VI – CASTOR2, 450 MB/sec sustained transfer for a week
  - 2006 – ADC VII – CASTOR2 (July/August), 1 GB/sec sustained for a week – the last data challenge before data taking in 2007
- Offline Physics Data Challenges:
  - PDC’04 – CASTOR1 – storage at CERN of simulated events from up to 20 computing centres worldwide, a test of the data flow ‘in reverse’
    - Exposed the limitations of the CASTOR1 stager (number of concurrently staged files) – partially solved by creating several stager instances
    - Exposed deficiencies of AliRoot – too many small files, inefficient use of the tape system
  - PDC’06 – CASTOR2 (ongoing)
    - Tests of data transfers through xrootd and the gLite File Transfer Service (FTS) – stability of CASTOR2 and the Grid tools
    - Goal – up to 300 MB/sec from CERN to the ALICE T1s, sustained for one week

5 Data flow schema for the first data-taking period
- p+p data taking and reconstruction data flow (diagram not reproduced)

6 Brief history (2)
- User access
  - Substantial user interaction with CASTOR1 (one stager)
  - Progressively, all ALICE users are migrating to Grid tools; direct interaction with CASTOR is minimized
  - This puts less stress on the system from ‘uninformed’ parties
- Summary
  - ALICE was the first LHC experiment to migrate completely to CASTOR2 – both for the major tasks (DAQ, Grid – August 2005) and for users (February 2006)

7 DAQ ADC VII with CASTOR2
- Data is transferred to a ‘no tape’ CASTOR2 service class dedicated to Data Challenges (3 TB) with a garbage collector
- Functional tests of ‘rfcp’, ‘rfdir’… and registration of the data in the ALICE Grid catalogue (AliEn) – see the sketch after this slide
- Only a few files are read back for testing
- No stress tests of resources yet; service interruptions are acceptable
- Production tests (scheduled for July/August 2006):
  - With a ‘to tape’ storage class
  - Test of the CASTOR2 API using a DAQ software package dedicated to ROOT objectification
  - Online data transfer from the DAQ machines in P2 to CASTOR2
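As an illustration only: a minimal sketch of what such a functional test could look like, assuming the standard ‘rfcp’ and ‘rfdir’ command-line clients are available and using a hypothetical CASTOR directory; the AliEn registration step is left as a placeholder, since the exact catalogue command depends on the site setup.

```python
import subprocess
import sys

CASTOR_DIR = "/castor/cern.ch/alice/datachallenge"  # hypothetical test directory


def run(cmd):
    """Run a command and fail loudly if it returns a non-zero exit code."""
    print("+", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(f"command failed: {result.stderr.strip()}")
    return result.stdout


def functional_test(local_file):
    remote = f"{CASTOR_DIR}/{local_file.rsplit('/', 1)[-1]}"
    run(["rfcp", local_file, remote])      # copy the file into the CASTOR2 service class
    listing = run(["rfdir", CASTOR_DIR])   # verify that it appears in the directory listing
    assert remote.rsplit("/", 1)[-1] in listing
    # Registration in the AliEn catalogue would follow here; the actual command
    # depends on the local Grid configuration, so it is only a placeholder.
    print("would register in AliEn:", remote)


if __name__ == "__main__":
    functional_test(sys.argv[1])
```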

8 Data rates – example
- In the past 20 days, approximately 25 TB of files were registered in CASTOR2 and AliEn

9 Current issues
- Modification of rfcp to include calculation of a checksum ‘on-the-fly’, submitted to the CASTOR development team – see the sketch after this slide
- Running a CASTOR2 client on a standard CERN PC requires substantial modifications of the default firewall settings
- Adverse effects of power outages on the CASTOR2 services – clients are blocked and cannot recover gracefully
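The ‘on-the-fly’ checksum request amounts to computing the checksum while the data stream is being copied, rather than in a second pass over the file. A minimal sketch of the idea in Python (the real change belongs in the rfcp client code; the choice of adler32 here is only illustrative):

```python
import zlib


def copy_with_checksum(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src to dst and compute an adler32 checksum of the stream in one pass."""
    checksum = 1  # adler32 starting value
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF

# The resulting checksum can be stored alongside the file and compared after
# migration to tape to detect corruption, without reading the file twice.
```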

10 Offline PDC’06 with CASTOR2
- Offline uses a single instance of CASTOR2 (castoralice)
- Files are currently transferred from/to CASTOR2 through 3 dedicated xrootd servers (lxfsra06xx) running backend migration scripts, and through FTS
- The ALICE Grid tools pack all output files from an application into an archive to optimize taping – a reduction of the number of files registered in CASTOR by a factor of 4 to 30 (see the sketch after this slide)
- Currently there are 40 TB of data in 214K files registered in CASTOR2
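The packing step is conceptually simple: instead of registering every small output file individually, the job bundles them into a single archive and only the archive is stored in CASTOR. A minimal sketch under that assumption (the real ALICE Grid tools use their own archive format and catalogue bookkeeping; the paths below are hypothetical):

```python
import zipfile
from pathlib import Path


def pack_job_outputs(output_dir, archive_name="job_output.zip"):
    """Bundle all output files of a job into one archive so that a single file,
    rather than many small ones, is migrated to tape."""
    out_dir = Path(output_dir)
    files = [p for p in out_dir.iterdir() if p.is_file()]
    archive_path = out_dir / archive_name
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)
    # With, say, 30 small output files per job this turns 30 tape-bound files
    # into 1, which is where a reduction by a factor of 4 to 30 comes from.
    return archive_path
```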

11 Data rates
- Readback of production files from CASTOR2 (plot not reproduced)

12 Current issues (2)
- Addition of monitoring tools for the stagers – re-implementation of ‘stageqry -s’
- Guesstimate of the time needed to stage a file from tape (a rough sketch follows this slide):
  - The Grid tools ‘hide’ the type of storage (MSS, disk) from the user/application
  - Naturally, the behavior of these two basic storage types is different
  - Any file stored in an MSS (like CASTOR) can be returned immediately (if it is in the MSS disk cache) or be delayed if it is on tape and has to be staged
  - In the second case the MSS should return an estimate (e.g. based on the information in the staging queue) of when the file will reasonably be available to the application
  - This will help optimize the Grid tools and job behavior
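A rough sketch of the kind of estimate meant here, based on hypothetical queue information (position in the staging queue, number of active tape drives, assumed average time per file); the real figure would have to come from the CASTOR stager itself:

```python
def estimate_stage_time(queue_position, active_drives, avg_file_time_s=180.0):
    """Very rough estimate (in seconds) of when a tape-resident file becomes
    available, assuming the requests ahead of ours are served round-robin.

    queue_position  -- number of staging requests ahead of ours
    active_drives   -- tape drives currently serving this stager instance
    avg_file_time_s -- assumed average mount + positioning + read time per file
    """
    if active_drives < 1:
        return float("inf")  # no drives available, no meaningful estimate
    batches_ahead = queue_position // active_drives
    return (batches_ahead + 1) * avg_file_time_s


# Example: 57 requests ahead of ours on 4 drives -> roughly 45 minutes.
print(estimate_stage_time(queue_position=57, active_drives=4))
```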

13 Test plans in July/August 2006
- Part of the integrated DAQ ADC VII, Offline PDC’06 and the LCG SC4:
  - Test of the CASTOR2 API (with xrootd) – clarified later in the talk – using a DAQ software package dedicated to ROOT objectification
  - Online data transfer between the DAQ "live" streams and CASTOR2 from the DAQ machines in P2, at a rate of 1 GB/sec for 1 week
  - FTS transfers from the CERN CASTOR WAN pool to 6 T1s, at a rate of 300 MB/sec aggregate from CERN for one week (export of RAW data)
  - Functional tests of the CASTOR2-xrootd interface
  - Full reconstruction chain: RAW data -> first-pass reconstruction, storage and export at CERN; second-pass reconstruction and storage at the T1s
- The above will test both the throughput and the stability of the CASTOR2 service for all basic computational tasks in ALICE
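For scale, the sustained rates above translate into sizeable total volumes; a quick back-of-the-envelope calculation:

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800 s


def volume_tb(rate_mb_per_s, seconds=SECONDS_PER_WEEK):
    """Total volume in TB moved at a sustained rate over the given duration."""
    return rate_mb_per_s * seconds / 1e6


print(volume_tb(1000))  # 1 GB/sec DAQ test      -> about 605 TB in one week
print(volume_tb(300))   # 300 MB/sec RAW export  -> about 181 TB in one week
```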

14 xrootd-CASTOR2 interface
- xrootd – a file server/disk manager and transport protocol developed at SLAC, incorporated in ROOT and as such a natural choice for ALICE
- ALICE will use only xrootd from the applications to access data stored anywhere on any storage system (illustrated after this slide):
  - Avoids the need for the applications to carry multiple libraries for different protocols and storage systems (an issue on the Grid/CAF)
  - Avoids splitting the experiment’s disk pool into many sections
- In March 2006, discussions started between the CASTOR2 and xrootd development teams on incorporating xrootd in CASTOR2
- This has resulted in a prototype implementation
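The point about a single access protocol can be illustrated by how an analysis application opens a file: the same ROOT call is used whether the root:// URL points at a plain xrootd disk server or at a CASTOR2 redirector. A sketch in PyROOT, with hypothetical host names and paths, assuming a ROOT build with the xrootd client plugin:

```python
import ROOT  # requires a ROOT installation with xrootd (root://) support

# Hypothetical URLs: a plain xrootd disk server and a CASTOR2 redirector.
urls = [
    "root://xrootd-disk.example.ch//data/run1234/AliESDs.root",
    "root://castor-redirector.example.ch//castor/cern.ch/alice/run1234/raw.root",
]

for url in urls:
    f = ROOT.TFile.Open(url)  # identical call for both back ends
    if f and not f.IsZombie():
        print("opened:", url)
        f.Close()
    else:
        print("could not open:", url)
```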

15 Architecture
- Diagram (not reproduced): the application client issues an open(LFN) to an xrootd/ofs server; via the olb and a CASTOR redirector (an exec script reached over UDP) the request is resolved against CASTOR and the client is redirected to the location of the file (TURL/TRID)
- Legend: olb – open load balancing, ofs – open file system

16 Explanation of interactions
1. Forward an open request to CASTOR; the xrootd redirector (via CASTOR) provides the best copy to open
2. For CASTOR, schedule the I/O (first open only)
3. A better option again: stack a redirector in front of everything, perhaps with normal disks (a schematic sketch of this dispatch follows this slide)
   - If the file is on disk, redirect to the normal disks or to the CASTOR disk buffer
   - If the file is on tape, redirect as before to CASTOR for staging
4. The CASTOR stager would tell the olb of xrootd that the file is on disk, so that subsequent opens are immediate
- Similar implementations exist in other storage management systems (DPM, dCache)
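The decision in step 3 boils down to a dispatch on where the file currently resides. A schematic sketch of that logic (purely illustrative, not the actual redirector code; the location lookup is assumed to be provided by the olb/CASTOR name service):

```python
from enum import Enum


class Location(Enum):
    DISK_POOL = "plain xrootd disk pool"
    CASTOR_DISK = "CASTOR disk buffer"
    TAPE = "tape only"


def redirect(lfn, locate):
    """Schematic front-end redirector decision for an open(LFN) request.

    locate -- callable returning the Location of the file (assumed to be
    available from the olb / CASTOR name server).
    """
    where = locate(lfn)
    if where is Location.DISK_POOL:
        return f"redirect client to the disk server holding {lfn}"
    if where is Location.CASTOR_DISK:
        return f"redirect client to the CASTOR disk buffer copy of {lfn}"
    # Tape-resident: hand the request to the CASTOR stager; once the file is
    # staged, the stager informs the olb so that subsequent opens are
    # redirected immediately (step 4 above).
    return f"forward {lfn} to CASTOR for staging"
```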

17 Experiences with CASTOR support
- ALICE was the first experiment to use CASTOR2 on an extended basis:
  - As such, we have encountered some ‘teething’ problems of the system
  - The stability of the service has improved greatly in the past several months
- The reaction to problems was/is typically very fast and competent
  - A few (named) experts are almost always on-line
- We are worried about the sustainability of support, especially with many concurrent and continuously running exercises

18 Conclusions
- ALICE has successfully migrated to CASTOR2
  - Initial tests of the system have shown that the basic functionality required by the experiment’s DAQ, Grid and user software is adequate
  - However, the performance and stability tests are still to be done
  - Concern: at the time of this review, we can only partially answer the set of questions asked by the review committee
- ALICE’s major development requirement is the integration of xrootd in CASTOR2
  - Good progress has been made in a few (short) months
- A few minor issues with functionality/monitoring – listed within the presentation
- The CASTOR2 team’s response to problems is very quick and competent
- This will also be tested extensively during the scheduled July/August exercises