CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it


CCRC’08 Review from a DM perspective
Alberto Pace (With slides from T.Bell, F.Donno, D.Duelmann, M.Kasemann, J.Shiers, …)

Before the main topic
Safety reminder
– The computer center has different safety requirements than normal offices
– This is why authorization is needed to enter!
– This is why there are safety courses!
– Noise above the level acceptable for long-term work
– Wind above the level acceptable for long-term work
– False floor, 1 meter deep!
– No differential power switch!!
In case of accident, call the fire brigade

CCRC’08
Wiki site: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGCommonComputingReadinessChallenges
Ongoing challenge with all 4 experiments

Online and offline databases

CPU Usage ATLAS/CMS DBs

Physical Reads

Network traffic

DB service: some observations
In general: DB load is still dominated by activities that did not scale up significantly during CCRC
– Load changes caused by CCRC on monitoring, workflow and production systems were smaller than, e.g., fluctuations between software releases
– The major contribution, which scales with reconstruction jobs, is not yet visible at CERN and Tier-1 sites
Exception: ATLAS reprocessing at BNL, TRIUMF and NDGF
– Increased dCache load on calibration files (POOL) introduced a bottleneck
– Consequence: extremely long (idle) database connections on the conditions database
CORAL failover between T1 sites worked
Increased DB session limits, session sniping added, dCache pool for calibration files added
DB service ran smoothly and without major disruptions
– As usual, several node reboots, with minor impact thanks to the cluster architecture
– A 2h streams intervention (downstream capture) was scheduled in agreement with experiments and service coordination during CCRC

Castor and Grid Data Management

Tier-0 to Tier-1 Exports

February Summary

Not limited by Castor

Successful Stage-in test

SRM – 2... Working

TAPE issues

Total performance to tape
Alice and LHCb are running Castor without policies, so around 100% improvement in write performance is expected once policies are applied
With simulated file sizes, Atlas data rates have improved to 30 MB/s writing
Focus on file size and policies has shown some improvements in write performance
Read efficiency remains low and dominates drive utilisation, due to the low number of files read per mount and to non-production users
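The link between files read per mount and read efficiency can be illustrated with a toy model. This is only a sketch: the function name, the drive rate, the mount overhead, the per-file positioning time and the file size below are all assumed values chosen for illustration, not measurements from CCRC’08 or Castor.

```python
def effective_read_rate(files_per_mount, file_size_mb, drive_rate_mb_s,
                        mount_overhead_s, seek_s_per_file):
    """Average MB/s delivered over one mount, overheads included."""
    data_mb = files_per_mount * file_size_mb
    transfer_s = data_mb / drive_rate_mb_s
    total_s = mount_overhead_s + files_per_mount * seek_s_per_file + transfer_s
    return data_mb / total_s


# Assumed, illustrative parameters (not taken from the slides)
DRIVE_RATE_MB_S = 100.0    # streaming rate of the drive
MOUNT_OVERHEAD_S = 120.0   # mount + unmount time, paid once per mount
SEEK_S_PER_FILE = 30.0     # positioning time, paid once per file
FILE_SIZE_MB = 1000.0

rates = {
    n: effective_read_rate(n, FILE_SIZE_MB, DRIVE_RATE_MB_S,
                           MOUNT_OVERHEAD_S, SEEK_S_PER_FILE)
    for n in (1, 10, 100)
}
for n, rate in rates.items():
    print(f"{n:>3} files/mount: {rate:5.1f} MB/s effective")
```

Under these assumptions a single file per mount delivers 6.25 MB/s from a 100 MB/s drive, and the rate rises as more files are read per mount because the mount overhead is spread over more data. Larger files help in the same way by spreading the per-file positioning time.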

Tape usage read dominated
Random read dominates drive time (90% reading)
Writing under control of Castor policies
Reading much more difficult to improve from the Castor side

Production vs Users
Data retrieved for CCRC period for CMS
CMS production is under cmsprod and phedex (25% total)
Requests for tape recalls dominated by non-production
Equivalent data for Atlas shows production requests < 5%

Options
Do nothing
– Hope things work out OK
Tape prioritization in Castor
– Complete minimum implementation of VDQM2 and tape queue prioritization
– A new long term strategy may be necessary
Dedicate resources
– Fragmentation risks
Hardware investment
– Purchase 50 tape drives and servers
– Cost is 15K CHF/drive and 6K CHF/tape server, total 1050 kCHF
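The hardware investment total can be checked with a few lines of arithmetic. The sketch below assumes one tape server per drive (50 of each), which is the reading under which the quoted unit costs give the quoted total.

```python
DRIVES = 50
COST_PER_DRIVE_KCHF = 15
COST_PER_SERVER_KCHF = 6

# Assumption: one tape server is purchased per tape drive
SERVERS = DRIVES

total_kchf = DRIVES * COST_PER_DRIVE_KCHF + SERVERS * COST_PER_SERVER_KCHF
print(f"Total: {total_kchf} kCHF")  # Total: 1050 kCHF
```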

Problems reported

Castor
Invalid checksum value returned by the CASTOR gridftp2 server (reported by CMS on 05/02). FIXED in (07/02)
Gsiftp TURLs returned by CASTOR are relative (reported by S2 and CMS on 06/02). FIXED in (07/02)
Unable to map request to space for policy TRANSFER_WAN (reported by CMS on 07/02). FIXED in (08/02)
The srmDaemon attempts to free an unallocated pointer and crashes (reported by CNAF). FIXED in (14/02)
Some of the databases at CERN were found to be missing an index (found by S2). FIXED in (15/02)
Insufficient user privileges to make a request of type StagePutDoneRequest in service class 'atldata' (reported by S2 and ATLAS on 19/02) ☺ PutDone executed by and allowed for (root,root). To be fixed. Workaround provided on 23/02

Castor
Missing access control on spaces based on VOMS groups and roles (reported by ATLAS/LHCb on 19/02). Followed by Storage Solution WG
Could not get user information: VOMS credential ops does not match grid mapping dteam (reported by S2 and CNAF on 21/02) ☹ Not yet understood
Error creating statement, Oracle code: ORA-12154: TNS:could not resolve the connect identifier specified (reported by S2 and CNAF on 12/02). Not yet understood ☞ It happens at service startup. A restart cures the problem
Server unresponsive at RAL? - Space token ATLASDATADISK does not exist (reported by S2 and ATLAS on 28/02). Number of threads increased from 100 to 150 (28/2)

Castor Summary
10 software problems reported, no major problems
6 problems fixed (in 2-3 days on average)
Developers and operations people very responsive

DPM
Default ACLs on directories do not work (reported by ATLAS on 13/02). FIXED in (certified)
Slow file removal (reported by ATLAS on 22/02): ext3 filesystems are much slower than xfs for delete operations (2048 files of 1.5 GB removed in 90 minutes on ext3, against 5 seconds on xfs; tests performed on 25/02)
DPM is being certified and will be the release available for CCRC08 in May.
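To put the file removal figures in per-file terms, a short calculation using only the numbers quoted above (2048 files, 90 minutes on ext3, 5 seconds on xfs) is sketched here. The variable names are illustrative.

```python
FILES = 2048
EXT3_TOTAL_S = 90 * 60   # 90 minutes reported for ext3
XFS_TOTAL_S = 5          # 5 seconds reported for xfs

ext3_per_file_s = EXT3_TOTAL_S / FILES
xfs_per_file_s = XFS_TOTAL_S / FILES
speedup = EXT3_TOTAL_S / XFS_TOTAL_S

print(f"ext3: {ext3_per_file_s:.2f} s per file")
print(f"xfs:  {xfs_per_file_s * 1000:.2f} ms per file")
print(f"xfs is {speedup:.0f}x faster for this delete test")
```

That works out to roughly 2.6 s per 1.5 GB file on ext3 against about 2.4 ms on xfs, a factor of 1080 for this particular test.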

Conclusion
CCRC’08 is a success so far
All DM software and tools have been able to scale to the challenge and beyond
All is well under control in both the database and data management areas
Strategic directions remain where investigations and major improvements or simplifications need discussion:
– Improve efficiency for analysis
– Tape area in general
– Service for online database, piquet service for support
– Synergies between DM tools and Castor
– Job scheduling in Castor, improved/common database schema for Grid DM tools and Castor
– ...