Operation of CASTOR at RAL Tier1 Review November 2007 Bonny Strong.



History
Jan 2005: Castor1 installed at RAL for evaluation
Jan 2006: Castor2 first available to external institutes; installation begun at RAL
Aug 2006: Castor2 running after resolving problems for deployment outside CERN, version
Sep 2006: CSA06 ran successfully
Mar 2007: Upgrade to version
  – Major problems and instability causing frequent meltdowns
Sep 2007: Deployed separate instances per VO and castor version
  – Much better stability

Production Architecture
[Diagram: four stager instances (CMS, Atlas, LHCb, and Repack and Small User) drawn with stager, DLF, and LSF nodes, Oracle stager and Oracle DLF databases, disk servers, and tape servers. Disk pool labels: 22 diskservers, 133 TB; 7 diskservers, 48 TB; 20 diskservers, 144 TB; 1 diskserver, 9 TB. Other labels: Shared Services, Name Server 1 + vmgr, Name Server 2, Oracle NS + vmgr, repack, Oracle repack.]

Test Architecture
[Diagram: Development, Preproduction, and Certification Testbed instances. Labels include stager, DLF, and LSF nodes; "1 Diskserver - variable"; tape servers; Oracle stager, Oracle DLF, and Oracle repack databases; repack; and Shared Services with Name Server + vmgr and Oracle NS + vmgr.]

Operational Management
Change management
System manager on duty
Helpdesk
Monitoring: Nagios, Ganglia, CASTOR-specific
Team
  Bonny Strong – service manager
  Shaun de Witt – developer
  Tim Folkes (about 50%) – tape operations
  Chris Kruk – LSF manager, diskservers, sys admin
  Cheney Ketley (50%) – sys admin, LSF backup

Working with VOs
Weekly meeting with all VOs to discuss issues and plans
Individual meetings with VOs to model data flow and plan CASTOR configuration

Atlas Data Flow Model
[Diagram: flow of RAW, ESD, AOD, AODm, and TAG data between T0, the T1 and its partner T1, T2, and the farm. Labels include T0Raw, StripInput, simRaw, simStrip, D0T1, D1T0, D1T1, and D0T0.]

Key Improvements Planned Over Next 6 Months
Resilience
  – Oracle clusters (RAC) with Data Guard DB replication
  – Redundant stagers for each VO
  – Encouraging development for additional redundancy
Monitoring improvements
Development of administrative tools
Deployment and configuration management procedures
Disaster recovery documentation and testing

SRMv2
In production at RAL by 1 Dec 2007
Separate endpoints for each VO
Front-end clusters for redundancy
Will run in parallel with SRMv1 until VOs approve v1 decommissioning

Major Problems and Issues
Software reliability
Heavy operational cost
CERN-specific development
Repack delayed
Lack of administrative tools
Performance to tape
Staffing for 24/7 coverage

Working with CERN
External institutes conference call every 2 weeks to review development progress and operational issues
Twice-yearly face-to-face meetings of external institutes
Monthly deployment conference call to plan development priorities
Management-level meetings over the last year to address problems of CASTOR for Tier1s
  – Improved release procedures and planning
  – More involvement of Tier1s in development planning
  – Improved testing with development of certification testbed and test suite at RAL

Conclusions
It has not been a smooth road
We have taken, or plan to take, significant steps to overcome problems
Major concerns for 2008:
  – 24/7 operation
  – Improving tape performance
We expect system reliability to be much better in 2008 than in 2007