Summary of the status and plans of FTS and LFC back-end database installations at Tier-1 Sites Gordon D. Brown Rutherford Appleton Laboratory WLCG Workshop.


Summary of the status and plans of FTS and LFC back-end database installations at Tier-1 Sites
Gordon D. Brown, Rutherford Appleton Laboratory
WLCG Workshop, CERN, Geneva, 21st-25th April 2008

Overview
- Acronyms
- 11 sites
- 26 questions
- Developer overview
- Conclusions & summary

FTS

Acronyms - FTS
- Full-Text Search (filename extension)
- Fourier Transform Spectroscopy
- Fault Tolerant System
- Federal Technology Service
- Flying Training Squadron (USAF)
- Flight Termination System
- Full-Time Support

Acronyms - FTS
Fourier Transform Spectroscopy is a measurement technique whereby spectra are collected based on measurements of the temporal coherence of a radiative source, using time-domain measurements of the electromagnetic radiation or other types of radiation.

Overview - FTS: grid File Transfer Service
- FTS Web Service: this component allows users to submit FTS jobs and query their status. It is the only component that users interact with.
- FTS Channel Agents: each network channel, e.g. CERN-RAL, has a distinct daemon running transfers for it. The daemon is responsible for starting and controlling transfers on the associated network link.
- FTS VO Agents: this component is responsible for the VO-specific parts of the transfer (e.g. updating the replica catalog for a given VO, or applying VO-specific retry policies in case of failed transfers). Each VO has a distinct VO agent daemon running for it.
- FTS Monitor: this is currently a CERN-only element.
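All of this state lives in the back-end database, so a DBA can inspect the service with plain SQL. Below is a minimal sketch, assuming an Oracle back end reachable via cx_Oracle and an FTS schema exposing a t_job table with a job_state column; the table, column and credentials are illustrative assumptions, not taken from this talk.

```python
# Sketch: count FTS jobs per state straight from the back-end Oracle schema.
# Table/column names (t_job, job_state) and credentials are assumptions.
import cx_Oracle

def fts_job_state_counts(user, password, dsn):
    """Return {job_state: count} for all jobs recorded in the FTS schema."""
    conn = cx_Oracle.connect(user, password, dsn)
    try:
        cur = conn.cursor()
        cur.execute("SELECT job_state, COUNT(*) FROM t_job GROUP BY job_state")
        return dict(cur.fetchall())
    finally:
        conn.close()

if __name__ == "__main__":
    # Hypothetical read-only account and service name, for illustration only.
    print(fts_job_state_counts("fts_reader", "secret", "ftsdb.example.org/FTSPROD"))
```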

LFC

Acronyms - LFC
- Liverpool Football Club
- Lake Forest College (Lake Forest, IL)
- Level of Free Convection (meteorology)
- Los Fabulosos Cadillacs (Argentine band)
- Large Format Camera
- Land Forces Command (Canada)
- Load Frequency Control

Acronyms - LFC
Full name: Liverpool Football Club
Nickname(s): Pool, The Reds
Founded: March 15, 1892
Ground: Anfield, Liverpool, England
Capacity: 45,362
Manager: Rafael Benítez
League: Premier League
2007–08: Premier League, 4th

Overview - LFC: LCG File Catalog
- The LFC is a catalog containing logical-to-physical file mappings. Depending on the VO deployment model, the LFC is installed centrally or locally.
- The LFC is a secure catalog, supporting GSI security and VOMS.
- In the LFC, a given file is represented by a Grid Unique IDentifier (GUID). A given file replicated at different sites is considered the same file thanks to this GUID, and (can) appear as a single logical entry in the LFC catalog.
- The LFC allows us to see the logical file names in a hierarchical structure.
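Since the catalog itself lives in the back-end database, the LFN → GUID → replica mapping can also be followed with a direct query. The sketch below is illustrative only: it assumes the usual LFC name-server tables (cns_file_metadata, cns_file_replica) and columns (fileid, guid, sfn), which should be verified against the deployed schema; production access should go through the LFC client rather than raw SQL.

```python
# Sketch: list the replica SFNs registered for a given GUID in the LFC back end.
# Table and column names are assumptions about the LFC schema.
import cx_Oracle

def replicas_for_guid(conn, guid):
    """Return the storage file names (replicas) recorded for one GUID."""
    cur = conn.cursor()
    cur.execute(
        "SELECT r.sfn "
        "FROM cns_file_metadata m "
        "JOIN cns_file_replica r ON m.fileid = r.fileid "
        "WHERE m.guid = :guid",
        guid=guid,
    )
    return [sfn for (sfn,) in cur.fetchall()]
```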

FTS

Do you have FTS running at your site?
CERN: Yes
CA-TRIUMF: Yes
DE-GridKa: Yes
ES-PIC: Yes
FR-IN2P3: Yes
IT-CNAF: Yes
NDGF: Yes
NL-SARA: Yes
TW-ASGC: Yes
UK-RAL: Yes
US-BNL: Yes

What back-end database does FTS run on?
CERN: Oracle
CA-TRIUMF: Oracle
DE-GridKa: Oracle
ES-PIC: Oracle
FR-IN2P3: Oracle
IT-CNAF: Oracle
NDGF: Oracle
NL-SARA: Oracle
TW-ASGC: Oracle
UK-RAL: Oracle
US-BNL: Oracle

Would you consider this database dev, test or production?
CERN: Prod
CA-TRIUMF: Prod
DE-GridKa: Prod
ES-PIC: Prod
FR-IN2P3: Prod
IT-CNAF: Prod
NDGF: Prod
NL-SARA: Prod
TW-ASGC: Prod
UK-RAL: Prod
US-BNL: Prod

If you have a prod copy, do you also have a dev or test copy?
CERN: Yes
CA-TRIUMF: No
DE-GridKa: No
ES-PIC: Pre-prod and test
FR-IN2P3: No
IT-CNAF: Pre-prod
NDGF: No
NL-SARA: No
TW-ASGC: Test
UK-RAL: No
US-BNL: Prod copy, and now implementing a test copy

Is this database dedicated to FTS, or is it shared with other schemas/applications?
CERN: There are other schemas
CA-TRIUMF: Dedicated
DE-GridKa: Will be shared with LFC after migration
ES-PIC: Dedicated
FR-IN2P3: Shared with LFC
IT-CNAF: Shared on a RAC with LFC, but dedicated nodes
NDGF: Dedicated
NL-SARA: Dedicated
TW-ASGC: Shared with LFC
UK-RAL: Dedicated; will be shared with LFC in future
US-BNL: Dedicated

Is this database a cluster? If so, how many nodes?
CERN: 4-node cluster
CA-TRIUMF: Single instance
DE-GridKa: 3-node cluster
ES-PIC: 2-node cluster
FR-IN2P3: 4-node cluster
IT-CNAF: 3-node cluster
NDGF: Single instance
NL-SARA: Single instance
TW-ASGC: 3-node cluster
UK-RAL: Single instance (going to a 2/3-node cluster)
US-BNL: 2-node cluster
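For the sites running RAC, clients typically reach the cluster through a connect descriptor that lists every node's virtual IP with load balancing and connect-time failover enabled, so the loss of one node is transparent to the FTS/LFC front ends. A minimal sketch follows; the host names, port and service name are placeholders.

```python
# Sketch: connect to a 2-node Oracle RAC service with client-side load
# balancing and connect-time failover. All names are placeholders.
import cx_Oracle

RAC_DSN = (
    "(DESCRIPTION="
    "(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON)"
    "(ADDRESS=(PROTOCOL=TCP)(HOST=ftsdb-vip1.example.org)(PORT=1521))"
    "(ADDRESS=(PROTOCOL=TCP)(HOST=ftsdb-vip2.example.org)(PORT=1521)))"
    "(CONNECT_DATA=(SERVICE_NAME=FTSPROD)))"
)

conn = cx_Oracle.connect("fts_reader", "secret", RAC_DSN)
print("Connected to", conn.dsn)
conn.close()
```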

What is the backup policy on this database?
CERN: On-disk backup + tape backup every hour (archive logs)
CA-TRIUMF: RMAN level 0 backup on Sun & Wed; daily level 1 backup on Mon, Tue, Thu, Fri, Sat; archivelog backup every 30 minutes
DE-GridKa: 2 full backup copies kept (2 weeks)
ES-PIC: One full backup a week; differential Mon/Tue/Wed; cumulative on Thu; differential on Wed and Sun; archive logs backed up every day
FR-IN2P3: 1 full backup each week, incremental on other days

What is the backup policy on this database? (continued)
IT-CNAF:
NDGF: Daily backups
NL-SARA: Daily dumps to the filesystem; Tivoli Storage Manager takes care of the filesystem backup
TW-ASGC: Daily incremental: level 0 on Mon, level 1 on Tue-Sun
UK-RAL: Full backup every week, incremental every day
US-BNL: Full image copy updated every day; archivelogs every 6 hours; recovery window 14 days
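Most of the Oracle sites above describe the same pattern: a weekly RMAN level 0 (full) incremental backup, level 1 incrementals on the other days, and frequent archived-log backups. A minimal sketch of a cron-driven wrapper implementing that pattern is shown below; the schedule, the reliance on OS authentication ("rman target /") and the assumption that a retention policy is already configured are illustrative, not any particular site's script.

```python
# Sketch: weekly level 0 / daily level 1 RMAN backup plus archived redo logs.
# Assumes the script runs as the oracle OS user so 'rman target /' can connect,
# and that an RMAN retention policy is configured for DELETE OBSOLETE to use.
import datetime
import subprocess

def nightly_backup():
    # Level 0 on Sunday, level 1 on the other six days (illustrative choice).
    level = 0 if datetime.date.today().weekday() == 6 else 1
    rman_commands = (
        f"BACKUP INCREMENTAL LEVEL {level} DATABASE PLUS ARCHIVELOG;\n"
        "DELETE NOPROMPT OBSOLETE;\n"
        "EXIT;\n"
    )
    subprocess.run(["rman", "target", "/"],
                   input=rman_commands, text=True, check=True)

if __name__ == "__main__":
    nightly_backup()
```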

How is this database monitored?
CERN: OEM + home-brewed monitoring
CA-TRIUMF: OEM
DE-GridKa: OEM, Nagios and Ganglia
ES-PIC: OEM, Nagios and Ganglia
FR-IN2P3: OEM
IT-CNAF: OEM
NDGF: Host: Nagios and Ganglia. No DB monitoring.
NL-SARA: Scripts
TW-ASGC: OEM and Nagios
UK-RAL: OEM and Nagios
US-BNL: Ganglia, Nagios
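Where OEM is not used (or as a complement to it), the "scripts" and Nagios entries above usually boil down to small checks that query the database and map the result onto Nagios exit codes. A minimal sketch, assuming Oracle 10g's dba_tablespace_usage_metrics view, a read-only monitoring account and arbitrary thresholds:

```python
# Sketch: Nagios-style tablespace usage check (exit 0 = OK, 1 = WARNING,
# 2 = CRITICAL). View name, account and thresholds are assumptions.
import sys
import cx_Oracle

WARN_PCT, CRIT_PCT = 80.0, 90.0

def check_tablespaces(user, password, dsn):
    conn = cx_Oracle.connect(user, password, dsn)
    try:
        cur = conn.cursor()
        cur.execute("SELECT tablespace_name, used_percent "
                    "FROM dba_tablespace_usage_metrics")
        full = [(name, pct) for name, pct in cur.fetchall() if pct >= WARN_PCT]
    finally:
        conn.close()
    if any(pct >= CRIT_PCT for _, pct in full):
        print("CRITICAL: " + ", ".join(f"{n} {p:.1f}%" for n, p in full))
        return 2
    if full:
        print("WARNING: " + ", ".join(f"{n} {p:.1f}%" for n, p in full))
        return 1
    print("OK: all tablespaces below threshold")
    return 0

if __name__ == "__main__":
    sys.exit(check_tablespaces("monitor", "secret", "ftsdb.example.org/FTSPROD"))
```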

Are you replicating this database?
CERN: No
CA-TRIUMF: No
DE-GridKa: No
ES-PIC: No
FR-IN2P3: No
IT-CNAF: No
NDGF: No
NL-SARA: No
TW-ASGC: No
UK-RAL: No
US-BNL: No

Do you have any other redundancy built into this database?
CERN: No
CA-TRIUMF: Oracle Data Guard
DE-GridKa: Mirrored FTS DB storage in the SAN (i.e. 140 GB for data + 140 GB for recovery + 2 x 140 GB for mirroring)
ES-PIC: No
FR-IN2P3: SAN devices with RAID redundancy
NDGF: No
NL-SARA: No
TW-ASGC: Disk storage with RAID 6
UK-RAL: SAN RAID 1
US-BNL: Storage has a hardware RAID controller and 2 hot spare disks

Do you have any other redundancy built into this database? (continued)
IT-CNAF:
- EMC storage device (CX3-80), which is highly redundant in all its parts (dual controllers, double fibre channel connections and so on).
- We use RAID 1 mirroring and ASM as the storage management software (even though it is configured with redundancy=external, because we prefer to exploit the hardware RAID 1 redundancy, as recommended by EMC best practices).
- We have a 3-node RAC; each node is connected to a different network switch and is equipped with dual power supplies, dual-port HBAs and RAID 1 local disks.

Do you plan to move FTS to another database? If so, which?
CERN: No
CA-TRIUMF: No
DE-GridKa: No
ES-PIC: No
FR-IN2P3: No
IT-CNAF: No
NDGF: No
NL-SARA: No
TW-ASGC: No
UK-RAL: No
US-BNL: No

What are your plans for the FTS database?
CERN: None
CA-TRIUMF: Move to Oracle RAC
DE-GridKa: None
ES-PIC: Upgrade to the latest schema provided by the FTS team on 18 March, in order to use the history and admin packages
FR-IN2P3: Monitor and take appropriate actions to increase QoS; this may lead to changing hosts, OS, etc.
IT-CNAF: None

What are your plans for the FTS database? (continued)
NDGF: Possibly move it to the university's central database group. Consolidating it onto the 3D installation is also being considered.
NL-SARA: Change the backup policy so that RMAN is used instead of dumps to the filesystem.
TW-ASGC: None
UK-RAL: Move to Oracle RAC
US-BNL: Optimize if needed

Does the same DBA look after the FTS and 3D databases?
CERN: Yes – same team
CA-TRIUMF: Yes
DE-GridKa: Yes
ES-PIC: Yes
FR-IN2P3: Yes
IT-CNAF: Yes – same team
NDGF: No
NL-SARA: No
TW-ASGC: Yes
UK-RAL: Yes – same team
US-BNL: Yes

LFC

Do you have LFC running at your site?
CERN: Yes
CA-TRIUMF: Yes
DE-GridKa: Yes
ES-PIC: Yes
FR-IN2P3: Yes

Do you have LFC running at your site? (continued)
IT-CNAF:
- LHCb: we have a replica of the central catalog hosted at CERN.
- ATLAS: we have a local catalog; this is not replicated.
The catalogs are installed on 2 different Oracle RACs, both in production, and we have a pre-production installation. The same considerations made for the pre-production FTS apply here for the LFC.
NDGF: We will soon
NL-SARA: Yes
TW-ASGC: Yes
UK-RAL: Yes
US-BNL: No

What back-end database does LFC run on?
CERN: Oracle
CA-TRIUMF: MySQL
DE-GridKa: MySQL
ES-PIC: MySQL
FR-IN2P3: Oracle
IT-CNAF: Oracle
NDGF: MySQL, but would prefer PostgreSQL support
NL-SARA: MySQL and Oracle
TW-ASGC: Oracle
UK-RAL: Oracle
US-BNL: n/a

Would you consider this database dev, test or production?
CERN: Prod
CA-TRIUMF: Prod
DE-GridKa: Prod
ES-PIC: Prod
FR-IN2P3: Prod
IT-CNAF: Prod
NDGF: Test – prod soon
NL-SARA: Prod
TW-ASGC: Prod
UK-RAL: Prod
US-BNL: n/a

If you have a prod copy, do you also have a dev or test copy?
CERN: Yes
CA-TRIUMF: No
DE-GridKa: No
ES-PIC: Test
FR-IN2P3: No
IT-CNAF: No
NDGF: No
NL-SARA: No
TW-ASGC: No
UK-RAL: No
US-BNL: n/a

Is this database dedicated to LFC, or is it shared with other schemas/applications?
CERN: There are other schemas
CA-TRIUMF: Dedicated
DE-GridKa: Dedicated
ES-PIC: Dedicated
FR-IN2P3: Shared with FTS
IT-CNAF: Shared with FTS
NDGF: Dedicated
NL-SARA: MySQL: dedicated. Oracle: shared with the 3D database.
TW-ASGC: Shared with FTS
UK-RAL: Dedicated – will share with FTS in future
US-BNL: n/a

Is this database a cluster? If so, how many nodes?
CERN: 4-node cluster
CA-TRIUMF: Single instance
DE-GridKa: Single instance
ES-PIC: Single instance
FR-IN2P3: 4-node cluster
IT-CNAF: LHCb: we have a 2-node RAC which hosts LFC and the LHCb Conditions DB; ATLAS: we have a 3-node RAC which hosts LFC and FTS
NDGF: Single instance

Is this database a cluster? If so, how many nodes? (continued)
NL-SARA: MySQL: no. Oracle: yes, 2 nodes.
TW-ASGC: 3-node cluster
UK-RAL: ATLAS: single instance (going to a 2/3-node cluster); LHCb: part of the 3D 2-node cluster
US-BNL: n/a

What is the backup policy on this database?
CERN: On-disk backup + tape backup every hour (archive logs)
CA-TRIUMF: Nightly database backup
DE-GridKa: bin-logs; replication (hot standby); on the master: daily diff-backup in TSM
ES-PIC: Full backup every day
FR-IN2P3: 1 full backup each week, incremental on other days
IT-CNAF:
NDGF: Daily offsite backups are planned

What is the backup policy on this database? (continued)
NL-SARA: MySQL: daily dumps to the filesystem, with TSM for the filesystem. Oracle: RMAN talking to a TSM plugin.
TW-ASGC: Daily incremental: level 0 on Monday, level 1 on Tuesday to Sunday
UK-RAL: Full backup every week, incremental every day
US-BNL: n/a

How is this database monitored?
CERN: OEM + home-brewed monitoring
CA-TRIUMF: Scripts
DE-GridKa: DB: MySQL Query Browser. Host: Nagios, Ganglia.
ES-PIC: Nagios and Ganglia
FR-IN2P3: OEM
IT-CNAF: OEM
NDGF: Host is monitored through Nagios and Ganglia. Plans to test the database with Nagios as well.
NL-SARA: Nagios
TW-ASGC: OEM and Nagios
UK-RAL: OEM and Nagios
US-BNL: n/a

Are you replicating this database?
CERN: The LFC for LHCb is being replicated
CA-TRIUMF: No
DE-GridKa: Yes
ES-PIC: No
FR-IN2P3: The LHCb LFC replica
IT-CNAF: No
NDGF: No
NL-SARA: No
TW-ASGC: No
UK-RAL: Possibly
US-BNL: n/a

Do you have any other redundancy built into this database?
CERN: RAC and replication for LHCb; only RAC for the rest
CA-TRIUMF: No, but currently testing replication
DE-GridKa: No
ES-PIC: No
FR-IN2P3: Database datafiles are hosted on SAN devices with RAID redundancy and are backed up each day. Our Tivoli Storage Manager system gives us the ability to have a copy of those backups on disk and on tape at the same time.

Do you have any other redundancy built into this database? (continued)
IT-CNAF:
- EMC storage device (CX3-80), which is highly redundant in all its parts (dual controllers, double fibre channel connections and so on). We use RAID 1 mirroring and ASM as the storage management software (even though it is configured with redundancy=external, because we prefer to exploit the hardware RAID 1 redundancy, as recommended by EMC best practices).
- We have a 3-node RAC; each node is connected to a different network switch and is equipped with dual power supplies, dual-port HBAs and RAID 1 local disks.

Do you have any other redundancy built into this database? (continued)
NDGF: No
NL-SARA: MySQL: no. Oracle: well, the fact that it is a RAC cluster should provide some redundancy...
TW-ASGC: We have a RAC DB with RAID 6 disk storage.
UK-RAL: SAN RAID 1
US-BNL: n/a

Do you plan to move LFC to another database? If so, which?
CERN: No
CA-TRIUMF: Oracle RAC
DE-GridKa: MySQL DB to be migrated to Oracle on the FTS 3-node RAC (see above). Then: 2 preferred LFC nodes (+ 1 standby LFC node) and LFC DB storage in the SAN of 4 x 140 GB, i.e. 140 GB for data + 140 GB for recovery + 2 x 140 GB for mirroring.
ES-PIC: Move to a 2-node Oracle RAC in April

Do you plan to move LFC to another database? If so, which? (continued)
FR-IN2P3: No
IT-CNAF: No
NDGF: PostgreSQL, if the LFC supports it
NL-SARA: No
TW-ASGC: No
UK-RAL: No
US-BNL: n/a

What are your plans for the LFC database?
CERN: None
CA-TRIUMF: Migrate to Oracle RAC
DE-GridKa: MySQL DB to be migrated to Oracle on the FTS 3-node RAC (see above). Then: 2 preferred LFC nodes (+ 1 standby LFC node) and LFC DB storage in the SAN of 4 x 140 GB, i.e. 140 GB for data + 140 GB for recovery + 2 x 140 GB for mirroring.
ES-PIC: The LFC database will have the same backup policy as FTS, and will be set up on the same storage device as the 3D and FTS databases.

What are your plans for the LFC database? (continued)
FR-IN2P3: Monitor and take appropriate actions to increase QoS; this may lead to changing hosts, OS, etc.
IT-CNAF: None
NDGF:
NL-SARA: Add nodes to the cluster
TW-ASGC: None
UK-RAL: Move to a 2/3-node Oracle cluster
US-BNL: n/a

Does the same DBA look after the LFC and 3D databases?
CERN: Yes – same team
CA-TRIUMF: No
DE-GridKa: Yes
ES-PIC: Yes
FR-IN2P3: Yes
IT-CNAF: Yes – same team
NDGF: No
NL-SARA: MySQL: no; Oracle: yes
TW-ASGC: Yes
UK-RAL: Yes – same team
US-BNL: n/a

Conclusions

FTS Summary
Site | DB Test/Dev | Dedicated | Nodes | Backups | Monitoring | 3D DBA
CERN | D/T | Other | 4 | Hourly disk/tape | OEM + own | Yes
CA-TRIUMF | No | Yes | 1 | RMAN daily | OEM | Yes
DE-GridKa | No | LFC | 3 | RMAN daily | OEM, ng, gg | Yes
ES-PIC | PP/T | Yes | 2 | RMAN daily | OEM, ng, gg | Yes
FR-IN2P3 | No | LFC | 4 | RMAN daily | OEM | Yes
IT-CNAF | PP | LFC | 3 | | OEM | Yes
NDGF | No | Yes | 1 | RMAN daily | ng, gg | No
NL-SARA | No | Yes | 1 | Dumps daily | scripts | No
TW-ASGC | T | LFC | 3 | RMAN daily | OEM, ng | Yes
UK-RAL | No | Yes | 1 | RMAN daily | OEM, ng | Yes
US-BNL | PP/T | Yes | 2 | Image daily | ng, gg | Yes

LFC Summary
Site | DB Test/Dev | Dedicated | Nodes | Backups | Monitoring | 3D DBA
CERN | D/T | Other | 4 | Hourly disk/tape | OEM + own | Yes
CA-TRIUMF | No | Yes | 1 | RMAN daily | scripts | Yes
DE-GridKa | No | Yes? | 1 | Hot standby | OEM, ng, gg | Yes
ES-PIC | T | Yes | 1 | RMAN daily | ng, gg | Yes
FR-IN2P3 | No | FTS | 4 | RMAN daily | OEM | Yes
IT-CNAF | No | FTS | 2/3 | | OEM | Yes
NDGF | No | Yes | 1 | Offsite plan | ng, gg | No
NL-SARA | No | Yes | 1/2 | RMAN daily/dumps | Nagios | N/Y
TW-ASGC | No | FTS | 3 | RMAN daily | OEM, ng | Yes
UK-RAL | No | Yes | 1/2 | RMAN daily | OEM, ng | Yes

FTS Developer View on Database Plans
We pretty much leave it to the DBA really… In terms of plans, we have the ongoing plan to move more monitoring into the database – which means more summary and raw data stored. We'll also do the analytic summarization in PL/SQL, so you should expect an increase in CPU use as we start to do this. It's kinda hard to quantify though…
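To make the expected load concrete: the kind of summarization the developers describe would roll per-transfer rows up into hourly per-channel aggregates inside the database. The sketch below (driven from Python here, although the real work would sit in a scheduled PL/SQL job) is purely illustrative; the table and column names (t_file, t_channel_stats, channel_name, file_state, finish_time) are hypothetical.

```python
# Sketch: one summarization pass over the last hour of transfers. All schema
# object names are hypothetical; in production this would be a PL/SQL job.
import cx_Oracle

SUMMARY_SQL = """
INSERT INTO t_channel_stats (channel_name, stat_hour, n_done, n_failed)
SELECT channel_name,
       TRUNC(finish_time, 'HH24'),
       SUM(CASE WHEN file_state = 'Done'   THEN 1 ELSE 0 END),
       SUM(CASE WHEN file_state = 'Failed' THEN 1 ELSE 0 END)
FROM   t_file
WHERE  finish_time >= SYSDATE - 1/24
GROUP  BY channel_name, TRUNC(finish_time, 'HH24')
"""

def summarize(conn):
    """Run one summarization pass and commit the aggregated rows."""
    cur = conn.cursor()
    cur.execute(SUMMARY_SQL)
    conn.commit()
```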

Personal View
- DBAs should work together to tackle issues
- Not just DB setup but application issues
- 3D is a good infrastructure for databases:
  - community
  - experience in many areas
  - plenty of people to solve problems
- FTS & LFC (and CASTOR) tie in with 3D and other Oracle databases

Summary
- Databases are important to WLCG
- Work more with developers where needed
- Availability and tuning are key
- The 3D community and its experience are helping with other database deployment areas
- Use the list, use the meetings, help each other

Questions and (hopefully) Answers