GOCDB failover status and plans COD-19, 01/04/2009 G.Mathieu, A.Cavalli, C.Peter, P.Sologna.


Assessment and progress
Last week's outage at RAL
  – a good (!) use case for testing our procedures and listing improvements
DNS aspect
  – new DNS machine at CNAF

Last RAL outage
Timeline
  – 5:20 UTC - power glitch at RAL
  – 8:00 - start failover process
  – 9:20 - DNS switch complete
  – 10:00 - failover working properly
  – 13:25 - reverse DNS switch
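The elapsed times implied by this timeline can be worked out with a short calculation. The sketch below is illustrative only: the event labels and times are copied from the slide, while the function names are hypothetical and all times are assumed to fall on the same day.

```python
from datetime import datetime

# Times of day as listed on the slide (assumed to be on the same day).
EVENTS = [
    ("power glitch at RAL", "05:20"),
    ("start failover process", "08:00"),
    ("DNS switch complete", "09:20"),
    ("failover working properly", "10:00"),
    ("reverse DNS switch", "13:25"),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def elapsed_table(events):
    """Return (label, minutes since previous event, minutes since first event)."""
    rows = []
    first = events[0][1]
    previous = first
    for label, time in events:
        rows.append(
            (label, minutes_between(previous, time), minutes_between(first, time))
        )
        previous = time
    return rows


if __name__ == "__main__":
    for label, step, total in elapsed_table(EVENTS):
        print(f"{label:28s} +{step:4d} min  (T+{total} min)")
```

On these figures the failover itself (8:00 to 10:00) took 120 minutes, and 160 minutes passed between the power glitch and the start of the failover process.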

Post mortem
Good things
  – Failover worked
  – DNS swap quick, efficient and transparent
  – Good synchronisation
  – CNAF IRC channel was useful
Encountered problems
  – Problems with CNAF DB schema
  – DB connection from ITWM to RAL
  – SSL issues
  – The overall process to swap completely took a rather long time (2h)

Proposed improvements (1)
Improve manual process
  – Reduce the number of people needed: any one of several people should be able to carry out the whole chain alone
  – Create scripts to reduce the number of actions
Sort out CNAF schema issue
  – Improve current synchronisation mechanism
Contacts and documentation
  – Keep a list of phone contacts, or alternative mail addresses, to use in case the main mail system does not work
  – Document all processes
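One way to let a single person carry out the whole chain is to wrap the manual actions in one ordered script. The following is a minimal sketch under stated assumptions: the step names and their bodies are hypothetical placeholders, not the actual GOCDB failover procedure.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action returning True on success.
Step = Tuple[str, Callable[[], bool]]


def run_chain(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        print(f"running: {name}")
        if not action():
            print(f"FAILED: {name}; stopping so an operator can intervene")
            break
        completed.append(name)
    return completed


# Hypothetical placeholder actions; real ones would call the site's own tools.
def check_replica_db() -> bool:
    return True


def switch_portal_config() -> bool:
    return True


def switch_dns() -> bool:
    return True


FAILOVER_CHAIN: List[Step] = [
    ("check replica DB", check_replica_db),
    ("switch portal configuration", switch_portal_config),
    ("switch DNS", switch_dns),
]

if __name__ == "__main__":
    run_chain(FAILOVER_CHAIN)
```

Stopping at the first failed step keeps the operator in control while still reducing the number of separate manual actions.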

Proposed improvements (2)
Regular tests
  – Test CNAF replica DB
  – ITWM web interface
  – All possible scenarios
Configuration improvements
  – Simplify configuration file
  – Have the service itself publish the fact that it is in read-only mode
Automation
  – Work with OAT monitoring group
  – Automate DB switch
  – Automate portal switch the same way
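As an illustration of a service publishing its own read-only state, the sketch below builds a small machine-readable status document that a monitoring tool could poll. The field names and the JSON format are assumptions made for this example; the slides do not specify how the mode would be published.

```python
import json


def build_status(service: str, read_only: bool, active_site: str) -> str:
    """Return a JSON status document describing the current service mode.

    All field names here are hypothetical, chosen only for illustration.
    """
    status = {
        "service": service,
        "mode": "read-only" if read_only else "read-write",
        "active_site": active_site,
    }
    return json.dumps(status, sort_keys=True)


if __name__ == "__main__":
    # During a failover the replica would typically be served read-only.
    print(build_status("GOCDB", read_only=True, active_site="replica"))
```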

Actions list (1)
Doc and processes
  – Gilles to draft process + test documentation
  – Christian to add tests to ITWM procedures
  – All: provide contacts (phone, alternate mail, etc.)
Access to machines
  – Christian to give failover team access to
  – Gilles to give failover team access to
  – Gilles to write goc portal
Scripting
  – Gilles to write scripts to change GOC portal conf
  – Peter/Ale to write DNS configuration scripts

Actions list (2)
Improvements on CNAF-RAL DB sync
  – Gilles to provide a dump to CNAF whenever the schema changes
  – Peter/Ale/Gilles to study encryption solution to secure the dump
  – Gilles to check the dump solution is valid
  – Peter/Ale to implement new procedures
  – Ale to do speed tests in different scenarios
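Whatever encryption solution is chosen, the receiving site may also want to confirm that a dump arrived intact. The sketch below covers integrity checking only, not encryption, and is an assumption rather than the procedure the team adopted: the sender writes a SHA-256 checksum file next to the dump, and the receiver recomputes and compares it. File and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(dump: Path) -> Path:
    """Sender side: write '<dump name>.sha256' next to the dump file."""
    sidecar = dump.with_name(dump.name + ".sha256")
    sidecar.write_text(sha256_of(dump) + "\n")
    return sidecar


def verify_dump(dump: Path, sidecar: Path) -> bool:
    """Receiver side: True if the dump matches the recorded checksum."""
    expected = sidecar.read_text().strip()
    return sha256_of(dump) == expected
```

A checksum detects corruption in transit but does not protect confidentiality, so it would complement rather than replace the encryption study listed above.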

Actions list (3)
Test
Test again
Re-test
  – Test
Test (if there is some time left)