LHC Logging Cluster Nilo Segura IT/DB. Agenda ● Hardware Components ● Software Components ● Transparent Application Failover ● Service definition.


LHC Logging Cluster Nilo Segura IT/DB

Agenda
● Hardware Components
● Software Components
● Transparent Application Failover
● Service definition

Hardware Components
● Two Sun Fire V240
  – Dual CPU 1 GHz, 4 GB memory
  – Dual internal disks, dual power supply
● One Sun StorEdge 3510FC
  – 2 Gb/s Fibre Channel architecture
  – 12 x 146 GB 10k RPM FC disks
  – Two RAID controllers with 1 GB cache
  – Can accept up to 2 x 3510 JBOD expansion trays
● Both machines share the same set of disks
  – The 3510 can accept up to 8 directly attached hosts (or up to 4 with a redundant configuration)

Software Components
● Sun Cluster 3.1 Update 1
● Solaris 9
● Oracle RDBMS (Real Application Cluster)
● Oracle Distributed Lock Manager
● Veritas Volume Manager 3.5
● Sun certification completed
  – checking correct level of patches
  – shutting down one of the nodes
  – disconnecting one of the nodes from the disk system
  – etc.

High Availability
● The purpose of the cluster installation is to offer 365x24 access to the database
● No single point of failure
  – Two nodes, two disk systems, two ...
● Recovery/availability offered by the Oracle software (Real Application Cluster)
  – Transactions are recovered by the surviving instance
● Tested the following cases
  – Listener down (re-connection immediate)
  – Listener up but instance down (re-connection immediate)
  – Machine down (re-connection takes longer: 3 minutes when connecting from a Linux client, due to the TCP driver timeout)
● Timeout can be tweaked but...
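The "machine down" case above is slow because the client waits for the TCP timeout before moving on. As an illustration only, here is a minimal Python sketch of a client-side probe that tries each address with a short timeout of its own. The function name `first_reachable` and the 5-second default are assumptions made for this example. This is not how Oracle Net performs failover, and it only shows that the port accepts a TCP connection, not that the instance is up.

```python
import socket


def first_reachable(addresses, timeout=5.0, opener=socket.create_connection):
    """Return the first (host, port) that accepts a TCP connection, else None.

    `opener` is injectable so the function can be exercised without a network;
    by default it is the standard library's socket.create_connection.
    """
    for host, port in addresses:
        try:
            sock = opener((host, port), timeout=timeout)
        except OSError:
            # Covers refused connections and timeouts: try the next address.
            continue
        sock.close()
        return host, port
    return None
```

A caller might pass `[("sunlhclog01.cern.ch", 1521), ("sunlhclog02.cern.ch", 1521)]`, the two listener addresses used in the connect descriptor on the later slides.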

Transparent Application Failover
● For SELECT operations, if the connection is lost, the session is resumed transparently on the surviving node
  – Tested and working: the session stops for a few seconds and then resumes without the user issuing a new connect request
  – Not tested from a JDBC Thin driver... it will work with the JDBC OCI driver
● Sessions modifying data will still lose the connection and need to re-connect
  – As expected, the current transaction will be rolled back
● Possibility of LOAD BALANCING at the level of the connect string
  – Not enabled for the moment, perhaps later
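Since sessions that modify data lose their connection and have their transaction rolled back, the application has to reconnect and replay the work itself. The following is a minimal Python sketch of that pattern, not tied to any real driver. `ConnectionLost`, `run_with_reconnect`, and the `connect`/`work` callables are hypothetical names for this example, and the connection object is only assumed to have a `commit()` method.

```python
class ConnectionLost(Exception):
    """Hypothetical stand-in for the driver error raised when the session dies."""


def run_with_reconnect(connect, work, max_attempts=3):
    """Run work(conn) as one transaction, replaying it on a new connection
    if the session is lost before the commit succeeds.

    `connect` returns a fresh connection; `work` performs all statements of
    the transaction, so replaying it from the start is safe after a rollback.
    """
    last_error = None
    for _ in range(max_attempts):
        conn = connect()
        try:
            result = work(conn)
            conn.commit()
            return result
        except ConnectionLost as exc:
            # The database rolled back the uncommitted work; start again.
            last_error = exc
    raise last_error
```

The design choice is to treat the whole transaction as the unit of retry: replaying only the statement that failed would run it outside the context of the statements that were rolled back.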

Service definition - General
● Service to run 365x24; backups will not interrupt database access
  – Export + hot backups
  – Oracle Recovery Manager will reduce the backup window time
● Problems with the service to be reported to and/or Oracle GSM telephone (depending on the criticality of the
  – Same mechanism used for SUNSLPS and LEP Database servers
● However, the system can still collapse for other reasons (network outage, power failure, gremlins...), so applications must be able to react to these events (local buffering?)
  – Instance failure when recovering a distributed transaction
  – Surviving instance tried to recover and crashed at the same point
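The slide only raises local buffering as a question. As one possible reading of it, here is a minimal Python sketch of a writer that queues records while the database is unreachable and sends them in order once it is back. `BufferedWriter`, the `send` callable, and the buffer limit are assumptions made for this example; `send` is expected to raise `ConnectionError` when the database cannot be reached.

```python
from collections import deque


class BufferedWriter:
    """Queue records locally and forward them in arrival order."""

    def __init__(self, send, max_buffered=10000):
        self._send = send
        # With maxlen set, the oldest record is dropped once the buffer is full.
        self._pending = deque(maxlen=max_buffered)

    def write(self, record):
        """Buffer the record, then try to drain the buffer. Returns records sent."""
        self._pending.append(record)
        return self.flush()

    def flush(self):
        """Send buffered records oldest first, stopping at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break
            # Remove the record only after it was sent successfully.
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._pending)
```

This buffer lives in memory, so it survives a database or network outage but not a crash of the application itself. Whether dropping the oldest records when the buffer is full is acceptable depends on the application.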

Service Definition - Patches
● We may need to interrupt the service for updates...
  – If all goes well, one day of (scheduled) interruption per year
  – We should be able to apply Solaris patches one node at a time
● Moving applications from one node to another
  – Oracle apparently offers rolling upgrade features in its RAC patches
● Some patches that touch common structures used by all the instances will still require database downtime
● But: critical security patches may need to be applied at any given moment (following Sun and/or CERN SecurityChief requests)
  – Removed all unneeded Solaris services to avoid potential problems
● Private firewall for all the database servers, à la AIS?

lhclog=
  (DESCRIPTION=
    (FAILOVER=on)
    (LOAD_BALANCE=off)
    (ADDRESS=(PROTOCOL=TCP)(HOST=sunlhclog01.cern.ch)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=sunlhclog02.cern.ch)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=LHCLOGDB)
      (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))
    )
  )

lhclog=
  (DESCRIPTION=
    (FAILOVER=on)
    (LOAD_BALANCE=off)
    (ADDRESS=(PROTOCOL=TCP)(HOST=sunlhclog01.cern.ch)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=sunlhclog02.cern.ch)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=LHCLOGDB)
      (FAILOVER_MODE=(TYPE=SELECT)(METHOD=PRECONNECT))
    )
  )
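Connect descriptors of this shape are easy to get wrong by hand, because every nested clause must be closed. As a convenience, and not as part of the original setup, the following Python sketch generates an entry with the same parameters as the two slides above and checks that its parentheses balance. The helper names `build_descriptor` and `parens_balanced` are invented for this example. The host names, port, and service name are passed in by the caller.

```python
def build_descriptor(alias, hosts, service_name, method="BASIC", port=1521):
    """Return a one-line tnsnames-style entry with failover on, load balancing off."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        f"{alias}=(DESCRIPTION=(FAILOVER=on)(LOAD_BALANCE=off){addresses}"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})"
        f"(FAILOVER_MODE=(TYPE=SELECT)(METHOD={method}))))"
    )


def parens_balanced(text):
    """True if every '(' has a matching ')' and none closes before it opens."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

For example, `build_descriptor("lhclog", ["sunlhclog01.cern.ch", "sunlhclog02.cern.ch"], "LHCLOGDB")` produces the BASIC variant, and passing `method="PRECONNECT"` produces the second one.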

Database
● Space will be managed automatically by Oracle
  – No need to specify extent size
  – Unlimited number of extents