Introduction to the new mainframe: Large-Scale Commercial Computing © Copyright IBM Corp., 2006. All rights reserved. Chapter 5: Availability.

Presentation transcript:

Objectives
The ability to:
- Understand what availability means to a commercial enterprise
- Describe the inhibitors to availability
- Describe operating system facilities that improve availability
- Describe the major components of Parallel Sysplex

A real customer requirement: Royal Bank Boosts Availability
Online Banking: Front End - Internet, Back End - Data/Applications
Challenge: Maximize Availability
- 12 million customers
- 2.5 million online
- 60,000 employees
Benefits:
- Reliable integration with internet
- Supports ~40 web-based applications
- Efficient use of Parallel Sysplex
- Improved customer availability
Configuration: IBM System z Parallel Sysplex, DB2 Database, IMS Database, CICS Applications, WebSphere MQ for z/OS, V5.3

Client Server Architecture
- 2 tier and 3 tier architecture
- Thin vs thick client
- Maintenance and change issues
- Microsoft vs IBM

What is availability?
Availability is the state of an application being accessible to the end user.
e.g. 13 years without a visible customer outage

Definitions:
High availability: The infrastructure (or applications) cannot undergo an unplanned outage for more than a few seconds or minutes without serious impact on the business. It is acceptable to bring down the application for a few hours for scheduled maintenance.
Continuous availability: The infrastructure and applications cannot be interrupted at all. There is no allowance for any outage, either unplanned or planned.
99.999% availability: just over 5 minutes per year of all outages in total.

Introduction to availability
High Availability: Fault-tolerant, failure-resistant infrastructure supporting continuous application processing
Continuous Operations: Non-disruptive backups and system maintenance coupled with continuous availability of applications
Disaster Recovery: Protection against unplanned outages such as disasters through reliable, predictable recovery
- Protection of critical business data
- Recovery is predictable and reliable
- Operations continue after a disaster
- Costs are predictable and manageable

Outage Definition
An outage (unavailability) is the time a system is not available to an end user.
Outages may be planned or unexpected (unplanned).
- Planned outages include causes like database reorganisation, release changes, and network reconfiguration.
- Unplanned outages are caused by some kind of hardware, software, or data problem.
While planned outages can be scheduled, they are still disruptive. The modern trend is to try to avoid planned outages altogether. This requires extensive hardware and software facilities.

Cost of outages (1)
Financial Impact of Downtime Per Hour (by various industries)
Source: Contingency Planning Research & Strategic Research Corp.

Cost of outages (2)

Types of Outages
Common Causes for “Application Downtime”
Source: Standish Group Research

Inhibitors to availability
Number of 9s – or the Myth of the nines

Class of 9s   Outage          Example
99.999 %      5 min / year    Continuous Availability: z/OS Parallel Sysplex
99.99 %       53 min / year   Fault Tolerant: S/390 Parallel Sysplex
99.9 %        8.8 hrs / year  High Availability: Single IBM System z CPC
99 %          88 hrs / year   General Purpose: High available UNIX Cluster
90 %          876 hrs / year  Campus LAN
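
The outage figures in the table follow from simple arithmetic on the availability percentage. The short Python sketch below reproduces them, assuming a 365-day year (8,760 hours); the function name is illustrative and not part of the original slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored (assumption)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly outage budget, in hours, for a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.999, 99.99, 99.9, 99.0, 90.0):
        hours = downtime_hours_per_year(pct)
        print(f"{pct:>7}% -> {hours:8.2f} h/year ({hours * 60:9.1f} min/year)")
```

Five nines works out to roughly 5.3 minutes per year, which the slide rounds to 5 minutes.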

Redundancy
Hardware – IBM Mainframe
- Power
  - 2x Power Supply
  - 2x Power feed
- Internal Battery Feature
  - Optional internal battery in case of loss of external power
- Cooling
- Dynamic oscillator switchover
- Processors
  - Multiprocessors
  - Spare Processing units
- Memory
  - Chip sparing
  - Error Correction and Checking ... distance concept and codes
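
Why duplicating a component helps can be shown with the standard reliability formulas. The sketch below assumes component failures are independent, which real hardware only approximates, and the 99% figure used in the example is purely illustrative rather than a value from the slides.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of redundant copies where any one surviving copy suffices.

    Assumes failures are independent (an idealisation).
    """
    return 1 - (1 - component_availability) ** copies


def series_availability(*component_availabilities: float) -> float:
    """Availability of a chain in which every component must be working."""
    result = 1.0
    for availability in component_availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one power supply
    print("one supply: ", single)
    print("two supplies:", parallel_availability(single, 2))
    print("supply + cooling in series:", series_availability(single, single))
```

Under these assumptions a second copy turns a 1-in-100 chance of being down into 1-in-10,000, while chaining two non-redundant components lowers overall availability.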

Concurrent Maintenance and Upgrades – fewer outages
- Duplex Units
  - Power Supplies
- Concurrent Microcode (Firmware) updates
- Hot Pluggable I/O, e.g. Stratus Comp co.
- PU Conversion
- Permanent and Temporary Capacity Upgrades
  - Capacity Upgrade on Demand (CUoD)
  - Customer Initiated Upgrade (CIU)
  - On/Off Capacity on Demand (On/Off CoD)
- Capacity BackUp (CBU)

Capacity BackUp (CBU)
(Diagram: CBU Server and Production Server.)
Who Needs It?
- Any business with a requirement for increased availability or Disaster Recovery
What Is It?
- The ability to nondisruptively increment capacity temporarily
- Dual Microcode Loads
  - Provide two machine configurations in one box
- Take advantage of "spare" PUs
- Significant cost savings possible
  - Standby MIPS cost can be eliminated
  - IBM Software license charges on standby MIPS can be eliminated
- Configure memory and channels to support production workload
How Can I Use It?
- Adjacent machines in the same location
- Multiple images in the same Parallel Sysplex® cluster
- Backup/Recovery site

EBR (E... Backup Restore) - Dynamic Memory Move
The Dynamic Memory Move operation concurrently changes the physical memory backing of an absolute storage increment.
- Performed transparently to the operating system
- Utilizes the zSeries Copy/Reassign hardware
- Used during EBA (Enhanced Book Availability) to:
  - Move physical memory usage from the targeted book to books that will be remaining in the system
  - Optimize memory allocation after EBA completion
Example: Absolute storage increment “123” is concurrently moved from physical memory increment 1 to physical memory increment 2.
(Diagram: absolute storage space increment 123 and physical memory increments 1 and 2.)

EBR - Redundant I/O Interconnect (RII)
STI Multipath Module (STI-MP): A multiplexer that supports attachment to four I/O features in an I/O domain and has an alternate path to a second STI-MP for a redundant I/O infrastructure.
Key Usage
- Memory Upgrade
- Dynamic MBA fanout error recovery
- Reduction of UIRA outage
- Book Repair
- STI cable repair
- MBA fanout card repair
On book add, MBA fanouts used for I/O are concurrently rebalanced to the new book.

EBR - Concurrent Physical Processor Reassignment
This operation is used for concurrently changing the physical backing of one or more logical processors.
- The state of the source operating physical processor is captured and transplanted into the target physical processor.
- Expected to be transparent to the operating system
- Utilizes the PU sparing function
- Used during EBA to:
  - Move processors from the targeted book to spare processors on a book remaining in the system
  - Rebalance processors after EBA completion
(Diagram: logical processor PU6 mapped to physical processors PUx and PUy.)

Create a redundant I/O configuration
(Diagram: LPAR1, LPAR2, ... LPARn; CSS / CHPID; Director (Switch); DASD CU.)

RAS Features of a Storage Subsystem
- Independent dual power feeds
- N+1 power supply technology/hot swappable power supplies, fans
- N+1 cooling
- Battery backup
- Non-volatile subsystem cache, to protect writes that have not yet been hardened to DASD
- Nondisruptive maintenance
- Concurrent Licensed Internal Code (LIC) activation
- Concurrent repair and replace actions
- RAID architecture
- Redundant microprocessors and data paths
- Concurrent upgrade support (that is, ability to add disks while subsystem is online)
- Redundant shared memory
- Spare disk drives
- Remote Copy to a second storage subsystem
  - Synchronous (Peer to Peer Remote Copy, PPRC)
  - Asynchronous (Extended Remote Copy, XRC)

Disk Mirroring using PPRC and XRC
Peer to Peer Remote Copy (PPRC) - Metro Mirror
- Synchronous remote data mirroring
  - Application receives “I/O complete” when both primary and secondary disks are updated
- Typically supports metropolitan distance
- Performance impact must be considered
  - Latency of 10 km
Extended Remote Copy (XRC) - z/OS Global Mirror
- Asynchronous remote data mirroring
  - Application receives “I/O complete” as soon as primary disk is updated
- Unlimited distance support
- Performance impact negligible
- System Data Mover (SDM) provides
  - Data consistency of secondary data
  - Central point of control
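
The difference between the two modes comes down to when the application is told that its write is complete. The following Python sketch is a toy model, not PPRC or XRC themselves: the class, its methods, and the assumed fibre delay of about 5 microseconds per kilometre one way are all illustrative assumptions.

```python
from collections import deque

FIBRE_US_PER_KM = 5.0  # assumed one-way propagation delay in optical fibre


class MirroredDisk:
    """Toy model of a primary disk mirrored to a remote secondary."""

    def __init__(self, distance_km: float, synchronous: bool) -> None:
        self.distance_km = distance_km
        self.synchronous = synchronous
        self.primary: dict[str, str] = {}
        self.secondary: dict[str, str] = {}
        self.pending: deque[tuple[str, str]] = deque()  # async writes in flight

    def write(self, key: str, value: str) -> float:
        """Apply a write; return the added latency (microseconds) the
        application sees before it receives "I/O complete"."""
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: wait for the round trip to the secondary.
            self.secondary[key] = value
            return 2 * self.distance_km * FIBRE_US_PER_KM
        # Asynchronous: acknowledge at once, ship the update later.
        self.pending.append((key, value))
        return 0.0

    def drain(self) -> None:
        """Apply queued updates to the secondary in their original order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value


if __name__ == "__main__":
    sync = MirroredDisk(distance_km=10, synchronous=True)
    print("sync added latency (us):", sync.write("rec1", "v1"))
    asyn = MirroredDisk(distance_km=1000, synchronous=False)
    print("async added latency (us):", asyn.write("rec1", "v1"))
    print("secondary before drain:", asyn.secondary)
    asyn.drain()
    print("secondary after drain: ", asyn.secondary)
```

In this model the synchronous penalty grows with distance, which is why synchronous mirroring is tied to metropolitan distances, while the asynchronous secondary lags behind the primary until queued updates are applied. Applying them in their original order is a simplified stand-in for the consistency role the slide attributes to the System Data Mover.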

PPRC Failover / Failback (FO/FB)
- The new primary volumes (at the remote site) record changes while in failover mode.
- The original mode of the volumes at the local site is preserved as it was when the failover was initiated.
- Only need to resynchronize from time of failover, not entire data set
(Diagram: four panels, each showing volumes A and B with the application I/Os: Normal, with Sync PPRC; Failover, with Sync PPRC suspended; Failback Start, labelled OOS; Failback Finish, with Sync PPRC full duplex, labelled CR.)
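
The point that only changes made since the failover need to be resynchronized can be illustrated with a simple change-recording set. This Python sketch is a conceptual model under stated assumptions: the class and method names are hypothetical, volumes are modelled as lists of tracks, and it does not reproduce the actual PPRC mechanism.

```python
class VolumePair:
    """Toy model of a mirrored pair: A (local site) and B (remote site)."""

    def __init__(self, tracks: list[str]) -> None:
        self.a = list(tracks)            # original primary
        self.b = list(tracks)            # original secondary
        self.changed: set[int] = set()   # tracks written while failed over
        self.failed_over = False

    def failover(self) -> None:
        """Make B the primary and start recording which tracks change."""
        self.failed_over = True
        self.changed.clear()

    def write(self, track: int, data: str) -> None:
        if self.failed_over:
            self.b[track] = data
            self.changed.add(track)      # remember for the later resync
        else:
            self.a[track] = data
            self.b[track] = data         # normal synchronous mirroring

    def failback(self) -> int:
        """Copy only the changed tracks back to A; return how many moved."""
        copied = len(self.changed)
        for track in sorted(self.changed):
            self.a[track] = self.b[track]
        self.changed.clear()
        self.failed_over = False
        return copied


if __name__ == "__main__":
    pair = VolumePair(["old"] * 1000)
    pair.failover()
    pair.write(3, "new")
    pair.write(7, "new")
    pair.write(3, "newer")               # same track again, still one entry
    print("tracks resynchronized:", pair.failback(), "of", len(pair.a))
```

Here three writes touch two distinct tracks, so the failback copies two tracks rather than all one thousand.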

Parallel Sysplex
- Removes Single Point of Failure
  - Server
  - LPAR
  - Subsystems
- Planned and Unplanned Outages
- Single System Image
- Dynamic Session Balancing
- Dynamic Transaction Routing
Highlights
- Data sharing
- Locking
- Cross-system workload dispatching
- Synchronization of time for logging, etc.
Hardware/software combination
- Coupling Facility
- Sysplex Timer – TOD clock synchronization
- Workload Manager in z/OS
- Compatibility and exploitation in software subsystems, like data sharing in database systems

z/OS factors to availability
- Workload balancing using Workload Manager (WLM)
- Capability to restart applications using the Automatic Restart Manager (ARM) without interfering
- Assists two-phase commits using Resource Recovery Services (RRS)
- Make dynamic changes to your system configuration using the System Modification Program Extended (SMP/E)

Error recording and error recovery routines

z/OS Recovery
z/OS Recovery features
- Recovery Termination Manager (RTM)
- Extended Specify Task Abnormal Exit (ESTAE)
- Functional Recovery Routine (FRR)

The Human Factor
Automation: critical for successful rapid recovery and continuity
The more people involved, the higher the odds of human errors.
The benefits of automation:
- Allows business processes to be built on a reliable, consistent recovery time
- Recovery times can remain consistent as the system scales to provide a flexible solution designed to meet changing business needs
- Reduces infrastructure management cost and staffing skills
- Reduces or eliminates human error during the recovery process at time of disaster
- Facilitates regular testing to help ensure repeatable, reliable, scalable business continuity
- Helps maintain recovery readiness by managing and monitoring the server, data replication, workload and the network along with the notification of events that occur within the environment

Today’s Businesses Require Rapid Database Availability
Achieve Application and Database Restart
- Consistent, repeatable, fast
- Database Restart: To start a database application following an outage without having to restore the database
  - This is a process measured in minutes
Avoid Application and Database Recovery
- Unpredictable recovery time, usually very long and very labor intensive
- Database Recovery: Restore last set of Image Copy tapes and apply log changes to bring database up to point of failure
  - This is a process measured in hours or even days

What is GDPS/PPRC? (Metro Mirror)
- Multi-site base or Parallel Sysplex environment
- Remote data mirroring using PPRC
- Manages unplanned reconfigurations
  - z/OS, CF, disk, tape, site
  - Designed to maintain data consistency and integrity across all volumes
  - Supports fast, automated site failover
  - No or limited data loss (customer business policies)
- Single point of control for
  - Standard actions: Stop, Remove, IPL system(s)
  - Parallel Sysplex configuration management
  - User defined script (e.g. Planned Site Switch)
  - PPRC configuration management
(Diagram: Site 1 and Site 2, each with its own network, 100 km apart.)

Multiple Site Workload - Cross-site Sysplex Continuous Availability Configuration

Continuous Availability and Disaster Recovery at unlimited distance (GDPS/PPRC & GDPS/XRC)

SUMMARY